Re: [datameet] Help with R logic - near similar name

2020-08-30 Thread Herry Gulabani
Ram, would it be possible to add the number of beds to the hospital data? If so, it could be a great data source to complement the Census Number of Hospital Beds data. Or is it some kind of location/map data that you are scraping to get the names and locations of hospitals? Great work, by the way!

Re: [datameet] Help with R logic - near similar name

2020-08-29 Thread rammano...@gmail.com
Thank you Dilawar, Rahul, Ravikant, Sudatta, Madhu, Nikhil: I mixed and matched all the options you suggested, and I now have a list of 18k hospitals in India. I will be providing this data at http://india-data.com/ , where people can search for information by PIN code. Beta version is live

Re: [datameet] Help with R logic - near similar name

2020-08-26 Thread Nikhil VJ
Hi Ram, I'm not sure about R, but if you have the list in an Excel/CSV file then OpenRefine can help you iron it all out in a jiffy. Check out this article I've written that explains the flow for this particular task: http://datameet.org/2018/06/13/openrefine-bus-stop/ OpenRefine is a tool made for

Re: [datameet] Help with R logic - near similar name

2020-08-25 Thread m...@ncf-india.org
Hi Ram, In addition to the helpful suggestions made above, here are some R-specific pointers: - stringr is an extremely helpful package for most of the string manipulation actions (whitespace removal, tokenisation, regex matching) recommended above. - you may also need a package

Re: [datameet] Help with R logic - near similar name

2020-08-25 Thread Sudatta Ray
Hi Ram, Faced with similar issues, the following worked for me: 1. Make everything lower or upper case using tolower/toupper. 2. Use grep to match the common pattern in the names. Best, Sudatta

Re: [datameet] Help with R logic - near similar name

2020-08-25 Thread Ravikant P
Hi Ram, For one project I had to match a village name in one dataset with another dataset containing ~44000 villages in Maharashtra, and I faced a similar situation. To find the exact (or closest) match, I used the following tricks on both strings being compared: 1. remove white spaces 2.
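
The first trick in the list above (removing white space), combined with the case folding suggested elsewhere in this thread, can be sketched as follows. This is only an illustration in Python, not the code Ravikant used; the function name `normalize_name` and the sample names are made up for the example, and the remaining steps of his list are not shown because the snippet is cut off.

```python
import re


def normalize_name(name: str) -> str:
    """Drop all whitespace and lower-case the name (hypothetical helper)."""
    return re.sub(r"\s+", "", name).lower()


# Made-up spellings of one hospital, as different sources might list it.
names = [
    "City Care Hospital",
    "city  care hospital ",
    "CityCare Hospital",
]

# Names that differ only in spacing or case collapse to the same key.
keys = {normalize_name(n) for n in names}
print(keys)
```

Comparing the normalized keys instead of the raw strings removes the trivial differences first, so any fuzzy matching afterwards only has to deal with genuine spelling variation.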

Re: [datameet] Help with R logic - near similar name

2020-08-25 Thread Rahul Gupta
Hi Ram, Not sure if there is something very similar to FuzzyWuzzy (Python) in R, but you can try this link: https://astrostatistics.psu.edu/su07/R/html/base/html/agrep.html It does a similar kind of approximate string matching. You can set your own threshold criteria and filter the data accordingly.

Re: [datameet] Help with R logic - near similar name

2020-08-25 Thread Dilawar Singh
Not sure what the equivalent of Python's difflib (SequenceMatcher) is in R. If there is one, it will work.
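
For readers unfamiliar with the Python tool mentioned above, here is a minimal sketch of how `difflib.SequenceMatcher` scores two names. The helper names, the 0.85 threshold, and the sample hospital names are assumptions chosen for illustration; they do not come from the thread, and a suitable threshold would need tuning on the real data.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1]; case is ignored."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_near_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same entity above an assumed threshold."""
    return similarity(a, b) >= threshold


# Made-up examples: a misspelling versus an unrelated name.
print(is_near_duplicate("City Care Hospital", "City Care Hosptal"))
print(is_near_duplicate("City Care Hospital", "Green Valley Clinic"))
```

The ratio is twice the number of matching characters divided by the combined length of both strings, so a single dropped letter in a long name still scores close to 1.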

[datameet] Help with R logic - near similar name

2020-08-25 Thread rammano...@gmail.com
Hi, I have collected hospital data from multiple sources. However, each source lists a different name. I am trying to produce a clean list with no duplicates. I am using R and couldn't resolve this with stringdist_join. I would appreciate it if you could suggest an approach. For example, Guntur (A.P) is listed with following