Ram,
Would it be possible to add the number of beds to the hospital data? If so, it
could be a great data source to complement the Census Number of
Hospital Beds data.
Or is it some kind of location/map data that you are scraping to get the
names and locations of hospitals?
Great work by the way!
Thank you Dilawar, Rahul, Ravikant, Sudatta, Madhu, Nikhil:
I mixed and matched all the options you suggested. I finally have a list of
18k hospitals in India. I will be providing this data at http://india-data.com/,
where people can search for information by pincode. Beta version is live
Hi Ram,
I'm not sure about R, but if you have the list in an Excel or CSV file, then
OpenRefine can help you iron it all out in a jiffy. Check out this article
I've written that explains the flow for this particular task:
http://datameet.org/2018/06/13/openrefine-bus-stop/
OpenRefine is a tool made for
Hi Ram
In addition to the helpful suggestions made above, here are some R-specific
pointers:
— stringr is an extremely helpful package with which to do most of the
string manipulation actions (whitespace removal, tokenisation, regex
matching) recommended above.
— you may also need a package
Hi Ram,
I faced similar issues, and the following worked for me:
1. Make everything lower or upper case using tolower/toupper
2. Use grep to match the common pattern in the names
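The two steps above could be sketched roughly as follows. This is only an illustration in Python (the R counterparts would be tolower/toupper and grep); the helper names normalise and matches_pattern, the pattern, and the sample hospital names are all invented for the example and are not taken from the actual data.

```python
import re

def normalise(name: str) -> str:
    """Step 1: put everything in one case, then tidy punctuation and spaces."""
    lowered = name.lower()
    # Replace every run of non-alphanumeric characters with a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", lowered)
    return cleaned.strip()

def matches_pattern(name: str, pattern: str) -> bool:
    """Step 2: grep-style search for a common pattern in the normalised name."""
    return re.search(pattern, normalise(name)) is not None

# Hypothetical spellings, made up for illustration only.
names = ["Govt. General Hospital", "GOVT GENERAL HOSPITAL ", "City Eye Clinic"]
hits = [n for n in names if matches_pattern(n, r"govt general hosp")]
print(hits)
```

Normalising before matching means the pattern only has to be written once, in lower case and without punctuation.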
Best,
Sudatta
> On Aug 25, 2020, at 7:52 AM, Rahul Gupta wrote:
>
> Hi Ram,
>
> Not sure if there is something very
Hi Ram,
For one project I had to match village names in one dataset against another
dataset containing ~44000 villages in Maharashtra, and I faced a similar
situation. To find the exact (or closest) match, I applied the following steps
to both strings being compared:
1. remove white spaces
2.
Hi Ram,
I'm not sure if there is something very similar to FuzzyWuzzy (Python) in R,
but you can try this link:
https://astrostatistics.psu.edu/su07/R/html/base/html/agrep.html
It does a similar kind of approximate string matching. You can set your own
threshold and filter the data accordingly.
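To make the threshold idea concrete, here is a minimal sketch in plain Python. It is not agrep itself: agrep looks for approximate matches of a pattern inside each string, whereas this sketch compares whole strings by Levenshtein edit distance and keeps those within a chosen limit. The function names, the max_distance default, and the spelling variants are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def within_threshold(query, candidates, max_distance=2):
    """Keep candidates whose case-insensitive distance to query is small enough."""
    q = query.lower()
    return [c for c in candidates if edit_distance(q, c.lower()) <= max_distance]

# Hypothetical spelling variants, invented for illustration only.
candidates = ["Guntur", "Guntoor", "Gunter", "Nellore"]
print(within_threshold("guntur", candidates))
```

A tighter max_distance gives fewer false matches but misses more spelling variants, so the value usually has to be tuned against a sample of the real names.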
I'm not sure what the equivalent of Python's difflib (SequenceMatcher) is in R.
If there is one, it will work.
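For reference, this is roughly what the difflib approach looks like on the Python side. It is a sketch only: the helper names similarity and best_match, the 0.8 threshold, and the sample names are assumptions for illustration, not something from the thread's data.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(name, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing clears the threshold."""
    if not candidates:
        return None
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None

# Hypothetical names, invented for illustration only.
known = ["Genral Hospital", "City Clinic"]
print(best_match("General Hospital", known))
print(best_match("Xyz", known))
```

Returning None below the threshold keeps unrelated names from being merged just because they happen to be the least-bad candidate.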
On Aug 25, 2020, 20:09 +0530, rammano...@gmail.com ,
wrote:
> Hi,
>
> I have collected hospital data from multiple sources.
Hi,
I have collected hospital data from multiple sources. However, each source
uses different names. I am trying to produce a clean list with no duplicates.
I am using R and couldn't resolve this with stringdist_join. I would
appreciate any suggestions on an approach.
For example, Guntur (A.P) is listed with the following