Re: [datameet] Help with R logic - near similar name

Herry Gulabani Sun, 30 Aug 2020 07:09:46 -0700

Ram,

Would it be possible to add the number of beds to the hospital data? If so, it
could be a great data source to complement the Census Number of
Hospital Beds data.


Or is it some kind of location/map data that you are scraping to get the
names and locations of hospitals?

Great work by the way!

On Sat, Aug 29, 2020 at 11:23 PM rammano...@gmail.com <
rammanohar....@gmail.com> wrote:

> Thank you Dilawar, Rahul, Ravikant, Sudatta, Madhu, Nikhil:
>
> I mixed and matched all the options you suggested, and I now have a list of
> 18k hospitals in India. I will be providing this data at
> http://india-data.com/ , where people can search for information by
> pincode. A beta version is live at http://india-data.com/pincode/221107/ .
>
> Thanks again to all.
>
> Regards
> Ram
>
>
>
> On Wednesday, 26 August 2020 at 02:50:09 UTC-4 nikh...@gmail.com wrote:
>
>> Hi Ram,
>>
>> I'm not sure about R, but if you have the list in an excel / csv then
>> OpenRefine can help you iron it all out in a jiffy. Check out this article
>> I've written that explains the flow for this particular task:
>> http://datameet.org/2018/06/13/openrefine-bus-stop/
>>
>> OpenRefine is a tool made for non-coders to clean up messy data. Site:
>> https://openrefine.org/
>>
>> --
>> Cheers,
>> Nikhil VJ
>> https://nikhilvj.co.in
>>
>>
>> On Wed, Aug 26, 2020 at 6:21 AM m...@ncf-india.org <m...@ncf-india.org>
>> wrote:
>>
>>> Hi Ram
>>>
>>> In addition to the helpful suggestions made above, here are some
>>> R-specific pointers:
>>> - stringr is a very helpful package for most of the string manipulation
>>> steps recommended above (whitespace removal, tokenisation, regex
>>> matching).
>>> - you may also need a package that computes 'distances' between the
>>> strings you are comparing. stringdist is one such package. However,
>>> with Indian names, I found some of the phonetic algorithms
>>> (rogerroot, soundex) in the phonics package much more helpful.
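The idea of cleaning strings and then measuring a 'distance' between them can be sketched as follows. This is a minimal Python illustration, not the R packages named above: `clean` and `levenshtein` are hypothetical helper names, and plain edit distance is only one of several distances a package such as stringdist offers.

```python
import re

def clean(name: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

# Two spellings taken from the original question in this thread.
a = clean("ASHIRWAD HEART HOSPITAL ( GHATKOPAR )")
b = clean("Ashirwad Heart Hospita-Ghatkopar")
distance = levenshtein(a, b)  # 1: only the missing "l" differs after cleaning
```

After cleaning, the two spellings differ by a single character, so a small distance cutoff would flag them as likely duplicates. The cutoff itself has to be tuned on real data.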
>>>
>>> Hope this helps! Good luck!
>>> Madhu
>>>
>>> On Wednesday, 26 August 2020 at 00:48:45 UTC+5:30 sudat...@gmail.com
>>> wrote:
>>>
>>>> Hi Ram,
>>>>
>>>> Faced with similar issues, the following worked for me:
>>>>
>>>> 1. Convert everything to lower or upper case using tolower/toupper
>>>> 2. Use grep to match the common pattern in the names
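The two steps above could look roughly like this. It is a Python sketch rather than R, and the pattern (everything up to the word "hospital") is an assumption chosen to fit the sample names in this thread, not something the poster specified.

```python
import re
from collections import defaultdict

names = [
    "SANKARA EYE HOSPITAL(GUNTUR)",
    "SANKARA EYE HOSPITAL",
    "SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)",
    "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
    "Ashirwad Heart Hospital",
]

# Hypothetical pattern: capture everything up to and including "hospital".
core = re.compile(r"^(.*?\bhospital)\b")

groups = defaultdict(list)
for name in names:
    lowered = name.lower()            # step 1: use one case everywhere
    match = core.search(lowered)      # step 2: grep-style pattern match
    key = match.group(1) if match else lowered
    groups[key].append(name)

for key, members in groups.items():
    print(key, "->", len(members), "entries")
```

A misspelt entry such as "Ashirwad Heart Hospita-Ghatkopar" would not match this pattern and would end up in its own group, so exact pattern matching usually needs to be combined with some form of approximate matching.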
>>>>
>>>> Best,
>>>> Sudatta
>>>>
>>>> On Aug 25, 2020, at 7:52 AM, Rahul Gupta <rahulgu...@gmail.com> wrote:
>>>>
>>>> Hi Ram,
>>>>
>>>> I'm not sure whether R has something very similar to FuzzyWuzzy (Python),
>>>> but you can try this link:
>>>> https://astrostatistics.psu.edu/su07/R/html/base/html/agrep.html
>>>>
>>>> agrep does a similar kind of approximate string matching. You can set your
>>>> own threshold and filter the data accordingly.
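Threshold-based filtering of approximate matches might be sketched like this. Note the swap: this uses `difflib.SequenceMatcher` from the Python standard library, which is neither agrep nor FuzzyWuzzy and scores similarity differently. The helper names and the 0.75 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def near_matches(target, candidates, threshold=0.75):
    """Keep the candidates whose similarity to target meets the threshold."""
    return [c for c in candidates if similarity(target, c) >= threshold]

candidates = [
    "Ashirwad Heart Hospital",
    "Ashirwad Heart Hospita-Ghatkopar",
    "SANKARA EYE HOSPITAL",
]
hits = near_matches("ASHIRWAD HEART HOSPITAL", candidates)
print(hits)
```

With this threshold the two Ashirwad spellings are kept and the Sankara entry is dropped. A lower threshold catches more variants but also more false matches, so it is worth checking a sample of the results by hand.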
>>>>
>>>> On Tue, 25 Aug, 2020, 8:09 pm rammano...@gmail.com, <
>>>> rammano...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have collected hospital data from multiple sources. However, each
>>>>> source uses a different name for the same hospital. I am trying to clean
>>>>> the list so that it has no duplicates. I am using R and couldn't resolve
>>>>> this with stringdist_join. I would appreciate suggestions on an approach.
>>>>>
>>>>> For example, a hospital in Guntur (A.P.) is listed under the following
>>>>> names. Can we mark (or eliminate) the duplicates?
>>>>>
>>>>> Example 1
>>>>> SANKARA EYE HOSPITAL(GUNTUR)
>>>>> SANKARA EYE HOSPITAL
>>>>> SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)
>>>>>
>>>>>
>>>>> Example 2
>>>>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
>>>>> Ashirwad Heart Hospital
>>>>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
>>>>> Ashirwad Heart Hospita-Ghatkopar
>>>>>
>>>>> Thanks
>>>>> Ram
>>>>>
>>>>> --
>>>>> Datameet is a community of Data Science enthusiasts in India. Know
>>>>> more about us by visiting http://datameet.org
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "datameet" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to datameet+u...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/datameet/19ee8101-84ec-42b0-974a-43035b5902f1n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/datameet/19ee8101-84ec-42b0-974a-43035b5902f1n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>


-- 
Herry Gulabani
Master Of Planning (2019)
USC Sol Price School of Public Policy
(213)431-7634 | gulabanihe...@gmail.com
Website: gulabani.wixsite.com/portfolio

