Ram, would it be possible to add the number of beds to the hospital data? If so, it could be a great data source to complement the Census Number of Hospital Beds data.
Or is it some kind of location/map data that you are scraping to get the names and locations of hospitals? Great work, by the way!

On Sat, Aug 29, 2020 at 11:23 PM rammano...@gmail.com <rammanohar....@gmail.com> wrote:

> Thank you Dilawar, Rahul, Ravikant, Sudatta, Madhu, Nikhil:
>
> I mixed and matched all the options you suggested. Finally, I have a list
> of 18k hospitals in India. I will be providing this data from
> http://india-data.com/ , where people can search information by Pincode.
> A beta version is live at http://india-data.com/pincode/221107/ .
>
> Thanks again to all.
>
> Regards
> Ram
>
> On Wednesday, 26 August 2020 at 02:50:09 UTC-4 nikh...@gmail.com wrote:
>
>> Hi Ram,
>>
>> I'm not sure about R, but if you have the list in an excel / csv then
>> OpenRefine can help you iron it all out in a jiffy. Check out this article
>> I've written that explains the flow for this particular task:
>> http://datameet.org/2018/06/13/openrefine-bus-stop/
>>
>> OpenRefine is a tool made for non-coders to clean up messy data. Site:
>> https://openrefine.org/
>>
>> --
>> Cheers,
>> Nikhil VJ
>> https://nikhilvj.co.in
>>
>> On Wed, Aug 26, 2020 at 6:21 AM m...@ncf-india.org <m...@ncf-india.org>
>> wrote:
>>
>>> Hi Ram
>>>
>>> In addition to the helpful suggestions made above, here are some
>>> R-specific pointers:
>>> - stringr is an extremely helpful package for most of the string
>>> manipulation actions (whitespace removal, tokenisation, regex
>>> matching) recommended above.
>>> - you may also need a package that helps you compute ‘distances’ between
>>> the strings you are comparing. stringdist is one such package. However,
>>> with Indian names, I found some of the phonetic distance algorithms
>>> (rogerroot, soundex) in the phonics package much more helpful.
>>>
>>> Hope this helps! Good luck!
>>> Madhu
>>>
>>> On Wednesday, 26 August 2020 at 00:48:45 UTC+5:30 sudat...@gmail.com
>>> wrote:
>>>
>>>> Hi Ram,
>>>>
>>>> Faced with similar issues, the following worked for me -
>>>>
>>>> 1. Make everything lower or upper case using tolower / toupper
>>>> 2. Use grep to match the common pattern of the name
>>>>
>>>> Best,
>>>> Sudatta
>>>>
>>>> On Aug 25, 2020, at 7:52 AM, Rahul Gupta <rahulgu...@gmail.com> wrote:
>>>>
>>>> Hi Ram,
>>>>
>>>> Not sure if there is something very similar to FuzzyWuzzy (Python) in
>>>> R. But you can try this link
>>>> https://astrostatistics.psu.edu/su07/R/html/base/html/agrep.html
>>>>
>>>> It does a similar kind of approximate string matching. You can set your
>>>> own threshold criteria and filter data accordingly.
>>>>
>>>> On Tue, 25 Aug, 2020, 8:09 pm rammano...@gmail.com, <
>>>> rammano...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have collected hospital data from multiple sources. However, each
>>>>> source uses a different name. I am trying to produce a clean list with
>>>>> no duplicates. I am using R and couldn't resolve this with
>>>>> stringdist_join. I would appreciate suggestions for an approach.
>>>>>
>>>>> For example, a hospital in Guntur (A.P) is listed under the following
>>>>> names. Can we mark (or eliminate) the duplicates?
>>>>>
>>>>> Example 1
>>>>> SANKARA EYE HOSPITAL(GUNTUR)
>>>>> SANKARA EYE HOSPITAL
>>>>> SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)
>>>>>
>>>>> Example 2
>>>>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
>>>>> Ashirwad Heart Hospital
>>>>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
>>>>> Ashirwad Heart Hospita-Ghatkopar
>>>>>
>>>>> Thanks
>>>>> Ram
>>>>>
>>>>> --
>>>>> Datameet is a community of Data Science enthusiasts in India. Know
>>>>> more about us by visiting http://datameet.org
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "datameet" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to datameet+u...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/datameet/19ee8101-84ec-42b0-974a-43035b5902f1n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/datameet/19ee8101-84ec-42b0-974a-43035b5902f1n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .

--
Herry Gulabani
Master Of Planning (2019)
USC Sol Price School of Public Policy
(213)431-7634 | gulabanihe...@gmail.com
Website: gulabani.wixsite.com/portfolio
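
A minimal sketch of the clean-up steps suggested in this thread (lower-casing, removing bracketed qualifiers, then approximate matching against a threshold), written in Python with only the standard library rather than R. The function names, the greedy grouping strategy and the 0.85 threshold are assumptions chosen for illustration; this is not the method Ram ended up using, and difflib's similarity ratio is a stand-in for agrep or stringdist, not an equivalent.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, drop bracketed qualifiers, and collapse punctuation/whitespace."""
    name = name.lower()
    name = re.sub(r"\(.*?\)", " ", name)     # drop "( GHATKOPAR )", "(GUNTUR)", ...
    name = re.sub(r"[^a-z0-9]+", " ", name)  # hyphens and other punctuation become spaces
    return " ".join(name.split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when difflib's similarity ratio reaches the (assumed) threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def group_duplicates(names, threshold: float = 0.85):
    """Greedily put each name into the first group whose first member is similar."""
    groups = []
    for name in names:
        key = normalise(name)
        for group in groups:
            if similar(key, normalise(group[0]), threshold):
                group.append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups.append([name])
    return groups


if __name__ == "__main__":
    names = [
        "SANKARA EYE HOSPITAL(GUNTUR)",
        "SANKARA EYE HOSPITAL",
        "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
        "Ashirwad Heart Hospital",
    ]
    for group in group_duplicates(names):
        print(group)
```

A name such as "Ashirwad Heart Hospita-Ghatkopar" keeps its place name after normalisation, so it scores below the 0.85 default against "ashirwad heart hospital"; catching it would need a lower threshold or an extra step that strips known locality names. Comparing names only within the same pincode or district is one way to keep a looser threshold from merging unrelated hospitals.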