On Thu, 26 May 2011 10:15:50 +1200 Andrej <andrej.gro...@gmail.com> wrote:
>On 26 May 2011 09:13, Tarlika Elisabeth Schmitz ><postgres...@numerixtechnology.de> wrote: >> On Wed, 25 May 2011 09:25:48 -0600 >> Rob Sargent <robjsarg...@gmail.com> wrote: >> >>> >>> >>>On 05/24/2011 10:57 AM, Lew wrote: >>>> Tarlika Elisabeth Schmitz wrote: >>>> >>>>> CREATE TABLE person >>>>> ( >>>>> id integer NOT NULL, >>>>> "name" character varying(256) NOT NULL, >>>>> "location" character varying(256), >>>>> CONSTRAINT person_pkey PRIMARY KEY (id) >>>>> ); >>>>> >>>>> this was just a TEMPORARY table I created for quick analysis >>>>> of my CSV data (now renamed to temp_person). >> >> CREATE TABLE country >> ( >> id character varying(3) NOT NULL, -- alpha-3 code >> "name" character varying(50) NOT NULL, >> CONSTRAINT country_pkey PRIMARY KEY (id) >> ); >> >> >>>To minimize the ultimately quite necessary human adjudication, one >>>might make good use of what is often termed "crowd sourcing": Keep >>>all the distinct "hand entered" values and a map to the final human >>>assessment. >>[...] >> I could do with a concept for this problem, which applies to a lot of >> string-type info. > >I'd start w/ downloading a list as mentioned here: >http://answers.google.com/answers/threadview?id=596822 > >And run it through a wee perl script using >http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm >to make phonetic matches ... > >Then I'd run your own data through DoubleMetaphone, and clean up >matches if not too many false positives show up. Many thanks for all your suggestions. It will take me a while to work my way through these as I have several open ends. In a similar vein, the PERSONs names are 1) <firstname> <surname> 2) <initials> <surname> (more common) 3) <title> <initials>|<firstname> <surname> Where I have firstname and or title I'd be quite keen to determine sex as it would be interesting from a statistics point of view to distinguish. I am basically just interested in people from two countries, names mainly English. -- Best Regards, Tarlika Elisabeth Schmitz -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql