Have you looked at http://en.m.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
----- Reply message ----- From: "Tarlika Elisabeth Schmitz" <postgres...@numerixtechnology.de> Date: Wed, May 25, 2011 6:13 pm Subject: [SQL] extracting location info from string To: <pgsql-sql@postgresql.org> On Wed, 25 May 2011 09:25:48 -0600 Rob Sargent <robjsarg...@gmail.com> wrote: > > >On 05/24/2011 10:57 AM, Lew wrote: >> Tarlika Elisabeth Schmitz wrote: >> >>> CREATE TABLE person >>> ( >>> id integer NOT NULL, >>> "name" character varying(256) NOT NULL, >>> "location" character varying(256), >>> CONSTRAINT person_pkey PRIMARY KEY (id) >>> ); >>> >>> this was just a TEMPORARY table I created for quick analysis >>> of my CSV data (now renamed to temp_person). CREATE TABLE country ( id character varying(3) NOT NULL, -- alpha-3 code "name" character varying(50) NOT NULL, CONSTRAINT country_pkey PRIMARY KEY (id) ); >To minimize the ultimately quite necessary human adjudication, one >might make good use of what is often termed "crowd sourcing": Keep >all the distinct "hand entered" values and a map to the final human >assessment. I was wondering how to do just that. I don't think it would be a good idea to hard code this into the clean-up script. Take, for instance, variations of COUNTRY.NAME spelling. Where would I store these? I could do with a concept for this problem, which applies to a lot of string-type info. -- Best Regards, Tarlika Elisabeth Schmitz -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql