On 05/25/2011 03:13 PM, Tarlika Elisabeth Schmitz wrote:
On Wed, 25 May 2011 09:25:48 -0600
Rob Sargent <robjsarg...@gmail.com> wrote:



On 05/24/2011 10:57 AM, Lew wrote:
Tarlika Elisabeth Schmitz wrote:

CREATE TABLE person
(
   id integer NOT NULL,
   "name" character varying(256) NOT NULL,
   "location" character varying(256),
   CONSTRAINT person_pkey PRIMARY KEY (id)
);

This was just a TEMPORARY table I created for quick analysis
of my CSV data (now renamed to temp_person).

CREATE TABLE country
(
   id character varying(3) NOT NULL, -- alpha-3 code
   "name" character varying(50) NOT NULL,
   CONSTRAINT country_pkey PRIMARY KEY (id)
);


To minimize the ultimately quite necessary human adjudication, one
might make good use of what is often termed "crowd-sourcing": keep
all the distinct hand-entered values and a map to the final human
assessment.

I was wondering how to do just that. I don't think it would be a good
idea to hard-code this into the clean-up script. Take, for instance,
variations of COUNTRY.NAME spelling. Where would I store these?

I could do with a concept for this problem, which applies to a lot of
string-type info.

I think you keep your current structures used for deducing the canonical forms, but with each unique input you encounter, you add it to your seen-thus-far list, which then becomes just one more check (possibly the first such check).

create table address_input
(
   id serial primary key,                        -- sequence-generated unique id
   human_input character varying(256) not null,  -- value exactly as entered
   resolution character varying(256)             -- canonical form, once decided
);
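
During clean-up, the lookup might then go something like this (a sketch only; the sample values are made up):

-- If this raw value has been seen before, reuse its stored resolution.
SELECT resolution FROM address_input WHERE human_input = 'Untied Kingdom';

-- First encounter: record it so a human can adjudicate it later.
INSERT INTO address_input (human_input) VALUES ('Untied Kingdom');

-- Once adjudicated, the canonical form is reused on every future run.
UPDATE address_input
   SET resolution = 'United Kingdom'
 WHERE human_input = 'Untied Kingdom';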

You may have to add a column for the type of input (if you know, for instance, that the input is a street address vs. a country), or you may want the resolution split into county, city, and so on.
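
For instance (illustrative only; input_type is a name I'm making up here):

-- Record what kind of field the raw value came from, so the same
-- raw string can resolve differently for countries vs. street addresses.
alter table address_input add column input_type character varying(20);  -- e.g. 'country', 'street'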

