On Tue, 24 May 2011 12:57:57 -0400 Lew <no...@lewscanon.com> wrote: >Tarlika Elisabeth Schmitz wrote: >>this was just a TEMPORARY table I created for quick analysis >> of my CSV data (now renamed to temp_person). >Ah, yes, that makes much more sense. Temporary tables such as you >describe can be very convenient and effective.
The ER model is pretty simple. It has only 30 tables, of which 12 are "active" tables; the rest are pretty static (e.g. COUNTRY). There are 7-8 CVS formats and my idea is to import, clean up, normalize and distribute the data via temp tables and associated triggers. The whole process has to be pretty much automated as volume is too big to babysit import. As I said in my other post, with the daily import, I can't reject data just because part of the data cannot [yet] be tidied because other data will relate to these. Therefore my intention is to partially import (using the above example import a PERSON without resolving location) and flag up cases for later manual intervention in a log table. Some data just can't be cleaned up and I have to take a pragmatic approach - for instance, non-existing countries: I see there is an ISO standard for split countries such as Czechoslovakia (http://en.wikipedia.org/wiki/ISO_3166-3). That gets 12 PERSONs off my list. But what on earth do I do with "North America". Create a new country because some info is better than none? I can hardly make them US citizens because I know how upset people get when mistaken for their neighbours. >I think this problem is very widespread, namely how to get structured >information out of freeform data. The PERSON.location is peanuts compared to other data of 20x the volume and the really important information encoded in a freeform string with all sorts of abbreviations used. >That said, if you have a robust process to correct errors as the user >population discovers them, then you can approach perfection >asymptotically.[...] 1 user - moi > From an engineering standpoint, user feedback is a vital element of >homeostatic control. I'll be having a lot of conversations with myself. ;-) -- Best Regards, Tarlika Elisabeth Schmitz -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql