On Fri, Nov 25, 2011 at 11:05 AM, Rich Shepard <rshep...@appl-ecosys.com>wrote:
> The data originated in a spreadsheet and, based on my experience, contains > duplicate records. After reformatting there are 143,260 rows to insert in > the table. The approach I tried seems to have problems (explained below) > and > I would like to learn the proper way to insert rows in either an empty > table > or one with existing rows since I'll need to do this procedure for my > projects. > > The table was created with the primary key and I used INSERT INTO ... to > load the data. Many duplicate records, so I split the file into smaller > ones > and re-ran the command to load them. I'd then remove the reported duplicate > rows from the source (text) file. Between runs, I'd issue the DROP TABLE > command within psql and check it was empty using 'select count(*) from > waterchem;'. > > It appeared that I removed duplicates from the first couple of smaller > files so I combined them into one file named ok.sql. But, when I tested the > combined file it, too, reported many duplicate records. Something wrong > here. > > Perhaps a better approach is to put the CREATE TABLE command above the > INSERT INTO section of the file (without specifying a PK), load that using > the command 'psql -d <database> -f waterchem.sql', then add the PK and > remove duplicates as postgres presents them. Or, perhaps there is a much > more efficient way to accomplish this task. > > Rather than my flailing around and spending a lot of time failing to load > all non-duplicate rows into the table I'd like to learn the proper way to > accomplish this task. Suggestions, recommendations, and your experiences > are > requested. > Why don't you first load the data into a table (no primary key), then use SQL to find your dups? once loaded: SELECT <primary_key_column>, count(1) from <table> group by 1 having count(1) > 1; At least then, you'll really know what you're in for. You can either script a DELETE or... whatever you want to do, once clean, you can add the PK. -- Scott > > TIA, > > Rich > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/**mailpref/pgsql-general<http://www.postgresql.org/mailpref/pgsql-general> >