On Fri, Nov 25, 2011 at 11:05 AM, Rich Shepard <rshep...@appl-ecosys.com>wrote:

>  The data originated in a spreadsheet and, based on my experience, contains
> duplicate records. After reformatting there are 143,260 rows to insert in
> the table. The approach I tried seems to have problems (explained below)
> and
> I would like to learn the proper way to insert rows in either an empty
> table
> or one with existing rows since I'll need to do this procedure for my
> projects.
>
>  The table was created with the primary key and I used INSERT INTO ... to
> load the data. Many duplicate records, so I split the file into smaller
> ones
> and re-ran the command to load them. I'd then remove the reported duplicate
> rows from the source (text) file. Between runs, I'd issue the DROP TABLE
> command within psql and check it was empty using 'select count(*) from
> waterchem;'.
>
>  It appeared that I removed duplicates from the first couple of smaller
> files so I combined them into one file named ok.sql. But, when I tested the
> combined file it, too, reported many duplicate records. Something wrong
> here.
>
>  Perhaps a better approach is to put the CREATE TABLE command above the
> INSERT INTO section of the file (without specifying a PK), load that using
> the command 'psql -d <database> -f waterchem.sql', then add the PK and
> remove duplicates as postgres presents them. Or, perhaps there is a much
> more efficient way to accomplish this task.
>
>  Rather than my flailing around and spending a lot of time failing to load
> all non-duplicate rows into the table I'd like to learn the proper way to
> accomplish this task. Suggestions, recommendations, and your experiences
> are
> requested.
>

Why don't you first load the data into a table (no primary key), then use
SQL to find your dups?

once loaded:
  SELECT <primary_key_column>, count(1) from <table> group by 1 having
count(1) > 1;

 At least then, you'll really know what you're in for.  You can either
script a DELETE or... whatever you want to do, once clean, you can add the
PK.

--
Scott

>
> TIA,
>
> Rich
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-general<http://www.postgresql.org/mailpref/pgsql-general>
>

Reply via email to