On Dec 8, 2014, at 9:35 PM, Scott Marlowe wrote:
> select a,b,c into newtable from oldtable group by a,b,c;
>
> One pass, done.
This may be a bit naive, but couldn't the following approach be faster
(depending on the system)?
SELECT a, b, c
INTO duplicate_records
FROM (
    SELECT a, b, c, count(*) AS counted
    FROM source_table
    GROUP BY a, b, c
) q_inner
WHERE q_inner.counted > 1;
DELETE FROM source_table
USING duplicate_records
WHERE source_table.a = duplicate_records.a
  AND source_table.b = duplicate_records.b
  AND source_table.c = duplicate_records.c;
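One caveat: as written, the DELETE removes every copy of each duplicated
row, including the one you'd want to keep. Assuming source_table has no
columns beyond a, b, and c, a final insert from duplicate_records would
restore a single copy per group (a sketch, not tested):

-- Restore one copy of each deduplicated row
-- (assumes source_table has only the columns a, b, c):
INSERT INTO source_table (a, b, c)
SELECT a, b, c FROM duplicate_records;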
It would require multiple full table scans, but it would minimize the writing
to disk, and isn't a read operation usually much cheaper than a write? If the
duplicate check only covers a small subset of the columns, indexes could speed
things up too, as in the sketch below.
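For instance, a composite index on the compared columns (table and column
names assumed from the example above) might let the planner satisfy the
GROUP BY and the delete's join without repeated full scans:

-- Hypothetical composite index on the compared columns:
CREATE INDEX source_table_abc_idx ON source_table (a, b, c);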