Richard Huxton wrote:
Christophe wrote:
Now, since this database has been production since 7.2 days, cruft has
crept in: in particular, there are duplicate email addresses, some
with mismatched attributes. The policy decision by the client is that
the correct row is the one with the earliest timestamp.
Something like (untested):
CREATE TEMPORARY TABLE earliest_duplicates AS
SELECT
email AS tgt_email,
min(create_date) AS tgt_date
FROM mytable
GROUP BY email
HAVING count(*) > 1;
DELETE FROM mytable USING earliest duplicates
WHERE email=tgt_email AND create_date > tgt_date;
If it is possible that two rows exist for the same email/date; then you
will likely need to deal with these manually.
If you rerun the above SELECT after running the delete you should
identify these rows.
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql