Richard Huxton wrote:
Christophe wrote:
Now, since this database has been production since 7.2 days, cruft has crept in: in particular, there are duplicate email addresses, some with mismatched attributes. The policy decision by the client is that the correct row is the one with the earliest timestamp.

Something like (untested):

CREATE TEMPORARY TABLE earliest_duplicates AS
SELECT
  email AS tgt_email,
  min(create_date) AS tgt_date
FROM mytable
GROUP BY email
HAVING count(*) > 1;

DELETE FROM mytable USING earliest duplicates
WHERE email=tgt_email AND create_date > tgt_date;



If it is possible that two rows exist for the same email/date; then you will likely need to deal with these manually.

If you rerun the above SELECT after running the delete you should identify these rows.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Reply via email to