The Subject: is somewhat imprecise, but here's what I'm trying to do. For some reason, my brain is locking up over it.

I'm moving a 7.2 (yes) database to 8.4. In the table in question, the structure is along the lines of:

        serial_number   SERIAL, PRIMARY KEY
        email           TEXT
        create_date     TIMESTAMP
        attr1           type
        attr2           type
        attr3           type
        ...

(The point of the "attr" fields is that there are many more columns for each row.)

The new structure removes the "serial_number" field, and uses "email" as the primary key, but is otherwise unchanged:

        email           TEXT, PRIMARY KEY
        create_date     TIMESTAMP
        attr1           type
        attr2           type
        attr3           type
        ...

Now, since this database has been production since 7.2 days, cruft has crept in: in particular, there are duplicate email addresses, some with mismatched attributes. The policy decision by the client is that the correct row is the one with the earliest timestamp. (The timestamps are widely distributed; it's not the case that there is a single timestamp above which all the duplicates live.) Thus, ideally, I want to select exactly one row per "email", picking the row with the earliest timestamp in the case that there is more than one row with that email.

Any suggestions on how to write such a SELECT? Of course, I could do this with an application against the db, but a single SELECT would be great if possible.

TIA!

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Reply via email to