At 09:50 PM 9/29/2008, Richard Broersma wrote:
On Mon, Sep 29, 2008 at 7:48 PM, Steve Midgley <[EMAIL PROTECTED]> wrote:
> In my specific case it turns out I only had duplicates, but there could have
> been n-plicates, so your code is still correct for my use-case (though I
> didn't say that in my OP).
Ya there are a lot of neat queries that you can construct. If you
have a good background in math and set theory (which I don't have) you
can develop all sorts of powerful analysis queries.
On a side note, I thought that I should mention that unwanted
duplicates are an example of where some have gotten bitten by a
purely surrogate key approach. To make matters worse, one user may
update part of one duplicate while another user updates different
field(s) of another duplicate. Then once the designer discovers
the duplicate problem, she/he has to figure out some way of merging
these non-exact duplicates. So even if the designer has no intention
of implementing natural primary/foreign keys, he/she will still
benefit from considering natural keys, in that a strategy can be
designed to prevent getting bitten by duplicated data.
I only mention this because db designers get bitten by this all the
time. Well at least the ones that subscribe to www.utteraccess.com
get bitten. From what I've seen not one day has gone by without
someone posting a question to this site about how to both find and
remove all but one of the duplicates.
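
For readers following along, here is a minimal sketch of the "find and
remove all but one of the duplicates" task mentioned above. It is not
the query from earlier in the thread. The table and column names
(student, last_name, first_name, birth_date) are made up for
illustration, and it runs against an in-memory SQLite database through
Python's sqlite3 module rather than PostgreSQL, although the two SQL
statements use only standard constructs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: a surrogate id and nothing to stop duplicate rows.
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"
    " last_name TEXT, first_name TEXT, birth_date TEXT)"
)
rows = [
    ("Smith", "Ann", "2000-01-01"),
    ("Smith", "Ann", "2000-01-01"),
    ("Smith", "Ann", "2000-01-01"),  # a 3-plicate
    ("Jones", "Bob", "1999-05-05"),
    ("Jones", "Bob", "1999-05-05"),  # a plain duplicate
    ("Lee", "Cy", "2001-02-02"),     # unique row
]
conn.executemany(
    "INSERT INTO student (last_name, first_name, birth_date) VALUES (?, ?, ?)",
    rows,
)

# Find every natural-key combination that occurs more than once.
dupes = conn.execute(
    """
    SELECT last_name, first_name, birth_date, COUNT(*) AS copies
    FROM student
    GROUP BY last_name, first_name, birth_date
    HAVING COUNT(*) > 1
    ORDER BY last_name
    """
).fetchall()

# Keep the row with the lowest id in each group and delete the rest.
conn.execute(
    """
    DELETE FROM student
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM student
        GROUP BY last_name, first_name, birth_date
    )
    """
)
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Note that this only handles exact duplicates on the chosen columns. The
non-exact duplicates described above, where different copies were
updated in different fields, still need a merge rule decided by hand.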
Truly. I have worked with some school districts around the US and this
duplicate record problem is more than theoretical. Some of the
gnarliest, dirtiest, n-plicate data I've ever seen comes out of the US
public education system.
More generally where I have seen a need for natural keys, I've always
taken the "best of both worlds" approach. So I always stick an
integer/serial PK into any table - why not - they're cheap and
sometimes are handy. And then for tables along the lines of your
description, I add a compound unique index which serves the business
rule of "no dupes along these lines."
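
A rough sketch of that "best of both worlds" layout, in case it helps:
a surrogate integer PK plus a compound unique index on the natural-key
columns. The enrollment table and its columns are hypothetical, and the
example uses SQLite via Python's sqlite3 module as a stand-in; in
PostgreSQL the id column would typically be declared serial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key for joins and convenience; the columns are illustrative only.
conn.execute(
    """
    CREATE TABLE enrollment (
        id INTEGER PRIMARY KEY,
        district_code TEXT NOT NULL,
        student_number TEXT NOT NULL,
        school_year INTEGER NOT NULL
    )
    """
)

# Compound unique index enforcing the "no dupes along these lines" rule.
conn.execute(
    "CREATE UNIQUE INDEX enrollment_natural_key"
    " ON enrollment (district_code, student_number, school_year)"
)


def try_insert(district_code, student_number, school_year):
    """Insert a row; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO enrollment"
                " (district_code, student_number, school_year)"
                " VALUES (?, ?, ?)",
                (district_code, student_number, school_year),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert("D01", "12345", 2008)       # accepted
duplicate = try_insert("D01", "12345", 2008)   # rejected by the index
other_year = try_insert("D01", "12345", 2009)  # accepted, different key
row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```

The NOT NULL constraints matter here: by default a unique index treats
NULLs as distinct from one another, so a nullable natural-key column
would let duplicates slip through.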
Am I following your point? Any reason why using serial PK's with
"compound natural unique indices" is better/worse than just using
natural PK's?
Steve