Re: [SQL] surrogate vs natural primary keys

Steve Midgley Wed, 17 Sep 2008 22:01:32 -0700

To: [email protected]
From:  Seb <[EMAIL PROTECTED]>
Subject: Re: surrogate vs natural primary keys
Date:  Mon, 15 Sep 2008 17:56:31 -0500
Organization:  Church of Emacs
Lines: 20
Message-ID:  <[EMAIL PROTECTED]>
References:  <[EMAIL PROTECTED]>
        <[EMAIL PROTECTED]>
X-Archive-Number: 200809/101
X-Sequence-Number: 31553


On Mon, 15 Sep 2008 16:45:08 -0600,
"Scott Marlowe" <[EMAIL PROTECTED]> wrote:

[...]

> I think this question is a lot like "how large should I set

> shared_buffers?" There's lots of different answers based on howyou

> are using your data.

Yes, this is precisely what I'm after: *criteria* to help me decide
which approach to take for different scenarios.  Such guidance is what
seems to be lacking from most of the discussions I've seen on the
subject.  It's hard to distill this information when most of the
discussion is centered on advocating one or the other approach.

I think Scott and others have laid out the main ideas in a verycool-headed way already, but here's my follow-on input:

I agree with Andrew Sullivan that using industry standard id's as yourprimary key can be problematic. But I do sometimes apply unique indicesto such "industry standard" columns to ensure they are in fact uniqueand can be a surrogate for the "real" integer/serial primary key.

As a rule, I have decided to stay away from "meaningful" (natural)primary keys for these reasons:

1) They sometimes change b/c of business rule changes, forcingtechnical changes to the relationship model, when only internal tableschema changes should be required to support the new businessrequirements.

2) Generating arbitrary/surrogate keys is easier b/c you can usesequence generators. (When creating a new record, I have to figure outthe value of a meaningful column before saving the record whichsometimes I don't want to do!)

3) Surrogate keys are guaranteed unique regardless of semantic contentof the table.

4) All tables can all join to each other in the same ways: property.idholds the same data type as contact.id. All id fields are the same intype/format.

I think there's even a reasonable argument for "globally unique"surrogate keys: all keys for any table use the same sequence of id's. Iimplemented a system in the 90's that used globally unique id's and itopened up some interesting solutions that I wouldn't have thought ofwhen I started the project (self joins were the same as foreign joinssince the id's in both entities were guaranteed unique).

I've heard some people argue the use of GUID's for id's but I've beentoo scared to try that in a real system.

Sequential, arbitrary primary keys (as surrogate keys) are predictablethough. So if you share those keys with the public (via URL's forexample), then competitors can learn information about your business(how fast keys are generated for a certain table for example).

That's an argument for random, arbitrary primary keys though, not forcompound/meaningful keys.

I think natural or compound keys make more sense to DBA's and let youimplement some kinds of database solutions more quickly.

All in all, I don't really understand the merits of natural keysoutside of data warehouse applications. In data warehouses, in myexperience, compound natural keys just end up turning into fact tables!:)

In summary: I've never heard someone say they've been bitten by usingan arbitrary surrogate key system, but I myself have been bitten andhave heard lots of stories of problems when using natural keys.


I hope this helps some,

Steve


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] surrogate vs natural primary keys

Reply via email to