Wouldn't UUID PK cause a significant drop in insert performance because every 
insert is now out of order, which leads to a constant re-arranging of the B+ 
tree? The amount of random IO's that's going to generate would just kill the 
performance.

--- On Fri, 10/15/10, Craig Ringer <cr...@postnewspapers.com.au> wrote:

From: Craig Ringer <cr...@postnewspapers.com.au>
Subject: Re: [PERFORM] UUID performance as primary key
To: "Navkirat Singh" <navkir...@gmail.com>
Cc: pgsql-performance@postgresql.org
Date: Friday, October 15, 2010, 10:59 PM

On 16/10/2010 9:58 AM, Navkirat Singh wrote:
> Hi Guys,
> 
> I am interested in finding out the pros/cons of using UUID as a primary key 
> field. My requirement states that UUID would be perfect in my case as I will 
> be having many small databases which will link up to a global database using 
> the UUID. Hence, the need for a unique key across all databases. It would be 
> extremely helpful if someone could help me figure this out, as it is critical 
> for my project.

Pro: No need for (serverid,serverseq) pair primary keys or hacks with modulus 
based key generation. Doesn't set any pre-determined limit on how many 
servers/databases may be in a cluster.

Con: Slower than modulo key generation approach, uses more storage. Foreign key 
relationships may be slower too.

Overall, UUIDs seem to be a favoured approach. The other way people seem to do 
this is by assigning a unique instance id to each server/database out of a 
maximum "n" instances decided at setup time. Every key generation sequence 
increments by "n" whenever it generates a key, with an offset of the 
server/database id. That way, if n=100, server 1 will generate primary keys 
001, 101, 201, 301, ..., server 2 will generate primary keys 002, 102, 202, 
302, ... and so on.

That works great until you need more than 100 instances, at which point you're 
really, REALLY boned. In really busy systems it also limits the total amount of 
primary key space - but with BIGINT primary keys, that's unlikely to be 
something you need to worry about.

The composite primary key (serverid,sequenceid) approach avoids the need for a 
pre-defined maximum number of servers, but can be slow to index and can require 
more storage, especially because of tuple headers.

I have no firsthand experience with any of these approaches so I can't offer 
you a considered opinion. I know that the MS-SQL crowd at least strongly prefer 
UUIDs, but they have very strong in-database UUID support. MySQL folks seem to 
mostly favour the modulo primary key generation approach. I don't see much 
discussion of the issue here - I get the impression Pg doesn't see heavy use in 
sharded environments.

-- Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



      

Reply via email to