Re: [PERFORM] UUID as primary key

2009-10-16 Thread decibel

On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote:

On 10/10/2009 01:14 AM, tsuraan wrote:

The most significant impact is that it takes up twice as much space,
including the primary key index. This means fewer entries per block,
which means slower scans and/or more blocks to navigate through.  
Still,
compared to the rest of the overhead of an index row or a table  
row, it
is low - I think it's more important to understand whether you  
can get
away with using a sequential integer, in which case UUID is  
unnecessary
overhead - or whether you are going to need UUID anyways. If you  
need

UUID anyways - having two primary keys is probably not worth it.


Ok, that's what I was hoping.  Out of curiosity, is there a preferred
way to store 256-bit ints in postgres?  At that point, is a bytea the
most reasonable choice, or is there a better way to do it?



Do you need to be able to do queries on it? Numeric should be able  
to store 256-bit integers.


If you don't need to do queries on it, an option I've considered in  
the past is to break it up into 4 x int64. Before UUID was  
supported, I had seriously considered storing UUID as 2 x int64.  
Now that UUID is supported, you might also abuse UUID where 1 x 256- 
bit = 2 x UUID.


If you want it to be seemless and fully optimal, you would  
introduce a new int256 type (or whatever the name of the type you  
are trying to represent). Adding new types to PostgreSQL is not  
that hard. This would allow queries (=, , , ) as well.



If you want an example of that, we had Command Prompt create a full  
set of hash datatypes (SHA*, and I think md5). That stuff should be  
on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com and  
I'll get it added.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] UUID as primary key

2009-10-16 Thread Alvaro Herrera
decibel escribió:

 If you want it to be seemless and fully optimal, you would
 introduce a new int256 type (or whatever the name of the type you
 are trying to represent). Adding new types to PostgreSQL is not
 that hard. This would allow queries (=, , , ) as well.
 
 If you want an example of that, we had Command Prompt create a full
 set of hash datatypes (SHA*, and I think md5). That stuff should be
 on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com
 and I'll get it added.

It's at project shatypes.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] UUID as primary key

2009-10-16 Thread decibel

On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote:

On 10/10/2009 01:14 AM, tsuraan wrote:

The most significant impact is that it takes up twice as much space,
including the primary key index. This means fewer entries per block,
which means slower scans and/or more blocks to navigate through.  
Still,
compared to the rest of the overhead of an index row or a table  
row, it
is low - I think it's more important to understand whether you  
can get
away with using a sequential integer, in which case UUID is  
unnecessary
overhead - or whether you are going to need UUID anyways. If you  
need

UUID anyways - having two primary keys is probably not worth it.


Ok, that's what I was hoping.  Out of curiosity, is there a preferred
way to store 256-bit ints in postgres?  At that point, is a bytea the
most reasonable choice, or is there a better way to do it?



Do you need to be able to do queries on it? Numeric should be able  
to store 256-bit integers.


If you don't need to do queries on it, an option I've considered in  
the past is to break it up into 4 x int64. Before UUID was  
supported, I had seriously considered storing UUID as 2 x int64.  
Now that UUID is supported, you might also abuse UUID where 1 x 256- 
bit = 2 x UUID.


If you want it to be seemless and fully optimal, you would  
introduce a new int256 type (or whatever the name of the type you  
are trying to represent). Adding new types to PostgreSQL is not  
that hard. This would allow queries (=, , , ) as well.



If you want an example of that, we had Command Prompt create a full  
set of hash datatypes (SHA*, and I think md5). That stuff should be  
on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com and  
I'll get it added.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] UUID as primary key

2009-10-10 Thread Mark Mielke

On 10/10/2009 01:14 AM, tsuraan wrote:

The most significant impact is that it takes up twice as much space,
including the primary key index. This means fewer entries per block,
which means slower scans and/or more blocks to navigate through. Still,
compared to the rest of the overhead of an index row or a table row, it
is low - I think it's more important to understand whether you can get
away with using a sequential integer, in which case UUID is unnecessary
overhead - or whether you are going to need UUID anyways. If you need
UUID anyways - having two primary keys is probably not worth it.
 

Ok, that's what I was hoping.  Out of curiosity, is there a preferred
way to store 256-bit ints in postgres?  At that point, is a bytea the
most reasonable choice, or is there a better way to do it?
   


Do you need to be able to do queries on it? Numeric should be able to 
store 256-bit integers.


If you don't need to do queries on it, an option I've considered in the 
past is to break it up into 4 x int64. Before UUID was supported, I had 
seriously considered storing UUID as 2 x int64. Now that UUID is 
supported, you might also abuse UUID where 1 x 256-bit = 2 x UUID.


If you want it to be seemless and fully optimal, you would introduce a 
new int256 type (or whatever the name of the type you are trying to 
represent). Adding new types to PostgreSQL is not that hard. This would 
allow queries (=, , , ) as well.


Cheers,
mark

--
Mark Mielkem...@mielke.cc


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] UUID as primary key

2009-10-09 Thread tsuraan
I have a system where it would be very useful for the primary keys for
a few tables to be UUIDs (actually MD5s of files, but UUID seems to be
the best 128-bit type available).  What is the expected performance of
using a UUID as a primary key which will have numerous foreign
references to it, versus using a 64-bit int (32-bit isn't big enough)?

From the uuid.c in adt, it looks like a UUID is just stored as 8
consecutive bytes, and are compared using memcmp, whereas an int uses
primitive CPU instructions for comparison.  Is that a significant
issue with foreign key performance, or is it mostly just the size that
the key would take in all related tables?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] UUID as primary key

2009-10-09 Thread Mark Mielke

On 10/09/2009 12:56 PM, tsuraan wrote:

I have a system where it would be very useful for the primary keys for
a few tables to be UUIDs (actually MD5s of files, but UUID seems to be
the best 128-bit type available).  What is the expected performance of
using a UUID as a primary key which will have numerous foreign
references to it, versus using a 64-bit int (32-bit isn't big enough)?

 From the uuid.c in adt, it looks like a UUID is just stored as 8
consecutive bytes, and are compared using memcmp, whereas an int uses
primitive CPU instructions for comparison.  Is that a significant
issue with foreign key performance, or is it mostly just the size that
the key would take in all related tables?
   


The most significant impact is that it takes up twice as much space, 
including the primary key index. This means fewer entries per block, 
which means slower scans and/or more blocks to navigate through. Still, 
compared to the rest of the overhead of an index row or a table row, it 
is low - I think it's more important to understand whether you can get 
away with using a sequential integer, in which case UUID is unnecessary 
overhead - or whether you are going to need UUID anyways. If you need 
UUID anyways - having two primary keys is probably not worth it.


Cheers,
mark


--
Mark Mielkem...@mielke.cc


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] UUID as primary key

2009-10-09 Thread tsuraan
 The most significant impact is that it takes up twice as much space,
 including the primary key index. This means fewer entries per block,
 which means slower scans and/or more blocks to navigate through. Still,
 compared to the rest of the overhead of an index row or a table row, it
 is low - I think it's more important to understand whether you can get
 away with using a sequential integer, in which case UUID is unnecessary
 overhead - or whether you are going to need UUID anyways. If you need
 UUID anyways - having two primary keys is probably not worth it.

Ok, that's what I was hoping.  Out of curiosity, is there a preferred
way to store 256-bit ints in postgres?  At that point, is a bytea the
most reasonable choice, or is there a better way to do it?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance