Re: [PERFORM] UUID as primary key
On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote: On 10/10/2009 01:14 AM, tsuraan wrote: The most significant impact is that it takes up twice as much space, including the primary key index. This means fewer entries per block, which means slower scans and/or more blocks to navigate through. Still, compared to the rest of the overhead of an index row or a table row, it is low - I think it's more important to understand whether you can get away with using a sequential integer, in which case UUID is unnecessary overhead - or whether you are going to need UUID anyways. If you need UUID anyways - having two primary keys is probably not worth it. Ok, that's what I was hoping. Out of curiosity, is there a preferred way to store 256-bit ints in postgres? At that point, is a bytea the most reasonable choice, or is there a better way to do it? Do you need to be able to do queries on it? Numeric should be able to store 256-bit integers. If you don't need to do queries on it, an option I've considered in the past is to break it up into 4 x int64. Before UUID was supported, I had seriously considered storing UUID as 2 x int64. Now that UUID is supported, you might also abuse UUID where 1 x 256- bit = 2 x UUID. If you want it to be seemless and fully optimal, you would introduce a new int256 type (or whatever the name of the type you are trying to represent). Adding new types to PostgreSQL is not that hard. This would allow queries (=, , , ) as well. If you want an example of that, we had Command Prompt create a full set of hash datatypes (SHA*, and I think md5). That stuff should be on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com and I'll get it added. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] UUID as primary key
decibel escribió: If you want it to be seemless and fully optimal, you would introduce a new int256 type (or whatever the name of the type you are trying to represent). Adding new types to PostgreSQL is not that hard. This would allow queries (=, , , ) as well. If you want an example of that, we had Command Prompt create a full set of hash datatypes (SHA*, and I think md5). That stuff should be on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com and I'll get it added. It's at project shatypes. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] UUID as primary key
On Oct 10, 2009, at 10:40 AM, Mark Mielke wrote: On 10/10/2009 01:14 AM, tsuraan wrote: The most significant impact is that it takes up twice as much space, including the primary key index. This means fewer entries per block, which means slower scans and/or more blocks to navigate through. Still, compared to the rest of the overhead of an index row or a table row, it is low - I think it's more important to understand whether you can get away with using a sequential integer, in which case UUID is unnecessary overhead - or whether you are going to need UUID anyways. If you need UUID anyways - having two primary keys is probably not worth it. Ok, that's what I was hoping. Out of curiosity, is there a preferred way to store 256-bit ints in postgres? At that point, is a bytea the most reasonable choice, or is there a better way to do it? Do you need to be able to do queries on it? Numeric should be able to store 256-bit integers. If you don't need to do queries on it, an option I've considered in the past is to break it up into 4 x int64. Before UUID was supported, I had seriously considered storing UUID as 2 x int64. Now that UUID is supported, you might also abuse UUID where 1 x 256- bit = 2 x UUID. If you want it to be seemless and fully optimal, you would introduce a new int256 type (or whatever the name of the type you are trying to represent). Adding new types to PostgreSQL is not that hard. This would allow queries (=, , , ) as well. If you want an example of that, we had Command Prompt create a full set of hash datatypes (SHA*, and I think md5). That stuff should be on pgFoundry; if it's not drop me a note at jna...@cashnetusa.com and I'll get it added. -- Decibel!, aka Jim C. Nasby, Database Architect deci...@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] UUID as primary key
On 10/10/2009 01:14 AM, tsuraan wrote: The most significant impact is that it takes up twice as much space, including the primary key index. This means fewer entries per block, which means slower scans and/or more blocks to navigate through. Still, compared to the rest of the overhead of an index row or a table row, it is low - I think it's more important to understand whether you can get away with using a sequential integer, in which case UUID is unnecessary overhead - or whether you are going to need UUID anyways. If you need UUID anyways - having two primary keys is probably not worth it. Ok, that's what I was hoping. Out of curiosity, is there a preferred way to store 256-bit ints in postgres? At that point, is a bytea the most reasonable choice, or is there a better way to do it? Do you need to be able to do queries on it? Numeric should be able to store 256-bit integers. If you don't need to do queries on it, an option I've considered in the past is to break it up into 4 x int64. Before UUID was supported, I had seriously considered storing UUID as 2 x int64. Now that UUID is supported, you might also abuse UUID where 1 x 256-bit = 2 x UUID. If you want it to be seemless and fully optimal, you would introduce a new int256 type (or whatever the name of the type you are trying to represent). Adding new types to PostgreSQL is not that hard. This would allow queries (=, , , ) as well. Cheers, mark -- Mark Mielkem...@mielke.cc -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] UUID as primary key
I have a system where it would be very useful for the primary keys for a few tables to be UUIDs (actually MD5s of files, but UUID seems to be the best 128-bit type available). What is the expected performance of using a UUID as a primary key which will have numerous foreign references to it, versus using a 64-bit int (32-bit isn't big enough)? From the uuid.c in adt, it looks like a UUID is just stored as 8 consecutive bytes, and are compared using memcmp, whereas an int uses primitive CPU instructions for comparison. Is that a significant issue with foreign key performance, or is it mostly just the size that the key would take in all related tables? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] UUID as primary key
On 10/09/2009 12:56 PM, tsuraan wrote: I have a system where it would be very useful for the primary keys for a few tables to be UUIDs (actually MD5s of files, but UUID seems to be the best 128-bit type available). What is the expected performance of using a UUID as a primary key which will have numerous foreign references to it, versus using a 64-bit int (32-bit isn't big enough)? From the uuid.c in adt, it looks like a UUID is just stored as 8 consecutive bytes, and are compared using memcmp, whereas an int uses primitive CPU instructions for comparison. Is that a significant issue with foreign key performance, or is it mostly just the size that the key would take in all related tables? The most significant impact is that it takes up twice as much space, including the primary key index. This means fewer entries per block, which means slower scans and/or more blocks to navigate through. Still, compared to the rest of the overhead of an index row or a table row, it is low - I think it's more important to understand whether you can get away with using a sequential integer, in which case UUID is unnecessary overhead - or whether you are going to need UUID anyways. If you need UUID anyways - having two primary keys is probably not worth it. Cheers, mark -- Mark Mielkem...@mielke.cc -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] UUID as primary key
The most significant impact is that it takes up twice as much space, including the primary key index. This means fewer entries per block, which means slower scans and/or more blocks to navigate through. Still, compared to the rest of the overhead of an index row or a table row, it is low - I think it's more important to understand whether you can get away with using a sequential integer, in which case UUID is unnecessary overhead - or whether you are going to need UUID anyways. If you need UUID anyways - having two primary keys is probably not worth it. Ok, that's what I was hoping. Out of curiosity, is there a preferred way to store 256-bit ints in postgres? At that point, is a bytea the most reasonable choice, or is there a better way to do it? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance