On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
> On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> A pseudo-random UUID is frankly pretty
>> useless to me because (a) it's not really unique
>
> This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
> long as you can trust the random number generator, the chances of a
> value occurring twice can be estimated using the birthday paradox:
> there's a 50% chance of having *one* collision in a set of 2^61 items.
> Storing this amount of UUIDs alone requires 32 exabytes of storage.
> Factor in the tuple and indexing overheads and you'd be needing close
> to all the hard disk space ever manufactured in the world.
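
The birthday estimate quoted above can be checked numerically. The following is a minimal sketch, assuming 122 uniformly random bits per UUID and the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)); the function names are made up for illustration and are not part of any UUID library.

```python
import math

RANDOM_BITS = 122  # random bits in a pseudo-random UUID, as stated in the thread


def collision_probability(n, bits=RANDOM_BITS):
    """Approximate chance of at least one collision among n uniformly random values."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))


def items_for_probability(p, bits=RANDOM_BITS):
    """Approximate number of values needed to reach collision probability p."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


def storage_bytes(n, bytes_per_uuid=16):
    """Raw storage for n UUIDs, ignoring tuple and index overhead."""
    return n * bytes_per_uuid


if __name__ == "__main__":
    print(collision_probability(2 ** 61))            # chance of a collision at 2^61 items
    print(math.log2(items_for_probability(0.5)))     # log2 of the 50% point
    print(storage_bytes(2 ** 61) / 2 ** 60)          # raw storage in exbibytes
    print(collision_probability(20_000_000_000))     # chance at 20 billion values
```

Under these assumptions the 50% point comes out near 2^61.24 items, the chance at exactly 2^61 items is about 0.39, and 2^61 UUIDs occupy 32 exbibytes, so the quoted figures are of the right order. The same formula gives roughly 3.8e-17 for 20 billion values, which is why collisions observed at that scale would point to a generator delivering far fewer than 122 effectively random bits.
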

Well, I've already had collisions with UUID-OSSP, in production, with
only around 20 billion values.  So clearly there aren't 122 bits of true
randomness in OSSP.  I can't speak for other implementations because I
haven't tried them.

>> (b) it doesn't help me route data at all.
>
> That's really out of scope for UUIDs. They're about generating
> identifiers, not describing what the identifier means. UUIDs also
> don't happen to cure cancer.

http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327

On the contrary, I would argue that an object identifier which is
completely random is possibly the worst way to form an ID of all
possible concepts; there's no relationship whatsoever between the ID,
the application stack, and the application data; you don't even get the
pseudo-time indexing you get with Serials.  The only reason to do it is
because you're too lazy to implement a better way.

Or to put it another way: a value which is truly random is no identifier
at all.

Compare this with a composite identifier which carries information about
the node, table, and schema of origin for the tuple.  Not only does this
help ensure uniqueness, but it also supports intelligent sharding and
multi-master replication systems.  I don't speak hypothetically; we've
done this in the past and will do it again in the future.  I would love
to have some machinery inside PostgreSQL to make this easier (for
example, a useful unique database ID), but I suspect that actual
implementation will always remain application-specific.

You may say "oh, that's not the job of the identifier", but if it's not,
WTF is the identifier for, then?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
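
For illustration only, a composite identifier of the kind described in the message (node, schema, and table of origin plus a local sequence number) could be packed into a single 64-bit integer. This is a hypothetical sketch, not the scheme the author used: the field widths, the `Origin` type, and the function names are all assumptions.

```python
from typing import NamedTuple

# Hypothetical field widths, chosen only for illustration (64 bits in total).
NODE_BITS, SCHEMA_BITS, TABLE_BITS, SEQ_BITS = 10, 6, 12, 36


class Origin(NamedTuple):
    node: int    # which database node created the row
    schema: int  # numeric code for the schema of origin
    table: int   # numeric code for the table of origin
    seq: int     # per-node, per-table sequence value


def pack_id(origin: Origin) -> int:
    """Combine the origin fields into one integer, node in the highest bits."""
    widths = (NODE_BITS, SCHEMA_BITS, TABLE_BITS, SEQ_BITS)
    value = 0
    for field, bits in zip(origin, widths):
        if not 0 <= field < (1 << bits):
            raise ValueError(f"field value {field} does not fit in {bits} bits")
        value = (value << bits) | field
    return value


def unpack_id(value: int) -> Origin:
    """Recover the origin fields from a packed identifier."""
    fields = []
    # Peel fields off from the low bits upward, then restore the original order.
    for bits in (SEQ_BITS, TABLE_BITS, SCHEMA_BITS, NODE_BITS):
        fields.append(value & ((1 << bits) - 1))
        value >>= bits
    return Origin(*reversed(fields))


def node_of(value: int) -> int:
    """Routing helper: read the node directly from the top bits."""
    return value >> (SCHEMA_BITS + TABLE_BITS + SEQ_BITS)
```

With a layout like this, uniqueness follows from each node owning its own sequence, and a router can choose a shard from `node_of(id)` without looking anything up. The trade-off is that the identifier exposes its origin and caps the number of nodes, schemas, and tables at whatever the chosen widths allow.
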