[sqlite] AUTOINC vs. UUIDs

Scott Robison Thu, 21 May 2015 12:54:38 -0600

On Thu, May 21, 2015 at 11:37 AM, Valentin Davydov <sqlite-user at soi.spb.ru>
wrote:

> On Wed, May 20, 2015 at 11:52:08PM +0000, Peter Aronson wrote:
> > Now you're just getting silly.  What if the application sets all rowids,
> > everywhere to 1?  The fact is, the chance of collision on a UUID is
> > pretty astronomically low as long as a decent source of entropy is used
> > (see
> > http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
> > ).
> >  Yes, some application might not generate proper UUIDs, but that's true
> > with any scheme that needs to coordinate disconnected data editing or
> > generation on multiple machines.
>
> Moreover, there are widespread examples of colliding UUIDs, say
> EBD0A0A2-B9E5-4433-87C0-68B6B72699C7. This means that this idea
> have already proven to fail on it's intended usage.
>

MBR uses one byte unsigned integers to indicate partition type. If two
companies pick the same byte value to represent two different things, then
you have a collision and some software somewhere is going to do the wrong
thing if it doesn't know how to disambiguate using extra information. The
problem is not with the software; it is with two developers picking the
same byte value to mean different things. Since the "partition type"
namespace only has 2^8 possible values, and there are more than 256
partition formats defined, collisions are "very likely" (by which I mean
have already happened). This was unavoidable with a single byte, but given
that MBR dates back to a time when multiple bootable partitions on a single
hard drive were unheard of, it worked (for a while). Once people started
dual booting, partition type collisions became more painful.

GPT uses GUIDs. This gives 2^128 possible partition types. Collisions are
much less likely if one uses any of the RFC UUID generation procedures
(versions 1 through 5). If one deliberately chooses the exact same byte
sequence to mean something different, as happened in this case, that's not
a failure of UUID-style identifier creation. Any bad actor (whether acting
out of ignorance or malice) can produce the exact same outcome
whether using 8 bit or 128 bit (or any other finite bit) identifiers.
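
To put rough numbers on the 2^8 versus 2^128 comparison, here is a minimal
sketch using the usual birthday approximation. It assumes identifiers are
drawn uniformly at random, which is not how MBR type bytes were actually
assigned, so treat it as an illustration only. The function name is mine. A
version 4 UUID has 122 random bits, since 6 of the 128 are fixed by the
version and variant fields.

```python
import math


def collision_probability(n, bits):
    """Approximate chance that n uniformly random identifiers of the given
    bit width contain at least one duplicate (birthday approximation)."""
    space = 2 ** bits
    if n > space:
        # Pigeonhole: more identifiers than slots guarantees a duplicate.
        return 1.0
    # p ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision when p is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    # 20 random picks from a one-byte namespace.
    print(collision_probability(20, 8))
    # A billion random version 4 UUIDs (122 random bits each).
    print(collision_probability(10 ** 9, 122))
```

Under that assumption, twenty random one-byte values already collide more
often than not, while a billion random UUIDs remain far below any practical
concern. None of this helps when someone deliberately reuses a value.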

> > There are lots of applications out there that use UUIDs pretty
> > successfully.
>
> Much less than a number of applications which use integers ;-)

Integers have been around a lot longer, so of course there are more
successful applications. :)

I would not say UUIDs or their ilk should be used indiscriminately as was
recommended in the linked article that started this thread. They do have
their place though, particularly when the cost to coordinate the allocation
of relatively small integers is "too high". Maybe a device has no
connectivity to a central server while in the field (short of maybe IP over
Avian Carrier: https://tools.ietf.org/html/rfc1149). In that case one has
two choices. One, allocate an integer that you know is almost certainly
going to collide with someone else's and sort it out later. Two, do what
SQLite does when the largest possible key is in use: start picking random
integers until you find an open slot or you time out. UUIDs (whether random
or deterministic in their generation technique) are option two (practically
if not exactly).
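
Option two can be sketched at the application level as follows. This is not
SQLite's internal rowid code, just the same idea written against Python's
sqlite3 module: pick a random key, try the insert, and retry on a collision
until an attempt limit is reached. The table name "items", the function name,
and the default limits are assumptions made for the example.

```python
import random
import sqlite3


def insert_with_random_key(conn, payload, bits=63, max_attempts=100):
    """Insert payload under a random integer key, retrying on collisions.

    bits=63 keeps keys non-negative within SQLite's signed 64 bit INTEGER.
    Returns the key that was used, or raises RuntimeError after
    max_attempts failed tries.
    """
    for _ in range(max_attempts):
        key = random.getrandbits(bits)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (id, payload) VALUES (?, ?)",
                    (key, payload),
                )
            return key
        except sqlite3.IntegrityError:
            continue  # key already taken, pick another one
    raise RuntimeError("no free key found within max_attempts")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    print(insert_with_random_key(conn, "field record"))
```

This only detects collisions within the local database. Two disconnected
devices can still pick the same key, which is why the key width matters.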

I would not use UUID-like identifiers in most cases. There are some cases
where they work better than integers. In fact, one need not limit oneself
to the official UUID generation techniques. One can do the same sort of
thing with 64 bit integers (as SQLite does, described above), or 256 bit
blobs. Heck, one can do it with one bit integers, though the chance of
collision is rather high. :)
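
For what it's worth, generating random identifiers at these widths takes only
a few lines with Python's standard library. This is a sketch and the helper
names are mine. Only uuid4 produces an RFC-style UUID; the others are plain
random values of the stated width.

```python
import secrets
import uuid


def random_int_id(bits=64):
    """Random non-negative integer identifier of the given bit width."""
    return secrets.randbits(bits)


def random_blob_id(bits=256):
    """Random identifier as raw bytes; bits must be a multiple of 8."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    return secrets.token_bytes(bits // 8)


if __name__ == "__main__":
    print(uuid.uuid4())                # random (version 4) UUID
    print(random_int_id(64))           # 64 bit integer
    print(random_blob_id(256).hex())   # 256 bit blob, shown as hex
    print(random_int_id(1))            # one bit: 0 or 1
```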

-- 
Scott Robison
