On Thu, May 21, 2015 at 11:37 AM, Valentin Davydov <sqlite-user at soi.spb.ru> wrote:
> On Wed, May 20, 2015 at 11:52:08PM +0000, Peter Aronson wrote:
> > Now you're just getting silly. What if the application sets all rowids,
> > everywhere to 1? The fact is, the chance of collision on a UUID is pretty
> > astronomically low as long as a decent source of entropy is used
> > (see http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates).
> > Yes, some application might not generate proper UUIDs, but that's true
> > with any scheme that needs to coordinate disconnected data editing or
> > generation on multiple machines.
>
> Moreover, there are widespread examples of colliding UUIDs, say
> EBD0A0A2-B9E5-4433-87C0-68B6B72699C7. This means that this idea
> have already proven to fail on it's intended usage.

MBR uses one byte unsigned integers to indicate partition type. If two
companies pick the same byte value to represent two different things, then
you have a collision, and some software somewhere is going to do the wrong
thing if it doesn't know how to disambiguate using extra information. The
problem is not with the software; it is with two developers picking the
same byte value to mean different things.

Since the "partition type" namespace only has 2^8 possible values, and
there are more than 256 partition formats defined, collisions are "very
likely" (by which I mean they have already happened). This was unavoidable
with a single byte, but given that MBR dates back to a time when multiple
bootable partitions on a single hard drive were unheard of, it worked (for
a while). Once people started dual booting, partition type collisions
became more painful.

GPT uses GUIDs. This gives 2^128 possible partition types. Collisions are
much less likely if one uses any of the RFC UUID generation procedures
(versions 1 through 5). If one deliberately chooses the exact same byte
sequence to mean something different, as happened in this case, that's not
a failure of UUID-style identifier creation.
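To put rough numbers on the gap between a 2^8 and a 2^128 namespace, here
is a minimal Python sketch of the usual birthday approximation. It assumes
identifiers are drawn uniformly at random, which is not how MBR type bytes
were actually assigned, and the function name collision_probability is
made up for illustration.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Estimate the chance that at least two of n_ids identifiers collide.

    Assumption: each identifier is drawn uniformly at random from a
    namespace of 2**bits values. Uses the birthday approximation
    1 - exp(-n(n-1) / (2N)), which is rough for very small namespaces.
    """
    space = 2.0 ** bits
    if n_ids > space:
        # Pigeonhole: more identifiers than values guarantees a collision.
        return 1.0
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    for n, bits in [(20, 8), (257, 8), (10**9, 128)]:
        print(f"{n} ids in 2^{bits}: {collision_probability(n, bits):.3e}")
```

Under these assumptions, 20 random picks from 2^8 values already give an
estimate above one half, while a billion picks from 2^128 values stay
below 1e-18.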
Any bad actor (whether acting out of ignorance or malice) can produce the
exact same outcome whether using 8 bit or 128 bit (or any other finite
bit) identifiers.

> > There are lots of applications out there that use UUIDs pretty
> > successfully.
>
> Much less than a number of applications which use integers ;-)

Integers have been around a lot longer, so of course there are more
successful applications. :)

I would not say UUIDs or their ilk should be used indiscriminately, as was
recommended in the linked article that started this thread. They do have
their place though, particularly when the cost to coordinate the
allocation of relatively small integers is "too high". Maybe a device has
no connectivity to a central server while in the field (short of maybe IP
over Avian Carrier: https://tools.ietf.org/html/rfc1149).

In that case one has two choices. One, allocate an integer that you know
is almost certainly going to collide with someone else's (and sort it out
later). Two, do what SQLite does when the largest possible key is in use:
start picking random integers until you find an open slot or you time
out. UUIDs (whether random or deterministic in their generation
technique) are option two (practically if not exactly).

I would not use UUID-like identifiers in most cases. There are some cases
where they work better than integers. In fact, one need not limit oneself
to the official UUID generation techniques. One can do the same sort of
thing with 64 bit integers (as SQLite does, described above), or 256 bit
blobs. Heck, one can do it with one bit integers, though the chance of
collision is rather high. :)

-- 
Scott Robison
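
P.S. The "option two" strategy above (pick random keys until one is free,
or give up) can be sketched in Python roughly as follows. This is only an
illustration: pick_free_key, its bits and max_attempts parameters, and the
in-memory set of used keys are all assumptions, and nothing here reflects
SQLite's internals or its actual retry limit.

```python
import random


def pick_free_key(used, bits=63, max_attempts=100, rng=random):
    """Return a random key in [0, 2**bits) that is not in `used`.

    Gives up and returns None after max_attempts tries. Both the retry
    limit and the in-memory `used` set are hypothetical stand-ins for
    whatever storage and timeout a real system would have.
    """
    for _ in range(max_attempts):
        candidate = rng.getrandbits(bits)
        if candidate not in used:
            return candidate
    return None  # "timed out": every attempt hit a key already in use


if __name__ == "__main__":
    used = set()
    for _ in range(5):
        key = pick_free_key(used)
        used.add(key)
        print(key)
```

With bits=1 and both keys already taken the function returns None, which
is the one bit case where the chance of collision is rather high.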