Sorry to be argumentative.  I'll drop it.

On Thu, May 21, 2015 at 4:44 AM, Mark Straver <
sqlite-users-list at palemoon.org> wrote:

> On 21/05/2015 09:33, J Decker wrote:
> > chance of collision is slim... it's a truly large amount of bits...
> > There's libuuid for linux, and builtin functions for windows to create
> them.
>
> The chance of collision is only slim if you (as some people have already
> indicated) have a proper entropy source to generate type 4s. In any other
> case the chance of collision is pretty much guaranteed, and not just once,
> but for each PK entry because of how PRNGs work.
>
> The other main concern that seems to be the case here is merging data from
> separate DBs with the same PK. Of course, if you are dealing with multiple
> clients with individual input in local DBs, you would not just take the DBs
> and slap them one on top of the other on the central storage location; that
> is just being extremely naive.
>
> I also wouldn't want to be the poor server having to check all UUIDs for
> matching entries on every delta if those are used, which, because of them
> being pseudo-random is very much a requirement.
>
> > Oh and the other feature! of UUIDs is merging databases... if I have a
> > database deployed to 15 customers, each with their own ORGID I can easily
> > throw them in the same database without worrying about fixing up primary
> > keys and all records related to those keys.
>
> You wouldn't with sequential PKs either, but instead of stuffing 2 types of
> data into a single field (which I think is also a bad idea) you simply have
> an extra field with ORGID. The PK should always be unique to the target DB,
> unrelated to the source DBs, when merging. Blurring those lines is asking
> for trouble, IMHO.
>
Having the ORGID alone doesn't help: you still have to fix up all related
records that reference the primary key, and just inserting to get new PKs
doesn't do it either. If the dataset is fairly complex (6 wide and 4 deep in
relations) that can be a very hairy problem.
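To make that "hairy problem" concrete, here is a minimal sketch using Python's stdlib sqlite3 module. The two-table schema and the remapped id are hypothetical; the point is that remapping one sequential PK during a merge means a coordinated update of every table that references it, and a schema 6 wide and 4 deep multiplies that fan-out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice  (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id));
    INSERT INTO customer VALUES (1, 'acme');
    INSERT INTO invoice  VALUES (1, 1);
""")

# Merging into a target DB where customer id 1 is already taken:
# the replacement id must be propagated to every referencing table,
# and each extra level of relations adds another coordinated UPDATE.
new_id = 1001  # hypothetical non-colliding id in the target DB
con.execute("UPDATE customer SET id = ? WHERE id = 1", (new_id,))
con.execute("UPDATE invoice SET customer_id = ? WHERE customer_id = 1",
            (new_id,))

print(con.execute("SELECT customer_id FROM invoice").fetchone()[0])
```

With random UUID keys none of these fixups are needed, because the merged rows cannot collide with existing keys in the first place.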


> So, I think the following can be said:
> * UUIDs are not human-readable in case someone actually needs to remember
> a PK
>
(First reaction) Remembering PKs is a horrible practice, and best broken
early.
(Edited) Remembering PKs is a habit one can get into when the keys are small,
like 1-100 (changing the item type or something), but keys that aren't
memorable just cause you to modify your habits; they don't hamper getting
jobs done.


> * If you stuff an ORGID into the UUID, you're actually making it easier to
> leak data; only a truly random UUID would make sense.

I wasn't saying the ORGID was part of the UUID; it is itself unique, but
really irrelevant other than to indicate it would be possible to separate
the 15 back into the original datasets by referencing an ORGID column.


>
> * UUIDs offer no key space advantage to a DB that is actually relevant to
> even large DB systems (let alone normal SQLite use)
>
Other than ease of merging, and ease of building local transactions that
don't require locks; ...


> * UUIDs are not offering guarantees of uniqueness which a one-way
> auto-increment would, requiring extensive checking "just in case"
> (contributing to the next point)
>
Other than that they don't require updating all related records that use the
PK as an FK. And given appropriate entropy, even a million records a second
for the past 10 years still has a very tiny chance of collision.
(Something generating that much data is probably a data-point collector like
a monitoring system for sensors on a rocket engine, or maybe the CERN
collider, and should not have an issue with locking the database while it
logs its data.)
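As a rough sanity check on that claim, the standard birthday-bound approximation P(collision) ≈ 1 - e^(-n²/2d) can be computed directly, with d = 2^122 for the random bits in a version-4 UUID:

```python
import math

# Birthday bound: P(collision) ~ 1 - exp(-n^2 / (2*d))
# for n keys drawn uniformly at random from d possibilities.
d = 2 ** 122                               # random bits in a v4 UUID
n = 1_000_000 * 60 * 60 * 24 * 365 * 10    # 1M rows/sec for 10 years

p = 1 - math.exp(-n * n / (2 * d))
print(f"{n:.3e} keys -> collision probability {p:.2e}")
```

That works out to a probability on the order of 10^-8 across the entire 10-year run, assuming a sound entropy source; with a weak PRNG, as noted above, the approximation simply doesn't apply.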


> * UUIDs depend on the client-side implementation of a PRNG and entropy
> source,
> which may be flawed, produce repeatable results, etc.
>
And that should be taken into consideration... something simple like
'rand()' has only 32 bits of seed state, for instance (given 0-0xffffffff
best case), so you end up with only that many predictable output chains.
Fortunately libuuid and the Windows libraries reference more bits of entropy
and are written to fulfill the purpose.
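Python's stdlib illustrates the same split: uuid.uuid4() draws its random bits from the OS entropy pool (os.urandom), while a deterministic PRNG seeded from a small state space will happily reproduce the exact same "unique" ids on a second run.

```python
import random
import uuid

# uuid.uuid4() takes its 122 random bits from os.urandom, the OS
# entropy pool, so two calls are overwhelmingly unlikely to collide.
a, b = uuid.uuid4(), uuid.uuid4()
assert a != b

# A PRNG with a small seed space is another story: identical seeds
# reproduce the identical "unique" id stream, a guaranteed collision.
rng1 = random.Random(0xDEADBEEF)
rng2 = random.Random(0xDEADBEEF)
assert rng1.getrandbits(128) == rng2.getrandbits(128)
```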


> * UUIDs take more resources to use (memory, CPU cycles for comp, hash,
> etc.) which I think is right against what you'd want for SQLite
>
This is true... but since the keys are stored as strings anyway, a failed
comparison probably terminates within the same number of characters (3-4
into the string); it will either fail early or match to its full length. It
could be more optimal if a binary representation of UUIDs were available...
but given that the keys are random, the indexes generally stay well
populated in a balanced tree without one side getting too long. (Although I
assume auto-increment primary keys have 'tricks' to keep them balanced
anyway.)
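SQLite has no native UUID type, but the binary representation wished for above is easy to approximate: store the 16-byte uuid.bytes form as a BLOB key instead of the 36-character text form. A sketch with Python's stdlib sqlite3 (the table name is illustrative):

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE item (id BLOB PRIMARY KEY, name TEXT)")

# 16 bytes per key instead of 36 characters; index comparisons then
# run over less than half the data via a straight byte comparison.
key = uuid.uuid4()
con.execute("INSERT INTO item VALUES (?, ?)", (key.bytes, "widget"))

row = con.execute("SELECT name FROM item WHERE id = ?",
                  (key.bytes,)).fetchone()
print(row[0])
```

The trade-off is that the stored keys are no longer directly readable in ad-hoc queries without converting back through something like `uuid.UUID(bytes=...)`.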


> As a result, I don't think it's wise to use them, objectively speaking. It
> has very little to do with having used them in the past myself or not (OT:
> that is actually a psychological trap employed by many shady business
> models ;) ). I do hope that if you use them, you never run into collisions
> and having to sort out the pieces afterwards.
>
I used them heavily in a C# project, where the data was stored in DataSets
with appropriate relationships defined between the DataTables, so on insert,
if there was a collision in the primary key field, I could just pick a new
key and cascade-update all related tables that referenced it as an FK. It
worked quite well... and reduced the overall transaction time by not having
to constantly ask the database "what's the last known value of X" and update
to use that. Plus, as stated, multiple clients could build their own
transaction datasets without contacting the database until they were to be
committed; then a begin transaction; (mass inserts); commit; kept the
database locked for only a very short time.
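That begin/mass-insert/commit pattern looks like this with Python's stdlib sqlite3 (the schema and values are made up for illustration): because every key is generated locally, the whole batch is built without a single round trip to the database, and the write lock is held only for the final insert.

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")

# Build the whole transaction client-side: no "what's the last known
# value of X?" queries, since every key is generated locally.
batch = [(str(uuid.uuid4()), total) for total in (9.99, 24.50, 3.25)]

# BEGIN ... (mass inserts) ... COMMIT: the lock is held very briefly.
with con:
    con.executemany("INSERT INTO orders VALUES (?, ?)", batch)

print(con.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

With sequential keys, each client would instead have to contact the database (or a coordinator) per row just to learn what key to use next.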

For smaller tables like my options tables, I really notice no degradation of
search performance through the indexed UUIDs. Yes, it's a storage hit, but
storage is cheap, and gets cheaper all the time. I remember when 20M drives
were SO BIG! Then 120M! OMG, it took forever to fill that. And that's MBs,
not GBs, and far from the TBs of space that are common now.

-----------
My first run-in with them as an implementation was OpenSim (a Second Life
server re-implementation with distributed grid servers forming a hypergrid).
I thought at first they were rather annoying but began to realize the
benefits... since my items from server A could move to server B and maintain
exactly the same identities (and all related textures/scripts would still
have the same FK for the PK of the items, etc.). Before that, I had
hesitated for a long time, debating the cost of space, comparison times, and
increased statement lengths, but have come to learn that in practice none of
that really mattered, and it was certainly easier to build datasets locally
and store them without having to do an insert and a query for each record.

But this is definitely a "you can put a cat in front of food, but can't make
him eat" conversation.


M.
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
