Re: [sqlite] Article: UUID or GUID as Primary Keys? Be Careful!

R Smith Sat, 10 Jun 2017 05:15:04 -0700


On 2017/06/10 6:27 AM, Jens Alfke wrote:

On Jun 9, 2017, at 3:05 PM, Simon Slavin <[email protected]> wrote:


Tangential to SQLite, but there’s little on the list at the moment so perhaps 
some of you might like this.
<https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439>

He makes some questionable points, like saying that an ASCII string of hex has 
a “9x cost in size” compared to a binary representation, or that hex strings 
would somehow get larger when converted from ISO-8859-1 to UTF-8.

Just in case people wonder about your assertion that his assertion is wrong, to be specific: UTF-8 consumes exactly the same space as ASCII when you use only characters from the first block (0x00..0x7F), which is exactly what a UUID uses: at most the characters 0..9, A..F or a..f, -, {, and }. Nothing outside of the first Unicode block. It should be noted, though, that some systems use double-byte (16-bit) character representations internally whenever the DB table text *storage* encoding is set to a 16-bit form such as UTF-16 (which, note, is not the same thing as the *DB-Interface* encoding being UTF-16).


So the blogger's point doesn't hold on that assertion.
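A quick way to check the size claim, sketched in Python (uuid4() is just an arbitrary example value; any UUID gives the same lengths):

```python
import uuid

u = uuid.uuid4()             # arbitrary example UUID
braced = "{" + str(u) + "}"  # 38-char form: 32 hex digits, 4 dashes, 2 braces

ascii_len = len(braced.encode("ascii"))
utf8_len = len(braced.encode("utf-8"))       # same as ASCII: every char is < 0x80
utf16_len = len(braced.encode("utf-16-le"))  # 16-bit code units, no BOM

print(ascii_len, utf8_len, utf16_len, len(u.bytes))  # 38 38 76 16
```

In SQLite specifically, the storage encoding is chosen with PRAGMA encoding ('UTF-8', 'UTF-16le' or 'UTF-16be') before the database is first created; only the UTF-16 settings double the text size.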

Further, a UUID/GUID as per the standard (RFC 4122) consists of a 128-bit value, formatted for presentation as this 36-character sequence:

xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx

where A is the version and B is the variant of the UUID represented (strictly, the variant lives in the top bits of B). Versions define different methods of generation, such as whether the MAC address plus a time component was used, or a domain/namespace-based UUID, etc. In DB systems we usually use version 1 (MAC + time with 100-nanosecond precision) which, barring mechanical failure or intentional deceit, must be unique (i.e. probability of global collision = 0 if created exactly as described, all systems work as designed, and some cosmic ray doesn't hit your processor just right [or is it just wrong?]).
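For reference, Python's standard uuid module exposes both fields directly. This sketch (assuming Python 3) reads the digit at the 'A' position and the digit at the 'B' position of the canonical string and cross-checks them against UUID.version and UUID.variant. Note that uuid1() falls back to a random node number if it cannot find a hardware (MAC) address.

```python
import uuid

u = uuid.uuid1()  # version 1: node (MAC) + 60-bit timestamp in 100 ns units
s = str(u)        # canonical lowercase form: xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx

version_digit = s[14]  # the 'A' position
variant_digit = s[19]  # the 'B' position; RFC 4122 variant => top bits 10 => 8, 9, a or b

print(u.version, version_digit)                             # 1 1
print(u.variant == uuid.RFC_4122, variant_digit in "89ab")  # True True
```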

Anyway, about the layout: you can of course simply store the UUID as a 128-bit value (or as two 64-bit INTs, provided you use the exact same variant and version for all your IDs; but this takes processing, and you end up with a value that needs to be re-computed before it can be compared to anything outside of your system), or at a minimum remove any dashes and braces. In reality, though, most people will just plop it as-is into a Text/Varchar field that's been Uniqued and probably PK'd.
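A minimal sketch of the three storage options in SQLite, assuming Python's sqlite3 module. One wrinkle with the two-INT approach: SQLite's INTEGER is a signed 64-bit type, so each unsigned 64-bit half has to be reinterpreted as two's-complement before storing and converted back on the way out. The helpers to_int_pair/from_int_pair and the table ids are hypothetical names for illustration only.

```python
import sqlite3
import uuid

MASK64 = (1 << 64) - 1

def to_int_pair(u):
    """Hypothetical helper: split 128 bits into two signed 64-bit ints for SQLite INTEGER."""
    def signed(x):
        return x - (1 << 64) if x >= (1 << 63) else x
    return signed(u.int >> 64), signed(u.int & MASK64)

def from_int_pair(hi, lo):
    """Hypothetical helper: re-compute the UUID from the two stored halves."""
    return uuid.UUID(int=((hi & MASK64) << 64) | (lo & MASK64))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ids (as_blob BLOB, hi INTEGER, lo INTEGER, as_text TEXT)")

u = uuid.uuid4()
con.execute("INSERT INTO ids VALUES (?, ?, ?, ?)", (u.bytes, *to_int_pair(u), str(u)))
blob, hi, lo, text = con.execute("SELECT as_blob, hi, lo, as_text FROM ids").fetchone()

# All three representations round-trip to the same UUID.
print(uuid.UUID(bytes=blob) == u, from_int_pair(hi, lo) == u, uuid.UUID(text) == u)
```

The 16-byte BLOB avoids the sign juggling entirely and still compares byte-for-byte with the same UUID anywhere else, which is why it is often the simpler binary choice.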

In that worst-case scenario (all of the UUID plus dashes and braces), the full storage requirement for a UUID would look like this: {xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx}, which totals 38 characters of ASCII (or UTF-8) text, i.e. 38 bytes. Let's be generous and assume the user made a VARCHAR(40) provision on an old-style DB which reserves all the bytes, or better yet, a modern one with a length definition that takes a further 32-bit value, so 42 bytes then. Even in this very worst case, the full space requirement for a UUID is a dismal ~2.6 times (42 / 16) the 16 bytes the original 128-bit value consumed. Let's further assume the worst text storage system, using DBCS to store 16 bits per character (and nobody really does this); even then we only get to just over 5 times. Where did he get 9 times from??

The typical usage, storing the full text UUID minus braces in an ASCII/UTF-8 sequence, results in a hair over 2.25 times[1] (36 bytes against 16) the storage of the binary/INT form. Not really that bad, I think.
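The arithmetic above, spelled out as a small Python script. The byte counts follow the scenarios in the paragraph; the exact layout of any real DB's VARCHAR will differ, so treat them as illustrative.

```python
BINARY = 16  # bytes in the raw 128-bit value

# Byte counts for the scenarios discussed above (layouts are illustrative).
scenarios = {
    "braced text, 1 byte/char": 38,
    "VARCHAR(40), all bytes reserved": 40,
    "braced text + 32-bit length": 38 + 4,
    "VARCHAR(40) at 2 bytes/char + 32-bit length": 40 * 2 + 4,
    "plain 36-char text, no braces": 36,
}
ratios = {name: size / BINARY for name, size in scenarios.items()}

for name, r in ratios.items():
    print(f"{name:45} {scenarios[name]:3} bytes  {r:.3f}x")
```

Even the least favourable case here stays well short of the article's 9x figure.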

I find it fascinating that the number 1 reason not to use UUIDs, and probably the only reason, he never even mentioned: sheer speed. (He refers to sorting speed, but the real gain is look-up speed, which gets compounded in a compound query.) In MSSQL I measured almost double the lookup speed using INTs in a PK instead of VARCHARs (and I didn't even use UUIDs, simply 6-character client codes of the form ABC001, etc.).
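A rough way to try the lookup comparison in SQLite (not the MSSQL setup measured above), assuming Python's sqlite3 module. In an SQLite rowid table, an INTEGER PRIMARY KEY is an alias for the rowid, so a lookup is a single b-tree search; a TEXT PRIMARY KEY is backed by a separate unique index, so each lookup searches that index and then the table. The table size is arbitrary and timings depend entirely on the machine, so no particular numbers are implied.

```python
import sqlite3
import time
import uuid

N = 20_000  # arbitrary table size for illustration

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE by_int (id INTEGER PRIMARY KEY, payload TEXT)")
con.execute("CREATE TABLE by_uuid (id TEXT PRIMARY KEY, payload TEXT)")

keys = [str(uuid.uuid4()) for _ in range(N)]
con.executemany("INSERT INTO by_int VALUES (?, 'x')", ((i,) for i in range(N)))
con.executemany("INSERT INTO by_uuid VALUES (?, 'x')", ((k,) for k in keys))

def time_lookups(sql, params):
    """Run one point lookup per key; return elapsed seconds and rows found."""
    found = 0
    start = time.perf_counter()
    for p in params:
        if con.execute(sql, (p,)).fetchone() is not None:
            found += 1
    return time.perf_counter() - start, found

int_secs, int_found = time_lookups("SELECT payload FROM by_int WHERE id = ?", range(N))
uuid_secs, uuid_found = time_lookups("SELECT payload FROM by_uuid WHERE id = ?", keys)
print(f"INTEGER PK: {int_secs:.3f}s   TEXT UUID PK: {uuid_secs:.3f}s")
```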

Where I DO agree with the blogger: where space is not a big concern, use both UUIDs and INTs locally in your DB; that way it is always scalable, always merge-able with other global data, and always fast with the right query.
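One possible shape for that hybrid, sketched with Python's sqlite3 module: a small INTEGER key for local joins plus a UNIQUE 16-byte UUID for merging and exchange. The tables client and invoice and the helper add_client are hypothetical examples, not a prescribed schema.

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE client (
    id   INTEGER PRIMARY KEY,   -- fast local key, used for joins
    uuid BLOB NOT NULL UNIQUE,  -- 16-byte global ID for merging/exchange
    name TEXT NOT NULL
);
CREATE TABLE invoice (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES client(id),
    amount    INTEGER NOT NULL
);
""")

def add_client(name):
    """Insert a client; return its local INTEGER id and its global UUID."""
    u = uuid.uuid4()
    cur = con.execute("INSERT INTO client (uuid, name) VALUES (?, ?)", (u.bytes, name))
    return cur.lastrowid, u

acme_id, acme_uuid = add_client("ACME")
con.execute("INSERT INTO invoice (client_id, amount) VALUES (?, ?)", (acme_id, 100))

# Local queries join on the small INTEGER key...
row = con.execute(
    "SELECT c.name, i.amount FROM invoice i JOIN client c ON c.id = i.client_id"
).fetchone()

# ...while data arriving from elsewhere is matched on the global UUID.
local_id = con.execute(
    "SELECT id FROM client WHERE uuid = ?", (acme_uuid.bytes,)
).fetchone()[0]
print(row, local_id == acme_id)
```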


Cheers,
Ryan

[1] - It's hard to say exactly: most DBs use extra bits/bytes for field specifications, lengths, etc., even for the INT fields, so making an exact blanket assertion here about the ratio of char vs. int storage is not possible, but the given ratio should be close.


_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
