the article was integers (4 bytes) vs guid as a primary key, which as string takes 36 bytes. so he was right in saying that string guid/uuid take 9 time the space of simple integers.
in general the article was quite good and allowed people to ponder about their choices/beliefs. he said be: *careful*, which is a good rule of thumb, as design decisions are often cast in stone and cannot be change because to many external programs depends on them. regards 2017-06-10 8:13 GMT-04:00 R Smith <rsm...@rsweb.co.za>: > > On 2017/06/10 6:27 AM, Jens Alfke wrote: > >> On Jun 9, 2017, at 3:05 PM, Simon Slavin <slav...@bigfraud.org> wrote: >>> >>> Tangential to SQLite, but there’s little on the list at the moment so >>> perhaps some of you might like this. >>> <https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be- >>> careful-7b2aa3dcb439 <https://tomharrisonjr.com/uui >>> d-or-guid-as-primary-keys-be-careful-7b2aa3dcb439>> >>> >> He makes some questionable points, like saying that an ASCII string of >> hex has a “9x cost in size” compared to a binary representation, or that >> hex strings would somehow get larger when converted from ISO-8859-1 to >> UTF-8. >> > > Just in case people wonder about your assertion that his assertion is > wrong - to be specific: UTF8 consumes the exact same space as ASCII when > you use only characters from the first block (0..7F) - which is exactly > what a UUID uses, at a maximum characters 0..9, A..F or a..f, -, {, and }. > Nothing outside of the first Unicode block - though it should be noted that > some systems may use Double-byte (16 bit) character representations > internally whenever the DB table text *storage* type is set to UTF-anything > (which should be noted is not the same thing as the *DB-Interface* type > being UTF-anything). > > So the blogger's point doesn't hold on that assertion. > > Further, a UUID/GUID as per the standard (RFC4122) consists of 128 bit > value formatted to present like this 36-character sequence: > xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx > > where A is a variant and B is the version of the UUID represented. > Variants define different methods of calculation, like whether the MAC > address with a time component was used, or a Domain/Namespace based UUID > etc. In DB systems we usually use variant 1 (MAC+Time with 100 nanosecond > precision) which, unless mechanical failure or intentional deceit, must be > unique (i.e. probability for global collision = 0 if created exactly as > described and all systems work as designed, and some cosmic ray doesn't hit > your processor just right [or is it just wrong?]). > > Anyway, about the layout, you can of course simply store the UUID as a 128 > bit value (or 2 64-bit INTs - considering you use the exact same variant > and version for all your IDs, but this takes processing and you end up with > a value that needs to be re-computed before it can be compared to anything > outside of your system), or at a minimum remove any dashes and braces, but > in reality most people will just plop it as-is into a Text/Varchar field > that's been Uniqued and probably PK'd. > > In that worst case scenario (all of the UUID plus dashes and braces), the > full storage requirement for a UUID would look like this: > {xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx} which totals 38 characters of > ASCII (or UTF-8) text space which totals 38 bytes. > Let's be generous and assume the user made VARCHAR(40) provision on an > old-style DB which reserves all the bytes, or better yet, a modern one with > a length definition that takes a further 32-bit value, so 42 bytes then. > Even in this very worst case scenario, the full space requirement for a > UUID is a dismal ~2.7 times more than the 16 bytes of space the original > 128-bit value consumed. Let's further assume the worst text storage system > using DBCS to store 16 bits per character (and nobody really does this), > even then we only get to just over 5 times. Where did he get 9 times from?? > The typical usage, storing full text UUID minus braces in an ASCII/UTF-8 > sequence will result in a hair over 2.3 times[1] the storage of INTs. Not > really that bad I think. > > I find it fascinating that the number 1 reason to not use UUIDs, and > probably the only reason, he never even mentioned. Sheer speed. (He refers > sorting speed, but the real gain is look-up speed, which gets compounded in > a compound query). In MSSQL I measured almost double the lookup speed using > INTs in a PK in stead of VARCHARs (I didn't even use UUIDs, simply > 6-character client codes of the form ABC001 etc.). > > Where I DO agree with the blogger: Where space is not a big concern, use > both UUIDs and INTs locally in your DB, that way it is always scalable, > always merge-able with other global data and always fast with the right > query. > > Cheers, > Ryan > > [1] - It's hard to say exactly, most DBs use extra bits/bytes for field > specifications, lengths etc, even for the INT fields, so making an exact > blanket assertion here about ratio of char vs. int storage is not possible, > but the given ratio should be close. > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- Daniel *L'action accède à la perfection quand, bien que vivant, vous êtes déjà mort* *Bunan* _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users