the article was integers (4 bytes) vs guid as a primary key, which as
string takes 36 bytes. so he was right in saying that string guid/uuid take
9 time the space of simple integers.

in general the article was quite good and allowed people to ponder about
their choices/beliefs.

he said be: *careful*, which is a good rule of thumb, as design decisions
are often cast in stone and cannot be change because to many external
programs depends on them.


regards


2017-06-10 8:13 GMT-04:00 R Smith <rsm...@rsweb.co.za>:

>
> On 2017/06/10 6:27 AM, Jens Alfke wrote:
>
>> On Jun 9, 2017, at 3:05 PM, Simon Slavin <slav...@bigfraud.org> wrote:
>>>
>>> Tangential to SQLite, but there’s little on the list at the moment so
>>> perhaps some of you might like this.
>>> <https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-
>>> careful-7b2aa3dcb439 <https://tomharrisonjr.com/uui
>>> d-or-guid-as-primary-keys-be-careful-7b2aa3dcb439>>
>>>
>> He makes some questionable points, like saying that an ASCII string of
>> hex has a “9x cost in size” compared to a binary representation, or that
>> hex strings would somehow get larger when converted from ISO-8859-1 to
>> UTF-8.
>>
>
> Just in case people wonder about your assertion that his assertion is
> wrong - to be specific: UTF8 consumes the exact same space as ASCII when
> you use only characters from the first block (0..7F) - which is exactly
> what a UUID uses, at a maximum characters 0..9, A..F or a..f, -, {, and }.
> Nothing outside of the first Unicode block - though it should be noted that
> some systems may use Double-byte (16 bit) character representations
> internally whenever the DB table text *storage* type is set to UTF-anything
> (which should be noted is not the same thing as the *DB-Interface* type
> being UTF-anything).
>
> So the blogger's point doesn't hold on that assertion.
>
> Further, a UUID/GUID as per the standard (RFC4122) consists of 128 bit
> value formatted to present like this 36-character sequence:
> xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx
>
> where A is a variant and B is the version of the UUID represented.
> Variants define different methods of calculation, like whether the MAC
> address with a time component was used, or a Domain/Namespace based UUID
> etc. In DB systems we usually use variant 1 (MAC+Time with 100 nanosecond
> precision) which, unless mechanical failure or intentional deceit, must be
> unique (i.e. probability for global collision = 0 if created exactly as
> described and all systems work as designed, and some cosmic ray doesn't hit
> your processor just right [or is it just wrong?]).
>
> Anyway, about the layout, you can of course simply store the UUID as a 128
> bit value (or 2 64-bit INTs - considering you use the exact same variant
> and version for all your IDs, but this takes processing and you end up with
> a value that needs to be re-computed before it can be compared to anything
> outside of your system), or at a minimum remove any dashes and braces, but
> in reality most people will just plop it as-is into a Text/Varchar field
> that's been Uniqued and probably PK'd.
>
> In that worst case scenario (all of the UUID plus dashes and braces), the
> full storage requirement for a UUID would look like this:
> {xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx} which totals 38 characters of
> ASCII (or UTF-8) text space which totals 38 bytes.
> Let's be generous and assume the user made VARCHAR(40) provision on an
> old-style DB which reserves all the bytes, or better yet, a modern one with
> a length definition that takes a further 32-bit value, so 42 bytes then.
> Even in this very worst case scenario, the full space requirement for a
> UUID is a dismal ~2.7 times more than the 16 bytes of space the original
> 128-bit value consumed. Let's further assume the worst text storage system
> using DBCS to store 16 bits per character (and nobody really does this),
> even then we only get to just over 5 times. Where did he get 9 times from??
> The typical usage, storing full text UUID minus braces in an ASCII/UTF-8
> sequence will result in a hair over 2.3 times[1] the storage of INTs. Not
> really that bad I think.
>
> I find it fascinating that the number 1 reason to not use UUIDs, and
> probably the only reason, he never even mentioned. Sheer speed. (He refers
> sorting speed, but the real gain is look-up speed, which gets compounded in
> a compound query). In MSSQL I measured almost double the lookup speed using
> INTs in a PK in stead of VARCHARs (I didn't even use UUIDs, simply
> 6-character client codes of the form ABC001 etc.).
>
> Where I DO agree with the blogger: Where space is not a big concern, use
> both UUIDs and INTs locally in your DB, that way it is always scalable,
> always merge-able with other global data and always fast with the right
> query.
>
> Cheers,
> Ryan
>
> [1] - It's hard to say exactly, most DBs use extra bits/bytes for field
> specifications, lengths etc, even for the INT fields, so making an exact
> blanket assertion here about ratio of char vs. int storage is not possible,
> but the given ratio should be close.
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Daniel
*L'action accède à la perfection quand, bien que vivant, vous êtes déjà
mort*
*Bunan*
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to