I think Greg's points are very valid and if we were starting from scratch I 
think I would be a proponent of making some changes to the data model, but I 
also think that George is correct in suggesting that at the end of the day our 
UUIDs are nowhere near our biggest bottleneck.

-- Allen


On Thu, 2006-03-16 at 18:30, George Daswani wrote:
> It's interesting that some people are arguing over the key strategy.  I 
> personally would rather use UUID's than sequences, identities, integers (some 
> of which are DB specific, and some aren't even safe on the cluster).  Most of 
> the problems people are pointing out would probably be valid if roller was 
> hitting the database directly every time, but for people who looked at the 
> code knows that it's doing a lot of caching in order to scale.  Changing your 
> key strategy won't allow you to scale any better.  It would be better IMHO if 
> the developers instead focused on improving caching even more, then make the 
> roller work properly on a cluster.  It's too bad Tangosol doesn't offer an 
> offer source license for Coherence, but then again - there are other 
> distributed caches out there like Jboss Cache and etc.
> 
> George Daswani
> -----Original Message-----
> From: Greg Hamer [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, March 16, 2006 5:50 PM
> To: roller-dev@incubator.apache.org
> Subject: Re: why primary key is using hex instead of integer
> 
> Following is a minor addendum to my email Tuesday.
> 
> In my work today I was reviewing the following from the MySQL 5.1 Reference 
> Manual:  
> 7.4.2. Make Your Data as Small as Possible
> 
> Particularly relevant excerpts follow below.  I believe Oracle, Microsoft, 
> Sybase and IBM provide similar guidelines.
> 
> 
> EXCERPTS
> 
> One of the most basic optimizations is to design your tables to take as 
> little space on the disk as possible. This can result in huge improvements 
> because disk reads are faster, and smaller tables normally require less main 
> memory while their contents are being actively processed during query 
> execution. Indexing also is a lesser resource burden if done on smaller 
> columns.
> 
> ...
> 
> You can get better performance for a table and minimize storage space by 
> using the techniques listed here:
>   a.. Use the most efficient (smallest) data types possible. MySQL has many 
> specialized types that save disk space and memory. For example, use the 
> smaller integer types if possible to get smaller tables. MEDIUMINT is often a 
> better choice than INT because a MEDIUMINT column uses 25% less space. 
>   b.. ... 
>   c.. The primary index of a table should be as short as possible. This makes 
> identification of each row easy and efficient. 
>   d.. ... 
>   e.. ... Shorter indexes are faster, not only because they require less disk 
> space, but because they give also you more hits in the index cache, and thus 
> fewer disk seeks.
> 
> 
> 
>   ----- Original Message ----- 
>   From: Greg Hamer 
>   To: roller-dev@incubator.apache.org 
>   Sent: Tuesday, March 14, 2006 2:52 PM
>   Subject: Re: why primary key is using hex instead of integer
> 
> 
>   Based on my experience as a DBA, I suspect there are penalties, but you are 
> only likely to see them under load.
> 
>   All data returned from databases passes through RAM.  All databases enhance 
> performance via caching data in RAM.  The longer a row (i.e the more bytes in 
> a row), the less rows that can be cached in a given amount of RAM.  Making 
> primary keys that require 32 bytes of storage per row versus, say integers 
> which require 4 bytes, reduces the amount of data that can be served from 
> cache.
> 
>   Now multiply the 32 vs 4 bytes by:  a) every foreign key referencing back 
> to primary key; b) every non-clustered index on the primary key; and c) every 
> index on the foreign keys.  
> 
>   Add it all up and there likely is detrimental impact on:  a) performance; 
> b) scalability; and c) cost (for both disks and RAM).
> 
>   I think this is way 9 out of 10 DBAs prefer the shortest, most compact 
> datatype possible for primary keys ;-)
> 
>   Yes, there are cases where uuids are nice (e.g. Elias and Matt's examples 
> of "merging content from different servers" and "migrate a blog").  But 
> having the larger datatype introduces a penalty that you pay 100% of the 
> time, vs these less frequent cases when having uuids help out.
> 
>   Just my 2ยข. 
> 
>   g
> 
>   P.S.  a quick google turned up the following link of a MSSQL DBA's thoughts 
> on GUIDs as primary keys:
>   http://www.sqljunkies.com/WebLog/odds_and_ends/archive/2005/08/31/16595.aspx
> 
> 
>   ----- Original Message ----- 
>   From: "Elias Torres" <[EMAIL PROTECTED]>
>   To: <roller-dev@incubator.apache.org>
>   Sent: Tuesday, March 14, 2006 8:46 AM
>   Subject: Re: why primary key is using hex instead of integer
> 
> 
>   -----BEGIN PGP SIGNED MESSAGE-----
>   Hash: SHA1
> 
>   I don't think it imposes a performance hit and even if it did it's
>   neglible. Once the database has indexed the strings or the ints the
>   lookup cost should be the same. I think it might help if we used fixed
>   char(32) as opposed to varchar(40), but that's mostly a space issue
>   since we are not using the last 8 bytes.
> 
>   - -Elias
> 
>   Allen Gilliland wrote:
>   > I don't know about why this decision was made historically, but i would
>   > agree that technically this imposes a slight performance hit.  AFAIK,
>   > databases can do lookups on purely numeric keys a little bit faster than
>   > alphanumerics.
>   > 
>   > I have no idea if the difference is significant or not.
>   > 
>   > -- Allen
>   > 
>   > 
>   > On Tue, 2006-03-14 at 07:25, Lance Lavandowska wrote:
>   > 
>   >>IIRC it is largely historical: at one point Roller used Castor, which
>   >>has as it's "default" a GUID generator that creates a 30-char primary
>   >>key.
>   >>
>   >>Lance
>   >>
>   >>On 3/14/06, David M Johnson <[EMAIL PROTECTED]> wrote:
>   >>
>   >>>On Mar 11, 2006, at 4:45 PM, BigLiu wrote:
>   >>>
>   >>>>I want to create some customer tables to extend roller. But I have
>   >>>>question
>   >>>>regarding why the primary key is defined in hex instead of integer?
>   >>>>Will
>   >>>>this cause any performance problem?
>   >>>
>   >>>I can't really remember why I chose the hex key. I don't think it has
>   >>>a significant impact on performance.
>   >>>
>   >>>- Dave

Reply via email to