I think Greg's points are very valid and if we were starting from scratch I think I would be a proponent of making some changes to the data model, but I also think that George is correct in suggesting that at the end of the day our UUIDs are nowhere near our biggest bottleneck.
-- Allen On Thu, 2006-03-16 at 18:30, George Daswani wrote: > It's interesting that some people are arguing over the key strategy. I > personally would rather use UUID's than sequences, identities, integers (some > of which are DB specific, and some aren't even safe on the cluster). Most of > the problems people are pointing out would probably be valid if roller was > hitting the database directly every time, but for people who looked at the > code knows that it's doing a lot of caching in order to scale. Changing your > key strategy won't allow you to scale any better. It would be better IMHO if > the developers instead focused on improving caching even more, then make the > roller work properly on a cluster. It's too bad Tangosol doesn't offer an > offer source license for Coherence, but then again - there are other > distributed caches out there like Jboss Cache and etc. > > George Daswani > -----Original Message----- > From: Greg Hamer [mailto:[EMAIL PROTECTED] > Sent: Thursday, March 16, 2006 5:50 PM > To: roller-dev@incubator.apache.org > Subject: Re: why primary key is using hex instead of integer > > Following is a minor addendum to my email Tuesday. > > In my work today I was reviewing the following from the MySQL 5.1 Reference > Manual: > 7.4.2. Make Your Data as Small as Possible > > Particularly relevant excerpts follow below. I believe Oracle, Microsoft, > Sybase and IBM provide similar guidelines. > > > EXCERPTS > > One of the most basic optimizations is to design your tables to take as > little space on the disk as possible. This can result in huge improvements > because disk reads are faster, and smaller tables normally require less main > memory while their contents are being actively processed during query > execution. Indexing also is a lesser resource burden if done on smaller > columns. > > ... > > You can get better performance for a table and minimize storage space by > using the techniques listed here: > a.. Use the most efficient (smallest) data types possible. MySQL has many > specialized types that save disk space and memory. For example, use the > smaller integer types if possible to get smaller tables. MEDIUMINT is often a > better choice than INT because a MEDIUMINT column uses 25% less space. > b.. ... > c.. The primary index of a table should be as short as possible. This makes > identification of each row easy and efficient. > d.. ... > e.. ... Shorter indexes are faster, not only because they require less disk > space, but because they give also you more hits in the index cache, and thus > fewer disk seeks. > > > > ----- Original Message ----- > From: Greg Hamer > To: roller-dev@incubator.apache.org > Sent: Tuesday, March 14, 2006 2:52 PM > Subject: Re: why primary key is using hex instead of integer > > > Based on my experience as a DBA, I suspect there are penalties, but you are > only likely to see them under load. > > All data returned from databases passes through RAM. All databases enhance > performance via caching data in RAM. The longer a row (i.e the more bytes in > a row), the less rows that can be cached in a given amount of RAM. Making > primary keys that require 32 bytes of storage per row versus, say integers > which require 4 bytes, reduces the amount of data that can be served from > cache. > > Now multiply the 32 vs 4 bytes by: a) every foreign key referencing back > to primary key; b) every non-clustered index on the primary key; and c) every > index on the foreign keys. > > Add it all up and there likely is detrimental impact on: a) performance; > b) scalability; and c) cost (for both disks and RAM). > > I think this is way 9 out of 10 DBAs prefer the shortest, most compact > datatype possible for primary keys ;-) > > Yes, there are cases where uuids are nice (e.g. Elias and Matt's examples > of "merging content from different servers" and "migrate a blog"). But > having the larger datatype introduces a penalty that you pay 100% of the > time, vs these less frequent cases when having uuids help out. > > Just my 2ยข. > > g > > P.S. a quick google turned up the following link of a MSSQL DBA's thoughts > on GUIDs as primary keys: > http://www.sqljunkies.com/WebLog/odds_and_ends/archive/2005/08/31/16595.aspx > > > ----- Original Message ----- > From: "Elias Torres" <[EMAIL PROTECTED]> > To: <roller-dev@incubator.apache.org> > Sent: Tuesday, March 14, 2006 8:46 AM > Subject: Re: why primary key is using hex instead of integer > > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I don't think it imposes a performance hit and even if it did it's > neglible. Once the database has indexed the strings or the ints the > lookup cost should be the same. I think it might help if we used fixed > char(32) as opposed to varchar(40), but that's mostly a space issue > since we are not using the last 8 bytes. > > - -Elias > > Allen Gilliland wrote: > > I don't know about why this decision was made historically, but i would > > agree that technically this imposes a slight performance hit. AFAIK, > > databases can do lookups on purely numeric keys a little bit faster than > > alphanumerics. > > > > I have no idea if the difference is significant or not. > > > > -- Allen > > > > > > On Tue, 2006-03-14 at 07:25, Lance Lavandowska wrote: > > > >>IIRC it is largely historical: at one point Roller used Castor, which > >>has as it's "default" a GUID generator that creates a 30-char primary > >>key. > >> > >>Lance > >> > >>On 3/14/06, David M Johnson <[EMAIL PROTECTED]> wrote: > >> > >>>On Mar 11, 2006, at 4:45 PM, BigLiu wrote: > >>> > >>>>I want to create some customer tables to extend roller. But I have > >>>>question > >>>>regarding why the primary key is defined in hex instead of integer? > >>>>Will > >>>>this cause any performance problem? > >>> > >>>I can't really remember why I chose the hex key. I don't think it has > >>>a significant impact on performance. > >>> > >>>- Dave