In general, the main problem with large, compound keys is that they do not
hash well; by "hash well" I mean that they do not hash to proximate groups
the way sequential numeric keys do.
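
To make this concrete, here is a toy illustration (my simplification, not the
actual U2 group-hashing algorithm): imagine the group were simply the key
taken MOD the file's modulo. Consecutive numeric keys would then land in
consecutive groups, while a long compound key's characters scatter it more or
less at random.

   * Toy illustration only -- an idealized "group = key MOD modulo",
   * not the real U2 hashing algorithm.
   MODULO = 101
   FOR ID = 5001 TO 5004
      * consecutive numeric keys fall into consecutive groups
      PRINT ID : ' -> group ' : MOD(ID, MODULO)
   NEXT ID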

There is read-ahead logic and RAM in your disk drive(s). There is read-ahead
logic and RAM in your disk controller(s). There is read-ahead logic in the
O/S. None of this works very well when records are randomly scattered
throughout the file.

If I used sequential numeric keys, and I wanted all the records created
yesterday, they would likely all be near each other on the physical disk.
When I read the first one, the disk, controller, and O/S will likely have
pre-fetched many of the day's other records as well. That makes for speedy
access.

This is part of the reason why I think long, compound keys are a PITA and
are to be avoided. Simple numeric keys will process more quickly because they
hash better, and they are easier to type too.  This is often the problem with
"intelligent" keys: by embedding data in the key, you almost always make the
key longer and make the file hash poorly. IMO it makes way more sense to use
simple numeric keys and create real attributes for the data you are tempted
to build the key out of.  I say this with 20/20 hindsight, as I have
designed many systems with large files that use compound keys, every one of
which I have come to regret.

Roy, you could prove this by writing a program that reads every record in
your original file and writes it out to a new file (with the same modulo &
sep as the original file) using a simple incrementing counter as the key. I
will bet that the new file performs better than your original one does, even
though it should have more attributes (necessary to accommodate the data
values that were embedded in the key of the original file).
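
Something along these lines would do it (just a sketch in UniVerse BASIC; the
file and variable names are placeholders, you would create NEWFILE first with
the same modulo & sep as the original, and you would add your own error
handling):

   * Sketch: copy OLDFILE to NEWFILE, re-keying on a simple counter.
   OPEN '', 'OLDFILE' TO SRC ELSE STOP 'Cannot open OLDFILE'
   OPEN '', 'NEWFILE' TO DST ELSE STOP 'Cannot open NEWFILE'
   SELECT SRC
   SEQ = 0
   LOOP
      READNEXT ID ELSE EXIT
      READ REC FROM SRC, ID THEN
         SEQ = SEQ + 1
         * keep the old compound key as a real attribute so its data is not lost
         REC<-1> = ID
         WRITE REC ON DST, SEQ
      END
   REPEAT
   PRINT SEQ : ' records copied'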

My 0.02,
/Scott Ballinger
Pareto Corporation
Edmonds, WA USA
206 713 6006