Kenneth Marshall wrote:
I think that if the case of >1 entry per hash becomes common enough to
be significant, and the key is stored in the hash, that a btree will
perform equal or better, and there is no point in pursuing such a hash
index model. This is where we are today.
The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate the index and reduce performance.
I am not willing to assume that ~1 entry per hash bucket is necessarily
what we want, at least without some testing.
If the key is stored, all of these factors likely favor the btree format
over the hash format. Again, this is where we are today.
It seems reasonable that
with the performance differences between L1/L2/L3 cache, main memory,
and the disk subsystem a higher load factor would result in better
overall system performance by reducing cache line misses and improving
hardware pre-fetch efficiency.
Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in todays system memories.
Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance characteristics
that we are seeking.
The subject interests me. :-)
Mark Mielke <[EMAIL PROTECTED]>