Neil Conway wrote:
> You might find this patch useful:
Oh, I had forgot about that.
> It implements the "just store the hash in the index" idea; it also sorts
> the entries in a bucket by the hash value, which allows binary search to
> be used to locate candidate matches.
> I was surprised that this didn't result in a performance improvement for
> the benchmarks that I ran, but I never got around to investigating
> further (either running more benchmarks or checking whether there was a
> bug in the implementation).
You did get a smaller index, which has cache benefits with larger
tables. You didn't compare the size comparison against b-tree, but a
smaller index is a good thing.
I think you left some money on the table on that front. Since all the
HashItems are fixed size, you can get rid of the line pointers
altogether. Given that a sizeof(HashItemData) is 8 bytes, and a line
pointer is 4 bytes, that should give a further 1/3 reduction in index
size. If you're willing to give up on the alignment of HashItemData, you
can get it down to 6 bytes.
Even from a CPU point of view, as the table becomes bigger and the
b-tree becomes deeper, the difference between O(1) and O(log n) lookup
starts to become more significant.
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings