On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote:
> Neil Conway wrote:
> > You might find this patch useful:
> > http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
> Oh, I had forgot about that.
> > It implements the "just store the hash in the index" idea; it also sorts
> > the entries in a bucket by the hash value, which allows binary search to
> > be used to locate candidate matches.
> > I was surprised that this didn't result in a performance improvement for
> > the benchmarks that I ran, but I never got around to investigating
> > further (either running more benchmarks or checking whether there was a
> > bug in the implementation).
> You did get a smaller index, which has cache benefits with larger
> tables. You didn't compare the size comparison against b-tree, but a
> smaller index is a good thing.
> I think you left some money on the table on that front. Since all the
> HashItems are fixed size, you can get rid of the line pointers
> altogether. Given that a sizeof(HashItemData) is 8 bytes, and a line
> pointer is 4 bytes, that should give a further 1/3 reduction in index
> size. If you're willing to give up on the alignment of HashItemData, you
> can get it down to 6 bytes.
> Even from a CPU point of view, as the table becomes bigger and the
> b-tree becomes deeper, the difference between O(1) and O(log n) lookup
> starts to become more significant.
If you use the size values from my initial tests, the hash index is
down to 3X the btree size instead of 5X. If we can make these additional
changes and add a unique flags to the index entry, we would have a much
smaller index. I had not thought of it at the time, but the addition of
the unique/sole item flag would allow the use of the hash index to
support unique indexes and be used for primary keys. In some usage
cases, the O(1) would be very advantageous.
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend