Re: [HACKERS] Hash index todo list item

Kenneth Marshall Tue, 25 Sep 2007 09:26:54 -0700

On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
> "Kenneth Marshall" <[EMAIL PROTECTED]> writes:
> 
> > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> >
> >> Using our implementation, build times and index sizes are
> >> comparable with btree index build times and index sizes.
> >...
> > That is super! (and timely)
> 
> It is pretty super. I have a few comments to raise but don't take it to be too
> negative, it sounds like this is a big step forward towards making hash
> indexes valuable.
> 
> Firstly in the graphs it seems the index size graph has an exponential x-axis
> but a linear y-axis. This makes it hard to see what I would expect to be
> pretty much linear growth. The curves look exponential which would mean linear
> growth but of course it's hard to tell.
> 
> Also, the growth in the time chart looks pretty much linear. That seems weird
> since I would expect there would be a visible extra component since sort times
> are n-log(n). Perhaps you need to test still larger tables to see that though.
> 
> In any case it's clear from the results you have there that the change is a
> positive one and fixes a fundamental problem with the hash index build code.
> 
> Something else you should perhaps test is indexing something which is
> substantially larger than hash function output. A hash function is going to
> make the most sense when indexing something like strings for which you want to
> avoid the long comparison costs. Especially consider testing this on a UTF8
> locale with expensive comparisons (like a CJK locale for example).
> 
> Note that the bottom line for the problems with hash indexes is that the
> current implementation doesn't offer any advantages over btree indexes. Hash
> indexes need to be not only as good of a choice as btree indexes but
> significantly better a significantly better choice at least for some specific
> circumstances.
> 
> Also, if you're going to submit a patch you should check out a copy of the CVS
> HEAD and work from that. I don't think there are huge differences in the area
> of hash indexes though. But in most other areas you would be spending quite a
> bit of time dealing details which have changed since.
> 
> Finally note that we're in the final throes of the 8.3 feature freeze.
> Normally any patch submitted now would be held until 8.3 is released and
> development on 8.4 is started. I could imagine an exception being made for
> hash indexes since they're so moribund currently but probably not. The flip
> side of that argument is that there's not much point in making an exception
> for something which will only be really useful once further work is done in
> the same area.
>


Although I am very excited about this patch, I do not see any real value
in including it in 8.3. As you mention, we need to to have a hash index
implementation that outperforms btree in some problem regime and that is
currently not the case. I have just recently started the process of
gathering ideas and having discussions on various approaches to making
hash indexes more performant and we have a number of ideas on which to
start our investigation. I do think that this patch will make the testing
and evaluation, that will be needed to truly improve the hash index, much
much easier.

Regards,
Ken

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Hash index todo list item

Reply via email to