Re: [HACKERS] [PATCH]-hash index improving

Kenneth Marshall Thu, 17 Jul 2008 14:02:13 -0700

On Thu, Jul 17, 2008 at 04:24:28PM -0400, Jonah H. Harris wrote:
> On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <[EMAIL PROTECTED]> wrote:
> > The patch store hash code only in the index tuple.
> > It based on  Neil Conway's patch with an old version of PostgreSQL.
> > It passes the regression test but I didn't test the performance yet.
> > Anyone interested can make a performance test;-)
> > You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> > original implementation.
> > It's a preliminary implementation and  I'm looking for input here.
> > Hope to hear from you.
> 
> I've spent some time today performing tests similar to those mentioned
> here (http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php)
> 
> Using a word list of 2650024 unique words (maximum length is 118
> bytes), build times are still high, but I'm not really seeing any
> performance improvements over b-tree.  I haven't profiled it yet, but
> my test is as follows:
> 
> - Created the dict table
> - Loaded the dict table
> - Counted the records in the dict table
> - Created the index
> - Shutdown the database
> - Randomly selected 200 entries from the word list and built a file
> full of (SELECT * FROM dict WHERE word = '<word>') queries using them.
> - Cleared out the kernel cache
> - Started the database
> - Ran the query file
> 
> The result of this is between 5-10ms improvement in the overall
> execution time of all 200 queries.  The time-per-query is practically
> unnoticeable.  As this is in the range of noise, methinks there's a
> larger problem with hash indexes.  I haven't looked heavily into their
> implementation, but do you any of you know of any major design flaws?
> 
Jonah,


Thank you for running these tests. I was trying to reproduce my initial
tests on the original system to make it more apples to apples, but the
latest release needs more resources semaphore-wise than the 8.2 and
to fix it on Solaris 8 I will need a reboot. Would you mind posting
the timings for the hash_only index versus the hash_value versus the
btree index for the same test. Also, what is the on-disk size of all
three indexes? This will allow us to figure out the bucket/page load
or fill-factor for each scenario.

The basic implementation looked reasonable. I will take a look at
the patch this evening.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH]-hash index improving

Reply via email to