On Wed, Sep 10, 2008 at 10:17:31PM -0600, Alex Hunsaker wrote:
> On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> > On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> >> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
> >>> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> >>> > I think that the glacial speed for generating a big hash index is
> >>> > the same problem that the original code faced.
> >>>
> >>> Yeah, sorry, I was not saying it was a new problem with the patch. Err,
> >>> at least not trying to :) *Both* of them had been running for 18+ hours
> >>> (I finally killed them sometime Sunday, or around +32 hours...)
> >>>
> >>> > It would be useful to have an equivalent test for the hash-only
> >>> > index without the modified int8 hash function, since that would
> >>> > be more representative of its performance. The collision rates
> >>> > that I was observing in my tests of the old and new mix() functions
> >>> > were about 2 * (1/10000) of what your test generated. You could just
> >>> > test against the integers between 1 and 2000000.
> >>>
> >>> Sure, but then it's pretty much just a general test of patch vs. no
> >>> patch, i.e. how do we measure how much longer collisions take when
> >>> the new patch makes things faster? That's what I was trying to
> >>> measure... though I apologize, I don't think that was clearly stated
> >>> anywhere...
> >>
> >> Right, I agree that we need to benchmark the collision processing
> >> time difference. I am not certain that two data points are useful
> >> information. There are 469 collisions with our current hash function
> >> on the integers from 1 to 2000000. What about testing the performance
> >> at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000, ...?
> >> Unless you adjust the fill calculation for the CREATE INDEX, I would
> >> stop once the time to create the index spikes. It might also be useful
> >> to see if a CLUSTER affects the performance as well. What do you think
> >> of that strategy?
> >
> > Not sure it will be a good benchmark of collision processing. Then
> > again, you seem to have studied the hash algo closer than I have. I'll
> > go see about doing this. Stay tuned.
>
> Assuming I understood you correctly (and I probably didn't), this does
> not work very well, because you max out at 27,006 values before you get
> this error:
>   ERROR: index row size 8152 exceeds hash maximum 8144
>   HINT: Values larger than a buffer page cannot be indexed.
>
> So is a power-of-2 multiple of 500 not simply:
>
>   x = 500;
>   while (1)
>   {
>       print x;
>       x *= 2;
>   }
>
> ?

Alex,
I meant to check the performance with increasing numbers of collisions,
not increasing size of the hashed item. In other words, something like
this:

for ($coll = 500; $coll <= 1000000; $coll = $coll * 2) {
    for ($i = 0; $i <= 1000000; $i++) {
        hash(int8 $i);
    }
    # add the appropriate number of collisions, distributed evenly to
    # minimize the packing overrun problem
    for ($dup = 0; $dup < $coll; $dup++) {
        hash(int8 MAX_INT + $dup * 1000000/$coll);
    }
}

Ken
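
P.S. For concreteness, here is a rough Perl version of one iteration of
that loop, emitting one key per line so the output can be loaded into a
test table before timing the index build. It is an untested sketch:
$max_int below is just a stand-in for whatever base value actually
produces collisions under the hash function being benchmarked.

#!/usr/bin/perl
# Emit 1000000 base keys plus $coll evenly spaced colliding keys,
# one per line, suitable for COPY ... FROM STDIN.
use strict;
use warnings;

my $coll    = $ARGV[0] || 500;   # collisions this run: 500, 1000, 2000, ...
my $n       = 1_000_000;
my $max_int = 2**31 - 1;         # placeholder base value for colliding keys

print "$_\n" for 0 .. $n;        # the base keys
for my $dup (0 .. $coll - 1) {   # the colliding keys, spread evenly
    print $max_int + $dup * int($n / $coll), "\n";
}

Each run's output can then be loaded with COPY and the CREATE INDEX
timed under \timing, stopping once the build time spikes as described
above.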