On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> I think we should rewrite the way we track the parents completely. Rather
> than keep a path stack attached to every node buffer, let's just maintain a
> second hash table that contains the parent of every internal node. Whenever
> a downlink is moved to another page, update the hash table with the new
> information. That way we always have up-to-date information about the
> parent of every internal node.
>
> That's much easier to understand than the path stack structures we have
> now. I think the overall memory consumption will be about the same too.
> Although we need the extra hash table with one entry for every internal
> node, we get rid of the path stack structs, which are also one per every
> internal node at the moment.
>
> I believe it is faster too. I added some instrumentation to the existing
> gist code (with the additional getNodeBuffer() call added to fix this bug),
> to measure the time spent moving right, when refinding the parent of a
> page. I added gettimeofday() calls before and after moving right, and
> summed the total. In my test case, the final index size was about 19GB, and
> the index build took 3545 seconds (59 minutes). Of that time, 580 seconds
> (~ 10 minutes) was spent moving right to refind parents. That's a lot. I
> also printed a line whenever a refind operation had to move right 20 pages
> or more. That happened 2482 times during the build, in the worst case we
> moved right over 40000 pages.
>
> Attached is a patch to replace the path stacks with a hash table. With
> this patch, the index build time in my test case dropped from 59 minutes to
> about 55 minutes. I don'αΊ— know how representative or repeatable this test
> case is, but this definitely seems very worthwhile, not only because it
> fixes the bug and makes the code simpler, but also on performance grounds.
>

Cool, seems that we've both simplier and faster implementation of finding
parent. Thanks!


> Alexander, do you still have the test environments and data lying around
> that you used for GiST buffering testing last summer? Could you rerun some
> of those tests with this patch?


I think I can restore test environment and data. Will rerun tests soon.

------
With best regards,
Alexander Korotkov.

Reply via email to