In the dataPlaceToPageLeaf function:

        if (append)
        {
                /*
                 * Even when appending, trying to append more items than will fit is
                 * not completely free, because we will merge the new items and old
                 * items into an array below. In the best case, every new item fits in
                 * a single byte, and we can use all the free space on the old page as
                 * well as the new page. For simplicity, ignore segment overhead etc.
                 */
                maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
        }

Hmm. So after splitting the page, there are freespace + GinDataPageMaxDataSize bytes available on both pages together. But freespace has been adjusted with the fillfactor, while GinDataPageMaxDataSize has not, so this overshoots: when leafRepackItems actually distributes the segments on the pages, it will fill both pages only up to the fillfactor. This is an upper bound, so the overshoot is harmless; it only leads to some unnecessary work in dealing with the item lists. But I think that should be:

maxitems = Min(maxitems, freespace + leaf->maxdatasize);
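
To put rough numbers on the overshoot (back-of-the-envelope, assuming the patch computes leaf->maxdatasize as fillfactor * GinDataPageMaxDataSize):

        GinDataPageMaxDataSize ~ 8150 bytes          (default BLCKSZ=8192)
        leaf->maxdatasize      ~ 0.9 * 8150 ~ 7340   (fillfactor=90%)

So the old bound lets through roughly 800 bytes' worth of extra items per split, which leafRepackItems then has to shuffle around for nothing.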

        else
        {
                /*
                 * Calculate a conservative estimate of how many new items we can fit
                 * on the two pages after splitting.
                 *
                 * We can use any remaining free space on the old page to store full
                 * segments, as well as the new page. Each full-sized segment can hold
                 * at least MinTuplesPerSegment items
                 */
                int                     nnewsegments;

                nnewsegments = freespace / GinPostingListSegmentMaxSize;
                nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
                maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
        }

This branch makes the same mistake, but here it is calculating a lower bound. It's important that maxitems is not set to a higher value than what actually fits on the page, otherwise you can get an ERROR later in the function, when it turns out that not all the items actually fit after all. The saving grace is that this branch is never taken when building a new index, because an index build inserts all the TIDs in order, but that seems pretty fragile. This should use leaf->maxdatasize instead of GinDataPageMaxDataSize here too.
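
IOW, just the same substitution as above (untested):

                nnewsegments = freespace / GinPostingListSegmentMaxSize;
                nnewsegments += leaf->maxdatasize / GinPostingListSegmentMaxSize;
                maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);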

But that can lead to funny things if fillfactor is small and BLCKSZ is small too. With the minimums, BLCKSZ=1024 and fillfactor=0.2, the above formula will set nnewsegments to zero, and that's not going to end well. The underlying problem is that maxdatasize becomes smaller than GinPostingListSegmentMaxSize, so not even a single full-sized segment fits in the fillfactor-adjusted space. I think GinGetMaxDataSize() needs to make sure that the returned value is always >= GinPostingListSegmentMaxSize.
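
Something like this at the end of GinGetMaxDataSize() should be enough (a sketch, paraphrasing the patch's function rather than quoting it):

        /*
         * Never trim the page below one full-sized segment, or the split
         * estimates above can end up with zero segments. For example, with
         * BLCKSZ=1024, GinDataPageMaxDataSize is roughly 980 bytes, so
         * fillfactor=0.2 leaves only ~200 bytes, which is less than
         * GinPostingListSegmentMaxSize.
         */
        if (maxdatasize < GinPostingListSegmentMaxSize)
                maxdatasize = GinPostingListSegmentMaxSize;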

Now that we have a fillfactor, shouldn't we also replace the 75% heuristic later in that function, which is used when inserting into the rightmost page rather than building the index from scratch? In B-tree, the fillfactor is applied to that case too.
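
To illustrate what I mean (hypothetical code, not quoting the function): where the split code decides how full to pack the left half when appending to the rightmost page, the hard-coded fraction

        targetsize = GinDataPageMaxDataSize * 3 / 4;

would become the fillfactor-derived size:

        targetsize = leaf->maxdatasize;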

- Heikki


