Hi! On Wed, Jul 8, 2015 at 10:27 PM, Heikki Linnakangas <[email protected]> wrote:
> In dataPlaceToPageLeaf-function:
>
> if (append)
>> {
>> /*
>> * Even when appending, trying to append more items than
>> will fit is
>> * not completely free, because we will merge the new
>> items and old
>> * items into an array below. In the best case, every new
>> item fits in
>> * a single byte, and we can use all the free space on
>> the old page as
>> * well as the new page. For simplicity, ignore segment
>> overhead etc.
>> */
>> maxitems = Min(maxitems, freespace +
>> GinDataPageMaxDataSize);
>> }
>>
>
> Hmm. So after splitting the page, there is freespace +
> GinDataPageMaxDataSize bytes available on both pages together. freespace
> has been adjusted with the fillfactor, but GinDataPageMaxDataSize is not.
> So this overshoots, because when leafRepackItems actually distributes the
> segments on the pages, it will fill both pages only up to the fillfactor.
> This is an upper bound, so that's harmless, it only leads to some
> unnecessary work in dealing with the item lists. But I think that should be:
>
> maxitems = Min(maxitems, freespace + leaf->maxdatasize);
>
Good catch! Fixed.
>
> else
>> {
>> /*
>> * Calculate a conservative estimate of how many new
>> items we can fit
>> * on the two pages after splitting.
>> *
>> * We can use any remaining free space on the old page to
>> store full
>> * segments, as well as the new page. Each full-sized
>> segment can hold
>> * at least MinTuplesPerSegment items
>> */
>> int nnewsegments;
>>
>> nnewsegments = freespace / GinPostingListSegmentMaxSize;
>> nnewsegments += GinDataPageMaxDataSize /
>> GinPostingListSegmentMaxSize;
>> maxitems = Min(maxitems, nnewsegments *
>> MinTuplesPerSegment);
>> }
>>
>
> This branch makes the same mistake, but this is calculating a lower bound.
> It's important that maxitems is not set to higher value than what actually
> fits on the page, otherwise you can get an ERROR later in the function,
> when it turns out that not all the items actually fit on the page. The
> saving grace here is that this branch is never taken when building a new
> index, because index build inserts all the TIDs in order, but that seems
> pretty fragile. Should use leaf->maxdatasize instead of
> GinDataPageMaxDataSize here too.
>
Fixed.
> But that can lead to funny things, if fillfactor is small, and BLCKSZ is
> small too. With the minimums, BLCKSZ=1024 and fillfactor=0.2, the above
> formula will set nnewsegments to zero. That's not going to end up well. The
> problem is that maxdatasize becomes smaller than
> GinPostingListSegmentMaxSize, which is a problem. I think
> GinGetMaxDataSize() needs to make sure that the returned value is always >=
> GinPostingListSegmentMaxSize.
>
Fixed.
> Now that we have a fillfactor, shouldn't we replace the 75% heuristic
> later in that function, when inserting to the rightmost page rather than
> building it from scratch? In B-tree, the fillfactor is applied to that case
> too.
Sounds reasonable. Now it works like btree.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
gin_fillfactor_6.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
