On Thu, Mar 1, 2018 at 3:18 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
> given...
>
> +        if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
> +            !P_INCOMPLETE_SPLIT(lpageop) &&
> +            !P_IGNORE(lpageop) &&
> +            (PageGetFreeSpace(page) > itemsz) &&
> +            _bt_compare(rel, natts, itup_scankey, page,
> PageGetMaxOffsetNumber(page)) > 0)

One simple problem with this code is that it assumes that there must
be at least one item on a leaf page, which isn't guaranteed. There has
to be a highkey on non-righmost pages, but rightmost leaf pages can be
totally empty. While btvacuumpage() is very unlikely to not be able to
begin deleting the page there and then, it can happen (see remarks at
the end of btvacuumpage() about recursing).

Other issues spotted:

* The variable name "lpageop" seems like is was lifted from somewhere
dealing with a page to the left of some other page, which is not the
case here.

* P_INCOMPLETE_SPLIT() checks a bit that can only be set on a
non-rightmost page -- a page that has a sibling to its right that
doesn't have a downlink in parent. The bit is only ever set on the
"target" of a (still unfinished) page split. This is why
_bt_moveright() doesn't bother with completing a page split when it
reaches a rightmost page. I don't see why you need this part of the
test at all.

> The key there is strictly greater than the rightmost key. As such, it
> must precede the first page with an equal key, and since it's the
> rightmost page... there's no such key. But if there was, the check for
> uniqueness only needs the insert point to point to the first such
> page, and this guarantees it.

This code assumes that it can use block number as a stable way of
finding the same point in the B-Tree over time, without any interlock.
That may actually work out in practice, since even if the page is
recycled there can only ever be one rightmost leaf page, but it seems
like a brittle approach to me. The page image may be recycled by the
FSM already, and we really shouldn't be depending on the page not
looking a particular way once that has happened. I guess that you can
say the same thing about using page LSN to determine cached block
staleness instead, but that still seems a lot safer.

Basically, I'm worried that we could do something entirely
unpredictable when a page has actually been recycled, since we're
entirely opting out of the RecentGlobalXmin interlock business on the
assumption that we can be sure that recycling hasn't occurred in some
other way. Imagine what could happen if we ever teach the FSM to share
freespace among small relations, which seems quite possible to me.

> You could allow for some slack, by comparing against the leftmost key
> instead. You just need to know whether the key fits in the page. Then,
> the insert can proceed as usual, doing the binsearch to find the
> proper insertion point, etc.

The whole way that this patch opts out of using an exclusive buffer
lock on the "first page the value could be on" (the _bt_check_unique()
+ _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
terribly concrete.

-- 
Peter Geoghegan

Reply via email to