Re: [HACKERS] Brain dump: btree collapsing

Curtis Faith Tue, 18 Feb 2003 06:53:42 -0800

> tom lane wrote:
> > Hm.  A single lock that must be grabbed for operations anywhere in 
> > the index is a concurrency bottleneck.

I replied a bit later: 
> I don't think it would be that bad.  It's not a lock but a
> mutex that would only be held for relatively brief duration. 
> It looks like tens or a few hundred instructions at most 
> covered by the mutex and only during scans that cross page boundaries.
> 
> Unless you were talking about a 16 to 64 way SMP box, I don't
> think there would be much actual contention since there would 
> have to be contention for the same index at the same time and 
> depending on the number of keys per page, very little of the 
> run time is spent on the lock transition for traversal across 
> pages as compared to all the other stuff that would be going on.

tom also presciently wrote:
> But maybe we could avoid that

After sleeping on it a bit, it occurs to me that you are again correct;
the mutex is not required to do what I suggested. It is simply a matter
of assuring that during the page traversal the lock is either obtained
or the process added to the waiters queue before the current page lock
is released. This will ensure than any process trying to obtain an
exclusive or superexclusive lock will wait or fail an immediate lock
acquisition.

traversal pseudo code:

    if the next page lock can be obtained immediately

        grab it

        release the current page lock

    else if we would have to wait

        add current process to waiters queue

        release current page lock

        sleep on the lock

    end if

end traversal pseudo code

Like the current release before next page acquisition method, this won't
cause deadlocks with other scanners going in the opposite direction or
with concurrently inserting processes since the current page lock is
released before sleeping.

If this change is made then my original suggestion, that the merge
process attempt a non-deadlocking acquisition of all the locks with a
drop of all locks and a retry if any of the attempts would have caused a
wait, should work fine.

If the traversal code is changed as per above, a merging VACUUM process
is guaranteed to either run into the lock held by the scanning process
on the current page or the next page.

This has the potential side benefit of permitting the simplification of
the scan code in the case where the lock is obtained immediately. Since
the scanning process would know that no inserting process has added
pages to the tree because of a split, it wouldn't have to check for
newly inserted pages to the right of the next page. I don't know how
much work is involved in this check but if it is significant it could be
eliminated in this case, one which is the most likely case in many
scenarios.

What do you think?

- Curtis

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Brain dump: btree collapsing

Reply via email to