On 06.09.2011 16:40, Robert Haas wrote:
> On Tue, Sep 6, 2011 at 6:21 AM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> The way it would work is that on page split the right page is flagged with
>> MISSING_DOWNLINK flag. When the downlink is inserted into the parent, the
>> flag is cleared in the same critical section as the WAL record for the
>> insertion of the parent is written. Normally, a backend would never see the
>> flag set, because the locks on the split pages are not released until the
>> parent record is written and the flag cleared again. But if inserting the
>> downlink fails for any reason, the next inserter or vacuum that steps on the
>> page can finish the split by inserting the downlink.
>>
>> Unfortunately that means holding the locks on the split pages longer than we
>> do at the moment. Currently they are released as soon as the parent page is
>> locked; with this change they would need to be held until the WAL record of
>> the downlink insertion is done. B-tree is so heavily used that I'm a bit
>> hesitant to sacrifice any concurrency there, but I don't think it would be
>> noticeable in practice.
>
> Do you really need to hold the page locks for all that time, or could
> you cheat? Like... release the locks on the split pages but then go
> back and reacquire them to clear the flag...

Hmm, there are two issues with that:

1. While you're not holding the locks on the child pages, someone can step onto the page and see that the MISSING_DOWNLINK flag is set, and try to finish the split for you.

2. If you don't hold the page locked while you clear the flag, someone can start and finish a checkpoint after you've inserted the downlink, and before you've cleared the flag. You end up in a scenario where the flag is set, but the page in fact *does* have a downlink in the parent.

So, nope, we can't cheat.
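
To make issue 2 concrete, here is a toy Python model of the window. It is only a sketch under loud assumptions: ToyIndex, its two pages, the single "insert_downlink" WAL record and the crash placed right after the checkpoint are all made up for illustration, and none of it resembles the real nbtree or WAL code. It assumes that redo of the downlink-insertion record also clears the flag, and that recovery replays only records written after the last checkpoint.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Page:
    missing_downlink: bool = False
    downlinks: set = field(default_factory=set)


class ToyIndex:
    """Hypothetical model: two pages, an on-disk image, and a WAL tail."""

    def __init__(self):
        # State right after a page split: the new right page is flagged.
        self.buffers = {"parent": Page(), "right": Page(missing_downlink=True)}
        self.disk = copy.deepcopy(self.buffers)
        self.wal = []  # records written since the last checkpoint

    def insert_downlink(self, clear_flag_atomically):
        # Insert the downlink into the parent and write the WAL record.
        self.buffers["parent"].downlinks.add("right")
        if clear_flag_atomically:
            # Same critical section: the child is still locked.
            self.buffers["right"].missing_downlink = False
        self.wal.append("insert_downlink")

    def checkpoint(self):
        # Flush all pages; earlier WAL is no longer needed for recovery.
        self.disk = copy.deepcopy(self.buffers)
        self.wal.clear()

    def crash_and_recover(self):
        # Lose the buffers, then replay whatever WAL follows the checkpoint.
        self.buffers = copy.deepcopy(self.disk)
        for record in self.wal:
            if record == "insert_downlink":
                self.buffers["parent"].downlinks.add("right")
                self.buffers["right"].missing_downlink = False
        self.wal.clear()

    def is_consistent(self):
        # The flag must be set exactly when the downlink is absent.
        has_downlink = "right" in self.buffers["parent"].downlinks
        return self.buffers["right"].missing_downlink != has_downlink


def run(cheat):
    idx = ToyIndex()
    idx.insert_downlink(clear_flag_atomically=not cheat)
    # When cheating, this checkpoint lands after the downlink insertion
    # but before the child has been re-locked to clear the flag.
    idx.checkpoint()
    idx.crash_and_recover()
    return idx.is_consistent()


if __name__ == "__main__":
    print("flag cleared in same critical section:", run(cheat=False))
    print("locks released, flag cleared later:   ", run(cheat=True))
```

In the cheating run the checkpoint has already flushed a child page with the flag still set and a parent that contains the downlink, and there is nothing left in the WAL to replay, so after recovery the flag and the parent disagree.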

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)