Hello, 

I received an inquiry about the following log messages.

2016-07-20 10:16:58.294 JST,,,3240,,578ed102.ca8,1,,2016-07-20 10:16:50 
JST,30/75,0,LOG,00000,"no left sibling (concurrent deletion?) in 
""some_index_rel""",,,,,,,,"_bt_unlink_halfdead_page, nbtpage.c:1643",""
2016-07-20 10:16:58.294 JST,,,3240,,578ed102.ca8,2,,2016-07-20 10:16:50 
JST,30/75,0,ERROR,XX000,"lock main 13879 is not held",,,,,"automatic vacuum of 
table ""db.nsp.tbl""",,,"LWLockRelease, lwlock.c:1137","" 

These appeared after pg_upgrade from 9.1.13 to 9.4.

The first message reports a concurrent deletion of an index
page, which is impossible by design in a consistent state, so
the reported situation should be the result of index corruption
that existed before upgrading, specifically, inconsistent sibling
pointers around a deleted page.

I noticed the following part of nbtpage.c related to this. It is
still the same in master.

nbtpage.c:1635@9.4.8:

>  while (P_ISDELETED(opaque) || opaque->btpo_next != target)
>  {
>          /* step right one page */
>          leftsib = opaque->btpo_next;
>          _bt_relbuf(rel, lbuf);
>          if (leftsib == P_NONE)
>          {
>                  elog(LOG, "no left sibling (concurrent deletion?) in \"%s\"",
>                           RelationGetRelationName(rel));
>                  return false;

With this loop condition, if the immediate left sibling of the
target is (mistakenly, of course) in the deleted state (and the
target somehow points to that deleted page as its left sibling),
lbuf eventually steps past the target to its right side. This
seems to result in unintentionally releasing the lock on the
target, and hence the second log message.
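
To illustrate the overshoot, here is a toy model of the walk. It
is only a sketch: ToyPage, toy_find_leftsib and
toy_overshoot_demo are hypothetical names, pages are plain array
slots, and no buffer locking is modelled. It only reuses the
loop condition quoted above and records whether the walk lands
on the target itself.

```c
#include <assert.h>

#define P_NONE 0    /* "no sibling" sentinel, as in the quoted code */

/* Hypothetical, simplified stand-in for a page's opaque data. */
typedef struct ToyPage
{
    unsigned next;      /* right-sibling block number */
    int      deleted;   /* nonzero if the page is marked deleted */
} ToyPage;

/*
 * Walk right from 'start' with the same loop condition as the
 * quoted code.  Returns the left sibling found, or P_NONE.
 * *passed_target is set when the walk steps onto 'target'.
 */
static unsigned
toy_find_leftsib(const ToyPage *pages, unsigned start,
                 unsigned target, int *passed_target)
{
    unsigned cur = start;

    assert(start != P_NONE);
    *passed_target = 0;
    while (pages[cur].deleted || pages[cur].next != target)
    {
        cur = pages[cur].next;      /* step right one page */
        if (cur == P_NONE)
            return P_NONE;          /* "no left sibling" case */
        if (cur == target)
            *passed_target = 1;     /* overshoot: we are on the target */
    }
    return cur;
}

/*
 * Chain 1 -> 2 -> 3 -> 4, target is block 3, and block 2 (its
 * supposed left sibling) is wrongly marked deleted.
 * Returns 1 if the walk failed AND stepped onto the target.
 */
static int
toy_overshoot_demo(void)
{
    const ToyPage pages[5] = {
        {P_NONE, 0},    /* slot 0 unused, P_NONE */
        {2, 0},
        {3, 1},         /* deleted, but still linked to the target */
        {4, 0},         /* target */
        {P_NONE, 0}
    };
    int      passed = 0;
    unsigned found = toy_find_leftsib(pages, 2, 3, &passed);

    return found == P_NONE && passed;
}
```

In this model the walk starting at block 2 visits 3 (the target)
and 4 before giving up, which corresponds to lbuf going beyond
the target.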


My point here is that if concurrent deletion cannot happen in
the current implementation, this while loop could be removed and
replaced with an immediate error or log message,


> if (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>    elog(ERROR, "no left sibling of page %d (concurrent deletion?) in 
> \"%s\"",..


or, the while loop at least should stop before overshooting the
target.

> while (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>    /* step right one page */
>    leftsib = opaque->btpo_next;
>    _bt_relbuf(rel, lbuf);
>    if (leftsib == target || leftsib == P_NONE)
>    {
>      elog(ERROR, "no left sibling of page %d (concurrent deletion?) in 
> \"%s\"",..
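
As a rough check of this second variant, the same kind of toy
model can be used. Again this is only a sketch under
assumptions: ToyPage, toy_bounded_walk and toy_bounded_demo are
hypothetical names, there is no locking, and the place where the
real code would call elog(ERROR, ...) is modelled as a return
value.

```c
#include <assert.h>

#define P_NONE 0    /* "no sibling" sentinel, as in the quoted code */

/* Hypothetical, simplified stand-in for a page's opaque data. */
typedef struct ToyPage
{
    unsigned next;      /* right-sibling block number */
    int      deleted;   /* nonzero if the page is marked deleted */
} ToyPage;

typedef enum { WALK_FOUND, WALK_NO_LEFTSIB } WalkResult;

/*
 * Walk right from 'start', but stop as soon as the next step
 * would land on 'target' or run off the end of the chain.
 */
static WalkResult
toy_bounded_walk(const ToyPage *pages, unsigned start,
                 unsigned target, unsigned *leftsib_out)
{
    unsigned cur = start;

    assert(start != P_NONE);
    while (pages[cur].deleted || pages[cur].next != target)
    {
        unsigned next = pages[cur].next;

        /* the real code would raise the error here */
        if (next == target || next == P_NONE)
            return WALK_NO_LEFTSIB;
        cur = next;                 /* step right one page */
    }
    *leftsib_out = cur;
    return WALK_FOUND;
}

/*
 * Chain 1 -> 2 -> 3 -> 4 with target 3 and block 2 wrongly
 * marked deleted.  Returns 1 if the walk stops before the target.
 */
static int
toy_bounded_demo(void)
{
    const ToyPage pages[5] = {
        {P_NONE, 0},    /* slot 0 unused, P_NONE */
        {2, 0},
        {3, 1},         /* deleted, but still linked to the target */
        {4, 0},         /* target */
        {P_NONE, 0}
    };
    unsigned leftsib = P_NONE;

    return toy_bounded_walk(pages, 2, 3, &leftsib) == WALK_NO_LEFTSIB;
}
```

Here the walk gives up at block 2 without ever stepping onto the
target. It catches only this one shape of inconsistency, though.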


I'd like to propose the former, since the latter is still not
perfect for such situations anyway.


Any thoughts or opinions?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



