On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> I spent some time chasing down the exact circumstances.  I suspect
> that there may be an interlocking problem in heap_update.  Using the
> line numbers from cae1c788 [1], I see the following interaction
> between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all
> in reference to the same block number:
>
>   [VACUUM] sets all visible bit
>
>   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 
> xmax_old_tuple);
>   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>
>   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
>   [SELECT] observes VM_ALL_VISIBLE as true
>   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
>   [SELECT] barfs
>
>   [UPDATE] heapam.c:4116 visibilitymap_clear(...)

Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
and CTID without logging anything or clearing the all-visible flag and
then releases the lock on the heap page to go do some more work that
might even ERROR out.  Only if that other work all goes OK do we
relock the page and perform the WAL-logged actions.

That doesn't seem like a good idea even in existing releases, because
you've taken a tuple on an all-visible page and made it not
all-visible, and you've made a page modification that is not
necessarily atomic without logging it.  This is is particularly bad in
9.6, because if that page is also all-frozen then XMAX will eventually
be pointing into space and VACUUM will never visit the page to
re-freeze it the way it would have done in earlier releases.  However,
even in older releases, I think there's a remote possibility of data
corruption.  Backend #1 makes these changes to the page, releases the
lock, and errors out.  Backend #2 writes the page to the OS.  DBA
takes a hot backup, tearing the page in the middle of XMAX.  Oops.

I'm not sure what to do about this: this part of the heap_update()
logic has been like this forever, and I assume that if it were easy to
refactor this away, somebody would have done it by now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to