On Wed, Jun 15, 2016 at 08:56:52AM -0400, Robert Haas wrote:
> On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
> > I spent some time chasing down the exact circumstances.  I suspect
> > that there may be an interlocking problem in heap_update.  Using the
> > line numbers from cae1c788 [1], I see the following interaction
> > between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all
> > in reference to the same block number:
> >
> >   [VACUUM] sets all-visible bit
> >
> >   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple);
> >   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> >
> >   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
> >   [SELECT] observes VM_ALL_VISIBLE as true
> >   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
> >   [SELECT] barfs
> >
> >   [UPDATE] heapam.c:4116 visibilitymap_clear(...)
> Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> and CTID without logging anything or clearing the all-visible flag and
> then releases the lock on the heap page to go do some more work that
> might even ERROR out.  Only if that other work all goes OK do we
> relock the page and perform the WAL-logged actions.
> That doesn't seem like a good idea even in existing releases, because
> you've taken a tuple on an all-visible page and made it not
> all-visible, and you've made a page modification that is not
> necessarily atomic without logging it.  This is particularly bad in
> 9.6, because if that page is also all-frozen then XMAX will eventually
> be pointing into space and VACUUM will never visit the page to
> re-freeze it the way it would have done in earlier releases.  However,
> even in older releases, I think there's a remote possibility of data
> corruption.  Backend #1 makes these changes to the page, releases the
> lock, and errors out.  Backend #2 writes the page to the OS.  DBA
> takes a hot backup, tearing the page in the middle of XMAX.  Oops.

I agree the non-atomic, unlogged change is a problem.  A related threat
doesn't require a torn page:

  AssignTransactionId() xid=123
  heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 123);
  some ERROR before heap_update() finishes
  rollback;  -- xid=123
  some backend flushes the modified page
  immediate shutdown
  AssignTransactionId() xid=123
  commit;  -- xid=123

If nothing wrote an xlog record that witnesses xid 123, the cluster can
reassign it after recovery.  The failed update is now considered a successful
update, and the row improperly becomes dead.  That's a serious data-corruption
hazard.

I don't know whether the 9.6 all-frozen mechanism materially amplifies the
consequences of this bug.  The interaction with visibility map and freeze map
is not all bad; indeed, it can reduce the risk of experiencing consequences
from the non-atomic, unlogged change bug.  If the row is all-visible when
heap_update() starts, every transaction should continue to consider the row
visible until heap_update() finishes successfully.  If an ERROR interrupts
heap_update(), visibility verdicts should be as though the heap_update() never
happened.  If one of the previously-described mechanisms would make an xmax
visibility test give the wrong answer, an all-visible bit could mask the
problem for a while.  Having said that, the freeze map hurts in scenarios involving
toast_insert_or_update() failures and no crash recovery.  Instead of VACUUM
cleaning up the aborted xmax, that xmax could persist long enough for its xid
to be reused in a successful transaction.  When some other modification
finally clears all-frozen and all-visible, the row improperly becomes dead.
Both scenarios are fairly rare; I don't know which is rarer.  [Disclaimer:
I have not built test cases to verify those alleged failure mechanisms.]

If we made this pre-9.6 bug a 9.6 open item, would anyone volunteer to own it?
Then we wouldn't need to guess whether 9.6 will be safer with the freeze map
or safer without the freeze map.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)