On Wed, Jun 15, 2016 at 08:56:52AM -0400, Robert Haas wrote: > On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro > <thomas.mu...@enterprisedb.com> wrote: > > I spent some time chasing down the exact circumstances. I suspect > > that there may be an interlocking problem in heap_update. Using the > > line numbers from cae1c788 , I see the following interaction > > between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all > > in reference to the same block number: > > > > [VACUUM] sets all visible bit > > > > [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, > > xmax_old_tuple); > > [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK); > > > > [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE); > > [SELECT] observes VM_ALL_VISIBLE as true > > [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state > > [SELECT] barfs > > > > [UPDATE] heapam.c:4116 visibilitymap_clear(...) > > Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2, > and CTID without logging anything or clearing the all-visible flag and > then releases the lock on the heap page to go do some more work that > might even ERROR out. Only if that other work all goes OK do we > relock the page and perform the WAL-logged actions. > > That doesn't seem like a good idea even in existing releases, because > you've taken a tuple on an all-visible page and made it not > all-visible, and you've made a page modification that is not > necessarily atomic without logging it. This is is particularly bad in > 9.6, because if that page is also all-frozen then XMAX will eventually > be pointing into space and VACUUM will never visit the page to > re-freeze it the way it would have done in earlier releases. However, > even in older releases, I think there's a remote possibility of data > corruption. Backend #1 makes these changes to the page, releases the > lock, and errors out. Backend #2 writes the page to the OS. DBA > takes a hot backup, tearing the page in the middle of XMAX. Oops.
I agree the non-atomic, unlogged change is a problem. A related threat doesn't require a torn page: AssignTransactionId() xid=123 heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 123); some ERROR before heap_update() finishes rollback; -- xid=123 some backend flushes the modified page immediate shutdown AssignTransactionId() xid=123 commit; -- xid=123 If nothing wrote an xlog record that witnesses xid 123, the cluster can reassign it after recovery. The failed update is now considered a successful update, and the row improperly becomes dead. That's important. I don't know whether the 9.6 all-frozen mechanism materially amplifies the consequences of this bug. The interaction with visibility map and freeze map is not all bad; indeed, it can reduce the risk of experiencing consequences from the non-atomic, unlogged change bug. If the row is all-visible when heap_update() starts, every transaction should continue to consider the row visible until heap_update() finishes successfully. If an ERROR interrupts heap_update(), visibility verdicts should be as though the heap_update() never happened. If one of the previously-described mechanisms would make an xmax visibility test give the wrong answer, an all-visible bit could mask the problem for awhile. Having said that, freeze map hurts in scenarios involving toast_insert_or_update() failures and no crash recovery. Instead of VACUUM cleaning up the aborted xmax, that xmax could persist long enough for its xid to be reused in a successful transaction. When some other modification finally clears all-frozen and all-visible, the row improperly becomes dead. Both scenarios are fairly rare; I don't know which is more rare. [Disclaimer: I have not built tests cases to verify those alleged failure mechanisms.] If we made this pre-9.6 bug a 9.6 open item, would anyone volunteer to own it? Then we wouldn't need to guess whether 9.6 will be safer with the freeze map or safer without the freeze map. Thanks, nm -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers