On Thu, Jul 7, 2016 at 10:53 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: > Robert Haas wrote: >> On Wed, Jul 6, 2016 at 6:06 PM, Andres Freund <and...@anarazel.de> wrote: >> > Hm. We can't easily do that in the back-patched version; because a >> > standby won't know to check for the flag . That's kinda ok, since we >> > don't yet need to clear all-visible yet at that point of >> > heap_update. But that better means we don't do so on the master either. >> >> Is there any reason to back-patch this in the first place? > > Wasn't this determined to be a pre-existing bug? I think the > probability of occurrence has increased, but it's still possible in > earlier releases. I wonder if there are unexplained bugs that can be > traced down to this. > > I'm not really following this (sorry about that) but I wonder if (in > back branches) the failure to propagate in case the standby wasn't > updated can cause actual problems. If it does, maybe it'd be a better > idea to have a new WAL record type instead of piggybacking on lock > tuple. Then again, apparently the probability of this bug is low enough > that we shouldn't sweat over it ... Moreso considering Robert's apparent > opinion that perhaps we shouldn't backpatch at all in the first place. > > In any case I would like to see much more commentary in the patch next > to the new XLHL flag. It's sufficiently different than the rest than it > deserves so, IMO.
There are two issues being discussed on this thread. One of them is a new issue in 9.6: heap_lock_tuple needs to clear the all-frozen bit in the freeze map even though it does not clear all-visible. The one that's actually a preexisting bug is that we can start to update a tuple without WAL-logging anything and then release the page lock in order to go perform TOAST insertions. At this point, other backends (on the master) will see this tuple as in the process of being updated because xmax has been set and ctid has been made to point back to the same tuple. I'm guessing that if the UPDATE goes on to complete, any discrepancy between the master and the standby is erased by the replay of the WAL record covering the update itself. I haven't checked that, but it seems like that WAL record must set both xmax and ctid appropriately or we'd be in big trouble. The infomask bits are in play too, but presumably the update's WAL is going to set those correctly also. So in this case I don't think there's really any issue for the standby. Or for the master, either: it may technically be true the tuple is not all-visible any more, but the only backend that could potentially fail to see it is the one performing the update, and no user code can run in the middle of toast_insert_or_update, so I think we're OK. On the other hand, if the UPDATE aborts, there's now a persistent difference between the master and standby: the infomask, xmax, and ctid of the tuple may differ. I don't know whether that could cause any problem. It's probably a very rare case, because there aren't all that many things that will cause us to abort in the middle of inserting TOAST tuples. Out of disk space comes to mind, or maybe some kind of corruption that throws an elog(). As far as back-patching goes, the question is whether it's worth the risk. Introducing new WAL logging at this point could certainly cause performance problems if nothing else, never mind the risk of garden-variety bugs. I'm not sure it's worth taking that risk in released branches for the sake of a bug which has existed for a decade without anybody finding it until now. I'm not going to argue strongly for that position, but I think it's worth thinking about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers