On Fri, Jun 08, 2018 at 11:03:38AM -0700, Andres Freund wrote: > On 2018-06-08 09:23:02 +0100, Simon Riggs wrote: > > For context, AEL locks are normally removed by COMMIT or ABORT. > > StandbyReleaseOldLocks() is just a sweeper to catch anything that > > didn't send an abort before it died, so it hardly ever activates. The > > coding of StandbyReleaseOldLocks() is backwards... if it ain't in the > > running list, then we remove it. > > > > But that wasn't working correctly either, since as of 49bff5300d527 we > > assigned AELs to subxids. Subxids weren't in the running list and so > > AELs held by them would have been removed at the wrong time, an extant > > bug in PG10. It looks to me like they would have been removed either > > early or late, up to the next runningxact info record. They would be > > removed, so no leakage, but the late timing wouldn't be noticeable for > > tests or most usage, since it would look just like lock contention. > > Early release might give same issue of block access to non-existent > > block/file. > > Yikes, that's kinda bad. It can also just cause plain crashes, no? The > on-disk / catalog state isn't necessarily consistent during DDL, which > is why we hold AE locks. At the very least it can cause corruption of > in-use relcache entries and such.
Yes. If I'm reading the commit message right, the committed code also releases locks too early: > commit 15378c1 > Author: Simon Riggs <si...@2ndquadrant.com> > AuthorDate: Sat Jun 16 14:03:29 2018 +0100 > Commit: Simon Riggs <si...@2ndquadrant.com> > CommitDate: Sat Jun 16 14:03:29 2018 +0100 > > Remove AELs from subxids correctly on standby > > Issues relate only to subtransactions that hold AccessExclusiveLocks > when replayed on standby. > > Prior to PG10, aborting subtransactions that held an > AccessExclusiveLock failed to release the lock until top level commit or > abort. 49bff5300d527 fixed that. > > However, 49bff5300d527 also introduced a similar bug where subtransaction > commit would fail to release an AccessExclusiveLock, leaving the lock to > be removed sometimes early and sometimes late. This commit fixes > that bug also. Backpatch to PG10 needed. Subtransaction commit is too early to release an arbitrary AccessExclusiveLock. The primary releases every AccessExclusiveLock at top-level transaction commit, top-level transaction abort, or subtransaction abort. CommitSubTransaction() doesn't do that; it transfers locks to the parent sub(xact). Standby nodes can't safely remove an arbitrary lock earlier than the primary would.