On 8 June 2018 at 19:03, Andres Freund <and...@anarazel.de> wrote: > Hi, > > On 2018-06-08 09:23:02 +0100, Simon Riggs wrote: >> I have also found another bug which affects what we do next. >> >> For context, AEL locks are normally removed by COMMIT or ABORT. >> StandbyReleaseOldLocks() is just a sweeper to catch anything that >> didn't send an abort before it died, so it hardly ever activates. The >> coding of StandbyReleaseOldLocks() is backwards... if it ain't in the >> running list, then we remove it. >> >> But that wasn't working correctly either, since as of 49bff5300d527 we >> assigned AELs to subxids. Subxids weren't in the running list and so >> AELs held by them would have been removed at the wrong time, an extant >> bug in PG10. It looks to me like they would have been removed either >> early or late, up to the next runningxact info record. They would be >> removed, so no leakage, but the late timing wouldn't be noticeable for >> tests or most usage, since it would look just like lock contention. >> Early release might give same issue of block access to non-existent >> block/file. > > Yikes, that's kinda bad. It can also just cause plain crashes, no? The > on-disk / catalog state isn't necessarily consistent during DDL, which > is why we hold AE locks. At the very least it can cause corruption of > in-use relcache entries and such. In practice the fact this probably > hits only a limited number of people because it requires DDL to happen > in subtransactions, which probably isn't *that* common.
Yep, kinda bad, hence fix. >> So the attached patch fixes both the bug in the recent commit and the >> one I just found by observation of 49bff5300d527, since they are >> related. > > Can we please keep them separate? The attached patch is separate. It fixes 49bff5300d527 and also the committed code, but should work fine even if we revert. Will doublecheck. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services