On Wed, May 8, 2019 at 3:53 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
> > Reproduced here. Once the system reaches a state where it's leaking
> > (which happens only occasionally for me during installcheck-parallel),
> > it keeps leaking for future SSI transactions. The cause is
> > SxactGlobalXmin getting stuck. The attached fixes it for me. I can't
> > remember why on earth I made that change, but it is quite clearly
> > wrong: you have to check every transaction, or you might never advance
> > SxactGlobalXmin.
>
> Hm. So I don't have any opinion about whether this is a correct fix for
> the leak, but I am quite distressed that the system failed to notice that
> it was leaking predicate locks. Shouldn't there be the same sort of
> leak-detection infrastructure that we have for most types of resources?
Well, it is hooked up to the usual release machinery, because it's in ReleasePredicateLocks(), which is wired into the RESOURCE_RELEASE_LOCKS phase of resowner.c. The catch is that a predicate lock's lifetime is tied to the last transaction with the oldest known xmin, not to the transaction that created it.

More analysis: lock clean-up is deferred until "... the last serializable transaction with the oldest xmin among serializable transactions completes", but I broke that by excluding read-only transactions from the check, which lets SxactGlobalXminCount get out of sync.

There's a read-only SSI transaction in src/test/regress/sql/transactions.sql, but I think the problem manifests only intermittently with installcheck-parallel for two reasons. Sometimes the read-only optimisation kicks in (effectively dropping us to plain old SI because there's no concurrent serializable activity), so it doesn't take any locks at all. Other times the read-only transaction doesn't have the oldest known xmin among serializable transactions. However, if a read-write SSI transaction has already taken a snapshot and holds the oldest xmin, and the read-only one then starts with the same xmin, we get into trouble. When the read-only one releases, we fail to decrement SxactGlobalXminCount, and then we never call ClearOldPredicateLocks().

--
Thomas Munro
https://enterprisedb.com
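To make the counting concrete, here is a toy model in C of the bookkeeping described above. Everything in it is hypothetical: the ToyState/ToySxact types and toy_* functions are illustrative stand-ins for SxactGlobalXmin, SxactGlobalXminCount and ClearOldPredicateLocks(), not the real predicate.c code. It ignores xid wraparound, and its clean-up pass simply discards every pending finished transaction, whereas the real clean-up is more selective. The exclude_read_only flag reproduces the broken check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy model of SSI xmin bookkeeping; NOT PostgreSQL's predicate.c.
 * Xid wraparound is ignored: plain integer comparison is used.
 */
typedef uint32_t Xid;
#define INVALID_XID 0u
#define MAX_SXACTS 8

typedef struct {
    Xid  xmin;
    bool read_only;
    bool active;
} ToySxact;

typedef struct {
    ToySxact sxacts[MAX_SXACTS];
    Xid  global_xmin;       /* stand-in for SxactGlobalXmin */
    int  global_xmin_count; /* stand-in for SxactGlobalXminCount */
    int  pending_cleanup;   /* finished transactions whose locks are retained */
    int  clear_calls;       /* runs of the ClearOldPredicateLocks() stand-in */
} ToyState;

/* Recompute the oldest xmin among active transactions and how many share it. */
static void toy_set_new_global_xmin(ToyState *s)
{
    s->global_xmin = INVALID_XID;
    s->global_xmin_count = 0;
    for (int i = 0; i < MAX_SXACTS; i++) {
        const ToySxact *x = &s->sxacts[i];
        if (!x->active)
            continue;   /* every active transaction counts, read-only or not */
        if (s->global_xmin == INVALID_XID || x->xmin < s->global_xmin) {
            s->global_xmin = x->xmin;
            s->global_xmin_count = 1;
        } else if (x->xmin == s->global_xmin) {
            s->global_xmin_count++;
        }
    }
}

/* Simplified clean-up: discards all retained finished transactions. */
static void toy_clear_old_locks(ToyState *s)
{
    s->pending_cleanup = 0;
    s->clear_calls++;
}

/* Assumes a new snapshot's xmin is never older than the current global xmin. */
void toy_register(ToyState *s, int slot, Xid xmin, bool read_only)
{
    s->sxacts[slot] = (ToySxact){ .xmin = xmin, .read_only = read_only, .active = true };
    if (s->global_xmin == INVALID_XID) {
        s->global_xmin = xmin;
        s->global_xmin_count = 1;
    } else if (xmin == s->global_xmin) {
        s->global_xmin_count++;
    }
    /* A newer xmin leaves the global value alone. */
}

/* exclude_read_only = true models the bug; false is the corrected check. */
void toy_release(ToyState *s, int slot, bool exclude_read_only)
{
    ToySxact *x = &s->sxacts[slot];
    x->active = false;
    s->pending_cleanup++;   /* its locks outlive it until a clean-up pass */

    bool counts = !(exclude_read_only && x->read_only);
    if (counts && x->xmin == s->global_xmin) {
        assert(s->global_xmin_count > 0);
        if (--s->global_xmin_count == 0) {
            toy_set_new_global_xmin(s);
            toy_clear_old_locks(s);
        }
    }
}

/* Scenario from the analysis: an RW and an RO transaction share the oldest xmin. */
ToyState toy_run_scenario(bool exclude_read_only)
{
    ToyState s = {0};
    toy_register(&s, 0, 100, false);  /* read-write, oldest xmin */
    toy_register(&s, 1, 100, true);   /* read-only, same xmin */
    toy_release(&s, 1, exclude_read_only);
    toy_release(&s, 0, exclude_read_only);
    toy_register(&s, 2, 105, false);  /* a later transaction */
    toy_release(&s, 2, exclude_read_only);
    return s;
}
```

In this sketch, toy_run_scenario(false) runs the clean-up twice and ends with nothing pending. toy_run_scenario(true) never runs it: the count is left stuck at 1, the global xmin stays at 100 after every transaction has finished, and three finished transactions remain pending.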