Re: Help understanding SIReadLock growing without bound on completed transaction
On second look, it does seems the xid crossed the 2^32 mark recently, since most tables have a frozenxid close to 4b and the current xid is ~50m: SELECT relname, age(relfrozenxid), relfrozenxid FROM pg_class WHERE relkind = 'r' and relname not like 'pg%' order by relname; relname | age | relfrozenxid ---+---+-- | 107232506 | 4237961815 | 93692362 | 4251501959 | 183484103 | 4161710218 | 50760536 | 4294433785 | 58821410 | 4286372911 | 117427283 | 4227767038 | 9454 | 4250653210 … select max(backend_xid::text), min(backend_xmin::text) from pg_stat_activity where state='active'; max | min --+-- 50350294 | 50350065 -Mike On Tue, May 26, 2020 at 8:42 AM, Mike Klaas < m...@superhuman.com > wrote: > > On Fri, May 22, 2020 at 3:15 PM, Thomas Munro < thomas. munro@ gmail. com ( > thomas.mu...@gmail.com ) > wrote: > >> >> >> Predicate locks are released by ClearOldPredicateLocks(), which releases >> SERIALIZABLEXACTs once they are no longer interesting. It has a >> conservative idea of what is no longer interesting: it waits until the >> lowest xmin across active serializable snapshots is >= the transaction's >> finishedBefore xid, which was the system's next xid (an xid that hasn't >> been used yet*) at the time the SERIALIZABLEXACT committed. One >> implication of this scheme is that SERIALIZABLEXACTs are cleaned up in >> commit order. If you somehow got into a state where a few of them were >> being kept around for a long time, but others committed later were being >> cleaned up (which I suppose must be the case or your system would be >> complaining about running out of SERIALIZABLEXACTs), that might imply that >> there is a rare leak somewhere in this scheme. In the past I have wondered >> if there might be a problem with wraparound in the xid tracking for >> finished transactions, but I haven't worked out the details (transaction >> ID wraparound is both figuratively and literally the Ground Hog Day of >> PostgreSQL bug surfaces). >> >> >> >> > > > > Thanks for the detailed reply, Thomas. Is SERIALIZABLEXACT transaction ID > wraparound the same as global xid wraparound? The max transaction age in > the db is ~197M [1] so I don't think we've gotten close to global > wraparound lately. > > > > Would it be helpful to cross-post this thread to pgsql-bugs or further > investigate on my end > > > > -Mike > > > > [1] superhuman@ production => select datname, datfrozenxid, > age(datfrozenxid) from pg_catalog.pg_database; > > > datname | datfrozenxid | age > > > > > ---+--+--- > > > > > cloudsqladmin | 4173950091 | 169089900 > > > > > template0 | 4266855294 | 76184697 > > > > > postgres | 4173951306 | 169088685 > > > > > template1 | 4266855860 | 76184131 > > > > > superhuman | 4145766807 | 197273184 > > >
Re: Help understanding SIReadLock growing without bound on completed transaction
On Fri, May 22, 2020 at 3:15 PM, Thomas Munro < thomas.mu...@gmail.com > wrote: > > > > Predicate locks are released by ClearOldPredicateLocks(), which releases > SERIALIZABLEXACTs once they are no longer interesting. It has a > conservative idea of what is no longer interesting: it waits until the > lowest xmin across active serializable snapshots is >= the transaction's > finishedBefore xid, which was the system's next xid (an xid that hasn't > been used yet*) at the time the SERIALIZABLEXACT committed. One > implication of this scheme is that SERIALIZABLEXACTs are cleaned up in > commit order. If you somehow got into a state where a few of them were > being kept around for a long time, but others committed later were being > cleaned up (which I suppose must be the case or your system would be > complaining about running out of SERIALIZABLEXACTs), that might imply that > there is a rare leak somewhere in this scheme. In the past I have wondered > if there might be a problem with wraparound in the xid tracking for > finished transactions, but I haven't worked out the details (transaction > ID wraparound is both figuratively and literally the Ground Hog Day of > PostgreSQL bug surfaces). > > > > Thanks for the detailed reply, Thomas. Is SERIALIZABLEXACT transaction ID wraparound the same as global xid wraparound? The max transaction age in the db is ~197M [1] so I don't think we've gotten close to global wraparound lately. Would it be helpful to cross-post this thread to pgsql-bugs or further investigate on my end -Mike [1] superhuman@ production => select datname, datfrozenxid, age(datfrozenxid) from pg_catalog.pg_database; datname | datfrozenxid | age ---+--+--- cloudsqladmin | 4173950091 | 169089900 template0 | 4266855294 | 76184697 postgres | 4173951306 | 169088685 template1 | 4266855860 | 76184131 superhuman | 4145766807 | 197273184
Re: Help understanding SIReadLock growing without bound on completed transaction
On Fri, May 22, 2020 at 7:48 AM Mike Klaas wrote: > It's my understanding that these locks should be cleared when there are no > conflicting transactions. These locks had existed for > 1 week and we have > no transactions that last more than a few seconds (the oldest transaction in > pg_stat_activity is always < 1minute old). > Why would a transaction that is finished continue accumulating locks over > time? Predicate locks are released by ClearOldPredicateLocks(), which releases SERIALIZABLEXACTs once they are no longer interesting. It has a conservative idea of what is no longer interesting: it waits until the lowest xmin across active serializable snapshots is >= the transaction's finishedBefore xid, which was the system's next xid (an xid that hasn't been used yet*) at the time the SERIALIZABLEXACT committed. One implication of this scheme is that SERIALIZABLEXACTs are cleaned up in commit order. If you somehow got into a state where a few of them were being kept around for a long time, but others committed later were being cleaned up (which I suppose must be the case or your system would be complaining about running out of SERIALIZABLEXACTs), that might imply that there is a rare leak somewhere in this scheme. In the past I have wondered if there might be a problem with wraparound in the xid tracking for finished transactions, but I haven't worked out the details (transaction ID wraparound is both figuratively and literally the Ground Hog Day of PostgreSQL bug surfaces). *Interestingly, it takes an unlocked view of that value, but that doesn't seem relevant here; it could see a value that's too low, not too high.
Re: Help understanding SIReadLock growing without bound on completed transaction
On Thu, May 21, 2020 at 5:19 PM, Thomas Munro < thomas.mu...@gmail.com > wrote: > > > > On Fri, May 22, 2020 at 7:48 AM Mike Klaas < mike@ superhuman. com ( > m...@superhuman.com ) > wrote: > > > >> >> >> pid:2263461 >> >> >> > > > > That's an unusually high looking pid. Is that expected, for example did > you crank Linux's pid_max right up, or is this AIX, or something? > > > > Unfortunately I'm not sure exactly what it's running on as it's a cloud-provided database instance running on google cloud: => select version(); PostgreSQL 9.6.16 on x86_64-pc-linux-gnu, compiled by clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final), 64-bit -Mike
Re: Help understanding SIReadLock growing without bound on completed transaction
On Fri, May 22, 2020 at 7:48 AM Mike Klaas wrote: > locktype: page > relation::regclass::text: _pkey > virtualtransaction: 36/296299968 > granted:t > pid:2263461 That's an unusually high looking pid. Is that expected, for example did you crank Linux's pid_max right up, or is this AIX, or something?
Help understanding SIReadLock growing without bound on completed transaction
We recently experienced an issue where a transaction that was finished and no longer existed kept slowly accumulating SIReadLocks over the period of a week. The only way we could remove the locks was by restarting postgresql. The entries in pg_locks resembled: mode: SIReadLock locktype: page relation::regclass::text: _pkey virtualtransaction: 36/296299968 granted:t pid: 2263461 count(1): 5559 (when grouped) Note that this pid did not exist in pg_stat_activity. I understand that it is normal for SSI locks to persist after a transaction is finished. There are however two aspects to this that I don't understand: * It's my understanding that these locks should be cleared when there are no conflicting transactions. These locks had existed for > 1 week and we have no transactions that last more than a few seconds (the oldest transaction in pg_stat_activity is always < 1minute old). * Why would a transaction that is finished continue accumulating locks over time? If it helps, here is some more information about the state of the system: * There were a total of six pids in pg_locks that didn't exist in pg_stat_activity. They held a variety of SIReadLocks, but they weren't increasing in number over time. I'm not sure how long they were present; I only know that the problematic pid existed for a week due to its continual growth reflecting in our internal lock monitoring system. * I tried finding overlapping SIReadLocks (see query below), but none were returned (I realize that the SSI conflict resolution algo is much more involved than this simple query) * PG version: 9.6.17 I would appreciate any hints of what I could've done to investigate this further or how I could've resolved the issue without restarting the db (and thus experiencing downtime). thank you in advance, -Mike SELECT waiting.locktype AS w_locktype, LEFT(waiting.relation::regclass::text,25) AS waiting_table, COALESCE(waiting_stm.query,'?') AS w_query, waiting.page ( http://waiting.page/ ) AS w_page, waiting.tuple AS w_tuple, waiting.pid ( http://waiting.pid/ ) AS w_pid, other.locktype AS o_locktype, LEFT(other.relation::regclass::text,15) AS other_table, LEFT(COALESCE(other_stm.query, '?'), 50) AS other_query, other.page ( http://other.page/ ) AS o_page, other.tuple AS o_tuple, other.pid ( http://other.pid/ ) AS other_pid, other.GRANTED AS o_granted FROM pg_catalog.pg_locks AS waiting LEFT JOIN pg_catalog.pg_stat_activity AS waiting_stm ON waiting_stm.pid ( http://waiting_stm.pid/ ) = waiting.pid ( http://waiting.pid/ ) JOIN pg_catalog.pg_locks AS other ON ( ( waiting."database" = other."database" AND waiting.relation = other.relation and waiting.locktype = other.locktype AND ( CASE WHEN other.locktype = 'page' THEN waiting.page ( http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ ) WHEN other.locktype = 'tuple' THEN waiting.page ( http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ ) and waiting.tuple IS NOT DISTINCT FROM other.tuple ELSE true END ) ) OR waiting.transactionid = other.transactionid ) AND waiting.pid ( http://waiting.pid/ ) <> other.pid ( http://other.pid/ ) LEFT JOIN pg_catalog.pg_stat_activity AS other_stm ON other_stm.pid ( http://other_stm.pid/ ) = other.pid ( http://other.pid/ ) WHERE waiting.pid ( http://waiting.pid/ ) IN (2263461, 2263276, 2263283, 2263284, 2263459, 2263527 )