Re: Help understanding SIReadLock growing without bound on completed transaction

2020-05-26 Thread Mike Klaas
On second look, it does seems the xid crossed the 2^32 mark recently, since 
most tables have a frozenxid close to 4b and the current xid is ~50m:

SELECT relname, age(relfrozenxid), relfrozenxid FROM pg_class WHERE relkind = 
'r' and relname not like 'pg%' order by relname;

relname  |    age    | relfrozenxid

---+---+--

  | 107232506 |   4237961815

  |  93692362 |   4251501959

  | 183484103 |   4161710218

  |  50760536 |   4294433785

  |  58821410 |   4286372911

  | 117427283 |   4227767038

  |  9454 |   4250653210

…

select max(backend_xid::text), min(backend_xmin::text) from pg_stat_activity 
where state='active';

max | min

--+--

50350294 | 50350065

-Mike

On Tue, May 26, 2020 at 8:42 AM, Mike Klaas < m...@superhuman.com > wrote:

> 
> On Fri, May 22, 2020 at 3:15 PM, Thomas Munro < thomas. munro@ gmail. com (
> thomas.mu...@gmail.com ) > wrote:
> 
>> 
>> 
>> Predicate locks are released by ClearOldPredicateLocks(), which releases
>> SERIALIZABLEXACTs once they are no longer interesting. It has a
>> conservative idea of what is no longer interesting: it waits until the
>> lowest xmin across active serializable snapshots is >= the transaction's
>> finishedBefore xid, which was the system's next xid (an xid that hasn't
>> been used yet*) at the time the SERIALIZABLEXACT committed. One
>> implication of this scheme is that SERIALIZABLEXACTs are cleaned up in
>> commit order. If you somehow got into a state where a few of them were
>> being kept around for a long time, but others committed later were being
>> cleaned up (which I suppose must be the case or your system would be
>> complaining about running out of SERIALIZABLEXACTs), that might imply that
>> there is a rare leak somewhere in this scheme. In the past I have wondered
>> if there might be a problem with wraparound in the xid tracking for
>> finished transactions, but I haven't worked out the details (transaction
>> ID wraparound is both figuratively and literally the Ground Hog Day of
>> PostgreSQL bug surfaces).
>> 
>> 
>> 
>> 
> 
> 
> 
> Thanks for the detailed reply, Thomas.  Is SERIALIZABLEXACT transaction ID
> wraparound the same as global xid wraparound?  The max transaction age in
> the db is ~197M [1] so I don't think we've gotten close to global
> wraparound lately.
> 
> 
> 
> Would it be helpful to cross-post this thread to pgsql-bugs or further
> investigate on my end
> 
> 
> 
> -Mike
> 
> 
> 
> [1] superhuman@ production => select datname, datfrozenxid,
> age(datfrozenxid) from pg_catalog.pg_database;
> 
> 
> datname | datfrozenxid | age
> 
> 
> 
> 
> ---+--+---
> 
> 
> 
> 
> cloudsqladmin | 4173950091 | 169089900
> 
> 
> 
> 
> template0 | 4266855294 | 76184697
> 
> 
> 
> 
> postgres | 4173951306 | 169088685
> 
> 
> 
> 
> template1 | 4266855860 | 76184131
> 
> 
> 
> 
> superhuman | 4145766807 | 197273184
> 
> 
>

Re: Help understanding SIReadLock growing without bound on completed transaction

2020-05-26 Thread Mike Klaas
On Fri, May 22, 2020 at 3:15 PM, Thomas Munro < thomas.mu...@gmail.com > wrote:

> 
> 
> 
> Predicate locks are released by ClearOldPredicateLocks(), which releases
> SERIALIZABLEXACTs once they are no longer interesting. It has a
> conservative idea of what is no longer interesting: it waits until the
> lowest xmin across active serializable snapshots is >= the transaction's
> finishedBefore xid, which was the system's next xid (an xid that hasn't
> been used yet*) at the time the SERIALIZABLEXACT committed. One
> implication of this scheme is that SERIALIZABLEXACTs are cleaned up in
> commit order. If you somehow got into a state where a few of them were
> being kept around for a long time, but others committed later were being
> cleaned up (which I suppose must be the case or your system would be
> complaining about running out of SERIALIZABLEXACTs), that might imply that
> there is a rare leak somewhere in this scheme. In the past I have wondered
> if there might be a problem with wraparound in the xid tracking for
> finished transactions, but I haven't worked out the details (transaction
> ID wraparound is both figuratively and literally the Ground Hog Day of
> PostgreSQL bug surfaces).
> 
> 
> 
> 

Thanks for the detailed reply, Thomas.  Is SERIALIZABLEXACT transaction ID 
wraparound the same as global xid wraparound?  The max transaction age in the 
db is ~197M [1] so I don't think we've gotten close to global wraparound lately.

Would it be helpful to cross-post this thread to pgsql-bugs or further 
investigate on my end

-Mike

[1] superhuman@ production => select datname, datfrozenxid, age(datfrozenxid) 
from pg_catalog.pg_database;

datname | datfrozenxid | age

---+--+---

cloudsqladmin | 4173950091 | 169089900

template0 | 4266855294 | 76184697

postgres | 4173951306 | 169088685

template1 | 4266855860 | 76184131

superhuman | 4145766807 | 197273184

Re: Help understanding SIReadLock growing without bound on completed transaction

2020-05-22 Thread Thomas Munro
On Fri, May 22, 2020 at 7:48 AM Mike Klaas  wrote:
> It's my understanding that these locks should be cleared when there are no 
> conflicting transactions.  These locks had existed for > 1 week and we have 
> no transactions that last more than a few seconds (the oldest transaction in 
> pg_stat_activity is always < 1minute old).
> Why would a transaction that is finished continue accumulating locks over 
> time?

Predicate locks are released by ClearOldPredicateLocks(), which
releases SERIALIZABLEXACTs once they are no longer interesting.  It
has a  conservative idea of what is no longer interesting: it waits
until the lowest xmin across active serializable snapshots is >= the
transaction's finishedBefore xid, which was the system's next xid (an
xid that hasn't been used yet*) at the time the SERIALIZABLEXACT
committed.  One implication of this scheme is that SERIALIZABLEXACTs
are cleaned up in commit order.  If you somehow got into a state where
a few of them were being kept around for a long time, but others
committed later were being cleaned up (which I suppose must be the
case or your system would be complaining about running out of
SERIALIZABLEXACTs), that might imply that there is a rare leak
somewhere in this scheme.  In the past I have wondered if there might
be a problem with wraparound in the xid tracking for finished
transactions, but I haven't worked out the details (transaction ID
wraparound is both figuratively and literally the Ground Hog Day of
PostgreSQL bug surfaces).

*Interestingly, it takes an unlocked view of that value, but that
doesn't seem relevant here; it could see a value that's too low, not
too high.




Re: Help understanding SIReadLock growing without bound on completed transaction

2020-05-21 Thread Mike Klaas
On Thu, May 21, 2020 at 5:19 PM, Thomas Munro < thomas.mu...@gmail.com > wrote:

> 
> 
> 
> On Fri, May 22, 2020 at 7:48 AM Mike Klaas < mike@ superhuman. com (
> m...@superhuman.com ) > wrote:
> 
> 
> 
>> 
>> 
>> pid:2263461
>> 
>> 
>> 
> 
> 
> 
> That's an unusually high looking pid. Is that expected, for example did
> you crank Linux's pid_max right up, or is this AIX, or something?
> 
> 
> 
> 

Unfortunately I'm not sure exactly what it's running on as it's a 
cloud-provided database instance running on google cloud:

=> select version();

PostgreSQL 9.6.16 on x86_64-pc-linux-gnu, compiled by clang version 
7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final), 64-bit

-Mike

Re: Help understanding SIReadLock growing without bound on completed transaction

2020-05-21 Thread Thomas Munro
On Fri, May 22, 2020 at 7:48 AM Mike Klaas  wrote:
> locktype: page
> relation::regclass::text: _pkey
> virtualtransaction: 36/296299968
> granted:t
> pid:2263461

That's an unusually high looking pid.  Is that expected, for example
did you crank Linux's pid_max right up, or is this AIX, or something?




Help understanding SIReadLock growing without bound on completed transaction

2020-05-21 Thread Mike Klaas
We recently experienced an issue where a transaction that was finished and no 
longer existed kept slowly accumulating SIReadLocks over the period of a week.  
The only way we could remove the locks was by restarting postgresql.

The entries in pg_locks resembled:

mode: SIReadLock

locktype: page

relation::regclass::text: _pkey

virtualtransaction: 36/296299968

granted:t

pid: 2263461

count(1): 5559 (when grouped)

Note that this pid did not exist in pg_stat_activity.  I understand that it is 
normal for SSI locks to persist after a transaction is finished.  There are 
however two aspects to this that I don't understand:

* It's my understanding that these locks should be cleared when there are no 
conflicting transactions.  These locks had existed for > 1 week and we have no 
transactions that last more than a few seconds (the oldest transaction in 
pg_stat_activity is always < 1minute old).

* Why would a transaction that is finished continue accumulating locks over 
time?

If it helps, here is some more information about the state of the system:

* There were a total of six pids in pg_locks that didn't exist in 
pg_stat_activity.  They held a variety of SIReadLocks, but they weren't 
increasing in number over time.  I'm not sure how long they were present; I 
only know that the problematic pid existed for a week due to its continual 
growth reflecting in our internal lock monitoring system.

* I tried finding overlapping SIReadLocks (see query below), but none were 
returned (I realize that the SSI conflict resolution algo is much more involved 
than this simple query)

* PG version: 9.6.17

I would appreciate any hints of what I could've done to investigate this 
further or how I could've resolved the issue without restarting the db (and 
thus experiencing downtime).

thank you in advance,
-Mike

SELECT

waiting.locktype AS w_locktype,

LEFT(waiting.relation::regclass::text,25) AS waiting_table,

COALESCE(waiting_stm.query,'?') AS w_query,

waiting.page ( http://waiting.page/ ) AS w_page,

waiting.tuple AS w_tuple,

waiting.pid ( http://waiting.pid/ ) AS w_pid,

other.locktype AS o_locktype,

LEFT(other.relation::regclass::text,15) AS other_table,

LEFT(COALESCE(other_stm.query, '?'), 50) AS other_query,

other.page ( http://other.page/ ) AS o_page,

other.tuple AS o_tuple,

other.pid ( http://other.pid/ ) AS other_pid,

other.GRANTED AS o_granted

FROM

pg_catalog.pg_locks AS waiting

LEFT JOIN

pg_catalog.pg_stat_activity AS waiting_stm

ON waiting_stm.pid ( http://waiting_stm.pid/ ) = waiting.pid ( 
http://waiting.pid/ )

JOIN

pg_catalog.pg_locks AS other

ON (

(

waiting."database" = other."database"

AND waiting.relation = other.relation

and waiting.locktype = other.locktype

AND ( CASE WHEN other.locktype = 'page' THEN waiting.page ( 
http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ )

WHEN other.locktype = 'tuple' THEN waiting.page ( http://waiting.page/ ) IS NOT 
DISTINCT FROM other.page ( http://other.page/ ) and waiting.tuple IS NOT 
DISTINCT FROM other.tuple

ELSE true END

)

)

OR waiting.transactionid = other.transactionid

) AND waiting.pid ( http://waiting.pid/ ) <> other.pid ( http://other.pid/ )

LEFT JOIN

pg_catalog.pg_stat_activity AS other_stm

ON other_stm.pid ( http://other_stm.pid/ ) = other.pid ( http://other.pid/ )

WHERE waiting.pid ( http://waiting.pid/ ) IN (2263461, 2263276, 2263283, 
2263284, 2263459, 2263527 )