When a backend on a hot standby is blocked writing data to its client (for example because of a network problem or a very slow client), shown as wait event ClientWrite, it does not appear to notice that it has exceeded max_standby_streaming_delay, and therefore the conflicting transaction on that backend is never cancelled.
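To illustrate the ClientWrite condition described above, here is a minimal, generic sketch in Python. It is not PostgreSQL code and makes no claim about how the backend's write path is implemented. A sender writing to a peer that never reads (as with a client process stopped by SIGSTOP) fills the kernel socket buffers, and from that point a blocking write would wait indefinitely. The function name `fill_until_blocked` is hypothetical, the socket pair only stands in for the backend/client connection, and the number of bytes accepted depends on the OS and its buffer settings.

```python
import socket


def fill_until_blocked(chunk_size: int = 4096) -> int:
    """Write to a socket whose peer never reads until the kernel buffers are full.

    Returns the number of bytes the kernel accepted before a further write
    would have blocked, i.e. the point where a blocking writer would sit in
    a ClientWrite-style wait.
    """
    # A connected socket pair stands in for backend <-> client (hypothetical setup).
    sender, receiver = socket.socketpair()
    try:
        # Non-blocking, so we observe the "would block" point instead of hanging.
        sender.setblocking(False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                # send() may accept only part of the chunk; count what it took.
                total += sender.send(chunk)
            except BlockingIOError:
                # The receiver never reads (like a stopped psql), so the buffers
                # are full; a blocking send() would sleep here indefinitely.
                return total
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    accepted = fill_until_blocked()
    print(f"kernel accepted {accepted} bytes before send() would block")
```

A large result such as the cross join in the test below fills these buffers quickly once the client stops reading, which is consistent with the backend then showing ClientWrite.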
I've reproduced this repeatedly on Ubuntu 20.04 with PostgreSQL 15 from the Debian packages. Curiously enough, if I install the debug symbols and restart in order to get a backtrace, it starts processing the cancellation again and I can no longer reproduce it. So it sounds like a timing issue somewhere.

My simple test was, with session 1 on the standby and session 2 on the primary:

Session 1: begin transaction isolation level repeatable read;
Session 1: select count(*) from testtable;
Session 2: alter table testtable rename to testtable2;
Session 1: select * from testtable t1 cross join testtable t2;
kill -STOP <the pid of session 1>

At this point, replication lag starts growing on the standby and the session is never terminated. If I then SIGCONT it, it gets terminated by the replication conflict. If I kill the session hard, the replication lag recovers immediately.

AFAICT, if the conflict happens while the backend is in ClientRead, for example, it's picked up immediately, but something about ClientWrite prevents it. My first thought was OpenSSL, but this is reproducible both with TLS over TCP and over Unix sockets.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/