Hi,

On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> I happened to notice that while skink continues to fail off-and-on
> in 031_recovery_conflict.pl, the symptoms have changed!  What
> we're getting now typically looks like [1]:
> 
> [10:45:11.475](0.023s) ok 14 - startup deadlock: lock acquisition is waiting
> Waiting for replication conn standby's replay_lsn to pass 0/33FB8B0 on primary
> done
> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 367.
> 
> where absolutely nothing happens in the standby log, until we time out:
> 
> 2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG:  statement: SELECT * FROM test_recovery_conflict_table2;
> 2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG:  statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> 2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
> 
> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> the startup process hasn't detected the buffer conflict in the first
> place.

I wonder if this, at least partially, could be due to the elog thing
I was complaining about nearby. I.e. we decide to FATAL as part of a
recovery conflict interrupt, and then, during that FATAL processing,
ERROR out as part of another recovery conflict interrupt (because
nothing holds interrupts while the FATAL is being handled).
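
To make that failure mode concrete, here is a minimal standalone C
sketch (not actual PostgreSQL source; check_for_interrupts,
fatal_conflict_exit, pending_conflict and interrupts_held are
simplified stand-ins for CHECK_FOR_INTERRUPTS(), the FATAL conflict
path and the HOLD_INTERRUPTS machinery) of how an ERROR longjmp can
escape an unfinished FATAL path when nothing holds interrupts:

    #include <setjmp.h>
    #include <stdio.h>
    #include <stdbool.h>

    static sigjmp_buf error_context;
    static volatile bool pending_conflict = true;  /* second conflict arrives */
    static bool interrupts_held = false;           /* HOLD_INTERRUPTS analogue */

    static void
    check_for_interrupts(void)
    {
        if (pending_conflict && !interrupts_held)
        {
            pending_conflict = false;
            fprintf(stderr, "ERROR: canceling statement due to conflict\n");
            siglongjmp(error_context, 1);          /* elog(ERROR) analogue */
        }
    }

    static void
    fatal_conflict_exit(void)
    {
        /* Without holding interrupts here, cleanup can be hijacked midway. */
        fprintf(stderr, "FATAL: terminating connection due to conflict\n");
        check_for_interrupts();                    /* e.g. reached in cleanup */
        fprintf(stderr, "FATAL cleanup finished, backend exits\n");
    }

    int
    main(void)
    {
        if (sigsetjmp(error_context, 0) != 0)
        {
            /* ERROR recovery: backend keeps running, FATAL never completed */
            fprintf(stderr, "resumed after ERROR; FATAL exit was lost\n");
            return 0;
        }
        fatal_conflict_exit();
        return 0;
    }

Run it and the "FATAL cleanup finished" line never prints: the nested
ERROR longjmps out first. If interrupts_held were set around the FATAL
path, analogous to wrapping it in HOLD_INTERRUPTS(), the nested ERROR
could not fire and the exit would run to completion.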

Greetings,

Andres Freund

