Hello hackers,

While investigating the recent skink failure [1], I've reproduced it
under Valgrind on a slow machine and found that it happens because the
last checkpoint is recorded in segment 2, which is removed in the test.
The failure log contains:
2023-10-10 19:10:08.212 UTC [2144251][startup][:0] LOG:  invalid checkpoint 
2023-10-10 19:10:08.214 UTC [2144251][startup][:0] PANIC:  could not locate a valid checkpoint record

The line preceding the failure in the test output:
[19:10:02.701](318.076s) ok 1 - 000000010000000000000001 differs from 
tells us how long the previous operations took (> 5 mins).

2023-10-10 19:04:50.149 UTC [1845798][postmaster][:0] LOG:  database system is ready to accept connections
2023-10-10 19:09:49.131 UTC [1847585][checkpointer][:0] LOG: checkpoint starting: time
2023-10-10 19:10:02.058 UTC [1847585][checkpointer][:0] LOG: checkpoint complete: ... lsn=0/*2093980*, redo lsn=0/1F62760
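
To make the segment arithmetic explicit, here is a minimal sketch that maps
the two LSNs from the "checkpoint complete" line to WAL file names. It
assumes the default 16MB wal_segment_size and timeline 1; the helper name
lsn_to_segment_name is made up for illustration and is not part of any
PostgreSQL tool.

```python
# Assumption: default wal_segment_size of 16MB (the test may use another value).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def lsn_to_segment_name(lsn: str, timeline: int = 1,
                        seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the WAL file name that contains the given LSN ("hi/lo" in hex)."""
    hi, lo = lsn.split("/")
    pos = (int(hi, 16) << 32) | int(lo, 16)   # 64-bit byte position in WAL
    segno = pos // seg_size                   # absolute segment number
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline,
                             segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


if __name__ == "__main__":
    # Checkpoint record location from the log above
    print(lsn_to_segment_name("0/2093980"))   # 000000010000000000000002
    # Redo location from the same log line
    print(lsn_to_segment_name("0/1F62760"))   # 000000010000000000000001
```

Under these assumptions the redo pointer still lies in segment 1, but the
checkpoint record itself lands in segment 2.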

And here is one more instance of this failure [2]:
2022-11-08 02:35:25.826 UTC [1614205][][:0] PANIC:  could not locate a valid checkpoint record
2022-11-08 02:35:26.164 UTC [1612967][][:0] LOG:  startup process (PID 1614205) was terminated by signal 6: Aborted

2022-11-08 02:29:57.961 UTC [1546469][][:0] LOG:  database system is ready to accept connections
2022-11-08 02:35:10.764 UTC [1611737][][2/10:0] LOG:  statement: SELECT 
2022-11-08 02:35:11.598 UTC [1546469][][:0] LOG:  received immediate shutdown 

The next successful run after the failure [1] shows the following duration:
[21:34:48.556](180.150s) ok 1 - 000000010000000000000001 differs from 
And the last successful run:
[03:03:53.892](126.206s) ok 1 - 000000010000000000000001 differs from 

So for the test to fail, skink has to run at least twice as slow as
usual. Maybe that is an extraordinary condition indeed, but on the other
hand, maybe we should increase checkpoint_timeout, as is already done in
several tests
(015_promotion_pages, 038_save_logical_slots_shutdown, 039_end_of_wal, ...).
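
As a rough check of the timing argument, the sketch below compares the
numbers quoted above with a 300 s checkpoint_timeout. That value is the
PostgreSQL default; that the test runs with the default is my assumption,
though the "checkpoint starting: time" entry about five minutes after
startup is consistent with it. The variable and function names are
illustrative only.

```python
from datetime import datetime

# Assumption: the test node uses the default checkpoint_timeout (5min).
CHECKPOINT_TIMEOUT_S = 300.0

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the server log of the failed run [1]
ready = datetime.strptime("2023-10-10 19:04:50.149", FMT)
ckpt_start = datetime.strptime("2023-10-10 19:09:49.131", FMT)

# Time between "ready to accept connections" and the timed checkpoint
gap_s = (ckpt_start - ready).total_seconds()


def slowdown_needed(duration_s: float) -> float:
    """Factor by which a run must slow down to reach the checkpoint timeout."""
    return CHECKPOINT_TIMEOUT_S / duration_s


if __name__ == "__main__":
    print(f"startup -> timed checkpoint: {gap_s:.1f} s")
    # Durations of the first test step in the two successful runs quoted above
    for duration in (180.150, 126.206):
        print(f"{duration:.3f} s run: x{slowdown_needed(duration):.2f} to hit timeout")
    # Duration in the failed run, already past the timeout
    print(318.076 > CHECKPOINT_TIMEOUT_S)
```

With these inputs the gap comes out just under 300 s, and the two
successful runs would need to be slower by factors of roughly 1.7 and 2.4
before a timed checkpoint could start during the first step.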


Best regards,
