On Fri, May 5, 2023 at 7:53 PM Drouvot, Bertrand <bertranddrouvot...@gmail.com> wrote: > > > On 5/5/23 2:28 PM, Amit Kapila wrote: > > On Fri, May 5, 2023 at 5:36 PM Drouvot, Bertrand > > <bertranddrouvot...@gmail.com> wrote: > > > > It seems due to some reason the current wal file is not switched due > > to some reason. > > Oh wait, here is a NON failing one: > https://cirrus-ci.com/task/5086849685782528 (I modified the > .cirrus.yml so that we can download the "testrun.zip" file even if the test > is not failing). > > So, in this testrun.zip we can see, that the test is ok: > > $ grep -i retain > ./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding > [10:06:08.789](0.000s) ok 19 - invalidated logical slots do not lead to > retaining WAL > > and that the WAL file we expect to be removed is: > > $ grep "WAL file is" > ./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding > [10:06:08.789](0.925s) # BDT WAL file is > /Users/admin/pgsql/build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_wal/000000010000000000000003 > > This WAL file has been removed by the standby: > > $ grep -i 000000010000000000000003 > ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log > | grep -i recy > 2023-05-05 10:06:08.787 UTC [17521][checkpointer] DEBUG: 00000: recycled > write-ahead log file "000000010000000000000003" > > But on the primary, it has been recycled way after that time: > > $ grep -i 000000010000000000000003 > ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log > | grep -i recy > 2023-05-05 10:06:13.370 UTC [16785][checkpointer] DEBUG: 00000: recycled > write-ahead log file "000000010000000000000003" > > As, the checkpoint on the primary after the WAL file switch only recycled > (001 and 002): > > $ grep -i recycled > ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log > 2023-05-05 10:05:57.196 UTC [16785][checkpointer] LOG: 00000: checkpoint > complete: wrote 4 buffers (3.1%); 0 WAL file(s) added, 0 removed, 0 recycled; > write=0.001 s, sync=0.001 s, total=0.027 s; sync files=0, longest=0.000 s, > average=0.000 s; distance=11219 kB, estimate=11219 kB; lsn=0/2000060, redo > lsn=0/2000028 > 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG: 00000: recycled > write-ahead log file "000000010000000000000001" > 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG: 00000: recycled > write-ahead log file "000000010000000000000002" > 2023-05-05 10:06:08.138 UTC [16785][checkpointer] LOG: 00000: checkpoint > complete: wrote 20 buffers (15.6%); 0 WAL file(s) added, 0 removed, 2 > recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, > longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB; > lsn=0/40000D0, redo lsn=0/4000098 > > > So, even on a successful test, we can see that the WAL file we expect to be > removed on the standby has not been recycled on the primary before the test. >
Okay, one possibility of not removing on primary is that at the time of checkpoint (when we compute RedoRecPtr), the wal_swtich and insert is not yet performed because in that case it will compute the RedoRecPtr as a location before those operations which would be 0000*3 file. However, it is not clear how is that possible except from a background checkpoint happening at that point but from LOGs, it appears that the checkpoint triggered by test has recycled the wal files. > > I think we need to add more DEBUG info to find that > > out. Can you please try to print 'RedoRecPtr', '_logSegNo', and > > recptr? > > > > Yes, will do. > Okay, thanks, please try to print similar locations on standby in CreateRestartPoint(). -- With Regards, Amit Kapila.