On Fri, May 5, 2023 at 7:53 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:
>
>
> On 5/5/23 2:28 PM, Amit Kapila wrote:
> > On Fri, May 5, 2023 at 5:36 PM Drouvot, Bertrand
> > <bertranddrouvot...@gmail.com> wrote:
> >
> > It seems due to some reason the current wal file is not switched due
> > to some reason.
>
> Oh wait, here is a NON failing one: 
> https://cirrus-ci.com/task/5086849685782528 (I modified the
> .cirrus.yml so that we can download the "testrun.zip" file even if the test 
> is not failing).
>
> So, in this testrun.zip we can see, that the test is ok:
>
> $ grep -i retain 
> ./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding
> [10:06:08.789](0.000s) ok 19 - invalidated logical slots do not lead to 
> retaining WAL
>
> and that the WAL file we expect to be removed is:
>
> $ grep "WAL file is" 
> ./build/testrun/recovery/035_standby_logical_decoding/log/regress_log_035_standby_logical_decoding
> [10:06:08.789](0.925s) # BDT WAL file is 
> /Users/admin/pgsql/build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_wal/000000010000000000000003
>
> This WAL file has been removed by the standby:
>
> $ grep -i 000000010000000000000003 
> ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log
>  | grep -i recy
> 2023-05-05 10:06:08.787 UTC [17521][checkpointer] DEBUG:  00000: recycled 
> write-ahead log file "000000010000000000000003"
>
> But on the primary, it has been recycled way after that time:
>
> $ grep -i 000000010000000000000003 
> ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log
>  | grep -i recy
> 2023-05-05 10:06:13.370 UTC [16785][checkpointer] DEBUG:  00000: recycled 
> write-ahead log file "000000010000000000000003"
>
> As, the checkpoint on the primary after the WAL file switch only recycled 
> (001 and 002):
>
> $ grep -i recycled 
> ./build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_primary.log
> 2023-05-05 10:05:57.196 UTC [16785][checkpointer] LOG:  00000: checkpoint 
> complete: wrote 4 buffers (3.1%); 0 WAL file(s) added, 0 removed, 0 recycled; 
> write=0.001 s, sync=0.001 s, total=0.027 s; sync files=0, longest=0.000 s, 
> average=0.000 s; distance=11219 kB, estimate=11219 kB; lsn=0/2000060, redo 
> lsn=0/2000028
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG:  00000: recycled 
> write-ahead log file "000000010000000000000001"
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] DEBUG:  00000: recycled 
> write-ahead log file "000000010000000000000002"
> 2023-05-05 10:06:08.138 UTC [16785][checkpointer] LOG:  00000: checkpoint 
> complete: wrote 20 buffers (15.6%); 0 WAL file(s) added, 0 removed, 2 
> recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB; 
> lsn=0/40000D0, redo lsn=0/4000098
>
>
> So, even on a successful test, we can see that the WAL file we expect to be 
> removed on the standby has not been recycled on the primary before the test.
>

Okay, one possibility of not removing on primary is that at the time
of checkpoint (when we compute RedoRecPtr), the wal_swtich and insert
is not yet performed because in that case it will compute the
RedoRecPtr as a location before those operations which would be 0000*3
file. However, it is not clear how is that possible except from a
background checkpoint happening at that point but from LOGs, it
appears that the checkpoint triggered by test has recycled the wal
files.

> > I think we need to add more DEBUG info to find that
> > out. Can you please try to print 'RedoRecPtr', '_logSegNo', and
> > recptr?
> >
>
> Yes, will do.
>

Okay, thanks, please try to print similar locations on standby in
CreateRestartPoint().

-- 
With Regards,
Amit Kapila.


Reply via email to