Re: pgsql: Improve runtime and output of tests for replication slots checkp

vignesh C Mon, 23 Jun 2025 05:53:47 -0700

On Sun, 22 Jun 2025 at 05:46, Alexander Korotkov <[email protected]> wrote:
>
> On Sat, Jun 21, 2025 at 2:42 AM Tom Lane <[email protected]> wrote:
> >
> > Alexander Korotkov <[email protected]> writes:
> > > And I see the following variable values.
> >
> > > (lldb) p/x targetPagePtr
> > > (XLogRecPtr) 0x0000000029004000
> > > (lldb) p/x RecPtr
> > > (XLogRecPtr) 0x0000000029002138
> >
> > > I hardly understand how is this possible given it was compiled with "-O0".
> > > I'm planning to continue investigating this tomorrow.
> >
> > Yeah, I see
> >
> > (lldb) p/x targetPagePtr
> > (XLogRecPtr) 0x0000000029004000
> > (lldb) p/x RecPtr
> > (XLogRecPtr) 0x0000000029002138
> > (lldb) p/x RecPtr - (RecPtr % 8192)
> > (XLogRecPtr) 0x0000000029002000
> >
> > We're here:
> >
> >             /* Calculate pointer to beginning of next page */
> >             targetPagePtr += XLOG_BLCKSZ;
> >
> >             /* Wait for the next page to become available */
> >             readOff = ReadPageInternal(state, targetPagePtr,
> >                                        Min(total_len - gotlen + 
> > SizeOfXLogShortPHD,
> >                                            XLOG_BLCKSZ));
> >
> > so that's where the increment of targetPagePtr came from.
> > But "Wait for the next page to become available" seems awfully
> > trusting that there will be another page.  Should this be
> > using the no-wait code path?
>
> Thank you for the help.  It seems to me that problem is deeper.  The
> code seems to only trying to read till the end of given WAL record,
> but can't reach it.  According to the values I've seen in XLogCtl, it
> seems that RedoRecPtr points somewhere inside of that record's body.
> I don't feel confident about to understand what's going on and how to
> fix it.
>
> I've tried two things.
> 1) slot_tests_wait_for_checkpoint.patch
> Make tests wait for checkpoint completion (as I think they were
> originally intended).  However, the problem still persists.
> 2) revert_slot_last_saved_restart_lsn.patch
> Revert ca307d5cec90 and make new tests reserve WAL using wal_keep_size
> GUC.  The problem still persists.  It seems to be some problem
> independent to my attempts to fix retaining WAL files with slot's
> restart_lsn.  The new tests just spotted the existing bug.


I was able to reproduce this issue after a few runs. At the time of
failure, the last few WAL records were as follows:
mgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
0/290020C0, prev 0/29002088, desc: CHECKPOINT_ONLINE redo 0/29002068;
tli 1; prev tli 1; fpw true; wal_level logical; xid 0:753; oid 24576;
multi 1; offset 0; oldest xid 744 in DB 1; oldest multi 1 in DB 1;
oldest/newest commit timestamp xid: 0/0; oldest running xid 753;
online
rmgr: XLOG        len (rec/tot):     42/    42, tx:          0, lsn:
0/29004018, prev 0/290020C0, desc: OVERWRITE_CONTRECORD lsn
0/29002138; time 2025-06-23 11:28:22.748866 IST
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
0/29004048, prev 0/29004018, desc: CHECKPOINT_SHUTDOWN redo
0/29004048; tli 1; prev tli 1; fpw true; wal_level logical; xid 0:753;
oid 24576; multi 1; offset 0; oldest xid 744 in DB 1; oldest multi 1
in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid
0; shutdown
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn:
0/290040C0, prev 0/29004048, desc: RUNNING_XACTS nextXid 753
latestCompletedXid 752 oldestRunningXid 753

When trying to read the CHECKPOINT_ONLINE record at LSN 0/290020C0,
the system determines that the total record length is 8133 bytes, we
can get these values from XLogDecodeNextRecord function:
(gdb) p total_len
$111 = 8133

Based on this, it expects to read 277 bytes from the next page:
(gdb) p gotlen
$112 = 7880         # Data available in the current page
(gdb) p SizeOfXLogShortPHD
$113 = 24
(gdb) p total_len - gotlen + SizeOfXLogShortPHD
$115 = 277          # Data expected from the next page

However, in the read_local_xlog_page_guts function, it finds that WAL
data is available only up to 0x290040F8, whereas it needs data up to
0x29004115. As a result, it continuously waits for more WAL data to
appear.

One thing I'm not sure about is why the CHECKPOINT_ONLINE record is
reported as requiring 8133 bytes. This seems unusually large for a
checkpoint record, and I plan to investigate this further. Thoughts?
Another key point is the presence of an OVERWRITE_CONTRECORD in the
next page. This indicates that the record at 0/29002138 should be
skipped as a partial record is written. However, this skip is
currently not happening, leading to the wait.
This appears to be a defect in how OVERWRITE_CONTRECORD is handled —
it fails to recognize that an OVERWRITE_CONTRECORD has been written,
indicating the current record is no longer valid and should be
skipped. As a result, it continues waiting for WAL data based on an
invalid record.

Regards,
Vignesh

Re: pgsql: Improve runtime and output of tests for replication slots checkp

Reply via email to