On Sun, 22 Jun 2025 at 05:46, Alexander Korotkov <aekorot...@gmail.com> wrote:
>
> On Sat, Jun 21, 2025 at 2:42 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > Alexander Korotkov <aekorot...@gmail.com> writes:
> > > And I see the following variable values.
> > >
> > > (lldb) p/x targetPagePtr
> > > (XLogRecPtr) 0x0000000029004000
> > > (lldb) p/x RecPtr
> > > (XLogRecPtr) 0x0000000029002138
> > >
> > > I hardly understand how this is possible given it was compiled with "-O0".
> > > I'm planning to continue investigating this tomorrow.
> >
> > Yeah, I see
> >
> > (lldb) p/x targetPagePtr
> > (XLogRecPtr) 0x0000000029004000
> > (lldb) p/x RecPtr
> > (XLogRecPtr) 0x0000000029002138
> > (lldb) p/x RecPtr - (RecPtr % 8192)
> > (XLogRecPtr) 0x0000000029002000
> >
> > We're here:
> >
> >     /* Calculate pointer to beginning of next page */
> >     targetPagePtr += XLOG_BLCKSZ;
> >
> >     /* Wait for the next page to become available */
> >     readOff = ReadPageInternal(state, targetPagePtr,
> >                                Min(total_len - gotlen + SizeOfXLogShortPHD,
> >                                    XLOG_BLCKSZ));
> >
> > so that's where the increment of targetPagePtr came from.
> > But "Wait for the next page to become available" seems awfully
> > trusting that there will be another page.  Should this be
> > using the no-wait code path?
>
> Thank you for the help.  It seems to me that the problem is deeper.  The
> code seems to be trying to read only up to the end of the given WAL
> record, but can't reach it.  According to the values I've seen in
> XLogCtl, it seems that RedoRecPtr points somewhere inside that record's
> body.  I don't feel confident that I understand what's going on or how
> to fix it.
>
> I've tried two things.
> 1) slot_tests_wait_for_checkpoint.patch
> Make the tests wait for checkpoint completion (as I think they were
> originally intended).  However, the problem still persists.
> 2) revert_slot_last_saved_restart_lsn.patch
> Revert ca307d5cec90 and make the new tests reserve WAL using the
> wal_keep_size GUC.  The problem still persists.  It seems to be a
> problem independent of my attempts to fix retaining WAL files with the
> slot's restart_lsn.  The new tests just spotted the existing bug.
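
For reference, the quoted lldb values are consistent with the contrecord
path (taking XLOG_BLCKSZ at its default of 8192, i.e. 0x2000):

    page holding RecPtr = 0x29002138 - (0x29002138 % 0x2000) = 0x29002000
    next page           = 0x29002000 + 0x2000                = 0x29004000

which is exactly the targetPagePtr passed to ReadPageInternal in the
quoted code, so the reader has moved on to the page after the one where
the record it is still assembling starts, and is waiting for the
continuation there.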

I was able to reproduce this issue after a few runs.  At the time of
failure, the last few WAL records were as follows:

rmgr: XLOG    len (rec/tot):    114/   114, tx:          0, lsn: 0/290020C0, prev 0/29002088, desc: CHECKPOINT_ONLINE redo 0/29002068; tli 1; prev tli 1; fpw true; wal_level logical; xid 0:753; oid 24576; multi 1; offset 0; oldest xid 744 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 753; online
rmgr: XLOG    len (rec/tot):     42/    42, tx:          0, lsn: 0/29004018, prev 0/290020C0, desc: OVERWRITE_CONTRECORD lsn 0/29002138; time 2025-06-23 11:28:22.748866 IST
rmgr: XLOG    len (rec/tot):    114/   114, tx:          0, lsn: 0/29004048, prev 0/29004018, desc: CHECKPOINT_SHUTDOWN redo 0/29004048; tli 1; prev tli 1; fpw true; wal_level logical; xid 0:753; oid 24576; multi 1; offset 0; oldest xid 744 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: Standby len (rec/tot):     50/    50, tx:          0, lsn: 0/290040C0, prev 0/29004048, desc: RUNNING_XACTS nextXid 753 latestCompletedXid 752 oldestRunningXid 753

When trying to read the CHECKPOINT_ONLINE record at LSN 0/290020C0, the
reader determines that the total record length is 8133 bytes; these
values can be seen in the XLogDecodeNextRecord function:

(gdb) p total_len
$111 = 8133

Based on this, it expects to read 277 bytes from the next page:

(gdb) p gotlen
$112 = 7880           # Data available in the current page
(gdb) p SizeOfXLogShortPHD
$113 = 24
(gdb) p total_len - gotlen + SizeOfXLogShortPHD
$115 = 277            # Data expected from the next page

However, the read_local_xlog_page_guts function finds that WAL data is
available only up to 0x290040F8, whereas it needs data up to 0x29004115.
As a result, it waits indefinitely for more WAL data to appear.

One thing I'm not sure about is why the CHECKPOINT_ONLINE record is
reported as requiring 8133 bytes.  That seems unusually large for a
checkpoint record, and I plan to investigate it further.  Thoughts?

Another key point is the presence of an OVERWRITE_CONTRECORD on the next
page.  It indicates that the record at 0/29002138 should be skipped,
because only a partial record was written.  However, that skip is not
happening, which leads to the wait.  This appears to be a defect in how
OVERWRITE_CONTRECORD is handled: the reader fails to recognize that an
OVERWRITE_CONTRECORD has been written, which means the current record is
no longer valid and should be skipped, and so it keeps waiting for WAL
data on behalf of an invalid record.  The arithmetic behind the
never-satisfied wait is spelled out below.

Regards,
Vignesh
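
To make the never-satisfied wait concrete, here is the arithmetic with
the values from the gdb session and the pg_waldump output above (the
only assumptions beyond those values are the default XLOG_BLCKSZ of 8192
and 8-byte MAXALIGN):

    bytes wanted from the next page = total_len - gotlen + SizeOfXLogShortPHD
                                    = 8133 - 7880 + 24 = 277 (0x115)
    LSN the reader has to wait for  = targetPagePtr + 277
                                    = 0x29004000 + 0x115 = 0x29004115
    end of available WAL            = 0x290040C0 + MAXALIGN(50) = 0x290040F8

Since 0x29004115 lies beyond 0x290040F8, and the OVERWRITE_CONTRECORD at
0/29004018 means the missing continuation bytes will never be written,
read_local_xlog_page_guts waits forever.  (Incidentally, gotlen = 7880
is exactly 0x29004000 - 0x29002138, the number of bytes between the LSN
named by the OVERWRITE_CONTRECORD and the page boundary.)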