======================================================
1. An XLOG_SWITCH record can be written within a WAL page with plenty of free space remaining afterward:

backend> select 'pg_current_wal_flush_lsn' as lsn_name, pg_current_wal_flush_lsn(), pg_walfile_name_offset(pg_current_wal_flush_lsn()) union all select 'pg_current_wal_insert_lsn', pg_current_wal_insert_lsn(), pg_walfile_name_offset(pg_current_wal_insert_lsn()) union all select 'pg_current_wal_lsn', pg_current_wal_lsn(), pg_walfile_name_offset(pg_current_wal_lsn());
1: lsn_name (typeid = 25, len = -1, typmod = -1, byval = f)
2: pg_current_wal_flush_lsn (typeid = 3220, len = 8, typmod = -1, byval = t)
3: pg_walfile_name_offset (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_flush_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/03000190" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000003,400)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_insert_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/03000190" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000003,400)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/03000190" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000003,400)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----
1: pg_switch_wal = "0/030001A8" (typeid = 3220, len = 8, typmod = -1, byval = t)
----

Works fine: "0/03000190" + MAXALIGN(SizeOfXLogRecord) == "0/030001A8"

======================================================
2. An XLOG_SWITCH record may exactly fill a WAL page, leaving no free space afterward.
backend> insert into tt1 values(1);
backend> SELECT pg_switch_wal();

Let's set a breakpoint at startbytepos = Insert->CurrBytePos; inside ReserveXLogSwitch(). The GDB command below moves Insert->CurrBytePos to the position MAXALIGN(SizeOfXLogRecord) bytes before the end of the current WAL page.

GDB:set Insert->CurrBytePos = XLogRecPtrToBytePos(XLogBytePosToRecPtr(Insert->CurrBytePos) + (XLOG_BLCKSZ - XLogBytePosToRecPtr(Insert->CurrBytePos) % XLOG_BLCKSZ) - size)

Continue execution, then hit the breakpoint after the line EndPos = StartPos + MAXALIGN(SizeOfXLogRecord); inside XLogInsertRecord(), and proceed with debugging.

GDB:p StartPos ===>$5 = 67117032

Let us analyze it.

GDB:p 67117032/wal_segment_size ===>$10 = 4 (WAL segment number)
GDB:p (67117032%wal_segment_size)/XLOG_BLCKSZ ===>$11 = 0 (block number, zero-based)
GDB:p 67117032%XLOG_BLCKSZ ===>$12 = 8168 (the distance to the page boundary equals MAXALIGN(SizeOfXLogRecord))
GDB:p 67117032 % wal_segment_size ===>$13 = 8168

StartPos is now at offset 8168 in block 0 of WAL segment file 4, as intended.

GDB:p EndPos ===>$6 = 67117056
GDB:p StartPos / XLOG_BLCKSZ ===>$7 = 8192 (the current page number is 8192)
GDB:p EndPos / XLOG_BLCKSZ ===>$8 = 8193

EndPos points to the first byte of the next record, which falls on page 8193.

GDB:p (EndPos - 1) / XLOG_BLCKSZ ===>$9 = 8192

EndPos - 1 is the last written byte of the WAL record, located on page 8192. The record ends exactly at the page boundary without crossing onto a new page.

The unpatched code uses EndPos / XLOG_BLCKSZ to check whether the next WAL record resides on the same page as the current one. The v3 patch uses (EndPos - 1) / XLOG_BLCKSZ to compare whether the first and last bytes of the current WAL record are within the same page. Since MAXALIGN(SizeOfXLogRecord) is only 24 bytes, we only need to handle the scenario of crossing at most one page; multi-page crossing is impossible here.
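
The difference between the two checks can be reproduced with plain integer arithmetic. The following is a minimal Python sketch, not PostgreSQL code: it assumes the default XLOG_BLCKSZ of 8192, takes MAXALIGN(SizeOfXLogRecord) as 24 as stated above, and assumes both checks compare against StartPos / XLOG_BLCKSZ. The function names are made up for illustration.

```python
XLOG_BLCKSZ = 8192      # assumed default WAL block size
SWITCH_REC_SIZE = 24    # MAXALIGN(SizeOfXLogRecord), as stated above


def crosses_page_unpatched(start_pos: int) -> bool:
    """Modelled on the unpatched check: page of StartPos vs. page of EndPos."""
    end_pos = start_pos + SWITCH_REC_SIZE
    return start_pos // XLOG_BLCKSZ != end_pos // XLOG_BLCKSZ


def crosses_page_v3(start_pos: int) -> bool:
    """Modelled on the v3 check: page of StartPos vs. page of the last byte."""
    end_pos = start_pos + SWITCH_REC_SIZE
    return start_pos // XLOG_BLCKSZ != (end_pos - 1) // XLOG_BLCKSZ


# StartPos from the GDB session above: the record exactly fills the page.
start_pos = 67117032
print(crosses_page_unpatched(start_pos))  # True: EndPos is on page 8193
print(crosses_page_v3(start_pos))         # False: last byte is still on page 8192
```

With the record ending exactly on the page boundary, only the EndPos - 1 form reports that the record stays within one page.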

backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----
1: pg_switch_wal = "0/04002000" (typeid = 3220, len = 8, typmod = -1, byval = t)
----

Combined with pg_switch_wal(), the logic works correctly and yields the expected result: 8168 (block 0 of WAL segment file 4) + MAXALIGN(SizeOfXLogRecord) equals LSN 0/04002000.

======================================================
3. Calling pg_switch_wal() twice consecutively does not trigger an actual WAL segment switch.

backend> select 'pg_current_wal_flush_lsn' as lsn_name, pg_current_wal_flush_lsn(), pg_walfile_name_offset(pg_current_wal_flush_lsn()) union all select 'pg_current_wal_insert_lsn', pg_current_wal_insert_lsn(), pg_walfile_name_offset(pg_current_wal_insert_lsn()) union all select 'pg_current_wal_lsn', pg_current_wal_lsn(), pg_walfile_name_offset(pg_current_wal_lsn());
1: lsn_name (typeid = 25, len = -1, typmod = -1, byval = f)
2: pg_current_wal_flush_lsn (typeid = 3220, len = 8, typmod = -1, byval = t)
3: pg_walfile_name_offset (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_flush_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/05000000" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000005,0)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_insert_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/05000028" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000005,40)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/05000000" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000005,0)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----
1: pg_switch_wal = "0/05000000" (typeid = 3220, len = 8, typmod = -1, byval = t)
----
backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----
1: pg_switch_wal = "0/05000000" (typeid = 3220, len = 8, typmod = -1, byval = t)
----
backend>

======================================================
4. Cross page boundaries within a single WAL segment file:

backend> insert into tt1 values(1);
backend> select 'pg_current_wal_flush_lsn' as lsn_name, pg_current_wal_flush_lsn(), pg_walfile_name_offset(pg_current_wal_flush_lsn()) union all select 'pg_current_wal_insert_lsn', pg_current_wal_insert_lsn(), pg_walfile_name_offset(pg_current_wal_insert_lsn()) union all select 'pg_current_wal_lsn', pg_current_wal_lsn(), pg_walfile_name_offset(pg_current_wal_lsn());
1: lsn_name (typeid = 25, len = -1, typmod = -1, byval = f)
2: pg_current_wal_flush_lsn (typeid = 3220, len = 8, typmod = -1, byval = t)
3: pg_walfile_name_offset (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_flush_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/070000F8" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000007,248)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_insert_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/070000F8" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000007,248)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/070000F8" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000007,248)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----

Let's set a breakpoint at startbytepos = Insert->CurrBytePos; inside ReserveXLogSwitch().
We adjust Insert->CurrBytePos to a position sizeof(uint32) bytes before the end of the current page via this assignment:

GDB:set Insert->CurrBytePos = XLogRecPtrToBytePos(XLogBytePosToRecPtr(Insert->CurrBytePos) + (XLOG_BLCKSZ - XLogBytePosToRecPtr(Insert->CurrBytePos) % XLOG_BLCKSZ) - sizeof(uint32));

After executing the line *EndPos = XLogBytePosToEndRecPtr(endbytepos); within ReserveXLogSwitch():

GDB:p *EndPos ===>$9 = 117448748 #Record the value returned by XLogBytePosToEndRecPtr().

After the statement EndPos = StartPos + MAXALIGN(SizeOfXLogRecord); inside XLogInsertRecord():

GDB:p StartPos ===>$10 = 117448700
GDB:p 117448700/wal_segment_size ===>$11 = 7 (WAL segment number)
GDB:p (117448700%wal_segment_size)/XLOG_BLCKSZ ===>$12 = 0 (block number, zero-based)
GDB:p 117448700%XLOG_BLCKSZ ===>$13 = 8188 (remaining space before the page boundary = sizeof(uint32))
GDB:p sizeof(uint32) ===>$14 = 4
GDB:p 117448700 % wal_segment_size ===>$15 = 8188

StartPos matches our manual tweak: it sits 4 bytes (sizeof(uint32)) before the page boundary. Let's examine EndPos - 1.

GDB:p EndPos-1 ===>$16 = 117448723
GDB:p 117448723/wal_segment_size ===>$17 = 7 (WAL segment number)
GDB:p (117448723%wal_segment_size)/XLOG_BLCKSZ ===>$18 = 1 (block number, zero-based, crossing a page within the same segment file)
GDB:p 117448723%XLOG_BLCKSZ ===>$19 = 19 (offset 19, i.e. 20 bytes on this page)

4 bytes are stored on block 0 and 20 bytes on block 1; their combined length equals MAXALIGN(SizeOfXLogRecord).

Continuing the analysis: the condition if (StartPos / XLOG_BLCKSZ != (EndPos - 1) / XLOG_BLCKSZ) evaluates to true.
GDB:p StartPos / XLOG_BLCKSZ ===>$21 = 14336
GDB:p (EndPos - 1) / XLOG_BLCKSZ ===>$22 = 14337

Next, after the line uint64 offset = XLogSegmentOffset(EndPos, wal_segment_size);

GDB:p offset ===>$23 = 8212

Then run EndPos += SizeOfXLogShortPHD;

GDB:p EndPos ===>$26 = 117448748
GDB:p 117448748/wal_segment_size ===>$28 = 7 (WAL segment number)
GDB:p (117448748%wal_segment_size)/XLOG_BLCKSZ ===>$29 = 1 (block number, zero-based, crossing a page within the same segment file)
GDB:p 117448748%XLOG_BLCKSZ ===>$30 = 44 (byte offset 44 within the page)

backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----
1: pg_switch_wal = "0/0700202C" (typeid = 3220, len = 8, typmod = -1, byval = t)
----
backend>

Check against the pg_switch_wal() result:
117448748(EndPos) = 117448700(StartPos) + 24(MAXALIGN(SizeOfXLogRecord)) + 24(SizeOfXLogShortPHD) == "0/0700202C"

The value 117448748 matches the result assigned via *EndPos = XLogBytePosToEndRecPtr(endbytepos); in ReserveXLogSwitch().

Reference:
#define SizeOfXLogShortPHD MAXALIGN(sizeof(XLogPageHeaderData))
GDB:p MAXALIGN(sizeof(XLogPageHeaderData)) ===>$31 = 24
#define SizeOfXLogLongPHD MAXALIGN(sizeof(XLogLongPageHeaderData))
GDB:p MAXALIGN(sizeof(XLogLongPageHeaderData)) ===>$32 = 40

The next WAL segment is then number 8:

backend> select 'pg_current_wal_flush_lsn' as lsn_name, pg_current_wal_flush_lsn(), pg_walfile_name_offset(pg_current_wal_flush_lsn()) union all select 'pg_current_wal_insert_lsn', pg_current_wal_insert_lsn(), pg_walfile_name_offset(pg_current_wal_insert_lsn()) union all select 'pg_current_wal_lsn', pg_current_wal_lsn(), pg_walfile_name_offset(pg_current_wal_lsn());
1: lsn_name (typeid = 25, len = -1, typmod = -1, byval = f)
2: pg_current_wal_flush_lsn (typeid = 3220, len = 8, typmod = -1, byval = t)
3: pg_walfile_name_offset (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_flush_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/08000000" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000008,0)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_insert_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/08000028" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000008,40)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
1: = "pg_current_wal_lsn" (typeid = 25, len = -1, typmod = -1, byval = f)
2: = "0/08000000" (typeid = 3220, len = 8, typmod = -1, byval = t)
3: = "(000000010000000000000008,0)" (typeid = 2249, len = -1, typmod = -1, byval = f)
----
backend>

======================================================
5. This WAL record will cross both page boundaries and WAL segment files.

backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----
1: pg_switch_wal = "0/0A000000" (typeid = 3220, len = 8, typmod = -1, byval = t)
----
backend> insert into tt1 values(1);
backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)

Let's set a breakpoint at the line startbytepos = Insert->CurrBytePos; inside ReserveXLogSwitch(). We adjust Insert->CurrBytePos to a position sizeof(uint32) bytes before the end of the last page of the current WAL segment file with this assignment:

GDB:set Insert->CurrBytePos = XLogRecPtrToBytePos(XLogBytePosToRecPtr(Insert->CurrBytePos) + (wal_segment_size - XLogBytePosToRecPtr(Insert->CurrBytePos) % wal_segment_size) - sizeof(uint32))

Then we examine the state after executing *EndPos = XLogBytePosToEndRecPtr(endbytepos); within ReserveXLogSwitch().

GDB:p *StartPos ===>$42 = 184549372
GDB:p *EndPos ===>$43 = 184549436 #Record this value.

After the statement EndPos = StartPos + MAXALIGN(SizeOfXLogRecord); inside XLogInsertRecord():

GDB:p StartPos ===>$44 = 184549372
GDB:p 184549372/wal_segment_size ===>$45 = 10 (WAL segment number)
GDB:p (184549372%wal_segment_size)/XLOG_BLCKSZ ===>$46 = 2047 (block number, zero-based)
GDB:p 184549372%XLOG_BLCKSZ ===>$48 = 8188 (remaining space before the page boundary = sizeof(uint32))
GDB:p sizeof(uint32) ===>$14 = 4
GDB:p 184549372 % wal_segment_size ===>$50 = 16777212

At this point, StartPos matches our manual tweak: it sits sizeof(uint32) (4 bytes) before the end of the last page of the current WAL segment file. Let us inspect the value of EndPos - 1:

GDB:p EndPos-1 ===>$51 = 184549395
GDB:p 184549395/wal_segment_size ===>$52 = 11 (WAL segment number, crossing segment file)
GDB:p (184549395%wal_segment_size)/XLOG_BLCKSZ ===>$53 = 0 (block number, zero-based, crossing page)
GDB:p 184549395%XLOG_BLCKSZ ===>$54 = 19 (offset 19, i.e. 20 bytes on this page)

4 bytes reside on block 2047 of WAL segment 10, and the remaining 20 bytes are stored on block 0 of WAL segment 11. The combined length equals MAXALIGN(SizeOfXLogRecord).

Continuing the analysis: the condition if (StartPos / XLOG_BLCKSZ != (EndPos - 1) / XLOG_BLCKSZ) evaluates to true.

GDB:p StartPos / XLOG_BLCKSZ ===>$55 = 22527
GDB:p (EndPos - 1) / XLOG_BLCKSZ ===>$56 = 22528

Next, after the line uint64 offset = XLogSegmentOffset(EndPos, wal_segment_size);

GDB:p offset ===>$57 = 20

Then run EndPos += SizeOfXLogLongPHD;

GDB:p EndPos ===>$58 = 184549436
GDB:p 184549436/wal_segment_size ===>$59 = 11 (WAL segment number, crossing segment file)
GDB:p (184549436%wal_segment_size)/XLOG_BLCKSZ ===>$60 = 0 (block number, zero-based, crossing page)
GDB:p 184549436%XLOG_BLCKSZ ===>$61 = 60 (byte offset 60 within the page)

backend> SELECT pg_switch_wal();
1: pg_switch_wal (typeid = 3220, len = 8, typmod = -1, byval = t)
----
1: pg_switch_wal = "0/0B00003C" (typeid = 3220, len = 8, typmod = -1, byval = t)
----
backend>

184549436(EndPos) = 184549372(StartPos) + 24(MAXALIGN(SizeOfXLogRecord)) + 40(SizeOfXLogLongPHD) == "0/0B00003C"

The value 184549436 matches the result assigned by the statement *EndPos = XLogBytePosToEndRecPtr(endbytepos); inside ReserveXLogSwitch().
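
The end positions observed in cases 1, 2, 4 and 5 can be cross-checked with a short Python sketch. It is only an illustration of the arithmetic, not the patch itself. It assumes XLOG_BLCKSZ = 8192 and wal_segment_size = 16 MB, and uses the sizes printed by GDB above (24, 24 and 40). How the patch chooses between the short and long page header is not shown in this log; the sketch assumes the long header applies when the segment offset of EndPos falls within the first page of a segment. The function names are hypothetical.

```python
XLOG_BLCKSZ = 8192                   # assumed default WAL block size
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default wal_segment_size
SWITCH_REC_SIZE = 24                 # MAXALIGN(SizeOfXLogRecord)
SHORT_PHD = 24                       # SizeOfXLogShortPHD, per GDB output above
LONG_PHD = 40                        # SizeOfXLogLongPHD, per GDB output above


def switch_record_end(start_pos: int) -> int:
    """Return EndPos for an XLOG_SWITCH record that starts at start_pos."""
    end_pos = start_pos + SWITCH_REC_SIZE
    if start_pos // XLOG_BLCKSZ != (end_pos - 1) // XLOG_BLCKSZ:
        # The record spills onto the next page, so that page's header
        # sits between the two parts of the record.
        offset = end_pos % WAL_SEGMENT_SIZE
        # Assumption: first page of a segment carries the long header.
        end_pos += LONG_PHD if offset < XLOG_BLCKSZ else SHORT_PHD
    return end_pos


def format_lsn(pos: int) -> str:
    """Format a 64-bit WAL position in the X/XXXXXXXX style used above."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:08X}"


for start in (50332048, 67117032, 117448700, 184549372):
    print(start, format_lsn(switch_record_end(start)))
# 50332048 0/030001A8
# 67117032 0/04002000
# 117448700 0/0700202C
# 184549372 0/0B00003C
```

The four printed LSNs agree with the pg_switch_wal() results recorded in cases 1, 2, 4 and 5.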