Hi ZizhuanLiu,
Thanks for the caller survey — classifying the call sites by how they
consume the return value is a useful way to frame the risk, and I agree
pg_switch_wal() is the one path where the LSN escapes to external
consumers.
To help bound that risk, I re-ran my April comparison on a second
platform (Windows 11, MSYS2/gcc 16.1.0, commit 9d141466ff, 19beta1),
building unpatched master and v2 side by side and running the same
sequence on identically-initialized clusters:
before_switch: 0/017CF958 (both builds)
switch_1: 0/017CF970 (both builds)
after_1: 0/02000000 (both builds)
switch_2/3: 0/02000000, no-op (both builds)
The two builds returned byte-for-byte identical LSNs at every step,
and "make check" passes on the patched build (All 245 tests passed).
Incidentally this also confirms numerically that the MAXALIGN() change
is a no-op: 0/017CF958 + 24 = 0/017CF970, since SizeOfXLogRecord is
already 8-byte aligned.
As far as I can tell, the only input where v2 can return a different
value than current master is when the XLOG_SWITCH record ends exactly
on a page boundary (StartPos at page offset XLOG_BLCKSZ - 24): old code
takes the cross-page branch and adds a page header size even though no
part of the record lies on the next page, while v2 returns the boundary
itself. The v2 value is the one consistent with the end-pointer
convention of XLogBytePosToEndRecPtr(). So any external tool that
observed a different value in that rare alignment was depending on a
value inconsistent with PostgreSQL's own end-pointer semantics — I'd
read v2 as correcting that, rather than breaking compatibility.
So in summary: identical SQL-visible behavior on all common paths
(verified on Linux in April and Windows now), with the single divergent
case being a consistency fix. If it would help, I could try to craft a
reproduction of the exact page-boundary case (padding WAL position via
pg_logical_emit_message), or write a small TAP test pinning down the
boundary behavior of pg_switch_wal().
Regards,
Jihyun Bahn
PostgreSQL patch #6680 â pg_switch_wal() return value comparison
================================================================
Date: 2026-06-11T06:18:07Z
Host: Microsoft Windows 11 Pro 10.0.26200 (native, MSYS2/UCRT64 toolchain)
Compiler: gcc.exe (Rev5, Built by MSYS2 project) 16.1.0
Source: github.com/postgres/postgres (master mirror)
Commit: 9d141466ff "Undo thinko in commit e78d1d6d4." (server version 19beta1)
Build: ./configure --enable-cassert --enable-debug --without-icu
--without-readline CFLAGS="-O0 -ggdb"
Patch-base suitability check:
The pre-patch code (`EndPos = StartPos + SizeOfXLogRecord;`) is present
verbatim in xlog.c on this commit, i.e. the v2 patch is NOT yet merged
into master, so this HEAD is a valid UNPATCHED baseline.
v2 patch applied with `git apply` (clean, --check passed):
1 file changed, 9 insertions(+), 2 deletions(-) [xlog.c only]
Both clusters were initialized identically:
initdb -U postgres -E UTF8 --no-locale ; pg_ctl -o "-p 55432" start
Identical SQL sequence was run against each build on a fresh cluster.
===== UNPATCHED MASTER =====
commit: 9d141466ff Undo thinko in commit e78d1d6d4.
branch: master
server version: 19beta1
CREATE TABLE
INSERT 0 1000
before_switch
---------------
0/017CF958
(1 row)
switch_1 | after_1
------------+------------
0/017CF970 | 0/02000000
(1 row)
switch_2 | after_2
------------+------------
0/02000000 | 0/02000000
(1 row)
switch_3 | after_3
------------+------------
0/02000000 | 0/02000000
(1 row)
server log scan: log clean (no FATAL/PANIC/assertion)
===== V2 PATCHED =====
commit: 9d141466ff Undo thinko in commit e78d1d6d4.
branch: review-6680-v2
server version: 19beta1
CREATE TABLE
INSERT 0 1000
before_switch
---------------
0/017CF958
(1 row)
switch_1 | after_1
------------+------------
0/017CF970 | 0/02000000
(1 row)
switch_2 | after_2
------------+------------
0/02000000 | 0/02000000
(1 row)
switch_3 | after_3
------------+------------
0/02000000 | 0/02000000
(1 row)
server log scan: log clean (no FATAL/PANIC/assertion)
===== REGRESSION TESTS (v2 patched build) =====
make check NO_LOCALE=1 â "All 245 tests passed." (exit 0)
===== CONCLUSION =====
Both builds show the same externally observable pattern:
- mid-segment switch returns the logical end of the XLOG_SWITCH record
(before_switch 0/017CF958 + SizeOfXLogRecord(24) = 0/017CF970; identical
in both builds, since MAXALIGN(SizeOfXLogRecord) == SizeOfXLogRecord here)
- pg_current_wal_lsn() then reports the next segment boundary (0/02000000)
- repeated switches at an exact segment boundary are idempotent no-ops
(both return value and pg_current_wal_lsn() stay pinned at 0/02000000)
In this run the two builds returned byte-for-byte identical LSNs at every
step (deterministic because the clusters were initialized with identical
options on the same commit).
=> v2 does not change the SQL-visible semantics of pg_switch_wal().
The compatibility concern about external tools depending on the
returned LSN is not triggered by this patch in the common
(non-page-boundary) case exercised here.
>
> 2026. 6. 10. 오후 10:28, ZizhuanLiu X-MAN <[email protected]> 작성:
>
>
>>
>> Original
>> From: cca5507 <[email protected]>
>> Date: 2026-04-21 19:35
>> To:반지현<[email protected]>, pgsql-hackers
>> <[email protected]>
>> Subject: Re: Return value of XLogInsertRecord() for XLOG_SWITCH record
>>
>>
>> Hi, thanks for the test!
>>
>>> One question:
>>> The original code did not apply MAXALIGN() to SizeOfXLogRecord before
>>> adding it. In practice SizeOfXLogRecord is likely already MAXALIGN'd
>>> (given typical record header layout), but could you confirm whether
>>> MAXALIGN() here is a correctness fix, a defensive no-op, or something
>>> that requires a separate note in the commit message
>>>
>>> Otherwise the change looks good to me, and I think it's a reasonable
>>> cleanup.
>>
>> I apply the MAXALIGN() to keep it consistent with ReserveXLogSwitch(), the
>> value seems unchanged though.
>>
>> Attach v2 which is more efficient.
>>
>> --
>> Regards,
>> ChangAo Chen
>
>
> Hi, cca5507,반지현,
>
> Thanks for your patch and test— it’s sparked my deeper dive into the WAL
> switch logic.
>
> I’ve done a preliminary survey of all top-level callers of XLogInsertRecord()
> that generate XLOG_SWITCH records, and grouped them into three categories:
> 1.do_pg_backup_start(), do_pg_backup_stop(), ShutdownXLOG()
> These callers completely ignore the return value of XLogInsertRecord(),
> so a semantic change here would have zero internal impact on them.
>
> 2.CheckArchiveTimeout()
> It stores the returned LSN into switchpoint only for debug logging:
> ```c
> if (XLogSegmentOffset(switchpoint, wal_segment_size) != 0)
> elog(DEBUG1, "write-ahead log switch forced (\"archive_timeout\"=%d)",
> XLogArchiveTimeout);
> ```c
> The offset check here is just a trivial debug print, and would not
> introduce functional defects even if the returned LSN changes its meaning.
>
> 3.pg_switch_wal(PG_FUNCTION_ARGS) — the critical public entry point
> This function directly returns the LSN from XLogInsertRecord() as
> its SQL-level return value exposed to end users, scripts, and external
> tooling.
>
> Risk analysis:
> From the above classification, internal PostgreSQL core logic suffers
> no functional breakage if we unify the return value semantics of
> XLogInsertRecord().
>
> However, there are compatibility risks for external consumers relying on
> pg_switch_wal()’s output:
> Custom applications, backup scripts, and monitoring tools may take
> the returned LSN as the exact position of the XLOG_SWITCH record to
> run space / time range analysis between the switch record and segment end.
>
> Since we cannot guarantee how existing third-party systems interpret this
> return
> LSN, altering its definition would break established external workflows and
> trigger
> unpredictable side effects for legacy deployments.
>
> That is the main backward-compatibility risk I can identify for this change.
>
> Happy to hear your thoughts or any corrections to my analysis.
>
>
> regards,
> --
> ZizhuanLiu (X-MAN)
> [email protected]