On Fri, Nov 10, 2023 at 07:59:43PM -0800, Andres Freund wrote: > Hi, > > On 2023-11-09 16:14:07 -0500, Bruce Momjian wrote: > > On Thu, Nov 9, 2023 at 09:49:48PM +0100, Matthias van de Meent wrote: > > > Either way, I think fix #1 is most correct (as was attached in > > > offset2.diff, and quoted verbatim here), because that has no chance of > > > having surprising underflowing behaviour when you use '0/0'::lsn as > > > input. > > > > Attached is the full patch that changes pg_walfile_name_offset() and > > pg_walfile_name(). There is no need for doc changes. > > I think this needs to add tests "documenting" the changed behaviour and > perhaps also for a few other edge cases. You could e.g. test > SELECT * FROM pg_walfile_name_offset('0/0') > which today returns bogus values, and which is independent of the wal segment > size. > > And with > SELECT setting::int8 AS segment_size FROM pg_settings WHERE name = > 'wal_segment_size' \gset > you can test real things too, e.g.: > SELECT segment_number, file_offset FROM pg_walfile_name_offset('0/0'::pg_lsn > + :segment_size), pg_split_walfile_name(file_name); > SELECT segment_number, file_offset FROM pg_walfile_name_offset('0/0'::pg_lsn > + :segment_size + 1), pg_split_walfile_name(file_name); > SELECT segment_number, file_offset = :segment_size - 1 FROM > pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1), > pg_split_walfile_name(file_name);
Sure, I have added these tests. FYI, pg_walfile_name_offset() has this C comment, which I have removed in this patch; * Note that a location exactly at a segment boundary is taken to be in * the previous segment. This is usually the right thing, since the * expected usage is to determine which xlog file(s) are ready to archive. There is also a documentation mention of this behavior: When the given write-ahead log location is exactly at a write-ahead log file boundary, both these functions return the name of the preceding write-ahead log file. This is usually the desired behavior for managing write-ahead log archiving behavior, since the preceding file is the last one that currently needs to be archived. After seeing the doc mention, I started digging into the history of this feature. It was originally called pg_current_xlogfile_offset() and proposed in this email thread, which started on 2006-07-31: https://www.postgresql.org/message-id/flat/1154384790.3226.21.camel%40localhost.localdomain In the initial patch by Simon Riggs, there was no "previous segment file" behavior, just a simple filename/offset calculation. This was applied on 2006-08-06 with this commit: commit 704ddaaa09 Author: Tom Lane <t...@sss.pgh.pa.us> Date: Sun Aug 6 03:53:44 2006 +0000 Add support for forcing a switch to a new xlog file; cause such a switch to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane. and the email of the commit said: https://www.postgresql.org/message-id/18457.1154836638%40sss.pgh.pa.us I also made the new user-level functions a bit more orthogonal, so that filenames could be extracted from the existing functions like pg_stop_backup. There is later talk about returning last write pointer vs. the current insert pointer, and having it match what is returned by pg_stop_backup(): https://www.postgresql.org/message-id/1155124565.2368.95.camel%40localhost.localdomain Methinks it should be the Write pointer all of the time, since I can't think of a valid reason for wanting to know where the Insert pointer is *before* we've written to the xlog file. Having it be the Insert pointer could lead to some errors. and I suspect that it was the desire to return the last write pointer that caused the function to return the previous segment on a boundary offset. This was intended to simplify log shipping implementations, I think. The function eventually was renamed in the xlog-to-wal renaming and moved from xlog.c to xlogfuncs.c. This thread in 2022 mentioned the inconsistency for 0/0, but didn't seem to talk about the inconsistency of offset vs file name: https://www.postgresql.org/message-id/flat/20220204225057.GA1535307%40nathanxps13#d964275c9540d8395e138efc0a75f7e8 and it concluded with: Yes, its the deliberate choice of design, or a kind of questionable-but-unoverturnable decision. I think there are many external tools conscious of this behavior. However, with the report about the inconsistency, the attached patch fixes the behavior and removes the documentation about the odd behavior. This will need to be mentioned as an incompatibility in the major version release notes. -- Bruce Momjian <br...@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d963f0a0a0..7c22420839 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27044,11 +27044,6 @@ postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn); (1 row) </programlisting> Similarly, <function>pg_walfile_name</function> extracts just the write-ahead log file name. - When the given write-ahead log location is exactly at a write-ahead log file boundary, both - these functions return the name of the preceding write-ahead log file. - This is usually the desired behavior for managing write-ahead log archiving - behavior, since the preceding file is the last one that currently - needs to be archived. </para> <para> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 45a70668b1..45452d937c 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -374,10 +374,6 @@ pg_last_wal_replay_lsn(PG_FUNCTION_ARGS) /* * Compute an xlog file name and decimal byte offset given a WAL location, * such as is returned by pg_backup_stop() or pg_switch_wal(). - * - * Note that a location exactly at a segment boundary is taken to be in - * the previous segment. This is usually the right thing, since the - * expected usage is to determine which xlog file(s) are ready to archive. */ Datum pg_walfile_name_offset(PG_FUNCTION_ARGS) @@ -414,7 +410,7 @@ pg_walfile_name_offset(PG_FUNCTION_ARGS) /* * xlogfilename */ - XLByteToPrevSeg(locationpoint, xlogsegno, wal_segment_size); + XLByteToSeg(locationpoint, xlogsegno, wal_segment_size); XLogFileName(xlogfilename, GetWALInsertionTimeLine(), xlogsegno, wal_segment_size); @@ -457,7 +453,7 @@ pg_walfile_name(PG_FUNCTION_ARGS) errhint("%s cannot be executed during recovery.", "pg_walfile_name()"))); - XLByteToPrevSeg(locationpoint, xlogsegno, wal_segment_size); + XLByteToSeg(locationpoint, xlogsegno, wal_segment_size); XLogFileName(xlogfilename, GetWALInsertionTimeLine(), xlogsegno, wal_segment_size); diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out index c669948370..c41dc2d4be 100644 --- a/src/test/regress/expected/misc_functions.out +++ b/src/test/regress/expected/misc_functions.out @@ -619,7 +619,7 @@ SELECT count(*) > 0 AS ok FROM pg_control_system(); t (1 row) --- pg_split_walfile_name +-- pg_split_walfile_name, pg_walfile_name & pg_walfile_name_offset SELECT * FROM pg_split_walfile_name(NULL); segment_number | timeline_id ----------------+------------- @@ -642,3 +642,42 @@ SELECT segment_number > 0 AS ok_segment_number, timeline_id t | 4294967295 (1 row) +SELECT setting::int8 AS segment_size +FROM pg_settings +WHERE name = 'wal_segment_size' +\gset +SELECT segment_number, file_offset +FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size), + pg_split_walfile_name(file_name); + segment_number | file_offset +----------------+------------- + 1 | 0 +(1 row) + +SELECT segment_number, file_offset +FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size + 1), + pg_split_walfile_name(file_name); + segment_number | file_offset +----------------+------------- + 1 | 1 +(1 row) + +SELECT segment_number, file_offset = :segment_size - 1 +FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1), + pg_split_walfile_name(file_name); + segment_number | ?column? +----------------+---------- + 0 | t +(1 row) + +SELECT * +FROM (values ('0/16ffffff'), ('0/17000000'), ('0/17000001')) as t(lsn), + LATERAL pg_walfile_name_offset(lsn::pg_lsn), + LATERAL pg_walfile_name(lsn::pg_lsn); + lsn | file_name | file_offset | pg_walfile_name +------------+--------------------------+-------------+-------------------------- + 0/16ffffff | 000000010000000000000016 | 16777215 | 000000010000000000000016 + 0/17000000 | 000000010000000000000017 | 0 | 000000010000000000000017 + 0/17000001 | 000000010000000000000017 | 1 | 000000010000000000000017 +(3 rows) + diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql index b57f01f3e9..872b9d3af1 100644 --- a/src/test/regress/sql/misc_functions.sql +++ b/src/test/regress/sql/misc_functions.sql @@ -230,10 +230,27 @@ SELECT count(*) > 0 AS ok FROM pg_control_init(); SELECT count(*) > 0 AS ok FROM pg_control_recovery(); SELECT count(*) > 0 AS ok FROM pg_control_system(); --- pg_split_walfile_name +-- pg_split_walfile_name, pg_walfile_name & pg_walfile_name_offset SELECT * FROM pg_split_walfile_name(NULL); SELECT * FROM pg_split_walfile_name('invalid'); SELECT segment_number > 0 AS ok_segment_number, timeline_id FROM pg_split_walfile_name('000000010000000100000000'); SELECT segment_number > 0 AS ok_segment_number, timeline_id FROM pg_split_walfile_name('ffffffFF00000001000000af'); +SELECT setting::int8 AS segment_size +FROM pg_settings +WHERE name = 'wal_segment_size' +\gset +SELECT segment_number, file_offset +FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size), + pg_split_walfile_name(file_name); +SELECT segment_number, file_offset +FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size + 1), + pg_split_walfile_name(file_name); +SELECT segment_number, file_offset = :segment_size - 1 +FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1), + pg_split_walfile_name(file_name); +SELECT * +FROM (values ('0/16ffffff'), ('0/17000000'), ('0/17000001')) as t(lsn), + LATERAL pg_walfile_name_offset(lsn::pg_lsn), + LATERAL pg_walfile_name(lsn::pg_lsn);