Hi all, While reviewing some code around pg_lsn_in() I came across a couple of (potential?) issues:
1. Commit 21f428eb moves lsn conversion functionality from pg_lsn_in() to a new function pg_lsn_in_internal(). It takes two parameters the lsn string and a pointer to a boolean (*have_error) to indicate if there was an error while converting string format to XLogRecPtr. pg_lsn_in_internal() only sets the *have_error to 'true' if there is an error, but leaves it for the caller to make sure it was passed by initializing as 'false'. Currently it is only getting called from pg_lsn_in() and timestamptz_in() where it has been taken care that the flag is set to false before making the call. But I think in general it opens the door for unpredictable bugs if pg_lsn_in_internal() gets called from other locations in future (if need maybe) and by mistake, it just checks the return value of the flag without setting it to false before making a call. I am attaching a patch that makes sure that *have_error is set to false in pg_lsn_in_internal() itself, rather than being caller dependent. Also, I think there might be callers who may not care if there had been an error while converting and just ok to use InvalidXLogRecPtr against return value, and may pass just a NULL boolean pointer to this function, but for now, I have left that untouched. Maybe just adding an Assert would improve the situation for time being. I have attached a patch (fix_have_error_flag.patch) to take care of above. 2. I happened to peep in test case pg_lsn.sql, and I started exploring the macros around lsn. Following macros: {code} /* * Zero is used indicate an invalid pointer. Bootstrap skips the first possible * WAL segment, initializing the first WAL page at WAL segment size, so no XLOG * record can begin at zero. */ #define InvalidXLogRecPtr 0 #define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr) {code} IIUC, in the comment above we clearly want to mark 0 as an invalid lsn (also further IIUC the comment states - lsn would start from (walSegSize + 1)). Given this, should not it be invalid to allow "0/0" as the value of type pg_lsn, or for that matter any number < walSegSize? There is a test scenario in test case pg_lsn.sql which tests insertion of "0/0" in a table having a pg_lsn column. I think this is contradictory to the comment. I am not sure of thought behind this and might be wrong while making the above assumption. But, I tried to look around a bit in hackers emails and could not locate any related discussion. I have attached a patch (mark_lsn_0_invalid.patch) that makes above changes. Thoughts? Regards, Jeevan Ladhe
fix_have_error_flag.patch
Description: Binary data
mark_lsn_0_invalid.patch
Description: Binary data