concerns around pg_lsn

Jeevan Ladhe Mon, 29 Jul 2019 10:27:05 -0700

Hi all,

While reviewing some code around pg_lsn_in() I came across a couple of
(potential?) issues:


1.
Commit 21f428eb moves lsn conversion functionality from pg_lsn_in() to a new
function pg_lsn_in_internal(). It takes two parameters the lsn string and a
pointer to a boolean (*have_error) to indicate if there was an error while
converting string format to XLogRecPtr.

pg_lsn_in_internal() only sets the *have_error to 'true' if there is an
error,
but leaves it for the caller to make sure it was passed by initializing as
'false'. Currently it is only getting called from pg_lsn_in() and
timestamptz_in()
where it has been taken care that the flag is set to false before making the
call. But I think in general it opens the door for unpredictable bugs if
pg_lsn_in_internal() gets called from other locations in future (if need
maybe) and by mistake, it just checks the return value of the flag without
setting it to false before making a call.

I am attaching a patch that makes sure that *have_error is set to false in
pg_lsn_in_internal() itself, rather than being caller dependent.

Also, I think there might be callers who may not care if there had been an
error
while converting and just ok to use InvalidXLogRecPtr against return value,
and
may pass just a NULL boolean pointer to this function, but for now, I have
left
that untouched. Maybe just adding an Assert would improve the situation for
time being.

I have attached a patch (fix_have_error_flag.patch) to take care of above.

2.
I happened to peep in test case pg_lsn.sql, and I started exploring the
macros
around lsn.

Following macros:

{code}
/*
 * Zero is used indicate an invalid pointer. Bootstrap skips the first
possible
 * WAL segment, initializing the first WAL page at WAL segment size, so no
XLOG
 * record can begin at zero.

 */
#define InvalidXLogRecPtr   0
#define XLogRecPtrIsInvalid(r)  ((r) == InvalidXLogRecPtr)
{code}

IIUC, in the comment above we clearly want to mark 0 as an invalid lsn (also
further IIUC the comment states - lsn would start from (walSegSize + 1)).
Given
this, should not it be invalid to allow "0/0" as the value of type pg_lsn,
or
for that matter any number < walSegSize?

There is a test scenario in test case pg_lsn.sql which tests insertion of
"0/0"
in a table having a pg_lsn column. I think this is contradictory to the
comment.

I am not sure of thought behind this and might be wrong while making the
above
assumption. But, I tried to look around a bit in hackers emails and could
not
locate any related discussion.

I have attached a patch (mark_lsn_0_invalid.patch) that makes above changes.

Thoughts?

Regards,
Jeevan Ladhe

fix_have_error_flag.patch
Description: Binary data

mark_lsn_0_invalid.patch
Description: Binary data

concerns around pg_lsn

Reply via email to