On 2021/03/04 10:28, Kyotaro Horiguchi wrote:
At Wed, 3 Mar 2021 14:56:25 -0800, Soumyadeep Chakraborty 
<soumyadeep2...@gmail.com> wrote in
On 2021/03/03 17:46, Heikki Linnakangas wrote:

I think it should be reset even earlier, inside XlogReadTwoPhaseData()
probably. With your patch, doesn't the LogStandbySnapshot() call just
above where you're ressetting ThisTimeLineID also write a WAL record
with incorrect timeline?

Agreed.

Right.

On Wed, Mar 3, 2021 at 1:04 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:

Even better, can we avoid setting ThisTimeLineID in XlogReadTwoPhaseData() in 
the first place?



Or isn't it better to reset ThisTimeLineID in read_local_xlog_page(), i.e.,
prevent read_local_xlog_page() from changing ThisTimeLineID? I'm not
sure if that's possible, though.. In the future other functions that calls
read_local_xlog_page() during the promotion may appear. Fixing the issue
outside read_local_xlog_page() may cause those functions to get
the same issue.

I agree. We should fix the issue in read_local_xlog_page(). I have
attached two different patches which do so:
saved_ThisTimeLineID.patch and pass_ThisTimeLineID.patch.

The former saves the value of the ThisTimeLineID before it gets changed
in read_local_xlog_page() and resets it after ThisTimeLineID has been
used later on in the code (by XLogReadDetermineTimeline()).

The latter removes occurrences of ThisTimeLineID from
XLogReadDetermineTimeline() and introduces an argument currTLI to
XLogReadDetermineTimeline() to be used in its stead.

read_local_xlog_page() works as a part of logical decoding and has
responsibility to update ThisTimeLineID properly.  As the comment in
the function, it is the proper place to update ThisTimeLineID since we
miss a timeline change if we check it earlier and the function uses
the value just after. So we cannot change that behavior of the
function.  That is, neither of them doesn't seem to be the right fix.

Could you tell me what actual issue happens if read_local_xlog_page() resets
ThisTimeLineID at the end? Some replication slot-related functions that use
read_local_xlog_page() can be executed even during recovery. For example,
you mean that, when timeline swithes during recovery, those functions
behave incorrectly if ThisTimeLineID is reset?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to