Re: PITR promote bug: Checkpointer writes to older timeline

2021-06-27 Thread Tom Lane
I wrote: > It sure looks like recovering a prepared > transaction creates a transient state in which a new backend will > compute a broken snapshot. Oh, after further digging this is the same issue discussed here:

Re: PITR promote bug: Checkpointer writes to older timeline

2021-06-27 Thread Tom Lane
I wrote: > Buildfarm member hornet just reported a failure in this test: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-06-27%2013%3A40%3A57 > It's not clear whether this is a problem with the test case or an > actual server bug, but I'm leaning to the latter theory. My gut

Re: PITR promote bug: Checkpointer writes to older timeline

2021-06-27 Thread Tom Lane
Michael Paquier writes: > I have been working on that over the last couple of days, and applied > a fix down to 10. One thing that I did not like in the test was the > use of compare() to check if the contents of the WAL segment before > and after the timeline jump remained the same as this

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-21 Thread Michael Paquier
On Thu, Mar 18, 2021 at 12:56:12PM +0900, Michael Paquier wrote: > I was looking at uses of ThisTimeLineID in the wild, and could not > find it getting checked or used actually in backend-side code that > involved the WAL reader facility. Even if it brings confidence, it > does not mean that it

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-17 Thread Michael Paquier
On Wed, Mar 17, 2021 at 05:09:50PM +0900, Michael Paquier wrote: > Currently with HEAD and back branches, nothing would be broken as > logical contexts cannot exist in recovery. Still it would be easy > to miss the new behavior for anybody attempting to work more on this > feature in the future

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-17 Thread Michael Paquier
On Mon, Mar 15, 2021 at 04:38:08PM +0900, Michael Paquier wrote: > On Mon, Mar 15, 2021 at 03:01:09PM +0900, Kyotaro Horiguchi wrote: >> Logical decoding stuff is (I think) designed to turn any backend into >> a walsender, which may need to maintain ThisTimeLineID. It seems to >> me that logical

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-15 Thread Michael Paquier
On Mon, Mar 15, 2021 at 03:01:09PM +0900, Kyotaro Horiguchi wrote: > Logical decoding stuff is (I think) designed to turn any backend into > a walsender, which may need to maintain ThisTimeLineID. It seems to > me that logical decoding stuff intends to maintain ThisTimeLineID of > such backends

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-15 Thread Kyotaro Horiguchi
At Sun, 14 Mar 2021 17:59:59 +0900, Michael Paquier wrote in > On Thu, Mar 04, 2021 at 05:10:36PM +0900, Kyotaro Horiguchi wrote: > > read_local_xlog_page is *designed* to maintain ThisTimeLineID. > > Currently it doesn't seem utilized but I think it's sufficiently > > reasonable that the

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-14 Thread Michael Paquier
On Thu, Mar 04, 2021 at 05:10:36PM +0900, Kyotaro Horiguchi wrote: > read_local_xlog_page is *designed* to maintain ThisTimeLineID. > Currently it doesn't seem utilized but I think it's sufficiently > reasonable that the function maintains ThisTimeLineID. I don't quite follow this line of

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-13 Thread Soumyadeep Chakraborty
Hello, PFA version 2 of the TAP test. I removed the non-deterministic sleep and introduced retries until the WAL segment is archived and promotion is complete. Some additional tidying up too. Regards, Soumyadeep (VMware) diff --git a/src/test/recovery/t/022_pitr_prepared_xact.pl

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-04 Thread Soumyadeep Chakraborty
Hey all, I took a stab at a quick and dirty TAP test (my first ever). So it can probably be improved a lot. Please take a look. On Thu, Mar 04, 2021 at 10:28:31AM +0900, Kyotaro Horiguchi wrote: > 2. Restore ThisTimeLineID after calling XLogReadRecord() in the > *caller* side. This is what

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-04 Thread Kyotaro Horiguchi
At Thu, 04 Mar 2021 16:17:34 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 4 Mar 2021 11:18:42 +0900, Michael Paquier > wrote in > > I have not looked in details at the solutions proposed here, but could > > it be possible to have a TAP test at least please? Seeing the script > > from

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-04 Thread Kyotaro Horiguchi
At Thu, 4 Mar 2021 14:57:13 +0900, Fujii Masao wrote in > > read_local_xlog_page() works as a part of logical decoding and has > > responsibility to update ThisTimeLineID properly. As the comment in > > the function, it is the proper place to update ThisTimeLineID since we > > miss a timeline

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-03 Thread Kyotaro Horiguchi
At Thu, 4 Mar 2021 11:18:42 +0900, Michael Paquier wrote in > On Thu, Mar 04, 2021 at 10:28:31AM +0900, Kyotaro Horiguchi wrote: > > read_local_xlog_page() works as a part of logical decoding and has > > responsibility to update ThisTimeLineID properly. As the comment in > > the function, it

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-03 Thread Fujii Masao
On 2021/03/04 10:28, Kyotaro Horiguchi wrote: At Wed, 3 Mar 2021 14:56:25 -0800, Soumyadeep Chakraborty wrote in On 2021/03/03 17:46, Heikki Linnakangas wrote: I think it should be reset even earlier, inside XlogReadTwoPhaseData() probably. With your patch, doesn't the

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-03 Thread Michael Paquier
On Thu, Mar 04, 2021 at 10:28:31AM +0900, Kyotaro Horiguchi wrote: > read_local_xlog_page() works as a part of logical decoding and has > responsibility to update ThisTimeLineID properly. As the comment in > the function, it is the proper place to update ThisTimeLineID since we > miss a timeline

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-03 Thread Kyotaro Horiguchi
At Wed, 3 Mar 2021 14:56:25 -0800, Soumyadeep Chakraborty wrote in > On 2021/03/03 17:46, Heikki Linnakangas wrote: > > > I think it should be reset even earlier, inside XlogReadTwoPhaseData() > > probably. With your patch, doesn't the LogStandbySnapshot() call just > > above where you're

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-03 Thread Soumyadeep Chakraborty
On 2021/03/03 17:46, Heikki Linnakangas wrote: > I think it should be reset even earlier, inside XlogReadTwoPhaseData() > probably. With your patch, doesn't the LogStandbySnapshot() call just > above where you're resetting ThisTimeLineID also write a WAL record > with incorrect timeline?

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-03 Thread Fujii Masao
On 2021/03/03 17:46, Heikki Linnakangas wrote: On 03/03/2021 08:47, Kyotaro Horiguchi wrote: At Tue, 2 Mar 2021 17:56:03 -0800, Soumyadeep Chakraborty wrote in When there are prepared transactions in an older timeline, in the checkpointer, a call to CheckPointTwoPhase() and subsequently

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-03 Thread Heikki Linnakangas
On 03/03/2021 08:47, Kyotaro Horiguchi wrote: At Tue, 2 Mar 2021 17:56:03 -0800, Soumyadeep Chakraborty wrote in When there are prepared transactions in an older timeline, in the checkpointer, a call to CheckPointTwoPhase() and subsequently to XlogReadTwoPhaseData() and subsequently to

Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-02 Thread Kyotaro Horiguchi
At Tue, 2 Mar 2021 17:56:03 -0800, Soumyadeep Chakraborty wrote in > Hello hackers, > > We came across an issue where the checkpointer writes to the older > timeline while a promotion is ongoing after reaching the recovery point > in a PITR, when there are prepared transactions before the

PITR promote bug: Checkpointer writes to older timeline

2021-03-02 Thread Soumyadeep Chakraborty
Hello hackers, We came across an issue where the checkpointer writes to the older timeline while a promotion is ongoing after reaching the recovery point in a PITR, when there are prepared transactions before the recovery point. We came across this issue first in REL_12_STABLE and saw that it