Re: prevent immature WAL streaming

2022-03-23 Thread Alvaro Herrera
On 2021-Sep-03, Alvaro Herrera wrote: > The last commit is something I noticed in pg_rewind ... I had missed this one; it's pushed now. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "I can see support will not be a problem. 10 out of 10."(Simon Wittber)

Re: prevent immature WAL streaming

2021-11-26 Thread Alvaro Herrera
On 2021-Nov-26, Amul Sul wrote: > On Fri, Nov 26, 2021 at 1:42 AM Tom Lane wrote: > > > > Meh ... but given the simplicity of the write-side fix, maybe changing > > it is appropriate. Actually, fixing the other side is equally simple, and it is also more correct. What changed my mind is that

Re: prevent immature WAL streaming

2021-11-25 Thread Amul Sul
On Fri, Nov 26, 2021 at 1:42 AM Tom Lane wrote: > > Alvaro Herrera writes: > > On 2021-Nov-25, Tom Lane wrote: > >> Really? AFAICS the WAL record contains the correct value, or at least > >> we should define that one as being correct, for precisely this reason. > > > I don't know what is the

Re: prevent immature WAL streaming

2021-11-25 Thread Tom Lane
I wrote: > However, this seems too forgiving: ... also, I don't know if you intended this already, but the VerifyOverwriteContrecord change should only be applied in back branches. There's no need for it in HEAD. regards, tom lane

Re: prevent immature WAL streaming

2021-11-25 Thread Tom Lane
Alvaro Herrera writes: > On 2021-Nov-25, Tom Lane wrote: >> Really? AFAICS the WAL record contains the correct value, or at least >> we should define that one as being correct, for precisely this reason. > I don't know what is the correct value for a record that comes exactly > after the page

Re: prevent immature WAL streaming

2021-11-25 Thread Alvaro Herrera
On 2021-Nov-25, Tom Lane wrote: > Alvaro Herrera writes: > > > The problem is that the bug occurs while writing the WAL record. Fixed > > servers won't produce such records, but if you run an unpatched server > > and it happens to write one, without a mitigation you cannot get away > > from

Re: prevent immature WAL streaming

2021-11-25 Thread Tom Lane
Alvaro Herrera writes: > On 2021-Nov-25, Tom Lane wrote: >> Uh, why? The fix should remove the problem, and if it doesn't, we're >> still looking at inconsistent WAL aren't we? > The problem is that the bug occurs while writing the WAL record. Fixed > servers won't produce such records, but if

Re: prevent immature WAL streaming

2021-11-25 Thread Alvaro Herrera
On 2021-Nov-25, Tom Lane wrote: > Alvaro Herrera writes: > > Oh, but also I think I should push a mitigation in case a production > > system hits this problem: maybe reduce the message from FATAL to WARNING > > if the registered LSN is at a page boundary. > > Uh, why? The fix should remove the

Re: prevent immature WAL streaming

2021-11-25 Thread Tom Lane
Alvaro Herrera writes: > Oh, but also I think I should push a mitigation in case a production > system hits this problem: maybe reduce the message from FATAL to WARNING > if the registered LSN is at a page boundary. Uh, why? The fix should remove the problem, and if it doesn't, we're still

Re: prevent immature WAL streaming

2021-11-25 Thread Alvaro Herrera
Oh, but also I think I should push a mitigation in case a production system hits this problem: maybe reduce the message from FATAL to WARNING if the registered LSN is at a page boundary. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "Entristecido, Wutra

Re: prevent immature WAL streaming

2021-11-25 Thread Alvaro Herrera
On 2021-Nov-25, Amul Sul wrote: > In XLogReadRecord(), both the variables being compared have > inconsistency in the assignment -- one gets assigned from > state->currRecPtr where other is from RecPtr. > > . > state->overwrittenRecPtr = state->currRecPtr; > . > state->abortedRecPtr =

Re: prevent immature WAL streaming

2021-11-24 Thread Amul Sul
On Wed, Nov 24, 2021 at 2:10 AM Alvaro Herrera wrote: > > On 2021-Nov-23, Tom Lane wrote: > > > We're *still* not out of the woods with 026_overwrite_contrecord.pl, > > as we are continuing to see occasional "mismatching overwritten LSN" > > failures, further down in the test where it tries to

Re: prevent immature WAL streaming

2021-11-23 Thread Alvaro Herrera
On 2021-Nov-23, Tom Lane wrote: > We're *still* not out of the woods with 026_overwrite_contrecord.pl, > as we are continuing to see occasional "mismatching overwritten LSN" > failures, further down in the test where it tries to start up the > standby: Augh. > Looking at adjacent successful

Re: prevent immature WAL streaming

2021-11-23 Thread Tom Lane
We're *still* not out of the woods with 026_overwrite_contrecord.pl, as we are continuing to see occasional "mismatching overwritten LSN" failures, further down in the test where it tries to start up the standby: sysname |branch | snapshot | stage |

Re: prevent immature WAL streaming

2021-11-10 Thread Dagfinn Ilmari Mannsåker
Tom Lane writes: > Also, I think we want > > -ok($initfile != $endfile, "$initfile differs from $endfile"); > +ok($initfile ne $endfile, "$initfile differs from $endfile"); > > The existing coding works as long as all characters of these > WAL segment names happen to be decimal digits, but ...

Re: prevent immature WAL streaming

2021-11-10 Thread Alvaro Herrera
On 2021-Nov-09, Tom Lane wrote: > This is still happening off and on, which makes it look like a > timing-sensitive problem. Confirming that, I can make it fail > every time by adding a long sleep just ahead of where > 026_overwrite_contrecord.pl captures $initfile. On reflection > I think the

Re: prevent immature WAL streaming

2021-11-09 Thread Tom Lane
I wrote: > Seems like this hasn't fixed the problem: skink still fails on > this test occasionally. > # Failed test '00010002 differs from > 00010002' > # at t/026_overwrite_contrecord.pl line 61. This is still happening off and on, which makes it look like a

Re: prevent immature WAL streaming

2021-11-08 Thread Alvaro Herrera
On 2021-Nov-08, Alexander Lakhin wrote: > Hello Alvaro, > 14.10.2021 01:09, Alvaro Herrera wrote: > >> Yea, let's go for your patch then. I've verified that at least locally it > >> passes under valgrind. > > Ah great, thanks. Pushed then. > > > While translating messages I've noticed that the

Re: prevent immature WAL streaming

2021-11-07 Thread Alexander Lakhin
Hello Alvaro, 14.10.2021 01:09, Alvaro Herrera wrote: >> Yea, let's go for your patch then. I've verified that at least locally it >> passes under valgrind. > Ah great, thanks. Pushed then. > While translating messages I've noticed that the version of the patch ported to

Re: prevent immature WAL streaming

2021-11-02 Thread Tom Lane
Alvaro Herrera writes: > On 2021-Oct-13, Andres Freund wrote: >> Yea, let's go for your patch then. I've verified that at least locally it >> passes under valgrind. > Ah great, thanks. Pushed then. Seems like this hasn't fixed the problem: skink still fails on this test occasionally.

Re: prevent immature WAL streaming

2021-10-25 Thread Kyotaro Horiguchi
At Mon, 25 Oct 2021 10:34:27 +0530, Amul Sul wrote in > On Mon, Oct 25, 2021 at 7:02 AM Kyotaro Horiguchi > wrote: > > > > At Fri, 22 Oct 2021 18:43:52 +0530, Amul Sul wrote in > > > Any thoughts about the patch posted previously? > > > > Honestly, xlogreader looks fine with the current shape.

Re: prevent immature WAL streaming

2021-10-24 Thread Amul Sul
On Mon, Oct 25, 2021 at 7:02 AM Kyotaro Horiguchi wrote: > > At Fri, 22 Oct 2021 18:43:52 +0530, Amul Sul wrote in > > Any thoughts about the patch posted previously? > > Honestly, xlogreader looks fine with the current shape. The reason is > that it seems cleaner as an interface boundary since

Re: prevent immature WAL streaming

2021-10-24 Thread Kyotaro Horiguchi
At Fri, 22 Oct 2021 18:43:52 +0530, Amul Sul wrote in > Any thoughts about the patch posted previously? Honestly, xlogreader looks fine with the current shape. The reason is that it seems cleaner as an interface boundary since the caller of xlogreader doesn't need to know about the details of

Re: prevent immature WAL streaming

2021-10-22 Thread Amul Sul
On Thu, Oct 14, 2021 at 6:14 PM Amul Sul wrote: > > On Wed, Oct 13, 2021 at 10:58 PM Alvaro Herrera > wrote: > > > > On 2021-Oct-13, Amul Sul wrote: > > > > > I have one more question, regarding the need for other global > > > variables i.e. abortedRecPtr. (Sorry for coming back after so

Re: prevent immature WAL streaming

2021-10-14 Thread Amul Sul
On Wed, Oct 13, 2021 at 10:58 PM Alvaro Herrera wrote: > > On 2021-Oct-13, Amul Sul wrote: > > > I have one more question, regarding the need for other global > > variables i.e. abortedRecPtr. (Sorry for coming back after so long.) > > > > Instead of abortedRecPtr point, isn't enough to write >

Re: prevent immature WAL streaming

2021-10-13 Thread Alvaro Herrera
On 2021-Oct-13, Andres Freund wrote: > Hi, > > On 2021-10-13 17:46:53 -0300, Alvaro Herrera wrote: > > On 2021-Oct-13, Andres Freund wrote: > > > > > Something very roughly like the attached. Perhaps that's going a bit > > > overboard > > > though. But it seems like it might be something we

Re: prevent immature WAL streaming

2021-10-13 Thread Andres Freund
Hi, On 2021-10-13 17:46:53 -0300, Alvaro Herrera wrote: > On 2021-Oct-13, Andres Freund wrote: > > > Something very roughly like the attached. Perhaps that's going a bit > > overboard > > though. But it seems like it might be something we could use in a few tests? > > Hah, our emails crossed.

Re: prevent immature WAL streaming

2021-10-13 Thread Alvaro Herrera
Hi On 2021-Oct-13, Andres Freund wrote: > Something very roughly like the attached. Perhaps that's going a bit overboard > though. But it seems like it might be something we could use in a few tests? Hah, our emails crossed. If you want to turn this into a patch to the 026 test file, please go

Re: prevent immature WAL streaming

2021-10-13 Thread Alvaro Herrera
This works nicely with the TAP test: -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Postgres is bloatware by design: it was built to house PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002) >From e9bcdb3b44f36ffa1a2377f252d73ff3d7210fd9 Mon Sep

Re: prevent immature WAL streaming

2021-10-13 Thread Andres Freund
Hi, On 2021-10-13 12:13:45 -0700, Andres Freund wrote: > On 2021-10-13 15:52:46 -0300, Alvaro Herrera wrote: > > I think I realized partway through writing the test that I could use > > emit_message instead of using a batched row insert ... so, yeah, we > > can use it here also. > > Cool. Even if

Re: prevent immature WAL streaming

2021-10-13 Thread Alvaro Herrera
On 2021-Oct-13, Andres Freund wrote: > > > Another thing: filling a segment by inserting lots of very tiny rows is > > > pretty > > > expensive. Can't we use something a bit wider? Perhaps even emit_message? > > FWIW, the count of inserted rows is something like 171985 ;) This does ~1600

Re: prevent immature WAL streaming

2021-10-13 Thread Andres Freund
Hi, On 2021-10-13 15:52:46 -0300, Alvaro Herrera wrote: > > Hm. I guess we can disable autovac. But that's not a great solution, there > > might be WAL files due to catalog access etc too. > > Well, we don't expect anything else to happen -- the cluster is > otherwise idle. I think we should do

Re: prevent immature WAL streaming

2021-10-13 Thread Alvaro Herrera
On 2021-Oct-13, Andres Freund wrote: > I added LSNs to the error message: > not ok 1 - 00010002 (0/2002350) differs from > 00010002 (0/2099600) > > It appears that the problem is that inbetween the determination of > rows_walsize the insert LSN moved to the next

Re: prevent immature WAL streaming

2021-10-13 Thread Andres Freund
Hi, On 2021-10-13 11:03:39 -0700, Andres Freund wrote: > It seems that 026_overwrite_contrecord.pl test often fails under valgrind. I > first thought that related failures on skink were due to me migrating the > animal to a new host (and then running out of space due to a mistake in ccache >

Re: prevent immature WAL streaming

2021-10-13 Thread Andres Freund
Hi, It seems that 026_overwrite_contrecord.pl test often fails under valgrind. I first thought that related failures on skink were due to me migrating the animal to a new host (and then running out of space due to a mistake in ccache config). But it happened again after I fixed those, and I just

Re: prevent immature WAL streaming

2021-10-13 Thread Alvaro Herrera
On 2021-Oct-13, Robert Haas wrote: > On Wed, Oct 13, 2021 at 2:01 AM Amul Sul wrote: > > Instead of abortedRecPtr point, isn't enough to write > > overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think both > > are pointing to the same location then can't we use > >

Re: prevent immature WAL streaming

2021-10-13 Thread Alvaro Herrera
On 2021-Oct-13, Amul Sul wrote: > I have one more question, regarding the need for other global > variables i.e. abortedRecPtr. (Sorry for coming back after so long.) > > Instead of abortedRecPtr point, isn't enough to write > overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think

Re: prevent immature WAL streaming

2021-10-13 Thread Robert Haas
On Wed, Oct 13, 2021 at 2:01 AM Amul Sul wrote: > Instead of abortedRecPtr point, isn't enough to write > overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think both > are pointing to the same location then can't we use > lastReplayedEndRecPtr instead of abortedRecPtr to write >

RE: prevent immature WAL streaming

2021-10-13 Thread Jakub Wartak
On 2021-Sep-25, Alvaro Herrera wrote: >> On 2021-Sep-24, Alvaro Herrera wrote: >> >> > Here's the set for all branches, which I think are really final, in >> > case somebody wants to play and reproduce their respective problem >> scenarios. >> >> I forgot to mention that I'll wait until 14.0 is

Re: prevent immature WAL streaming

2021-10-13 Thread Amul Sul
Hi, On Thu, Oct 7, 2021 at 6:57 PM Alvaro Herrera wrote: > > On 2021-Oct-07, Amul Sul wrote: > > > Make sense, thanks for the explanation. > > You're welcome. Also, I forgot: thank you for taking the time to review > the code. Much appreciated. :) > > I have one more question, regarding the

Re: prevent immature WAL streaming

2021-10-07 Thread Alvaro Herrera
On 2021-Oct-07, Amul Sul wrote: > Make sense, thanks for the explanation. You're welcome. Also, I forgot: thank you for taking the time to review the code. Much appreciated. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/

Re: prevent immature WAL streaming

2021-10-07 Thread Amul Sul
On Thu, 7 Oct 2021 at 6:41 PM, Alvaro Herrera wrote: > On 2021-Oct-07, Amul Sul wrote: > > > While reading this commit (ff9f111bce24), wondered can't we skip > > missingContrecPtr global variable declaration and calculate that from > > abortedRecPtr value whenever it needed. IIUC,

Re: prevent immature WAL streaming

2021-10-07 Thread Alvaro Herrera
On 2021-Oct-07, Amul Sul wrote: > While reading this commit (ff9f111bce24), wondered can't we skip > missingContrecPtr global variable declaration and calculate that from > abortedRecPtr value whenever it needed. IIUC, missingContrecPtr is the > next page to the page that abortedRecPtr contain

Re: prevent immature WAL streaming

2021-10-07 Thread Amul Sul
Hi, On Wed, Sep 29, 2021 at 8:14 PM Alvaro Herrera wrote: > > On 2021-Sep-24, Alvaro Herrera wrote: > > > Here's the set for all branches, which I think are really final, in case > > somebody wants to play and reproduce their respective problem scenarios. > > Nathan already confirmed that his

Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Alvaro Herrera writes: > Hmm. Well, as I said, maybe this part of the test isn't worth much > anyway. Rather than spending time trying to figure out why isn't this > triggering the WAL overwriting, I compared the coverage report for > running only the first test to the coverage report of

Re: prevent immature WAL streaming

2021-09-30 Thread Alvaro Herrera
On 2021-Sep-30, Tom Lane wrote: > Just when you thought it was safe to go back in the water: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2021-09-29%2022%3A05%3A44 > > which is complaining that the (misspelled, BTW) Ah, the case of the missing juxtaposed consonants.

Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Just when you thought it was safe to go back in the water: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2021-09-29%2022%3A05%3A44 which is complaining that the (misspelled, BTW) log message 'sucessfully skipped missing contrecord at' doesn't show up. This machine is old,

Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Alvaro Herrera writes: > On 2021-Sep-30, Tom Lane wrote: >> There's still the issue of these tests overriding postgresql.conf >> entries made by init(), but maybe we can live with that? > I vote to at least have has_archiving=>1 set wal_level=replica, and > potentially max_wal_senders=2 too

Re: prevent immature WAL streaming

2021-09-30 Thread Alvaro Herrera
On 2021-Sep-30, Tom Lane wrote: > Andrew Dunstan writes: > > Regardless of this problem, I think we should simply call > > set_replication_conf unconditionally in init().  Replication connections > > are now allowed by default on Unix, this would just bring Windows nodes > > into line with that.

Re: prevent immature WAL streaming

2021-09-30 Thread Tom Lane
Andrew Dunstan writes: > Regardless of this problem, I think we should simply call > set_replication_conf unconditionally in init().  Replication connections > are now allowed by default on Unix, this would just bring Windows nodes > into line with that. Yeah, I was thinking along the same lines

Re: prevent immature WAL streaming

2021-09-30 Thread Andrew Dunstan
On 9/29/21 5:29 PM, Tom Lane wrote: > Andrew Dunstan writes: >> On 9/29/21 4:33 PM, Tom Lane wrote: >>> which looks like a pretty straightforward bogus-connection-configuration >>> problem, except why wouldn't other BF members show it? >> This: >> ... >> doesn't have "allows_streaming => 1". >

Re: prevent immature WAL streaming

2021-09-30 Thread Alvaro Herrera
Hello On 2021-Sep-30, Andres Freund wrote: > FWIW, with that fixed I see the test hanging (e.g. [1]): > > can't unlink > c:/cirrus/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/archives/00010008.fail: > Permission denied at t/026_overwrite_contrecord.pl

Re: prevent immature WAL streaming

2021-09-30 Thread Andres Freund
Hi, On 2021-09-29 17:04:48 -0400, Andrew Dunstan wrote: > On 9/29/21 4:33 PM, Tom Lane wrote: > # Second test: a standby that receives WAL via archive/restore commands. > $node = PostgresNode->new('primary2'); > $node->init( >     has_archiving => 1, >     extra =>

Re: prevent immature WAL streaming

2021-09-29 Thread Tom Lane
Andrew Dunstan writes: > On 9/29/21 5:27 PM, Alvaro Herrera wrote: >> So, do we take the stance that we have no right to expect pg_basebackup >> to work if we didn't pass allow_streaming => 1? If so, the fix is to >> add it. But my preferred fix would be to call set_replication_conf if >>

Re: prevent immature WAL streaming

2021-09-29 Thread Andrew Dunstan
On 9/29/21 5:27 PM, Alvaro Herrera wrote: > On 2021-Sep-29, Andrew Dunstan wrote: > >>> The relevant info seems to be >>> >>> # Running: pg_basebackup -D >>> /home/pgrunner/bf/root/REL_14_STABLE/pgsql.build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/backup/backup >>>

Re: prevent immature WAL streaming

2021-09-29 Thread Alvaro Herrera
On 2021-Sep-29, Tom Lane wrote: > ... although I wonder why the fact that sub init otherwise sets > wal_level = minimal doesn't cause a problem for this test. > Maybe the test script is cowboy-ishly overriding that? It is. (I claim that it should be set otherwise when has_archiving=>1). --

Re: prevent immature WAL streaming

2021-09-29 Thread Tom Lane
Andrew Dunstan writes: > On 9/29/21 4:33 PM, Tom Lane wrote: >> which looks like a pretty straightforward bogus-connection-configuration >> problem, except why wouldn't other BF members show it? > This: > ... > doesn't have "allows_streaming => 1". Oh, and that only breaks things on Windows, cf

Re: prevent immature WAL streaming

2021-09-29 Thread Alvaro Herrera
On 2021-Sep-29, Andrew Dunstan wrote: > > The relevant info seems to be > > > > # Running: pg_basebackup -D > > /home/pgrunner/bf/root/REL_14_STABLE/pgsql.build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/backup/backup > > -h 127.0.0.1 -p 59502 --checkpoint fast

Re: prevent immature WAL streaming

2021-09-29 Thread Andrew Dunstan
On 9/29/21 4:33 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Pushed. Watching for buildfarm fireworks now. > jacana didn't like it: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2021-09-29%2018%3A25%3A53 > > The relevant info seems to be > > # Running: pg_basebackup -D

Re: prevent immature WAL streaming

2021-09-29 Thread Tom Lane
Alvaro Herrera writes: > Pushed. Watching for buildfarm fireworks now. jacana didn't like it: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2021-09-29%2018%3A25%3A53 The relevant info seems to be # Running: pg_basebackup -D

Re: prevent immature WAL streaming

2021-09-29 Thread Alvaro Herrera
On 2021-Sep-24, Hannu Krosing wrote: Hi Hannu > In the first email you write > > > As mentioned in the course of thread [1], we're missing a fix for > streaming replication to avoid sending records that the primary hasn't > fully flushed yet. This patch is a first attempt at fixing that

Re: prevent immature WAL streaming

2021-09-29 Thread Alvaro Herrera
On 2021-Sep-24, Alvaro Herrera wrote: > Here's the set for all branches, which I think are really final, in case > somebody wants to play and reproduce their respective problem scenarios. > Nathan already confirmed that his reproducer no longer shows a problem, > and this version shouldn't affect

Re: prevent immature WAL streaming

2021-09-25 Thread Alvaro Herrera
On 2021-Sep-24, Alvaro Herrera wrote: > Here's the set for all branches, which I think are really final, in case > somebody wants to play and reproduce their respective problem scenarios. I forgot to mention that I'll wait until 14.0 is tagged before getting anything pushed. -- Álvaro Herrera

Re: prevent immature WAL streaming

2021-09-24 Thread Hannu Krosing
Hi Alvaro, I just started reading this thread, but maybe you can confirm or refute my understanding of what was done. In the first email you write > As mentioned in the course of thread [1], we're missing a fix for streaming replication to avoid sending records that the primary hasn't fully

Re: prevent immature WAL streaming

2021-09-23 Thread Alvaro Herrera
On 2021-Sep-23, Alvaro Herrera wrote: > However, I notice now that the pg_rewind tests reproducibly fail in > branch 14 for reasons I haven't yet understood. It's strange that no > other branch fails, even when run quite a few times. Turns out that this is a real bug (setting EndOfLog seems

Re: prevent immature WAL streaming

2021-09-23 Thread Alvaro Herrera
This took some time because backpatching the tests was more work than I anticipated -- function name changes, operators that don't exist, definition of the WAL segment size in pg_settings. I had to remove the second test in branches 13 and earlier due to lack of LSN+bytes operator. Fortunately,

Re: prevent immature WAL streaming

2021-09-21 Thread Alvaro Herrera
I made one final pass over the whole thing to be sure it's all commented as thoroughly as possible, and changed the names of things to make them all consistent. So here's the last version which I intend to push to all branches soon. (The only difference in back-branches is that the xlogreader

Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Bossart, Nathan wrote: > > That was the first implementation, a few versions of the patch ago. An > > added benefit of a separate WAL record is that you can carry additional > > data for validation, such as -- as suggested by Andres -- the CRC of the > > partial data contained in

Re: prevent immature WAL streaming

2021-09-17 Thread Bossart, Nathan
On 9/17/21, 1:32 PM, "Alvaro Herrera" wrote: > On 2021-Sep-17, Bossart, Nathan wrote: > >> I gave the patch a read-through. I'm wondering if the >> XLOG_OVERWRITE_CONTRECORD records are actually necessary. IIUC we >> will set XLP_FIRST_IS_ABORTED_PARTIAL on the next page, and >> xlp_pageaddr on

Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Bossart, Nathan wrote: > I gave the patch a read-through. I'm wondering if the > XLOG_OVERWRITE_CONTRECORD records are actually necessary. IIUC we > will set XLP_FIRST_IS_ABORTED_PARTIAL on the next page, and > xlp_pageaddr on that page will already be validated in >

Re: prevent immature WAL streaming

2021-09-17 Thread Bossart, Nathan
On 9/17/21, 10:35 AM, "Alvaro Herrera" wrote: > On 2021-Sep-17, Bossart, Nathan wrote: > >> On 9/17/21, 9:37 AM, "Alvaro Herrera" wrote: > >> > If you have some time to try your reproducers with this new proposed >> > fix, I would appreciate it. >> >> I haven't had a chance to look at the patch

Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Bossart, Nathan wrote: > On 9/17/21, 9:37 AM, "Alvaro Herrera" wrote: > > If you have some time to try your reproducers with this new proposed > > fix, I would appreciate it. > > I haven't had a chance to look at the patch yet, but it appears to fix > things with my original

Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
On 2021-Sep-17, Alvaro Herrera wrote: > Added Matsumura-san to CC, because he was interested in this topic too > per the earlier thread. I failed to do this, so hopefully this serves as a ping. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/

Re: prevent immature WAL streaming

2021-09-17 Thread Bossart, Nathan
On 9/17/21, 9:37 AM, "Alvaro Herrera" wrote: > OK, this version is much more palatable, because here we verify that the > OVERWRITE_CONTRECORD we replay matches the record that was lost. Also, > I wrote a test script that creates such a broken record (by the simple > expedient of deleting the

Re: prevent immature WAL streaming

2021-09-17 Thread Alvaro Herrera
OK, this version is much more palatable, because here we verify that the OVERWRITE_CONTRECORD we replay matches the record that was lost. Also, I wrote a test script that creates such a broken record (by the simple expedient of deleting the WAL file containing the second half while the server is

Re: prevent immature WAL streaming

2021-09-15 Thread Alvaro Herrera
On 2021-Sep-15, Kyotaro Horiguchi wrote: > + CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch, > + flags & > XLOG_SET_ABORTED_PARTIAL, > + rdata, StartPos, > EndPos); > > The

Re: prevent immature WAL streaming

2021-09-14 Thread Kyotaro Horiguchi
At Tue, 14 Sep 2021 22:32:04 -0300, Alvaro Herrera wrote in > On 2021-Sep-14, Alvaro Herrera wrote: > > > On 2021-Sep-08, Kyotaro Horiguchi wrote: > > > > > Thanks! As my understanding the new record add the ability to > > > cross-check between a teard-off contrecord and the new record

Re: prevent immature WAL streaming

2021-09-14 Thread Alvaro Herrera
On 2021-Sep-14, Alvaro Herrera wrote: > On 2021-Sep-08, Kyotaro Horiguchi wrote: > > > Thanks! As my understanding the new record add the ability to > > cross-check between a teard-off contrecord and the new record inserted > > after the teard-off record. I didn't test the version by myself

Re: prevent immature WAL streaming

2021-09-14 Thread Alvaro Herrera
On 2021-Sep-08, Kyotaro Horiguchi wrote: > Thanks! As my understanding the new record add the ability to > cross-check between a teard-off contrecord and the new record inserted > after the teard-off record. I didn't test the version by myself but > the previous version implemented the

Re: prevent immature WAL streaming

2021-09-08 Thread Kyotaro Horiguchi
At Tue, 7 Sep 2021 18:41:57 +, "Bossart, Nathan" wrote in > On 9/4/21, 10:26 AM, "Alvaro Herrera" wrote: > > Attached are the same patches as last night, except I added a test for > > XLOG_DEBUG where pertinent. (The elog(PANIC) is not made conditional on > > that, since it's a

Re: prevent immature WAL streaming

2021-09-07 Thread Bossart, Nathan
On 9/4/21, 10:26 AM, "Alvaro Herrera" wrote: > Attached are the same patches as last night, except I added a test for > XLOG_DEBUG where pertinent. (The elog(PANIC) is not made conditional on > that, since it's a cross-check rather than informative.) Also fixed the > silly pg_rewind mistake I

Re: prevent immature WAL streaming

2021-09-04 Thread Alvaro Herrera
On 2021-Sep-03, Andres Freund wrote: > > +#ifdef WAL_DEBUG > > + ereport(LOG, > > + (errmsg_internal("recovery overwriting broken > > contrecord at %X/%X (EndRecPtr: %X/%X)", > > + > >

Re: prevent immature WAL streaming

2021-09-04 Thread Alvaro Herrera
On 2021-Sep-03, Andres Freund wrote: > Hi, > > On 2021-09-03 20:01:50 -0400, Alvaro Herrera wrote: > > From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001 > > From: Alvaro Herrera > > Date: Tue, 31 Aug 2021 20:55:10 -0400 > > Subject: [PATCH v3 1/5] Revert "Avoid creating

Re: prevent immature WAL streaming

2021-09-03 Thread Andres Freund
Hi, On 2021-09-03 20:01:50 -0400, Alvaro Herrera wrote: > From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera > Date: Tue, 31 Aug 2021 20:55:10 -0400 > Subject: [PATCH v3 1/5] Revert "Avoid creating archive status ".ready" files > too early" > This

Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
I thought that the way to have debug output for this new WAL code is to use WAL_DEBUG; that way it won't bother anyone and we can remove them later if necessary. Also, I realized that we should cause any error in the path that assembles a record from contrecords is to set a flag that we can test

Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, Andres Freund wrote: > Hi, > > On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote: > > Oh, but of course we can't modify XLogReaderState in backbranches to add > > the new struct member abortedContrecordPtr (or whatever we end up naming > > that.) > > Why is that? Afaict it's

Re: prevent immature WAL streaming

2021-09-03 Thread Andres Freund
Hi, On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote: > Oh, but of course we can't modify XLogReaderState in backbranches to add > the new struct member abortedContrecordPtr (or whatever we end up naming > that.) Why is that? Afaict it's always allocated via XLogReaderAllocate(), so adding a

Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, Kyotaro Horiguchi wrote: > At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera > wrote in > The name sounds like the start LSN. doesn't contrecordAbort(ed)Ptr work? > > > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD)) > > { > >

Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
Oh, but of course we can't modify XLogReaderState in backbranches to add the new struct member abortedContrecordPtr (or whatever we end up naming that.) I think I'm going to fix this, in backbranches only, by having xlogreader.c have a global variable, which is going to be used by ReadRecord

Re: prevent immature WAL streaming

2021-09-03 Thread Alvaro Herrera
On 2021-Sep-03, Kyotaro Horiguchi wrote: > At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera > wrote in > 0002: > > + XLogRecPtr abortedContrecordPtr; /* LSN of incomplete record at > > end of > > + * > > WAL */ >

Re: prevent immature WAL streaming

2021-09-03 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera wrote in > On 2021-Sep-02, Kyotaro Horiguchi wrote: > > > So, this is a crude PoC of that. > > I had ended up with something very similar, except I was trying to cram > the flag via the checkpoint record instead of hacking >

Re: prevent immature WAL streaming

2021-09-02 Thread Alvaro Herrera
On 2021-Sep-02, Kyotaro Horiguchi wrote: > So, this is a crude PoC of that. I had ended up with something very similar, except I was trying to cram the flag via the checkpoint record instead of hacking AdvanceXLInsertBuffer(). I removed that stuff and merged both, here's the result. > 1. This

Re: prevent immature WAL streaming

2021-09-01 Thread Kyotaro Horiguchi
At Wed, 1 Sep 2021 15:01:43 +0900, Fujii Masao wrote in > > > On 2021/09/01 12:15, Andres Freund wrote: > > Hi, > > On 2021-09-01 11:34:34 +0900, Fujii Masao wrote: > >> On 2021/09/01 0:53, Andres Freund wrote: > >>> Of course, we need to be careful to not weaken WAL validity checking > >>>

Re: prevent immature WAL streaming

2021-09-01 Thread Bossart, Nathan
On 9/1/21, 10:06 AM, "Andres Freund" wrote: > On 2021-09-01 15:01:43 +0900, Fujii Masao wrote: >> Thanks for clarifying that! Unless I misunderstand that, when recovery ends >> at a partially-flushed segment-spanning record, it sets >> XLP_FIRST_IS_ABORTED_PARTIAL in the next segment before

Re: prevent immature WAL streaming

2021-09-01 Thread Andres Freund
Hi, On 2021-09-01 15:01:43 +0900, Fujii Masao wrote: > Thanks for clarifying that! Unless I misunderstand that, when recovery ends > at a partially-flushed segment-spanning record, it sets > XLP_FIRST_IS_ABORTED_PARTIAL in the next segment before starting writing > new WAL data there. Therefore

Re: prevent immature WAL streaming

2021-09-01 Thread Fujii Masao
On 2021/09/01 12:15, Andres Freund wrote: Hi, On 2021-09-01 11:34:34 +0900, Fujii Masao wrote: On 2021/09/01 0:53, Andres Freund wrote: Of course, we need to be careful to not weaken WAL validity checking too much. How about the following: If we're "aborting" a continued record, we set

Re: prevent immature WAL streaming

2021-08-31 Thread Kyotaro Horiguchi
At Tue, 31 Aug 2021 20:15:24 -0700, Andres Freund wrote in > Hi, > > On 2021-09-01 11:34:34 +0900, Fujii Masao wrote: > > On 2021/09/01 0:53, Andres Freund wrote: > > > Of course, we need to be careful to not weaken WAL validity checking too > > > much. How about the following: > > > > > > If

Re: prevent immature WAL streaming

2021-08-31 Thread Andres Freund
Hi, On 2021-09-01 11:34:34 +0900, Fujii Masao wrote: > On 2021/09/01 0:53, Andres Freund wrote: > > Of course, we need to be careful to not weaken WAL validity checking too > > much. How about the following: > > > > If we're "aborting" a continued record, we set XLP_FIRST_IS_ABORTED_PARTIAL > >

Re: prevent immature WAL streaming

2021-08-31 Thread Fujii Masao
On 2021/09/01 0:53, Andres Freund wrote: I was thinking that on a normal WAL write we'd do nothing. Instead we would have dedicated code at the end of recovery that, if the WAL ends in a partial record, changes the page following the "valid" portion of the WAL to indicate that an incomplete

Re: prevent immature WAL streaming

2021-08-31 Thread Andres Freund
Hi, On 2021-08-31 09:56:30 -0400, Alvaro Herrera wrote: > On 2021-Aug-30, Andres Freund wrote: > > I think a better approach might be to handle this on the WAL layout > > level. What if we never overwrite partial records but instead just > > skipped over them during decoding? > > Maybe this is a

  1   2   >