On 2021-Sep-03, Alvaro Herrera wrote:
> The last commit is something I noticed in pg_rewind ...
I had missed this one; it's pushed now.
--
Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
"I can see support will not be a problem. 10 out of 10."(Simon Wittber)
On 2021-Nov-26, Amul Sul wrote:
> On Fri, Nov 26, 2021 at 1:42 AM Tom Lane wrote:
> >
> > Meh ... but given the simplicity of the write-side fix, maybe changing
> > it is appropriate.
Actually, fixing the other side is equally simple, and it is also more
correct. What changed my mind is that
On Fri, Nov 26, 2021 at 1:42 AM Tom Lane wrote:
>
> Alvaro Herrera writes:
> > On 2021-Nov-25, Tom Lane wrote:
> >> Really? AFAICS the WAL record contains the correct value, or at least
> >> we should define that one as being correct, for precisely this reason.
>
> > I don't know what is the
I wrote:
> However, this seems too forgiving:
... also, I don't know if you intended this already, but the
VerifyOverwriteContrecord change should only be applied in
back branches. There's no need for it in HEAD.
regards, tom lane
Alvaro Herrera writes:
> On 2021-Nov-25, Tom Lane wrote:
>> Really? AFAICS the WAL record contains the correct value, or at least
>> we should define that one as being correct, for precisely this reason.
> I don't know what is the correct value for a record that comes exactly
> after the page
On 2021-Nov-25, Tom Lane wrote:
> Alvaro Herrera writes:
>
> > The problem is that the bug occurs while writing the WAL record. Fixed
> > servers won't produce such records, but if you run an unpatched server
> > and it happens to write one, without a mitigation you cannot get away
> > from
Alvaro Herrera writes:
> On 2021-Nov-25, Tom Lane wrote:
>> Uh, why? The fix should remove the problem, and if it doesn't, we're
>> still looking at inconsistent WAL aren't we?
> The problem is that the bug occurs while writing the WAL record. Fixed
> servers won't produce such records, but if
On 2021-Nov-25, Tom Lane wrote:
> Alvaro Herrera writes:
> > Oh, but also I think I should push a mitigation in case a production
> > system hits this problem: maybe reduce the message from FATAL to WARNING
> > if the registered LSN is at a page boundary.
>
> Uh, why? The fix should remove the
Alvaro Herrera writes:
> Oh, but also I think I should push a mitigation in case a production
> system hits this problem: maybe reduce the message from FATAL to WARNING
> if the registered LSN is at a page boundary.
Uh, why? The fix should remove the problem, and if it doesn't, we're
still
Oh, but also I think I should push a mitigation in case a production
system hits this problem: maybe reduce the message from FATAL to WARNING
if the registered LSN is at a page boundary.
--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"Entristecido, Wutra
On 2021-Nov-25, Amul Sul wrote:
> In XLogReadRecord(), both the variables being compared have
> inconsistency in the assignment -- one gets assigned from
> state->currRecPtr where other is from RecPtr.
>
> .
> state->overwrittenRecPtr = state->currRecPtr;
> .
> state->abortedRecPtr =
On Wed, Nov 24, 2021 at 2:10 AM Alvaro Herrera wrote:
>
> On 2021-Nov-23, Tom Lane wrote:
>
> > We're *still* not out of the woods with 026_overwrite_contrecord.pl,
> > as we are continuing to see occasional "mismatching overwritten LSN"
> > failures, further down in the test where it tries to
On 2021-Nov-23, Tom Lane wrote:
> We're *still* not out of the woods with 026_overwrite_contrecord.pl,
> as we are continuing to see occasional "mismatching overwritten LSN"
> failures, further down in the test where it tries to start up the
> standby:
Augh.
> Looking at adjacent successful
We're *still* not out of the woods with 026_overwrite_contrecord.pl,
as we are continuing to see occasional "mismatching overwritten LSN"
failures, further down in the test where it tries to start up the
standby:
sysname |branch | snapshot | stage |
Tom Lane writes:
> Also, I think we want
>
> -ok($initfile != $endfile, "$initfile differs from $endfile");
> +ok($initfile ne $endfile, "$initfile differs from $endfile");
>
> The existing coding works as long as all characters of these
> WAL segment names happen to be decimal digits, but ...
On 2021-Nov-09, Tom Lane wrote:
> This is still happening off and on, which makes it look like a
> timing-sensitive problem. Confirming that, I can make it fail
> every time by adding a long sleep just ahead of where
> 026_overwrite_contrecord.pl captures $initfile. On reflection
> I think the
I wrote:
> Seems like this hasn't fixed the problem: skink still fails on
> this test occasionally.
> # Failed test '00010002 differs from
> 00010002'
> # at t/026_overwrite_contrecord.pl line 61.
This is still happening off and on, which makes it look like a
On 2021-Nov-08, Alexander Lakhin wrote:
> Hello Alvaro,
> 14.10.2021 01:09, Alvaro Herrera wrote:
> >> Yea, let's go for your patch then. I've verified that at least locally it
> >> passes under valgrind.
> > Ah great, thanks. Pushed then.
> >
> While translating messages I've noticed that the
Hello Alvaro,
14.10.2021 01:09, Alvaro Herrera wrote:
>> Yea, let's go for your patch then. I've verified that at least locally it
>> passes under valgrind.
> Ah great, thanks. Pushed then.
>
While translating messages I've noticed that the version of the patch
ported to
Alvaro Herrera writes:
> On 2021-Oct-13, Andres Freund wrote:
>> Yea, let's go for your patch then. I've verified that at least locally it
>> passes under valgrind.
> Ah great, thanks. Pushed then.
Seems like this hasn't fixed the problem: skink still fails on
this test occasionally.
At Mon, 25 Oct 2021 10:34:27 +0530, Amul Sul wrote in
> On Mon, Oct 25, 2021 at 7:02 AM Kyotaro Horiguchi
> wrote:
> >
> > At Fri, 22 Oct 2021 18:43:52 +0530, Amul Sul wrote in
> > > Any thoughts about the patch posted previously?
> >
> > Honestly, xlogreader looks fine with the current shape.
On Mon, Oct 25, 2021 at 7:02 AM Kyotaro Horiguchi
wrote:
>
> At Fri, 22 Oct 2021 18:43:52 +0530, Amul Sul wrote in
> > Any thoughts about the patch posted previously?
>
> Honestly, xlogreader looks fine with the current shape. The reason is
> that it seems cleaner as an interface boundary since
At Fri, 22 Oct 2021 18:43:52 +0530, Amul Sul wrote in
> Any thoughts about the patch posted previously?
Honestly, xlogreader looks fine with the current shape. The reason is
that it seems cleaner as an interface boundary since the caller of
xlogreader doesn't need to know about the details of
On Thu, Oct 14, 2021 at 6:14 PM Amul Sul wrote:
>
> On Wed, Oct 13, 2021 at 10:58 PM Alvaro Herrera
> wrote:
> >
> > On 2021-Oct-13, Amul Sul wrote:
> >
> > > I have one more question, regarding the need for other global
> > > variables i.e. abortedRecPtr. (Sorry for coming back after so
On Wed, Oct 13, 2021 at 10:58 PM Alvaro Herrera wrote:
>
> On 2021-Oct-13, Amul Sul wrote:
>
> > I have one more question, regarding the need for other global
> > variables i.e. abortedRecPtr. (Sorry for coming back after so long.)
> >
> > Instead of abortedRecPtr point, isn't enough to write
>
On 2021-Oct-13, Andres Freund wrote:
> Hi,
>
> On 2021-10-13 17:46:53 -0300, Alvaro Herrera wrote:
> > On 2021-Oct-13, Andres Freund wrote:
> >
> > > Something very roughly like the attached. Perhaps that's going a bit
> > > overboard
> > > though. But it seems like it might be something we
Hi,
On 2021-10-13 17:46:53 -0300, Alvaro Herrera wrote:
> On 2021-Oct-13, Andres Freund wrote:
>
> > Something very roughly like the attached. Perhaps that's going a bit
> > overboard
> > though. But it seems like it might be something we could use in a few tests?
>
> Hah, our emails crossed.
Hi
On 2021-Oct-13, Andres Freund wrote:
> Something very roughly like the attached. Perhaps that's going a bit overboard
> though. But it seems like it might be something we could use in a few tests?
Hah, our emails crossed. If you want to turn this into a patch to the
026 test file, please go
This works nicely with the TAP test:
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Postgres is bloatware by design: it was built to house
PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002)
>From e9bcdb3b44f36ffa1a2377f252d73ff3d7210fd9 Mon Sep
Hi,
On 2021-10-13 12:13:45 -0700, Andres Freund wrote:
> On 2021-10-13 15:52:46 -0300, Alvaro Herrera wrote:
> > I think I realized partway through writing the test that I could use
> > emit_message instead of using a batched row insert ... so, yeah, we
> > can use it here also.
>
> Cool. Even if
On 2021-Oct-13, Andres Freund wrote:
> > > Another thing: filling a segment by inserting lots of very tiny rows is
> > > pretty
> > > expensive. Can't we use something a bit wider? Perhaps even emit_message?
>
> FWIW, the count of inserted rows is something like 171985 ;)
This does ~1600
Hi,
On 2021-10-13 15:52:46 -0300, Alvaro Herrera wrote:
> > Hm. I guess we can disable autovac. But that's not a great solution, there
> > might be WAL files due to catalog access etc too.
>
> Well, we don't expect anything else to happen -- the cluster is
> otherwise idle. I think we should do
On 2021-Oct-13, Andres Freund wrote:
> I added LSNs to the error message:
> not ok 1 - 00010002 (0/2002350) differs from
> 00010002 (0/2099600)
>
> It appears that the problem is that inbetween the determination of
> rows_walsize the insert LSN moved to the next
Hi,
On 2021-10-13 11:03:39 -0700, Andres Freund wrote:
> It seems that 026_overwrite_contrecord.pl test often fails under valgrind. I
> first thought that related failures on skink were due to me migrating the
> animal to a new host (and then running out of space due to a mistake in ccache
>
Hi,
It seems that 026_overwrite_contrecord.pl test often fails under valgrind. I
first thought that related failures on skink were due to me migrating the
animal to a new host (and then running out of space due to a mistake in ccache
config). But it happened again after I fixed those, and I just
On 2021-Oct-13, Robert Haas wrote:
> On Wed, Oct 13, 2021 at 2:01 AM Amul Sul wrote:
> > Instead of abortedRecPtr point, isn't enough to write
> > overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think both
> > are pointing to the same location then can't we use
> >
On 2021-Oct-13, Amul Sul wrote:
> I have one more question, regarding the need for other global
> variables i.e. abortedRecPtr. (Sorry for coming back after so long.)
>
> Instead of abortedRecPtr point, isn't enough to write
> overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think
On Wed, Oct 13, 2021 at 2:01 AM Amul Sul wrote:
> Instead of abortedRecPtr point, isn't enough to write
> overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr? I think both
> are pointing to the same location then can't we use
> lastReplayedEndRecPtr instead of abortedRecPtr to write
>
On 2021-Sep-25, Alvaro Herrera wrote:
>> On 2021-Sep-24, Alvaro Herrera wrote:
>>
>> > Here's the set for all branches, which I think are really final, in
>> > case somebody wants to play and reproduce their respective problem
>> scenarios.
>>
>> I forgot to mention that I'll wait until 14.0 is
Hi,
On Thu, Oct 7, 2021 at 6:57 PM Alvaro Herrera wrote:
>
> On 2021-Oct-07, Amul Sul wrote:
>
> > Make sense, thanks for the explanation.
>
> You're welcome. Also, I forgot: thank you for taking the time to review
> the code. Much appreciated.
:)
>
>
I have one more question, regarding the
On 2021-Oct-07, Amul Sul wrote:
> Make sense, thanks for the explanation.
You're welcome. Also, I forgot: thank you for taking the time to review
the code. Much appreciated.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
On Thu, 7 Oct 2021 at 6:41 PM, Alvaro Herrera
wrote:
> On 2021-Oct-07, Amul Sul wrote:
>
> > While reading this commit (ff9f111bce24), wondered can't we skip
> > missingContrecPtr global variable declaration and calculate that from
> > abortedRecPtr value whenever it needed. IIUC,
On 2021-Oct-07, Amul Sul wrote:
> While reading this commit (ff9f111bce24), wondered can't we skip
> missingContrecPtr global variable declaration and calculate that from
> abortedRecPtr value whenever it needed. IIUC, missingContrecPtr is the
> next page to the page that abortedRecPtr contain
Hi,
On Wed, Sep 29, 2021 at 8:14 PM Alvaro Herrera wrote:
>
> On 2021-Sep-24, Alvaro Herrera wrote:
>
> > Here's the set for all branches, which I think are really final, in case
> > somebody wants to play and reproduce their respective problem scenarios.
> > Nathan already confirmed that his
Alvaro Herrera writes:
> Hmm. Well, as I said, maybe this part of the test isn't worth much
> anyway. Rather than spending time trying to figure out why isn't this
> triggering the WAL overwriting, I compared the coverage report for
> running only the first test to the coverage report of
On 2021-Sep-30, Tom Lane wrote:
> Just when you thought it was safe to go back in the water:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2021-09-29%2022%3A05%3A44
>
> which is complaining that the (misspelled, BTW)
Ah, the case of the missing juxtaposed consonants.
Just when you thought it was safe to go back in the water:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2021-09-29%2022%3A05%3A44
which is complaining that the (misspelled, BTW) log message
'sucessfully skipped missing contrecord at' doesn't show up.
This machine is old,
Alvaro Herrera writes:
> On 2021-Sep-30, Tom Lane wrote:
>> There's still the issue of these tests overriding postgresql.conf
>> entries made by init(), but maybe we can live with that?
> I vote to at least have has_archiving=>1 set wal_level=replica, and
> potentially max_wal_senders=2 too
On 2021-Sep-30, Tom Lane wrote:
> Andrew Dunstan writes:
> > Regardless of this problem, I think we should simply call
> > set_replication_conf unconditionally in init(). Replication connections
> > are now allowed by default on Unix, this would just bring Windows nodes
> > into line with that.
Andrew Dunstan writes:
> Regardless of this problem, I think we should simply call
> set_replication_conf unconditionally in init(). Replication connections
> are now allowed by default on Unix, this would just bring Windows nodes
> into line with that.
Yeah, I was thinking along the same lines
On 9/29/21 5:29 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 9/29/21 4:33 PM, Tom Lane wrote:
>>> which looks like a pretty straightforward bogus-connection-configuration
>>> problem, except why wouldn't other BF members show it?
>> This:
>> ...
>> doesn't have "allows_streaming => 1".
>
Hello
On 2021-Sep-30, Andres Freund wrote:
> FWIW, with that fixed I see the test hanging (e.g. [1]):
>
> can't unlink
> c:/cirrus/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/archives/00010008.fail:
> Permission denied at t/026_overwrite_contrecord.pl
Hi,
On 2021-09-29 17:04:48 -0400, Andrew Dunstan wrote:
> On 9/29/21 4:33 PM, Tom Lane wrote:
> # Second test: a standby that receives WAL via archive/restore commands.
> $node = PostgresNode->new('primary2');
> $node->init(
> has_archiving => 1,
> extra =>
Andrew Dunstan writes:
> On 9/29/21 5:27 PM, Alvaro Herrera wrote:
>> So, do we take the stance that we have no right to expect pg_basebackup
>> to work if we didn't pass allow_streaming => 1? If so, the fix is to
>> add it. But my preferred fix would be to call set_replication_conf if
>>
On 9/29/21 5:27 PM, Alvaro Herrera wrote:
> On 2021-Sep-29, Andrew Dunstan wrote:
>
>>> The relevant info seems to be
>>>
>>> # Running: pg_basebackup -D
>>> /home/pgrunner/bf/root/REL_14_STABLE/pgsql.build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/backup/backup
>>>
On 2021-Sep-29, Tom Lane wrote:
> ... although I wonder why the fact that sub init otherwise sets
> wal_level = minimal doesn't cause a problem for this test.
> Maybe the test script is cowboy-ishly overriding that?
It is. (I claim that it should be set otherwise when has_archiving=>1).
--
Andrew Dunstan writes:
> On 9/29/21 4:33 PM, Tom Lane wrote:
>> which looks like a pretty straightforward bogus-connection-configuration
>> problem, except why wouldn't other BF members show it?
> This:
> ...
> doesn't have "allows_streaming => 1".
Oh, and that only breaks things on Windows, cf
On 2021-Sep-29, Andrew Dunstan wrote:
> > The relevant info seems to be
> >
> > # Running: pg_basebackup -D
> > /home/pgrunner/bf/root/REL_14_STABLE/pgsql.build/src/test/recovery/tmp_check/t_026_overwrite_contrecord_primary2_data/backup/backup
> > -h 127.0.0.1 -p 59502 --checkpoint fast
On 9/29/21 4:33 PM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Pushed. Watching for buildfarm fireworks now.
> jacana didn't like it:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2021-09-29%2018%3A25%3A53
>
> The relevant info seems to be
>
> # Running: pg_basebackup -D
Alvaro Herrera writes:
> Pushed. Watching for buildfarm fireworks now.
jacana didn't like it:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2021-09-29%2018%3A25%3A53
The relevant info seems to be
# Running: pg_basebackup -D
On 2021-Sep-24, Hannu Krosing wrote:
Hi Hannu
> In the first email you write
>
> > As mentioned in the course of thread [1], we're missing a fix for
> streaming replication to avoid sending records that the primary hasn't
> fully flushed yet. This patch is a first attempt at fixing that
On 2021-Sep-24, Alvaro Herrera wrote:
> Here's the set for all branches, which I think are really final, in case
> somebody wants to play and reproduce their respective problem scenarios.
> Nathan already confirmed that his reproducer no longer shows a problem,
> and this version shouldn't affect
On 2021-Sep-24, Alvaro Herrera wrote:
> Here's the set for all branches, which I think are really final, in case
> somebody wants to play and reproduce their respective problem scenarios.
I forgot to mention that I'll wait until 14.0 is tagged before getting
anything pushed.
--
Álvaro Herrera
Hi Alvaro,
I just started reading this thread, but maybe you can confirm or
refute my understanding of what was done.
In the first email you write
> As mentioned in the course of thread [1], we're missing a fix for
streaming replication to avoid sending records that the primary hasn't
fully
On 2021-Sep-23, Alvaro Herrera wrote:
> However, I notice now that the pg_rewind tests reproducibly fail in
> branch 14 for reasons I haven't yet understood. It's strange that no
> other branch fails, even when run quite a few times.
Turns out that this is a real bug (setting EndOfLog seems
This took some time because backpatching the tests was more work than I
anticipated -- function name changes, operators that don't exist,
definition of the WAL segment size in pg_settings. I had to remove the
second test in branches 13 and earlier due to lack of LSN+bytes
operator. Fortunately,
I made one final pass over the whole thing to be sure it's all commented
as thoroughly as possible, and changed the names of things to make them
all consistent. So here's the last version which I intend to push to
all branches soon. (The only difference in back-branches is that
the xlogreader
On 2021-Sep-17, Bossart, Nathan wrote:
> > That was the first implementation, a few versions of the patch ago. An
> > added benefit of a separate WAL record is that you can carry additional
> > data for validation, such as -- as suggested by Andres -- the CRC of the
> > partial data contained in
On 9/17/21, 1:32 PM, "Alvaro Herrera" wrote:
> On 2021-Sep-17, Bossart, Nathan wrote:
>
>> I gave the patch a read-through. I'm wondering if the
>> XLOG_OVERWRITE_CONTRECORD records are actually necessary. IIUC we
>> will set XLP_FIRST_IS_ABORTED_PARTIAL on the next page, and
>> xlp_pageaddr on
On 2021-Sep-17, Bossart, Nathan wrote:
> I gave the patch a read-through. I'm wondering if the
> XLOG_OVERWRITE_CONTRECORD records are actually necessary. IIUC we
> will set XLP_FIRST_IS_ABORTED_PARTIAL on the next page, and
> xlp_pageaddr on that page will already be validated in
>
On 9/17/21, 10:35 AM, "Alvaro Herrera" wrote:
> On 2021-Sep-17, Bossart, Nathan wrote:
>
>> On 9/17/21, 9:37 AM, "Alvaro Herrera" wrote:
>
>> > If you have some time to try your reproducers with this new proposed
>> > fix, I would appreciate it.
>>
>> I haven't had a chance to look at the patch
On 2021-Sep-17, Bossart, Nathan wrote:
> On 9/17/21, 9:37 AM, "Alvaro Herrera" wrote:
> > If you have some time to try your reproducers with this new proposed
> > fix, I would appreciate it.
>
> I haven't had a chance to look at the patch yet, but it appears to fix
> things with my original
On 2021-Sep-17, Alvaro Herrera wrote:
> Added Matsumura-san to CC, because he was interested in this topic too
> per the earlier thread.
I failed to do this, so hopefully this serves as a ping.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
On 9/17/21, 9:37 AM, "Alvaro Herrera" wrote:
> OK, this version is much more palatable, because here we verify that the
> OVERWRITE_CONTRECORD we replay matches the record that was lost. Also,
> I wrote a test script that creates such a broken record (by the simple
> expedient of deleting the
OK, this version is much more palatable, because here we verify that the
OVERWRITE_CONTRECORD we replay matches the record that was lost. Also,
I wrote a test script that creates such a broken record (by the simple
expedient of deleting the WAL file containing the second half while the
server is
On 2021-Sep-15, Kyotaro Horiguchi wrote:
> + CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch,
> + flags &
> XLOG_SET_ABORTED_PARTIAL,
> + rdata, StartPos,
> EndPos);
>
> The
At Tue, 14 Sep 2021 22:32:04 -0300, Alvaro Herrera
wrote in
> On 2021-Sep-14, Alvaro Herrera wrote:
>
> > On 2021-Sep-08, Kyotaro Horiguchi wrote:
> >
> > > Thanks! As my understanding the new record add the ability to
> > > cross-check between a teard-off contrecord and the new record
On 2021-Sep-14, Alvaro Herrera wrote:
> On 2021-Sep-08, Kyotaro Horiguchi wrote:
>
> > Thanks! As my understanding the new record add the ability to
> > cross-check between a teard-off contrecord and the new record inserted
> > after the teard-off record. I didn't test the version by myself
On 2021-Sep-08, Kyotaro Horiguchi wrote:
> Thanks! As my understanding the new record add the ability to
> cross-check between a teard-off contrecord and the new record inserted
> after the teard-off record. I didn't test the version by myself but
> the previous version implemented the
At Tue, 7 Sep 2021 18:41:57 +, "Bossart, Nathan"
wrote in
> On 9/4/21, 10:26 AM, "Alvaro Herrera" wrote:
> > Attached are the same patches as last night, except I added a test for
> > XLOG_DEBUG where pertinent. (The elog(PANIC) is not made conditional on
> > that, since it's a
On 9/4/21, 10:26 AM, "Alvaro Herrera" wrote:
> Attached are the same patches as last night, except I added a test for
> XLOG_DEBUG where pertinent. (The elog(PANIC) is not made conditional on
> that, since it's a cross-check rather than informative.) Also fixed the
> silly pg_rewind mistake I
On 2021-Sep-03, Andres Freund wrote:
> > +#ifdef WAL_DEBUG
> > + ereport(LOG,
> > + (errmsg_internal("recovery overwriting broken
> > contrecord at %X/%X (EndRecPtr: %X/%X)",
> > +
> >
On 2021-Sep-03, Andres Freund wrote:
> Hi,
>
> On 2021-09-03 20:01:50 -0400, Alvaro Herrera wrote:
> > From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001
> > From: Alvaro Herrera
> > Date: Tue, 31 Aug 2021 20:55:10 -0400
> > Subject: [PATCH v3 1/5] Revert "Avoid creating
Hi,
On 2021-09-03 20:01:50 -0400, Alvaro Herrera wrote:
> From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera
> Date: Tue, 31 Aug 2021 20:55:10 -0400
> Subject: [PATCH v3 1/5] Revert "Avoid creating archive status ".ready" files
> too early"
> This
I thought that the way to have debug output for this new WAL code is to
use WAL_DEBUG; that way it won't bother anyone and we can remove them
later if necessary.
Also, I realized that we should cause any error in the path that
assembles a record from contrecords is to set a flag that we can test
On 2021-Sep-03, Andres Freund wrote:
> Hi,
>
> On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote:
> > Oh, but of course we can't modify XLogReaderState in backbranches to add
> > the new struct member abortedContrecordPtr (or whatever we end up naming
> > that.)
>
> Why is that? Afaict it's
Hi,
On 2021-09-03 12:55:23 -0400, Alvaro Herrera wrote:
> Oh, but of course we can't modify XLogReaderState in backbranches to add
> the new struct member abortedContrecordPtr (or whatever we end up naming
> that.)
Why is that? Afaict it's always allocated via XLogReaderAllocate(), so adding
a
On 2021-Sep-03, Kyotaro Horiguchi wrote:
> At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera
> wrote in
> The name sounds like the start LSN. doesn't contrecordAbort(ed)Ptr work?
>
> > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
> > {
> >
Oh, but of course we can't modify XLogReaderState in backbranches to add
the new struct member abortedContrecordPtr (or whatever we end up naming
that.)
I think I'm going to fix this, in backbranches only, by having
xlogreader.c have a global variable, which is going to be used by
ReadRecord
On 2021-Sep-03, Kyotaro Horiguchi wrote:
> At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera
> wrote in
> 0002:
> > + XLogRecPtr abortedContrecordPtr; /* LSN of incomplete record at
> > end of
> > + *
> > WAL */
>
At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera
wrote in
> On 2021-Sep-02, Kyotaro Horiguchi wrote:
>
> > So, this is a crude PoC of that.
>
> I had ended up with something very similar, except I was trying to cram
> the flag via the checkpoint record instead of hacking
>
On 2021-Sep-02, Kyotaro Horiguchi wrote:
> So, this is a crude PoC of that.
I had ended up with something very similar, except I was trying to cram
the flag via the checkpoint record instead of hacking
AdvanceXLInsertBuffer(). I removed that stuff and merged both, here's
the result.
> 1. This
At Wed, 1 Sep 2021 15:01:43 +0900, Fujii Masao
wrote in
>
>
> On 2021/09/01 12:15, Andres Freund wrote:
> > Hi,
> > On 2021-09-01 11:34:34 +0900, Fujii Masao wrote:
> >> On 2021/09/01 0:53, Andres Freund wrote:
> >>> Of course, we need to be careful to not weaken WAL validity checking
> >>>
On 9/1/21, 10:06 AM, "Andres Freund" wrote:
> On 2021-09-01 15:01:43 +0900, Fujii Masao wrote:
>> Thanks for clarifying that! Unless I misunderstand that, when recovery ends
>> at a partially-flushed segment-spanning record, it sets
>> XLP_FIRST_IS_ABORTED_PARTIAL in the next segment before
Hi,
On 2021-09-01 15:01:43 +0900, Fujii Masao wrote:
> Thanks for clarifying that! Unless I misunderstand that, when recovery ends
> at a partially-flushed segment-spanning record, it sets
> XLP_FIRST_IS_ABORTED_PARTIAL in the next segment before starting writing
> new WAL data there. Therefore
On 2021/09/01 12:15, Andres Freund wrote:
Hi,
On 2021-09-01 11:34:34 +0900, Fujii Masao wrote:
On 2021/09/01 0:53, Andres Freund wrote:
Of course, we need to be careful to not weaken WAL validity checking too
much. How about the following:
If we're "aborting" a continued record, we set
At Tue, 31 Aug 2021 20:15:24 -0700, Andres Freund wrote in
> Hi,
>
> On 2021-09-01 11:34:34 +0900, Fujii Masao wrote:
> > On 2021/09/01 0:53, Andres Freund wrote:
> > > Of course, we need to be careful to not weaken WAL validity checking too
> > > much. How about the following:
> > >
> > > If
Hi,
On 2021-09-01 11:34:34 +0900, Fujii Masao wrote:
> On 2021/09/01 0:53, Andres Freund wrote:
> > Of course, we need to be careful to not weaken WAL validity checking too
> > much. How about the following:
> >
> > If we're "aborting" a continued record, we set XLP_FIRST_IS_ABORTED_PARTIAL
> >
On 2021/09/01 0:53, Andres Freund wrote:
I was thinking that on a normal WAL write we'd do nothing. Instead we would
have dedicated code at the end of recovery that, if the WAL ends in a partial
record, changes the page following the "valid" portion of the WAL to indicate
that an incomplete
Hi,
On 2021-08-31 09:56:30 -0400, Alvaro Herrera wrote:
> On 2021-Aug-30, Andres Freund wrote:
> > I think a better approach might be to handle this on the WAL layout
> > level. What if we never overwrite partial records but instead just
> > skipped over them during decoding?
>
> Maybe this is a
1 - 100 of 119 matches
Mail list logo