On 27.09.2021 11:30, Kyotaro Horiguchi wrote:
Thank you for the comments! (Sorry for the late resopnse.)

At Tue, 10 Aug 2021 14:14:05 -0400, Robert Haas <robertmh...@gmail.com> wrote in
On Thu, Mar 4, 2021 at 10:01 PM Kyotaro Horiguchi
<horikyota....@gmail.com> wrote:
The patch assumed that CHKPT_START/COMPLETE barrier are exclusively
used each other, but MarkBufferDirtyHint which delays checkpoint start
is called in RelationTruncate while delaying checkpoint completion.
That is not a strange nor harmful behavior.  I changed delayChkpt to a
bitmap integer from an enum so that both barrier are separately
triggered.

I'm not sure this is the way to go here, though.  This fixes the issue
of a crash during RelationTruncate, but the issue of smgrtruncate
failure during RelationTruncate still remains (unless we treat that
failure as PANIC?).
I like this patch. As I understand it, we're currently cheating by
allowing checkpoints to complete without necessarily flushing all of
the pages that were dirty at the time we fixed the redo pointer out to
disk. We think this is OK because we know that those pages are going
to get truncated away, but it's not really OK because when the system
starts up, it has to replay WAL starting from the checkpoint's redo
pointer, but the state of the page is not the same as it was at the
time when the redo pointer was the end of WAL, so redo fails. In the
case described in
http://postgr.es/m/byapr06mb63739b2692dc6dbb3c5f186cab...@byapr06mb6373.namprd06.prod.outlook.com
modifications are made to the page before the redo pointer is fixed
and those changes never make it to disk, but the truncation also never
makes it to the disk either. With this patch, that can't happen,
because no checkpoint can intervene between when we (1) decide we're
not going to bother writing those dirty pages and (2) actually
truncate them away. So either the pages will get written as part of
the checkpoint, or else they'll be gone before the checkpoint
completes. In the latter case, I suppose redo that would have modified
those pages will just be skipped, thus dodging the problem.
I think your understanding is right.

In RelationTruncate, I suggest that we ought to clear the
delay-checkpoint flag before rather than after calling
FreeSpaceMapVacuumRange. Since the free space map is not fully
WAL-logged, anything we're doing there should be non-critical. Also, I
Agreed and fixed.

think it might be better if MarkBufferDirtyHint stays closer to the
existing coding and just uses a Boolean and an if-test to decide
whether to clear the bit, instead of inventing a new mechanism. I
don't really see anything wrong with the new mechanism, but I think
it's better to keep the patch minimal.
Yeah, that was a a kind of silly. Fixed.

As you say, this doesn't fix the problem that truncation might fail.
But as Andres and Sawada-san said, the solution to that is to get rid
of the comments saying that it's OK for truncation to fail and make it
a PANIC. However, I don't think that change needs to be part of this
patch. Even if we do that, we still need to do this. And even if we do
this, we still need to do that.
Ok. Addition to the aboves, I rewrote the comment in RelatinoTruncate.

+        * Delay the concurrent checkpoint's completion until this truncation
+        * successfully completes, so that we don't establish a redo-point 
between
+        * buffer deletion and file-truncate. Otherwise we can leave 
inconsistent
+        * file content against the WAL records after the REDO position and 
future
+        * recovery fails.

However, a problem for me for now is that I cannot reproduce the
problem.

To avoid further confusion, the attached is named as *.patch.

regards.

Hi. This is my first attempt to review a patch so feel free to tell me if I missed something.

As of today's state of REL_14_STABLE (ef9706bbc8ce917a366e4640df8c603c9605817a), the problem is reproducible using the script provided by Daniel Wood in this (1335373813.287510.1573611814...@connect.xfinity.com) message. Also, the latest patch seems not to be applicable and requires some minor tweaks.


Regards,

Daniel Shelepanov



Reply via email to