On 2021-Sep-03, Kyotaro Horiguchi wrote:
> At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera <[email protected]>
> wrote in
> 0002:
> > + XLogRecPtr abortedContrecordPtr; /* LSN of incomplete record at
> > end of
> > + *
> > WAL */
>
> The name sounds like the start LSN. doesn't contrecordAbort(ed)Ptr work?
I went over various iterations of the name of this, and still not
entirely happy. I think we need to convey the ideas that
* This is the endptr+1 of the known-good part of the record, that is,
the beginning of the next part of the record. I think "endPtr"
summarizes this well; we use this name elsewhere.
* At some point before recovery, this was the last WAL record that
existed
* there is an invalid contrecord, or we were looking for a contrecord
and found invalid data
* this record is incomplete
So maybe
1. incompleteRecEndPtr
2. finalInvalidRecEndPtr
3. brokenContrecordEndPtr
> > if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
> > {
> > report_invalid_record(state,
> >
> > "there is no contrecord flag at %X/%X",
> >
> > LSN_FORMAT_ARGS(RecPtr));
> > - goto err;
> > + goto aborted_contrecord;
>
> This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
> _IS_ABROTED_PARTIAL. Is it okay? (I don't object to remove the check.).
Yeah, I was unsure about that. I think it's good to have it as a
cross-check, though it should never occur. I'll put it back.
Another related point is whether it's a good idea to have the ereport()
about the bit appearing in a not-start-of-page address being a PANIC.
If we degrade to WARNING then it'll be lost in the noise, but I'm not
sure what else can we do. (If it's a PANIC, then you end up with an
unusable database).
> I didin't thought this as an aborted contrecord. but on second
> thought, when we see a record broken in any style, we stop recovery at
> the point. I agree to the change and all the silmiar changes.
>
> + /* XXX should we goto
> aborted_contrecord here? */
>
> I think it should be aborted_contrecord.
>
> When that happens, the loaded bytes actually looked like the first
> fragment of a continuation record to xlogreader, even if the cause
> were a broken total_len. So if we abort the record there, the next
> time xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL at the same
> page, and correctly finds a new record there.
>
> On the other hand if we just errored-out there, we will step-back to
> the beginning of the broken record in the previous page or segment
> then start writing a new record there but that is exactly what we want
> to avoid now.
Hmm, yeah, makes sense.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/