On Wed, Jul 20, 2022 at 8:21 PM Fujii Masao <[email protected]> wrote:
>
> Hi,
>
> I'd like to propose to remove "whichChkpt" and "report" arguments in
> ReadCheckpointRecord(). "report" is obviously useless because it's always
> true, i.e., there are two callers of the function and they always specify
> true as "report".
Yes, the report parameter is obvious to delete. The commit 1d919de5eb
removed the only call with the report parameter as false.
> "whichChkpt" indicates where the specified checkpoint location came from,
> pg_control or backup_label. This information is used to log different
> messages as follows.
>
> switch (whichChkpt)
> {
> case 1:
> ereport(LOG,
> (errmsg("invalid primary
> checkpoint link in control file")));
> break;
> default:
> ereport(LOG,
> (errmsg("invalid checkpoint
> link in backup_label file")));
> break;
> }
> return NULL;
> ...
> switch (whichChkpt)
> {
> case 1:
> ereport(LOG,
> (errmsg("invalid primary
> checkpoint record")));
> break;
> default:
> ereport(LOG,
> (errmsg("invalid checkpoint
> record")));
> break;
> }
> return NULL;
> ...
>
> But the callers of ReadCheckpointRecord() already output different log
> messages depending on where the invalid checkpoint record came from. So even
> if ReadCheckpointRecord() doesn't use "whichChkpt", i.e., use the same log
> message in both pg_control and backup_label cases, users can still identify
> where the invalid checkpoint record came from, by reading the log message.
>
> Also when whichChkpt = 0, "primary checkpoint" is used in the log message and
> sounds confusing because, as far as I recall correctly, we removed the
> concept of primary and secondary checkpoints before.
Yes, using "primary checkpoint" confuses, as we usually refer primary
in the context of replication and HA.
> Therefore I think that it's better to remove "whichChkpt" argument in
> ReadCheckpointRecord().
>
> Patch attached. Thought?
How about we transform the following messages into something like below?
(errmsg("could not locate a valid checkpoint record"))); after
ReadCheckpointRecord() for control file cases to "could not locate
valid checkpoint record in control file"
(errmsg("could not locate required checkpoint record"), after
ReadCheckpointRecord() for backup_label case to "could not locate
valid checkpoint record in backup_label file"
The above messages give more meaningful and unique info to the users.
May be unrelated, IIRC, for the errors like ereport(PANIC,
(errmsg("could not locate a valid checkpoint record"))); we wanted to
add a hint asking users to consider running pg_resetwal to fix the
issue. The error for ReadCheckpointRecord() in backup_label file
cases, already gives a hint errhint("If you are restoring from a
backup, touch \"%s/recovery.signal\" ...... Adding the hint of running
pg_resetwal (of course with a caution that it can cause inconsistency
in the data and use it as a last resort as described in the docs)
helps users and support engineers a lot to mitigate the server down
cases quickly.
Regards,
Bharath Rupireddy.