Re: Add recovery to pg_control and remove backup_label

David Steele Tue, 21 Nov 2023 09:18:11 -0800

On 11/21/23 12:41, Andres Freund wrote:


On 2023-11-21 07:42:42 -0400, David Steele wrote:

On 11/20/23 19:58, Andres Freund wrote:

On 2023-11-21 08:52:08 +0900, Michael Paquier wrote:

On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote:

Given that, I wonder if what we should do is to just add a new field to
pg_control that says "error out if backup_label does not exist", that we set
when creating a streaming base backup


That would mean that one still needs to take an extra step to update a
control file with this byte set, which is something you had a concern
with in terms of compatibility when it comes to external backup
solutions because more steps are necessary to take a backup, no?


I was thinking we'd just set it in the pg_basebackup style path, and we'd
error out if it's set and backup_label is not present. But we'd still use
backup_label without the pg_control flag set.

So it'd just provide a cross-check that backup_label was not removed for
pg_basebackup style backup, but wouldn't do anything for external backups. But
imo the proposal to just use pg_control doesn't actually do anything for
external backups either - which is why I think my proposal would achieve as
much, for a much lower price.
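
A minimal sketch of the cross-check proposed above, written in Python purely for illustration. This is not PostgreSQL code: the flag name `require_backup_label`, the function `decide_startup`, and the `StartupAction` values are all hypothetical, and the real startup logic involves far more state than two booleans.

```python
from enum import Enum


class StartupAction(Enum):
    BACKUP_RECOVERY = "recover using backup_label"
    NORMAL_STARTUP = "start without backup_label"
    ERROR = "refuse to start"


def decide_startup(require_backup_label: bool, backup_label_exists: bool) -> StartupAction:
    """Model of the proposed check.

    require_backup_label stands in for a hypothetical pg_control field that
    would only be set on the pg_basebackup-style path.
    """
    # backup_label is still honored whenever it is present, flag or no flag.
    if backup_label_exists:
        return StartupAction.BACKUP_RECOVERY
    # The flag says a backup_label must exist, but it is gone: error out.
    if require_backup_label:
        return StartupAction.ERROR
    # No flag and no backup_label: behave exactly as today.
    return StartupAction.NORMAL_STARTUP
```

In this model an external backup never sets the flag, so a removed backup_label still falls through to a normal startup, which is the limitation discussed in the rest of the thread.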


I'm not sure why you think the patch under discussion doesn't do anything
for external backups. It provides the same benefits to both pg_basebackup
and external backups, i.e. they both receive the updated version of
pg_control.


Sure. They also receive a backup_label today. If an external solution forgets
to replace pg_control copied as part of the filesystem copy, they won't get an
error after the removal of backup_label, just like they don't get one today if
they don't put backup_label in the data directory.  Given that users don't do
the right thing with backup_label today, why can we rely on them doing the
right thing with pg_control?

I think reliable backup software does the right thing with backup_label, but if the user starts getting errors on recovery they then decide to remove backup_label. I know we can't do much about bad backup software, but we can at least make this a bit more resistant to user error after the fact.

It doesn't help that one of our hints suggests removing backup_label. In highly automated systems, the user might not even know they just restored from a backup. They are only in the loop because the restore failed and they are trying to figure out what is going wrong. When they remove backup_label the cluster comes up just fine. Victory!

This is the scenario I've seen most often -- not the backup/restore process getting it wrong but the user removing backup_label on their own initiative. And because it yields such a positive result, at least initially, they remember in the future that the thing to do is to remove backup_label whenever they see the error.

If they only have pg_control, then their only choice is to get it right or run pg_resetwal. Most users have no knowledge of pg_resetwal so it will take them longer to get there. Also, I think that tool makes it pretty clear that corruption will result and the only thing to do is a logical dump and restore after using it.

There are plenty of ways a user can mess things up. What I'd like to prevent is the appearance of everything being OK when in fact they have corrupted their cluster. That's the situation we have now with backup_label. Is this new solution perfect? No, but I do think it checks several boxes, and is a worthwhile improvement.


Regards,
-David
