On 10/14/23 11:30, David Steele wrote:
On 10/12/23 10:19, David Steele wrote:
On 10/11/23 18:10, Thomas Munro wrote:
As Stephen mentioned[1], we could perhaps also complain if both backup
label and control file exist, and then hint that the user should
remove the *control file* (not the backup label!). I had originally
suggested we would just overwrite the control file, but by explicitly
complaining about it we would also bring the matter to tool/script
authors' attention, ie that they shouldn't be backing that file up, or
should be removing it in a later step if they copy everything. He
also mentions that there doesn't seem to be anything stopping us from
back-patching changes to the backup label contents if we go this way.
I don't have a strong opinion on that and we could leave the question
for later.
I'm worried about the possibility of back patching this unless the
solution comes out to be simpler than I think and that rarely comes to
pass. Surely throwing errors on something that is currently valid
(i.e. backup_label and pg_control both present).
But perhaps there is a simpler, acceptable solution we could back
patch (transparent to all parties except Postgres) and then a more
advanced solution we could go forward with.
I guess I had better get busy on this.
Attached is a very POC attempt at something along these lines that could
be back patched. I stopped when I realized excluding pg_control from the
backup is a necessity to make this work (else we still could end up with
a torn copy of pg_control even if there is a good copy elsewhere). To
enumerate the back patch issues as I see them:
Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need for
recovery into pg_control and return *that* from pg_backup_stop() and
tell the user to store it with their backup. We already have "These
files are vital to the backup working and must be written byte for byte
without modification, which may require opening the file in binary
mode." in the documentation so dealing with pg_control should not be a
problem. pg_control also has a CRC so we will know if it gets munged.
It doesn't really matter where/how they store pg_control as long as it
is written back into PGDATA before the cluster starts. If
backupEndRequired, etc., are set appropriately then recovery will do the
right thing when it starts, just as now if PG crashes after it has
renamed backup_label but before recovery to consistency has completed.
We can still enforce the presence of recovery.signal by checking
backupEndRequired if that's something we want but it seems like
backupEndRequired would be enough. I'm fine either way.
Another thing we can do here is make backup from standby easier. The
user won't need to worry about *when* pg_control is copied. We can just
write the ideal min recovery point into pg_control.
Any informational data currently in backup_label can be returned as
columns (like the start/end lsn is now).
This makes the patch much less invasive and while it definitely,
absolutely cannot be back patched, it seems like a good way forward.
This is the direction I'm planning to work on patch-wise but I'd like to
hear people's thoughts.
Regards,
-David