Hello Robert, > In general, +1, although I think that the patch needs polishing in a > bunch of areas.
Thanks for your review! > Originally, I thought what we wanted was something like a --in-place > option to pg_combinebackup, allowing the output directory to be the > same as one of the input directories. However, I now think that what > this patch does is likely better, and this patch is a lot simpler to > write than that one would be. With the --in-place option, if I'm > combine backup1 and backup2, I must write either "pg_combinebackup > backup1 backup2 -o backup1" or "pg_combinebackup backup1 backup2 -o > backup2". In either case, I can skip whole-file copies from one of the > two source directories -- whichever one happens to be the output > directory. However, if I write "pg_combinebackup --link backup1 > backup2 -o result", I can skip *all* whole-file copies from *every* > input directory, which seems a lot nicer. > > Furthermore, if all the input directories are staging directories, > basically copies of original backups stored elsewhere, then the fact > that those directories get trashed is of no concern to me. In fact, > they don't even get trashed if pg_combinebackup is interrupted partway > through, because I can just remove the output directory and recreate > it and everything is fine. With an --in-place option, that would be > trickier. I see! I thought of not reinventing the wheel while writing, so I replayed in pg_combinebackup the same behavior implemented in pg_upgrade. And turns out the outcomes sounded convincing. > Regarding the patch itself, I think we need to rethink the test case > changes in particular. As written, the patch won't do any testing of > link mode unless the user sets PG_TEST_PG_COMBINEBACKUP_MODE=--link; > and if they do set that option, then we won't test anything else. > Also, even with that environment variable set, we'll only test --link > mode a bit ... accidentally. The patch doesn't really do anything to > make sure that link mode actually does what it's intended to do. It > only adapts the existing tests not to fail. I think it would be better > to decide that you're not supposed to set > PG_TEST_PG_COMBINEBACKUP_MODE=--link, and then write some new test, > not depending on PG_TEST_PG_COMBINEBACKUP_MODE, that specifically > verifies that link mode behaves as expected. > > After all, link mode is noticeably different from the other copy > modes. Those should all produce equivalent results, or fail outright, > we suppose. This produces a different user-visible result, so we > probably ought to test that we get that result. For example, we might > check that certain files end up with a link count of 1 or 2, as > appropriate. That makes sense to me too. I'm submitting a new patch which includes a new file 010_links.pl. That test takes care of generating full and incremental, backups, then combines them with --link mode and ensures that the hard link count for all files under PGDATA is as expected based on the operations that were applied in the cluster. I kept the "support" for PG_TEST_PG_COMBINEBACKUP_MODE=--link around, just in case someone wants to run that and verify those outcomes with the --link mode. At least now we might have a proper test for checking --link, which always runs independently of the env variable. > Does link() work on Windows? link is a function available only in Linux. The equivalent function in Windows is CreateHardLink. However, in Postgres link works in either of the operating systems because we implement a link function, which wraps the call to CreateHardLink. That's coded in the file src/port/win32link.c > I'm not sure what to do about the issue that using --link trashes the > input cluster if you subsequently start the database or otherwise > modify any hard-linked files. Keep in mind that, for command-line > arguments other than the first, these are incremental backups, and you > already can't run postgres on those directories. Only the first input > directory, which is a full backup not an incremental, is a potential > target for an unexpected database start. I'm tentatively inclined to > think we shouldn't modify the input directories and just emit a > warning like: > > warning: --link mode was used; any modifications to the output > directory may destructively modify input directories The previous patch already had a warning. But I enjoyed your message, so I slightly changed the warning and the docs in the new version of the patch. Best regards, Israel.
v2-0001-pg_combinebackup-add-support-for-hard-links.patch
Description: Binary data