On Tue, Sep 10, 2024 at 1:31 AM Robert Haas <robertmh...@gmail.com> wrote: > > I would rather that you didn't add simple_ptr_list_destroy_deep() > given that you don't need it for this patch series. > Done.
> + > "\"%s\" unexpected file in the tar format backup", > > This doesn't seem grammatical to me. Perhaps change this to: file > \"%s\" is not expected in a tar format backup > Ok, updated in the attached version. > + /* We are only interested in files that are not in the ignore list. */ > + if (member->is_directory || member->is_link || > + should_ignore_relpath(mystreamer->context, member->pathname)) > + return; > > Doesn't this need to happen after we add pg_tblspc/$OID to the path, > rather than before? I bet this doesn't work correctly for files in > user-defined tablespaces, compared to the way it work for a > directory-format backup. > > + char temp[MAXPGPATH]; > + > + /* Copy original name at temporary space */ > + memcpy(temp, member->pathname, MAXPGPATH); > + > + snprintf(member->pathname, MAXPGPATH, "%s/%d/%s", > + "pg_tblspc", mystreamer->tblspc_oid, temp); > > I don't like this at all. This function doesn't have any business > modifying the astreamer_member, and it doesn't need to. It can just do > char *pathname; char tmppathbuf[MAXPGPATH] and then set pathname to > either member->pathname or tmppathbuf depending on > OidIsValid(tblspcoid). Also, shouldn't this be using %u, not %d? > True, fixed in the attached version. > + mystreamer->mfile = (void *) m; > > Either the cast to void * isn't necessary, or it indicates that > there's a type mismatch that should be fixed. > Fixed -- was added in the very first version and forgotten in later updates. > + * We could have these checks while receiving contents. However, since > + * contents are received in multiple iterations, this would result in > + * these lengthy checks being performed multiple times. Instead, > setting > + * flags here and using them before proceeding with verification will > be > + * more efficient. > > Seems unnecessary to explain this. > Removed. 
> + Assert(mystreamer->verify_checksum); > > + /* Should have came for the right file */ > + Assert(strcmp(member->pathname, m->pathname) == 0); > > + /* > + * The checksum context should match the type noted in the backup > + * manifest. > + */ > + Assert(checksum_ctx->type == m->checksum_type); > > What do you think about: > > Assert(m != NULL && !m->bad); > Assert(checksum_ctx->type == m->checksum_type); > Assert(strcmp(member->pathname, m->pathname) == 0); > > Or possibly change the first one to Assert(should_verify_checksum(m))? > LGTM. > + memcpy(&control_file, streamer->bbs_buffer.data, > sizeof(ControlFileData)); > > This probably doesn't really hurt anything, but it's a bit ugly. You > first use astreamer_buffer_until() to force the entire file into a > buffer. And then here, you copy the entire file into a second buffer > which is exactly the same except that it's guaranteed to be properly > aligned. It would be possible to include a ControlFileData in > astreamer_verify and copy the bytes directly into it (you'd need a > second astreamer_verify field for the number of bytes already copied > into that structure). I'm not 100% sure that's worth the code, but it > seems like it wouldn't take more than a few lines, so perhaps it is. > I think we could skip this memcpy() and directly cast streamer->bbs_buffer.data to ControlFileData *, as we already ensure that the correct length is being read just before this memcpy(). Did the same in the attached version. > +/* > + * Verify plain backup. > + */ > +static void > +verify_plain_backup(verifier_context *context) > +{ > + Assert(context->format == 'p'); > + verify_backup_directory(context, NULL, context->backup_directory); > +} > + > > This seems like a waste of space. > Yeah, but I'd prefer to keep the function name self-explanatory and consistent with the naming style. > +verify_tar_backup(verifier_context *context) > > I like this a lot better now!
I'm still not quite sure about the > decision to have the ignore list apply to both the backup directory > and the tar file contents -- but given the low participation on this > thread, I don't think we have much chance of getting feedback from > anyone else right now, so let's just do it the way you have it and we > can change it later if someone shows up to complain. > Ok. > +verify_all_tar_files(verifier_context *context, SimplePtrList *tarfiles) > > I think this code could be moved into its only caller instead of > having a separate function. And then if you do that, maybe > verify_one_tar_file could be renamed to just verify_tar_file. Or > perhaps that function could also be removed and just move the code > into the caller. It's not much code and not very deeply nested. > Similarly create_archive_verifier() could be considered for this > treatment. Maybe inlining all of these is too much and the result will > look messy, but I think we should at least try to combine some of > them. > I have removed verify_all_tar_files() and renamed verify_one_tar_file() as suggested. However, I can't merge further because I need verify_tar_file() (formerly verify_one_tar_file()) to remain a separate function. This way, regardless of whether it succeeds or encounters an error, I can easily perform cleanup afterward. On Tue, Sep 10, 2024 at 10:54 PM Robert Haas <robertmh...@gmail.com> wrote: > > + pg_log_error("pg_waldump cannot read from a tar"); > > "tar" isn't normally used as a noun as you do here, so I think this > should say "pg_waldump cannot read tar files". > Done. > Technically, the position of this check could lead to an unnecessary > failure, if -n wasn't specified but pg_wal.tar(.whatever) also doesn't > exist on disk. But I think it's OK to ignore that case. 
> > However, I also notice this change to the TAP tests in a few places: > > - [ 'pg_verifybackup', $tempdir ], > + [ 'pg_verifybackup', '-Fp', $tempdir ], > > It's not the end of the world to have to make a change like this, but > it seems easy to do better. Just postpone the call to > find_backup_format() until right after we call parse_manifest_file(). > That also means postponing the check mentioned above until right after > that, but that's fine: after parse_manifest_file() and then > find_backup_format(), you can do if (!no_parse_wal && context.format > == 't') { bail out }. > Done. > + if (stat(path, &sb) == 0) > + result = 'p'; > + else > + { > + if (errno != ENOENT) > + { > + pg_log_error("cannot determine backup format : %m"); > + pg_log_error_hint("Try \"%s --help\" for more > information.", progname); > + exit(1); > + } > + > + /* Otherwise, it is assumed to be a tar format backup. */ > + result = 't'; > + } > > This doesn't look good, for a few reasons: > > 1. It would be clearer to structure this as if (stat(...) == 0) result > = 'p'; else if (errno == ENOENT) result = 't'; else { report an error; > } instead of the way you have it. > > 2. "cannot determine backup format" is not an appropriate way to > report the failure of stat(). The appropriate message is "could not > stat file \"%s\"". > > 3. It is almost never correct to put a space before a colon in an error > message. > > 4. The hint doesn't look helpful, or necessary. I think you can just > delete that. > > Regarding both point #2 and point #4, I think we should ask ourselves > how stat() could realistically fail here. On my system (macOS), the > document failed modes for stat() are EACCES (i.e. permission denied), > EFAULT (i.e. we have a bug in pg_verifybackup), EIO (I/O Error), ELOOP > (symlink loop), ENAMETOOLONG, ENOENT, ENOTDIR, and EOVERFLOW. In none > of those cases does it seem likely that specifying the format manually > will help anything. 
Thus, suggesting that the user look at the help, > presumably to find --format, is unlikely to solve anything, and > telling them that the error happened while trying to determine the > backup format isn't really helping anything, either. What the user > needs to know is that it was stat() that failed, and the pathname for > which it failed. Then they need to sort out whatever problem is > causing them to get one of those really weird errors. > Done. > - of the backup. The backup must be stored in the "plain" > - format; a "tar" format backup can be checked after extracting it. > + of the backup. The backup must be stored in the "plain" or "tar" > + format. Verification is supported for <literal>gzip</literal>, > + <literal>lz4</literal>, and <literal>zstd</literal> compressed tar > backup; > + any other compressed format backups can be checked after decompressing > them. > > I don't think that we need to say that the backup must be stored in > the plain or tar format, because those are the only backup formats > pg_basebackup knows about. Similarly, it doesn't seem help to me to > enumerate all the compression algorithms that pg_basebackup supports > and say we only support those; what else would a user expect? > > What I would do is replace the original sentence ("The backup must be > stored...") with: The backup may be stored either in the "plain" or > the "tar" format; this includes "tar" backups compressed with any > algorithm supported by pg_basebackup. However, at present, WAL > verification is supported only for plain-format backups. Therefore, if > the backup is stored in "tar" format, the <literal>-n, > --no-parse-wal<literal> option should be used. > Done > + # Add tar backup format option > + push @backup, ('-F', 't'); > + # Add switch to skip WAL verification. > + push @verify, ('-n'); > > Say why, not what. The second comment should say something like "WAL > verification not yet supported for tar-format backups". > Done. 
> + "$format backup fails with algorithm \"$algorithm\""); > + $primary->command_ok(\@backup, "$format backup ok with > algorithm \"$algorithm\""); > + ok(-f "$backup_path/backup_manifest", "$format backup > manifest exists"); > + "verify $format backup with algorithm \"$algorithm\""); > > Personally I would change "$format" to "$format format" in all of > these places, so that we talk about a "tar format backup" or a "plain > format backup" instead of a "tar backup" or a "plain backup". > Done. > + 'skip_on_windows' => 1 > > I don't understand why 4 of the 7 new tests are skipped on Windows. > The existing "skip" message for this parameter says "unix-style > permissions not supported on Windows" but that doesn't seem applicable > for any of the new cases, and I couldn't find a comment about it, > either. > I was a bit unsure whether Windows could handle unpacking and repacking tar files and the required path formats for these tests, but the "Cirrus CI / Windows - Server 2019, VS 2019" workflow doesn’t have any issues with them. I’ve removed the flag. > + my @files = glob("*"); > + system_or_bail($tar, '-cf', "$backup_path/$archive", @files); > > Why not just system_or_bail($tar, '-cf', "$backup_path/$archive", '.')? > That doesn't work, since re-packing with "." puts "./" at the beginning of each member path. > Also, instead of having separate entries in the test array to do > basically the same thing on Windows, could we just iterate through the > test array twice and do everything once for plain format and then a > second time for tar format, and do the tests once for each? I don't > think that idea QUITE works, because the open_file_fails, > open_directory_fails, and search_directory_fails tests are really not > applicable to tar format. But we could rename skip_on_windows to > tests_file_permissions and skip those both on Windows and for tar > format.
But aside from that, I don't quite see why it makes sense to, > for example, test extra_file for both formats but not > extra_tablespace_file, and indeed it seems like an important bit of > test coverage. > Added extra_file and missing_file tests for the tablespace as well. > I also feel like we should have tests someplace that add extra files > to a tar-format backup in the backup directory (e.g. 1234567.tar, or > wug.tar, or 123456.wug) or remove entire files. > Unless I am missing something, the tar_backup_unexpected_file test does that. I have added a test that removes the tablespace archive in the attached version. The updated version is attached. Thank you for the review! Regards, Amul
From e02e69b65e70553313ea1287e65d77cd34a00688 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Wed, 14 Aug 2024 10:42:37 +0530 Subject: [PATCH v14 06/12] Refactor: split verify_backup_file() function and rename it. The function verify_backup_file() has now been renamed to verify_plain_backup_file() to make it clearer that it is specifically used for verifying files in a plain backup. Similarly, in a future patch, we would have a verify_tar_backup_file() function for verifying TAR backup files. In addition to that, moved the manifest entry verification code into a new function called verify_manifest_entry() so that it can be reused for tar backup verification. If verify_manifest_entry() doesn't find an entry, it reports an error as before and returns NULL to the caller. This is why a NULL check is added to should_verify_checksum(). --- src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++-------- src/bin/pg_verifybackup/pg_verifybackup.h | 6 ++- 2 files changed, 42 insertions(+), 22 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index 3fcfb167217..5bfc98e7874 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -64,8 +64,8 @@ static void report_manifest_error(JsonManifestParseContext *context, static void verify_backup_directory(verifier_context *context, char *relpath, char *fullpath); -static void verify_backup_file(verifier_context *context, - char *relpath, char *fullpath); +static void verify_plain_backup_file(verifier_context *context, + char *relpath, char *fullpath); static void verify_control_file(const char *controlpath, uint64 manifest_system_identifier); static void report_extra_backup_files(verifier_context *context); @@ -570,7 +570,7 @@ verify_backup_directory(verifier_context *context, char *relpath, newrelpath = psprintf("%s/%s", relpath, filename); if (!should_ignore_relpath(context, newrelpath)) - 
verify_backup_file(context, newrelpath, newfullpath); + verify_plain_backup_file(context, newrelpath, newfullpath); pfree(newfullpath); pfree(newrelpath); @@ -591,7 +591,8 @@ verify_backup_directory(verifier_context *context, char *relpath, * verify_backup_directory. */ static void -verify_backup_file(verifier_context *context, char *relpath, char *fullpath) +verify_plain_backup_file(verifier_context *context, char *relpath, + char *fullpath) { struct stat sb; manifest_file *m; @@ -627,6 +628,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath) return; } + /* Check the backup manifest entry for this file. */ + m = verify_manifest_entry(context, relpath, sb.st_size); + + /* + * Validate the manifest system identifier, not available in manifest + * version 1. + */ + if (context->manifest->version != 1 && + strcmp(relpath, "global/pg_control") == 0 && + m != NULL && m->matched && !m->bad) + verify_control_file(fullpath, context->manifest->system_identifier); + + /* Update statistics for progress report, if necessary */ + if (show_progress && !context->skip_checksums && + should_verify_checksum(m)) + total_size += m->size; +} + +/* + * Verify file and its size entry in the manifest. + */ +manifest_file * +verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize) +{ + manifest_file *m; + /* Check whether there's an entry in the manifest hash. */ m = manifest_files_lookup(context->manifest->files, relpath); if (m == NULL) @@ -634,40 +661,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath) report_backup_error(context, "\"%s\" is present on disk but not in the manifest", relpath); - return; + return NULL; } /* Flag this entry as having been encountered in the filesystem. */ m->matched = true; /* Check that the size matches. 
*/ - if (m->size != sb.st_size) + if (m->size != filesize) { report_backup_error(context, "\"%s\" has size %lld on disk but size %zu in the manifest", - relpath, (long long int) sb.st_size, m->size); + relpath, (long long int) filesize, m->size); m->bad = true; } - /* - * Validate the manifest system identifier, not available in manifest - * version 1. - */ - if (context->manifest->version != 1 && - strcmp(relpath, "global/pg_control") == 0) - verify_control_file(fullpath, context->manifest->system_identifier); - - /* Update statistics for progress report, if necessary */ - if (show_progress && !context->skip_checksums && - should_verify_checksum(m)) - total_size += m->size; - /* * We don't verify checksums at this stage. We first finish verifying that * we have the expected set of files with the expected sizes, and only * afterwards verify the checksums. That's because computing checksums may * take a while, and we'd like to report more obvious problems quickly. */ + + return m; } /* @@ -830,7 +846,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m, /* * Double-check that we read the expected number of bytes from the file. - * Normally, a file size mismatch would be caught in verify_backup_file + * Normally, a file size mismatch would be caught in verify_manifest_entry * and this check would never be reached, but this provides additional * safety and clarity in the event of concurrent modifications or * filesystem misbehavior. 
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index d8c566ed587..ff9476e356e 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -37,7 +37,8 @@ typedef struct manifest_file } manifest_file; #define should_verify_checksum(m) \ - (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE)) + (((m) != NULL) && ((m)->matched) && !((m)->bad) && \ + (((m)->checksum_type) != CHECKSUM_TYPE_NONE)) /* * Define a hash table which we can use to store information about the files @@ -93,6 +94,9 @@ typedef struct verifier_context bool saw_any_error; } verifier_context; +extern manifest_file *verify_manifest_entry(verifier_context *context, + char *relpath, int64 filesize); + extern void report_backup_error(verifier_context *context, const char *pg_restrict fmt,...) pg_attribute_printf(2, 3); -- 2.18.0
From 01286f76085dcaf72238eb6ad0a80f38d9ee05a3 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Wed, 14 Aug 2024 15:14:15 +0530 Subject: [PATCH v14 07/12] Refactor: split verify_file_checksum() function. Move the core functionality of verify_file_checksum to a new function to reuse it instead of duplicating the code. The verify_file_checksum() function is designed to take a file path, open and read the file contents, and then calculate the checksum. However, for TAR backups, instead of a file path, we receive the file content in chunks, and the checksum needs to be calculated incrementally. While the initial operations for plain and TAR backup checksum calculations differ, the final checks and error handling are the same. By moving the shared logic to a separate function, we can reuse the code for both types of backups. --- src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++--- src/bin/pg_verifybackup/pg_verifybackup.h | 3 +++ 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index 5bfc98e7874..e44d0377cd5 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -792,8 +792,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m, int fd; int rc; size_t bytes_read = 0; - uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH]; - int checksumlen; /* Open the target file. */ if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0) @@ -844,6 +842,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m, if (rc < 0) return; + /* Do the final computation and verification. */ + verify_checksum(context, m, &checksum_ctx, bytes_read); +} + +/* + * A helper function to finalize checksum computation and verify it against the + * backup manifest information. 
+ */ +void +verify_checksum(verifier_context *context, manifest_file *m, + pg_checksum_context *checksum_ctx, int64 bytes_read) +{ + const char *relpath = m->pathname; + int checksumlen; + uint8 checksumbuf[PG_CHECKSUM_MAX_LENGTH]; + /* * Double-check that we read the expected number of bytes from the file. * Normally, a file size mismatch would be caught in verify_manifest_entry @@ -860,7 +874,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m, } /* Get the final checksum. */ - checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf); + checksumlen = pg_checksum_final(checksum_ctx, checksumbuf); if (checksumlen < 0) { report_backup_error(context, diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index ff9476e356e..fe0ce8a89aa 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -96,6 +96,9 @@ typedef struct verifier_context extern manifest_file *verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize); +extern void verify_checksum(verifier_context *context, manifest_file *m, + pg_checksum_context *checksum_ctx, + int64 bytes_read); extern void report_backup_error(verifier_context *context, const char *pg_restrict fmt,...) -- 2.18.0
From 616f5a911dbc4c82e999d3b64a98bec7c9f557a8 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Wed, 21 Aug 2024 10:51:23 +0530 Subject: [PATCH v14 08/12] Refactor: split verify_control_file. Separated the control file verification code into a new function and introduced the should_verify_control_data() macro, similar to should_verify_checksum(). Like verify_file_checksum(), verify_control_file() is designed to accept the pg_control file path, which it opens and whose contents it then verifies. But in the case of a tar backup, we have the pg_control file contents instead, and they need to be verified in the same way. For that reason, the verification code has been moved into a separate function so that it can be reused for tar backup verification as well. --- src/bin/pg_verifybackup/pg_verifybackup.c | 42 ++++++++++------------- src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++ 2 files changed, 33 insertions(+), 23 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index e44d0377cd5..d04e1d8c8ac 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -66,8 +66,6 @@ static void verify_backup_directory(verifier_context *context, char *relpath, char *fullpath); static void verify_plain_backup_file(verifier_context *context, char *relpath, char *fullpath); -static void verify_control_file(const char *controlpath, - uint64 manifest_system_identifier); static void report_extra_backup_files(verifier_context *context); static void verify_backup_checksums(verifier_context *context); static void verify_file_checksum(verifier_context *context, @@ -631,14 +629,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath, /* Check the backup manifest entry for this file. 
*/ m = verify_manifest_entry(context, relpath, sb.st_size); - /* - * Validate the manifest system identifier, not available in manifest - * version 1. - */ - if (context->manifest->version != 1 && - strcmp(relpath, "global/pg_control") == 0 && - m != NULL && m->matched && !m->bad) - verify_control_file(fullpath, context->manifest->system_identifier); + /* Validate the pg_control information */ + if (should_verify_control_data(context->manifest, m)) + { + ControlFileData *control_file; + bool crc_ok; + + pg_log_debug("reading \"%s\"", fullpath); + control_file = get_controlfile_by_exact_path(fullpath, &crc_ok); + + verify_control_data(control_file, fullpath, crc_ok, + context->manifest->system_identifier); + /* Release memory. */ + pfree(control_file); + } /* Update statistics for progress report, if necessary */ if (show_progress && !context->skip_checksums && @@ -687,18 +691,13 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize) } /* - * Sanity check control file and validate system identifier against manifest - * system identifier. + * Sanity check control file data and validate system identifier against + * manifest system identifier. */ -static void -verify_control_file(const char *controlpath, uint64 manifest_system_identifier) +void +verify_control_data(ControlFileData *control_file, const char *controlpath, + bool crc_ok, uint64 manifest_system_identifier) { - ControlFileData *control_file; - bool crc_ok; - - pg_log_debug("reading \"%s\"", controlpath); - control_file = get_controlfile_by_exact_path(controlpath, &crc_ok); - /* Control file contents not meaningful if CRC is bad. */ if (!crc_ok) report_fatal_error("%s: CRC is incorrect", controlpath); @@ -714,9 +713,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier) controlpath, (unsigned long long) manifest_system_identifier, (unsigned long long) control_file->system_identifier); - - /* Release memory. 
*/ - pfree(control_file); } /* diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index fe0ce8a89aa..818064c6eed 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -40,6 +40,17 @@ typedef struct manifest_file (((m) != NULL) && ((m)->matched) && !((m)->bad) && \ (((m)->checksum_type) != CHECKSUM_TYPE_NONE)) +/* + * Validate the control file and its system identifier against the manifest + * system identifier. Note that this feature is not available in manifest + * version 1. This validation should only be performed after the manifest entry + * validation for the pg_control file has been completed without errors. + */ +#define should_verify_control_data(manifest, m) \ + (((manifest)->version != 1) && \ + ((m) != NULL) && ((m)->matched) && !((m)->bad) && \ + (strcmp((m)->pathname, "global/pg_control") == 0)) + /* * Define a hash table which we can use to store information about the files * mentioned in the backup manifest. @@ -99,6 +110,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context, extern void verify_checksum(verifier_context *context, manifest_file *m, pg_checksum_context *checksum_ctx, int64 bytes_read); +extern void verify_control_data(ControlFileData *control_file, + const char *controlpath, bool crc_ok, + uint64 manifest_system_identifier); extern void report_backup_error(verifier_context *context, const char *pg_restrict fmt,...) -- 2.18.0
From 8b60229b2ae59e7c01a2fdb83614b1aa362d52ab Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Thu, 8 Aug 2024 16:01:33 +0530 Subject: [PATCH v14 09/12] Add simple_ptr_list_destroy() and simple_ptr_list_destroy_deep() API. We didn't have any helper function to destroy SimplePtrList, likely because it wasn't needed before, but it's required in a later patch in this set. I've added two functions for this purpose, inspired by list_free() and list_free_deep(). --- src/fe_utils/simple_list.c | 19 +++++++++++++++++++ src/include/fe_utils/simple_list.h | 1 + 2 files changed, 20 insertions(+) diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c index 2d88eb54067..c07e6bd9180 100644 --- a/src/fe_utils/simple_list.c +++ b/src/fe_utils/simple_list.c @@ -173,3 +173,22 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr) list->head = cell; list->tail = cell; } + +/* + * Destroy only pointer list and not the pointed-to element + */ +void +simple_ptr_list_destroy(SimplePtrList *list) +{ + SimplePtrListCell *cell; + + cell = list->head; + while (cell != NULL) + { + SimplePtrListCell *next; + + next = cell->next; + pg_free(cell); + cell = next; + } +} diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h index d42ecded8ed..c83ab6f77e4 100644 --- a/src/include/fe_utils/simple_list.h +++ b/src/include/fe_utils/simple_list.h @@ -66,5 +66,6 @@ extern void simple_string_list_destroy(SimpleStringList *list); extern const char *simple_string_list_not_touched(SimpleStringList *list); extern void simple_ptr_list_append(SimplePtrList *list, void *ptr); +extern void simple_ptr_list_destroy(SimplePtrList *list); #endif /* SIMPLE_LIST_H */ -- 2.18.0
From d253d21df5f3137461e6410b2ee590ca052a3aea Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Wed, 21 Aug 2024 11:03:44 +0530 Subject: [PATCH v14 10/12] pg_verifybackup: Add backup format and compression option ---- NOTE: This patch is not meant to be committed separately. It should be squashed with the next patch that implements tar format support. ---- --- src/bin/pg_verifybackup/pg_verifybackup.c | 68 ++++++++++++++++++++++- src/bin/pg_verifybackup/pg_verifybackup.h | 1 + 2 files changed, 67 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index d04e1d8c8ac..c1542983b93 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -62,6 +62,7 @@ static void report_manifest_error(JsonManifestParseContext *context, const char *fmt,...) pg_attribute_printf(2, 3) pg_attribute_noreturn(); +static char find_backup_format(verifier_context *context); static void verify_backup_directory(verifier_context *context, char *relpath, char *fullpath); static void verify_plain_backup_file(verifier_context *context, @@ -97,6 +98,7 @@ main(int argc, char **argv) {"exit-on-error", no_argument, NULL, 'e'}, {"ignore", required_argument, NULL, 'i'}, {"manifest-path", required_argument, NULL, 'm'}, + {"format", required_argument, NULL, 'F'}, {"no-parse-wal", no_argument, NULL, 'n'}, {"progress", no_argument, NULL, 'P'}, {"quiet", no_argument, NULL, 'q'}, @@ -118,6 +120,7 @@ main(int argc, char **argv) progname = get_progname(argv[0]); memset(&context, 0, sizeof(context)); + context.format = '\0'; if (argc > 1) { @@ -154,7 +157,7 @@ main(int argc, char **argv) simple_string_list_append(&context.ignore_list, "recovery.signal"); simple_string_list_append(&context.ignore_list, "standby.signal"); - while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1) + while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, 
NULL)) != -1) { switch (c) { @@ -173,6 +176,15 @@ main(int argc, char **argv) manifest_path = pstrdup(optarg); canonicalize_path(manifest_path); break; + case 'F': + if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0) + context.format = 'p'; + else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0) + context.format = 't'; + else + pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"", + optarg); + break; case 'n': no_parse_wal = true; break; @@ -261,6 +273,21 @@ main(int argc, char **argv) */ context.manifest = parse_manifest_file(manifest_path); + /* Determine the backup format if it hasn't been specified. */ + if (context.format == '\0') + context.format = find_backup_format(&context); + + /* + * XXX: In the future, we should consider enhancing pg_waldump to read + * WAL files from the tar file. + */ + if (!no_parse_wal && context.format == 't') + { + pg_log_error("pg_waldump cannot read tar files"); + pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup."); + exit(1); + } + /* * Now scan the files in the backup directory. At this stage, we verify * that every file on disk is present in the manifest and that the sizes @@ -279,8 +306,13 @@ main(int argc, char **argv) /* * Now do the expensive work of verifying file checksums, unless we were * told to skip it. + * + * We were only checking the plain backup here. For the tar backup, file + * checksums verification (if requested) will be done immediately when the + * file is accessed, as we don't have random access to the files like we + * do with plain backups. */ - if (!context.skip_checksums) + if (!context.skip_checksums && context.format == 'p') verify_backup_checksums(&context); /* @@ -981,6 +1013,37 @@ should_ignore_relpath(verifier_context *context, const char *relpath) return false; } +/* + * To detect the backup format, it checks for the PG_VERSION file in the backup + * directory. 
If found, it will be considered a plain-format backup; otherwise, + * it will be assumed to be a tar-format backup. + */ +static char +find_backup_format(verifier_context *context) +{ + char result; + char *path; + struct stat sb; + + /* Should be here only if the backup format is unknown */ + Assert(context->format == '\0'); + + /* Check PG_VERSION file. */ + path = psprintf("%s/%s", context->backup_directory, "PG_VERSION"); + if (stat(path, &sb) == 0) + result = 'p'; + else if (errno == ENOENT) + result = 't'; + else + { + pg_log_error("could not stat file \"%s\": %m", path); + exit(1); + } + pfree(path); + + return result; +} + /* * Print a progress report based on the global variables. * @@ -1038,6 +1101,7 @@ usage(void) printf(_(" -e, --exit-on-error exit immediately on error\n")); printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n")); printf(_(" -m, --manifest-path=PATH use specified path for manifest\n")); + printf(_(" -F, --format=p|t backup format (plain, tar)\n")); printf(_(" -n, --no-parse-wal do not try to parse WAL files\n")); printf(_(" -P, --progress show progress information\n")); printf(_(" -q, --quiet do not print any output, except for errors\n")); diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index 818064c6eed..80031ad4dbc 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -100,6 +100,7 @@ typedef struct verifier_context manifest_data *manifest; char *backup_directory; SimpleStringList ignore_list; + char format; /* backup format: p(lain)/t(ar) */ bool skip_checksums; bool exit_on_error; bool saw_any_error; -- 2.18.0
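As an aside for readers following the thread: the autodetection rule added by find_backup_format() above is simple enough to sketch standalone. The helper below is hypothetical (it mirrors the patch's logic but returns '\0' on error instead of calling pg_log_error() and exiting): a backup directory containing PG_VERSION is treated as a plain-format backup, ENOENT means a tar-format backup, and any other stat() failure is an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical standalone sketch of the rule in find_backup_format():
 * PG_VERSION present => plain format ('p'), PG_VERSION absent (ENOENT)
 * => tar format ('t'), any other stat() failure => '\0' (the real code
 * reports the error and exits instead).
 */
static char
detect_backup_format(const char *backup_directory)
{
	char		path[4096];
	struct stat sb;

	snprintf(path, sizeof(path), "%s/%s", backup_directory, "PG_VERSION");
	if (stat(path, &sb) == 0)
		return 'p';
	if (errno == ENOENT)
		return 't';
	return '\0';				/* caller should report the stat() error */
}
```

Note that a missing backup directory also produces ENOENT for the PG_VERSION stat() and is therefore classified as tar format; the patch relies on the later opendir() in the tar path to turn that into a fatal error.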
From 68a85869b9a8c3b872035661df85976c890116f4 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Wed, 21 Aug 2024 12:49:04 +0530 Subject: [PATCH v14 11/12] pg_verifybackup: Read tar files and verify their contents This patch implements TAR format backup verification. For progress reporting support, we perform this verification in two passes: the first pass calculates total_size, and the second pass updates done_size as verification progresses. For the verification, in the first pass, we call precheck_tar_backup_file(), which performs basic verification by expecting only base.tar, pg_wal.tar, or <tablespaceoid>.tar files and raises an error for any other files. It also determines the compression type of the archive file. All this information is stored in a newly added tar_file struct, which is appended to a list that will be used in the second pass for the final verification. In the second pass, the tar archives are read, decompressed, and the required verification is carried out. For reading and decompression, fe_utils/astreamer.h is used. For verification, a new archive streamer has been added in astreamer_verify.c to handle TAR member files and their contents; see astreamer_verify_content() for details. The stack of astreamers will be set up for each TAR file in verify_tar_backup(), depending on its compression type, which is detected in the first pass. When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER) is received, we first verify its entry against the backup manifest. We then decide if further checks are needed, such as checksum verification and control data verification (if it is a pg_control file), once the member file contents are received. Although this decision could be made when the contents are received, it is more efficient to make it earlier since the member file contents are received in multiple iterations. 
In short, we process ASTREAMER_MEMBER_CONTENTS multiple times but only once for other ASTREAMER_MEMBER_* cases. We maintain this information in the astreamer_verify structure for each member file, which is reset when the file ends. Unlike in a plain backup, checksum verification here occurs in two steps. First, as the contents are received, the checksum is computed incrementally (see member_compute_checksum). Then, at the end of processing the member file, the final verification is performed (see member_verify_checksum). Similarly, during the content receiving stage, if the file is pg_control, the data will be copied into a local buffer (see member_copy_control_data). The verification will then be carried out at the end of the member file processing (see member_verify_control_data) --- src/bin/pg_verifybackup/Makefile | 2 + src/bin/pg_verifybackup/astreamer_verify.c | 358 +++++++++++++++++++++ src/bin/pg_verifybackup/meson.build | 1 + src/bin/pg_verifybackup/pg_verifybackup.c | 326 ++++++++++++++++++- src/bin/pg_verifybackup/pg_verifybackup.h | 6 + src/tools/pgindent/typedefs.list | 2 + 6 files changed, 692 insertions(+), 3 deletions(-) create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile index 7c045f142e8..374d4a8afd1 100644 --- a/src/bin/pg_verifybackup/Makefile +++ b/src/bin/pg_verifybackup/Makefile @@ -17,10 +17,12 @@ top_builddir = ../../.. include $(top_builddir)/src/Makefile.global # We need libpq only because fe_utils does. 
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) OBJS = \ $(WIN32RES) \ + astreamer_verify.o \ pg_verifybackup.o all: pg_verifybackup diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c new file mode 100644 index 00000000000..b496e9320ea --- /dev/null +++ b/src/bin/pg_verifybackup/astreamer_verify.c @@ -0,0 +1,358 @@ +/*------------------------------------------------------------------------- + * + * astreamer_verify.c + * + * Extend fe_utils/astreamer.h archive streaming facility to verify TAR + * format backup. + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * + * src/bin/pg_verifybackup/astreamer_verify.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres_fe.h" + +#include "pg_verifybackup.h" + +typedef struct astreamer_verify +{ + astreamer base; + verifier_context *context; + char *archive_name; + Oid tblspc_oid; + pg_checksum_context *checksum_ctx; + + /* Hold information for a member file verification */ + manifest_file *mfile; + int64 received_bytes; + bool verify_checksum; + bool verify_control_data; +} astreamer_verify; + +static void astreamer_verify_content(astreamer *streamer, + astreamer_member *member, + const char *data, int len, + astreamer_archive_context context); +static void astreamer_verify_finalize(astreamer *streamer); +static void astreamer_verify_free(astreamer *streamer); + +static void member_verify_header(astreamer *streamer, astreamer_member *member); +static void member_compute_checksum(astreamer *streamer, + astreamer_member *member, + const char *data, int len); +static void member_verify_checksum(astreamer *streamer); +static void member_copy_control_data(astreamer *streamer, + astreamer_member *member, + const char *data, int len); +static void member_verify_control_data(astreamer *streamer); +static void 
member_reset_info(astreamer *streamer); + +static const astreamer_ops astreamer_verify_ops = { + .content = astreamer_verify_content, + .finalize = astreamer_verify_finalize, + .free = astreamer_verify_free +}; + +/* + * Create an astreamer that verifies the contents of a TAR file. + */ +astreamer * +astreamer_verify_content_new(astreamer *next, verifier_context *context, + char *archive_name, Oid tblspc_oid) +{ + astreamer_verify *streamer; + + streamer = palloc0(sizeof(astreamer_verify)); + *((const astreamer_ops **) &streamer->base.bbs_ops) = + &astreamer_verify_ops; + + streamer->base.bbs_next = next; + streamer->context = context; + streamer->archive_name = archive_name; + streamer->tblspc_oid = tblspc_oid; + initStringInfo(&streamer->base.bbs_buffer); + + if (!context->skip_checksums) + streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context)); + + return &streamer->base; +} + +/* + * The main entry point of the archive streamer for verifying tar members. + */ +static void +astreamer_verify_content(astreamer *streamer, astreamer_member *member, + const char *data, int len, + astreamer_archive_context context) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + Assert(context != ASTREAMER_UNKNOWN); + + switch (context) + { + case ASTREAMER_MEMBER_HEADER: + + /* + * Perform the initial checks and set up the verification steps. + */ + member_verify_header(streamer, member); + break; + + case ASTREAMER_MEMBER_CONTENTS: + + /* + * Since we are receiving the member content in chunks, it must be + * processed according to the flags set by the member header + * processing routine. This includes performing incremental + * checksum computations and copying control data to the local + * buffer. 
+ */ + if (mystreamer->verify_checksum) + member_compute_checksum(streamer, member, data, len); + + if (mystreamer->verify_control_data) + member_copy_control_data(streamer, member, data, len); + break; + + case ASTREAMER_MEMBER_TRAILER: + + /* + * We have reached the end of the member file. By this point, we + * should have successfully computed the checksum of the received + * content and copied the entire pg_control file data into our + * local buffer. We can now proceed with the final verification. + */ + if (mystreamer->verify_checksum) + member_verify_checksum(streamer); + + if (mystreamer->verify_control_data) + member_verify_control_data(streamer); + + /* + * Reset the temporary information stored for the verification. + */ + member_reset_info(streamer); + break; + + case ASTREAMER_ARCHIVE_TRAILER: + break; + + default: + /* Shouldn't happen. */ + pg_fatal("unexpected state while parsing tar file"); + } +} + +/* + * End-of-stream processing for an astreamer_verify stream. + */ +static void +astreamer_verify_finalize(astreamer *streamer) +{ + Assert(streamer->bbs_next == NULL); +} + +/* + * Free memory associated with an astreamer_verify stream. + */ +static void +astreamer_verify_free(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + if (mystreamer->checksum_ctx) + pfree(mystreamer->checksum_ctx); + + pfree(streamer->bbs_buffer.data); + pfree(streamer); +} + +/* + * Verifies whether the tar member entry exists in the backup manifest. + * + * If the archive being processed is a tablespace, it prepares the necessary + * file path first. If a valid entry is found in the backup manifest, it then + * determines whether checksum and control data verification should be + * performed during file content processing. 
+ */ +static void +member_verify_header(astreamer *streamer, astreamer_member *member) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + manifest_file *m; + char pathname[MAXPGPATH]; + + /* We are only interested in normal files. */ + if (member->is_directory || member->is_link) + return; + + /* + * The backup manifest stores a relative path to the base directory for + * files belonging to a tablespace, while the tablespace backup tar + * archive does not include this path. Ensure the required path is + * prepared; otherwise, the manifest entry verification will fail. + */ + if (OidIsValid(mystreamer->tblspc_oid)) + snprintf(pathname, MAXPGPATH, "%s/%u/%s", + "pg_tblspc", mystreamer->tblspc_oid, member->pathname); + else + memcpy(pathname, member->pathname, MAXPGPATH); + + + /* Ignore any files that are listed in the ignore list. */ + if (should_ignore_relpath(mystreamer->context, pathname)) + return; + + /* Check the manifest entry */ + m = verify_manifest_entry(mystreamer->context, pathname, + member->size); + mystreamer->mfile = m; + + /* Prepare for checksum and control data verification. */ + mystreamer->verify_checksum = + (!mystreamer->context->skip_checksums && should_verify_checksum(m)); + mystreamer->verify_control_data = + should_verify_control_data(mystreamer->context->manifest, m); + + /* Initialize the context required for checksum verification. */ + if (mystreamer->verify_checksum && + pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0) + { + report_backup_error(mystreamer->context, + "%s: could not initialize checksum of file \"%s\"", + mystreamer->archive_name, m->pathname); + + /* + * Checksum verification cannot be performed without proper context + * initialization. + */ + mystreamer->verify_checksum = false; + } +} + +/* + * Computes the checksum incrementally for the received file content. + * + * Should have a correctly initialized checksum_ctx, which will be used for + * incremental checksum computation. 
+ */ +static void +member_compute_checksum(astreamer *streamer, astreamer_member *member, + const char *data, int len) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx; + manifest_file *m = mystreamer->mfile; + + Assert(mystreamer->verify_checksum); + + /* + * Should have been applied to the correct file. Note that strcmp() cannot + * be used because the member pathname (if it belongs to a tablespace) is + * not relative to the base directory, unlike the backup manifest. For + * more details, see member_verify_header(). + */ + Assert(should_verify_checksum(m)); + Assert(m->checksum_type == checksum_ctx->type); + Assert(strstr(m->pathname, member->pathname)); + + /* + * Update the total count of computed checksum bytes for cross-checking + * with the file size in the final verification stage. + */ + mystreamer->received_bytes += len; + + if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0) + { + report_backup_error(mystreamer->context, + "could not update checksum of file \"%s\"", + m->pathname); + mystreamer->verify_checksum = false; + } +} + +/* + * Perform the final computation and checksum verification after the entire + * file content has been processed. + */ +static void +member_verify_checksum(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + Assert(mystreamer->verify_checksum); + + verify_checksum(mystreamer->context, mystreamer->mfile, + mystreamer->checksum_ctx, mystreamer->received_bytes); +} + +/* + * Stores the pg_control file contents into a local buffer; we need the entire + * control file data for verification. 
+ */ +static void +member_copy_control_data(astreamer *streamer, astreamer_member *member, + const char *data, int len) +{ + /* Should be here only for control file */ + Assert(strcmp(member->pathname, "global/pg_control") == 0); + Assert(((astreamer_verify *) streamer)->verify_control_data); + + /* Copy enough control file data needed for verification. */ + astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)); +} + +/* + * Performs the CRC calculation of pg_control data and then calls the routines + * that execute the final verification of the control file information. + */ +static void +member_verify_control_data(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + manifest_data *manifest = mystreamer->context->manifest; + ControlFileData *control_file; + pg_crc32c crc; + bool crc_ok; + + /* Should be here only for control file */ + Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0); + Assert(mystreamer->verify_control_data); + + /* Should have enough control file data needed for verification. */ + if (streamer->bbs_buffer.len != sizeof(ControlFileData)) + report_fatal_error("%s: unexpected control file size: %d, should be %zu", + mystreamer->archive_name, streamer->bbs_buffer.len, + sizeof(ControlFileData)); + + control_file = (ControlFileData *) streamer->bbs_buffer.data; + + /* Check the CRC. */ + INIT_CRC32C(crc); + COMP_CRC32C(crc, (char *) (control_file), offsetof(ControlFileData, crc)); + FIN_CRC32C(crc); + + crc_ok = EQ_CRC32C(crc, control_file->crc); + + /* Do the final control data verification. */ + verify_control_data(control_file, mystreamer->mfile->pathname, crc_ok, + manifest->system_identifier); +} + +/* + * Reset flags and free memory allocations for member file verification. 
+ */ +static void +member_reset_info(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + mystreamer->mfile = NULL; + mystreamer->received_bytes = 0; + mystreamer->verify_checksum = false; + mystreamer->verify_control_data = false; +} diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build index 7c7d31a0350..0e09d1379d1 100644 --- a/src/bin/pg_verifybackup/meson.build +++ b/src/bin/pg_verifybackup/meson.build @@ -1,6 +1,7 @@ # Copyright (c) 2022-2024, PostgreSQL Global Development Group pg_verifybackup_sources = files( + 'astreamer_verify.c', 'pg_verifybackup.c' ) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index c1542983b93..e63a0ed0798 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -22,6 +22,7 @@ #include "common/parse_manifest.h" #include "fe_utils/simple_list.h" #include "getopt_long.h" +#include "limits.h" #include "pg_verifybackup.h" #include "pgtime.h" @@ -44,6 +45,16 @@ */ #define READ_CHUNK_SIZE (128 * 1024) +/* + * Tar file information needed for content verification. 
+ */ +typedef struct tar_file +{ + char *relpath; + Oid tblspc_oid; + pg_compress_algorithm compress_algorithm; +} tar_file; + static manifest_data *parse_manifest_file(char *manifest_path); static void verifybackup_version_cb(JsonManifestParseContext *context, int manifest_version); @@ -63,10 +74,16 @@ static void report_manifest_error(JsonManifestParseContext *context, pg_attribute_printf(2, 3) pg_attribute_noreturn(); static char find_backup_format(verifier_context *context); +static void verify_plain_backup(verifier_context *context); +static void verify_tar_backup(verifier_context *context); static void verify_backup_directory(verifier_context *context, char *relpath, char *fullpath); -static void verify_plain_backup_file(verifier_context *context, - char *relpath, char *fullpath); +static void verify_plain_backup_file(verifier_context *context, char *relpath, + char *fullpath); +static void precheck_tar_backup_file(verifier_context *context, char *relpath, + char *fullpath, SimplePtrList *tarfiles); +static void verify_tar_file(verifier_context *context, char *relpath, + char *fullpath, astreamer *streamer); static void report_extra_backup_files(verifier_context *context); static void verify_backup_checksums(verifier_context *context); static void verify_file_checksum(verifier_context *context, @@ -75,6 +92,10 @@ static void verify_file_checksum(verifier_context *context, static void parse_required_wal(verifier_context *context, char *pg_waldump_path, char *wal_directory); +static astreamer *create_archive_verifier(verifier_context *context, + char *archive_name, + Oid tblspc_oid, + pg_compress_algorithm compress_algo); static void progress_report(bool finished); static void usage(void); @@ -294,7 +315,10 @@ main(int argc, char **argv) * match. We also set the "matched" flag on every manifest entry that * corresponds to a file on disk. 
*/ - verify_backup_directory(&context, NULL, context.backup_directory); + if (context.format == 'p') + verify_plain_backup(&context); + else + verify_tar_backup(&context); /* * The "matched" flag should now be set on every entry in the hash table. @@ -546,6 +570,16 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context, manifest->last_wal_range = range; } +/* + * Verify plain backup. + */ +static void +verify_plain_backup(verifier_context *context) +{ + Assert(context->format == 'p'); + verify_backup_directory(context, NULL, context->backup_directory); +} + /* * Verify one directory. * @@ -682,6 +716,257 @@ verify_plain_backup_file(verifier_context *context, char *relpath, total_size += m->size; } +/* + * Verify tar backup. + * + * Unlike plain backup verification, tar backup verification is carried out in + * two passes. The first pass performs a basic sanity check that only the + * expected tar files are present in the backup directory, detects each file's + * compression type, and collects this information in a list. In the second + * pass, the tar archives are read, decompressed, and the required verification + * is carried out. + */ +static void +verify_tar_backup(verifier_context *context) +{ + DIR *dir; + struct dirent *dirent; + SimplePtrList tarfiles = {NULL, NULL}; + SimplePtrListCell *cell; + char *fullpath; + + Assert(context->format == 't'); + + progress_report(false); + + /* + * If the backup directory cannot be found, treat this as a fatal error. + */ + fullpath = context->backup_directory; + dir = opendir(fullpath); + if (dir == NULL) + report_fatal_error("could not open directory \"%s\": %m", fullpath); + + while (errno = 0, (dirent = readdir(dir)) != NULL) + { + char *filename = dirent->d_name; + char *newfullpath = psprintf("%s/%s", fullpath, filename); + + /* Skip "." and ".." */ + if (filename[0] == '.' && (filename[1] == '\0' + || strcmp(filename, "..") == 0)) + continue; + + /* First pass: Collect valid tar files from the backup. 
*/ + if (!should_ignore_relpath(context, filename)) + precheck_tar_backup_file(context, filename, newfullpath, + &tarfiles); + + pfree(newfullpath); + } + + if (closedir(dir)) + { + report_backup_error(context, + "could not close directory \"%s\": %m", fullpath); + return; + } + + /* Second pass: Perform the final verification of the tar contents. */ + for (cell = tarfiles.head; cell != NULL; cell = cell->next) + { + tar_file *tar = (tar_file *) cell->ptr; + astreamer *streamer; + + /* + * Prepares the archive streamer stack according to the tar + * compression format. + */ + streamer = create_archive_verifier(context, + tar->relpath, + tar->tblspc_oid, + tar->compress_algorithm); + + /* Compute the full pathname to the target file. */ + fullpath = psprintf("%s/%s", context->backup_directory, + tar->relpath); + + /* Invoke the streamer for reading, decompressing, and verifying. */ + verify_tar_file(context, tar->relpath, fullpath, streamer); + + /* Cleanup. */ + pfree(tar->relpath); + pfree(tar); + pfree(fullpath); + + astreamer_finalize(streamer); + astreamer_free(streamer); + } + simple_ptr_list_destroy(&tarfiles); + + progress_report(true); +} + +/* + * Preparatory steps for verifying files in tar format backups. + * + * Carries out basic validation of the tar format backup file, detects the + * compression type, and appends that information to the tarfiles list. An + * error will be reported if the tar file is inaccessible, or if the file type, + * name, or compression type is not as expected. + * + * The arguments to this function are mostly the same as the + * verify_plain_backup_file. The additional argument outputs a list of valid + * tar files. 
+ */ +static void +precheck_tar_backup_file(verifier_context *context, char *relpath, + char *fullpath, SimplePtrList *tarfiles) +{ + struct stat sb; + Oid tblspc_oid = InvalidOid; + pg_compress_algorithm compress_algorithm; + tar_file *tar; + char *suffix = NULL; + + /* Should be tar format backup */ + Assert(context->format == 't'); + + /* Get file information */ + if (stat(fullpath, &sb) != 0) + { + report_backup_error(context, + "could not stat file or directory \"%s\": %m", + relpath); + return; + } + + /* In a tar format backup, we expect only plain files. */ + if (!S_ISREG(sb.st_mode)) + { + report_backup_error(context, + "\"%s\" is not a plain file", + relpath); + return; + } + + /* + * We expect tar files for backing up the main directory, tablespace, and + * pg_wal directory. + * + * pg_basebackup writes the main data directory to an archive file named + * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace + * directory to <tablespaceoid>.tar, each followed by a compression type + * extension such as .gz, .lz4, or .zst. + */ + if (strncmp("base", relpath, 4) == 0) + suffix = relpath + 4; + else if (strncmp("pg_wal", relpath, 6) == 0) + suffix = relpath + 6; + else + { + /* Expected a <tablespaceoid>.tar file here. */ + uint64 num = strtoul(relpath, &suffix, 10); + + /* + * Report an error if we didn't consume at least one character, if the + * result is 0, or if the value is too large to be a valid OID. 
+ */ + if (suffix == NULL || num <= 0 || num > OID_MAX) + report_backup_error(context, + "file \"%s\" is not expected in a tar format backup", + relpath); + tblspc_oid = (Oid) num; + } + + /* Now, check the compression type of the tar */ + if (strcmp(suffix, ".tar") == 0) + compress_algorithm = PG_COMPRESSION_NONE; + else if (strcmp(suffix, ".tgz") == 0) + compress_algorithm = PG_COMPRESSION_GZIP; + else if (strcmp(suffix, ".tar.gz") == 0) + compress_algorithm = PG_COMPRESSION_GZIP; + else if (strcmp(suffix, ".tar.lz4") == 0) + compress_algorithm = PG_COMPRESSION_LZ4; + else if (strcmp(suffix, ".tar.zst") == 0) + compress_algorithm = PG_COMPRESSION_ZSTD; + else + { + report_backup_error(context, + "file \"%s\" is not expected in a tar format backup", + relpath); + return; + } + + /* + * Ignore WALs, as reading and verification will be handled through + * pg_waldump. + */ + if (strncmp("pg_wal", relpath, 6) == 0) + return; + + /* + * Append the information to the list for complete verification at a later + * stage. + */ + tar = pg_malloc(sizeof(tar_file)); + tar->relpath = pstrdup(relpath); + tar->tblspc_oid = tblspc_oid; + tar->compress_algorithm = compress_algorithm; + + simple_ptr_list_append(tarfiles, tar); + + /* Update statistics for progress report, if necessary */ + if (show_progress) + total_size += sb.st_size; +} + +/* + * Verification of a single tar file content. + * + * It reads a given tar archive in predefined chunks and passes it to the + * streamer, which initiates routines for decompression (if necessary) and then + * verifies each member within the tar file. + */ +static void +verify_tar_file(verifier_context *context, char *relpath, char *fullpath, + astreamer *streamer) +{ + int fd; + int rc; + char *buffer; + + pg_log_debug("reading \"%s\"", fullpath); + + /* Open the target file. 
*/ + if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0) + { + report_backup_error(context, "could not open file \"%s\": %m", + relpath); + return; + } + + buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8)); + + /* Perform the reads */ + while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0) + { + astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN); + + /* Report progress */ + done_size += rc; + progress_report(false); + } + + if (rc < 0) + report_backup_error(context, "could not read file \"%s\": %m", + relpath); + + /* Close the file. */ + if (close(fd) != 0) + report_backup_error(context, "could not close file \"%s\": %m", + relpath); +} + /* * Verify file and its size entry in the manifest. */ @@ -1044,6 +1329,41 @@ find_backup_format(verifier_context *context) return result; } +/* + * Identifies the necessary steps for verifying the contents of the + * provided tar file. + */ +static astreamer * +create_archive_verifier(verifier_context *context, char *archive_name, + Oid tblspc_oid, pg_compress_algorithm compress_algo) +{ + astreamer *streamer = NULL; + + /* Should be here only for tar backup */ + Assert(context->format == 't'); + + /* + * To verify the contents of the tar, the initial step is to parse its + * content. + */ + streamer = astreamer_verify_content_new(streamer, context, archive_name, + tblspc_oid); + streamer = astreamer_tar_parser_new(streamer); + + /* + * If the tar is compressed, we must perform the appropriate decompression + * operation before proceeding with the verification of its contents. + */ + if (compress_algo == PG_COMPRESSION_GZIP) + streamer = astreamer_gzip_decompressor_new(streamer); + else if (compress_algo == PG_COMPRESSION_LZ4) + streamer = astreamer_lz4_decompressor_new(streamer); + else if (compress_algo == PG_COMPRESSION_ZSTD) + streamer = astreamer_zstd_decompressor_new(streamer); + + return streamer; +} + /* * Print a progress report based on the global variables. 
* diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index 80031ad4dbc..be7438af346 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -18,6 +18,7 @@ #include "common/hashfn_unstable.h" #include "common/logging.h" #include "common/parse_manifest.h" +#include "fe_utils/astreamer.h" #include "fe_utils/simple_list.h" /* @@ -123,4 +124,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...) extern bool should_ignore_relpath(verifier_context *context, const char *relpath); +extern astreamer *astreamer_verify_content_new(astreamer *next, + verifier_context *context, + char *archive_name, + Oid tblspc_oid); + #endif /* PG_VERIFYBACKUP_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index e9ebddde24d..2b155586f8c 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -3330,6 +3330,7 @@ astreamer_plain_writer astreamer_recovery_injector astreamer_tar_archiver astreamer_tar_parser +astreamer_verify astreamer_zstd_frame bgworker_main_type bh_node_type @@ -3951,6 +3952,7 @@ substitute_phv_relids_context subxids_array_status symbol tablespaceinfo +tar_file td_entry teSection temp_tablespaces_extra -- 2.18.0
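The archive-name rules enforced by precheck_tar_backup_file() in the patch above can be condensed into a small sketch. The helper below is hypothetical (the name classify_tar_name and its return convention are mine, not the patch's), but it follows the same rules: base and pg_wal archives are matched by prefix, any other name must begin with a tablespace OID, and the remaining suffix must name a supported compression format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical condensed version of the archive-name checks in
 * precheck_tar_backup_file().  Returns 1 and sets *suffix (and
 * *tblspc_oid for tablespace archives) when the name is a valid
 * backup archive name, 0 otherwise.
 */
static int
classify_tar_name(const char *name, unsigned long *tblspc_oid,
				  const char **suffix)
{
	*tblspc_oid = 0;			/* InvalidOid in the real code */

	if (strncmp(name, "base", 4) == 0)
		*suffix = name + 4;
	else if (strncmp(name, "pg_wal", 6) == 0)
		*suffix = name + 6;
	else
	{
		char	   *end;
		unsigned long num = strtoul(name, &end, 10);

		/* must consume at least one digit and fit in a 32-bit OID */
		if (end == name || num == 0 || num > 4294967295UL)
			return 0;
		*tblspc_oid = num;
		*suffix = end;
	}

	/* The suffix selects the compression algorithm. */
	return strcmp(*suffix, ".tar") == 0 ||
		strcmp(*suffix, ".tgz") == 0 ||
		strcmp(*suffix, ".tar.gz") == 0 ||
		strcmp(*suffix, ".tar.lz4") == 0 ||
		strcmp(*suffix, ".tar.zst") == 0;
}
```

With this shape, names like base.tar, pg_wal.tar.gz, or 16385.tar.zst pass the first-pass filter, while anything else is reported as unexpected before any archive is opened.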
From aeed7ab45f6c15abae8e4935b61c3659a9ba139e Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Thu, 12 Sep 2024 15:35:49 +0530 Subject: [PATCH v14 12/12] pg_verifybackup: Tests and document ---- NOTE: This patch is not meant to be committed separately. It should be squashed with the previous patch that implements tar format support. ---- --- doc/src/sgml/ref/pg_verifybackup.sgml | 45 ++- src/bin/pg_verifybackup/t/002_algorithm.pl | 34 ++- src/bin/pg_verifybackup/t/003_corruption.pl | 267 +++++++++++++++++- src/bin/pg_verifybackup/t/004_options.pl | 2 +- src/bin/pg_verifybackup/t/008_untar.pl | 71 ++--- src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +--- 6 files changed, 352 insertions(+), 115 deletions(-) diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index a3f167f9f6e..8380178ca49 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -34,8 +34,12 @@ PostgreSQL documentation integrity of a database cluster backup taken using <command>pg_basebackup</command> against a <literal>backup_manifest</literal> generated by the server at the time - of the backup. The backup must be stored in the "plain" - format; a "tar" format backup can be checked after extracting it. + of the backup. The backup may be stored either in the "plain" or the "tar" + format; this includes tar-format backups compressed with any algorithm + supported by <application>pg_basebackup</application>. However, at present, + <literal>WAL</literal> verification is supported only for plain-format + backups. Therefore, if the backup is stored in tar-format, the + <literal>-n, --no-parse-wal</literal> option should be used. 
</para> <para> @@ -168,6 +172,43 @@ PostgreSQL documentation </listitem> </varlistentry> + <varlistentry> + <term><option>-F <replaceable class="parameter">format</replaceable></option></term> + <term><option>--format=<replaceable class="parameter">format</replaceable></option></term> + <listitem> + <para> + Specifies the format of the backup. <replaceable>format</replaceable> + can be one of the following: + + <variablelist> + <varlistentry> + <term><literal>p</literal></term> + <term><literal>plain</literal></term> + <listitem> + <para> + Backup consists of plain files with the same layout as the + source server's data directory and tablespaces. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>t</literal></term> + <term><literal>tar</literal></term> + <listitem> + <para> + Backup consists of tar files. A valid backup includes the main data + directory in a file named <filename>base.tar</filename>, the WAL + files in <filename>pg_wal.tar</filename>, and separate tar files for + each tablespace, named after the tablespace's OID, followed by the + compression extension. + </para> + </listitem> + </varlistentry> + </variablelist></para> + </listitem> + </varlistentry> + <varlistentry> <term><option>-n</option></term> <term><option>--no-parse-wal</option></term> diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl index fb2a1fd7c4e..4959d5bd0b9 100644 --- a/src/bin/pg_verifybackup/t/002_algorithm.pl +++ b/src/bin/pg_verifybackup/t/002_algorithm.pl @@ -14,24 +14,35 @@ my $primary = PostgreSQL::Test::Cluster->new('primary'); $primary->init(allows_streaming => 1); $primary->start; -for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512)) +sub test_checksums { - my $backup_path = $primary->backup_dir . '/' . $algorithm; + my ($format, $algorithm) = @_; + my $backup_path = $primary->backup_dir . '/' . $format . '/' . 
$algorithm; my @backup = ( 'pg_basebackup', '-D', $backup_path, '--manifest-checksums', $algorithm, '--no-sync', '-cfast'); my @verify = ('pg_verifybackup', '-e', $backup_path); + if ($format eq 'tar') + { + # Add switch to get a tar-format backup + push @backup, ('-F', 't'); + + # Add switch to skip WAL verification, which is not yet supported for + # tar-format backups + push @verify, ('-n'); + } + # A backup with a bogus algorithm should fail. if ($algorithm eq 'bogus') { $primary->command_fails(\@backup, - "backup fails with algorithm \"$algorithm\""); - next; + "$format format backup fails with algorithm \"$algorithm\""); + return; } # A backup with a valid algorithm should work. - $primary->command_ok(\@backup, "backup ok with algorithm \"$algorithm\""); + $primary->command_ok(\@backup, "$format format backup ok with algorithm \"$algorithm\""); # We expect each real checksum algorithm to be mentioned on every line of # the backup manifest file except the first and last; for simplicity, we @@ -39,7 +50,7 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512)) # is none, we just check that the manifest exists. if ($algorithm eq 'none') { - ok(-f "$backup_path/backup_manifest", "backup manifest exists"); + ok(-f "$backup_path/backup_manifest", "$format format backup manifest exists"); } else { @@ -52,10 +63,19 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512)) # Make sure that it verifies OK. $primary->command_ok(\@verify, - "verify backup with algorithm \"$algorithm\""); + "verify $format format backup with algorithm \"$algorithm\""); # Remove backup immediately to save disk space. 
rmtree($backup_path); } +# Do the check +for my $format (qw(plain tar)) +{ + for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512)) + { + test_checksums($format, $algorithm); + } +} + done_testing(); diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl index ae91e043384..c953bbc20d8 100644 --- a/src/bin/pg_verifybackup/t/003_corruption.pl +++ b/src/bin/pg_verifybackup/t/003_corruption.pl @@ -11,6 +11,8 @@ use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +my $tar = $ENV{TAR}; + my $primary = PostgreSQL::Test::Cluster->new('primary'); $primary->init(allows_streaming => 1); $primary->start; @@ -32,62 +34,73 @@ EOM my @scenario = ( { 'name' => 'extra_file', + 'backup_format' => 'p', 'mutilate' => \&mutilate_extra_file, 'fails_like' => qr/extra_file.*present on disk but not in the manifest/ }, { 'name' => 'extra_tablespace_file', + 'backup_format' => 'p', 'mutilate' => \&mutilate_extra_tablespace_file, 'fails_like' => qr/extra_ts_file.*present on disk but not in the manifest/ }, { 'name' => 'missing_file', + 'backup_format' => 'p', 'mutilate' => \&mutilate_missing_file, 'fails_like' => qr/pg_xact\/0000.*present in the manifest but not on disk/ }, { 'name' => 'missing_tablespace', + 'backup_format' => 'p', 'mutilate' => \&mutilate_missing_tablespace, 'fails_like' => qr/pg_tblspc.*present in the manifest but not on disk/ }, { 'name' => 'append_to_file', + 'backup_format' => 'p', 'mutilate' => \&mutilate_append_to_file, 'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/ }, { 'name' => 'truncate_file', + 'backup_format' => 'p', 'mutilate' => \&mutilate_truncate_file, 'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/ }, { 'name' => 'replace_file', + 'backup_format' => 'p', 'mutilate' => \&mutilate_replace_file, 'fails_like' => qr/checksum mismatch for file/ }, { 'name' => 'system_identifier', + 'backup_format' => 'p', 'mutilate' => 
\&mutilate_system_identifier, 'fails_like' => qr/manifest system identifier is .*, but control file has/ }, { 'name' => 'bad_manifest', + 'backup_format' => 'p', 'mutilate' => \&mutilate_bad_manifest, 'fails_like' => qr/manifest checksum mismatch/ }, { 'name' => 'open_file_fails', + 'backup_format' => 'p', 'mutilate' => \&mutilate_open_file_fails, 'fails_like' => qr/could not open file/, 'skip_on_windows' => 1 }, { 'name' => 'open_directory_fails', + 'backup_format' => 'p', 'mutilate' => \&mutilate_open_directory_fails, 'cleanup' => \&cleanup_open_directory_fails, 'fails_like' => qr/could not open directory/, @@ -95,10 +108,78 @@ my @scenario = ( }, { 'name' => 'search_directory_fails', + 'backup_format' => 'p', 'mutilate' => \&mutilate_search_directory_fails, 'cleanup' => \&cleanup_search_directory_fails, 'fails_like' => qr/could not stat file or directory/, 'skip_on_windows' => 1 + }, + { + 'name' => 'tar_backup_unexpected_file', + 'backup_format' => 't', + 'mutilate' => \&mutilate_extra_file, + 'fails_like' => + qr/file "extra_file" is not expected in a tar format backup/ + }, + { + 'name' => 'tar_backup_extra_file', + 'backup_format' => 't', + 'mutilate' => \&mutilate_tar_backup_extra_file, + 'fails_like' => + qr/extra_tar_member_file.*present on disk but not in the manifest/ + }, + { + 'name' => 'tar_extra_tablespace_file', + 'backup_format' => 't', + 'mutilate' => \&mutilate_tar_extra_tablespace_file, + 'fails_like' => + qr/extra_ts_member_file.*present on disk but not in the manifest/ + }, + { + 'name' => 'tar_backup_missing_file', + 'backup_format' => 't', + 'mutilate' => \&mutilate_tar_backup_missing_file, + 'fails_like' => + qr/pg_xact\/0000.*present in the manifest but not on disk/, + }, + { + 'name' => 'tar_missing_tablespace', + 'backup_format' => 't', + 'mutilate' => \&mutilate_tar_missing_tablespace, + 'fails_like' => + qr/pg_tblspc.*present in the manifest but not on disk/ + }, + { + 'name' => 'tar_missing_tablespace_file', + 'backup_format' => 't', 
+ 'mutilate' => \&mutilate_tar_missing_tablespace_file, + 'fails_like' => + qr/pg_tblspc.*present in the manifest but not on disk/ + }, + { + 'name' => 'tar_backup_append_to_file', + 'backup_format' => 't', + 'mutilate' => \&mutilate_tar_backup_append_to_file, + 'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/, + }, + { + 'name' => 'tar_backup_truncate_file', + 'backup_format' => 't', + 'mutilate' => \&mutilate_tar_backup_truncate_file, + 'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/, + }, + { + 'name' => 'tar_backup_replace_file', + 'backup_format' => 't', + 'mutilate' => \&mutilate_tar_backup_replace_file, + 'fails_like' => qr/checksum mismatch for file/, + }, + { + 'name' => 'tar_backup_system_identifier', + 'backup_format' => 't', + 'mutilate' => \&mutilate_system_identifier, + 'fails_like' => + qr/manifest system identifier is .*, but control file has/ }); for my $scenario (@scenario) @@ -111,29 +192,42 @@ for my $scenario (@scenario) if ($scenario->{'skip_on_windows'} && ($windows_os || $Config::Config{osname} eq 'cygwin')); + # Skip tests for tar-format backup if tar is not available. + skip "no tar program available", 4 + if ($scenario->{'backup_format'} eq 't' && (!defined $tar || $tar eq '')); + # Take a backup and check that it verifies OK. my $backup_path = $primary->backup_dir . '/' . $name; my $backup_ts_path = PostgreSQL::Test::Utils::tempdir_short(); + + my @backup = ( + 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast', + '-T', "${source_ts_path}=${backup_ts_path}"); + my @verify = ('pg_verifybackup', $backup_path); + + if ($scenario->{'backup_format'} eq 't') + { + # Add switch to get a tar-format backup + push @backup, ('-F', 't'); + + # Add switch to skip WAL verification, which is not yet supported + # for tar-format backups + push @verify, ('-n'); + } + # The tablespace map parameter confuses Msys2, which tries to mangle # it. Tell it not to.
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix; - $primary->command_ok( - [ - 'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast', - '-T', "${source_ts_path}=${backup_ts_path}" - ], - "base backup ok"); - command_ok([ 'pg_verifybackup', $backup_path ], - "intact backup verified"); + + $primary->command_ok(\@backup, "base backup ok"); + command_ok(\@verify, "intact backup verified"); # Mutilate the backup in some way. $scenario->{'mutilate'}->($backup_path); # Now check that the backup no longer verifies. - command_fails_like( - [ 'pg_verifybackup', $backup_path ], - $scenario->{'fails_like'}, + command_fails_like(\@verify, $scenario->{'fails_like'}, "corrupt backup fails verification: $name"); # Run cleanup hook, if provided. @@ -260,6 +354,7 @@ sub mutilate_system_identifier $backup_path . '/backup_manifest') or BAIL_OUT "could not copy manifest to $backup_path"; $node->teardown_node(fail_ok => 1); + $node->clean_node(); return; } @@ -316,4 +411,154 @@ sub cleanup_search_directory_fails return; } +# Unpack the tar file, perform the specified file operation, and then repack +# the modified content into the same tar file at the same location. +sub mutilate_base_tar +{ + my ($backup_path, $archive, $op) = @_; + + my $tmpdir = "$backup_path/tmpdir"; + mkdir($tmpdir) || die "$!"; + + # Extract the archive + system_or_bail($tar, '-xf', "$backup_path/$archive", '-C', "$tmpdir"); + unlink("$backup_path/$archive") || die "$!"; + + if ($op eq 'add') + { + if ($archive eq 'base.tar') + { + create_extra_file($tmpdir, 'extra_tar_member_file'); + } + else + { + # must be a tablespace archive + my ($catvdir) = grep { $_ ne '.' && $_ ne '..' } + slurp_dir("$tmpdir"); + my ($tsdboid) = grep { $_ ne '.' && $_ ne '..' 
} + slurp_dir("$tmpdir/$catvdir"); + create_extra_file($tmpdir, + "$catvdir/$tsdboid/extra_ts_member_file"); + } + } + elsif ($op eq 'delete') + { + if ($archive eq 'base.tar') + { + mutilate_missing_file($tmpdir); + } + else + { + # must be a tablespace archive + my ($catvdir) = grep { $_ ne '.' && $_ ne '..' } + slurp_dir("$tmpdir"); + my ($tsdboid) = grep { $_ ne '.' && $_ ne '..' } + slurp_dir("$tmpdir/$catvdir"); + my ($reloid) = grep { $_ ne '.' && $_ ne '..' } + slurp_dir("$tmpdir/$catvdir/$tsdboid"); + my $pathname = "$tmpdir/$catvdir/$tsdboid/$reloid"; + unlink($pathname) || die "$pathname: $!"; + } + } + elsif ($op eq 'append') + { + mutilate_append_to_file($tmpdir); + } + elsif ($op eq 'truncate') + { + mutilate_truncate_file($tmpdir); + } + elsif ($op eq 'replace') + { + mutilate_replace_file($tmpdir); + } + else + { + die "mutilate_base_tar: invalid operation \"$op\""; + } + + + # Navigate to the extracted location and list the files. + chdir("$tmpdir") || die "$!"; + my @files = glob("*"); + # Repack the extracted content + system_or_bail($tar, '-cf', "$backup_path/$archive", @files); + chdir($backup_path) || die "$!"; + rmtree("$tmpdir") || die "$!"; +} + +# Add a file to the main directory archive. +sub mutilate_tar_backup_extra_file +{ + my ($backup_path) = @_; + mutilate_base_tar($backup_path, 'base.tar', 'add'); + return; +} + +# Add a file to the user-defined tablespace archive. +sub mutilate_tar_extra_tablespace_file +{ + my ($backup_path) = @_; + my ($archive) = + grep { $_ =~ qr/\d+\.tar/ } slurp_dir("$backup_path"); + die "tablespace tar backup not found." unless defined $archive; + mutilate_base_tar($backup_path, $archive, 'add'); + return; +} + +# Remove a file from the main directory archive. +sub mutilate_tar_backup_missing_file +{ + my ($backup_path) = @_; + mutilate_base_tar($backup_path, 'base.tar', 'delete'); + return; +} + +# Remove the user-defined tablespace archive.
+sub mutilate_tar_missing_tablespace +{ + my ($backup_path) = @_; + my ($archive) = + grep { $_ =~ qr/\d+\.tar/ } slurp_dir("$backup_path"); + die "tablespace tar backup not found." unless defined $archive; + my $pathname = "$backup_path/$archive"; + unlink($pathname) || die "$pathname: $!"; + return; +} + +# Remove a file from the user-defined tablespace archive. +sub mutilate_tar_missing_tablespace_file +{ + my ($backup_path) = @_; + my ($archive) = + grep { $_ =~ qr/\d+\.tar/ } slurp_dir("$backup_path"); + die "tablespace tar backup not found." unless defined $archive; + mutilate_base_tar($backup_path, $archive, 'delete'); + return; +} + +# Append additional bytes to a file. +sub mutilate_tar_backup_append_to_file +{ + my ($backup_path) = @_; + mutilate_base_tar($backup_path, 'base.tar', 'append'); + return; +} + +# Truncate a file to zero length. +sub mutilate_tar_backup_truncate_file +{ + my ($backup_path) = @_; + mutilate_base_tar($backup_path, 'base.tar', 'truncate'); + return; +} + +# Replace a file's contents. +sub mutilate_tar_backup_replace_file +{ + my ($backup_path) = @_; + mutilate_base_tar($backup_path, 'base.tar', 'replace'); + return; +} + done_testing(); diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl index 8ed2214408e..2f197648740 100644 --- a/src/bin/pg_verifybackup/t/004_options.pl +++ b/src/bin/pg_verifybackup/t/004_options.pl @@ -108,7 +108,7 @@ unlike( # Test valid manifest with nonexistent backup directory.
command_fails_like( [ - 'pg_verifybackup', '-m', + 'pg_verifybackup', '-Fp', '-m', "$backup_path/backup_manifest", "$backup_path/fake" ], qr/could not open directory/, diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl index 7a09f3b75b2..e7ec8369362 100644 --- a/src/bin/pg_verifybackup/t/008_untar.pl +++ b/src/bin/pg_verifybackup/t/008_untar.pl @@ -16,6 +16,18 @@ my $primary = PostgreSQL::Test::Cluster->new('primary'); $primary->init(allows_streaming => 1); $primary->start; +# Create a tablespace directory. +my $source_ts_path = PostgreSQL::Test::Utils::tempdir_short(); + +# Create a tablespace with a table in it. +$primary->safe_psql('postgres', qq( + CREATE TABLESPACE regress_ts1 LOCATION '$source_ts_path'; + SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'; + CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1; + INSERT INTO regress_tbl1 VALUES(generate_series(1,5));)); +my $tsoid = $primary->safe_psql('postgres', qq( + SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1')); + my $backup_path = $primary->backup_dir . '/server-backup'; my $extract_path = $primary->backup_dir .
'/extracted-backup'; @@ -23,39 +35,31 @@ my @test_configuration = ( { 'compression_method' => 'none', 'backup_flags' => [], - 'backup_archive' => 'base.tar', + 'backup_archive' => ['base.tar', "$tsoid.tar"], 'enabled' => 1 }, { 'compression_method' => 'gzip', 'backup_flags' => [ '--compress', 'server-gzip' ], - 'backup_archive' => 'base.tar.gz', - 'decompress_program' => $ENV{'GZIP_PROGRAM'}, - 'decompress_flags' => ['-d'], + 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ], 'enabled' => check_pg_config("#define HAVE_LIBZ 1") }, { 'compression_method' => 'lz4', 'backup_flags' => [ '--compress', 'server-lz4' ], - 'backup_archive' => 'base.tar.lz4', - 'decompress_program' => $ENV{'LZ4'}, - 'decompress_flags' => [ '-d', '-m' ], + 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ], 'enabled' => check_pg_config("#define USE_LZ4 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'server-zstd' ], - 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], + 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ], 'enabled' => check_pg_config("#define USE_ZSTD 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ], - 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], + 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ], 'enabled' => check_pg_config("#define USE_ZSTD 1") }); @@ -86,47 +90,16 @@ for my $tc (@test_configuration) my $backup_files = join(',', sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path)); my $expected_backup_files = - join(',', sort ('backup_manifest', $tc->{'backup_archive'})); + join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} })); is($backup_files, $expected_backup_files, "found expected backup files, compression $method"); - # Decompress. 
- if (exists $tc->{'decompress_program'}) - { - my @decompress = ($tc->{'decompress_program'}); - push @decompress, @{ $tc->{'decompress_flags'} } - if $tc->{'decompress_flags'}; - push @decompress, $backup_path . '/' . $tc->{'backup_archive'}; - system_or_bail(@decompress); - } - - SKIP: - { - my $tar = $ENV{TAR}; - # don't check for a working tar here, to accommodate various odd - # cases. If tar doesn't work the init_from_backup below will fail. - skip "no tar program available", 1 - if (!defined $tar || $tar eq ''); - - # Untar. - mkdir($extract_path); - system_or_bail($tar, 'xf', $backup_path . '/base.tar', - '-C', $extract_path); - - # Verify. - $primary->command_ok( - [ - 'pg_verifybackup', '-n', - '-m', "$backup_path/backup_manifest", - '-e', $extract_path - ], - "verify backup, compression $method"); - } + # Verify tar backup. + $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path], + "verify backup, compression $method"); # Cleanup. - unlink($backup_path . '/backup_manifest'); - unlink($backup_path . '/base.tar'); - unlink($backup_path . '/' . 
$tc->{'backup_archive'}); + rmtree($backup_path); rmtree($extract_path); } } diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl index 8c076d46dee..6b7d7483f6e 100644 --- a/src/bin/pg_verifybackup/t/010_client_untar.pl +++ b/src/bin/pg_verifybackup/t/010_client_untar.pl @@ -29,41 +29,30 @@ my @test_configuration = ( 'compression_method' => 'gzip', 'backup_flags' => [ '--compress', 'client-gzip:5' ], 'backup_archive' => 'base.tar.gz', - 'decompress_program' => $ENV{'GZIP_PROGRAM'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define HAVE_LIBZ 1") }, { 'compression_method' => 'lz4', 'backup_flags' => [ '--compress', 'client-lz4:5' ], 'backup_archive' => 'base.tar.lz4', - 'decompress_program' => $ENV{'LZ4'}, - 'decompress_flags' => ['-d'], - 'output_file' => 'base.tar', 'enabled' => check_pg_config("#define USE_LZ4 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'client-zstd:5' ], 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define USE_ZSTD 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ], 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define USE_ZSTD 1") }, { 'compression_method' => 'parallel zstd', 'backup_flags' => [ '--compress', 'client-zstd:workers=3' ], 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define USE_ZSTD 1"), 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/ @@ -118,40 +107,9 @@ for my $tc (@test_configuration) is($backup_files, $expected_backup_files, "found expected backup files, compression $method"); - # Decompress. 
- if (exists $tc->{'decompress_program'}) - { - my @decompress = ($tc->{'decompress_program'}); - push @decompress, @{ $tc->{'decompress_flags'} } - if $tc->{'decompress_flags'}; - push @decompress, $backup_path . '/' . $tc->{'backup_archive'}; - push @decompress, $backup_path . '/' . $tc->{'output_file'} - if $tc->{'output_file'}; - system_or_bail(@decompress); - } - - SKIP: - { - my $tar = $ENV{TAR}; - # don't check for a working tar here, to accommodate various odd - # cases. If tar doesn't work the init_from_backup below will fail. - skip "no tar program available", 1 - if (!defined $tar || $tar eq ''); - - # Untar. - mkdir($extract_path); - system_or_bail($tar, 'xf', $backup_path . '/base.tar', - '-C', $extract_path); - - # Verify. - $primary->command_ok( - [ - 'pg_verifybackup', '-n', - '-m', "$backup_path/backup_manifest", - '-e', $extract_path - ], - "verify backup, compression $method"); - } + # Verify tar backup. + $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ], + "verify backup, compression $method"); # Cleanup. rmtree($extract_path); -- 2.18.0
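For anyone reviewing the mutilate_base_tar() helper in 003_corruption.pl: its unpack/mutilate/repack cycle boils down to the following plain-shell steps, shown here for the 'add' operation on base.tar. This is only a sketch: the mktemp scratch directory and the stand-in archive contents are invented for illustration, while extra_tar_member_file is the member name the test actually expects.

```shell
# Sketch of the unpack -> mutilate -> repack cycle performed by
# mutilate_base_tar(), for the 'add' operation on base.tar.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in "backup" archive with one member (invented content).
mkdir content
echo 'dummy' > content/postgresql.auto.conf
tar -cf base.tar -C content .

# 1. Extract the archive and remove the original, as the helper does.
mkdir tmpdir
tar -xf base.tar -C tmpdir
rm base.tar

# 2. Mutilate: drop in a file the manifest knows nothing about.
echo 'surprise' > tmpdir/extra_tar_member_file

# 3. Repack from inside the directory so member paths stay relative,
#    then remove the scratch directory.
( cd tmpdir && tar -cf ../base.tar * )
rm -rf tmpdir

# The rebuilt archive now carries the unexpected member.
tar -tf base.tar
```

With the patch applied, running pg_verifybackup -n against a backup mutilated this way should report extra_tar_member_file as present on disk but not in the manifest, which is what the tar_backup_extra_file scenario asserts.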