On Sat, Aug 24, 2024 at 2:02 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 7:08 AM Amul Sul <sula...@gmail.com> wrote:
> [....]
> Then the result verifies. But I feel like we should have some test
> cases that do this kind of stuff so that there is automated
> verification. In fact, the current patch seems to have no negative
> test cases at all. I think we should test all the cases in
> 003_corruption.pl with tar format backups as well as with plain format
> backups, which we could do by untarring one of the archives, messing
> something up, and then retarring it. I also think we should have some
> negative test case specific to tar-format backup. I suggest running a
> coverage analysis and trying to craft test cases that hit as much of
> the code as possible. There will probably be some errors you can't
> hit, but you should try to hit the ones you can.
>

Done. I’ve added a few tests that extract, modify, and repack the tar
files, mainly base.tar; tablespace.tar is skipped since it would mostly
duplicate those tests. I’ve also updated 002_algorithm.pl to cover
tar-format backups.

>
> > 0011 patch: Regarding function names:
> >
> > 1. named the function verify_tar_backup_file() to align with
> > verify_plain_backup_file(), but it does not perform the complete
> > verification as verify_plain_backup_file does. Not sure if it is the
> > right name.
>
> I was thinking of something like precheck_tar_backup_file().

Done.

> > 2. verify_tar_file_contents() is the second and final part of tar
> > backup verification. Should its name be aligned with
> > verify_tar_backup_file()? I’m unsure what the best name would be.
> > Perhaps verify_tar_backup_file_final(), but then
> > verify_tar_backup_file() would need to be renamed to something like
> > verify_tar_backup_file_initial(), which might be too lengthy.
>
> verify_tar_file_contents() actually verifies the contents of all the
> tar files, not just one, so the name is a bit confusing. Maybe
> verify_all_tar_files().
>

Done.

>
> > 3. verify_tar_contents() is the core of verify_tar_file_contents()
> > that handles the actual verification. I’m unsure about the current
> > naming. Should we rename it to something like
> > verify_tar_contents_core()? It wouldn’t be an issue if we renamed
> > verify_tar_file_contents() as pointed in point #2.
>
> verify_one_tar_file()?
>

Done.

>
> But with those renames, I think you really start to see why I'm not
> very comfortable with verify_backup_directory(). The tar and plain
> format cases aren't really doing the same thing. We're just gluing
> them into a single function anyway.

Agreed, I can see the discomfort -- I've added a new function.

>
> I am also still uncomfortable with the way you've refactored some of
> this so that we end up with very small amounts of code far away from
> other code that they influence. Like you end up with this:
>
>         /* Check the backup manifest entry for this file. */
>         m = verify_manifest_entry(context, relpath, sb.st_size);
>
>         /* Validate the pg_control information */
>         if (should_verify_control_data(context->manifest, m))
> ...
>         if (show_progress && !context->skip_checksums &&
>                 should_verify_checksum(m))
>
> But verify_manifest_entry can return NULL or it can set m->bad and
> either of those change the result of should_verify_control_data() and
> should_verify_checksum(), but none of that is obvious when you just
> look at this. Admittedly, the code in master isn't brilliant in terms
> of making it obvious what's happening either, but I think this is
> worse. I'm not really sure what I think we should do about that yet,
> but I'm uncomfortable with it.

I'm not sure I fully understand the concern, but I see it differently.
verify_manifest_entry() returns an entry, m, that the caller doesn't
need to inspect; it simply passes it on to the subsequent routines and
macros, which are written to handle all the possible states -- NULL,
m->bad, and so on.
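
To make that concrete, this is the call pattern I have in mind (just an
illustration of the macros in the patch, with the error paths trimmed):

    /* m is NULL if the file has no manifest entry; m->bad if its size didn't match */
    m = verify_manifest_entry(context, relpath, sb.st_size);

    /*
     * Both guards evaluate to false for m == NULL as well as for m->bad,
     * so the caller needs no explicit check here.
     */
    if (should_verify_control_data(context->manifest, m))
    {
        /* read and validate pg_control (see verify_control_data()) */
    }

    if (show_progress && !context->skip_checksums &&
        should_verify_checksum(m))
        total_size += m->size;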

Regards,
Amul
From 4f98ffa42916fe179e9a87b9043393b6449f1705 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v13 06/12] Refactor: split verify_backup_file() function and
 rename it.

The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clear that it is specifically
used for verifying files in a plain-format backup; a later patch in
this series adds the corresponding handling for tar-format backups.

In addition, the manifest entry verification code has been moved into a
new function, verify_manifest_entry(), so that it can be reused for tar
backup verification. If verify_manifest_entry() doesn't find an entry,
it reports an error as before and returns NULL to the caller; this is
why a NULL check has been added to should_verify_checksum().
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++--------
 src/bin/pg_verifybackup/pg_verifybackup.h |  6 ++-
 2 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3fcfb167217..5bfc98e7874 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -64,8 +64,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
 
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
-							   char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context,
+									 char *relpath, char *fullpath);
 static void verify_control_file(const char *controlpath,
 								uint64 manifest_system_identifier);
 static void report_extra_backup_files(verifier_context *context);
@@ -570,7 +570,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
 			newrelpath = psprintf("%s/%s", relpath, filename);
 
 		if (!should_ignore_relpath(context, newrelpath))
-			verify_backup_file(context, newrelpath, newfullpath);
+			verify_plain_backup_file(context, newrelpath, newfullpath);
 
 		pfree(newfullpath);
 		pfree(newrelpath);
@@ -591,7 +591,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
  * verify_backup_directory.
  */
 static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+						 char *fullpath)
 {
 	struct stat sb;
 	manifest_file *m;
@@ -627,6 +628,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		return;
 	}
 
+	/* Check the backup manifest entry for this file. */
+	m = verify_manifest_entry(context, relpath, sb.st_size);
+
+	/*
+	 * Validate the manifest system identifier, not available in manifest
+	 * version 1.
+	 */
+	if (context->manifest->version != 1 &&
+		strcmp(relpath, "global/pg_control") == 0 &&
+		m != NULL && m->matched && !m->bad)
+		verify_control_file(fullpath, context->manifest->system_identifier);
+
+	/* Update statistics for progress report, if necessary */
+	if (show_progress && !context->skip_checksums &&
+		should_verify_checksum(m))
+		total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+	manifest_file *m;
+
 	/* Check whether there's an entry in the manifest hash. */
 	m = manifest_files_lookup(context->manifest->files, relpath);
 	if (m == NULL)
@@ -634,40 +661,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		report_backup_error(context,
 							"\"%s\" is present on disk but not in the manifest",
 							relpath);
-		return;
+		return NULL;
 	}
 
 	/* Flag this entry as having been encountered in the filesystem. */
 	m->matched = true;
 
 	/* Check that the size matches. */
-	if (m->size != sb.st_size)
+	if (m->size != filesize)
 	{
 		report_backup_error(context,
 							"\"%s\" has size %lld on disk but size %zu in the manifest",
-							relpath, (long long int) sb.st_size, m->size);
+							relpath, (long long int) filesize, m->size);
 		m->bad = true;
 	}
 
-	/*
-	 * Validate the manifest system identifier, not available in manifest
-	 * version 1.
-	 */
-	if (context->manifest->version != 1 &&
-		strcmp(relpath, "global/pg_control") == 0)
-		verify_control_file(fullpath, context->manifest->system_identifier);
-
-	/* Update statistics for progress report, if necessary */
-	if (show_progress && !context->skip_checksums &&
-		should_verify_checksum(m))
-		total_size += m->size;
-
 	/*
 	 * We don't verify checksums at this stage. We first finish verifying that
 	 * we have the expected set of files with the expected sizes, and only
 	 * afterwards verify the checksums. That's because computing checksums may
 	 * take a while, and we'd like to report more obvious problems quickly.
 	 */
+
+	return m;
 }
 
 /*
@@ -830,7 +846,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 
 	/*
 	 * Double-check that we read the expected number of bytes from the file.
-	 * Normally, a file size mismatch would be caught in verify_backup_file
+	 * Normally, a file size mismatch would be caught in verify_manifest_entry
 	 * and this check would never be reached, but this provides additional
 	 * safety and clarity in the event of concurrent modifications or
 	 * filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index d8c566ed587..ff9476e356e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -37,7 +37,8 @@ typedef struct manifest_file
 } manifest_file;
 
 #define should_verify_checksum(m) \
-	(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+	(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+	 (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
 
 /*
  * Define a hash table which we can use to store information about the files
@@ -93,6 +94,9 @@ typedef struct verifier_context
 	bool		saw_any_error;
 } verifier_context;
 
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+											char *relpath, int64 filesize);
+
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
 			pg_attribute_printf(2, 3);
-- 
2.18.0

From ab254802604c8c31b5ec05784d938593c6efd9b2 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 15:14:15 +0530
Subject: [PATCH v13 07/12] Refactor: split verify_file_checksum() function.

Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.

The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
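
For illustration, the tar-side caller is expected to drive this roughly
as follows (sketch only, not part of this patch; context, m, data and
len stand for the caller's state):

    pg_checksum_context ctx;
    int64       bytes_read = 0;

    /* once, when the member file's header is seen */
    if (pg_checksum_init(&ctx, m->checksum_type) < 0)
        return;                 /* report error, skip checksum verification */

    /* for every chunk of member contents received */
    if (pg_checksum_update(&ctx, (uint8 *) data, len) < 0)
        return;                 /* report error, skip checksum verification */
    bytes_read += len;

    /* once, at the end of the member: compares byte count and final digest */
    verify_checksum(context, m, &ctx, bytes_read);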
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++---
 src/bin/pg_verifybackup/pg_verifybackup.h |  3 +++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5bfc98e7874..e44d0377cd5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -792,8 +792,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 	int			fd;
 	int			rc;
 	size_t		bytes_read = 0;
-	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
-	int			checksumlen;
 
 	/* Open the target file. */
 	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -844,6 +842,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 	if (rc < 0)
 		return;
 
+	/* Do the final computation and verification. */
+	verify_checksum(context, m, &checksum_ctx, bytes_read);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+				pg_checksum_context *checksum_ctx, int64 bytes_read)
+{
+	const char *relpath = m->pathname;
+	int			checksumlen;
+	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
 	/*
 	 * Double-check that we read the expected number of bytes from the file.
 	 * Normally, a file size mismatch would be caught in verify_manifest_entry
@@ -860,7 +874,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 	}
 
 	/* Get the final checksum. */
-	checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+	checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
 	if (checksumlen < 0)
 	{
 		report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index ff9476e356e..fe0ce8a89aa 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -96,6 +96,9 @@ typedef struct verifier_context
 
 extern manifest_file *verify_manifest_entry(verifier_context *context,
 											char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+							pg_checksum_context *checksum_ctx,
+							int64 bytes_read);
 
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
-- 
2.18.0

From 744f359f0782d590cac0f29bde76430561c038b5 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 10:51:23 +0530
Subject: [PATCH v13 08/12] Refactor: split verify_control_file.

Separated the control file verification code into a new function,
verify_control_data(), and introduced the should_verify_control_data()
macro, similar to should_verify_checksum().

Like verify_file_checksum(), verify_control_file() is designed to
accept a path to the pg_control file, which it opens and reads before
verifying the relevant information. In a tar-format backup, however, we
have the pg_control file contents rather than a path, and they need to
be verified in the same way. For that reason, the code that performs
the verification has been moved into a separate function so that it can
be reused for tar backup verification as well.
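
For illustration, the intended tar-side caller looks roughly like this
(sketch only, not part of this patch; buffer holds the buffered
pg_control contents and manifest is the parsed backup manifest):

    ControlFileData control_file;
    pg_crc32c   crc;
    bool        crc_ok;

    memcpy(&control_file, buffer, sizeof(ControlFileData));

    /* The CRC must be computed here, since no control file is read from disk. */
    INIT_CRC32C(crc);
    COMP_CRC32C(crc, (char *) &control_file, offsetof(ControlFileData, crc));
    FIN_CRC32C(crc);
    crc_ok = EQ_CRC32C(crc, control_file.crc);

    verify_control_data(&control_file, "global/pg_control", crc_ok,
                        manifest->system_identifier);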
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 42 ++++++++++-------------
 src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e44d0377cd5..d04e1d8c8ac 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -66,8 +66,6 @@ static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
 static void verify_plain_backup_file(verifier_context *context,
 									 char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
-								uint64 manifest_system_identifier);
 static void report_extra_backup_files(verifier_context *context);
 static void verify_backup_checksums(verifier_context *context);
 static void verify_file_checksum(verifier_context *context,
@@ -631,14 +629,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
 	/* Check the backup manifest entry for this file. */
 	m = verify_manifest_entry(context, relpath, sb.st_size);
 
-	/*
-	 * Validate the manifest system identifier, not available in manifest
-	 * version 1.
-	 */
-	if (context->manifest->version != 1 &&
-		strcmp(relpath, "global/pg_control") == 0 &&
-		m != NULL && m->matched && !m->bad)
-		verify_control_file(fullpath, context->manifest->system_identifier);
+	/* Validate the pg_control information */
+	if (should_verify_control_data(context->manifest, m))
+	{
+		ControlFileData *control_file;
+		bool		crc_ok;
+
+		pg_log_debug("reading \"%s\"", fullpath);
+		control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+		verify_control_data(control_file, fullpath, crc_ok,
+							context->manifest->system_identifier);
+		/* Release memory. */
+		pfree(control_file);
+	}
 
 	/* Update statistics for progress report, if necessary */
 	if (show_progress && !context->skip_checksums &&
@@ -687,18 +691,13 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
 }
 
 /*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
  */
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file, const char *controlpath,
+					bool crc_ok, uint64 manifest_system_identifier)
 {
-	ControlFileData *control_file;
-	bool		crc_ok;
-
-	pg_log_debug("reading \"%s\"", controlpath);
-	control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
 	/* Control file contents not meaningful if CRC is bad. */
 	if (!crc_ok)
 		report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -714,9 +713,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
 						   controlpath,
 						   (unsigned long long) manifest_system_identifier,
 						   (unsigned long long) control_file->system_identifier);
-
-	/* Release memory. */
-	pfree(control_file);
 }
 
 /*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index fe0ce8a89aa..818064c6eed 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -40,6 +40,17 @@ typedef struct manifest_file
 	(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
 	 (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
 
+/*
+ * Validate the control file and its system identifier against the manifest
+ * system identifier. Note that this feature is not available in manifest
+ * version 1. This validation should only be performed after the manifest entry
+ * validation for the pg_control file has been completed without errors.
+ */
+#define should_verify_control_data(manifest, m) \
+	(((manifest)->version != 1) && \
+	 ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+	 (strcmp((m)->pathname, "global/pg_control") == 0))
+
 /*
  * Define a hash table which we can use to store information about the files
  * mentioned in the backup manifest.
@@ -99,6 +110,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
 extern void verify_checksum(verifier_context *context, manifest_file *m,
 							pg_checksum_context *checksum_ctx,
 							int64 bytes_read);
+extern void verify_control_data(ControlFileData *control_file,
+								const char *controlpath, bool crc_ok,
+								uint64 manifest_system_identifier);
 
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
-- 
2.18.0

From b74a6bed49474c4cacdb1a7a3626004ae9ad7f13 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v13 09/12] Add simple_ptr_list_destroy() and
 simple_ptr_list_destroy_deep() API.

We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set.  I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
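
Intended usage, roughly (illustrative only; the file names are made up):

    SimplePtrList list = {NULL, NULL};

    simple_ptr_list_append(&list, pg_strdup("base.tar"));
    simple_ptr_list_append(&list, pg_strdup("16385.tar.gz"));

    /* ... walk list.head ... */

    /* Frees each cell; the _deep variant also frees each pointed-to element. */
    simple_ptr_list_destroy_deep(&list);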
---
 src/fe_utils/simple_list.c         | 39 ++++++++++++++++++++++++++++++
 src/include/fe_utils/simple_list.h |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
 		list->head = cell;
 	list->tail = cell;
 }
+
+/*
+ * Destroy a pointer list and optionally the pointed-to elements
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+	SimplePtrListCell *cell;
+
+	cell = list->head;
+	while (cell != NULL)
+	{
+		SimplePtrListCell *next;
+
+		next = cell->next;
+		if (deep)
+			pg_free(cell->ptr);
+		pg_free(cell);
+		cell = next;
+	}
+}
+
+/*
+ * Destroy a pointer list and the pointed-to elements
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+	simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only the pointer list, not the pointed-to elements
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+	simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
 extern const char *simple_string_list_not_touched(SimpleStringList *list);
 
 extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
 
 #endif							/* SIMPLE_LIST_H */
-- 
2.18.0

From b4fd85d44473e4fb21eee92f06a547f8518ca203 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 11:03:44 +0530
Subject: [PATCH v13 10/12] pg_verifybackup: Add backup format and compression
 option

----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 73 ++++++++++++++++++++++-
 src/bin/pg_verifybackup/pg_verifybackup.h |  1 +
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d04e1d8c8ac..bcae2d2990f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -62,6 +62,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
 								  const char *fmt,...)
 			pg_attribute_printf(2, 3) pg_attribute_noreturn();
 
+static char find_backup_format(verifier_context *context);
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
 static void verify_plain_backup_file(verifier_context *context,
@@ -97,6 +98,7 @@ main(int argc, char **argv)
 		{"exit-on-error", no_argument, NULL, 'e'},
 		{"ignore", required_argument, NULL, 'i'},
 		{"manifest-path", required_argument, NULL, 'm'},
+		{"format", required_argument, NULL, 'F'},
 		{"no-parse-wal", no_argument, NULL, 'n'},
 		{"progress", no_argument, NULL, 'P'},
 		{"quiet", no_argument, NULL, 'q'},
@@ -118,6 +120,7 @@ main(int argc, char **argv)
 	progname = get_progname(argv[0]);
 
 	memset(&context, 0, sizeof(context));
+	context.format = '\0';
 
 	if (argc > 1)
 	{
@@ -154,7 +157,7 @@ main(int argc, char **argv)
 	simple_string_list_append(&context.ignore_list, "recovery.signal");
 	simple_string_list_append(&context.ignore_list, "standby.signal");
 
-	while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+	while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
 	{
 		switch (c)
 		{
@@ -173,6 +176,15 @@ main(int argc, char **argv)
 				manifest_path = pstrdup(optarg);
 				canonicalize_path(manifest_path);
 				break;
+			case 'F':
+				if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+					context.format = 'p';
+				else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+					context.format = 't';
+				else
+					pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+							 optarg);
+				break;
 			case 'n':
 				no_parse_wal = true;
 				break;
@@ -220,11 +232,26 @@ main(int argc, char **argv)
 		pg_fatal("cannot specify both %s and %s",
 				 "-P/--progress", "-q/--quiet");
 
+	/* Determine the backup format if it hasn't been specified. */
+	if (context.format == '\0')
+		context.format = find_backup_format(&context);
+
 	/* Unless --no-parse-wal was specified, we will need pg_waldump. */
 	if (!no_parse_wal)
 	{
 		int			ret;
 
+		/*
+		 * XXX: In the future, we should consider enhancing pg_waldump to read
+		 * WAL files from the tar file.
+		 */
+		if (context.format != 'p')
+		{
+			pg_log_error("pg_waldump cannot read from a tar");
+			pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup.");
+			exit(1);
+		}
+
 		pg_waldump_path = pg_malloc(MAXPGPATH);
 		ret = find_other_exec(argv[0], "pg_waldump",
 							  "pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -279,8 +306,13 @@ main(int argc, char **argv)
 	/*
 	 * Now do the expensive work of verifying file checksums, unless we were
 	 * told to skip it.
+	 *
+	 * This applies only to plain-format backups. For tar-format backups,
+	 * checksum verification (if requested) is done as each file's contents
+	 * are streamed, since we do not have random access to the files the way
+	 * we do with plain backups.
 	 */
-	if (!context.skip_checksums)
+	if (!context.skip_checksums && context.format == 'p')
 		verify_backup_checksums(&context);
 
 	/*
@@ -981,6 +1013,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
 	return false;
 }
 
+/*
+ * Detect the backup format by checking for the PG_VERSION file in the backup
+ * directory. If it is found, the backup is considered plain format; otherwise,
+ * it is assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+	char		result;
+	char	   *path;
+	struct stat sb;
+
+	/* Should be here only if the backup format is unknown */
+	Assert(context->format == '\0');
+
+	/* Check PG_VERSION file. */
+	path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+	if (stat(path, &sb) == 0)
+		result = 'p';
+	else
+	{
+		if (errno != ENOENT)
+		{
+			pg_log_error("cannot determine backup format: %m");
+			pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+			exit(1);
+		}
+
+		/* Otherwise, it is assumed to be a tar format backup. */
+		result = 't';
+	}
+	pfree(path);
+
+	return result;
+}
+
 /*
  * Print a progress report based on the global variables.
  *
@@ -1038,6 +1106,7 @@ usage(void)
 	printf(_("  -e, --exit-on-error         exit immediately on error\n"));
 	printf(_("  -i, --ignore=RELATIVE_PATH  ignore indicated path\n"));
 	printf(_("  -m, --manifest-path=PATH    use specified path for manifest\n"));
+	printf(_("  -F, --format=p|t            backup format (plain, tar)\n"));
 	printf(_("  -n, --no-parse-wal          do not try to parse WAL files\n"));
 	printf(_("  -P, --progress              show progress information\n"));
 	printf(_("  -q, --quiet                 do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 818064c6eed..80031ad4dbc 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,7 @@ typedef struct verifier_context
 	manifest_data *manifest;
 	char	   *backup_directory;
 	SimpleStringList ignore_list;
+	char		format;			/* backup format:  p(lain)/t(ar) */
 	bool		skip_checksums;
 	bool		exit_on_error;
 	bool		saw_any_error;
-- 
2.18.0

From 402a301f203cebac028d1aef5ba8db2cb87890b7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 21 Aug 2024 12:49:04 +0530
Subject: [PATCH v13 11/12] pg_verifybackup: Read tar files and verify their
 contents

This patch implements TAR format backup verification.

For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.

In the first pass, we call precheck_tar_backup_file(), which performs
basic verification: it expects only base.tar, pg_wal.tar, or
<tablespaceoid>.tar files (with an optional compression suffix) and
reports an error for any other file. It also determines the compression
type of each archive. All this information is stored in a newly added
tar_file struct, which is appended to a list used in the second pass
(by verify_all_tar_files()) for the final verification. In the second
pass, the tar archives are read, decompressed, and the required
verification is carried out.

For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers is set
up for each tar file in verify_all_tar_files() (see
create_archive_verifier()), depending on the compression type detected
in the first pass.

When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.

Unlike in a plain backup, checksum verification here occurs in two
steps. First, as the contents are received, the checksum is computed
incrementally (see member_compute_checksum). Then, at the end of
processing the member file, the final verification is performed (see
member_verify_checksum).

Similarly, during the content receiving stage, if the file is
pg_control, the data will be copied into a local buffer (see
member_copy_control_data).  The verification will then be carried out
at the end of the member file processing (see member_verify_control_data()).
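
For reference, the streamer stack for each archive is assembled
innermost-first, and the returned outermost streamer is the one fed the
raw file chunks; roughly (sketch, following create_archive_verifier()
and verify_one_tar_file() below):

    /* innermost: verifies each tar member against the backup manifest */
    streamer = astreamer_verify_content_new(NULL, context, archive_name,
                                            tblspc_oid);
    /* parses the (decompressed) stream into tar members */
    streamer = astreamer_tar_parser_new(streamer);
    /* outermost, only when the archive is compressed */
    if (compress_algo == PG_COMPRESSION_GZIP)
        streamer = astreamer_gzip_decompressor_new(streamer);

    /* the caller then feeds chunks read from the archive file */
    astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);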
---
 src/bin/pg_verifybackup/Makefile           |   2 +
 src/bin/pg_verifybackup/astreamer_verify.c | 365 +++++++++++++++++++++
 src/bin/pg_verifybackup/meson.build        |   1 +
 src/bin/pg_verifybackup/pg_verifybackup.c  | 341 ++++++++++++++++++-
 src/bin/pg_verifybackup/pg_verifybackup.h  |   6 +
 src/tools/pgindent/typedefs.list           |   2 +
 6 files changed, 714 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c

diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..374d4a8afd1 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,10 +17,12 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 # We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
 OBJS = \
 	$(WIN32RES) \
+	astreamer_verify.o \
 	pg_verifybackup.o
 
 all: pg_verifybackup
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..f08868170b4
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,365 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * format backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+	astreamer	base;
+	verifier_context *context;
+	char	   *archive_name;
+	Oid			tblspc_oid;
+	pg_checksum_context *checksum_ctx;
+
+	/* Hold information for a member file verification */
+	manifest_file *mfile;
+	int64		received_bytes;
+	bool		verify_checksum;
+	bool		verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+									 astreamer_member *member,
+									 const char *data, int len,
+									 astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+									astreamer_member *member,
+									const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+									 astreamer_member *member,
+									 const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+	.content = astreamer_verify_content,
+	.finalize = astreamer_verify_finalize,
+	.free = astreamer_verify_free
+};
+
+/*
+ * Create an astreamer that verifies the contents of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+							 char *archive_name, Oid tblspc_oid)
+{
+	astreamer_verify *streamer;
+
+	streamer = palloc0(sizeof(astreamer_verify));
+	*((const astreamer_ops **) &streamer->base.bbs_ops) =
+		&astreamer_verify_ops;
+
+	streamer->base.bbs_next = next;
+	streamer->context = context;
+	streamer->archive_name = archive_name;
+	streamer->tblspc_oid = tblspc_oid;
+	initStringInfo(&streamer->base.bbs_buffer);
+
+	if (!context->skip_checksums)
+		streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+	return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+						 const char *data, int len,
+						 astreamer_archive_context context)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	Assert(context != ASTREAMER_UNKNOWN);
+
+	switch (context)
+	{
+		case ASTREAMER_MEMBER_HEADER:
+
+			/*
+			 * Perform the initial check and setup verification steps.
+			 */
+			member_verify_header(streamer, member);
+			break;
+
+		case ASTREAMER_MEMBER_CONTENTS:
+
+			/*
+			 * Since we are receiving the member content in chunks, it must be
+			 * processed according to the flags set by the member header
+			 * processing routine. This includes performing incremental
+			 * checksum computations and copying control data to the local
+			 * buffer.
+			 */
+			if (mystreamer->verify_checksum)
+				member_compute_checksum(streamer, member, data, len);
+
+			if (mystreamer->verify_control_data)
+				member_copy_control_data(streamer, member, data, len);
+			break;
+
+		case ASTREAMER_MEMBER_TRAILER:
+
+			/*
+			 * We have reached the end of the member file. By this point, we
+			 * should have successfully computed the checksum of the received
+			 * content and copied the entire pg_control file data into our
+			 * local buffer. We can now proceed with the final verification.
+			 */
+			if (mystreamer->verify_checksum)
+				member_verify_checksum(streamer);
+
+			if (mystreamer->verify_control_data)
+				member_verify_control_data(streamer);
+
+			/*
+			 * Reset the temporary information stored for the verification.
+			 */
+			member_reset_info(streamer);
+			break;
+
+		case ASTREAMER_ARCHIVE_TRAILER:
+			break;
+
+		default:
+			/* Shouldn't happen. */
+			pg_fatal("unexpected state while parsing tar file");
+	}
+}
+
+/*
+ * End-of-stream processing for an astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+	Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	if (mystreamer->checksum_ctx)
+		pfree(mystreamer->checksum_ctx);
+
+	pfree(streamer->bbs_buffer.data);
+	pfree(streamer);
+}
+
+/*
+ * Verifies whether the tar member entry exists in the backup manifest.
+ *
+ * If the archive being processed is a tablespace, it prepares the necessary
+ * file path first. If a valid entry is found in the backup manifest, it then
+ * determines whether checksum and control data verification should be
+ * performed during file content processing.
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	manifest_file *m;
+
+	/* We are only interested in files that are not in the ignore list. */
+	if (member->is_directory || member->is_link ||
+		should_ignore_relpath(mystreamer->context, member->pathname))
+		return;
+
+	/*
+	 * The backup manifest stores a relative path to the base directory for
+	 * files belonging to a tablespace, while the tablespace backup tar
+	 * archive does not include this path. Ensure the required path is
+	 * prepared; otherwise, the manifest entry verification will fail.
+	 */
+	if (OidIsValid(mystreamer->tblspc_oid))
+	{
+		char		temp[MAXPGPATH];
+
+		/* Copy original name at temporary space */
+		memcpy(temp, member->pathname, MAXPGPATH);
+
+		snprintf(member->pathname, MAXPGPATH, "%s/%u/%s",
+				 "pg_tblspc", mystreamer->tblspc_oid, temp);
+	}
+
+	/* Check the manifest entry */
+	m = verify_manifest_entry(mystreamer->context, member->pathname,
+							  member->size);
+	mystreamer->mfile = (void *) m;
+
+	/*
+	 * Prepare for checksum and control data verification.
+	 *
+	 * We could have these checks while receiving contents. However, since
+	 * contents are received in multiple iterations, this would result in
+	 * these lengthy checks being performed multiple times. Instead, setting
+	 * flags here and using them before proceeding with verification will be
+	 * more efficient.
+	 */
+	mystreamer->verify_checksum =
+		(!mystreamer->context->skip_checksums && should_verify_checksum(m));
+	mystreamer->verify_control_data =
+		should_verify_control_data(mystreamer->context->manifest, m);
+
+	/* Initialize the context required for checksum verification. */
+	if (mystreamer->verify_checksum &&
+		pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+	{
+		report_backup_error(mystreamer->context,
+							"%s: could not initialize checksum of file \"%s\"",
+							mystreamer->archive_name, m->pathname);
+
+		/*
+		 * Checksum verification cannot be performed without proper context
+		 * initialization.
+		 */
+		mystreamer->verify_checksum = false;
+	}
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * Should have a correctly initialized checksum_ctx, which will be used for
+ * incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+						const char *data, int len)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+	manifest_file *m = mystreamer->mfile;
+
+	Assert(mystreamer->verify_checksum);
+
+	/* Should have been called for the right file */
+	Assert(strcmp(member->pathname, m->pathname) == 0);
+
+	/*
+	 * The checksum context should match the type noted in the backup
+	 * manifest.
+	 */
+	Assert(checksum_ctx->type == m->checksum_type);
+
+	/*
+	 * Update the total count of computed checksum bytes for cross-checking
+	 * with the file size in the final verification stage.
+	 */
+	mystreamer->received_bytes += len;
+
+	if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+	{
+		report_backup_error(mystreamer->context,
+							"could not update checksum of file \"%s\"",
+							m->pathname);
+		mystreamer->verify_checksum = false;
+	}
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	Assert(mystreamer->verify_checksum);
+
+	verify_checksum(mystreamer->context, mystreamer->mfile,
+					mystreamer->checksum_ctx, mystreamer->received_bytes);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+						 const char *data, int len)
+{
+	/* Should be here only for control file */
+	Assert(strcmp(member->pathname, "global/pg_control") == 0);
+	Assert(((astreamer_verify *) streamer)->verify_control_data);
+
+	/* Copy enough control file data needed for verification. */
+	astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData));
+}
+
+/*
+ * Performs the CRC calculation of pg_control data and then calls the routines
+ * that execute the final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	manifest_data *manifest = mystreamer->context->manifest;
+	ControlFileData control_file;
+	pg_crc32c	crc;
+	bool		crc_ok;
+
+	/* Should be here only for control file */
+	Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+	Assert(mystreamer->verify_control_data);
+
+	/* Should have enough control file data needed for verification. */
+	if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+		report_fatal_error("%s: unexpected control file size: %d, should be %zu",
+						   mystreamer->archive_name, streamer->bbs_buffer.len,
+						   sizeof(ControlFileData));
+
+	memcpy(&control_file, streamer->bbs_buffer.data, sizeof(ControlFileData));
+
+	/* Check the CRC. */
+	INIT_CRC32C(crc);
+	COMP_CRC32C(crc, (char *) (&control_file), offsetof(ControlFileData, crc));
+	FIN_CRC32C(crc);
+
+	crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+	/* Do the final control data verification. */
+	verify_control_data(&control_file, mystreamer->mfile->pathname, crc_ok,
+						manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	mystreamer->mfile = NULL;
+	mystreamer->received_bytes = 0;
+	mystreamer->verify_checksum = false;
+	mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..0e09d1379d1 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,6 +1,7 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
 pg_verifybackup_sources = files(
+  'astreamer_verify.c',
   'pg_verifybackup.c'
 )
 
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index bcae2d2990f..9cbfc07a7ba 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -22,6 +22,7 @@
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
+#include "limits.h"
 #include "pg_verifybackup.h"
 #include "pgtime.h"
 
@@ -44,6 +45,16 @@
  */
 #define READ_CHUNK_SIZE				(128 * 1024)
 
+/*
+ * Tar file information needed for content verification.
+ */
+typedef struct tar_file
+{
+	char	   *relpath;
+	Oid			tblspc_oid;
+	pg_compress_algorithm compress_algorithm;
+} tar_file;
+
 static manifest_data *parse_manifest_file(char *manifest_path);
 static void verifybackup_version_cb(JsonManifestParseContext *context,
 									int manifest_version);
@@ -63,10 +74,18 @@ static void report_manifest_error(JsonManifestParseContext *context,
 			pg_attribute_printf(2, 3) pg_attribute_noreturn();
 
 static char find_backup_format(verifier_context *context);
+static void verify_plain_backup(verifier_context *context);
+static void verify_tar_backup(verifier_context *context);
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
-static void verify_plain_backup_file(verifier_context *context,
-									 char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+									 char *fullpath);
+static void precheck_tar_backup_file(verifier_context *context, char *relpath,
+									 char *fullpath, SimplePtrList *tarfiles);
+static void verify_all_tar_files(verifier_context *context,
+								 SimplePtrList *tarfiles);
+static void verify_one_tar_file(verifier_context *context, char *relpath,
+								char *fullpath, astreamer *streamer);
 static void report_extra_backup_files(verifier_context *context);
 static void verify_backup_checksums(verifier_context *context);
 static void verify_file_checksum(verifier_context *context,
@@ -75,6 +94,10 @@ static void verify_file_checksum(verifier_context *context,
 static void parse_required_wal(verifier_context *context,
 							   char *pg_waldump_path,
 							   char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+										  char *archive_name,
+										  Oid tblspc_oid,
+										  pg_compress_algorithm compress_algo);
 
 static void progress_report(bool finished);
 static void usage(void);
@@ -294,7 +317,10 @@ main(int argc, char **argv)
 	 * match. We also set the "matched" flag on every manifest entry that
 	 * corresponds to a file on disk.
 	 */
-	verify_backup_directory(&context, NULL, context.backup_directory);
+	if (context.format == 'p')
+		verify_plain_backup(&context);
+	else
+		verify_tar_backup(&context);
 
 	/*
 	 * The "matched" flag should now be set on every entry in the hash table.
@@ -546,6 +572,16 @@ verifybackup_per_wal_range_cb(JsonManifestParseContext *context,
 	manifest->last_wal_range = range;
 }
 
+/*
+ * Verify plain backup.
+ */
+static void
+verify_plain_backup(verifier_context *context)
+{
+	Assert(context->format == 'p');
+	verify_backup_directory(context, NULL, context->backup_directory);
+}
+
 /*
  * Verify one directory.
  *
@@ -682,6 +718,270 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
 		total_size += m->size;
 }
 
+/*
+ * Verify tar backup.
+ *
+ * Unlike plain backup verification, tar backup verification is carried out in
+ * two passes. The first pass performs a basic sanity check that the expected
+ * tar files are present in the backup directory, detects each file's
+ * compression type, and collects that information in a list. In the second
+ * pass, the tar archives are read, decompressed, and fully verified.
+ */
+static void
+verify_tar_backup(verifier_context *context)
+{
+	DIR		   *dir;
+	struct dirent *dirent;
+	SimplePtrList tarfiles = {NULL, NULL};
+	char	   *fullpath = context->backup_directory;
+
+	Assert(context->format == 't');
+
+	/*
+	 * If the backup directory cannot be found, treat this as a fatal error.
+	 */
+	dir = opendir(fullpath);
+	if (dir == NULL)
+		report_fatal_error("could not open directory \"%s\": %m", fullpath);
+
+	while (errno = 0, (dirent = readdir(dir)) != NULL)
+	{
+		char	   *filename = dirent->d_name;
+		char	   *newfullpath = psprintf("%s/%s", fullpath, filename);
+
+		/* Skip "." and ".." */
+		if (filename[0] == '.' && (filename[1] == '\0'
+								   || strcmp(filename, "..") == 0))
+			continue;
+
+		if (!should_ignore_relpath(context, filename))
+			precheck_tar_backup_file(context, filename, newfullpath,
+									 &tarfiles);
+
+		pfree(newfullpath);
+	}
+
+	/* Perform the final verification of the tar contents. */
+	verify_all_tar_files(context, &tarfiles);
+
+	if (closedir(dir))
+	{
+		report_backup_error(context,
+							"could not close directory \"%s\": %m", fullpath);
+		return;
+	}
+}
+
+/*
+ * Preparatory steps for verifying files in tar format backups.
+ *
+ * Carries out basic validation of the tar format backup file, detects the
+ * compression type, and appends that information to the tarfiles list. An
+ * error will be reported if the tar file is inaccessible, or if the file type,
+ * name, or compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as the
+ * verify_plain_backup_file(). The additional argument returns a list of valid
+ * tar files.
+ */
+static void
+precheck_tar_backup_file(verifier_context *context, char *relpath,
+						 char *fullpath, SimplePtrList *tarfiles)
+{
+	struct stat sb;
+	Oid			tblspc_oid = InvalidOid;
+	pg_compress_algorithm compress_algorithm;
+	tar_file   *tar;
+	char	   *suffix = NULL;
+
+	/* Should be tar format backup */
+	Assert(context->format == 't');
+
+	/* Get file information */
+	if (stat(fullpath, &sb) != 0)
+	{
+		report_backup_error(context,
+							"could not stat file or directory \"%s\": %m",
+							relpath);
+		return;
+	}
+
+	/* In a tar format backup, we expect only plain files. */
+	if (!S_ISREG(sb.st_mode))
+	{
+		report_backup_error(context,
+							"\"%s\" is not a plain file",
+							relpath);
+		return;
+	}
+
+	/*
+	 * We expect tar files for backing up the main directory, tablespace, and
+	 * pg_wal directory.
+	 *
+	 * pg_basebackup writes the main data directory to an archive file named
+	 * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace
+	 * directory to <tablespaceoid>.tar, each followed by a compression type
+	 * extension such as .gz, .lz4, or .zst.
+	 */
+	if (strncmp("base", relpath, 4) == 0)
+		suffix = relpath + 4;
+	else if (strncmp("pg_wal", relpath, 6) == 0)
+		suffix = relpath + 6;
+	else
+	{
+		/* Expected a <tablespaceoid>.tar file here. */
+		uint64		num = strtoul(relpath, &suffix, 10);
+
+		/*
+		 * Report an error if we didn't consume at least one character, if the
+		 * result is 0, or if the value is too large to be a valid OID.
+		 */
+		if (suffix == NULL || num <= 0 || num > OID_MAX)
+			report_backup_error(context,
+								"\"%s\" unexpected file in the tar format backup",
+								relpath);
+		tblspc_oid = (Oid) num;
+	}
+
+	/* Now, check the compression type of the tar */
+	if (strcmp(suffix, ".tar") == 0)
+		compress_algorithm = PG_COMPRESSION_NONE;
+	else if (strcmp(suffix, ".tgz") == 0)
+		compress_algorithm = PG_COMPRESSION_GZIP;
+	else if (strcmp(suffix, ".tar.gz") == 0)
+		compress_algorithm = PG_COMPRESSION_GZIP;
+	else if (strcmp(suffix, ".tar.lz4") == 0)
+		compress_algorithm = PG_COMPRESSION_LZ4;
+	else if (strcmp(suffix, ".tar.zst") == 0)
+		compress_algorithm = PG_COMPRESSION_ZSTD;
+	else
+	{
+		report_backup_error(context,
+							"\"%s\" unexpected file in the tar format backup",
+							relpath);
+		return;
+	}
+
+	/*
+	 * Ignore WALs, as reading and verification will be handled through
+	 * pg_waldump.
+	 */
+	if (strncmp("pg_wal", relpath, 6) == 0)
+		return;
+
+	/*
+	 * Append the information to the list for complete verification at a later
+	 * stage.
+	 */
+	tar = pg_malloc(sizeof(tar_file));
+	tar->relpath = pstrdup(relpath);
+	tar->tblspc_oid = tblspc_oid;
+	tar->compress_algorithm = compress_algorithm;
+
+	simple_ptr_list_append(tarfiles, tar);
+
+	/* Update statistics for progress report, if necessary */
+	if (show_progress)
+		total_size += sb.st_size;
+}
+
+/*
+ * The final steps for verifying files in tar format backups.
+ *
+ * Prepares the archive streamer stack according to the tar compression format
+ * for each tar file and invokes them for reading, decompressing, and
+ * ultimately verifying the contents.
+ *
+ * The arguments to this function should be a list of valid tar files to
+ * verify, and the allocation will be freed once the verification is complete.
+ */
+static void
+verify_all_tar_files(verifier_context *context, SimplePtrList *tarfiles)
+{
+	SimplePtrListCell *cell;
+
+	progress_report(false);
+
+	for (cell = tarfiles->head; cell != NULL; cell = cell->next)
+	{
+		tar_file   *tar = (tar_file *) cell->ptr;
+		astreamer  *streamer;
+		char	   *fullpath;
+
+		/* Prepare archive streamer stack */
+		streamer = create_archive_verifier(context,
+										   tar->relpath,
+										   tar->tblspc_oid,
+										   tar->compress_algorithm);
+
+		/* Compute the full pathname to the target file. */
+		fullpath = psprintf("%s/%s", context->backup_directory,
+							tar->relpath);
+
+		/* Invoke the streamer for reading, decompressing, and verifying. */
+		verify_one_tar_file(context, tar->relpath, fullpath, streamer);
+
+		/* Cleanup. */
+		pfree(tar->relpath);
+		pfree(tar);
+		pfree(fullpath);
+
+		astreamer_finalize(streamer);
+		astreamer_free(streamer);
+	}
+	simple_ptr_list_destroy(tarfiles);
+
+	progress_report(true);
+}
+
+/*
+ * Verification of a single tar file content.
+ *
+ * It reads a given tar archive in predefined chunks and passes it to the
+ * streamer, which initiates routines for decompression (if necessary) and then
+ * verifies each member within the tar file.
+ */
+static void
+verify_one_tar_file(verifier_context *context, char *relpath, char *fullpath,
+					astreamer *streamer)
+{
+	int			fd;
+	int			rc;
+	char	   *buffer;
+
+	pg_log_debug("reading \"%s\"", fullpath);
+
+	/* Open the target file. */
+	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+	{
+		report_backup_error(context, "could not open file \"%s\": %m",
+							relpath);
+		return;
+	}
+
+	buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+	/* Perform the reads */
+	while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+	{
+		astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+		/* Report progress */
+		done_size += rc;
+		progress_report(false);
+	}
+
+	if (rc < 0)
+		report_backup_error(context, "could not read file \"%s\": %m",
+							relpath);
+
+	/* Close the file. */
+	if (close(fd) != 0)
+		report_backup_error(context, "could not close file \"%s\": %m",
+							relpath);
+}
+
 /*
  * Verify file and its size entry in the manifest.
  */
@@ -1049,6 +1349,41 @@ find_backup_format(verifier_context *context)
 	return result;
 }
 
+/*
+ * Set up the stack of archive streamers needed to read, decompress, and
+ * verify the contents of the given tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+						Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+	astreamer  *streamer = NULL;
+
+	/* Should be here only for tar backup */
+	Assert(context->format == 't');
+
+	/*
+	 * To verify the contents of the tar, the initial step is to parse its
+	 * content.
+	 */
+	streamer = astreamer_verify_content_new(streamer, context, archive_name,
+											tblspc_oid);
+	streamer = astreamer_tar_parser_new(streamer);
+
+	/*
+	 * If the tar is compressed, we must perform the appropriate decompression
+	 * operation before proceeding with the verification of its contents.
+	 */
+	if (compress_algo == PG_COMPRESSION_GZIP)
+		streamer = astreamer_gzip_decompressor_new(streamer);
+	else if (compress_algo == PG_COMPRESSION_LZ4)
+		streamer = astreamer_lz4_decompressor_new(streamer);
+	else if (compress_algo == PG_COMPRESSION_ZSTD)
+		streamer = astreamer_zstd_decompressor_new(streamer);
+
+	return streamer;
+}
+
 /*
  * Print a progress report based on the global variables.
  *
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 80031ad4dbc..be7438af346 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
 #include "common/hashfn_unstable.h"
 #include "common/logging.h"
 #include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
 #include "fe_utils/simple_list.h"
 
 /*
@@ -123,4 +124,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
 extern bool should_ignore_relpath(verifier_context *context,
 								  const char *relpath);
 
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+											   verifier_context *context,
+											   char *archive_name,
+											   Oid tblspc_oid);
+
 #endif							/* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9e951a9e6f3..be3d32ef68a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
 astreamer_recovery_injector
 astreamer_tar_archiver
 astreamer_tar_parser
+astreamer_verify
 astreamer_zstd_frame
 bgworker_main_type
 bh_node_type
@@ -3950,6 +3951,7 @@ substitute_phv_relids_context
 subxids_array_status
 symbol
 tablespaceinfo
+tar_file
 td_entry
 teSection
 temp_tablespaces_extra
-- 
2.18.0

From 500d9bc69d7d786fae637315261dca79916b91ab Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 29 Aug 2024 19:01:22 +0530
Subject: [PATCH v13 12/12] pg_verifybackup: Tests and documentation

----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
 doc/src/sgml/ref/pg_verifybackup.sgml         |  43 +++-
 src/bin/pg_verifybackup/t/001_basic.pl        |   6 +-
 src/bin/pg_verifybackup/t/002_algorithm.pl    |  32 ++-
 src/bin/pg_verifybackup/t/003_corruption.pl   | 187 ++++++++++++++++--
 src/bin/pg_verifybackup/t/004_options.pl      |   2 +-
 src/bin/pg_verifybackup/t/005_bad_manifest.pl |   2 +-
 src/bin/pg_verifybackup/t/008_untar.pl        |  73 +++----
 src/bin/pg_verifybackup/t/010_client_untar.pl |  48 +----
 8 files changed, 274 insertions(+), 119 deletions(-)

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..ea6bc3ccb12 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
    integrity of a database cluster backup taken using
    <command>pg_basebackup</command> against a
    <literal>backup_manifest</literal> generated by the server at the time
-   of the backup.  The backup must be stored in the "plain"
-   format; a "tar" format backup can be checked after extracting it.
+   of the backup.  The backup must be stored in the "plain" or "tar"
+   format.  Verification is supported for tar backups compressed with
+   <literal>gzip</literal>, <literal>lz4</literal>, or <literal>zstd</literal>; backups
+   compressed with any other method can be checked after decompressing them.
   </para>
 
   <para>
@@ -168,6 +170,43 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+      <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the format of the backup. <replaceable>format</replaceable>
+        can be one of the following:
+
+        <variablelist>
+         <varlistentry>
+          <term><literal>p</literal></term>
+          <term><literal>plain</literal></term>
+          <listitem>
+           <para>
+            Backup consists of plain files with the same layout as the
+            source server's data directory and tablespaces.
+           </para>
+          </listitem>
+         </varlistentry>
+
+         <varlistentry>
+          <term><literal>t</literal></term>
+          <term><literal>tar</literal></term>
+          <listitem>
+           <para>
+            Backup consists of tar files.  A valid backup includes the main data
+            directory in a file named <filename>base.tar</filename>, the WAL
+            files in <filename>pg_wal.tar</filename>, and a separate tar file for
+            each tablespace, named after the tablespace's OID.  If the archives
+            are compressed, each file name also carries the compression suffix.
+           </para>
+          </listitem>
+         </varlistentry>
+        </variablelist></para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-n</option></term>
       <term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
 	qr/no backup directory specified/,
 	'target directory must be specified');
 command_fails_like(
-	[ 'pg_verifybackup', $tempdir ],
+	[ 'pg_verifybackup', '-Fp', $tempdir ],
 	qr/could not open file.*\/backup_manifest\"/,
 	'pg_verifybackup requires a manifest');
 command_fails_like(
-	[ 'pg_verifybackup', $tempdir, $tempdir ],
+	[ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
 	qr/too many command-line arguments/,
 	'multiple target directories not allowed');
 
@@ -31,7 +31,7 @@ close($fh);
 
 # but then try to use an alternate, nonexisting manifest
 command_fails_like(
-	[ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+	[ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
 	qr/could not open file.*\/not_the_manifest\"/,
 	'pg_verifybackup respects -m flag');
 
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index fb2a1fd7c4e..ac276f3da6b 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -14,24 +14,33 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
 $primary->init(allows_streaming => 1);
 $primary->start;
 
-for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+sub test_checksums
 {
-	my $backup_path = $primary->backup_dir . '/' . $algorithm;
+	my ($format, $algorithm) = @_;
+	my $backup_path = $primary->backup_dir . '/' . $format . '/' . $algorithm;
 	my @backup = (
 		'pg_basebackup', '-D', $backup_path,
 		'--manifest-checksums', $algorithm, '--no-sync', '-cfast');
 	my @verify = ('pg_verifybackup', '-e', $backup_path);
 
+	if ($format eq 'tar')
+	{
+		# Add tar backup format option
+		push @backup, ('-F', 't');
+		# Add switch to skip WAL verification.
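+		# (WAL parsing requires plain WAL files, which are not available
+		# in a tar-format backup.)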
+		push @verify, ('-n');
+	}
+
 	# A backup with a bogus algorithm should fail.
 	if ($algorithm eq 'bogus')
 	{
 		$primary->command_fails(\@backup,
-			"backup fails with algorithm \"$algorithm\"");
-		next;
+			"$format backup fails with algorithm \"$algorithm\"");
+		return;
 	}
 
 	# A backup with a valid algorithm should work.
-	$primary->command_ok(\@backup, "backup ok with algorithm \"$algorithm\"");
+	$primary->command_ok(\@backup, "$format backup ok with algorithm \"$algorithm\"");
 
 	# We expect each real checksum algorithm to be mentioned on every line of
 	# the backup manifest file except the first and last; for simplicity, we
@@ -39,7 +48,7 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
 	# is none, we just check that the manifest exists.
 	if ($algorithm eq 'none')
 	{
-		ok(-f "$backup_path/backup_manifest", "backup manifest exists");
+		ok(-f "$backup_path/backup_manifest", "$format backup manifest exists");
 	}
 	else
 	{
@@ -52,10 +61,19 @@ for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
 
 	# Make sure that it verifies OK.
 	$primary->command_ok(\@verify,
-		"verify backup with algorithm \"$algorithm\"");
+		"verify $format backup with algorithm \"$algorithm\"");
 
 	# Remove backup immediately to save disk space.
 	rmtree($backup_path);
 }
 
+# Run the tests for each backup format and checksum algorithm.
+for my $format (qw(plain tar))
+{
+	for my $algorithm (qw(bogus none crc32c sha224 sha256 sha384 sha512))
+	{
+		test_checksums($format, $algorithm);
+	}
+}
+
 done_testing();
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index ae91e043384..d0c3ffedd14 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -11,6 +11,8 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
+my $tar = $ENV{TAR};
+
 my $primary = PostgreSQL::Test::Cluster->new('primary');
 $primary->init(allows_streaming => 1);
 $primary->start;
@@ -32,62 +34,73 @@ EOM
 my @scenario = (
 	{
 		'name' => 'extra_file',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_extra_file,
 		'fails_like' =>
 		  qr/extra_file.*present on disk but not in the manifest/
 	},
 	{
 		'name' => 'extra_tablespace_file',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_extra_tablespace_file,
 		'fails_like' =>
 		  qr/extra_ts_file.*present on disk but not in the manifest/
 	},
 	{
 		'name' => 'missing_file',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_missing_file,
 		'fails_like' =>
 		  qr/pg_xact\/0000.*present in the manifest but not on disk/
 	},
 	{
 		'name' => 'missing_tablespace',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_missing_tablespace,
 		'fails_like' =>
 		  qr/pg_tblspc.*present in the manifest but not on disk/
 	},
 	{
 		'name' => 'append_to_file',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_append_to_file,
 		'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/
 	},
 	{
 		'name' => 'truncate_file',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_truncate_file,
 		'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/
 	},
 	{
 		'name' => 'replace_file',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_replace_file,
 		'fails_like' => qr/checksum mismatch for file/
 	},
 	{
 		'name' => 'system_identifier',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_system_identifier,
 		'fails_like' =>
 		  qr/manifest system identifier is .*, but control file has/
 	},
 	{
 		'name' => 'bad_manifest',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_bad_manifest,
 		'fails_like' => qr/manifest checksum mismatch/
 	},
 	{
 		'name' => 'open_file_fails',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_open_file_fails,
 		'fails_like' => qr/could not open file/,
 		'skip_on_windows' => 1
 	},
 	{
 		'name' => 'open_directory_fails',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_open_directory_fails,
 		'cleanup' => \&cleanup_open_directory_fails,
 		'fails_like' => qr/could not open directory/,
@@ -95,10 +108,61 @@ my @scenario = (
 	},
 	{
 		'name' => 'search_directory_fails',
+		'backup_format' => 'p',
 		'mutilate' => \&mutilate_search_directory_fails,
 		'cleanup' => \&cleanup_search_directory_fails,
 		'fails_like' => qr/could not stat file or directory/,
 		'skip_on_windows' => 1
+	},
+	{
+		'name' => 'tar_backup_unexpected_file',
+		'backup_format' => 't',
+		'mutilate' => \&mutilate_extra_file,
+		'fails_like' =>
+		  qr/extra_file.*unexpected file in the tar format backup/
+	},
+	{
+		'name' => 'tar_backup_extra_file',
+		'backup_format' => 't',
+		'mutilate' => \&mutilate_tar_backup_extra_file,
+		'fails_like' =>
+		  qr/extra_tar_member_file.*present on disk but not in the manifest/
+	},
+	{
+		'name' => 'tar_backup_missing_file',
+		'backup_format' => 't',
+		'mutilate' => \&mutilate_tar_backup_missing_file,
+		'fails_like' =>
+		  qr/pg_xact\/0000.*present in the manifest but not on disk/,
+		'skip_on_windows' => 1
+	},
+	{
+		'name' => 'tar_backup_append_to_file',
+		'backup_format' => 't',
+		'mutilate' => \&mutilate_tar_backup_append_to_file,
+		'fails_like' => qr/has size \d+ on disk but size \d+ in the manifest/,
+		'skip_on_windows' => 1
+	},
+	{
+		'name' => 'tar_backup_truncate_file',
+		'backup_format' => 't',
+		'mutilate' => \&mutilate_tar_backup_truncate_file,
+		'fails_like' => qr/has size 0 on disk but size \d+ in the manifest/,
+		'skip_on_windows' => 1
+	},
+	{
+		'name' => 'tar_backup_replace_file',
+		'backup_format' => 't',
+		'mutilate' => \&mutilate_tar_backup_replace_file,
+		'fails_like' => qr/checksum mismatch for file/,
+		'skip_on_windows' => 1
+	},
+	{
+		'name' => 'tar_backup_system_identifier',
+		'backup_format' => 't',
+		'mutilate' => \&mutilate_system_identifier,
+		'fails_like' =>
+		  qr/manifest system identifier is .*, but control file has/
 	});
 
 for my $scenario (@scenario)
@@ -111,29 +175,40 @@ for my $scenario (@scenario)
 		  if ($scenario->{'skip_on_windows'}
 			&& ($windows_os || $Config::Config{osname} eq 'cygwin'));
 
+		# Skip tests for tar-format backups if tar is not available.
+		skip "no tar program available", 4
+		  if ($scenario->{'backup_format'} eq 't'
+			&& (!defined $tar || $tar eq ''));
+
 		# Take a backup and check that it verifies OK.
 		my $backup_path = $primary->backup_dir . '/' . $name;
 		my $backup_ts_path = PostgreSQL::Test::Utils::tempdir_short();
+
+		my @backup = (
+				'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast',
+				'-T', "${source_ts_path}=${backup_ts_path}");
+		my @verify = ('pg_verifybackup', $backup_path);
+
+		if ($scenario->{'backup_format'} eq 't')
+		{
+			# Add tar backup format option
+			push @backup, ('-F', 't');
+			# Add switch to skip WAL verification.
+			push @verify, ('-n');
+		}
+
 		# The tablespace map parameter confuses Msys2, which tries to mangle
 		# it. Tell it not to.
 		# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
 		local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
-		$primary->command_ok(
-			[
-				'pg_basebackup', '-D', $backup_path, '--no-sync', '-cfast',
-				'-T', "${source_ts_path}=${backup_ts_path}"
-			],
-			"base backup ok");
-		command_ok([ 'pg_verifybackup', $backup_path ],
-			"intact backup verified");
+
+		$primary->command_ok(\@backup, "base backup ok");
+		command_ok(\@verify, "intact backup verified");
 
 		# Mutilate the backup in some way.
 		$scenario->{'mutilate'}->($backup_path);
 
 		# Now check that the backup no longer verifies.
-		command_fails_like(
-			[ 'pg_verifybackup', $backup_path ],
-			$scenario->{'fails_like'},
+		command_fails_like(\@verify, $scenario->{'fails_like'},
 			"corrupt backup fails verification: $name");
 
 		# Run cleanup hook, if provided.
@@ -260,6 +335,7 @@ sub mutilate_system_identifier
 		$backup_path . '/backup_manifest')
 	  or BAIL_OUT "could not copy manifest to $backup_path";
 	$node->teardown_node(fail_ok => 1);
+	$node->clean_node();
 	return;
 }
 
@@ -316,4 +392,93 @@ sub cleanup_search_directory_fails
 	return;
 }
 
+# Unpack base.tar, perform the specified file operation, and then repack the
+# modified content into base.tar at the same location.
+sub mutilate_base_tar
+{
+	my ($backup_path, $op) = @_;
+
+	my $archive = 'base.tar';
+	my $tmpdir = "$backup_path/tmpdir";
+	mkdir($tmpdir) || die "$!";
+
+	# Extract the archive
+	system_or_bail($tar, '-xf', "$backup_path/$archive", '-C', "$tmpdir");
+	unlink("$backup_path/$archive") || die "$!";
+
+	if ($op eq 'add')
+	{
+		create_extra_file($tmpdir, 'extra_tar_member_file');
+	}
+	elsif ($op eq 'delete')
+	{
+		mutilate_missing_file($tmpdir);
+	}
+	elsif ($op eq 'append')
+	{
+		mutilate_append_to_file($tmpdir);
+	}
+	elsif ($op eq 'truncate')
+	{
+		mutilate_truncate_file($tmpdir);
+	}
+	elsif ($op eq 'replace')
+	{
+		mutilate_replace_file($tmpdir);
+	}
+	else
+	{
+		die "mutilate_tar_backup: \"$op\" invalid operation";
+	}
+
+	# Navigate to the extracted location and list the files.
+	chdir("$tmpdir") || die "$!";
+	my @files = glob("*");
+	# Repack the extracted content
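+	# (Packing from inside the directory keeps the member paths relative,
+	# matching the layout of the original archive.)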
+	system_or_bail($tar, '-cf', "$backup_path/$archive", @files);
+	chdir($backup_path) || die "$!";
+	rmtree("$tmpdir") || die "$!";
+}
+
+# Add an extra file into the backup's base.tar.
+sub mutilate_tar_backup_extra_file
+{
+	my ($backup_path) = @_;
+	mutilate_base_tar($backup_path, 'add');
+	return;
+}
+
+# Remove a file.
+sub mutilate_tar_backup_missing_file
+{
+	my ($backup_path) = @_;
+	mutilate_base_tar($backup_path, 'delete');
+	return;
+}
+
+# Append additional bytes to a file.
+sub mutilate_tar_backup_append_to_file
+{
+	my ($backup_path) = @_;
+	mutilate_base_tar($backup_path, 'append');
+	return;
+}
+
+# Truncate a file to zero length.
+sub mutilate_tar_backup_truncate_file
+{
+	my ($backup_path) = @_;
+	mutilate_base_tar($backup_path, 'truncate');
+	return;
+}
+
+# Replace a file's contents.
+sub mutilate_tar_backup_replace_file
+{
+	my ($backup_path) = @_;
+	mutilate_base_tar($backup_path, 'replace');
+	return;
+}
+
 done_testing();
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
 # Test valid manifest with nonexistent backup directory.
 command_fails_like(
 	[
-		'pg_verifybackup', '-m',
+		'pg_verifybackup', '-Fp', '-m',
 		"$backup_path/backup_manifest", "$backup_path/fake"
 	],
 	qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
 	print $fh $manifest_contents;
 	close($fh);
 
-	command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+	command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
 	return;
 }
 
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..1c83f38d5b5 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
 $primary->init(allows_streaming => 1);
 $primary->start;
 
+# Create a tablespace directory.
+my $TS1_LOCATION = $primary->backup_dir . '/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g;    # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with a table in it.
+$primary->safe_psql('postgres', qq(
+		CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+		SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+		CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+		INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+		SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
 my $backup_path = $primary->backup_dir . '/server-backup';
 my $extract_path = $primary->backup_dir . '/extracted-backup';
 
@@ -23,39 +37,31 @@ my @test_configuration = (
 	{
 		'compression_method' => 'none',
 		'backup_flags' => [],
-		'backup_archive' => 'base.tar',
+		'backup_archive' => ['base.tar', "$tsoid.tar"],
 		'enabled' => 1
 	},
 	{
 		'compression_method' => 'gzip',
 		'backup_flags' => [ '--compress', 'server-gzip' ],
-		'backup_archive' => 'base.tar.gz',
-		'decompress_program' => $ENV{'GZIP_PROGRAM'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
 		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
 	},
 	{
 		'compression_method' => 'lz4',
 		'backup_flags' => [ '--compress', 'server-lz4' ],
-		'backup_archive' => 'base.tar.lz4',
-		'decompress_program' => $ENV{'LZ4'},
-		'decompress_flags' => [ '-d', '-m' ],
+		'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
 		'enabled' => check_pg_config("#define USE_LZ4 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'server-zstd' ],
-		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
-		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	});
 
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
 		my $backup_files = join(',',
 			sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
 		my $expected_backup_files =
-		  join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+		  join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
 		is($backup_files, $expected_backup_files,
 			"found expected backup files, compression $method");
 
-		# Decompress.
-		if (exists $tc->{'decompress_program'})
-		{
-			my @decompress = ($tc->{'decompress_program'});
-			push @decompress, @{ $tc->{'decompress_flags'} }
-			  if $tc->{'decompress_flags'};
-			push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
-			system_or_bail(@decompress);
-		}
-
-	  SKIP:
-		{
-			my $tar = $ENV{TAR};
-			# don't check for a working tar here, to accommodate various odd
-			# cases. If tar doesn't work the init_from_backup below will fail.
-			skip "no tar program available", 1
-			  if (!defined $tar || $tar eq '');
-
-			# Untar.
-			mkdir($extract_path);
-			system_or_bail($tar, 'xf', $backup_path . '/base.tar',
-				'-C', $extract_path);
-
-			# Verify.
-			$primary->command_ok(
-				[
-					'pg_verifybackup', '-n',
-					'-m', "$backup_path/backup_manifest",
-					'-e', $extract_path
-				],
-				"verify backup, compression $method");
-		}
+		# Verify tar backup.
+		$primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+			"verify backup, compression $method");
 
 		# Cleanup.
-		unlink($backup_path . '/backup_manifest');
-		unlink($backup_path . '/base.tar');
-		unlink($backup_path . '/' . $tc->{'backup_archive'});
+		rmtree($backup_path);
 		rmtree($extract_path);
 	}
 }
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
 		'compression_method' => 'gzip',
 		'backup_flags' => [ '--compress', 'client-gzip:5' ],
 		'backup_archive' => 'base.tar.gz',
-		'decompress_program' => $ENV{'GZIP_PROGRAM'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
 	},
 	{
 		'compression_method' => 'lz4',
 		'backup_flags' => [ '--compress', 'client-lz4:5' ],
 		'backup_archive' => 'base.tar.lz4',
-		'decompress_program' => $ENV{'LZ4'},
-		'decompress_flags' => ['-d'],
-		'output_file' => 'base.tar',
 		'enabled' => check_pg_config("#define USE_LZ4 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:5' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'parallel zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1"),
 		'possibly_unsupported' =>
 		  qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
 		is($backup_files, $expected_backup_files,
 			"found expected backup files, compression $method");
 
-		# Decompress.
-		if (exists $tc->{'decompress_program'})
-		{
-			my @decompress = ($tc->{'decompress_program'});
-			push @decompress, @{ $tc->{'decompress_flags'} }
-			  if $tc->{'decompress_flags'};
-			push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
-			push @decompress, $backup_path . '/' . $tc->{'output_file'}
-			  if $tc->{'output_file'};
-			system_or_bail(@decompress);
-		}
-
-	  SKIP:
-		{
-			my $tar = $ENV{TAR};
-			# don't check for a working tar here, to accommodate various odd
-			# cases. If tar doesn't work the init_from_backup below will fail.
-			skip "no tar program available", 1
-			  if (!defined $tar || $tar eq '');
-
-			# Untar.
-			mkdir($extract_path);
-			system_or_bail($tar, 'xf', $backup_path . '/base.tar',
-				'-C', $extract_path);
-
-			# Verify.
-			$primary->command_ok(
-				[
-					'pg_verifybackup', '-n',
-					'-m', "$backup_path/backup_manifest",
-					'-e', $extract_path
-				],
-				"verify backup, compression $method");
-		}
+		# Verify tar backup.
+		$primary->command_ok([ 'pg_verifybackup', '-n', '-e', $backup_path ],
+			"verify backup, compression $method");
 
 		# Cleanup.
 		rmtree($extract_path);
-- 
2.18.0
