On Mon, Mar 28, 2022 at 03:50:50PM -0400, Robert Haas wrote:
> On Sun, Mar 27, 2022 at 1:47 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Coverity has a nitpick about this:
> >
> > /srv/coverity/git/pgsql-git/postgresql/src/common/backup_compression.c: 194 
> > in parse_bc_specification()
> > 193                     /* Advance to next entry and loop around. */
> > >>>     CID 1503251:  Null pointer dereferences  (REVERSE_INULL)
> > >>>     Null-checking "vend" suggests that it may be null, but it has 
> > >>> already been dereferenced on all paths leading to the check.
> > 194                     specification = vend == NULL ? kwend + 1 : vend + 1;
> > 195             }
> > 196     }
> >
> > Not sure if you should remove this null-check or add some other ones,
> > but I think you ought to do one or the other.
> 
> Yes, I think this is buggy.  I think there's only a theoretical bug
> right now, because the only keyword we have is "level" and that
> requires a value. But if I add an example keyword that does not
> require an associated value (as demonstrated in the attached patch)
> and do something like pg_basebackup -cfast -D whatever --compress
> lz4:example, then the present code will dereference "vend" even though
> it's NULL, which is not good. The attached patch also shows how I
> think that should be fixed.
> 
> As I hope is apparent, the first hunk of this patch is not for commit,
> and the second hunk is for commit.

Confirmed that it's a real issue with my patch for zstd long match mode.  But
you need to specify another option after the value-less flag option for it to
crash.

I suggest to write it differently, as in 0002.

This also fixes some rebase-induced errors with my previous patches, and adds
expect_boolean().
>From 9bedbfc6bfa471473a8b3479ffd1888d5da285ab Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Mon, 28 Mar 2022 13:25:44 -0400
Subject: [PATCH 1/4] Allow parallel zstd compression when taking a base
 backup.

libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.

When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.

Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.
---
 doc/src/sgml/protocol.sgml                    | 12 +++--
 doc/src/sgml/ref/pg_basebackup.sgml           |  4 +-
 src/backend/replication/basebackup_zstd.c     | 45 ++++++++++++-------
 src/bin/pg_basebackup/bbstreamer_zstd.c       | 40 ++++++++++++-----
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |  5 +++
 src/bin/pg_verifybackup/t/009_extract.pl      | 29 +++++++++++-
 src/bin/pg_verifybackup/t/010_client_untar.pl | 33 ++++++++++++--
 src/common/backup_compression.c               | 16 +++++++
 src/include/common/backup_compression.h       |  2 +
 src/test/perl/PostgreSQL/Test/Cluster.pm      |  3 +-
 10 files changed, 148 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 2fa3cedfe9e..98f0bc3cc34 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2739,17 +2739,23 @@ The commands accepted in replication mode are:
           option.  If the value is an integer, it specifies the compression
           level.  Otherwise, it should be a comma-separated list of items,
           each of the form <literal>keyword</literal> or
-          <literal>keyword=value</literal>. Currently, the only supported
-          keyword is <literal>level</literal>, which sets the compression
-          level.
+          <literal>keyword=value</literal>. Currently, the supported keywords
+          are <literal>level</literal> and <literal>workers</literal>.
         </para>
 
         <para>
+          The <literal>level</literal> keyword sets the compression level.
           For <literal>gzip</literal> the compression level should be an
           integer between 1 and 9, for <literal>lz4</literal> an integer
           between 1 and 12, and for <literal>zstd</literal> an integer
           between 1 and 22.
          </para>
+
+        <para>
+          The <literal>workers</literal> keyword sets the number of threads
+          that should be used for parallel compression. Parallel compression
+          is supported only for <literal>zstd</literal>.
+         </para>
         </listitem>
        </varlistentry>
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index d9233beb8e1..82f5f606250 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
         integer, it specifies the compression level.  Otherwise, it should be
         a comma-separated list of items, each of the form
         <literal>keyword</literal> or <literal>keyword=value</literal>.
-        Currently, the only supported keyword is <literal>level</literal>,
-        which sets the compression level.
+        Currently, the supported keywords are <literal>level</literal>
+        and <literal>workers</literal>.
        </para>
        <para>
         If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index 5496eaa72b7..f6876f48118 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -25,8 +25,8 @@ typedef struct bbsink_zstd
 	/* Common information for all types of sink. */
 	bbsink		base;
 
-	/* Compression level */
-	int			compresslevel;
+	/* Compression options */
+	bc_specification *compress;
 
 	ZSTD_CCtx  *cctx;
 	ZSTD_outBuffer zstd_outBuf;
@@ -67,22 +67,13 @@ bbsink_zstd_new(bbsink *next, bc_specification *compress)
 	return NULL;				/* keep compiler quiet */
 #else
 	bbsink_zstd *sink;
-	int		compresslevel;
 
 	Assert(next != NULL);
 
-	if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
-		compresslevel = 0;
-	else
-	{
-		compresslevel = compress->level;
-		Assert(compresslevel >= 1 && compresslevel <= 22);
-	}
-
 	sink = palloc0(sizeof(bbsink_zstd));
 	*((const bbsink_ops **) &sink->base.bbs_ops) = &bbsink_zstd_ops;
 	sink->base.bbs_next = next;
-	sink->compresslevel = compresslevel;
+	sink->compress = compress;
 
 	return &sink->base;
 #endif
@@ -99,16 +90,36 @@ bbsink_zstd_begin_backup(bbsink *sink)
 	bbsink_zstd *mysink = (bbsink_zstd *) sink;
 	size_t		output_buffer_bound;
 	size_t		ret;
+	bc_specification *compress = mysink->compress;
 
 	mysink->cctx = ZSTD_createCCtx();
 	if (!mysink->cctx)
 		elog(ERROR, "could not create zstd compression context");
 
-	ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
-								 mysink->compresslevel);
-	if (ZSTD_isError(ret))
-		elog(ERROR, "could not set zstd compression level to %d: %s",
-			 mysink->compresslevel, ZSTD_getErrorName(ret));
+	if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
+	{
+		ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+									 compress->level);
+		if (ZSTD_isError(ret))
+			elog(ERROR, "could not set zstd compression level to %d: %s",
+				 compress->level, ZSTD_getErrorName(ret));
+	}
+
+	if ((compress->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0)
+	{
+		/*
+		 * On older versions of libzstd, this option does not exist, and trying
+		 * to set it will fail. Similarly for newer versions if they are
+		 * compiled without threading support.
+		 */
+		ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+									 compress->workers);
+		if (ZSTD_isError(ret))
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("could not set compression worker count to %d: %s",
+						   compress->workers, ZSTD_getErrorName(ret)));
+	}
 
 	/*
 	 * We need our own buffer, because we're going to pass different data to
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 7946b6350b6..f94c5c041d3 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,7 +67,6 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
 {
 #ifdef USE_ZSTD
 	bbstreamer_zstd_frame *streamer;
-	int			compresslevel;
 	size_t		ret;
 
 	Assert(next != NULL);
@@ -88,18 +87,35 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
 		exit(1);
 	}
 
-	/* Initialize stream compression preferences */
-	if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) == 0)
-		compresslevel = 0;
-	else
-		compresslevel = compress->level;
-	ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
-								 compresslevel);
-	if (ZSTD_isError(ret))
+	/* Set compression level, if specified */
+	if ((compress->options & BACKUP_COMPRESSION_OPTION_LEVEL) != 0)
 	{
-		pg_log_error("could not set zstd compression level to %d: %s",
-					 compresslevel, ZSTD_getErrorName(ret));
-		exit(1);
+		ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_compressionLevel,
+									 compress->level);
+		if (ZSTD_isError(ret))
+		{
+			pg_log_error("could not set zstd compression level to %d: %s",
+						 compress->level, ZSTD_getErrorName(ret));
+			exit(1);
+		}
+	}
+
+	/* Set # of workers, if specified */
+	if ((compress->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0)
+	{
+		/*
+		 * On older versions of libzstd, this option does not exist, and
+		 * trying to set it will fail. Similarly for newer versions if they
+		 * are compiled without threading support.
+		 */
+		ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
+									 compress->workers);
+		if (ZSTD_isError(ret))
+		{
+			pg_log_error("could not set compression worker count to %d: %s",
+						 compress->workers, ZSTD_getErrorName(ret));
+			exit(1);
+		}
 	}
 
 	/* Initialize the ZSTD output buffer. */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 47f3d00ac45..5ba84c22509 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -130,6 +130,11 @@ my @compression_failure_tests = (
 		'invalid compression specification: found empty string where a compression option was expected',
 		'failure on extra, empty compression option'
 	],
+	[
+		'gzip:workers=3',
+		'invalid compression specification: compression algorithm "gzip" does not accept a worker count',
+		'failure on worker count for gzip'
+	],
 );
 for my $cft (@compression_failure_tests)
 {
diff --git a/src/bin/pg_verifybackup/t/009_extract.pl b/src/bin/pg_verifybackup/t/009_extract.pl
index 41a5b370cc5..d6f11b95535 100644
--- a/src/bin/pg_verifybackup/t/009_extract.pl
+++ b/src/bin/pg_verifybackup/t/009_extract.pl
@@ -34,6 +34,12 @@ my @test_configuration = (
 		'compression_method' => 'zstd',
 		'backup_flags' => ['--compress', 'server-zstd:5'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
+	},
+	{
+		'compression_method' => 'parallel zstd',
+		'backup_flags' => ['--compress', 'server-zstd:workers=3'],
+		'enabled' => check_pg_config("#define USE_ZSTD 1"),
+		'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
 	}
 );
 
@@ -55,8 +61,27 @@ for my $tc (@test_configuration)
 		my @verify = ('pg_verifybackup', '-e', $backup_path);
 
 		# A backup with a valid compression method should work.
-		$primary->command_ok(\@backup,
-							 "backup done, compression method \"$method\"");
+		my $backup_stdout = '';
+		my $backup_stderr = '';
+		my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+											  '2>', \$backup_stderr);
+		if ($backup_stdout ne '')
+		{
+			print "# standard output was:\n$backup_stdout";
+		}
+		if ($backup_stderr ne '')
+		{
+			print "# standard error was:\n$backup_stderr";
+		}
+		if (! $backup_result && $tc->{'possibly_unsupported'} &&
+			$backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+		{
+			skip "compression with $method not supported by this build", 2;
+		}
+		else
+		{
+			ok($backup_result, "backup done, compression $method");
+		}
 
 		# Make sure that it verifies OK.
 		$primary->command_ok(\@verify,
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 488a6d1edee..c1cd12cb065 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -49,6 +49,15 @@ my @test_configuration = (
 		'decompress_program' => $ENV{'ZSTD'},
 		'decompress_flags' => [ '-d' ],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
+	},
+	{
+		'compression_method' => 'parallel zstd',
+		'backup_flags' => ['--compress', 'client-zstd:workers=3'],
+		'backup_archive' => 'base.tar.zst',
+		'decompress_program' => $ENV{'ZSTD'},
+		'decompress_flags' => [ '-d' ],
+		'enabled' => check_pg_config("#define USE_ZSTD 1"),
+		'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/
 	}
 );
 
@@ -69,9 +78,27 @@ for my $tc (@test_configuration)
 			'pg_basebackup', '-D', $backup_path,
 			'-Xfetch', '--no-sync', '-cfast', '-Ft');
 		push @backup, @{$tc->{'backup_flags'}};
-		$primary->command_ok(\@backup,
-							 "client side backup, compression $method");
-
+		my $backup_stdout = '';
+		my $backup_stderr = '';
+		my $backup_result = $primary->run_log(\@backup, '>', \$backup_stdout,
+											  '2>', \$backup_stderr);
+		if ($backup_stdout ne '')
+		{
+			print "# standard output was:\n$backup_stdout";
+		}
+		if ($backup_stderr ne '')
+		{
+			print "# standard error was:\n$backup_stderr";
+		}
+		if (! $backup_result && $tc->{'possibly_unsupported'} &&
+			$backup_stderr =~ /$tc->{'possibly_unsupported'}/)
+		{
+			skip "compression with $method not supported by this build", 3;
+		}
+		else
+		{
+			ok($backup_result, "client side backup, compression $method");
+		}
 
 		# Verify that the we got the files we expected.
 		my $backup_files = join(',',
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 0650f975c44..969e08cca20 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -177,6 +177,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
 			result->level = expect_integer_value(keyword, value, result);
 			result->options |= BACKUP_COMPRESSION_OPTION_LEVEL;
 		}
+		else if (strcmp(keyword, "workers") == 0)
+		{
+			result->workers = expect_integer_value(keyword, value, result);
+			result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
+		}
 		else
 			result->parse_error =
 				psprintf(_("unknown compression option \"%s\""), keyword);
@@ -266,5 +271,16 @@ validate_bc_specification(bc_specification *spec)
 							min_level, max_level);
 	}
 
+	/*
+	 * Of the compression algorithms that we currently support, only zstd
+	 * allows parallel workers.
+	 */
+	if ((spec->options & BACKUP_COMPRESSION_OPTION_WORKERS) != 0 &&
+		(spec->algorithm != BACKUP_COMPRESSION_ZSTD))
+	{
+		return psprintf(_("compression algorithm \"%s\" does not accept a worker count"),
+						get_bc_algorithm_name(spec->algorithm));
+	}
+
 	return NULL;
 }
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 0565cbc657d..6a0ecaa99c9 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -23,12 +23,14 @@ typedef enum bc_algorithm
 } bc_algorithm;
 
 #define	BACKUP_COMPRESSION_OPTION_LEVEL			(1 << 0)
+#define BACKUP_COMPRESSION_OPTION_WORKERS		(1 << 1)
 
 typedef struct bc_specification
 {
 	bc_algorithm algorithm;
 	unsigned	options;		/* OR of BACKUP_COMPRESSION_OPTION constants */
 	int			level;
+	int			workers;
 	char	   *parse_error;	/* NULL if parsing was OK, else message */
 } bc_specification;
 
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index bee6aacf47c..b6e33516110 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2502,8 +2502,7 @@ sub run_log
 
 	local %ENV = $self->_get_env();
 
-	PostgreSQL::Test::Utils::run_log(@_);
-	return;
+	return PostgreSQL::Test::Utils::run_log(@_);
 }
 
 =pod
-- 
2.17.1

>From a0c100c4473863335dc54ffc6167669cdc858096 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Mon, 28 Mar 2022 15:16:50 -0500
Subject: [PATCH 2/4] Avoid crash on backup connection strings with flags with
 no value

---
 src/common/backup_compression.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 969e08cca20..477dc7eb49b 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -192,7 +192,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
 			pfree(value);
 
 		/* If we got an error or have reached the end of the string, stop. */
-		if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+		if (result->parse_error != NULL)
+			break;
+		if (*kwend == '\0')
+			break;
+		if (vend != NULL && *vend == '\0')
 			break;
 
 		/* Advance to next entry and loop around. */
-- 
2.17.1

>From f0e5ee4d78dce6bc4d111b8b574c6b75f546ee4a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sun, 27 Mar 2022 11:55:01 -0500
Subject: [PATCH 3/4] basebackup: support -Z zstd:long

---
 doc/src/sgml/protocol.sgml                | 10 +++++-
 doc/src/sgml/ref/pg_basebackup.sgml       |  4 +--
 src/backend/replication/basebackup_zstd.c | 12 +++++++
 src/bin/pg_basebackup/bbstreamer_zstd.c   | 13 +++++++
 src/common/backup_compression.c           | 44 +++++++++++++++++++++++
 src/include/common/backup_compression.h   |  2 ++
 6 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 98f0bc3cc34..80f1a1f9a04 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2740,7 +2740,8 @@ The commands accepted in replication mode are:
           level.  Otherwise, it should be a comma-separated list of items,
           each of the form <literal>keyword</literal> or
           <literal>keyword=value</literal>. Currently, the supported keywords
-          are <literal>level</literal> and <literal>workers</literal>.
+          are <literal>level</literal>, <literal>long</literal>, and
+          <literal>workers</literal>.
         </para>
 
         <para>
@@ -2751,6 +2752,13 @@ The commands accepted in replication mode are:
           between 1 and 22.
          </para>
 
+        <para>
+          The <literal>long</literal> keyword enables long-distance matching
+          mode, for improved compression ratio, at the expense of higher memory
+          use.  Long-distance mode is supported only for
+          <literal>zstd</literal>.
+         </para>
+
         <para>
           The <literal>workers</literal> keyword sets the number of threads
           that should be used for parallel compression. Parallel compression
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 82f5f606250..014c454bfab 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
         integer, it specifies the compression level.  Otherwise, it should be
         a comma-separated list of items, each of the form
         <literal>keyword</literal> or <literal>keyword=value</literal>.
-        Currently, the supported keywords are <literal>level</literal>
-        and <literal>workers</literal>.
+        Currently, the supported keywords are <literal>level</literal>,
+        <literal>long</literal>, and <literal>workers</literal>.
        </para>
        <para>
         If no compression level is specified, the default compression level
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index f6876f48118..dc23898f7fd 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -121,6 +121,18 @@ bbsink_zstd_begin_backup(bbsink *sink)
 						   compress->workers, ZSTD_getErrorName(ret)));
 	}
 
+	if ((compress->options & BACKUP_COMPRESSION_OPTION_ZSTD_LONG) != 0)
+	{
+		ret = ZSTD_CCtx_setParameter(mysink->cctx,
+									 ZSTD_c_enableLongDistanceMatching,
+									 compress->zstd_long);
+		if (ZSTD_isError(ret))
+			ereport(ERROR,
+					errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					errmsg("could not set compression flag for %s: %s",
+						   "long", ZSTD_getErrorName(ret)));
+	}
+
 	/*
 	 * We need our own buffer, because we're going to pass different data to
 	 * the next sink than what gets passed to us.
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index f94c5c041d3..051b97458ba 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -118,6 +118,19 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
 		}
 	}
 
+	if ((compress->options & BACKUP_COMPRESSION_OPTION_ZSTD_LONG) != 0)
+	{
+		ret = ZSTD_CCtx_setParameter(streamer->cctx,
+									 ZSTD_c_enableLongDistanceMatching,
+									 compress->zstd_long);
+		if (ZSTD_isError(ret))
+		{
+			pg_log_error("could not set compression flag for %s: %s",
+						 "long", ZSTD_getErrorName(ret));
+			exit(1);
+		}
+	}
+
 	/* Initialize the ZSTD output buffer. */
 	streamer->zstd_outBuf.dst = streamer->base.bbs_buffer.data;
 	streamer->zstd_outBuf.size = streamer->base.bbs_buffer.maxlen;
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 477dc7eb49b..9fc865ff299 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -31,6 +31,8 @@
 
 static int	expect_integer_value(char *keyword, char *value,
 								 bc_specification *result);
+static bool	expect_boolean_value(char *keyword, char *value,
+								 bc_specification *result);
 
 /*
  * Look up a compression algorithm by name. Returns true and sets *algorithm
@@ -182,6 +184,11 @@ parse_bc_specification(bc_algorithm algorithm, char *specification,
 			result->workers = expect_integer_value(keyword, value, result);
 			result->options |= BACKUP_COMPRESSION_OPTION_WORKERS;
 		}
+		else if (strcmp(keyword, "long") == 0)
+		{
+			result->zstd_long = expect_boolean_value(keyword, value, result);
+			result->options |= BACKUP_COMPRESSION_OPTION_ZSTD_LONG;
+		}
 		else
 			result->parse_error =
 				psprintf(_("unknown compression option \"%s\""), keyword);
@@ -235,6 +242,43 @@ expect_integer_value(char *keyword, char *value, bc_specification *result)
 	return ivalue;
 }
 
+/*
+ * Parse 'value' as an boolean and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.  The caller must check result->parse_error to determine if
+ * the call was successful.
+ *
+ * Valid values are: yes, no, on, off, 1, 0.
+ *
+ * Inspired by ParseVariableBool().
+ */
+static bool
+expect_boolean_value(char *keyword, char *value, bc_specification *result)
+{
+	if (value == NULL)
+		return true;
+
+	if (pg_strcasecmp(value, "yes") == 0)
+		return true;
+	if (pg_strcasecmp(value, "on") == 0)
+		return true;
+	if (pg_strcasecmp(value, "1") == 0)
+		return true;
+
+	if (pg_strcasecmp(value, "no") == 0)
+		return false;
+	if (pg_strcasecmp(value, "off") == 0)
+		return false;
+	if (pg_strcasecmp(value, "0") == 0)
+		return false;
+
+	result->parse_error =
+		psprintf(_("value for compression option \"%s\" must be a boolean"),
+				 keyword);
+	return false;
+}
+
 /*
  * Returns NULL if the compression specification string was syntactically
  * valid and semantically sensible.  Otherwise, returns an error message.
diff --git a/src/include/common/backup_compression.h b/src/include/common/backup_compression.h
index 6a0ecaa99c9..a378631a8da 100644
--- a/src/include/common/backup_compression.h
+++ b/src/include/common/backup_compression.h
@@ -24,6 +24,7 @@ typedef enum bc_algorithm
 
 #define	BACKUP_COMPRESSION_OPTION_LEVEL			(1 << 0)
 #define BACKUP_COMPRESSION_OPTION_WORKERS		(1 << 1)
+#define BACKUP_COMPRESSION_OPTION_ZSTD_LONG		(1 << 2)
 
 typedef struct bc_specification
 {
@@ -31,6 +32,7 @@ typedef struct bc_specification
 	unsigned	options;		/* OR of BACKUP_COMPRESSION_OPTION constants */
 	int			level;
 	int			workers;
+	int			zstd_long;
 	char	   *parse_error;	/* NULL if parsing was OK, else message */
 } bc_specification;
 
-- 
2.17.1

>From bc6846ed93af475b079c4ab9bfa2a33c49a8a185 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Thu, 10 Mar 2022 20:16:19 -0600
Subject: [PATCH 4/4] pg_basebackup: support Zstd negative compression levels

"higher than maximum" is bogus

TODO: each compression methods should enforce its own levels
---
 src/bin/pg_basebackup/bbstreamer_zstd.c | 1 +
 src/common/backup_compression.c         | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index 051b97458ba..491d6106cf5 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -114,6 +114,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, bc_specification *compress)
 		{
 			pg_log_error("could not set compression worker count to %d: %s",
 						 compress->workers, ZSTD_getErrorName(ret));
+
 			exit(1);
 		}
 	}
diff --git a/src/common/backup_compression.c b/src/common/backup_compression.c
index 9fc865ff299..dbaf008af8e 100644
--- a/src/common/backup_compression.c
+++ b/src/common/backup_compression.c
@@ -308,13 +308,17 @@ validate_bc_specification(bc_specification *spec)
 		else if (spec->algorithm == BACKUP_COMPRESSION_LZ4)
 			max_level = 12;
 		else if (spec->algorithm == BACKUP_COMPRESSION_ZSTD)
+		{
 			max_level = 22;
+			/* The minimum level depends on the version.. */
+			min_level = -7;
+		}
 		else
 			return psprintf(_("compression algorithm \"%s\" does not accept a compression level"),
 							get_bc_algorithm_name(spec->algorithm));
 
 		if (spec->level < min_level || spec->level > max_level)
-			return psprintf(_("compression algorithm \"%s\" expects a compression level between %d and %d"),
+			return psprintf(_("compression algorithm \"%s\" expects a nonzero compression level between %d and %d"),
 							get_bc_algorithm_name(spec->algorithm),
 							min_level, max_level);
 	}
-- 
2.17.1

Reply via email to