Re: Add LZ4 compression in pg_dump

2023-05-17 Thread Tomas Vondra



On 5/17/23 10:59, Tomas Vondra wrote:
> On 5/17/23 08:18, Michael Paquier wrote:
>> On Tue, May 09, 2023 at 02:54:31PM +0200, Tomas Vondra wrote:
>>> Yeah. Thanks for the report, should have been found during review.
>>
>> Tomas, are you planning to do something by the end of this week for
>> beta1?  Or do you need some help of any kind?
> 
> I'll take care of it.
> 

FWIW I've pushed fixes for both open issues associated with the pg_dump
compression. I'll keep an eye on the buildfarm, but hopefully that'll do
it for beta1.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-05-17 Thread Tomas Vondra
On 5/17/23 08:18, Michael Paquier wrote:
> On Tue, May 09, 2023 at 02:54:31PM +0200, Tomas Vondra wrote:
>> Yeah. Thanks for the report, should have been found during review.
> 
> Tomas, are you planning to do something by the end of this week for
> beta1?  Or do you need some help of any kind?

I'll take care of it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-05-17 Thread Michael Paquier
On Tue, May 09, 2023 at 02:54:31PM +0200, Tomas Vondra wrote:
> Yeah. Thanks for the report, should have been found during review.

Tomas, are you planning to do something by the end of this week for
beta1?  Or do you need some help of any kind?
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-05-09 Thread Michael Paquier
On Tue, May 09, 2023 at 02:12:44PM +, gkokola...@pm.me wrote:
> Thank you both for looking. A small consolation is that now there are
> tests for this case.

+1, noticing that was pure luck ;)

Worth noting that the patch posted in [1] has these tests, not the
version posted in [2].

+create_sql   => 'INSERT INTO dump_test.test_compression_method (col1) '
+  . 'SELECT string_agg(a::text, \'\') FROM generate_series(1,4096) a;',

Yep, good and cheap idea to check for longer chunks.  That should be
enough to loop twice.

[1]: 
https://www.postgresql.org/message-id/SYTRcNgtAbzyn3y3IInh1x-UfNTKMNpnFvI3mr6SyqyVf3PkaDsMy_cpKKgsl3_HdLy2MFAH4zwjxDmFfiLO8rWtSiJWBtqT06OMjeNo4GA=@pm.me
[2]: 
https://www.postgresql.org/message-id/f735df01-0bb4-2fbc-1297-73a520cfc...@enterprisedb.com

> Moving on to the other open item for this, please find attached v2
> of the patch as requested.

Did you notice the comments of [3] about the second patch that aims to
add the null termination in the line from the LZ4 fgets() callback?

[3]: https://www.postgresql.org/message-id/zfhcyn4gm2eu6...@paquier.xyz
--
Michael




Re: Add LZ4 compression in pg_dump

2023-05-09 Thread gkokolatos





--- Original Message ---
On Tuesday, May 9th, 2023 at 2:54 PM, Tomas Vondra 
 wrote:


> 
> 
> On 5/9/23 00:10, Michael Paquier wrote:
> 
> > On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote:
> > 
> > > The LZ4Stream_write() forgot to move the pointer to the next chunk, so
> > > it was happily decompressing the initial chunk over and over. A bit
> > > embarrassing oversight :-(
> > > 
> > > The custom format calls WriteDataToArchiveLZ4(), which was correct.
> > > 
> > > The attached patch fixes this for me.
> > 
> > Ouch. So this was corrupting the dumps and the compression when
> > trying to write more than two chunks at once, not the decompression
> > steps. That addresses the issue here as well, thanks!
> 
> 
> Yeah. Thanks for the report, should have been found during review.

Thank you both for looking. A small consolation is that now there are
tests for this case.

Moving on to the other open item for this, please find attached v2
of the patch as requested.

Cheers,
//Georgios

> 
> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From cb0a229be59dffe09cc0ceceececdbd06a559d3f Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Mon, 8 May 2023 11:58:57 +
Subject: [PATCH v2] Null terminate the output buffer of LZ4Stream_gets

LZ4Stream_gets did not null terminate its output buffer. Its callers expected
the buffer to be null terminated so they passed it around to functions such as
sscanf with unintended consequences.

Reported-by: Alexander Lakhin
---
 src/bin/pg_dump/compress_lz4.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 423e1b7976..f97b7550d1 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -459,6 +459,10 @@ LZ4Stream_read_internal(LZ4State *state, void *ptr, int ptrsize, bool eol_flag)
 	if (!LZ4Stream_init(state, size, false /* decompressing */ ))
 		return -1;
 
+	/* No work needs to be done for a zero-sized output buffer */
+	if (size <= 0)
+		return 0;
+
 	/* Verify that there is enough space in the outbuf */
 	if (size > state->buflen)
 	{
@@ -636,7 +640,7 @@ LZ4Stream_gets(char *ptr, int size, CompressFileHandle *CFH)
 	LZ4State   *state = (LZ4State *) CFH->private_data;
 	int			ret;
 
-	ret = LZ4Stream_read_internal(state, ptr, size, true);
+	ret = LZ4Stream_read_internal(state, ptr, size - 1, true);
 	if (ret < 0 || (ret == 0 && !LZ4Stream_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4Stream_get_error(CFH));
 
@@ -644,6 +648,12 @@ LZ4Stream_gets(char *ptr, int size, CompressFileHandle *CFH)
 	if (ret == 0)
 		return NULL;
 
+	/*
+	 * Our caller expects the return string to be NULL terminated
+	 * and we know that ret is greater than zero.
+	 */
+	ptr[ret - 1] = '\0';
+
 	return ptr;
 }
 
-- 
2.34.1



Re: Add LZ4 compression in pg_dump

2023-05-09 Thread Tomas Vondra
On 5/9/23 00:10, Michael Paquier wrote:
> On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote:
>> The LZ4Stream_write() forgot to move the pointer to the next chunk, so
>> it was happily decompressing the initial chunk over and over. A bit
>> embarrassing oversight :-(
>>
>> The custom format calls WriteDataToArchiveLZ4(), which was correct.
>>
>> The attached patch fixes this for me.
> 
> Ouch.  So this was corrupting the dumps and the compression when
> trying to write more than two chunks at once, not the decompression
> steps.  That addresses the issue here as well, thanks!

Yeah. Thanks for the report, should have been found during review.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-05-08 Thread Michael Paquier
On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote:
> The LZ4Stream_write() forgot to move the pointer to the next chunk, so
> it was happily decompressing the initial chunk over and over. A bit
> embarrassing oversight :-(
> 
> The custom format calls WriteDataToArchiveLZ4(), which was correct.
> 
> The attached patch fixes this for me.

Ouch.  So this was corrupting the dumps and the compression when
trying to write more than two chunks at once, not the decompression
steps.  That addresses the issue here as well, thanks!
--
Michael




Re: Add LZ4 compression in pg_dump

2023-05-08 Thread gkokolatos





--- Original Message ---
On Monday, May 8th, 2023 at 8:20 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 5/8/23 18:19, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Monday, May 8th, 2023 at 3:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> > 
> > > I wrote:
> > > 
> > > > Michael Paquier mich...@paquier.xyz writes:
> > > > 
> > > > > While testing this patch, I have triggered an error pointing out that
> > > > > the decompression path of LZ4 is broken for table data. I can
> > > > > reproduce that with a dump of the regression database, as of:
> > > > > make installcheck
> > > > > pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
> > > 
> > > > Ugh. Reproduced here ... so we need an open item for this.
> > > 
> > > BTW, it seems to work with --format=c.
> > 
> > Thank you for the extra tests. It seems there is a gap in the test
> > coverage. Please find attached a patch that addresses the issue
> > and attempts to provide tests for it.
> 
> 
> Seems I'm getting messages with a delay - this is mostly the same fix I
> ended up with, not realizing you already posted a fix.

Thank you very much for looking.

> I don't think we need the local "in" variable - the pointer parameter is
> local in the function, so we can modify it directly (with a cast).
> WriteDataToArchiveLZ4 does it that way too.

Sure, patch updated.
 
> The tests are definitely a good idea.

Thank you.

> I wonder if we should add a
> comment to DEFAULT_IO_BUFFER_SIZE mentioning that if we choose to
> increase the value in the future, we need to tweak the tests too to use
> more data in order to exercise the buffering etc. Maybe it's obvious?
> 

You are right. Added a comment both in the header and in the test.

I hope v2 gets closer to closing the open item for this.

Cheers,
//Georgios


> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From 89e7066d6c3c6a7eeb147c3f2d345c3046a4d155 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Mon, 8 May 2023 19:48:03 +
Subject: [PATCH v2] Advance input pointer when LZ4 compressing data

LZ4File_write() did not advance the input pointer on subsequent invocations of
LZ4F_compressUpdate(). As a result the generated compressed output would be a
compressed version of the same input chunk.

WriteDataToArchiveLZ4() which is also using LZ4F_compressUpdate() did not suffer
from this omission. Tests failed to catch this error because all of their input
would comfortably fit within the same input chunk. Tests have been added to
provide adequate coverage.
---
 src/bin/pg_dump/compress_io.h|  7 -
 src/bin/pg_dump/compress_lz4.c   |  2 ++
 src/bin/pg_dump/t/002_pg_dump.pl | 46 
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index fd8752db0d..e8efc57f1a 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -17,7 +17,12 @@
 
 #include "pg_backup_archiver.h"
 
-/* Default size used for IO buffers */
+/*
+ * Default size used for IO buffers
+ *
+ * When altering this value it might be useful to verify that the relevant tests
+ * cases are meaningfully updated to provide coverage.
+ */
 #define DEFAULT_IO_BUFFER_SIZE	4096
 
 extern char *supports_compression(const pg_compress_specification compression_spec);
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index f97b7550d1..b869780c0b 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -588,6 +588,8 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 			errno = (errno) ? errno : ENOSPC;
 			return false;
 		}
+
+		ptr = ((const char *) ptr) + chunk;
 	}
 
 	return true;
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 93e24d5145..d66f3b42ea 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -3108,6 +3108,52 @@ my %tests = (
 		},
 	},
 
+	'CREATE TABLE test_compression_method' => {
+		create_order => 110,
+		create_sql   => 'CREATE TABLE dump_test.test_compression_method (
+		   col1 text
+	   );',
+		regexp => qr/^
+			\QCREATE TABLE dump_test.test_compression_method (\E\n
+			\s+\Qcol1 text\E\n
+			\Q);\E
+			/xm,
+		like => {
+			%full_runs,
+			%dump_test_schema_runs,
+			section_pre_data => 1,
+		},
+		unlike => {
+			exclude_dump_test_schema => 1,
+			only_dump_measurement=> 1,
+		},
+	},
+
+	# Insert enough data to surpass DEFAULT_IO_BUFFER_SIZE during
+	# (de)compression operations
+	'COPY test_compression_method' => {
+		create_order => 111,
+		create_sql   => 'INSERT INTO dump_test.test_compression_method (col1) '
+		  . 'SELECT string_agg(a::text, \'\') FROM generate_series(1,4096) a;',
+		regexp => qr/^
+			\QCOPY dump_test.test_compression_method (col1) FROM stdin;\E
+			\n(?:\d{15277}\n){1}\\\.\n
+			/xm,
+		like => {

Re: Add LZ4 compression in pg_dump

2023-05-08 Thread Tomas Vondra



On 5/8/23 18:19, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Monday, May 8th, 2023 at 3:16 AM, Tom Lane  wrote:
> 
> 
>>
>>
>> I wrote:
>>
>>> Michael Paquier mich...@paquier.xyz writes:
>>>
 While testing this patch, I have triggered an error pointing out that
 the decompression path of LZ4 is broken for table data. I can
 reproduce that with a dump of the regression database, as of:
 make installcheck
 pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
>>
>>> Ugh. Reproduced here ... so we need an open item for this.
>>
>>
>> BTW, it seems to work with --format=c.
>>
> 
> Thank you for the extra tests. It seems there is a gap in the test
> coverage. Please find attached a patch that addresses the issue
> and attempts to provide tests for it.
> 

Seems I'm getting messages with a delay - this is mostly the same fix I
ended up with, not realizing you already posted a fix.

I don't think we need the local "in" variable - the pointer parameter is
local in the function, so we can modify it directly (with a cast).
WriteDataToArchiveLZ4 does it that way too.

The tests are definitely a good idea. I wonder if we should add a
comment to DEFAULT_IO_BUFFER_SIZE mentioning that if we choose to
increase the value in the future, we need to tweak the tests too to use
more data in order to exercise the buffering etc. Maybe it's obvious?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-05-08 Thread Tomas Vondra
On 5/8/23 03:16, Tom Lane wrote:
> I wrote:
>> Michael Paquier  writes:
>>> While testing this patch, I have triggered an error pointing out that
>>> the decompression path of LZ4 is broken for table data.  I can
>>> reproduce that with a dump of the regression database, as of:
>>> make installcheck
>>> pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
> 
>> Ugh.  Reproduced here ... so we need an open item for this.
> 
> BTW, it seems to work with --format=c.
> 

The LZ4Stream_write() forgot to move the pointer to the next chunk, so
it was happily decompressing the initial chunk over and over. A bit
embarrassing oversight :-(

The custom format calls WriteDataToArchiveLZ4(), which was correct.

The attached patch fixes this for me.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 423e1b7976f..43c4b9187ef 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -584,6 +584,8 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 			errno = (errno) ? errno : ENOSPC;
 			return false;
 		}
+
+		ptr = ((char *) ptr) + chunk;
 	}
 
 	return true;


Re: Add LZ4 compression in pg_dump

2023-05-08 Thread gkokolatos





--- Original Message ---
On Monday, May 8th, 2023 at 3:16 AM, Tom Lane  wrote:


> 
> 
> I wrote:
> 
> > Michael Paquier mich...@paquier.xyz writes:
> > 
> > > While testing this patch, I have triggered an error pointing out that
> > > the decompression path of LZ4 is broken for table data. I can
> > > reproduce that with a dump of the regression database, as of:
> > > make installcheck
> > > pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
> 
> > Ugh. Reproduced here ... so we need an open item for this.
> 
> 
> BTW, it seems to work with --format=c.
> 

Thank you for the extra tests. It seems there is a gap in the test
coverage. Please find attached a patch that addresses the issue
and attempts to provide tests for it.

Cheers,
//Georgios

> regards, tom lane

From 8c6c86c362820e93f066992ede6e5ca23f128807 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Mon, 8 May 2023 15:25:25 +
Subject: [PATCH v1] Advance input pointer when LZ4 compressing data

LZ4File_write() did not advance the input pointer on subsequent invocations of
LZ4F_compressUpdate(). As a result the generated compressed output would be a
compressed version of the same input chunk.

WriteDataToArchiveLZ4() which is also using LZ4F_compressUpdate() did not suffer
from this omission. Tests failed to catch this error because all of their input
would comfortably fit within the same input chunk. Tests have been added to
provide adequate coverage.
---
 src/bin/pg_dump/compress_lz4.c   |  5 +++-
 src/bin/pg_dump/t/002_pg_dump.pl | 44 
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index f97b7550d1..76211c82f0 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -564,6 +564,7 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 	LZ4State   *state = (LZ4State *) CFH->private_data;
 	size_t		status;
 	int			remaining = size;
+	const void *in = ptr;
 
 	/* Lazy init */
 	if (!LZ4Stream_init(state, size, true))
@@ -576,7 +577,7 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 		remaining -= chunk;
 
 		status = LZ4F_compressUpdate(state->ctx, state->buffer, state->buflen,
-	 ptr, chunk, NULL);
+	 in, chunk, NULL);
 		if (LZ4F_isError(status))
 		{
 			state->errcode = status;
@@ -588,6 +589,8 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 			errno = (errno) ? errno : ENOSPC;
 			return false;
 		}
+
+		in = ((const char *) in) + chunk;
 	}
 
 	return true;
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 93e24d5145..c6b1225815 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -3108,6 +3108,50 @@ my %tests = (
 		},
 	},
 
+	'CREATE TABLE test_compression_method' => {
+		create_order => 110,
+		create_sql   => 'CREATE TABLE dump_test.test_compression_method (
+		   col1 text
+	   );',
+		regexp => qr/^
+			\QCREATE TABLE dump_test.test_compression_method (\E\n
+			\s+\Qcol1 text\E\n
+			\Q);\E
+			/xm,
+		like => {
+			%full_runs,
+			%dump_test_schema_runs,
+			section_pre_data => 1,
+		},
+		unlike => {
+			exclude_dump_test_schema => 1,
+			only_dump_measurement=> 1,
+		},
+	},
+
+	'COPY test_compression_method' => {
+		create_order => 111,
+		create_sql   => 'INSERT INTO dump_test.test_compression_method (col1) '
+		  . 'SELECT string_agg(a::text, \'\') FROM generate_series(1,4096) a;',
+		regexp => qr/^
+			\QCOPY dump_test.test_compression_method (col1) FROM stdin;\E
+			\n(?:\d{15277}\n){1}\\\.\n
+			/xm,
+		like => {
+			%full_runs,
+			data_only=> 1,
+			section_data => 1,
+			only_dump_test_schema=> 1,
+			test_schema_plus_large_objects=> 1,
+		},
+		unlike => {
+			binary_upgrade   => 1,
+			exclude_dump_test_schema => 1,
+			schema_only  => 1,
+			only_dump_measurement=> 1,
+		},
+	},
+
 	'CREATE TABLE fk_reference_test_table' => {
 		create_order => 21,
 		create_sql   => 'CREATE TABLE dump_test.fk_reference_test_table (
-- 
2.34.1



Re: Add LZ4 compression in pg_dump

2023-05-07 Thread Tomas Vondra



On 5/7/23 17:01, gkokola...@pm.me wrote:
> 
> 
> 
> On Sat, May 6, 2023 at 04:51, Michael Paquier   wrote:
>> On Fri, May 05, 2023 at 02:13:28PM +, gkokola...@pm.me wrote:
>> > Good point. I thought about it before submitting the patch. I
>> > concluded that given the complexity and operations involved in
>> > LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore
>> > code, the memset() call will be negligible. However from the
>> > readability point of view, the function is a bit cleaner with the
>> > memset().
>> >
>> > I will not object to any suggestion though, as this is a very
>> > trivial point. Please find attached a v2 of the patch following the
>> > suggested approach.
>>
>> Please note that an open item has been added for this stuff.
> Thank you but I am not certain I know what that means. Can you please
> explain?
> 

It means it was added to the list of items we need to fix before PG16
gets out:

https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-05-07 Thread Tom Lane
I wrote:
> Michael Paquier  writes:
>> While testing this patch, I have triggered an error pointing out that
>> the decompression path of LZ4 is broken for table data.  I can
>> reproduce that with a dump of the regression database, as of:
>> make installcheck
>> pg_dump --format=d --file=dump_lz4 --compress=lz4 regression

> Ugh.  Reproduced here ... so we need an open item for this.

BTW, it seems to work with --format=c.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2023-05-07 Thread Michael Paquier
On Sun, May 07, 2023 at 09:09:25PM -0400, Tom Lane wrote:
> Ugh.  Reproduced here ... so we need an open item for this.

Yep.  Already added.
--
Michael




Re: Add LZ4 compression in pg_dump

2023-05-07 Thread Tom Lane
Michael Paquier  writes:
> While testing this patch, I have triggered an error pointing out that
> the decompression path of LZ4 is broken for table data.  I can
> reproduce that with a dump of the regression database, as of:
> make installcheck
> pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
> createdb regress_lz4
> pg_restore --format=d -d regress_lz4 dump_lz4
> pg_restore: error: COPY failed for table "clstr_tst": ERROR:  extra data 
> after last expected column
> CONTEXT:  COPY clstr_tst, line 15: "326   seis
> xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy..."
> pg_restore: warning: errors ignored on restore: 1

Ugh.  Reproduced here ... so we need an open item for this.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2023-05-07 Thread Michael Paquier
On Fri, May 05, 2023 at 02:13:28PM +, gkokola...@pm.me wrote:
> Good point. I thought about it before submitting the patch. I
> concluded that given the complexity and operations involved in
> LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore
> code, the memset() call will be negligible. However from the
> readability point of view, the function is a bit cleaner with the
> memset(). 
> 
> I will not object to any suggestion though, as this is a very
> trivial point. Please find attached a v2 of the patch following the
> suggested approach.

Hmm.  I was looking at this patch, and what you are trying to do
sounds rather right to keep a parallel with the gzip and zstd code
paths.

Looking at the code of gzread.c, gzgets() enforces a null-termination
on the string read.  Still, isn't that something we'd better enforce
in read_none() as well?  compress_io.h lists this as a requirement of
the callback, and Zstd_gets() does so already.  read_none() does not
enforce that, unfortunately.

+   /* No work needs to be done for a zero-sized output buffer */
+   if (size <= 0)
+   return 0;

Indeed.  This should be OK.

-   ret = LZ4Stream_read_internal(state, ptr, size, true);
+   Assert(size > 1);

The addition of this assertion is a bit surprising, and this is
inconsistent with Zstd_gets where a length of 1 is authorized.  We
should be more consistent across all the callbacks, IMO, not less, so
as we apply the same API contract across all the compression methods.

While testing this patch, I have triggered an error pointing out that
the decompression path of LZ4 is broken for table data.  I can
reproduce that with a dump of the regression database, as of:
make installcheck
pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
createdb regress_lz4
pg_restore --format=d -d regress_lz4 dump_lz4
pg_restore: error: COPY failed for table "clstr_tst": ERROR:  extra data after 
last expected column
CONTEXT:  COPY clstr_tst, line 15: "32  6   seis
xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy..."
pg_restore: warning: errors ignored on restore: 1

This does not show up with gzip or zstd, and the patch does not
influence the result.  In short it shows up with and without the
patch, on HEAD.  That does not look really stable :/
--
Michael




Re: Add LZ4 compression in pg_dump

2023-05-07 Thread Michael Paquier
On Sun, May 07, 2023 at 03:01:52PM +, gkokola...@pm.me wrote:
> Thank you but I am not certain I know what that means. Can you please explain?

It means that this thread has been added to the following list:
https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Open_Issues

pg_dump/compress_lz4.c is new as of PostgreSQL 16, and this patch is
fixing a deficiency.  That's just a way outside of the commit fest to
track any problems and make sure these are fixed before the release
happens.
--
Michael




Re: Add LZ4 compression in pg_dump

2023-05-07 Thread gkokolatos
On Sat, May 6, 2023 at 04:51, Michael Paquier mich...@paquier.xyz wrote:

> On Fri, May 05, 2023 at 02:13:28PM +, gkokola...@pm.me wrote:
>> Good point. I thought about it before submitting the patch. I
>> concluded that given the complexity and operations involved in
>> LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore
>> code, the memset() call will be negligible. However from the
>> readability point of view, the function is a bit cleaner with the
>> memset().
>>
>> I will not object to any suggestion though, as this is a very
>> trivial point. Please find attached a v2 of the patch following the
>> suggested approach.
>
> Please note that an open item has been added for this stuff.

Thank you but I am not certain I know what that means. Can you please explain?

Cheers,
//Georgios

> --
> Michael

Re: Add LZ4 compression in pg_dump

2023-05-05 Thread Michael Paquier
On Fri, May 05, 2023 at 02:13:28PM +, gkokola...@pm.me wrote:
> Good point. I thought about it before submitting the patch. I
> concluded that given the complexity and operations involved in
> LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore
> code, the memset() call will be negligible. However from the
> readability point of view, the function is a bit cleaner with the
> memset(). 
> 
> I will not object to any suggestion though, as this is a very
> trivial point. Please find attached a v2 of the patch following the
> suggested approach.

Please note that an open item has been added for this stuff.
--
Michael




Re: Add LZ4 compression in pg_dump

2023-05-05 Thread gkokolatos
--- Original Message ---
On Friday, May 5th, 2023 at 3:23 PM, Andrew Dunstan  wrote:

> On 2023-05-05 Fr 06:02, gkokola...@pm.me wrote:
>
>> --- Original Message ---
>> On Friday, May 5th, 2023 at 8:00 AM, Alexander Lakhin
>> exclus...@gmail.com wrote:
>>
>>> 23.03.2023 20:10, Tomas Vondra wrote:
>>>
 So pushed all three parts, after updating the commit messages a bit.

 This leaves the empty-data issue (which we have a fix for) and the
 switch to LZ4F. And then the zstd part.
>>>
>>> I'm sorry that I haven't noticed/checked that before, but when trying to
>>> perform check-world with Valgrind I've discovered another issue presumably
>>> related to LZ4File_gets().
>>> When running under Valgrind:
>>> PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/
>>> I get:
>>> ...
>>> 07:07:11.683 ok 1939 - compression_lz4_dir: glob check for
>>> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4
>>> # Running: pg_restore --jobs=2 
>>> --file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql
>>> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir
>>>
>>> ==00:00:00:00.579 2811926== Conditional jump or move depends on 
>>> uninitialised value(s)
>>> ==00:00:00:00.579 2811926== at 0x4853376: rawmemchr 
>>> (vg_replace_strmem.c:1548)
>>> ==00:00:00:00.579 2811926== by 0x4C96A67: _IO_str_init_static_internal 
>>> (strops.c:41)
>>> ==00:00:00:00.579 2811926== by 0x4C693A2: _IO_strfile_read (strfile.h:95)
>>> ==00:00:00:00.579 2811926== by 0x4C693A2: __isoc99_sscanf 
>>> (isoc99_sscanf.c:28)
>>> ==00:00:00:00.579 2811926== by 0x11DB6F: _LoadLOs 
>>> (pg_backup_directory.c:458)
>>> ==00:00:00:00.579 2811926== by 0x11DD1E: _PrintTocData 
>>> (pg_backup_directory.c:422)
>>> ==00:00:00:00.579 2811926== by 0x118484: restore_toc_entry 
>>> (pg_backup_archiver.c:882)
>>> ==00:00:00:00.579 2811926== by 0x1190CC: RestoreArchive 
>>> (pg_backup_archiver.c:699)
>>> ==00:00:00:00.579 2811926== by 0x10F25D: main (pg_restore.c:414)
>>> ==00:00:00:00.579 2811926==
>>> ...
>>>
>>> It looks like the line variable returned by gets_func() here is not
>>> null-terminated:
>>> while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL)
>>>
>>> {
>>> ...
if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) != 2)
>>> ...
>>> And Valgrind doesn't like it.
>>
>> Valgrind is correct to not like it. LZ4Stream_gets() got modeled after
>> gets() when it should have been modeled after fgets().
>>
>> Please find a patch attached to address it.
>
> Isn't using memset here a bit wasteful? Why not just put a null at the end 
> after calling LZ4Stream_read_internal(), which tells you how many bytes it 
> has written?

Good point. I thought about it before submitting the patch. I concluded that 
given the complexity and operations involved in LZ4Stream_read_internal() and 
the rest of the pg_dump/pg_restore code, the memset() call will be negligible. 
However from the readability point of view, the function is a bit cleaner with 
the memset().

I will not object to any suggestion though, as this is a very trivial point. 
Please find attached a v2 of the patch following the suggested approach.

Cheers,

//Georgios

> cheers
>
> andrew
>
> --
> Andrew Dunstan
> EDB:
> https://www.enterprisedb.com

From 65dbce1eb81597e3dd44eff62d8d667b0a3322da Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 5 May 2023 09:47:02 +
Subject: [PATCH v2] Null terminate the output buffer of LZ4Stream_gets

LZ4Stream_gets did not null terminate its output buffer. Its callers expected
the buffer to be null terminated so they passed it around to functions such as
sscanf with unintended consequences.

Reported-by: Alexander Lakhin
---
 src/bin/pg_dump/compress_lz4.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 423e1b7976..0f447919b2 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -459,6 +459,10 @@ LZ4Stream_read_internal(LZ4State *state, void *ptr, int ptrsize, bool eol_flag)
 	if (!LZ4Stream_init(state, size, false /* decompressing */ ))
 		return -1;
 
+	/* No work needs to be done for a zero-sized output buffer */
+	if (size <= 0)
+		return 0;
+
 	/* Verify that there is enough space in the outbuf */
 	if (size > state->buflen)
 	{
@@ -636,7 +640,9 @@ LZ4Stream_gets(char *ptr, int size, CompressFileHandle *CFH)
 	LZ4State   *state = (LZ4State *) CFH->private_data;
 	int			ret;
 
-	ret = LZ4Stream_read_internal(state, ptr, size, true);
+	Assert(size > 1);
+
+	ret = LZ4Stream_read_internal(state, ptr, size - 1, true);
 	if (ret < 0 || (ret == 0 && !LZ4Stream_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4Stream_get_error(CFH));
 
@@ -644,6 +650,12 @@ LZ4Stream_gets(char *ptr, int size, CompressFileHandle *CFH)
 	if (ret == 0)
 		return NULL;
 
+	/*
+	 * Our caller expects the return string to be 

Re: Add LZ4 compression in pg_dump

2023-05-05 Thread Andrew Dunstan


On 2023-05-05 Fr 06:02, gkokola...@pm.me wrote:





--- Original Message ---
On Friday, May 5th, 2023 at 8:00 AM, Alexander Lakhin  
wrote:




23.03.2023 20:10, Tomas Vondra wrote:


So pushed all three parts, after updating the commit messages a bit.

This leaves the empty-data issue (which we have a fix for) and the
switch to LZ4F. And then the zstd part.


I'm sorry that I haven't noticed/checked that before, but when trying to
perform check-world with Valgrind I've discovered another issue presumably
related to LZ4File_gets().
When running under Valgrind:
PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/
I get:
...
07:07:11.683 ok 1939 - compression_lz4_dir: glob check for
.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4
# Running: pg_restore --jobs=2 
--file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql
.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir

==00:00:00:00.579 2811926== Conditional jump or move depends on uninitialised 
value(s)
==00:00:00:00.579 2811926== at 0x4853376: rawmemchr (vg_replace_strmem.c:1548)
==00:00:00:00.579 2811926== by 0x4C96A67: _IO_str_init_static_internal 
(strops.c:41)
==00:00:00:00.579 2811926== by 0x4C693A2: _IO_strfile_read (strfile.h:95)
==00:00:00:00.579 2811926== by 0x4C693A2: __isoc99_sscanf (isoc99_sscanf.c:28)
==00:00:00:00.579 2811926== by 0x11DB6F: _LoadLOs (pg_backup_directory.c:458)
==00:00:00:00.579 2811926== by 0x11DD1E: _PrintTocData 
(pg_backup_directory.c:422)
==00:00:00:00.579 2811926== by 0x118484: restore_toc_entry 
(pg_backup_archiver.c:882)
==00:00:00:00.579 2811926== by 0x1190CC: RestoreArchive 
(pg_backup_archiver.c:699)
==00:00:00:00.579 2811926== by 0x10F25D: main (pg_restore.c:414)
==00:00:00:00.579 2811926==
...

It looks like the line variable returned by gets_func() here is not
null-terminated:
while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL)

{
...
if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) != 2)
...
And Valgrind doesn't like it.


Valgrind is correct to not like it. LZ4Stream_gets() got modeled after
gets() when it should have been modeled after fgets().

Please find a patch attached to address it.




Isn't using memset here a bit wasteful? Why not just put a null at the 
end after calling LZ4Stream_read_internal(), which tells you how many 
bytes it has written?
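That alternative can be sketched roughly as below; fake_read_internal() is a hypothetical stand-in for LZ4Stream_read_internal() that returns how many bytes it wrote, so a single terminator write at that offset replaces the whole-buffer memset():

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for LZ4Stream_read_internal(): fills ptr with
 * up to ptrsize bytes and returns how many bytes were written.
 */
static int
fake_read_internal(char *ptr, int ptrsize)
{
	const char *data = "hello";
	int			n = (int) strlen(data);

	if (n > ptrsize)
		n = ptrsize;
	memcpy(ptr, data, n);
	return n;
}

/*
 * Cheaper variant: no memset() over the whole output buffer, just one
 * NUL byte written at the position the read reported.
 */
static char *
gets_terminate_once(char *ptr, int size)
{
	int			ret = fake_read_internal(ptr, size - 1);

	if (ret <= 0)
		return NULL;
	ptr[ret] = '\0';
	return ptr;
}
```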



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Add LZ4 compression in pg_dump

2023-05-05 Thread gkokolatos





--- Original Message ---
On Friday, May 5th, 2023 at 8:00 AM, Alexander Lakhin  
wrote:


> 
> 
> 23.03.2023 20:10, Tomas Vondra wrote:
> 
> > So pushed all three parts, after updating the commit messages a bit.
> > 
> > This leaves the empty-data issue (which we have a fix for) and the
> > switch to LZ4F. And then the zstd part.
> 
> 
> I'm sorry that I haven't noticed/checked that before, but when trying to
> perform check-world with Valgrind I've discovered another issue presumably
> related to LZ4File_gets().
> When running under Valgrind:
> PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/
> I get:
> ...
> 07:07:11.683 ok 1939 - compression_lz4_dir: glob check for
> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4
> # Running: pg_restore --jobs=2 
> --file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql
> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir
> 
> ==00:00:00:00.579 2811926== Conditional jump or move depends on uninitialised 
> value(s)
> ==00:00:00:00.579 2811926== at 0x4853376: rawmemchr (vg_replace_strmem.c:1548)
> ==00:00:00:00.579 2811926== by 0x4C96A67: _IO_str_init_static_internal 
> (strops.c:41)
> ==00:00:00:00.579 2811926== by 0x4C693A2: _IO_strfile_read (strfile.h:95)
> ==00:00:00:00.579 2811926== by 0x4C693A2: __isoc99_sscanf (isoc99_sscanf.c:28)
> ==00:00:00:00.579 2811926== by 0x11DB6F: _LoadLOs (pg_backup_directory.c:458)
> ==00:00:00:00.579 2811926== by 0x11DD1E: _PrintTocData 
> (pg_backup_directory.c:422)
> ==00:00:00:00.579 2811926== by 0x118484: restore_toc_entry 
> (pg_backup_archiver.c:882)
> ==00:00:00:00.579 2811926== by 0x1190CC: RestoreArchive 
> (pg_backup_archiver.c:699)
> ==00:00:00:00.579 2811926== by 0x10F25D: main (pg_restore.c:414)
> ==00:00:00:00.579 2811926==
> ...
> 
> It looks like the line variable returned by gets_func() here is not
> null-terminated:
> while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL)
> 
> {
> ...
> if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) != 2)
> ...
> And Valgrind doesn't like it.
> 

Valgrind is correct to not like it. LZ4Stream_gets() got modeled after
gets() when it should have been modeled after fgets().

Please find a patch attached to address it.
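For the record, the fgets()-style contract the fix adopts can be sketched roughly as below: at most size - 1 bytes are stored and the result is always NUL-terminated. read_avail() is a hypothetical stand-in for LZ4Stream_read_internal(), not the real pg_dump code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for LZ4Stream_read_internal(): copies up to
 * "want" bytes from a fixed source, stopping after a newline, and
 * returns the number of bytes written (not NUL-terminated).
 */
static const char *src = "16384 blob_16384.dat\n";
static size_t srcpos = 0;

static int
read_avail(char *dst, int want)
{
	int			n = 0;

	while (n < want && src[srcpos] != '\0')
	{
		dst[n++] = src[srcpos++];
		if (dst[n - 1] == '\n')
			break;
	}
	return n;
}

/*
 * fgets()-style wrapper: reserve one byte for the terminator, then
 * NUL-terminate at exactly the number of bytes the read reported.
 */
static char *
gets_like(char *ptr, int size)
{
	int			ret;

	assert(size > 1);
	ret = read_avail(ptr, size - 1);
	if (ret <= 0)
		return NULL;
	ptr[ret] = '\0';			/* callers may now pass ptr to sscanf() */
	return ptr;
}
```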

Cheers,
//Georgios

> Best regards,
> Alexander

From 587873da2b563c59b281051c2636cda667abf099 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 5 May 2023 09:47:02 +
Subject: [PATCH] Null terminate the output buffer of LZ4Stream_gets

LZ4Stream_gets did not null terminate its output buffer. Its callers expected
the buffer to be null terminated so they passed it around to functions such as
sscanf with unintended consequences.

Reported-by: Alexander Lakhin
---
 src/bin/pg_dump/compress_lz4.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 423e1b7976..26c9a8b280 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -459,6 +459,10 @@ LZ4Stream_read_internal(LZ4State *state, void *ptr, int ptrsize, bool eol_flag)
 	if (!LZ4Stream_init(state, size, false /* decompressing */ ))
 		return -1;
 
+	/* No work needs to be done for a zero-sized output buffer */
+	if (size <= 0)
+		return 0;
+
 	/* Verify that there is enough space in the outbuf */
 	if (size > state->buflen)
 	{
@@ -636,7 +640,12 @@ LZ4Stream_gets(char *ptr, int size, CompressFileHandle *CFH)
 	LZ4State   *state = (LZ4State *) CFH->private_data;
 	int			ret;
 
-	ret = LZ4Stream_read_internal(state, ptr, size, true);
+	Assert(size > 1);
+
+	/* Our caller expects the return string to be NULL terminated */
+	memset(ptr, '\0', size);
+
+	ret = LZ4Stream_read_internal(state, ptr, size - 1, true);
 	if (ret < 0 || (ret == 0 && !LZ4Stream_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4Stream_get_error(CFH));
 
-- 
2.34.1



Re: Add LZ4 compression in pg_dump

2023-05-05 Thread Alexander Lakhin

23.03.2023 20:10, Tomas Vondra wrote:

So pushed all three parts, after updating the commit messages a bit.

This leaves the empty-data issue (which we have a fix for) and the
switch to LZ4F. And then the zstd part.



I'm sorry that I haven't noticed/checked that before, but when trying to
perform check-world with Valgrind I've discovered another issue presumably
related to LZ4File_gets().
When running under Valgrind:
PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/
I get:
...
[07:07:11.683](0.000s) ok 1939 - compression_lz4_dir: glob check for 
.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4
# Running: pg_restore --jobs=2 --file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql 
.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir


==00:00:00:00.579 2811926== Conditional jump or move depends on uninitialised 
value(s)
==00:00:00:00.579 2811926==    at 0x4853376: rawmemchr 
(vg_replace_strmem.c:1548)
==00:00:00:00.579 2811926==    by 0x4C96A67: _IO_str_init_static_internal 
(strops.c:41)
==00:00:00:00.579 2811926==    by 0x4C693A2: _IO_strfile_read (strfile.h:95)
==00:00:00:00.579 2811926==    by 0x4C693A2: __isoc99_sscanf 
(isoc99_sscanf.c:28)
==00:00:00:00.579 2811926==    by 0x11DB6F: _LoadLOs (pg_backup_directory.c:458)
==00:00:00:00.579 2811926==    by 0x11DD1E: _PrintTocData 
(pg_backup_directory.c:422)
==00:00:00:00.579 2811926==    by 0x118484: restore_toc_entry 
(pg_backup_archiver.c:882)
==00:00:00:00.579 2811926==    by 0x1190CC: RestoreArchive 
(pg_backup_archiver.c:699)
==00:00:00:00.579 2811926==    by 0x10F25D: main (pg_restore.c:414)
==00:00:00:00.579 2811926==
...

It looks like the line variable returned by gets_func() here is not
null-terminated:
    while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL)
    {
...
    if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) 
!= 2)
...
And Valgrind doesn't like it.
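To illustrate why the terminator matters: sscanf() keeps scanning memory until it hits a NUL byte, so the line buffer must be a proper C string, which is exactly the guarantee gets_func() has to provide. A hedged sketch of the consuming side (parse_lo_line() is a hypothetical condensation of the _LoadLOs() loop body, not the actual source):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXPGPATH 1024

/*
 * Hypothetical condensation of the _LoadLOs() parsing step: sscanf()
 * requires "line" to be NUL-terminated; reading an unterminated buffer
 * is what Valgrind reports as a use of uninitialised values.
 */
static int
parse_lo_line(const char *line, unsigned int *oid, char *lofname)
{
	return sscanf(line, "%u %1023s\n", oid, lofname);
}
```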

Best regards,
Alexander




Re: Add LZ4 compression in pg_dump

2023-04-26 Thread Michael Paquier
On Wed, Apr 26, 2023 at 08:50:46AM +, gkokola...@pm.me wrote:
> For what is worth, I think this would be the best approach. +1

Thanks.  I have gone with that, then!
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-04-26 Thread gkokolatos






--- Original Message ---
On Tuesday, April 25th, 2023 at 8:02 AM, Michael Paquier  
wrote:


> 
> 
> On Wed, Apr 12, 2023 at 07:53:53PM -0500, Justin Pryzby wrote:
> 
> > I doubt it - in the !HAVE_LIBZ case, it's currently an "if" statement
> > with nothing but a comment, which isn't a problem.
> > 
> > I think the only issue with an empty "if" is when you have no braces,
> > like:
> > 
> > if (...)
> > #if ...
> > something;
> > #endif
> > 
> > // problem here //
> 
> 
> (My apologies for the late reply.)
> 
> Still it could be easily messed up, and that's not a style that
> really exists in the tree, either, because there are always #else
> blocks set up in such cases. Another part that makes me a bit
> uncomfortable is that we would still call twice
> parse_compress_specification(), something that should not happen but
> we are doing so on HEAD because the default compression_algorithm_str
> is "none" and we want to enforce "gzip" for custom and directory
> formats when building with zlib.
> 
> What about just moving this block a bit up, just before the
> compression spec parsing, then? If we set compression_algorithm_str,
> the specification is compiled with the expected default, once instead
> of twice.

For what is worth, I think this would be the best approach. +1

Cheers,
//Georgios

> --
> Michael




Re: Add LZ4 compression in pg_dump

2023-04-25 Thread Michael Paquier
On Wed, Apr 12, 2023 at 07:53:53PM -0500, Justin Pryzby wrote:
> I doubt it - in the !HAVE_LIBZ case, it's currently an "if" statement
> with nothing but a comment, which isn't a problem.
> 
> I think the only issue with an empty "if" is when you have no braces,
> like:
> 
>   if (...)
> #if ...
>   something;
> #endif
> 
>   // problem here //

(My apologies for the late reply.)

Still it could be easily messed up, and that's not a style that
really exists in the tree, either, because there are always #else
blocks set up in such cases.  Another part that makes me a bit
uncomfortable is that we would still call
parse_compress_specification() twice, something that should not happen but
we are doing so on HEAD because the default compression_algorithm_str
is "none" and we want to enforce "gzip" for custom and directory
formats when building with zlib.

What about just moving this block a bit up, just before the
compression spec parsing, then?  If we set compression_algorithm_str,
the specification is compiled with the expected default, once instead
of twice.
--
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 058244cd17..e4f32d170f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -721,6 +721,21 @@ main(int argc, char **argv)
 	if (archiveFormat == archNull)
 		plainText = 1;
 
+	/*
+	 * Custom and directory formats are compressed by default with gzip when
+	 * available, not the others.  If gzip is not available, no compression
+	 * is done by default.
+	 */
+	if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
+		!user_compression_defined)
+	{
+#ifdef HAVE_LIBZ
+		compression_algorithm_str = "gzip";
+#else
+		compression_algorithm_str = "none";
+#endif
+	}
+
 	/*
 	 * Compression options
 	 */
@@ -749,21 +764,6 @@ main(int argc, char **argv)
 		pg_log_warning("compression option \"%s\" is not currently supported by pg_dump",
 	   "workers");
 
-	/*
-	 * Custom and directory formats are compressed by default with gzip when
-	 * available, not the others.
-	 */
-	if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
-		!user_compression_defined)
-	{
-#ifdef HAVE_LIBZ
-		parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
-									 &compression_spec);
-#else
-		/* Nothing to do in the default case */
-#endif
-	}
-
 	/*
 	 * If emitting an archive format, we always want to emit a DATABASE item,
 	 * in case --create is specified at pg_restore time.


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-04-12 Thread Justin Pryzby
On Thu, Apr 13, 2023 at 09:37:06AM +0900, Michael Paquier wrote:
> > If you don't insist on calling parse(NONE), the only change is to remove
> > the empty #else, which was my original patch.
> 
> Removing the empty else has the problem of creating an empty if block,
> which could itself be a cause of warnings?

I doubt it - in the !HAVE_LIBZ case, it's currently an "if" statement
with nothing but a comment, which isn't a problem.

I think the only issue with an empty "if" is when you have no braces,
like:

if (...)
#if ...
something;
#endif

// problem here //
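To illustrate that hazard (names hypothetical): when FOO is undefined, the first statement after the #endif silently becomes the if-body, so code meant to run unconditionally becomes conditional:

```c
#include <assert.h>

/* FOO is deliberately left undefined for this illustration. */

static int	bar_calls = 0;

static void
do_bar(void)
{
	bar_calls++;
}

static void
run(int cond)
{
	if (cond)
#ifdef FOO
		do_foo();
#endif
		do_bar();			/* intended to be unconditional, but is not */
}
```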

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-04-12 Thread Michael Paquier
On Wed, Apr 12, 2023 at 05:52:40PM -0500, Justin Pryzby wrote:
> I don't think you need to call parse_compress_specification(NONE).
> As you wrote it, if zlib is unavailable, there's no parse(NONE) call,
> even for directory and custom formats.  And there's no parse(NONE) call
> for plan format when zlib is available.

Yeah, that's not necessary, but I was wondering if it made the code a
bit cleaner, or else the non-zlib path would rely on the default
compression method string.

> The old way had preprocessor #if around both the "if" and "else" - is
> that what you meant?
> 
> If you don't insist on calling parse(NONE), the only change is to remove
> the empty #else, which was my original patch.

Removing the empty else has the problem of creating an empty if block,
which could itself be a cause of warnings?

> If I were to rewrite the comment, it'd say:
> 
> +* When gzip is available, custom and directory formats are 
> compressed by
> +* default

Okay.
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-04-12 Thread Justin Pryzby
On Thu, Apr 13, 2023 at 07:23:48AM +0900, Michael Paquier wrote:
> On Tue, Apr 11, 2023 at 08:19:59PM -0500, Justin Pryzby wrote:
> > On Wed, Apr 12, 2023 at 10:07:08AM +0900, Michael Paquier wrote:
> >> Yes, this comment gives no value as it stands.  I would be tempted to
> >> follow the suggestion to group the whole code block in a single ifdef,
> >> including the check, and remove this comment.  Like the attached
> >> perhaps?
> > 
> > +1
> 
> Let me try this one again, as the previous patch would cause a warning
> under --without:-zlib as user_compression_defined would be unused.  We
> could do something like the attached instead.  It means doing twice
> parse_compress_specification() for the non-zlib path, still we are
> already doing so for the zlib path.
> 
> If there are other ideas, feel free.

I don't think you need to call parse_compress_specification(NONE).
As you wrote it, if zlib is unavailable, there's no parse(NONE) call,
even for directory and custom formats.  And there's no parse(NONE) call
for plan format when zlib is available.

The old way had preprocessor #if around both the "if" and "else" - is
that what you meant?

If you don't insist on calling parse(NONE), the only change is to remove
the empty #else, which was my original patch.

"if no compression specification has been specified" is redundant with
"by default", and causes "not the others" to dangle.

If I were to rewrite the comment, it'd say:

+* When gzip is available, custom and directory formats are compressed 
by
+* default




Re: Add LZ4 compression in pg_dump

2023-04-12 Thread Michael Paquier
On Tue, Apr 11, 2023 at 08:19:59PM -0500, Justin Pryzby wrote:
> On Wed, Apr 12, 2023 at 10:07:08AM +0900, Michael Paquier wrote:
>> Yes, this comment gives no value as it stands.  I would be tempted to
>> follow the suggestion to group the whole code block in a single ifdef,
>> including the check, and remove this comment.  Like the attached
>> perhaps?
> 
> +1

Let me try this one again, as the previous patch would cause a warning
under --without-zlib, as user_compression_defined would be unused.  We
could do something like the attached instead.  It means calling
parse_compress_specification() twice for the non-zlib path, but we are
already doing so for the zlib path.

If there are other ideas, feel free.
--
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 967ced4eed..596f64ed2c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -751,17 +751,21 @@ main(int argc, char **argv)
 
 	/*
 	 * Custom and directory formats are compressed by default with gzip when
-	 * available, not the others.
+	 * available if no compression specification has been specified, not the
+	 * others.
 	 */
-	if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
-		!user_compression_defined)
+	if (!user_compression_defined)
 	{
+		if (archiveFormat == archCustom || archiveFormat == archDirectory)
+		{
 #ifdef HAVE_LIBZ
-		parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
-									 &compression_spec);
+			parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
+										 &compression_spec);
 #else
-		/* Nothing to do in the default case */
+			parse_compress_specification(PG_COMPRESSION_NONE, NULL,
+										 &compression_spec);
 #endif
+		}
 	}
 
 	/*


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-04-11 Thread Justin Pryzby
On Wed, Apr 12, 2023 at 10:07:08AM +0900, Michael Paquier wrote:
> On Tue, Apr 11, 2023 at 07:41:11PM -0500, Justin Pryzby wrote:
> > Maybe I would write it as: "if zlib is unavailable, default to no
> > compression".  But I think that's best done in the leading comment, and
> > not inside an empty preprocessor #else.
> > 
> > I was hoping Michael would comment on this.
> 
> (Sorry for the late reply, somewhat missed that.)
> 
> > The placement and phrasing of the comment makes no sense to me.
> 
> Yes, this comment gives no value as it stands.  I would be tempted to
> follow the suggestion to group the whole code block in a single ifdef,
> including the check, and remove this comment.  Like the attached
> perhaps?

+1




Re: Add LZ4 compression in pg_dump

2023-04-11 Thread Michael Paquier
On Tue, Apr 11, 2023 at 07:41:11PM -0500, Justin Pryzby wrote:
> Maybe I would write it as: "if zlib is unavailable, default to no
> compression".  But I think that's best done in the leading comment, and
> not inside an empty preprocessor #else.
> 
> I was hoping Michael would comment on this.

(Sorry for the late reply, somewhat missed that.)

> The placement and phrasing of the comment makes no sense to me.

Yes, this comment gives no value as it stands.  I would be tempted to
follow the suggestion to group the whole code block in a single ifdef,
including the check, and remove this comment.  Like the attached
perhaps?
--
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 967ced4eed..b7dda5ab27 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -753,16 +753,14 @@ main(int argc, char **argv)
 	 * Custom and directory formats are compressed by default with gzip when
 	 * available, not the others.
 	 */
+#ifdef HAVE_LIBZ
 	if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
 		!user_compression_defined)
 	{
-#ifdef HAVE_LIBZ
 		parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
 									 &compression_spec);
-#else
-		/* Nothing to do in the default case */
-#endif
 	}
+#endif
 
 	/*
 	 * If emitting an archive format, we always want to emit a DATABASE item,


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-04-11 Thread Justin Pryzby
On Mon, Feb 27, 2023 at 02:33:04PM +, gkokola...@pm.me wrote:
> > > - Finally, the "Nothing to do in the default case" comment comes from
> > > Michael's commit 5e73a6048:
> > > 
> > > + /*
> > > + * Custom and directory formats are compressed by default with gzip when
> > > + * available, not the others.
> > > + */
> > > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > > + !user_compression_defined)
> > > {
> > > #ifdef HAVE_LIBZ
> > > - if (archiveFormat == archCustom || archiveFormat == archDirectory)
> > > - compressLevel = Z_DEFAULT_COMPRESSION;
> > > - else
> > > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > > + &compression_spec);
> > > +#else
> > > + /* Nothing to do in the default case */
> > > #endif
> > > - compressLevel = 0;
> > > }
> > > 
> > > As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> > > enabled, and when not otherwise specified by the user.
> > > 
> > > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, and when
> > > zlib was unavailable.
> > > 
> > > But I'm not sure why there's now an empty "#else". I also don't know
> > > what "the default case" refers to.
> > > 
> > > Maybe the best thing here is to move the preprocessor #if, since it's no
> > > longer in the middle of a runtime conditional:
> > > 
> > > #ifdef HAVE_LIBZ
> > > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > > + !user_compression_defined)
> > > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > > + &compression_spec);
> > > #endif
> > > 
> > > ...but that elicits a warning about "variable set but not used"...
> > 
> > 
> > Not sure, I need to think about this a bit.

> /* Nothing to do for the default case when LIBZ is not available */
> is easier to understand.

Maybe I would write it as: "if zlib is unavailable, default to no
compression".  But I think that's best done in the leading comment, and
not inside an empty preprocessor #else.

I was hoping Michael would comment on this.
The placement and phrasing of the comment makes no sense to me.

-- 
Justin




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-04-06 Thread Daniel Gustafsson
> On 6 Apr 2023, at 18:39, Justin Pryzby  wrote:

> *If* you wanted to do something to fix this, you could create a key
> called files_to_remove_after_loading, and run unlink on those files
> rather than running a shell command.  Or maybe just remove the file
> unconditionally at the start of the script ?

Since the test is written in Perl, and Perl has a function for deleting files
which abstracts the platform differences, using it seems like a logical choice?
{cleanup_cmd} can be replaced with {cleanup_files} with an unlink called on
that?

--
Daniel Gustafsson





Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-04-06 Thread Justin Pryzby
On Tue, Mar 14, 2023 at 12:16:16AM +0100, Tomas Vondra wrote:
> On 3/9/23 19:00, Tomas Vondra wrote:
> > On 3/9/23 01:30, Michael Paquier wrote:
> >> On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:
> >>> IMO we should fix that. We have a bunch of buildfarm members running on
> >>> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
> >>> tests. But considering how trivial the fix is ...
> >>>
> >>> Barring objections, I'll push a fix early next week.
> >>
> >> +1, better to change that, thanks.  Actually, would --rm be OK even on
> >> Windows?  As far as I can see, the CI detects a LZ4 command for the
> >> VS2019 environment but not the liblz4 libraries that would be needed
> >> to trigger the set of tests.
> > 
> > Thanks for noticing that. I'll investigate next week.
> 
> So, here's a fix that should (I think) replace the 'lz4 --rm' with a
> simple 'rm'. I have two doubts about this, though:
> 
> 1) I haven't found a simple way to inject additional command into the
> test. The pg_dump runs have a predefined list of "steps" to run:
> 
>   -- compress_cmd
>   -- glob_patterns
>   -- command_like
>   -- restore_cmd
> 
> and I don't think there's a good place to inject the 'rm' so I ended up
> adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
> strange / hacky. Maybe there's a better way?

I don't know if there's a better way, and I don't think it's worth
complicating the tests by more than about 2 lines to handle this.

> 2) I wonder if Windows will know what 'rm' means. I haven't found any
> TAP test doing 'rm' and don't see 'rm' in any $ENV either.

CI probably will, since it's Andres' image built with git-tools and
other helpful stuff installed.  But in general I think it won't; perl is
being used for all the portable stuff.

*If* you wanted to do something to fix this, you could create a key
called files_to_remove_after_loading, and run unlink on those files
rather than running a shell command.  Or maybe just remove the file
unconditionally at the start of the script ?

> That being said, I have no idea how to make this work on our Windows CI.
> As mentioned, the environment is missing the lz4 library - there's a
> 
>   setup_additional_packages_script: |
> REM choco install -y --no-progress ...
> 
> in the .yml file, but AFAICS the chocolatey does not have lz4 :-/

I updated what I'd done in the zstd patch to also run with LZ4.
This won't apply directly due to other patches, but you get the idea...

Maybe it'd be good to have a commented-out "wraps" hint like there is
for choco.  The downloaded files could be cached, too.

diff --git a/.cirrus.yml b/.cirrus.yml
index a3977a4036e..b4387a739f3 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -644,9 +644,11 @@ task:
 vcvarsall x64
 mkdir subprojects
 meson wrap install zstd
-meson configure -D zstd:multithread=enabled --force-fallback-for=zstd
+meson wrap install lz4
+meson subprojects download
+meson configure -D zstd:multithread=enabled --force-fallback-for=zstd 
--force-fallback-for=lz4
 set 
CC=c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.8-windows-x86_64\ccache.exe
 cl.exe
-meson setup --backend ninja --buildtype debug 
-Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=false 
-Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include 
-DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -D zstd=enabled -Dc_args="/Z7 
/MDd" build
+meson setup --backend ninja --buildtype debug 
-Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=false 
-Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include 
-DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -D zstd=enabled -D lz4=enabled 
-Dc_args="/Z7 /MDd" build
 
   build_script: |
 vcvarsall x64

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-31 Thread Tomas Vondra
On 3/31/23 11:19, gkokola...@pm.me wrote:
> 
>> ...
>>
>>
>> I think the patch is fine, but I'm wondering if the renames shouldn't go
>> a bit further. It removes references to LZ4File struct, but there's a
>> bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_
>> prefix? We don't have GzipFile either.
>>
>> Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but
>> then we probably should not define LZ4_compressor_init ...
> 
> This is a good point. The initial thought was that since lz4.h is now
> removed, such ambiguity will not be present. In v2 of the patch the
> function is renamed to `LZ4State_compression_init` since this name
> describes better its purpose. It initializes the LZ4State for
> compression.
> 
> As for the LZ4File_ prefix, I have no objections. Please find the
> prefix changed to LZ4Stream_. For the record, the word 'File' is not
> unique to the lz4 implementation. The common data structure used by
> the API in compress_io.h:
> 
>typedef struct CompressFileHandle CompressFileHandle; 
> 
> The public functions for this API are named:
> 
>   InitCompressFileHandle
>   InitDiscoverCompressFileHandle
>   EndCompressFileHandle
> 
> And within InitCompressFileHandle the pattern is:
> 
> if (compression_spec.algorithm == PG_COMPRESSION_NONE)
> InitCompressFileHandleNone(CFH, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
> InitCompressFileHandleGzip(CFH, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
> InitCompressFileHandleLZ4(CFH, compression_spec);
> 
> It was felt that a prefix was required due to the inclusion of the 'lz4.h'
> header where naming functions as 'LZ4_' would be wrong. The prefix
> 'LZ4File_' seemed to be in line with the naming of the rest of
> the relevant functions and structures. Other compressions, gzip and
> none, did not face the same issue.
> 
> To conclude, I think that having a prefix is slightly preferred
> over not having one. Since the prefix `LZ4File_` is not desired,
> I propose `LZ4Stream_` in v2.
> 
> I will not object to dismissing the argument and drop `File` from
> the prefix, if so requested.
> 

Thanks.

I think the LZ4Stream prefix is reasonable, so let's roll with that. I
cleaned up the patch a little bit (mostly comment tweaks, etc.), updated
the commit message and pushed it.

The main tweak I did is renaming all the LZ4State variables from "fs" to
state. The old name referred to the now abandoned "file state", but
after the rename to LZ4State that seems confusing. Some of the places
already used "state" and it's easier to know "state" is always LZ4State,
so let's keep it consistent.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-31 Thread gkokolatos





--- Original Message ---
On Wednesday, March 29th, 2023 at 12:02 AM, Tomas Vondra 
 wrote:


> 
> 
> On 3/28/23 18:07, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me gkokola...@pm.me 
> > wrote:
> > 
> > > --- Original Message ---
> > > On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra 
> > > tomas.von...@enterprisedb.com wrote:
> > > 
> > > > This leaves the empty-data issue (which we have a fix for) and the
> > > > switch to LZ4F. And then the zstd part.
> > > 
> > > Please expect promptly a patch for the switch to frames.
> > 
> > Please find the expected patch attached. Note that the bulk of the
> > patch is code unification, variable renaming to something more
> > appropriate, and comment addition. These are changes that are not
> > strictly necessary to switch to LZ4F. I do believe they are
> > essential for code hygiene after the switch and they do belong
> > on the same commit.
> 
> 
> I think the patch is fine, but I'm wondering if the renames shouldn't go
> a bit further. It removes references to LZ4File struct, but there's a
> bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_
> prefix? We don't have GzipFile either.
> 
> Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but
> then we probably should not define LZ4_compressor_init ...

This is a good point. The initial thought was that since lz4.h is now
removed, such ambiguity will not be present. In v2 of the patch the
function is renamed to `LZ4State_compression_init` since this name
describes better its purpose. It initializes the LZ4State for
compression.

As for the LZ4File_ prefix, I have no objections. Please find the
prefix changed to LZ4Stream_. For the record, the word 'File' is not
unique to the lz4 implementation. The common data structure used by
the API in compress_io.h:

   typedef struct CompressFileHandle CompressFileHandle; 

The public functions for this API are named:

  InitCompressFileHandle
  InitDiscoverCompressFileHandle
  EndCompressFileHandle

And within InitCompressFileHandle the pattern is:

if (compression_spec.algorithm == PG_COMPRESSION_NONE)
InitCompressFileHandleNone(CFH, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
InitCompressFileHandleGzip(CFH, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
InitCompressFileHandleLZ4(CFH, compression_spec);

It was felt that a prefix was required due to the inclusion 'lz4.h'
header where naming functions as 'LZ4_' would be wrong. The prefix
'LZ4File_' seemed to be in line with the naming of the rest of
the relevant functions and structures. Other compressions, gzip and
none, did not face the same issue.

To conclude, I think that having a prefix is slightly preferred
over not having one. Since the prefix `LZ4File_` is not desired,
I propose `LZ4Stream_` in v2.

I will not object to dismissing the argument and drop `File` from
the prefix, if so requested.

> 
> Also, maybe the comments shouldn't use "File API" when compress_io.c
> calls that "Compressed stream API".

Done.

Cheers,
//Georgios

> 
> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From b17b60cc1ff608f85c6c75ab19ad40c0863cfa93 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 31 Mar 2023 09:16:52 +
Subject: [PATCH v2] Use LZ4 frames in pg_dump's compressor API.

This change allows for greater compaction of data, especially so in very narrow
relations, by avoiding at least a compaction header and footer per row. Since
LZ4 frames are now used by both compression APIs, some code deduplication
opportunities have become obvious and are also implemented.

While at it, rename LZ4File* functions to LZ4Stream* to improve readability.

Reported by: Justin Pryzby
---
 src/bin/pg_dump/compress_lz4.c | 420 +
 1 file changed, 275 insertions(+), 145 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index fc2f4e116d..7023b11a2c 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -17,7 +17,6 @@
 #include "compress_lz4.h"
 
 #ifdef USE_LZ4
-#include 
 #include 
 
 /*
@@ -29,133 +28,286 @@
 #endif
 
 /*--
- * Compressor API
- *--
+ * Common to both APIs
  */
 
-typedef struct LZ4CompressorState
+/*
+ * State used for LZ4 (de)compression by both APIs.
+ */
+typedef struct LZ4State
 {
-	char	   *outbuf;
-	size_t		outsize;
-} LZ4CompressorState;
+	/*
+	 * Used by the Stream API to keep track of the file stream.
+	 */
+	FILE	   *fp;
+
+	LZ4F_preferences_t prefs;
+
+	LZ4F_compressionContext_t	ctx;
+	LZ4F_decompressionContext_t	dtx;
+
+	/*
+	 * Used by the Stream API's lazy initialization.
+	 */
+	bool		inited;
+
+	/*
+	 * Used by the Stream API to distinguish between compression

Re: Add LZ4 compression in pg_dump

2023-03-28 Thread Tomas Vondra
On 3/28/23 00:34, gkokola...@pm.me wrote:
> 
> ...
>
>> Got it. In that case I agree it's fine to do that in a single commit.
> 
> For what is worth, I think that this patch should get a +1 and get in. It
> solves the empty writes problem and includes a test to a previous untested
> case.
> 

Pushed, after updating / rewording the commit message a little bit.

Thanks!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-28 Thread Tomas Vondra
On 3/28/23 18:07, gkokola...@pm.me wrote:
> 
> --- Original Message ---
> On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me  
> wrote:
> 
>>
>> --- Original Message ---
>> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra 
>> tomas.von...@enterprisedb.com wrote:
>>
>>> This leaves the empty-data issue (which we have a fix for) and the
>>> switch to LZ4F. And then the zstd part.
>>
>> Please expect promptly a patch for the switch to frames.
> 
> Please find the expected patch attached. Note that the bulk of the
> patch is code unification, variable renaming to something more
> appropriate, and comment addition. These are changes that are not
> strictly necessary to switch to LZ4F. I do believe that are
> essential for code hygiene after the switch and they do belong
> on the same commit. 
> 


I think the patch is fine, but I'm wondering if the renames shouldn't go
a bit further. It removes references to LZ4File struct, but there's a
bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_
prefix? We don't have GzipFile either.

Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but
then we probably should not define LZ4_compressor_init ...

Also, maybe the comments shouldn't use "File API" when compress_io.c
calls that "Compressed stream API".


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-28 Thread Justin Pryzby
On Tue, Mar 28, 2023 at 06:40:03PM +0200, Tomas Vondra wrote:
> On 3/28/23 18:07, gkokola...@pm.me wrote:
> > --- Original Message ---
> > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me 
> >  wrote:
> > 
> >> --- Original Message ---
> >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra 
> >> tomas.von...@enterprisedb.com wrote:
> >>
> >>> This leaves the empty-data issue (which we have a fix for) and the
> >>> switch to LZ4F. And then the zstd part.
> >>
> >> Please expect promptly a patch for the switch to frames.
> > 
> > Please find the expected patch attached. Note that the bulk of the
> > patch is code unification, variable renaming to something more
> > appropriate, and comment addition. These are changes that are not
> > strictly necessary to switch to LZ4F. I do believe that are
> > essential for code hygiene after the switch and they do belong
> > on the same commit. 
> 
> Thanks!
> 
> I agree the renames & cleanup are appropriate - it'd be silly to stick
> to misleading naming etc. Would it make sense to split the patch into
> two, to separate the renames and the switch to lz4f?
> That'd make it the changes necessary for lz4f switch clearer.

I don't think so.  Did you mean separate commits only for review ?

The patch is pretty readable - the File API has just some renames, and
the compressor API is what's being replaced, which isn't going to be any
more clear.

@Georgios: did you consider using a C union in LZ4State, to separate the
parts used by the different APIs ?

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-28 Thread Tomas Vondra



On 3/28/23 18:07, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me  
> wrote:
> 
>>
>> --- Original Message ---
>> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra 
>> tomas.von...@enterprisedb.com wrote:
>>
>>> This leaves the empty-data issue (which we have a fix for) and the
>>> switch to LZ4F. And then the zstd part.
>>
>> Please expect promptly a patch for the switch to frames.
> 
> Please find the expected patch attached. Note that the bulk of the
> patch is code unification, variable renaming to something more
> appropriate, and comment addition. These are changes that are not
> strictly necessary to switch to LZ4F. I do believe that are
> essential for code hygiene after the switch and they do belong
> on the same commit. 
> 

Thanks!

I agree the renames & cleanup are appropriate - it'd be silly to stick
to misleading naming etc. Would it make sense to split the patch into
two, to separate the renames and the switch to lz4f?

That'd make the changes necessary for the lz4f switch clearer.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-28 Thread gkokolatos





--- Original Message ---
On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me  
wrote:

> 
> --- Original Message ---
> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra 
> tomas.von...@enterprisedb.com wrote:
> 
> > This leaves the empty-data issue (which we have a fix for) and the
> > switch to LZ4F. And then the zstd part.
> 
> Please expect promptly a patch for the switch to frames.

Please find the expected patch attached. Note that the bulk of the
patch is code unification, variable renaming to something more
appropriate, and comment addition. These are changes that are not
strictly necessary to switch to LZ4F. I do believe they are
essential for code hygiene after the switch and they do belong
in the same commit.

Cheers,
//Georgios

> 
> Cheers,
> //Georgios

From c289fb8d49b680ad180a44b20fff1dc9553b7494 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Tue, 28 Mar 2023 15:48:06 +
Subject: [PATCH v1] Use LZ4 frames in pg_dump's compressor API.

This change allows for greater compaction of data, especially so in very narrow
relations, by avoiding at least a compaction header and footer per row. Since
LZ4 frames are now used by both compression APIs, some code deduplication
opportunities have become obvious and are also implemented.

Reported by: Justin Pryzby
---
 src/bin/pg_dump/compress_lz4.c | 358 ++---
 1 file changed, 244 insertions(+), 114 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index fc2f4e116d..078dc35cd6 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -17,7 +17,6 @@
 #include "compress_lz4.h"
 
 #ifdef USE_LZ4
-#include 
 #include 
 
 /*
@@ -29,102 +28,279 @@
 #endif
 
 /*--
- * Compressor API
- *--
+ * Common to both APIs
  */
 
-typedef struct LZ4CompressorState
+/*
+ * State used for LZ4 (de)compression by both APIs.
+ */
+typedef struct LZ4State
 {
-	char	   *outbuf;
-	size_t		outsize;
-} LZ4CompressorState;
+	/*
+	 * Used by the File API to keep track of the file stream.
+	 */
+	FILE	   *fp;
+
+	LZ4F_preferences_t prefs;
+
+	LZ4F_compressionContext_t	ctx;
+	LZ4F_decompressionContext_t	dtx;
+
+	/*
+	 * Used by the File API's lazy initialization.
+	 */
+	bool		inited;
+
+	/*
+	 * Used by the File API to distinguish between compression
+	 * and decompression operations.
+	 */
+	bool		compressing;
+
+	/*
+	 * Used by the Compressor API to mark if the compression
+	 * headers have been written after initialization.
+	 */
+	bool		needs_header_flush;
+
+	size_t		buflen;
+	char	   *buffer;
+
+	/*
+	 * Used by the File API to store already uncompressed
+	 * data that the caller has not consumed.
+	 */
+	size_t		overflowalloclen;
+	size_t		overflowlen;
+	char	   *overflowbuf;
+
+	/*
+	 * Used by both APIs to keep track of the compressed
+	 * data length stored in the buffer.
+	 */
+	size_t		compressedlen;
+
+	/*
+	 * Used by both APIs to keep track of error codes.
+	 */
+	size_t		errcode;
+} LZ4State;
+
+/*
+ * Initialize the required LZ4State members for compression. Write the LZ4 frame
+ * header in a buffer keeping track of its length. Users of this function can
+ * choose when and how to write the header to a file stream.
+ *
+ * Returns true on success. In case of a failure returns false, and stores the
+ * error code in state->errcode.
+ */
+static bool
+LZ4_compression_state_init(LZ4State *state)
+{
+	size_t		status;
+
+	state->buflen = LZ4F_compressBound(DEFAULT_IO_BUFFER_SIZE, &state->prefs);
+
+	/*
+	 * LZ4F_compressBegin requires a buffer that is greater or equal to
+	 * LZ4F_HEADER_SIZE_MAX. Verify that the requirement is met.
+	 */
+	if (state->buflen < LZ4F_HEADER_SIZE_MAX)
+		state->buflen = LZ4F_HEADER_SIZE_MAX;
+
+	status = LZ4F_createCompressionContext(&state->ctx, LZ4F_VERSION);
+	if (LZ4F_isError(status))
+	{
+		state->errcode = status;
+		return false;
+	}
+
+	state->buffer = pg_malloc(state->buflen);
+	status = LZ4F_compressBegin(state->ctx,
+								state->buffer, state->buflen,
+								&state->prefs);
+	if (LZ4F_isError(status))
+	{
+		state->errcode = status;
+		return false;
+	}
+
+	state->compressedlen = status;
+
+	return true;
+}
+
+/*--
+ * Compressor API
+ *--
+ */
 
 /* Private routines that support LZ4 compressed data I/O */
-static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, CompressorState *cs);
-static void WriteDataToArchiveLZ4(ArchiveHandle *AH, CompressorState *cs,
-  const void *data, size_t dLen);
-static void EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs);
 
 static void
 ReadDataFromArchiveLZ4(ArchiveHandle *AH, CompressorState *cs)
 {
-	LZ4_streamDecode_t lz4StreamDecode;
-	char	   *buf;
-	char	   *decbuf;
-	size_t		buflen;
-	size_t		cnt;
-
-	buflen = DEFAULT_IO_BUFFER_SIZE;
-	buf = pg_malloc(buflen);
-	decbuf = pg_malloc(buflen);
+	size_t		r;
+	size_t		readbuflen;
+	char	   *outbuf;
+	char	   *readbuf;
+	

Re: Add LZ4 compression in pg_dump

2023-03-27 Thread gkokolatos






--- Original Message ---
On Thursday, March 16th, 2023 at 11:30 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 3/16/23 01:20, Justin Pryzby wrote:
> 
> > On Mon, Mar 13, 2023 at 10:47:12PM +0100, Tomas Vondra wrote:
> > 
> > > 
> > > Thanks. I don't want to annoy you too much, but could you split the
> > > patch into the "empty-data" fix and all the other changes (rearranging
> > > functions etc.)? I'd rather not mix those in the same commit.
> > 
> > I don't know if that makes sense? The "empty-data" fix creates a new
> > function called DeflateCompressorInit(). My proposal was to add the new
> > function in the same place in the file as it used to be.
> 
> 
> Got it. In that case I agree it's fine to do that in a single commit.

For what it's worth, I think that this patch should get a +1 and get in. It
solves the empty-writes problem and includes a test for a previously untested
case.

Cheers,
//Georgios

> 
> > The patch also moves the pg_fatal() that's being removed. I don't think
> > it's going to look any cleaner to read a history involving the
> > pg_fatal() first being added, then moved, then removed. Anyway, I'll
> > wait while the community continues discussion about the pg_fatal().
> 
> 
> I think the agreement was to replace the pg_fatal with and assert, and I
> see your patch already does that.
> 
> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-24 Thread gkokolatos






--- Original Message ---
On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra 
 wrote:



> 
> So pushed all three parts, after updating the commit messages a bit.

Thank you very much.

> 
> This leaves the empty-data issue (which we have a fix for) and the
> switch to LZ4F. And then the zstd part.

Please expect promptly a patch for the switch to frames.

Cheers,
//Georgios





Re: Add LZ4 compression in pg_dump

2023-03-23 Thread Tomas Vondra
Hi,

I looked at this again, and I realized I misunderstood the bit about
errno in LZ4File_open_write a bit. I now see it simply just brings the
function in line with Gzip_open_write(), so that the callers can just do
pg_fatal("%m"). I still think the special "errno" handling in this one
place feels a bit random, and handling it by get_error_func() would be
nicer, but we can leave that for a separate patch - no need to block
these changes because of that.

So pushed all three parts, after updating the commit messages a bit.

This leaves the empty-data issue (which we have a fix for) and the
switch to LZ4F. And then the zstd part.


On 3/20/23 23:40, Justin Pryzby wrote:
> On Fri, Mar 17, 2023 at 03:43:58PM +, gkokola...@pm.me wrote:
>> From a174cdff4ec8aad59f5bcc7e8d52218a14fe56fc Mon Sep 17 00:00:00 2001
>> From: Georgios Kokolatos 
>> Date: Fri, 17 Mar 2023 14:45:58 +
>> Subject: [PATCH v3 1/3] Improve type handling in pg_dump's compress file API
> 
>> -int
>> +bool
>>  EndCompressFileHandle(CompressFileHandle *CFH)
>>  {
>> -int ret = 0;
>> +bool ret = 0;
> 
> Should say "= false" ?
> 

Right, fixed.

>>  /*
>>   * Write 'size' bytes of data into the file from 'ptr'.
>> + *
>> + * Returns true on success and false on error.
>> + */
>> +   bool (*write_func) (const void *ptr, size_t size,
> 
>> -* Get a pointer to a string that describes an error that occurred 
>> during a
>> -* compress file handle operation.
>> +* Get a pointer to a string that describes an error that occurred 
>> during
>> +* a compress file handle operation.
>>  */
>> const char *(*get_error_func) (CompressFileHandle *CFH);
> 
> This should mention that the error accessible in error_func() applies (only) 
> to
> write_func() ?
> 
> As long as this touches pg_backup_directory.c you could update the
> header comment to refer to "compressed extensions", not just .gz.
> 
> I noticed that EndCompressorLZ4() tests "if (LZ4cs)", but that should
> always be true.
> 

I haven't done these two things. We can/should do that, but it didn't
fit into the three patches.

> I was able to convert the zstd patch to this new API with no issue.
> 

Good to hear.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-20 Thread Justin Pryzby
On Fri, Mar 17, 2023 at 03:43:58PM +, gkokola...@pm.me wrote:
> From a174cdff4ec8aad59f5bcc7e8d52218a14fe56fc Mon Sep 17 00:00:00 2001
> From: Georgios Kokolatos 
> Date: Fri, 17 Mar 2023 14:45:58 +
> Subject: [PATCH v3 1/3] Improve type handling in pg_dump's compress file API

> -int
> +bool
>  EndCompressFileHandle(CompressFileHandle *CFH)
>  {
> - int ret = 0;
> + bool ret = 0;

Should say "= false" ?

>   /*
>* Write 'size' bytes of data into the file from 'ptr'.
> +  *
> +  * Returns true on success and false on error.
> +  */
> +   bool (*write_func) (const void *ptr, size_t size,

> -* Get a pointer to a string that describes an error that occurred 
> during a
> -* compress file handle operation.
> +* Get a pointer to a string that describes an error that occurred 
> during
> +* a compress file handle operation.
>  */
> const char *(*get_error_func) (CompressFileHandle *CFH);

This should mention that the error accessible in error_func() applies (only) to
write_func() ?

As long as this touches pg_backup_directory.c you could update the
header comment to refer to "compressed extensions", not just .gz.

I noticed that EndCompressorLZ4() tests "if (LZ4cs)", but that should
always be true.

I was able to convert the zstd patch to this new API with no issue.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-20 Thread Tomas Vondra
Hi,

I was preparing to get the 3 cleanup patches pushed, so I
updated/reworded the commit messages a bit (attached, please check).

But I noticed the commit message for 0001 said:

  In passing save the appropriate errno in LZ4File_open_write in
  case that the caller is not using the API's get_error_func.

I think that's far too low-level for a commit message, it'd be much more
appropriate for a comment at the function.

However, do we even need this behavior? I was looking for code calling
this function without using get_error_func(), but no luck. And if there
is such caller, shouldn't we fix it to use get_error_func()?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 6a3d5d743f022ffcd0fcaf3d6e9ba711e2e785e7 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 17 Mar 2023 14:45:58 +
Subject: [PATCH v4 1/3] Improve type handling in pg_dump's compress file API
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After 0da243fed0 got committed, we've received a report about a compiler
warning, related to the new LZ4File_gets() function:

  compress_lz4.c: In function ‘LZ4File_gets’:
  compress_lz4.c:492:19: warning: comparison of unsigned expression in
  ‘< 0’ is always false [-Wtype-limits]
492 | if (dsize < 0)

The reason is very simple - dsize is declared as size_t, which is an
unsigned integer, and thus the check is pointless and we might fail to
notice an error in some cases (or fail in a strange way a bit later).

The warning could have been silenced by simply changing the type, but we
realized the API mostly assumes all the libraries use the same types and
report errors the same way (e.g. by returning 0 and/or negative value).

But we can't make this assumption - the gzip/lz4 libraries already
disagree on some of this, and even if they did a library added in the
future might not.

The right solution is to define what the API does, and translate the
library-specific behavior in a consistent way (so that the internal errors
are not exposed to users of our compression API).

For that reason, this commit adjusts the data types in a couple places,
so that we don't miss library errors, and unifies the error reporting to
simply return true/false (instead of e.g. size_t).

Author: Georgios Kokolatos
Reported-by: Alexander Lakhin
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2a...@gmail.com
---
 src/bin/pg_dump/compress_gzip.c   | 37 ++--
 src/bin/pg_dump/compress_io.c |  8 +--
 src/bin/pg_dump/compress_io.h | 38 +---
 src/bin/pg_dump/compress_lz4.c| 85 +++
 src/bin/pg_dump/compress_none.c   | 41 -
 src/bin/pg_dump/pg_backup_archiver.c  | 18 ++
 src/bin/pg_dump/pg_backup_directory.c | 36 ++--
 7 files changed, 148 insertions(+), 115 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4..d9c3969332 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -233,14 +233,14 @@ InitCompressorGzip(CompressorState *cs,
  *--
  */
 
-static size_t
-Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH)
+static bool
+Gzip_read(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
-	size_t		ret;
+	int			gzret;
 
-	ret = gzread(gzfp, ptr, size);
-	if (ret != size && !gzeof(gzfp))
+	gzret = gzread(gzfp, ptr, size);
+	if (gzret <= 0 && !gzeof(gzfp))
 	{
 		int			errnum;
		const char *errmsg = gzerror(gzfp, &errnum);
@@ -249,15 +249,18 @@ Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH)
  errnum == Z_ERRNO ? strerror(errno) : errmsg);
 	}
 
-	return ret;
+	if (rsize)
+		*rsize = (size_t) gzret;
+
+	return true;
 }
 
-static size_t
+static bool
 Gzip_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 
-	return gzwrite(gzfp, ptr, size);
+	return gzwrite(gzfp, ptr, size) > 0;
 }
 
 static int
@@ -287,22 +290,22 @@ Gzip_gets(char *ptr, int size, CompressFileHandle *CFH)
 	return gzgets(gzfp, ptr, size);
 }
 
-static int
+static bool
 Gzip_close(CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 
 	CFH->private_data = NULL;
 
-	return gzclose(gzfp);
+	return gzclose(gzfp) == Z_OK;
 }
 
-static int
+static bool
 Gzip_eof(CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 
-	return gzeof(gzfp);
+	return gzeof(gzfp) == 1;
 }
 
 static const char *
@@ -319,7 +322,7 @@ Gzip_get_error(CompressFileHandle *CFH)
 	return errmsg;
 }
 
-static int
+static bool
 Gzip_open(const char *path, int fd, const char *mode, CompressFileHandle *CFH)
 {
 	gzFile		gzfp;
@@ -342,18 +345,18 @@ Gzip_open(const char *path, int fd, const char *mode, CompressFileHandle *CFH)
 		gzfp = 

Re: Add LZ4 compression in pg_dump

2023-03-17 Thread Tomas Vondra
On 3/17/23 16:43, gkokola...@pm.me wrote:
>>
>> ...
>>
>> I agree it's cleaner the way you did it.
>>
>> I was thinking that with each compression function handling error
>> internally, the callers would not need to do that. But I haven't
>> realized there's logic to detect ENOSPC and so on, and we'd need to
>> duplicate that in every compression func.
>>
> 
> If you agree, I can prepare a patch to improve on the error handling
> aspect of the API as a separate thread, since here we are trying to
> focus on correctness.
> 

Yes, that makes sense. There are far too many patches in this thread
already ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-17 Thread gkokolatos





--- Original Message ---
On Thursday, March 16th, 2023 at 10:20 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 3/16/23 18:04, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra 
> > tomas.von...@enterprisedb.com wrote:
> > 
> > > On 3/14/23 16:18, gkokola...@pm.me wrote:
> > > 
> > > > ...> Would you mind me trying to come with a patch to address your 
> > > > points?
> > > 
> > > That'd be great, thanks. Please keep it split into smaller patches - two
> > > might work, with one patch for "cosmetic" changes and the other tweaking
> > > the API error-handling stuff.
> > 
> > Please find attached a set for it. I will admit that the splitting in the
> > series might not be ideal and what you requested. It is split on what
> > seemed as a logical units. Please advice how a better split can look like.
> > 
> > 0001 is unifying types and return values on the API
> > 0002 is addressing the constant definitions
> > 0003 is your previous 0004 adding comments
> 
> 
> Thanks. I think the split seems reasonable - the goal was to not mix
> different changes, and from that POV it works.
> 
> I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I
> mean, gzread/gzwrite returns int, so how does renaming the size_t
> variable solve the issue of negative values for errors? I mean, this
> 
> - size_t ret;
> + size_t gzret;
> 
> - ret = gzread(gzfp, ptr, size);
> + gzret = gzread(gzfp, ptr, size);
> 
> means we still lost the information gzread() returned a negative value,
> no? We'll still probably trigger an error, but it's a bit weird.

You are obviously correct. My bad, I misread the return type of gzread().

Please find an amended version attached.

> Unless I'm missing something, if gzread() ever returns -1 or some other
> negative error value, we'll cast it to size_t, while condition will
> evaluate to "true" and we'll happily chew on some random chunk of data.
> 
> So the confusion is (at least partially) a preexisting issue ...
> 
> For gzwrite() it seems to be fine, because that only returns 0 on error.
> OTOH it's defined to take 'int size' but then we happily pass size_t
> values to it.
> 
> As I wrote earlier, this apparently assumes we never need to deal with
> buffers larger than int, and I don't think we have the ambition to relax
> that (I'm not sure it's even needed / possible).

Agreed.


> I see the read/write functions are now defined as int, but we only ever
> return 0/1 from them, and then interpret that as bool. Why not to define
> it like that? I don't think we need to adhere to the custom that
> everything returns "int". This is an internal API. Or if we want to
> stick to int, I'd define meaningful "nice" constants for 0/1.

The return types are now booleans and the callers have been made aware.


> 0002 seems fine to me. I see you've ditched the idea of having two
> separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine
> with me, although I wonder if this might have negative impact on
> performance or something (but I doubt that).
> 

I doubt that too. Thank you.

> 0003 seems fine too.

Thank you.


> > As far as the error handling is concerned, you had said upthread:
> > 
> > > I think the right approach is to handle all library errors and not just
> > > let them through. So Gzip_write() needs to check the return value, and
> > > either call pg_fatal() or translate it to an error defined by the API.
> > 
> > While working on it, I thought it would be clearer and more consistent
> > for the pg_fatal() to be called by the caller of the individual functions.
> > Each individual function can keep track of the specifics of the error
> > internally. Then the caller upon detecting that there was an error by
> > checking the return value, can call pg_fatal() with a uniform error
> > message and then add the specifics by calling the get_error_func().
> 
> 
> I agree it's cleaner the way you did it.
> 
> I was thinking that with each compression function handling error
> internally, the callers would not need to do that. But I haven't
> realized there's logic to detect ENOSPC and so on, and we'd need to
> duplicate that in every compression func.
> 

If you agree, I can prepare a patch to improve on the error handling
aspect of the API as a separate thread, since here we are trying to
focus on correctness.

Cheers,
//Georgios

> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From eeac82b647dc4021e1dcf22d8cc59840fbde8847 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 17 Mar 2023 15:29:05 +
Subject: [PATCH v3 3/3] Improve compress_lz4 documentation.

Author: Tomas Vondra
---
 src/bin/pg_dump/compress_lz4.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 2f3e552f51..fc2f4e116d 100644
--- 

Re: Add LZ4 compression in pg_dump

2023-03-16 Thread Tomas Vondra



On 3/16/23 23:58, Justin Pryzby wrote:
> On Thu, Mar 16, 2023 at 11:30:50PM +0100, Tomas Vondra wrote:
>> On 3/16/23 01:20, Justin Pryzby wrote:
>>> But try reading the diff while looking for the cause of a bug.  It's the
>>> difference between reading 50, two-line changes, and reading a hunk that
>>> replaces 100 lines with a different 100 lines, with empty/unrelated
>>> lines randomly thrown in as context.
>>
>> I don't know, maybe I'm doing something wrong or maybe I just am bad at
>> looking at diffs, but if I apply the patch you submitted on 8/3 and do
>> the git-diff above (output attached), it seems pretty incomprehensible
>> to me :-( I don't see 50 two-line changes (I certainly wouldn't be able
>> to identify the root cause of the bug based on that).
> 
> It's true that most of the diff is still incomprehensible...
> 
> But look at the part relevant to the "empty-data" bug:
> 

Well, yeah. If you know where to look, and if you squint just the right
way, then you can see any bug. I don't think I'd be able to spot the bug
in the diff unless I knew in advance what the bug is.

That being said, I don't object to moving the function etc. Unless there
are alternative ideas how to fix the empty-data issue, I'll get this
committed after playing with it a bit more.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-16 Thread Justin Pryzby
On Thu, Mar 16, 2023 at 11:30:50PM +0100, Tomas Vondra wrote:
> On 3/16/23 01:20, Justin Pryzby wrote:
> > But try reading the diff while looking for the cause of a bug.  It's the
> > difference between reading 50, two-line changes, and reading a hunk that
> > replaces 100 lines with a different 100 lines, with empty/unrelated
> > lines randomly thrown in as context.
>
> I don't know, maybe I'm doing something wrong or maybe I just am bad at
> looking at diffs, but if I apply the patch you submitted on 8/3 and do
> the git-diff above (output attached), it seems pretty incomprehensible
> to me :-( I don't see 50 two-line changes (I certainly wouldn't be able
> to identify the root cause of the bug based on that).

It's true that most of the diff is still incomprehensible...

But look at the part relevant to the "empty-data" bug:

[... incomprehensible changes elided ...]
>  static void
> -InitCompressorZlib(CompressorState *cs, int level)
> +DeflateCompressorInit(CompressorState *cs)
>  {
> + GzipCompressorState *gzipcs;
>   z_streamp   zp;
>  
> - zp = cs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
> + gzipcs = (GzipCompressorState *) 
> pg_malloc0(sizeof(GzipCompressorState));
> + zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
>   zp->zalloc = Z_NULL;
>   zp->zfree = Z_NULL;
>   zp->opaque = Z_NULL;
>  
>   /*
> -  * zlibOutSize is the buffer size we tell zlib it can output to.  We
> -  * actually allocate one extra byte because some routines want to 
> append a
> -  * trailing zero byte to the zlib output.
> +  * outsize is the buffer size we tell zlib it can output to.  We 
> actually
> +  * allocate one extra byte because some routines want to append a 
> trailing
> +  * zero byte to the zlib output.
>*/
> - cs->zlibOut = (char *) pg_malloc(ZLIB_OUT_SIZE + 1);
> - cs->zlibOutSize = ZLIB_OUT_SIZE;
> + gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
> + gzipcs->outsize = ZLIB_OUT_SIZE;
>  
> - if (deflateInit(zp, level) != Z_OK)
> - pg_fatal("could not initialize compression library: %s",
> -  zp->msg);
> + /* -Z 0 uses the "None" compressor -- not zlib with no compression */
> + Assert(cs->compression_spec.level != 0);
> +
> + if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
> + pg_fatal("could not initialize compression library: %s", 
> zp->msg);
>  
>   /* Just be paranoid - maybe End is called after Start, with no Write */
> - zp->next_out = (void *) cs->zlibOut;
> - zp->avail_out = cs->zlibOutSize;
> + zp->next_out = gzipcs->outbuf;
> + zp->avail_out = gzipcs->outsize;
> +
> + /* Keep track of gzipcs */
> + cs->private_data = gzipcs;
>  }
>  
>  static void
> -EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs)
> +DeflateCompressorEnd(ArchiveHandle *AH, CompressorState *cs)
>  {
> - z_streamp   zp = cs->zp;
> + GzipCompressorState *gzipcs = (GzipCompressorState *) cs->private_data;
> + z_streamp   zp;
>  
> + zp = gzipcs->zp;
>   zp->next_in = NULL;
>   zp->avail_in = 0;
>  
>   /* Flush any remaining data from zlib buffer */
> - DeflateCompressorZlib(AH, cs, true);
> + DeflateCompressorCommon(AH, cs, true);
>  
>   if (deflateEnd(zp) != Z_OK)
>   pg_fatal("could not close compression stream: %s", zp->msg);
>  
> - free(cs->zlibOut);
> - free(cs->zp);
> + pg_free(gzipcs->outbuf);
> + pg_free(gzipcs->zp);
> + pg_free(gzipcs);
> + cs->private_data = NULL;
>  }
>  
>  static void
> -DeflateCompressorZlib(ArchiveHandle *AH, CompressorState *cs, bool flush)
> +DeflateCompressorCommon(ArchiveHandle *AH, CompressorState *cs, bool flush)
>  {
> - z_streamp   zp = cs->zp;
> - char   *out = cs->zlibOut;
> + GzipCompressorState *gzipcs = (GzipCompressorState *) cs->private_data;
> + z_streamp   zp = gzipcs->zp;
> + void   *out = gzipcs->outbuf;
>   int res = Z_OK;
>  
> - while (cs->zp->avail_in != 0 || flush)
> + while (gzipcs->zp->avail_in != 0 || flush)
>   {
>   res = deflate(zp, flush ? Z_FINISH : Z_NO_FLUSH);
>   if (res == Z_STREAM_ERROR)
>   pg_fatal("could not compress data: %s", zp->msg);
> - if ((flush && (zp->avail_out < cs->zlibOutSize))
> + if ((flush && (zp->avail_out < gzipcs->outsize))
>   || (zp->avail_out == 0)
>   || (zp->avail_in != 0)
>   )
> @@ -289,18 +122,18 @@ DeflateCompressorZlib(ArchiveHandle *AH, 
> CompressorState *cs, bool flush)
>* chunk is the EOF marker in the custom format. This 
> should never
>* happen but ...
>*/
> - if (zp->avail_out < cs->zlibOutSize)
> + if (zp->avail_out 

Re: Add LZ4 compression in pg_dump

2023-03-16 Thread Tomas Vondra


On 3/16/23 01:20, Justin Pryzby wrote:
> On Mon, Mar 13, 2023 at 10:47:12PM +0100, Tomas Vondra wrote:
>>> Rearrange functions to their original order allowing a cleaner diff to the 
>>> prior code;
>>
>> OK. I wasn't very enthusiastic about this initially, but after thinking
>> about it a bit I think it's meaningful to make diffs clearer. But I
>> don't see much difference with/without the patch. The
>>
>> git diff --diff-algorithm=minimal -w 
>> e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c
>>
>> Produces ~25k diff with/without the patch. What am I doing wrong?
> 
> Do you mean 25 kB of diff?

Yes, if you redirect the git-diff to a file, it's a 25kB file.

> I agree that the statistics of the diff output don't change a lot:
> 
>   1 file changed, 201 insertions(+), 570 deletions(-)
>   1 file changed, 198 insertions(+), 548 deletions(-)
> 
> But try reading the diff while looking for the cause of a bug.  It's the
> difference between reading 50, two-line changes, and reading a hunk that
> replaces 100 lines with a different 100 lines, with empty/unrelated
> lines randomly thrown in as context.
> 
> When the diff is readable, the pg_fatal() also stands out.
> 

I don't know, maybe I'm doing something wrong or maybe I just am bad at
looking at diffs, but if I apply the patch you submitted on 8/3 and do
the git-diff above (output attached), it seems pretty incomprehensible
to me :-( I don't see 50 two-line changes (I certainly wouldn't be able
to identify the root cause of the bug based on that).

>>> Change pg_fatal() to an assertion+comment;
>>
>> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We
>> could add such protections against "impossible" stuff to a zillion other
>> places and the confusion likely outweighs the benefits.
>>
>>> Update the commit message and fix a few typos;
>>
>> Thanks. I don't want to annoy you too much, but could you split the
>> patch into the "empty-data" fix and all the other changes (rearranging
>> functions etc.)? I'd rather not mix those in the same commit.
> 
> I don't know if that makes sense?  The "empty-data" fix creates a new
> function called DeflateCompressorInit().  My proposal was to add the new
> function in the same place in the file as it used to be.
> 

Got it. In that case I agree it's fine to do that in a single commit.

> The patch also moves the pg_fatal() that's being removed.  I don't think
> it's going to look any cleaner to read a history involving the
> pg_fatal() first being added, then moved, then removed.  Anyway, I'll
> wait while the community continues discussion about the pg_fatal().
> 

I think the agreement was to replace the pg_fatal with and assert, and I
see your patch already does that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_gzip.c
index 5ac21f091f0..3c9ac55c266 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -1,285 +1,118 @@
 /*-
  *
- * compress_io.c
- *  Routines for archivers to write an uncompressed or compressed data
- *  stream.
+ * compress_gzip.c
+ *  Routines for archivers to read or write a gzip compressed data stream.
  *
  * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * This file includes two APIs for dealing with compressed data. The first
- * provides more flexibility, using callbacks to read/write data from the
- * underlying stream. The second API is a wrapper around fopen/gzopen and
- * friends, providing an interface similar to those, but abstracts away
- * the possible compression. Both APIs use libz for the compression, but
- * the second API uses gzip headers, so the resulting files can be easily
- * manipulated with the gzip utility.
- *
- * Compressor API
- * --
- *
- * The interface for writing to an archive consists of three functions:
- * AllocateCompressor, WriteDataToArchive and EndCompressor. First you call
- * AllocateCompressor, then write all the data by calling 
WriteDataToArchive
- * as many times as needed, and finally EndCompressor. WriteDataToArchive
- * and EndCompressor will call the WriteFunc that was provided to
- * AllocateCompressor for each chunk of compressed data.
- *
- * The interface for reading an archive consists of just one function:
- * ReadDataFromArchive. ReadDataFromArchive reads the whole compressed 
input
- * stream, by repeatedly calling the given ReadFunc. ReadFunc returns the
- * compressed data chunk at a time, and ReadDataFromArchive decompresses it
- * and passes the decompressed data to ahwrite(), until ReadFunc returns 0
- * to signal EOF.
- *
- * The interface is the same for compressed and uncompressed 

Re: Add LZ4 compression in pg_dump

2023-03-16 Thread Tomas Vondra



On 3/16/23 18:04, gkokola...@pm.me wrote:
> 
> --- Original Message ---
> On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra 
>  wrote:
>>
>> On 3/14/23 16:18, gkokola...@pm.me wrote:
>>
>>> ...> Would you mind me trying to come up with a patch to address your points?
>>
>>
>> That'd be great, thanks. Please keep it split into smaller patches - two
>> might work, with one patch for "cosmetic" changes and the other tweaking
>> the API error-handling stuff.
> 
> Please find attached a set for it. I will admit that the split in the
> series might not be ideal or what you requested. It is split into what
> seemed like logical units. Please advise on how a better split might look.
> 
> 0001 is unifying types and return values on the API
> 0002 is addressing the constant definitions
> 0003 is your previous 0004 adding comments
> 

Thanks. I think the split seems reasonable - the goal was to not mix
different changes, and from that POV it works.

I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I
mean, gzread/gzwrite returns int, so how does renaming the size_t
variable solve the issue of negative values for errors? I mean, this

-    size_t      ret;
+    size_t      gzret;

-    ret = gzread(gzfp, ptr, size);
+    gzret = gzread(gzfp, ptr, size);

means we still lost the information gzread() returned a negative value,
no? We'll still probably trigger an error, but it's a bit weird.


ISTM all this kinda assumes we're processing chunks of memory small
enough that we'll never actually overflow int - I did check what the
code in 15 does, and it seems use int and size_t quite arbitrarily.

For example cfread() seems quite sane:

int
cfread(void *ptr, int size, cfp *fp)
{
int ret;
...
ret = gzread(fp->compressedfp, ptr, size);
...
return ret;
}

but then _PrintFileData() happily stashes it into a size_t, ignoring the
signedness. Surely, if

static void
_PrintFileData(ArchiveHandle *AH, char *filename)
{
size_t    cnt;
...
while ((cnt = cfread(buf, buflen, cfp)))
{
ahwrite(buf, 1, cnt, AH);
}
...
}

Unless I'm missing something, if gzread() ever returns -1 or some other
negative error value, we'll cast it to  size_t, while condition will
evaluate to "true" and we'll happily chew on some random chunk of data.

So the confusion is (at least partially) a preexisting issue ...
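To make the hazard concrete, here is a minimal sketch of that pattern
(fake_gzread() and broken_read() are invented stand-ins, not the real zlib
or pg_dump functions — they only simulate the error path discussed above):

```c
/*
 * Sketch of the signedness hazard described above: a negative int
 * error return, assigned to a size_t, silently becomes a huge byte
 * count instead of terminating the loop. fake_gzread() is a stand-in
 * for gzread(), used only to force the error path.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int
fake_gzread(void *buf, unsigned len)
{
	(void) buf;
	(void) len;
	return -1;					/* simulate a zlib error */
}

/* Mirrors the _PrintFileData() pattern: the size_t swallows the -1. */
static size_t
broken_read(void *buf, unsigned len)
{
	size_t		cnt = fake_gzread(buf, len);	/* -1 wraps to SIZE_MAX */

	return cnt;					/* nonzero, so a while (cnt) loop keeps going */
}
```

Running this shows the -1 surviving as SIZE_MAX, which is why the while
condition in _PrintFileData() would happily keep "reading".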

For gzwrite() it seems to be fine, because that only returns 0 on error.
OTOH it's defined to take 'int size' but then we happily pass size_t
values to it.

As I wrote earlier, this apparently assumes we never need to deal with
buffers larger than int, and I don't think we have the ambition to relax
that (I'm not sure it's even needed / possible).


I see the read/write functions are now defined as int, but we only ever
return 0/1 from them, and then interpret that as bool. Why not define
it like that? I don't think we need to adhere to the custom that
everything returns "int". This is an internal API. Or if we want to
stick to int, I'd define meaningful "nice" constants for 0/1.
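As a hedged illustration of the "meaningful constants" idea (the names
COMPRESS_OK/COMPRESS_FAIL and Fake_read() are invented here, not taken
from the patch), the internal API could spell the 0/1 returns like this:

```c
/*
 * Sketch only: symbolic names for the 0/1 returns of the internal
 * compression callbacks, plus a caller-side check. All identifiers
 * here are hypothetical stand-ins for the real API.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COMPRESS_FAIL	0
#define COMPRESS_OK		1

/* A stand-in read_func; a real one would wrap gzread() or LZ4F calls. */
static int
Fake_read(void *ptr, size_t size, size_t *nread)
{
	(void) ptr;
	*nread = size;				/* pretend the whole request succeeded */
	return COMPRESS_OK;
}

/* Caller-side check: success means COMPRESS_OK *and* a full read. */
static bool
read_all(void *ptr, size_t size)
{
	size_t		nread = 0;

	if (Fake_read(ptr, size, &nread) != COMPRESS_OK)
		return false;
	return nread == size;
}
```

The point is only that callers compare against a named constant instead
of a bare 0/1, so the success convention is stated in one place.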



0002 seems fine to me. I see you've ditched the idea of having two
separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine
with me, although I wonder if this might have negative impact on
performance or something (but I doubt that).

0003 seems fine too.


> As far as the error handling is concerned, you had said upthread:
> 
>> I think the right approach is to handle all library errors and not just
>> let them through. So Gzip_write() needs to check the return value, and
>> either call pg_fatal() or translate it to an error defined by the API.
> 
> While working on it, I thought it would be clearer and more consistent
> for the pg_fatal() to be called by the caller of the individual functions.
> Each individual function can keep track of the specifics of the error
> internally. Then the caller upon detecting that there was an error by
> checking the return value, can call pg_fatal() with a uniform error
> message and then add the specifics by calling the get_error_func().
> 

I agree it's cleaner the way you did it.

I was thinking that with each compression function handling error
internally, the callers would not need to do that. But I haven't
realized there's logic to detect ENOSPC and so on, and we'd need to
duplicate that in every compression func.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-16 Thread gkokolatos


--- Original Message ---
On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 3/14/23 16:18, gkokola...@pm.me wrote:
> 
> > ...> Would you mind me trying to come up with a patch to address your points?
> 
> 
> That'd be great, thanks. Please keep it split into smaller patches - two
> might work, with one patch for "cosmetic" changes and the other tweaking
> the API error-handling stuff.

Please find attached a set for it. I will admit that the split in the
series might not be ideal or what you requested. It is split into what
seemed like logical units. Please advise on how a better split might look.

0001 is unifying types and return values on the API
0002 is addressing the constant definitions
0003 is your previous 0004 adding comments

As far as the error handling is concerned, you had said upthread:

> I think the right approach is to handle all library errors and not just
> let them through. So Gzip_write() needs to check the return value, and
> either call pg_fatal() or translate it to an error defined by the API.

While working on it, I thought it would be clearer and more consistent
for the pg_fatal() to be called by the caller of the individual functions.
Each individual function can keep track of the specifics of the error
internally. Then the caller upon detecting that there was an error by
checking the return value, can call pg_fatal() with a uniform error
message and then add the specifics by calling the get_error_func().

Thoughts?
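The division of labor described above can be sketched roughly as follows
(SimpleCompressor, Fake_write() and write_or_report() are illustrative
names, and returning the error string stands in for the pg_fatal() call):

```c
/*
 * Sketch of the error-handling split described above: the callback only
 * records the library-specific error; the caller detects the failure
 * from the return value, emits one uniform message, and fetches the
 * specifics via get_error_func(). All names here are hypothetical.
 */
#include <assert.h>
#include <stddef.h>

typedef struct SimpleCompressor
{
	const char *last_error;		/* set by the callback on failure */
} SimpleCompressor;

/* Stand-in write callback: fails on zero-length writes, for the demo. */
static int
Fake_write(SimpleCompressor *cs, const void *ptr, size_t size)
{
	(void) ptr;
	if (size == 0)
	{
		cs->last_error = "zero-length write";
		return 0;				/* 0 signals failure */
	}
	return 1;
}

static const char *
get_error_func(const SimpleCompressor *cs)
{
	return cs->last_error;
}

/* Caller: uniform handling, specifics from get_error_func(). */
static const char *
write_or_report(SimpleCompressor *cs, const void *ptr, size_t size)
{
	if (!Fake_write(cs, ptr, size))
		return get_error_func(cs);	/* real code would pg_fatal() here */
	return NULL;
}
```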

Cheers,
//Georgios

> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 4aa7603d891c62bf9d95af9910b8fb4b0fe2fb10 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Thu, 16 Mar 2023 16:30:09 +
Subject: [PATCH v2 2/3] Clean up constants in pg_dump's compression API.

Prior to the introduction of the API, pg_dump would use the ZLIB_[IN|OUT]_SIZE
constants to handle buffer sizes throughout. This behaviour is confusing after
the introduction of the API. Amend it by introducing a DEFAULT_IO_BUFFER_SIZE
constant to use where appropriate, while giving specific compression
implementations the opportunity to use their own.

With the help and guidance of Tomas Vondra.
---
 src/bin/pg_dump/compress_gzip.c   | 22 +++---
 src/bin/pg_dump/compress_io.h |  5 ++---
 src/bin/pg_dump/compress_lz4.c| 17 +
 src/bin/pg_dump/compress_none.c   |  4 ++--
 src/bin/pg_dump/pg_backup_directory.c |  4 ++--
 5 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 29e2fd8d50..4106d4c866 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -120,8 +120,8 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		 * actually allocate one extra byte because some routines want to
 		 * append a trailing zero byte to the zlib output.
 		 */
-		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
-		gzipcs->outsize = ZLIB_OUT_SIZE;
+		gzipcs->outsize = DEFAULT_IO_BUFFER_SIZE;
+		gzipcs->outbuf = pg_malloc(gzipcs->outsize + 1);
 
 		/*
 		 * A level of zero simply copies the input one block at the time. This
@@ -158,10 +158,10 @@ ReadDataFromArchiveGzip(ArchiveHandle *AH, CompressorState *cs)
 	zp->zfree = Z_NULL;
 	zp->opaque = Z_NULL;
 
-	buf = pg_malloc(ZLIB_IN_SIZE);
-	buflen = ZLIB_IN_SIZE;
+	buflen = DEFAULT_IO_BUFFER_SIZE;
+	buf = pg_malloc(buflen);
 
-	out = pg_malloc(ZLIB_OUT_SIZE + 1);
+	out = pg_malloc(DEFAULT_IO_BUFFER_SIZE + 1);
 
 	if (inflateInit(zp) != Z_OK)
 		pg_fatal("could not initialize compression library: %s",
@@ -176,14 +176,14 @@ ReadDataFromArchiveGzip(ArchiveHandle *AH, CompressorState *cs)
 		while (zp->avail_in > 0)
 		{
 			zp->next_out = (void *) out;
-			zp->avail_out = ZLIB_OUT_SIZE;
+			zp->avail_out = DEFAULT_IO_BUFFER_SIZE;
 
 			res = inflate(zp, 0);
 			if (res != Z_OK && res != Z_STREAM_END)
 pg_fatal("could not uncompress data: %s", zp->msg);
 
-			out[ZLIB_OUT_SIZE - zp->avail_out] = '\0';
-			ahwrite(out, 1, ZLIB_OUT_SIZE - zp->avail_out, AH);
+			out[DEFAULT_IO_BUFFER_SIZE - zp->avail_out] = '\0';
+			ahwrite(out, 1, DEFAULT_IO_BUFFER_SIZE - zp->avail_out, AH);
 		}
 	}
 
@@ -192,13 +192,13 @@ ReadDataFromArchiveGzip(ArchiveHandle *AH, CompressorState *cs)
 	while (res != Z_STREAM_END)
 	{
 		zp->next_out = (void *) out;
-		zp->avail_out = ZLIB_OUT_SIZE;
+		zp->avail_out = DEFAULT_IO_BUFFER_SIZE;
 		res = inflate(zp, 0);
 		if (res != Z_OK && res != Z_STREAM_END)
 			pg_fatal("could not uncompress data: %s", zp->msg);
 
-		out[ZLIB_OUT_SIZE - zp->avail_out] = '\0';
-		ahwrite(out, 1, ZLIB_OUT_SIZE - zp->avail_out, AH);
+		out[DEFAULT_IO_BUFFER_SIZE - zp->avail_out] = '\0';
+		ahwrite(out, 1, DEFAULT_IO_BUFFER_SIZE - zp->avail_out, AH);
 	}
 
 	if (inflateEnd(zp) != Z_OK)
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index 

Re: Add LZ4 compression in pg_dump

2023-03-15 Thread Justin Pryzby
On Mon, Mar 13, 2023 at 10:47:12PM +0100, Tomas Vondra wrote:
> > Rearrange functions to their original order allowing a cleaner diff to the 
> > prior code;
> 
> OK. I wasn't very enthusiastic about this initially, but after thinking
> about it a bit I think it's meaningful to make diffs clearer. But I
> don't see much difference with/without the patch. The
> 
> git diff --diff-algorithm=minimal -w e9960732a~:src/bin/pg_dump/compress_io.c 
> src/bin/pg_dump/compress_gzip.c
> 
> Produces ~25k diff with/without the patch. What am I doing wrong?

Do you mean 25 kB of diff?  I agree that the statistics of the diff
output don't change a lot:

  1 file changed, 201 insertions(+), 570 deletions(-)
  1 file changed, 198 insertions(+), 548 deletions(-)

But try reading the diff while looking for the cause of a bug.  It's the
difference between reading 50, two-line changes, and reading a hunk that
replaces 100 lines with a different 100 lines, with empty/unrelated
lines randomly thrown in as context.

When the diff is readable, the pg_fatal() also stands out.

> > Change pg_fatal() to an assertion+comment;
> 
> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We
> could add such protections against "impossible" stuff to a zillion other
> places and the confusion likely outweighs the benefits.
> 
> > Update the commit message and fix a few typos;
> 
> Thanks. I don't want to annoy you too much, but could you split the
> patch into the "empty-data" fix and all the other changes (rearranging
> functions etc.)? I'd rather not mix those in the same commit.

I don't know if that makes sense?  The "empty-data" fix creates a new
function called DeflateCompressorInit().  My proposal was to add the new
function in the same place in the file as it used to be.

The patch also moves the pg_fatal() that's being removed.  I don't think
it's going to look any cleaner to read a history involving the
pg_fatal() first being added, then moved, then removed.  Anyway, I'll
wait while the community continues discussion about the pg_fatal().

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-14 Thread Tomas Vondra
On 3/14/23 12:07, gkokola...@pm.me wrote:
> 
> 
> --- Original Message ---
> On Monday, March 13th, 2023 at 10:47 PM, Tomas Vondra 
>  wrote:
> 
> 
> 
>>
>>> Change pg_fatal() to an assertion+comment;
>>
>>
>> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We
>> could add such protections against "impossible" stuff to a zillion other
>> places and the confusion likely outweighs the benefits.
>>
> 
> A minor note to add is to not ignore the lessons learned from a7885c9bb.
> 
> For example, as the testing framework stands, one can not test that the
> contents of the custom format are indeed compressed. One can infer it by
> examining the header of the produced dump and searching for the
> compression flag. The code responsible for writing the header and the
> code responsible for actually dealing with data, is not the same. Also,
> the compression library itself will happily read and write uncompressed
> data.
> 
> A pg_fatal, assertion, or similar, is the only guard rail against this
> kind of error. Without it, the tests will continue passing even after
> e.g. a wrong initialization of the API. It was such a case that led to
> a7885c9bb in the first place. I do think that we wish it to be an
> "impossible" case. Also it will be an untested case with some history
> without such a guard rail.
> 

So is the pg_fatal() dead code or not? My understanding was it's not
really reachable, and the main purpose is to remind people this is not
possible. Or am I mistaken/confused?

If it's reachable, can we test it? AFAICS we don't, per the coverage
reports.

If it's just a protection against incorrect API initialization, then an
assert is the right solution, I think. With proper comment. But can't we
actually verify that *during* the initialization?

Also, how come WriteDataToArchiveLZ4() doesn't need this protection too?
Or is that due to gzip being the default compression method?

> Of course I will not object to removing it, if you think that is more
> confusing than useful.
> 

Not sure, I have a feeling I don't quite understand in what situation
this actually helps.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-14 Thread Tomas Vondra



On 3/14/23 16:18, gkokola...@pm.me wrote:
> ...> Would you mind me trying to come up with a patch to address your points?
> 

That'd be great, thanks. Please keep it split into smaller patches - two
might work, with one patch for "cosmetic" changes and the other tweaking
the API error-handling stuff.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-14 Thread gkokolatos






--- Original Message ---
On Monday, March 13th, 2023 at 9:21 PM, Tomas Vondra 
 wrote:


>
>
>
>
> On 3/11/23 11:50, gkokola...@pm.me wrote:
>
> > --- Original Message ---
> > On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin 
> > exclus...@gmail.com wrote:
> >
> > > Hello,
> > > 23.02.2023 23:24, Tomas Vondra wrote:

>
>
> Thanks for the patch.
>
> I did look if there are other places that might have the same issue, and
> I think there are - see attached 0002. For example LZ4File_write is
> declared to return size_t, but then it also does
>
> if (LZ4F_isError(status))
> {
> fs->errcode = status;
>
> return -1;
> }
>
> That won't work :-(

You are right. It is confusing.

>
> And these issues may not be restricted to lz4 code - Gzip_write is
> declared to return size_t, but it does
>
> return gzwrite(gzfp, ptr, size);
>
> and gzwrite returns int. Although, maybe that's correct, because
> gzwrite() is "0 on error" so maybe this is fine ...
>
> However, Gzip_read assigns gzread() to size_t, and that does not seem
> great. It probably will still trigger the following pg_fatal() because
> it'd be very lucky to match the expected 'size', but it's confusing.

Agreed.

>
>
> I wonder whether CompressorState should use int or size_t for the
> read_func/write_func callbacks. I guess no option is perfect, i.e. no
> data type will work for all compression libraries we might use (lz4 uses
> int while lz4f uses size_t, so there's that).
>
> It's a bit weird the "open" functions return int and the read/write
> size_t. Maybe we should stick to int, which is what the old functions
> (cfwrite etc.) did.
>
You are right. These functions are modeled on open/fread/
fwrite etc., and they have kept the return types of those. Their
callers do check the return value of read_func and write_func against
the requested size of bytes to be transferred.

>
> But I think the actual problem here is that the API does not clearly
> define how errors are communicated. I mean, it's nice to return the
> value returned by the library function without "mangling" it by
> conversion to size_t, but what if the libraries communicate errors in
> different way? Some may return "0" while others may return "-1".

Agreed.

>
> I think the right approach is to handle all library errors and not just
> let them through. So Gzip_write() needs to check the return value, and
> either call pg_fatal() or translate it to an error defined by the API.

It makes sense. It will change some of the behaviour of the callers,
mostly on what constitutes an error, and what error message is emitted.
This is a reasonable change though.

>
> For example we might say "returns 0 on error" and then translate all
> library-specific errors to that.

Ok.

> While looking at the code I realized a couple function comments don't
> say what's returned in case of error, etc. So 0004 adds those.
>
> 0003 is a couple minor assorted comments/questions:
>
> - Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c?

It would make things clearer.

> - Why are LZ4 buffer sizes different (ZLIB has both 4kB)?

Clearly some comments are needed, if the difference makes sense.

> - I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible
> for LZ4F_compressBound to return value this small (especially for 16kB
> input buffer)?
>

I would recommend keeping it. Earlier LZ4 versions do not provide
LZ4F_HEADER_SIZE_MAX; later versions advise using it.

Would you mind me trying to come up with a patch to address your points?

Cheers,
//Georgios

>
>
> regards
>
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-14 Thread Tomas Vondra
On 3/14/23 11:34, Christoph Berg wrote:
> Re: Tomas Vondra
>> and I don't think there's a good place to inject the 'rm' so I ended up
>> adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
>> strange / hacky. Maybe there's a better way?
> 
> Does the file need to be removed at all? Could we leave it there and
> have "make clean" remove it?
> 

I don't think that'd work, because of the automatic "discovery" where we
check if a file exists, and if not we try to append .gz and .lz4. So if
you leave the .toc, we'd not find the .lz4, making the test useless ...
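For context, the "discovery" works roughly like this sketch
(find_compressed() and file_exists() are invented names; the real logic
lives in pg_dump's directory-format code):

```c
/*
 * Rough sketch of the file "discovery" mentioned above: probe the plain
 * name first, then append the compressed suffixes. If the uncompressed
 * .toc were left behind, the plain name would win and the .lz4 variant
 * would never be exercised. Helper names here are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool
file_exists(const char *path)
{
	FILE	   *f = fopen(path, "rb");

	if (f == NULL)
		return false;
	fclose(f);
	return true;
}

/* Fills "found" and returns true if base, base.gz or base.lz4 exists. */
static bool
find_compressed(const char *base, char *found, size_t foundlen)
{
	const char *suffixes[] = {"", ".gz", ".lz4"};

	for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++)
	{
		snprintf(found, foundlen, "%s%s", base, suffixes[i]);
		if (file_exists(found))
			return true;
	}
	return false;
}
```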


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-14 Thread gkokolatos



--- Original Message ---
On Monday, March 13th, 2023 at 10:47 PM, Tomas Vondra 
 wrote:



> 
> > Change pg_fatal() to an assertion+comment;
> 
> 
> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We
> could add such protections against "impossible" stuff to a zillion other
> places and the confusion likely outweighs the benefits.
> 

A minor note to add is to not ignore the lessons learned from a7885c9bb.

For example, as the testing framework stands, one can not test that the
contents of the custom format are indeed compressed. One can infer it by
examining the header of the produced dump and searching for the
compression flag. The code responsible for writing the header and the
code responsible for actually dealing with data, is not the same. Also,
the compression library itself will happily read and write uncompressed
data.

A pg_fatal, assertion, or similar, is the only guard rail against this
kind of error. Without it, the tests will continue passing even after
e.g. a wrong initialization of the API. It was such a case that led to
a7885c9bb in the first place. I do think that we wish it to be an
"impossible" case. Also it will be an untested case with some history
without such a guard rail.

Of course I will not object to removing it, if you think that is more
confusing than useful.

Cheers,
//Georgios

> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-14 Thread Christoph Berg
Re: Tomas Vondra
> and I don't think there's a good place to inject the 'rm' so I ended up
> adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
> strange / hacky. Maybe there's a better way?

Does the file need to be removed at all? Could we leave it there and
have "make clean" remove it?

Christoph




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-13 Thread Tomas Vondra


On 3/9/23 19:00, Tomas Vondra wrote:
> 
> 
> On 3/9/23 01:30, Michael Paquier wrote:
>> On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:
>>> IMO we should fix that. We have a bunch of buildfarm members running on
>>> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
>>> tests. But considering how trivial the fix is ...
>>>
>>> Barring objections, I'll push a fix early next week.
>>
>> +1, better to change that, thanks.  Actually, would --rm be OK even on
>> Windows?  As far as I can see, the CI detects a LZ4 command for the
>> VS2019 environment but not the liblz4 libraries that would be needed
>> to trigger the set of tests.
> 
> Thanks for noticing that. I'll investigate next week.
> 

So, here's a fix that should (I think) replace the 'lz4 --rm' with a
simple 'rm'. I have two doubts about this, though:


1) I haven't found a simple way to inject additional command into the
test. The pg_dump runs have a predefined list of "steps" to run:

  -- compress_cmd
  -- glob_patterns
  -- command_like
  -- restore_cmd

and I don't think there's a good place to inject the 'rm' so I ended up
adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
strange / hacky. Maybe there's a better way?


2) I wonder if Windows will know what 'rm' means. I haven't found any
TAP test doing 'rm' and don't see 'rm' in any $ENV either.


That being said, I have no idea how to make this work on our Windows CI.
As mentioned, the environment is missing the lz4 library - there's a

  setup_additional_packages_script: |
REM choco install -y --no-progress ...

in the .yml file, but AFAICS the chocolatey does not have lz4 :-/


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 9c354213ce..b3dcf6ff6d 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -178,11 +178,18 @@ my %pgdump_runs = (
 		compress_cmd => {
 			program => $ENV{'LZ4'},
 			args=> [
-'-z', '-f', '--rm',
+'-z', '-f',
 "$tempdir/compression_lz4_dir/blobs.toc",
 "$tempdir/compression_lz4_dir/blobs.toc.lz4",
 			],
 		},
+		# remove the source (uncompressed) .toc file
+		cleanup_cmd => {
+			program => 'rm',
+			args=> [
+"$tempdir/compression_lz4_dir/blobs.toc",
+			],
+		},
 		# Verify that data files were compressed
 		glob_patterns => [
 			"$tempdir/compression_lz4_dir/toc.dat",
@@ -4274,6 +4281,20 @@ foreach my $run (sort keys %pgdump_runs)
 		command_ok(\@full_compress_cmd, "$run: compression commands");
 	}
 
+	if ($pgdump_runs{$run}->{cleanup_cmd})
+	{
+		my ($cleanup_cmd) = $pgdump_runs{$run}->{cleanup_cmd};
+		my $cleanup_program = $cleanup_cmd->{program};
+
		# Skip the rest of the test if the cleanup program is
		# not defined.
		next if (!defined($cleanup_program) || $cleanup_program eq '');
+
+		my @full_cleanup_cmd =
+		  ($cleanup_cmd->{program}, @{ $cleanup_cmd->{args} });
+		command_ok(\@full_cleanup_cmd, "$run: cleanup commands");
+	}
+
 	if ($pgdump_runs{$run}->{glob_patterns})
 	{
 		my $glob_patterns = $pgdump_runs{$run}->{glob_patterns};


Re: Add LZ4 compression in pg_dump

2023-03-13 Thread Tomas Vondra
Hi Justin,

Thanks for the patch.

On 3/8/23 02:45, Justin Pryzby wrote:
> On Wed, Mar 01, 2023 at 04:52:49PM +0100, Tomas Vondra wrote:
>> Thanks. That seems correct to me, but I find it somewhat confusing,
>> because we now have
>>
>>  DeflateCompressorInit vs. InitCompressorGzip
>>
>>  DeflateCompressorEnd vs. EndCompressorGzip
>>
>>  DeflateCompressorData - The name doesn't really say what it does (would
>>  be better to have a verb in there, I think).
>>
>> I wonder if we can make this somehow clearer?
> 
> To move things along, I updated Georgios' patch:
> 
> Rename DeflateCompressorData() to DeflateCompressorCommon();

Hmmm, I don't find "common" any clearer than "data" :-( There needs to
at least be a comment explaining what "common" does.

> Rearrange functions to their original order allowing a cleaner diff to the 
> prior code;

OK. I wasn't very enthusiastic about this initially, but after thinking
about it a bit I think it's meaningful to make diffs clearer. But I
don't see much difference with/without the patch. The

git diff --diff-algorithm=minimal -w
e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c

Produces ~25k diff with/without the patch. What am I doing wrong?

> Change pg_fatal() to an assertion+comment;

Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We
could add such protections against "impossible" stuff to a zillion other
places and the confusion likely outweighs the benefits.

> Update the commit message and fix a few typos;
> 

Thanks. I don't want to annoy you too much, but could you split the
patch into the "empty-data" fix and all the other changes (rearranging
functions etc.)? I'd rather not mix those in the same commit.


>> Also, InitCompressorGzip says this:
>>
>>/*
>> * If the caller has defined a write function, prepare the necessary
>> * state. Avoid initializing during the first write call, because End
>> * may be called without ever writing any data.
>> */
>> if (cs->writeF)
>> DeflateCompressorInit(cs);
>>
>> Does it actually make sense to not have writeF defined in some cases?
> 
> InitCompressor is being called for either reading or writing, either of
> which could be null:
> 
> src/bin/pg_dump/pg_backup_custom.c: ctx->cs = 
> AllocateCompressor(AH->compression_spec,
> src/bin/pg_dump/pg_backup_custom.c-   
>NULL,
> src/bin/pg_dump/pg_backup_custom.c-   
>_CustomWriteFunc);
> --
> src/bin/pg_dump/pg_backup_custom.c: cs = 
> AllocateCompressor(AH->compression_spec,
> src/bin/pg_dump/pg_backup_custom.c-   
>   _CustomReadFunc, NULL);
> 
> It's confusing that the comment says "Avoid initializing...".  What it
> really means is "Initialize eagerly...".  But that makes more sense in
> the context of the commit message for this bugfix than in a comment.  So
> I changed that too.
> 
> +   /* If deflation was initialized, finalize it */   
>
> +   if (cs->private_data) 
>
> +   DeflateCompressorEnd(AH, cs); 
>
> 
> Maybe it'd be more clear if this used "if (cs->writeF)", like in the
> init function ?
> 

Yeah, if the two checks are equivalent, it'd be better to stick to the
same check everywhere.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-13 Thread Tomas Vondra


On 3/11/23 11:50, gkokola...@pm.me wrote:
> --- Original Message ---
> On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin 
>  wrote:
> 
>> Hello,
>> 23.02.2023 23:24, Tomas Vondra wrote:
>>
>>> On 2/23/23 16:26, Tomas Vondra wrote:
>>>
 Thanks for v30 with the updated commit messages. I've pushed 0001 after
 fixing a comment typo and removing (I think) an unnecessary change in an
 error message.

 I'll give the buildfarm a bit of time before pushing 0002 and 0003.
>>>
>>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
>>> and marked the CF entry as committed. Thanks for the patch!
>>>
>>> I wonder how difficult would it be to add the zstd compression, so that
>>> we don't have the annoying "unsupported" cases.
>>
>>
>> With the patch 0003 committed, a single warning -Wtype-limits appeared in the
>> master branch:
>> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
>> compress_lz4.c: In function ‘LZ4File_gets’:
>> compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is
>> always false [-Wtype-limits]
>> 492 | if (dsize < 0)
>> |
> 
> Thank you Alexander. Please find attached an attempt to address it.
> 
>> (I wonder, is it accidental that there are no other places that trigger
>> the warning, or some buildfarm animals had this check enabled before?)
> 
> I can not answer about the buildfarms. Do you think that adding an explicit
> check for this warning in meson would help? I am a bit uncertain as I think
> that type-limits are included in extra.
> 
> @@ -1748,6 +1748,7 @@ common_warning_flags = [
>'-Wshadow=compatible-local',
># This was included in -Wall/-Wformat in older GCC versions
>'-Wformat-security',
> +  '-Wtype-limits',
>  ]
> 
>>
>> It is not a false positive as can be proved by the 002_pg_dump.pl modified as
>> follows:
>> - program => $ENV{'LZ4'},
>>
>> + program => 'mv',
>>
>> args => [
>>
>> - '-z', '-f', '--rm',
>> "$tempdir/compression_lz4_dir/blobs.toc",
>> "$tempdir/compression_lz4_dir/blobs.toc.lz4",
>> ],
>> },
> 
> Correct, it is not a false positive. The existing testing framework provides
> limited support for exercising error branches. Especially so when those are
> dependent on generated output. 
> 
>> A diagnostic logging added shows:
>> LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615
>>
>> and pg_restore fails with:
>> error: invalid line in large object TOC file
>> ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": 
>> ""
> 
> It is a good thing that the restore fails with bad input. Yet it should
> have failed earlier. The attached makes certain it does fail earlier. 
> 

Thanks for the patch.

I did look if there are other places that might have the same issue, and
I think there are - see attached 0002. For example LZ4File_write is
declared to return size_t, but then it also does

if (LZ4F_isError(status))
{
fs->errcode = status;
return -1;
}

That won't work :-(

And these issues may not be restricted to lz4 code - Gzip_write is
declared to return size_t, but it does

return gzwrite(gzfp, ptr, size);

and gzwrite returns int. Although, maybe that's correct, because
gzwrite() is "0 on error" so maybe this is fine ...

However, Gzip_read assigns gzread() to size_t, and that does not seem
great. It probably will still trigger the following pg_fatal() because
it'd be very lucky to match the expected 'size', but it's confusing.


I wonder whether CompressorState should use int or size_t for the
read_func/write_func callbacks. I guess no option is perfect, i.e. no
data type will work for all compression libraries we might use (lz4 uses
int while lz4f uses size_t, so there's that).

It's a bit weird the "open" functions return int and the read/write
size_t. Maybe we should stick to int, which is what the old functions
(cfwrite etc.) did.


But I think the actual problem here is that the API does not clearly
define how errors are communicated. I mean, it's nice to return the
value returned by the library function without "mangling" it by
conversion to size_t, but what if the libraries communicate errors in
different ways? Some may return "0" while others may return "-1".

I think the right approach is to handle all library errors and not just
let them through. So Gzip_write() needs to check the return value, and
either call pg_fatal() or translate it to an error defined by the API.

For example we might say "returns 0 on error" and then translate all
library-specific errors to that.
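A minimal sketch of that idea (hypothetical names; the real pg_dump callbacks may end up with a different signature): wrap each library call and translate its native error convention into a single API-level one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for library write calls with differing error
 * conventions: a gzwrite()-style call returns 0 on error, an lz4-style
 * call returns a negative value.  The "fail" flag simulates an error.
 */
static int
lib_write_gz_style(const void *ptr, size_t size, bool fail)
{
	(void) ptr;
	return fail ? 0 : (int) size;
}

static int
lib_write_lz4_style(const void *ptr, size_t size, bool fail)
{
	(void) ptr;
	return fail ? -1 : (int) size;
}

/*
 * API-level wrappers: both map their library's convention onto one rule
 * callers can rely on ("true iff the full write succeeded").
 */
static bool
write_func_gz(const void *ptr, size_t size, bool fail)
{
	return lib_write_gz_style(ptr, size, fail) == (int) size;
}

static bool
write_func_lz4(const void *ptr, size_t size, bool fail)
{
	return lib_write_lz4_style(ptr, size, fail) == (int) size;
}
```

Callers then test a single convention instead of per-library return codes, and the wrapper is the one place that decides between pg_fatal() and a normalized error value.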


While looking at the code I realized a couple function comments don't
say what's returned in case of error, etc. So 0004 adds those.

0003 is a couple minor assorted comments/questions:

- Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c?

- Why are LZ4 buffer sizes different (ZLIB has both 4kB)?

- I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible

Re: Add LZ4 compression in pg_dump

2023-03-12 Thread Tomas Vondra



On 3/12/23 11:07, Peter Eisentraut wrote:
> On 11.03.23 07:00, Alexander Lakhin wrote:
>> Hello,
>> 23.02.2023 23:24, Tomas Vondra wrote:
>>> On 2/23/23 16:26, Tomas Vondra wrote:
 Thanks for v30 with the updated commit messages. I've pushed 0001 after
 fixing a comment typo and removing (I think) an unnecessary change
 in an
 error message.

 I'll give the buildfarm a bit of time before pushing 0002 and 0003.

>>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
>>> and marked the CF entry as committed. Thanks for the patch!
>>>
>>> I wonder how difficult would it be to add the zstd compression, so that
>>> we don't have the annoying "unsupported" cases.
>>
>> With the patch 0003 committed, a single warning -Wtype-limits appeared
>> in the
>> master branch:
>> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
>> compress_lz4.c: In function ‘LZ4File_gets’:
>> compress_lz4.c:492:19: warning: comparison of unsigned expression in
>> ‘< 0’ is always false [-Wtype-limits]
>>    492 | if (dsize < 0)
>>    |
>> (I wonder, is it accidental that there are no other places that trigger
>> the warning, or some buildfarm animals had this check enabled before?)
> 
> I think there is an underlying problem in this code that it dances back
> and forth between size_t and int in an unprincipled way.
> 
> In the code that triggers the warning, dsize is size_t.  dsize is the
> return from LZ4File_read_internal(), which is declared to return int.
> The variable that LZ4File_read_internal() returns in the success case is
> size_t, but in case of an error it returns -1.  (So the code that is
> warning is meaning to catch this error case, but it won't ever work.)
> Further below LZ4File_read_internal() calls LZ4File_read_overflow(),
> which is declared to return int, but in some cases it returns
> fs->overflowlen, which is size_t.
> 

I agree. I just got home so I looked at this only very briefly, but I
think it's clearly wrong to assign the LZ4File_read_internal() result to
a size_t variable (and it seems to me LZ4File_gets does the same mistake
with LZ4File_read_internal() result).

I'll get this fixed early next week, I'm too tired to do that now
without likely causing further issues.

> This should be cleaned up.
> 
> AFAICT, the upstream API in lz4.h uses int for size values, but
> lz4frame.h uses size_t, so I don't know what the correct approach is.

Yeah, that's a good point. I think Justin is right we should be using
the LZ4F stuff, so ultimately we'll probably switch to size_t. But IMO
it's definitely better to correct the current code first, and only then
switch to LZ4F (from one correct state to another).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-12 Thread Peter Eisentraut

On 11.03.23 07:00, Alexander Lakhin wrote:

Hello,
23.02.2023 23:24, Tomas Vondra wrote:

On 2/23/23 16:26, Tomas Vondra wrote:

Thanks for v30 with the updated commit messages. I've pushed 0001 after
fixing a comment typo and removing (I think) an unnecessary change in an
error message.

I'll give the buildfarm a bit of time before pushing 0002 and 0003.


I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
and marked the CF entry as committed. Thanks for the patch!

I wonder how difficult would it be to add the zstd compression, so that
we don't have the annoying "unsupported" cases.


With the patch 0003 committed, a single warning -Wtype-limits appeared 
in the

master branch:
$ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
compress_lz4.c: In function ‘LZ4File_gets’:
compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 
0’ is always false [-Wtype-limits]

   492 | if (dsize < 0)
   |
(I wonder, is it accidental that there are no other places that trigger
the warning, or some buildfarm animals had this check enabled before?)


I think there is an underlying problem in this code that it dances back 
and forth between size_t and int in an unprincipled way.


In the code that triggers the warning, dsize is size_t.  dsize is the 
return from LZ4File_read_internal(), which is declared to return int. 
The variable that LZ4File_read_internal() returns in the success case is 
size_t, but in case of an error it returns -1.  (So the code that triggers 
the warning is meant to catch this error case, but it won't ever work.) 
Further below LZ4File_read_internal() calls LZ4File_read_overflow(), 
which is declared to return int, but in some cases it returns 
fs->overflowlen, which is size_t.


This should be cleaned up.

AFAICT, the upstream API in lz4.h uses int for size values, but 
lz4frame.h uses size_t, so I don't know what the correct approach is.
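The mechanics of the warning can be shown in isolation (a simplified stand-in for LZ4File_read_internal(), not the actual pg_dump code): once the int -1 is stored into a size_t, the error check can never fire.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: like LZ4File_read_internal(), returns -1 on error. */
static int
read_internal_stub(int fail)
{
	return fail ? -1 : 42;
}

/* Buggy shape: -1 converts to SIZE_MAX, so "dsize < 0" is always false. */
static int
caught_error_buggy(int fail)
{
	size_t		dsize = (size_t) read_internal_stub(fail);

	return dsize < 0;			/* -Wtype-limits: comparison always false */
}

/* Fixed shape: keep the callee's signed type, as the posted fix does. */
static int
caught_error_fixed(int fail)
{
	int			ret = read_internal_stub(fail);

	return ret < 0;
}
```

The buggy variant silently misses the error; the fixed variant catches it, which is exactly the `size_t dsize` → `int ret` change in the attached patch.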





Re: Add LZ4 compression in pg_dump

2023-03-11 Thread Alexander Lakhin

Hi Georgios,

11.03.2023 13:50, gkokola...@pm.me wrote:

I can not answer about the buildfarms. Do you think that adding an explicit
check for this warning in meson would help? I am a bit uncertain as I think
that type-limits are included in extra.

@@ -1748,6 +1748,7 @@ common_warning_flags = [
'-Wshadow=compatible-local',
# This was included in -Wall/-Wformat in older GCC versions
'-Wformat-security',
+  '-Wtype-limits',
  ]

I'm not sure that I can promote additional checks (or determine where
to put them), but if some patch introduces a warning of a type that wasn't
present before, I think it's worth eliminating the warning (if it is
sensible) to keep the source code check baseline at the same level
or even lift it up gradually.
I've also found that the same commit introduced a single instance of
the analyzer-possible-null-argument warning:
CPPFLAGS="-Og -fanalyzer -Wno-analyzer-malloc-leak -Wno-analyzer-file-leak 
-Wno-analyzer-null-dereference -Wno-analyzer-shift-count-overflow 
-Wno-analyzer-free-of-non-heap -Wno-analyzer-null-argument 
-Wno-analyzer-double-free -Wanalyzer-possible-null-argument" ./configure 
--with-lz4 -q && make -s -j8

compress_io.c: In function ‘hasSuffix’:
compress_io.c:158:47: warning: use of possibly-NULL ‘filename’ where non-null 
expected [CWE-690] [-Wanalyzer-possible-null-argument]

  158 | int filenamelen = strlen(filename);
  | ^~~~
  ‘InitDiscoverCompressFileHandle’: events 1-3
...

(I use gcc-11.3.)
As I can see, many existing uses of strdup() are followed by a check for
null result, so maybe it's a common practice and a similar check should
be added in InitDiscoverCompressFileHandle().
(There are also a couple of other warnings introduced with the lz4 compression
patches, but those are not unique, so maybe they aren't worth fixing.)


It is a good thing that the restore fails with bad input. Yet it should
have failed earlier. The attached makes certain it does fail earlier.


Thanks! Your patch definitely fixes the issue.

Best regards,
Alexander




Re: Add LZ4 compression in pg_dump

2023-03-11 Thread gkokolatos
--- Original Message ---
On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin 
 wrote:

> Hello,
> 23.02.2023 23:24, Tomas Vondra wrote:
> 
> > On 2/23/23 16:26, Tomas Vondra wrote:
> > 
> > > Thanks for v30 with the updated commit messages. I've pushed 0001 after
> > > fixing a comment typo and removing (I think) an unnecessary change in an
> > > error message.
> > > 
> > > I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> > 
> > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> > and marked the CF entry as committed. Thanks for the patch!
> > 
> > I wonder how difficult would it be to add the zstd compression, so that
> > we don't have the annoying "unsupported" cases.
> 
> 
> With the patch 0003 committed, a single warning -Wtype-limits appeared in the
> master branch:
> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
> compress_lz4.c: In function ‘LZ4File_gets’:
> compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is
> always false [-Wtype-limits]
> 492 | if (dsize < 0)
> |

Thank you Alexander. Please find attached an attempt to address it.

> (I wonder, is it accidental that there are no other places that trigger
> the warning, or some buildfarm animals had this check enabled before?)

I can not answer about the buildfarms. Do you think that adding an explicit
check for this warning in meson would help? I am a bit uncertain as I think
that type-limits are included in extra.

@@ -1748,6 +1748,7 @@ common_warning_flags = [
   '-Wshadow=compatible-local',
   # This was included in -Wall/-Wformat in older GCC versions
   '-Wformat-security',
+  '-Wtype-limits',
 ]

> 
> It is not a false positive as can be proved by the 002_pg_dump.pl modified as
> follows:
> - program => $ENV{'LZ4'},
> 
> + program => 'mv',
> 
> args => [
> 
> - '-z', '-f', '--rm',
> "$tempdir/compression_lz4_dir/blobs.toc",
> "$tempdir/compression_lz4_dir/blobs.toc.lz4",
> ],
> },

Correct, it is not a false positive. The existing testing framework provides
limited support for exercising error branches. Especially so when those are
dependent on generated output. 

> A diagnostic logging added shows:
> LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615
> 
> and pg_restore fails with:
> error: invalid line in large object TOC file
> ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": 
> ""

It is a good thing that the restore fails with bad input. Yet it should
have failed earlier. The attached makes certain it does fail earlier. 

Cheers,
//Georgios

> 
> Best regards,
> Alexander

From b80bb52ef6546aee8c8431d7cc126fa4a76b543c Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Sat, 11 Mar 2023 09:54:40 +
Subject: [PATCH v1] Respect return type of LZ4File_read_internal

The function LZ4File_gets() was storing the return value of
LZ4File_read_internal in a variable of the wrong type, disregarding signedness.
As a consequence, LZ4File_gets() would not take the error path when it should.

In an attempt to improve readability, spell out the significance of a negative
return value of LZ4File_read_internal() in LZ4File_read().

Reported-by: Alexander Lakhin 
---
 src/bin/pg_dump/compress_lz4.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 63e794cdc6..cc039f0b47 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -453,7 +453,7 @@ LZ4File_read(void *ptr, size_t size, CompressFileHandle *CFH)
 	int			ret;
 
 	ret = LZ4File_read_internal(fs, ptr, size, false);
-	if (ret != size && !LZ4File_eof(CFH))
+	if (ret < 0 || (ret != size && !LZ4File_eof(CFH)))
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	return ret;
@@ -486,14 +486,14 @@ static char *
 LZ4File_gets(char *ptr, int size, CompressFileHandle *CFH)
 {
 	LZ4File*fs = (LZ4File *) CFH->private_data;
-	size_t		dsize;
+	int			ret;
 
-	dsize = LZ4File_read_internal(fs, ptr, size, true);
-	if (dsize < 0)
+	ret = LZ4File_read_internal(fs, ptr, size, true);
+	if (ret < 0)
 		pg_fatal("could not read from input file: %s", LZ4File_get_error(CFH));
 
 	/* Done reading */
-	if (dsize == 0)
+	if (ret == 0)
 		return NULL;
 
 	return ptr;
-- 
2.34.1



Re: Add LZ4 compression in pg_dump

2023-03-10 Thread Alexander Lakhin

Hello,
23.02.2023 23:24, Tomas Vondra wrote:

On 2/23/23 16:26, Tomas Vondra wrote:

Thanks for v30 with the updated commit messages. I've pushed 0001 after
fixing a comment typo and removing (I think) an unnecessary change in an
error message.

I'll give the buildfarm a bit of time before pushing 0002 and 0003.


I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
and marked the CF entry as committed. Thanks for the patch!

I wonder how difficult would it be to add the zstd compression, so that
we don't have the annoying "unsupported" cases.


With the patch 0003 committed, a single warning -Wtype-limits appeared in the
master branch:
$ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
compress_lz4.c: In function ‘LZ4File_gets’:
compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is 
always false [-Wtype-limits]

  492 | if (dsize < 0)
  |
(I wonder, is it accidental that there are no other places that trigger
the warning, or some buildfarm animals had this check enabled before?)

It is not a false positive as can be proved by the 002_pg_dump.pl modified as
follows:
-   program => $ENV{'LZ4'},
+   program => 'mv',
    args    => [
-   '-z', '-f', '--rm',
"$tempdir/compression_lz4_dir/blobs.toc",
"$tempdir/compression_lz4_dir/blobs.toc.lz4",
    ],
    },
A diagnostic logging added shows:
LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615

and pg_restore fails with:
error: invalid line in large object TOC file 
".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": ""


Best regards,
Alexander




Re: Add LZ4 compression in pg_dump

2023-03-10 Thread Michael Paquier
On Fri, Mar 10, 2023 at 07:05:49AM -0600, Justin Pryzby wrote:
> On Thu, Mar 09, 2023 at 06:58:20PM +0100, Tomas Vondra wrote:
>> I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to
>> lz4f, doesn't that mean it (e.g. restore) won't work on systems that
> only have an older lz4 version? What would/should happen if we take a backup
> compressed with lz4f, and then try restoring it on an older system where
>> lz4 does not support lz4f?
> 
> You seem to be thinking about LZ4F as a weird, new innovation I'm
> experimenting with, but compress_lz4.c already uses LZ4F for its "file"
> API.

Note: we already use lz4 frames in pg_receivewal (for WAL) and
pg_basebackup (bbstreamer).
--
Michael




Re: Add LZ4 compression in pg_dump

2023-03-10 Thread Justin Pryzby
On Thu, Mar 09, 2023 at 06:58:20PM +0100, Tomas Vondra wrote:
> I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to
> lz4f, doesn't that mean it (e.g. restore) won't work on systems that
> only have an older lz4 version? What would/should happen if we take a backup
> compressed with lz4f, and then try restoring it on an older system where
> lz4 does not support lz4f?

You seem to be thinking about LZ4F as a weird, new innovation I'm
experimenting with, but compress_lz4.c already uses LZ4F for its "file"
API.  LZ4F is also what's written by the lz4 CLI tool, and I found that
LZ4F has been included in the library for ~8 years:

https://github.com/lz4/lz4/releases?page=2
r126 Dec 24, 2014
New : lz4frame API is now integrated into liblz4

> Maybe if lz4f format is incompatible with regular lz4, we should treat
> it as a separate compression method 'lz4f'?
> 
> I'm mostly afk until the end of the week, but I tried searching for lz4f
> info - the results are not particularly enlightening, unfortunately.
> 
> AFAICS this only applies to lz4f stuff. Or would the streaming mode be a
> breaking change too?

Streaming mode outputs the same format as the existing code, but gives
better compression.  We could (theoretically) change it in a bugfix
release, and old output would still be restorable (I think new output
would even be restorable with the old versions of pg_restore).

But that's not true for LZ4F.  The benefit there is that it avoids
outputting a separate block for each row.  That's essential for narrow
tables, for which the block header currently being written has an
overhead several times larger than the data.

-- 
Justin
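The per-row overhead described above can be illustrated with back-of-the-envelope arithmetic (the 4-byte header and the frame granularity are assumptions for illustration, not the exact on-disk format): framing every row separately pays a fixed header per row, while a buffered frame amortizes it over many rows.

```c
#include <assert.h>
#include <stddef.h>

#define HDR 4					/* assumed per-block header size, for illustration */

/* One compression block (and header) per row, as the current -Fc path does. */
static size_t
per_row_framed_size(size_t nrows, size_t compressed_row_bytes)
{
	return nrows * (HDR + compressed_row_bytes);
}

/* Rows buffered into large frames: the header cost is amortized. */
static size_t
buffered_frame_size(size_t nrows, size_t compressed_row_bytes,
					size_t rows_per_frame)
{
	size_t		nframes = (nrows + rows_per_frame - 1) / rows_per_frame;

	return nframes * HDR + nrows * compressed_row_bytes;
}
```

For a narrow table of a million 1-byte rows, the per-row scheme emits several times more bytes than the raw data, matching the "~10% more than no compression at all" observation, while buffered frames stay close to the compressed payload size.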




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-09 Thread Tomas Vondra



On 3/9/23 01:30, Michael Paquier wrote:
> On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:
>> IMO we should fix that. We have a bunch of buildfarm members running on
>> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
>> tests. But considering how trivial the fix is ...
>>
>> Barring objections, I'll push a fix early next week.
> 
> +1, better to change that, thanks.  Actually, would --rm be OK even on
> Windows?  As far as I can see, the CI detects a LZ4 command for the
> VS2019 environment but not the liblz4 libraries that would be needed
> to trigger the set of tests.

Thanks for noticing that. I'll investigate next week.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-09 Thread Tomas Vondra



On 3/9/23 17:15, Justin Pryzby wrote:
> On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote:
>> On 2/27/23 05:49, Justin Pryzby wrote:
>>> On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
 On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> I have some fixes (attached) and questions while polishing the patch for
> zstd compression.  The fixes are small and could be integrated with the
> patch for zstd, but could be applied independently.

 One more - WriteDataToArchiveGzip() says:
>>>
>>> One more again.
>>>
>>> The LZ4 path is using non-streaming mode, which compresses each block
>>> without persistent state, giving poor compression for -Fc compared with
>>> -Fp.  If the data is highly compressible, the difference can be orders
>>> of magnitude.
>>>
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
>>> 12351763
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
>>> 21890708
>>>
>>> That's not true for gzip:
>>>
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
>>> 2118869
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
>>> 2115832
>>>
>>> The function ought to at least use streaming mode, so each block/row
>>> isn't compressed in isolation.  003 is a simple patch to use
>>> streaming mode, which improves the -Fc case:
>>>
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
>>> 15178283
>>>
>>> However, that still flushes the compression buffer, writing a block
>>> header, for every row.  With a single-column table, pg_dump -Fc -Z lz4
>>> still outputs ~10% *more* data than with no compression at all.  And
>>> that's for compressible data.
>>>
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
>>> 12890296
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
>>> 11890296
>>>
>>> I think this should use the LZ4F API with frames, which are buffered to
>>> avoid outputting a header for every single row.  The LZ4F format isn't
>>> compatible with the LZ4 format, so (unlike changing to the streaming
>>> API) that's not something we can change in a bugfix release.  I consider
>>> this an Opened Item.
>>>
>>> With the LZ4F API in 004, -Fp and -Fc are essentially the same size
>>> (like gzip).  (Oh, and the output is three times smaller, too.)
>>>
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
>>> 4155448
>>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
>>> 4156548
>>
>> Thanks. Those are definitely interesting improvements/optimizations!
>>
>> I suggest we track them as a separate patch series - please add them to
>> the CF app (I guess you'll have to add them to 2023-07 at this point,
>> but we can get them in, I think).
> 
> Thanks for looking.  I'm not sure if I'm the best person to write/submit
> the patch to implement that for LZ4.  Georgios, would you want to take
> on this change ?
> 
> I think that needs to be changed for v16, since 1) LZ4F works so much
> better like this, and 2) we can't change it later without breaking
> compatibility of the dumpfiles by changing the header with another name
> other than "lz4".  Also, I imagine we'd want to continue supporting the
> ability to *restore* a dumpfile using the old(current) format, which
> would be untestable code unless we also preserved the ability to write
> it somehow (like -Z lz4-old).
> 

I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to
lz4f, doesn't that mean it (e.g. restore) won't work on systems that
only have an older lz4 version? What would/should happen if we take a backup
compressed with lz4f, and then try restoring it on an older system where
lz4 does not support lz4f?

Maybe if lz4f format is incompatible with regular lz4, we should treat
it as a separate compression method 'lz4f'?

I'm mostly afk until the end of the week, but I tried searching for lz4f
info - the results are not particularly enlightening, unfortunately.

AFAICS this only applies to lz4f stuff. Or would the streaming mode be a
breaking change too?

> One issue is that LZ4F_createCompressionContext() and
> LZ4F_compressBegin() ought to be called in InitCompressorLZ4().  It
> seems like it might *need* to be called there to avoid exactly the kind
> of issue that I reported with empty LOs with gzip.  But
> InitCompressorLZ4() isn't currently passed the ArchiveHandle, so can't
> write the header.  And LZ4CompressorState has a simple char *buf, and
> not an more elaborate data structure like zlib.  You could work around
> that by keeping storing the "len" of the existing buffer, and flushing
> it in EndCompressorLZ4(), but that adds needless complexity to the Write
> and End functions.  Maybe the Init function should be passed the AH.
> 

Not sure, but looking at GzipCompressorState I see the only extra thing
it has (compared to LZ4CompressorState) is "z_streamp". I can't
experiment with this until the end of 

Re: Add LZ4 compression in pg_dump

2023-03-09 Thread Justin Pryzby
On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote:
> On 2/27/23 05:49, Justin Pryzby wrote:
> > On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
> >> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> >>> I have some fixes (attached) and questions while polishing the patch for
> >>> zstd compression.  The fixes are small and could be integrated with the
> >>> patch for zstd, but could be applied independently.
> >>
> >> One more - WriteDataToArchiveGzip() says:
> > 
> > One more again.
> > 
> > The LZ4 path is using non-streaming mode, which compresses each block
> > without persistent state, giving poor compression for -Fc compared with
> > -Fp.  If the data is highly compressible, the difference can be orders
> > of magnitude.
> > 
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
> > 12351763
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> > 21890708
> > 
> > That's not true for gzip:
> > 
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
> > 2118869
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
> > 2115832
> > 
> > The function ought to at least use streaming mode, so each block/row
> > isn't compressed in isolation.  003 is a simple patch to use
> > streaming mode, which improves the -Fc case:
> > 
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> > 15178283
> > 
> > However, that still flushes the compression buffer, writing a block
> > header, for every row.  With a single-column table, pg_dump -Fc -Z lz4
> > still outputs ~10% *more* data than with no compression at all.  And
> > that's for compressible data.
> > 
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
> > 12890296
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
> > 11890296
> > 
> > I think this should use the LZ4F API with frames, which are buffered to
> > avoid outputting a header for every single row.  The LZ4F format isn't
> > compatible with the LZ4 format, so (unlike changing to the streaming
> > API) that's not something we can change in a bugfix release.  I consider
> > this an Opened Item.
> > 
> > With the LZ4F API in 004, -Fp and -Fc are essentially the same size
> > (like gzip).  (Oh, and the output is three times smaller, too.)
> > 
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
> > 4155448
> > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
> > 4156548
> 
> Thanks. Those are definitely interesting improvements/optimizations!
> 
> I suggest we track them as a separate patch series - please add them to
> the CF app (I guess you'll have to add them to 2023-07 at this point,
> but we can get them in, I think).

Thanks for looking.  I'm not sure if I'm the best person to write/submit
the patch to implement that for LZ4.  Georgios, would you want to take
on this change ?

I think that needs to be changed for v16, since 1) LZ4F works so much
better like this, and 2) we can't change it later without breaking
compatibility of the dumpfiles, short of changing the header to a name
other than "lz4".  Also, I imagine we'd want to continue supporting the
ability to *restore* a dumpfile using the old (current) format, which
would be untestable code unless we also preserved the ability to write
it somehow (like -Z lz4-old).

One issue is that LZ4F_createCompressionContext() and
LZ4F_compressBegin() ought to be called in InitCompressorLZ4().  It
seems like it might *need* to be called there to avoid exactly the kind
of issue that I reported with empty LOs with gzip.  But
InitCompressorLZ4() isn't currently passed the ArchiveHandle, so can't
write the header.  And LZ4CompressorState has a simple char *buf, and
not a more elaborate data structure like zlib's.  You could work around
that by storing the "len" of the existing buffer, and flushing
it in EndCompressorLZ4(), but that adds needless complexity to the Write
and End functions.  Maybe the Init function should be passed the AH.
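The per-block vs. streaming size gap reported above is easy to reproduce without LZ4 at all. The sketch below uses Python's stdlib zlib as a stand-in (an assumption, since lz4 bindings may not be installed) to show why compressing each row in isolation inflates the output: no history is shared between rows.

```python
import zlib

# 10k identical rows: highly compressible, like the pg_dump examples above.
rows = [b"id,name,value\n1,example,42\n"] * 10_000

# Per-block: each row is compressed in isolation, so no history is
# shared between rows (analogous to the non-streaming LZ4 path in -Fc).
per_block = sum(len(zlib.compress(row)) for row in rows)

# Streaming: a single context carries compression state across rows
# (analogous to the streaming/LZ4F change discussed here).
ctx = zlib.compressobj()
streaming = sum(len(ctx.compress(row)) for row in rows) + len(ctx.flush())

print(per_block, streaming)  # streaming is far smaller
```

The same dictionary-sharing effect is what the LZ4 streaming API and the buffered LZ4F frame API provide over per-row block compression.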

-- 
Justin




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Michael Paquier
On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:
> IMO we should fix that. We have a bunch of buildfarm members running on
> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
> tests. But considering how trivial the fix is ...
> 
> Barring objections, I'll push a fix early next week.

+1, better to change that, thanks.  Actually, would --rm be OK even on
Windows?  As far as I can see, the CI detects a LZ4 command for the
VS2019 environment but not the liblz4 libraries that would be needed
to trigger the set of tests.
--
Michael




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Tomas Vondra
On 3/8/23 20:20, Daniel Gustafsson wrote:
>> On 8 Mar 2023, at 18:55, Christoph Berg  wrote:
> 
>> 18.04 will be EOL in a few weeks so it might be ok to just say it's
>> not supported, but removing the input file manually after calling lz4
>> would be an easy fix.
> 
> Is it reasonable to expect that this version of LZ4 can/will appear on any
> other platform outside of archeology?  Removing the file manually would be a
> trivial way to stabilize but if it's only expected to happen on platforms 
> which
> are long since EOL by the time 16 ships then the added complication could be
> hard to justify.
> 

IMO we should fix that. We have a bunch of buildfarm members running on
Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
tests. But considering how trivial the fix is ...

Barring objections, I'll push a fix early next week.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Daniel Gustafsson
> On 8 Mar 2023, at 18:55, Christoph Berg  wrote:

> 18.04 will be EOL in a few weeks so it might be ok to just say it's
> not supported, but removing the input file manually after calling lz4
> would be an easy fix.

Is it reasonable to expect that this version of LZ4 can/will appear on any
other platform outside of archeology?  Removing the file manually would be a
trivial way to stabilize but if it's only expected to happen on platforms which
are long since EOL by the time 16 ships then the added complication could be
hard to justify.

--
Daniel Gustafsson





lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Christoph Berg
Re: Tomas Vondra
> Add LZ4 compression to pg_dump

This broke the TAP tests on Ubuntu 18.04 (bionic):

[17:06:45.513](0.000s) ok 1927 - compression_lz4_custom: should not dump test_table with 4-row INSERTs
# Running: pg_dump --jobs=2 --format=directory --compress=lz4:1 --file=/home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir postgres
[17:06:46.651](1.137s) ok 1928 - compression_lz4_dir: pg_dump runs
# Running: /usr/bin/lz4 -z -f --rm /home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/blobs.toc /home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/blobs.toc.lz4
Incorrect parameters
Usage :
  /usr/bin/lz4 [arg] [input] [output]

input   : a filename
  with no FILE, or when FILE is - or stdin, read standard input
Arguments :
 -1 : Fast compression (default) 
 -9 : High compression 
 -d : decompression (default for .lz4 extension)
 -z : force compression
 -f : overwrite output without prompting 
 -h/-H  : display help/long help and exit
[17:06:46.667](0.016s) not ok 1929 - compression_lz4_dir: compression commands
[17:06:46.668](0.001s) 
[17:06:46.668](0.001s) #   Failed test 'compression_lz4_dir: compression commands'
[17:06:46.669](0.000s) #   at t/002_pg_dump.pl line 4274.
[17:06:46.670](0.001s) ok 1930 - compression_lz4_dir: glob check for /home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/toc.dat

The lz4 binary there doesn't have the --rm option yet.

liblz4-tool  0.0~r131-2ubuntu3

--rm appears in a single place only:

 # Give coverage for manually compressed blob.toc files during
 # restore.
 compress_cmd => {
 program => $ENV{'LZ4'},
 args=> [
 '-z', '-f', '--rm',
 "$tempdir/compression_lz4_dir/blobs.toc",
 "$tempdir/compression_lz4_dir/blobs.toc.lz4",
 ],
 },

18.04 will be EOL in a few weeks so it might be ok to just say it's
not supported, but removing the input file manually after calling lz4
would be an easy fix.
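A sketch of the portable pattern (a hypothetical helper, shown in Python rather than the Perl TAP code): run the compressor without --rm and unlink the input afterwards.

```python
import os
import subprocess
import tempfile

def compress_and_remove(cmd, src, dst):
    """Run an external compressor as `cmd src dst`, then remove the
    input ourselves -- the portable equivalent of lz4's --rm flag,
    which old liblz4-tool releases (e.g. 0.0~r131) do not have."""
    subprocess.run(cmd + [src, dst], check=True)
    os.unlink(src)

# Demo with `cp` standing in for lz4, since lz4 may not be installed:
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "blobs.toc")
    dst = os.path.join(d, "blobs.toc.lz4")
    with open(src, "wb") as f:
        f.write(b"toc contents")
    compress_and_remove(["cp"], src, dst)
    assert os.path.exists(dst) and not os.path.exists(src)
```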

Christoph




Re: Add LZ4 compression in pg_dump

2023-03-07 Thread Justin Pryzby
On Wed, Mar 01, 2023 at 04:52:49PM +0100, Tomas Vondra wrote:
> Thanks. That seems correct to me, but I find it somewhat confusing,
> because we now have
> 
>  DeflateCompressorInit vs. InitCompressorGzip
> 
>  DeflateCompressorEnd vs. EndCompressorGzip
> 
>  DeflateCompressorData - The name doesn't really say what it does (would
>  be better to have a verb in there, I think).
> 
> I wonder if we can make this somehow clearer?

To move things along, I updated Georgios' patch:

Rename DeflateCompressorData() to DeflateCompressorCommon();
Rearrange functions to their original order allowing a cleaner diff to the 
prior code;
Change pg_fatal() to an assertion+comment;
Update the commit message and fix a few typos;

> Also, InitCompressorGzip says this:
> 
>/*
> * If the caller has defined a write function, prepare the necessary
> * state. Avoid initializing during the first write call, because End
> * may be called without ever writing any data.
> */
> if (cs->writeF)
> DeflateCompressorInit(cs);
>
> Does it actually make sense to not have writeF defined in some cases?

InitCompressor is being called for either reading or writing, either of
which could be null:

src/bin/pg_dump/pg_backup_custom.c: ctx->cs = AllocateCompressor(AH->compression_spec,
src/bin/pg_dump/pg_backup_custom.c-                                        NULL,
src/bin/pg_dump/pg_backup_custom.c-                                        _CustomWriteFunc);
--
src/bin/pg_dump/pg_backup_custom.c: cs = AllocateCompressor(AH->compression_spec,
src/bin/pg_dump/pg_backup_custom.c-                             _CustomReadFunc, NULL);

It's confusing that the comment says "Avoid initializing...".  What it
really means is "Initialize eagerly...".  But that makes more sense in
the context of the commit message for this bugfix than in a comment.  So
I changed that too.

+   /* If deflation was initialized, finalize it */
+   if (cs->private_data)
+       DeflateCompressorEnd(AH, cs);

Maybe it'd be more clear if this used "if (cs->writeF)", like in the
init function ?
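The failure mode this patch fixes can be sketched in miniature with Python's stdlib zlib standing in for the gzip code path (the names below are illustrative, not pg_dump's): deferring initialization to the first write means the End step finalizes nothing when no data was ever written, while eager initialization always yields a valid, if empty, stream.

```python
import zlib

def write_compressed(chunks):
    """Illustrative model of the buggy pattern: init is deferred to
    the first write, so the 'End' step runs only if data was written."""
    ctx, out = None, b""
    for chunk in chunks:
        if ctx is None:
            ctx = zlib.compressobj()  # deferred init (the bug)
        out += ctx.compress(chunk)
    if ctx is not None:
        out += ctx.flush()            # the "End" step
    return out

# An empty large object produces no stream at all -> restore fails,
# because there is nothing valid to decompress.
assert write_compressed([]) == b""

# The fix: initialize eagerly, so End always emits a valid stream,
# even when no data was ever written.
ctx = zlib.compressobj()
empty_stream = ctx.flush()
assert zlib.decompress(empty_stream) == b""
```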

-- 
Justin
From 5c027aa86e8591db140093c48a58aafba7a6c28c Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Wed, 1 Mar 2023 12:42:32 +
Subject: [PATCH] pg_dump: fix gzip compression of empty data

When creating dumps with the Compressor API, it is possible to only call
the Allocate and End compressor functions without ever writing any data.
Since e9960732a, the gzip implementation wrongly assumed that the write
function would always be called and deferred the initialization of the
internal compression system until the first write call.

The problem with that approach is that the End call would not finalize
the internal compression system if it hadn't been initialized.

This commit initializes the internal compression system during the
Allocate call, whenever a write function was provided by the caller.
Given that decompression does not need to keep track of any state, the
compressor's private_data member is now populated only during
compression.

In passing, rearrange the functions to their original order, to allow
usefully comparing with the previous code, like:
git diff --diff-algorithm=minimal -w e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c

Also replace an unreachable pg_fatal() with an assert+comment.  I
(Justin) argued that the new fatal shouldn't have been introduced in a
refactoring commit, so this is a compromise.

Report and initial patch by Justin Pryzby, test case by Georgios
Kokolatos.

https://www.postgresql.org/message-id/20230228235834.GC30529%40telsasoft.com
---
 src/bin/pg_dump/compress_gzip.c  | 137 ++-
 src/bin/pg_dump/t/002_pg_dump.pl |  23 ++
 2 files changed, 101 insertions(+), 59 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..3c9ac55c266 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -32,9 +32,75 @@ typedef struct GzipCompressorState
 	size_t		outsize;
 } GzipCompressorState;
 
+
 /* Private routines that support gzip compressed data I/O */
+static void DeflateCompressorInit(CompressorState *cs);
+static void DeflateCompressorEnd(ArchiveHandle *AH, CompressorState *cs);
+static void DeflateCompressorCommon(ArchiveHandle *AH, CompressorState *cs,
+	bool flush);
+static void EndCompressorGzip(ArchiveHandle *AH, CompressorState *cs);
+static void WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
+   const void *data, size_t dLen);

Re: Add LZ4 compression in pg_dump

2023-03-02 Thread Tomas Vondra



On 3/2/23 18:18, Justin Pryzby wrote:
> On Wed, Mar 01, 2023 at 05:20:05PM +0100, Tomas Vondra wrote:
>> On 2/25/23 15:05, Justin Pryzby wrote:
>>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
 I have some fixes (attached) and questions while polishing the patch for
 zstd compression.  The fixes are small and could be integrated with the
 patch for zstd, but could be applied independently.
>>>
>>> One more - WriteDataToArchiveGzip() says:
>>>
>>> +   if (cs->compression_spec.level == 0)
>>> +   pg_fatal("requested to compress the archive yet no level was 
>>> specified");
>>>
>>> That was added at e9960732a.  
>>>
>>> But if you specify gzip:0, the compression level is already enforced by
>>> validate_compress_specification(), before hitting gzip.c:
>>>
>>> | pg_dump: error: invalid compression specification: compression algorithm 
>>> "gzip" expects a compression level between 1 and 9 (default at -1)
>>>
>>> 5e73a6048 intended that to work as before, and you *can* specify -Z0:
>>>
>>> The change is backward-compatible, hence specifying only an integer
>>> leads to no compression for a level of 0 and gzip compression when the
>>> level is greater than 0.
>>>
>>> $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
>>> --compress 0 |file -
>>> /dev/stdin: ASCII text
>>
>> FWIW I agree we should make this backwards-compatible - accept "0" and
>> treat it as no compression.
>>
>> Georgios, can you prepare a patch doing that?
> 
> I think maybe Tomas misunderstood.  What I was trying to say is that -Z
> 0 *is* accepted to mean no compression.  This part wasn't quoted, but I
> said:
> 

Ah, I see. Well, I also tried with "-Z gzip:0" (and not -Z 0), and
that does fail:

  error: invalid compression specification: compression algorithm "gzip"
  expects a compression level between 1 and 9 (default at -1)

It's a bit weird these two cases behave differently, when both translate
to the same default compression method (gzip).
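A simplified model of that asymmetry (hypothetical code, not the actual compress-specification parser): a bare integer takes the backward-compatible path, while "algo:level" goes through per-algorithm validation.

```python
def parse_compress_option(spec: str):
    """Hypothetical simplification of the -Z handling discussed here:
    bare integers keep the pre-16 semantics; "algo:level" is validated."""
    if spec.isdigit():                       # backward-compatible -Z N
        level = int(spec)
        return ("none", 0) if level == 0 else ("gzip", level)
    algo, _, lvl = spec.partition(":")
    level = int(lvl) if lvl else -1          # -1: algorithm default
    if algo == "gzip" and lvl and not 1 <= level <= 9:
        raise ValueError('compression algorithm "gzip" expects a '
                         'compression level between 1 and 9')
    return (algo, level)

assert parse_compress_option("0") == ("none", 0)   # -Z 0: no compression
assert parse_compress_option("5") == ("gzip", 5)   # -Z 5: gzip, level 5
# ...but the spelled-out "gzip:0" is rejected, hence the asymmetry.
```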

>> Right now, I think that pg_fatal in gzip.c is dead code - that was first
>> added in the patch version sent on 21 Dec 2022.
> 
> If you run the diff command that I've been talking about, you'll see
> that InitCompressorZlib was almost unchanged - e9960732 is essentially a
> refactoring.  I don't think it's desirable to add a pg_fatal() in a
> function that's otherwise nearly-unchanged.  The fact that it's
> nearly-unchanged is a good thing: it simplifies reading of what changed.
> If someone wants to add a pg_fatal() in that code path, it'd be better
> done in its own commit, with a separate message explaining the change.
> 
> If you insist on changing anything here, you might add an assertion (as
> you said earlier) along with a comment like
> /* -Z 0 uses the "None" compressor rather than zlib with no compression */
> 

Yeah, a comment would be helpful.

Also, after thinking about it a bit more, maybe having the unreachable
pg_fatal() is not a good thing, as it will just confuse people (I'd
certainly assume such a check means there's a way it might trigger).
Maybe an assert would be better?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-02 Thread Justin Pryzby
On Wed, Mar 01, 2023 at 05:20:05PM +0100, Tomas Vondra wrote:
> On 2/25/23 15:05, Justin Pryzby wrote:
> > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> >> I have some fixes (attached) and questions while polishing the patch for
> >> zstd compression.  The fixes are small and could be integrated with the
> >> patch for zstd, but could be applied independently.
> > 
> > One more - WriteDataToArchiveGzip() says:
> > 
> > +   if (cs->compression_spec.level == 0)
> > +   pg_fatal("requested to compress the archive yet no level was 
> > specified");
> > 
> > That was added at e9960732a.  
> > 
> > But if you specify gzip:0, the compression level is already enforced by
> > validate_compress_specification(), before hitting gzip.c:
> > 
> > | pg_dump: error: invalid compression specification: compression algorithm 
> > "gzip" expects a compression level between 1 and 9 (default at -1)
> > 
> > 5e73a6048 intended that to work as before, and you *can* specify -Z0:
> > 
> > The change is backward-compatible, hence specifying only an integer
> > leads to no compression for a level of 0 and gzip compression when the
> > level is greater than 0.
> > 
> > $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
> > --compress 0 |file -
> > /dev/stdin: ASCII text
> 
> FWIW I agree we should make this backwards-compatible - accept "0" and
> treat it as no compression.
> 
> Georgios, can you prepare a patch doing that?

I think maybe Tomas misunderstood.  What I was trying to say is that -Z
0 *is* accepted to mean no compression.  This part wasn't quoted, but I
said:

> Right now, I think that pg_fatal in gzip.c is dead code - that was first
> added in the patch version sent on 21 Dec 2022.

If you run the diff command that I've been talking about, you'll see
that InitCompressorZlib was almost unchanged - e9960732 is essentially a
refactoring.  I don't think it's desirable to add a pg_fatal() in a
function that's otherwise nearly-unchanged.  The fact that it's
nearly-unchanged is a good thing: it simplifies reading of what changed.
If someone wants to add a pg_fatal() in that code path, it'd be better
done in its own commit, with a separate message explaining the change.

If you insist on changing anything here, you might add an assertion (as
you said earlier) along with a comment like
/* -Z 0 uses the "None" compressor rather than zlib with no compression */

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-02 Thread gkokolatos





--- Original Message ---
On Wednesday, March 1st, 2023 at 5:20 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 2/25/23 15:05, Justin Pryzby wrote:
> 
> > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> > 
> > > I have some fixes (attached) and questions while polishing the patch for
> > > zstd compression. The fixes are small and could be integrated with the
> > > patch for zstd, but could be applied independently.
> > 
> > One more - WriteDataToArchiveGzip() says:
> > 
> > + if (cs->compression_spec.level == 0)
> > + pg_fatal("requested to compress the archive yet no level was specified");
> > 
> > That was added at e9960732a.
> > 
> > But if you specify gzip:0, the compression level is already enforced by
> > validate_compress_specification(), before hitting gzip.c:
> > 
> > | pg_dump: error: invalid compression specification: compression algorithm 
> > "gzip" expects a compression level between 1 and 9 (default at -1)
> > 
> > 5e73a6048 intended that to work as before, and you can specify -Z0:
> > 
> > The change is backward-compatible, hence specifying only an integer
> > leads to no compression for a level of 0 and gzip compression when the
> > level is greater than 0.
> > 
> > $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
> > --compress 0 |file -
> > /dev/stdin: ASCII text
> 
> 
> FWIW I agree we should make this backwards-compatible - accept "0" and
> treat it as no compression.
> 
> Georgios, can you prepare a patch doing that?

Please find a patch attached. However, I am a bit at a loss: the
backwards-compatible behaviour has not changed. Passing -Z0/--compress=0
does produce non-compressed output, so I am not really certain what
broke and needs fixing.

What commit 5e73a6048 did fail to do is test the backwards-compatible
behaviour. The attached amends that.

Cheers,
//Georgios

> 
> 
> regards
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From 99c2da94ecbeacf997270dd26fc5c0a63ffcedd4 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Thu, 2 Mar 2023 16:10:03 +
Subject: [PATCH vX] Add test for backwards compatible -Z0 option in pg_dump

Commit 5e73a6048 expanded pg_dump with the ability to use compression
specifications. A commonly shared code which lets the user control in an
extended way the method, level, and other details of a desired compression.

Prior to this commit, pg_dump could only accept an integer for the
-Z/--compress option. An integer value of 0 had the special meaning of non
compression. Commit 5e73a6048 respected and maintained this behaviour for
backwards compatibility.

However no tests covered this scenario. The current commit adds coverage for
this case.
---
 src/bin/pg_dump/t/002_pg_dump.pl | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 7b5a6e190c..ec7aaab884 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -76,6 +76,19 @@ my %pgdump_runs = (
 		],
 	},
 
+	# Verify that the backwards compatible option -Z0 produces
+	# non compressed output
+	compression_none_plain => {
+		test_key   => 'compression',
+		# Enforce this test when compile option is available
+		compile_option => 'gzip',
+		dump_cmd   => [
+			'pg_dump',  '--format=plain',
+			'-Z0', "--file=$tempdir/compression_none_plain.sql",
+			'postgres',
+		],
+	},
+
 	# Do not use --no-sync to give test coverage for data sync.
 	compression_gzip_custom => {
 		test_key   => 'compression',
-- 
2.34.1



Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Justin Pryzby
On Wed, Mar 01, 2023 at 01:39:14PM +, gkokola...@pm.me wrote:
> On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby 
>  wrote:
> 
> > The current function order avoids 3 lines of declarations, but it's
> > obviously pretty useful to be able to run that diff command. I already
> > argued for not calling the functions "Gzip" on the grounds that the name
> > was inaccurate.
> 
> I have no idea why we are back on the naming issue. I stand by the name
> because in my humble opinion it helps the code reader. There is a certain
> uniformity when the compression_spec.algorithm and the compressor
> functions match as the following code sample shows.

I mentioned that it's because this allows usefully running "diff"
against the previous commits.

> if (compression_spec.algorithm == PG_COMPRESSION_NONE) 
> InitCompressorNone(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
> InitCompressorGzip(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
> InitCompressorLZ4(cs, compression_spec);
>  
> When the reader wants to see what happens when PG_COMPRESSION_XXX is
> set, they simply have to search for the XXX part. I think that this is
> justification enough for the use of the names.

You're right about that.

But (with the exception of InitCompressorGzip), I'm referring to the
naming of internal functions, static to gzip.c, so renaming can't be
said to cause a loss of clarity.

> > I'd want to create an empty large object in src/test/sql/largeobject.sql
> > to exercise this tested during pgupgrade. But unfortunately that
> > doesn't use -Fc, so this isn't hit. Empty input is an important enough
> > test case to justify a tap test, if there's no better way.
> 
> Please find in the attached a test case that exercises this codepath.

Thanks for writing it.

This patch could be an opportunity to improve the "diff" output, without
renaming anything.

The old order of functions was:
-InitCompressorZlib
-EndCompressorZlib
-DeflateCompressorZlib
-WriteDataToArchiveZlib
-ReadDataFromArchiveZlib

If you put DeflateCompressorEnd immediately after DeflateCompressorInit,
diff works nicely.  You'll have to add at least one declaration, which
seems very worth it.

-- 
Justin




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 2/27/23 05:49, Justin Pryzby wrote:
> On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>>> I have some fixes (attached) and questions while polishing the patch for
>>> zstd compression.  The fixes are small and could be integrated with the
>>> patch for zstd, but could be applied independently.
>>
>> One more - WriteDataToArchiveGzip() says:
> 
> One more again.
> 
> The LZ4 path is using non-streaming mode, which compresses each block
> without persistent state, giving poor compression for -Fc compared with
> -Fp.  If the data is highly compressible, the difference can be orders
> of magnitude.
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
> 12351763
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> 21890708
> 
> That's not true for gzip:
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
> 2118869
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
> 2115832
> 
> The function ought to at least use streaming mode, so each block/row
> isn't compressed in isolation.  003 is a simple patch to use
> streaming mode, which improves the -Fc case:
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
> 15178283
> 
> However, that still flushes the compression buffer, writing a block
> header, for every row.  With a single-column table, pg_dump -Fc -Z lz4
> still outputs ~10% *more* data than with no compression at all.  And
> that's for compressible data.
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
> 12890296
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
> 11890296
> 
> I think this should use the LZ4F API with frames, which are buffered to
> avoid outputting a header for every single row.  The LZ4F format isn't
> compatible with the LZ4 format, so (unlike changing to the streaming
> API) that's not something we can change in a bugfix release.  I consider
> this an Opened Item.
> 
> With the LZ4F API in 004, -Fp and -Fc are essentially the same size
> (like gzip).  (Oh, and the output is three times smaller, too.)
> 
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
> 4155448
> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
> 4156548
> 

Thanks. Those are definitely interesting improvements/optimizations!

I suggest we track them as a separate patch series - please add them to
the CF app (I guess you'll have to add them to 2023-07 at this point,
but we can get them in, I think).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 2/25/23 15:05, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>> I have some fixes (attached) and questions while polishing the patch for
>> zstd compression.  The fixes are small and could be integrated with the
>> patch for zstd, but could be applied independently.
> 
> One more - WriteDataToArchiveGzip() says:
> 
> +   if (cs->compression_spec.level == 0)
> +   pg_fatal("requested to compress the archive yet no level was 
> specified");
> 
> That was added at e9960732a.  
> 
> But if you specify gzip:0, the compression level is already enforced by
> validate_compress_specification(), before hitting gzip.c:
> 
> | pg_dump: error: invalid compression specification: compression algorithm 
> "gzip" expects a compression level between 1 and 9 (default at -1)
> 
> 5e73a6048 intended that to work as before, and you *can* specify -Z0:
> 
> The change is backward-compatible, hence specifying only an integer
> leads to no compression for a level of 0 and gzip compression when the
> level is greater than 0.
> 
> $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
> --compress 0 |file -
> /dev/stdin: ASCII text
> 

FWIW I agree we should make this backwards-compatible - accept "0" and
treat it as no compression.

Georgios, can you prepare a patch doing that?


regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 2/27/23 15:56, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Saturday, February 25th, 2023 at 3:05 PM, Justin Pryzby 
>  wrote:
> 
> 
>>
>>
>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>>
>>> I have some fixes (attached) and questions while polishing the patch for
>>> zstd compression. The fixes are small and could be integrated with the
>>> patch for zstd, but could be applied independently.
> 
> 
> Please find some comments on the rest of the fixes patch that Tomas has not
> commented on.
> 
> can be compressed with the gzip or
> -   lz4tool.
> +   lz4 tools.
> 
> +1
> 
>  The compression method can be set to gzip or
> -lz4 or none for no compression.
> +lz4, or none for no 
> compression.
> 
> I am not a native English speaker. Yet I think that if one adds commas
> in one of the options, then one should add commas to all the options.
> Namely, the above is missing a comma between gzip and lz4. However I
> think that not having any commas still works grammatically and
> syntactically.
> 

I pushed a fix with most of these wording changes. As for this comma, I
believe the correct style is

   a, b, or c

At least that's what the other places in the pg_dump.sgml file do.

> -   ($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
> !$supports_gzip) ||
> -   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
> !$supports_lz4))
> +   (($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
> !$supports_gzip) ||
> +   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
> !$supports_lz4)))
> 

Pushed a fix for this too.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra



On 3/1/23 14:39, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby 
>  wrote:
> 
> 
> 
>> I found that e9960732a broke writing of empty gzip-compressed data,
>> specifically LOs. pg_dump succeeds, but then the restore fails:
>>
>> postgres=# SELECT lo_create(1234);
>> lo_create | 1234
>>
>> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc 
>> |./src/bin/pg_dump/pg_restore -f /dev/null -v
>> pg_restore: implied data-only restore
>> pg_restore: executing BLOB 1234
>> pg_restore: processing BLOBS
>> pg_restore: restoring large object with OID 1234
>> pg_restore: error: could not uncompress data: (null)
>>
> 
> Thank you for looking. This was an untested case.
> 

Yeah :-(

>> The inline patch below fixes it, but you won't be able to apply it
>> directly, as it's on top of other patches which rename the functions
>> back to "Zlib" and rearranges the functions to their original order, to
>> allow running:
>>
>> git diff --diff-algorithm=minimal -w 
>> e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
>>
> 
> Please find a patch attached that can be applied directly.
> 
>> The current function order avoids 3 lines of declarations, but it's
>> obviously pretty useful to be able to run that diff command. I already
>> argued for not calling the functions "Gzip" on the grounds that the name
>> was inaccurate.
> 
> I have no idea why we are back on the naming issue. I stand by the name
> because in my humble opinion it helps the code reader. There is a certain
> uniformity when the compression_spec.algorithm and the compressor
> functions match as the following code sample shows.
> 
> if (compression_spec.algorithm == PG_COMPRESSION_NONE) 
> InitCompressorNone(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
> InitCompressorGzip(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
> InitCompressorLZ4(cs, compression_spec);
>  
> When the reader wants to see what happens when PG_COMPRESSION_XXX is
> set, they simply have to search for the XXX part. I think that this is
> justification enough for the use of the names.
> 

I don't recall the previous discussion about the naming, but I'm not
sure why it would be inaccurate. We call it 'gzip' pretty much
everywhere, and I agree with Georgios that it helps to make this
consistent with the PG_COMPRESSION_ stuff.

The one thing that concerned me while reviewing it earlier was that it
might make the backpatching harder. But that's mostly irrelevant due to
all the other changes I think.

>>
>> I'd want to create an empty large object in src/test/sql/largeobject.sql
>> to exercise this tested during pgupgrade. But unfortunately that
>> doesn't use -Fc, so this isn't hit. Empty input is an important enough
>> test case to justify a tap test, if there's no better way.
> 
> Please find in the attached a test case that exercises this codepath.
> 

Thanks. That seems correct to me, but I find it somewhat confusing,
because we now have

 DeflateCompressorInit vs. InitCompressorGzip

 DeflateCompressorEnd vs. EndCompressorGzip

 DeflateCompressorData - The name doesn't really say what it does (would
 be better to have a verb in there, I think).

I wonder if we can make this somehow clearer?

Also, InitCompressorGzip says this:

   /*
* If the caller has defined a write function, prepare the necessary
* state. Avoid initializing during the first write call, because End
* may be called without ever writing any data.
*/
if (cs->writeF)
DeflateCompressorInit(cs);

Does it actually make sense to not have writeF defined in some cases?
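For what it's worth, the failure mode can be reduced to a few lines outside pg_dump. Below is a Python sketch over zlib (illustrative only, not the pg_dump C code; `compress_deferred` and `compress_eager` are made-up names mirroring the before/after behavior): deferring compressor creation to the first write emits nothing at all for empty input, and decompressing that "stream" fails, while eager initialization always produces a valid stream.

```python
import zlib

def compress_deferred(chunks):
    """Buggy shape: compressor created lazily on the first write."""
    co = None
    out = b""
    for c in chunks:
        if co is None:              # init deferred until the first write
            co = zlib.compressobj()
        out += co.compress(c)
    if co is not None:              # End only finalizes if init ever ran
        out += co.flush()
    return out

def compress_eager(chunks):
    """Fixed shape: compressor created unconditionally up front."""
    co = zlib.compressobj()
    out = b""
    for c in chunks:
        out += co.compress(c)
    return out + co.flush()         # always emits a complete, valid stream

assert compress_deferred([]) == b""                # nothing a reader can decode
assert zlib.decompress(compress_eager([])) == b""  # valid empty stream
```

The eager variant pays a few header/trailer bytes for empty input, but the output is always decompressible, which is exactly what the restore path expects.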


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread Tomas Vondra
On 3/1/23 08:24, Michael Paquier wrote:
> On Tue, Feb 28, 2023 at 05:58:34PM -0600, Justin Pryzby wrote:
>> I found that e9960732a broke writing of empty gzip-compressed data,
>> specifically LOs.  pg_dump succeeds, but then the restore fails:
> 
> The number of issues you have been reporting here begins to worry
> me..  How many of them have you found?  Is it right to assume that all
> of them have basically 03d02f5 as oldest origin point?

AFAICS a lot of the issues are more a discussion about wording in a
couple places, whether it's nicer to do A or B, name the functions
differently or what.

I'm aware of three genuine issues that I intend to fix shortly:

1) incorrect "if" condition in a TAP test

2) failure when compressing empty LO (which we had no test for)

3) change in handling "compression level = 0" (which I believe should be
made to behave like before)


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-03-01 Thread gkokolatos





--- Original Message ---
On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby  
wrote:



> I found that e9960732a broke writing of empty gzip-compressed data,
> specifically LOs. pg_dump succeeds, but then the restore fails:
> 
> postgres=# SELECT lo_create(1234);
> lo_create | 1234
> 
> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc 
> |./src/bin/pg_dump/pg_restore -f /dev/null -v
> pg_restore: implied data-only restore
> pg_restore: executing BLOB 1234
> pg_restore: processing BLOBS
> pg_restore: restoring large object with OID 1234
> pg_restore: error: could not uncompress data: (null)
> 

Thank you for looking. This was an untested case.

> The inline patch below fixes it, but you won't be able to apply it
> directly, as it's on top of other patches which rename the functions
> back to "Zlib" and rearranges the functions to their original order, to
> allow running:
> 
> git diff --diff-algorithm=minimal -w 
> e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
> 

Please find a patch attached that can be applied directly.

> The current function order avoids 3 lines of declarations, but it's
> obviously pretty useful to be able to run that diff command. I already
> argued for not calling the functions "Gzip" on the grounds that the name
> was inaccurate.

I have no idea why we are back on the naming issue. I stand by the name
because, in my humble opinion, it helps the code reader. There is a certain
uniformity when the compression_spec.algorithm and the compressor
functions match as the following code sample shows.

if (compression_spec.algorithm == PG_COMPRESSION_NONE) 
InitCompressorNone(cs, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
InitCompressorGzip(cs, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
InitCompressorLZ4(cs, compression_spec);
 
When the reader wants to see what happens when PG_COMPRESSION_XXX is
set, one simply has to search for the XXX part. I think that this is
justification enough for the use of the names.

> 
> I'd want to create an empty large object in src/test/sql/largeobject.sql
> to exercise this tested during pgupgrade. But unfortunately that
> doesn't use -Fc, so this isn't hit. Empty input is an important enough
> test case to justify a tap test, if there's no better way.

Please find in the attached a test case that exercises this codepath.

Cheers,
//Georgios
From 95450f0e7e90f0a1a3cdfc12c760a9520bd2995f Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Wed, 1 Mar 2023 12:42:32 +
Subject: [PATCH vX] Properly gzip compress when no data is available

When creating dumps with the Compressor API, it is possible to only call the
Allocate and End compressor functions without ever writing any data. The gzip
implementation wrongly assumed that the write function would be called and
deferred the initialization of the internal compression system until the first
write call. The End call would only finalize the internal compression system if
it was ever initialized.

The problem with that approach is that it violated the expectations of the
internal compression system during decompression.

This commit initializes the internal compression system during the Allocate
call, under the condition that a write function was provided by the caller.
Given that decompression does not need to keep track of any state, the
compressor's private_data member is now populated only during compression.

Tests are added to cover this scenario.

Initial patch by Justin Pryzby.

Reported-by: Justin Pryzby
---
 src/bin/pg_dump/compress_gzip.c  | 118 +--
 src/bin/pg_dump/t/002_pg_dump.pl |  27 ++-
 2 files changed, 91 insertions(+), 54 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4..f5d32cf059 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -32,9 +32,48 @@ typedef struct GzipCompressorState
 	size_t		outsize;
 } GzipCompressorState;
 
+
 /* Private routines that support gzip compressed data I/O */
 static void
-DeflateCompressorGzip(ArchiveHandle *AH, CompressorState *cs, bool flush)
+DeflateCompressorInit(CompressorState *cs)
+{
+	GzipCompressorState *gzipcs;
+	z_streamp	zp;
+
+	gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState));
+	zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
+	zp->zalloc = Z_NULL;
+	zp->zfree = Z_NULL;
+	zp->opaque = Z_NULL;
+
+	/*
+	 * outsize is the buffer size we tell zlib it can output to.  We
+	 * actually allocate one extra byte because some routines want to
+	 * append a trailing zero byte to the zlib output.
+	 */
+	gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
+	gzipcs->outsize = ZLIB_OUT_SIZE;
+
+	/*
+	 * A level of zero simply copies the input one block at the time. This
+	 * is probably 

Re: Add LZ4 compression in pg_dump

2023-02-28 Thread Michael Paquier
On Tue, Feb 28, 2023 at 05:58:34PM -0600, Justin Pryzby wrote:
> I found that e9960732a broke writing of empty gzip-compressed data,
> specifically LOs.  pg_dump succeeds, but then the restore fails:

The number of issues you have been reporting here begins to worry
me..  How many of them have you found?  Is it right to assume that all
of them have basically 03d02f5 as oldest origin point?
--
Michael


signature.asc
Description: PGP signature


Re: Add LZ4 compression in pg_dump

2023-02-28 Thread Justin Pryzby
On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> On 2/23/23 16:26, Tomas Vondra wrote:
> > Thanks for v30 with the updated commit messages. I've pushed 0001 after
> > fixing a comment typo and removing (I think) an unnecessary change in an
> > error message.
> > 
> > I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> > 
> 
> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> and marked the CF entry as committed. Thanks for the patch!

I found that e9960732a broke writing of empty gzip-compressed data,
specifically LOs.  pg_dump succeeds, but then the restore fails:

postgres=# SELECT lo_create(1234);
lo_create | 1234

$ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc 
|./src/bin/pg_dump/pg_restore -f /dev/null -v 
pg_restore: implied data-only restore
pg_restore: executing BLOB 1234
pg_restore: processing BLOBS
pg_restore: restoring large object with OID 1234
pg_restore: error: could not uncompress data: (null)

The inline patch below fixes it, but you won't be able to apply it
directly, as it's on top of other patches which rename the functions
back to "Zlib" and rearranges the functions to their original order, to
allow running:

git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c 
./src/bin/pg_dump/compress_gzip.c

The current function order avoids 3 lines of declarations, but it's
obviously pretty useful to be able to run that diff command.  I already
argued for not calling the functions "Gzip" on the grounds that the name
was inaccurate.

I'd want to create an empty large object in src/test/sql/largeobject.sql
to exercise this tested during pgupgrade.  But unfortunately that
doesn't use -Fc, so this isn't hit.  Empty input is an important enough
test case to justify a tap test, if there's no better way.

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index f3f5e87c9a8..68f3111b2fe 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -55,6 +55,32 @@ InitCompressorZlib(CompressorState *cs,
gzipcs = (ZlibCompressorState *) 
pg_malloc0(sizeof(ZlibCompressorState));
 
cs->private_data = gzipcs;
+
+   if (cs->writeF)
+   {
+   z_streamp   zp;
+   zp = gzipcs->zp = (z_streamp) pg_malloc0(sizeof(z_stream));
+   zp->zalloc = Z_NULL;
+   zp->zfree = Z_NULL;
+   zp->opaque = Z_NULL;
+
+   /*
+* outsize is the buffer size we tell zlib it can output to.  We
+* actually allocate one extra byte because some routines want 
to append a
+* trailing zero byte to the zlib output.
+*/
+
+   gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
+   gzipcs->outsize = ZLIB_OUT_SIZE;
+
+   if (deflateInit(gzipcs->zp, cs->compression_spec.level) != Z_OK)
+   pg_fatal("could not initialize compression library: %s",
+   zp->msg);
+
+   /* Just be paranoid - maybe End is called after Start, with no 
Write */
+   zp->next_out = gzipcs->outbuf;
+   zp->avail_out = gzipcs->outsize;
+   }
 }
 
 static void
@@ -63,7 +89,7 @@ EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs)
ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
z_streamp   zp;
 
-   if (gzipcs->zp)
+   if (cs->writeF != NULL)
{
zp = gzipcs->zp;
zp->next_in = NULL;
@@ -131,29 +157,6 @@ WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState 
*cs,
   const void *data, size_t dLen)
 {
ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
-   z_streamp   zp;
-
-   if (!gzipcs->zp)
-   {
-   zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
-   zp->zalloc = Z_NULL;
-   zp->zfree = Z_NULL;
-   zp->opaque = Z_NULL;
-
-   /*
-* outsize is the buffer size we tell zlib it can output to.  We
-* actually allocate one extra byte because some routines want 
to
-* append a trailing zero byte to the zlib output.
-*/
-   gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
-   gzipcs->outsize = ZLIB_OUT_SIZE;
-
-   if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
-   pg_fatal("could not initialize compression library: 
%s", zp->msg);
-
-   zp->next_out = gzipcs->outbuf;
-   zp->avail_out = gzipcs->outsize;
-   }
 
gzipcs->zp->next_in = (void *) unconstify(void *, data);
gzipcs->zp->avail_in = dLen;
From 1c707279596f3cffde9c97b450dcbef0b6ddbd94 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 26 Feb 2023 17:12:03 -0600
Subject: 

Re: Add LZ4 compression in pg_dump

2023-02-27 Thread gkokolatos






--- Original Message ---
On Saturday, February 25th, 2023 at 3:05 PM, Justin Pryzby 
 wrote:


> 
> 
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> 
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.


Please find some comments on the rest of the fixes patch that Tomas has not
commented on.

can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.

+1

 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.

I am not a native English speaker. Yet I think that if one adds commas
in one of the options, then one should add commas to all the options.
Namely, the above is missing a comma between gzip and lz4. However, I
think that not having any commas still works grammatically and
syntactically.

-   /*
-* A level of zero simply copies the input one block at the 
time. This
-* is probably not what the user wanted when calling this 
interface.
-*/
-   if (cs->compression_spec.level == 0)
-   pg_fatal("requested to compress the archive yet no 
level was specified");


I disagree with this change. WriteDataToArchiveGzip() is far away from
whatever the code in pg_dump.c is doing. Any invalid value for
level will emit an error when the proper gzip/zlib code is
called. A zero value, however, will not emit such an error. Having the
extra check there is a future-proof guarantee at a very low cost.
Furthermore, it quickly informs the reader of the code about that
specific value, helping with readability and comprehension.

If any change is required, something I vote strongly against, I
would at least recommend replacing it with an assertion.

- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infer the compression algorithm

:+1:

-   # Skip command-level tests for gzip if there is no support for it.
+   # Skip command-level tests for gzip/lz4 if they're not supported.

We will be back at that again soon. Maybe change to:

Skip command-level tests for unsupported compression methods

to include everything.


-   ($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
!$supports_gzip) ||
-   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
!$supports_lz4))
+   (($pgdump_runs{$run}->{compile_option} eq 'gzip' && 
!$supports_gzip) ||
+   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && 
!$supports_lz4)))

Good catch, :+1:

Cheers,
//Georgios

> --
> Justin




Re: Add LZ4 compression in pg_dump

2023-02-27 Thread gkokolatos






--- Original Message ---
On Sunday, February 26th, 2023 at 3:59 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 2/25/23 06:02, Justin Pryzby wrote:
> 
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
> > 
> > - I'm unclear about get_error_func(). That's called in three places
> > from pg_backup_directory.c, after failures from write_func(), to
> > supply a compression-specific error message to pg_fatal(). But it's
> > not being used outside of directory format, nor for errors for other
> > function pointers, or even for all errors in write_func(). Is there
> > some reason why each compression method's write_func() shouldn't call
> > pg_fatal() directly, with its compression-specific message ?
> 
> 
> I think there are a couple more places that might/should call
> get_error_func(). For example ahwrite() in pg_backup_archiver.c now
> simply does
> 
> if (bytes_written != size * nmemb)
> WRITE_ERROR_EXIT;
> 
> but perhaps it should call get_error_func() too. There are probably
> other places that call write_func() and should use get_error_func().

Agreed, calling get_error_func() would be preferable to a fatal error. It
should be the caller of the API who decides how to proceed.

> 
> > - I still think supports_compression() should be renamed, or made into a
> > static function in the necessary file. The main reason is that it's
> > more clear what it indicates - whether compression is "implemented by
> > pgdump" and not whether compression is "supported by this postgres
> > build". It also seems possible that we'd want to add a function
> > called something like supports_compression(), indicating whether the
> > algorithm is supported by the current build. It'd be better if pgdump
> > didn't usurp that name.
> 
> 
> If we choose to rename this to have pgdump_ prefix, fine with me. But I
> don't think there's a realistic chance of conflict, as it's restricted
> to pgdump header etc. And it's not part of an API, so I guess we could
> rename that in the future if needed.

Agreed, it is very unrealistic that one will include that header file anywhere
but within pg_dump. Also, I think that adding a prefix, "pgdump", "pg_dump",
or similar does not add value and hurts readability.

> 
> > - Finally, the "Nothing to do in the default case" comment comes from
> > Michael's commit 5e73a6048:
> > 
> > + /*
> > + * Custom and directory formats are compressed by default with gzip when
> > + * available, not the others.
> > + */
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > {
> > #ifdef HAVE_LIBZ
> > - if (archiveFormat == archCustom || archiveFormat == archDirectory)
> > - compressLevel = Z_DEFAULT_COMPRESSION;
> > - else
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + &compression_spec);
> > +#else
> > + /* Nothing to do in the default case */
> > #endif
> > - compressLevel = 0;
> > }
> > 
> > As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> > enabled, and when not otherwise specified by the user.
> > 
> > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, and when
> > zlib was unavailable.
> > 
> > But I'm not sure why there's now an empty "#else". I also don't know
> > what "the default case" refers to.
> > 
> > Maybe the best thing here is to move the preprocessor #if, since it's no
> > longer in the middle of a runtime conditional:
> > 
> > #ifdef HAVE_LIBZ
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + _spec);
> > #endif
> > 
> > ...but that elicits a warning about "variable set but not used"...
> 
> 
> Not sure, I need to think about this a bit.

Not having warnings is preferable, isn't it? I can understand the confusion
on the message though. Maybe a phrasing like:
/* Nothing to do for the default case when LIBZ is not available */
is easier to understand.

Cheers,
//Georgios

> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-02-26 Thread Justin Pryzby
On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression.  The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
> 
> One more - WriteDataToArchiveGzip() says:

One more again.

The LZ4 path is using non-streaming mode, which compresses each block
without persistent state, giving poor compression for -Fc compared with
-Fp.  If the data is highly compressible, the difference can be orders
of magnitude.

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
12351763
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
21890708

That's not true for gzip:

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
2118869
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
2115832

The function ought to at least use streaming mode, so each block/row
isn't compressed in isolation.  003 is a simple patch to use
streaming mode, which improves the -Fc case:

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
15178283

However, that still flushes the compression buffer, writing a block
header, for every row.  With a single-column table, pg_dump -Fc -Z lz4
still outputs ~10% *more* data than with no compression at all.  And
that's for compressible data.

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
12890296
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
11890296

I think this should use the LZ4F API with frames, which are buffered to
avoid outputting a header for every single row.  The LZ4F format isn't
compatible with the LZ4 format, so (unlike changing to the streaming
API) that's not something we can change in a bugfix release.  I consider
this an Opened Item.

With the LZ4F API in 004, -Fp and -Fc are essentially the same size
(like gzip).  (Oh, and the output is three times smaller, too.)

$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
4155448
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
4156548
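The benefit of carrying compressor state across rows is not LZ4-specific; any streaming compressor shows the same gap. A short Python sketch with zlib (illustrative only, not the pg_dump code; the row data is made up) contrasts per-block compression, where each write is compressed with fresh state, against a single persistent stream:

```python
import zlib

# 2000 short, mutually similar "rows", as a dump of a narrow table might yield.
rows = [b"row-%06d: some repetitive payload\n" % i for i in range(2000)]

# Per-block: fresh compressor state for every row, analogous to calling
# LZ4_compress_default() once per write with no shared history.
per_block = sum(len(zlib.compress(r)) for r in rows)

# Streaming: one compressor carries its history across all rows,
# analogous to the LZ4 streaming / LZ4F frame APIs.
co = zlib.compressobj()
streamed = sum(len(co.compress(r)) for r in rows) + len(co.flush())

assert streamed < per_block   # shared history wins by a wide margin
```

With per-block compression, each tiny row pays the full header/trailer overhead and can never reference earlier data, which matches the -Fc blow-up shown above.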

-- 
Justin
From 3a980f956bf314fb161fbf0a76f62ed0c2c35bfe Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH 1/4] f!fixes for LZ4

---
 src/bin/pg_dump/compress_gzip.c |  8 
 src/bin/pg_dump/compress_io.h   |  2 +-
 src/bin/pg_dump/compress_lz4.c  | 12 
 src/bin/pg_dump/pg_dump.c   |  2 --
 4 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..52f41c2e58c 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -123,17 +123,9 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
 		gzipcs->outsize = ZLIB_OUT_SIZE;
 
-		/*
-		 * A level of zero simply copies the input one block at the time. This
-		 * is probably not what the user wanted when calling this interface.
-		 */
-		if (cs->compression_spec.level == 0)
-			pg_fatal("requested to compress the archive yet no level was specified");
-
 		if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
 			pg_fatal("could not initialize compression library: %s", zp->msg);
 
-		/* Just be paranoid - maybe End is called after Start, with no Write */
 		zp->next_out = gzipcs->outbuf;
 		zp->avail_out = gzipcs->outsize;
 	}
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index bbde2693915..cdb15951ea9 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -172,7 +172,7 @@ struct CompressFileHandle
 extern CompressFileHandle *InitCompressFileHandle(const pg_compress_specification compression_spec);
 
 /*
- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infer the compression algorithm
  * from 'path', either by examining its suffix or by appending the supported
  * suffixes in 'path'.
  */
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index fe1014e6e77..7dacfeae469 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -105,12 +105,8 @@ EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs)
 	LZ4CompressorState *LZ4cs;
 
 	LZ4cs = (LZ4CompressorState *) cs->private_data;
-	if (LZ4cs)
-	{
-		pg_free(LZ4cs->outbuf);
-		pg_free(LZ4cs);
-		cs->private_data = NULL;
-	}
+	pg_free(LZ4cs->outbuf);
+	pg_free(LZ4cs);
 }
 
 
@@ -161,8 +157,8 @@ typedef struct LZ4File
 } LZ4File;
 
 /*
- * LZ4 equivalent to feof() or gzeof(). The end of file is reached if there
- * is no decompressed output in the overflow buffer and the end of the file
+ * LZ4 equivalent to feof() or gzeof().  Return true iff there is no
+ * decompressed output in the overflow buffer and the end of the backing file
  * 

Re: Add LZ4 compression in pg_dump

2023-02-26 Thread Tomas Vondra



On 2/25/23 06:02, Justin Pryzby wrote:
> I have some fixes (attached) and questions while polishing the patch for
> zstd compression.  The fixes are small and could be integrated with the
> patch for zstd, but could be applied independently.
> 
> - I'm unclear about get_error_func().  That's called in three places
>   from pg_backup_directory.c, after failures from write_func(), to
>   supply a compression-specific error message to pg_fatal().  But it's
>   not being used outside of directory format, nor for errors for other
>   function pointers, or even for all errors in write_func().  Is there
>   some reason why each compression method's write_func() shouldn't call
>   pg_fatal() directly, with its compression-specific message ?
> 

I think there are a couple more places that might/should call
get_error_func(). For example ahwrite() in pg_backup_archiver.c now
simply does

if (bytes_written != size * nmemb)
WRITE_ERROR_EXIT;

but perhaps it should call get_error_func() too. There are probably
other places that call write_func() and should use get_error_func().

> - I still think supports_compression() should be renamed, or made into a
>   static function in the necessary file.  The main reason is that it's
>   more clear what it indicates - whether compression is "implemented by
>   pgdump" and not whether compression is "supported by this postgres
>   build".  It also seems possible that we'd want to add a function
>   called something like supports_compression(), indicating whether the
>   algorithm is supported by the current build.  It'd be better if pgdump
>   didn't subjugate that name.
> 

If we choose to rename this to have pgdump_ prefix, fine with me. But I
don't think there's a realistic chance of conflict, as it's restricted
to pgdump header etc. And it's not part of an API, so I guess we could
rename that in the future if needed.

> - Finally, the "Nothing to do in the default case" comment comes from
>   Michael's commit 5e73a6048:
> 
> +   /*
> +* Custom and directory formats are compressed by default with gzip 
> when
> +* available, not the others.
> +*/
> +   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> +   !user_compression_defined)
> {
>  #ifdef HAVE_LIBZ
> -   if (archiveFormat == archCustom || archiveFormat == 
> archDirectory)
> -   compressLevel = Z_DEFAULT_COMPRESSION;
> -   else
> +   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> +&compression_spec);
> +#else
> +   /* Nothing to do in the default case */
>  #endif
> -   compressLevel = 0;
> }
> 
> 
> As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> enabled, and when not otherwise specified by the user.
> 
> Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, *and* when
> zlib was unavailable.
> 
> But I'm not sure why there's now an empty "#else".  I also don't know
> what "the default case" refers to.
> 
> Maybe the best thing here is to move the preprocessor #if, since it's no
> longer in the middle of a runtime conditional:
> 
>  #ifdef HAVE_LIBZ
> +   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> +   !user_compression_defined)
> +   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> +&compression_spec);
>  #endif
> 
> ...but that elicits a warning about "variable set but not used"...
> 

Not sure, I need to think about this a bit.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-02-25 Thread Justin Pryzby
On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> I have some fixes (attached) and questions while polishing the patch for
> zstd compression.  The fixes are small and could be integrated with the
> patch for zstd, but could be applied independently.

One more - WriteDataToArchiveGzip() says:

+   if (cs->compression_spec.level == 0)
+   pg_fatal("requested to compress the archive yet no level was 
specified");

That was added at e9960732a.  

But if you specify gzip:0, the compression level is already enforced by
validate_compress_specification(), before hitting gzip.c:

| pg_dump: error: invalid compression specification: compression algorithm 
"gzip" expects a compression level between 1 and 9 (default at -1)

5e73a6048 intended that to work as before, and you *can* specify -Z0:

The change is backward-compatible, hence specifying only an integer
leads to no compression for a level of 0 and gzip compression when the
level is greater than 0.

$ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp 
--compress 0 |file -
/dev/stdin: ASCII text

Right now, I think that pg_fatal in gzip.c is dead code - that was first
added in the patch version sent on 21 Dec 2022.
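For reference, zlib itself accepts level 0 and simply emits stored (uncompressed) deflate blocks, which is what the removed comment in WriteDataToArchiveGzip() alluded to. A quick Python check (illustrative only, not the pg_dump code):

```python
import zlib

data = b"x" * 1000
stored = zlib.compress(data, 0)   # level 0: "stored" blocks, no compression
assert len(stored) > len(data)    # only header/trailer overhead, no size win
assert zlib.decompress(stored) == data
```

So a level of 0 reaching the gzip writer would not error inside zlib; it would just copy the input, slightly enlarged, which is why the check was only ever a belt-and-suspenders guard.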

-- 
Justin
From 07b446803ec89ccd0954583640f314fa7f77048f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH] f!fixes for LZ4

---
 doc/src/sgml/ref/pg_dump.sgml| 4 ++--
 src/bin/pg_dump/compress_gzip.c  | 7 ---
 src/bin/pg_dump/compress_io.c| 2 +-
 src/bin/pg_dump/compress_io.h| 4 ++--
 src/bin/pg_dump/compress_lz4.c   | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 4 ++--
 src/bin/pg_dump/pg_dump.c| 2 --
 src/bin/pg_dump/t/002_pg_dump.pl | 6 +++---
 8 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905fb..6fbe49f7ede 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -331,7 +331,7 @@ PostgreSQL documentation
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.
This format is compressed by default using gzip
and also supports parallel dumps.
   
@@ -655,7 +655,7 @@ PostgreSQL documentation

 Specify the compression method and/or the compression level to use.
 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.
 A compression detail string can optionally be specified.  If the
 detail string is an integer, it specifies the compression level.
 Otherwise, it should be a comma-separated list of items, each of the
diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4e..af68fd27a12 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -123,13 +123,6 @@ WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs,
 		gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
 		gzipcs->outsize = ZLIB_OUT_SIZE;
 
-		/*
-		 * A level of zero simply copies the input one block at the time. This
-		 * is probably not what the user wanted when calling this interface.
-		 */
-		if (cs->compression_spec.level == 0)
-			pg_fatal("requested to compress the archive yet no level was specified");
-
 		if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
 			pg_fatal("could not initialize compression library: %s", zp->msg);
 
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index ce06f1eac9c..9239dbb2755 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -83,7 +83,7 @@
  * used by the caller in an error message.
  */
 char *
-supports_compression(const pg_compress_specification compression_spec)
+pgdump_supports_compression(const pg_compress_specification compression_spec)
 {
 	const pg_compress_algorithm	algorithm = compression_spec.algorithm;
 	bool		supported = false;
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index bbde2693915..46815fa2ebe 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -21,7 +21,7 @@
 #define ZLIB_OUT_SIZE	4096
 #define ZLIB_IN_SIZE	4096
 
-extern char *supports_compression(const pg_compress_specification compression_spec);
+extern char *pgdump_supports_compression(const pg_compress_specification compression_spec);
 
 /*
  * Prototype for callback function used in writeData()
@@ -172,7 +172,7 @@ struct CompressFileHandle
 extern CompressFileHandle *InitCompressFileHandle(const pg_compress_specification compression_spec);
 
 /*
- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. 

Re: Add LZ4 compression in pg_dump

2023-02-24 Thread Justin Pryzby
I have some fixes (attached) and questions while polishing the patch for
zstd compression.  The fixes are small and could be integrated with the
patch for zstd, but could be applied independently.

- I'm unclear about get_error_func().  That's called in three places
  from pg_backup_directory.c, after failures from write_func(), to
  supply a compression-specific error message to pg_fatal().  But it's
  not being used outside of directory format, nor for errors for other
  function pointers, or even for all errors in write_func().  Is there
  some reason why each compression method's write_func() shouldn't call
  pg_fatal() directly, with its compression-specific message ?

- I still think supports_compression() should be renamed, or made into a
  static function in the necessary file.  The main reason is that it's
  more clear what it indicates - whether compression is "implemented by
  pgdump" and not whether compression is "supported by this postgres
  build".  It also seems possible that we'd want to add a function
  called something like supports_compression(), indicating whether the
  algorithm is supported by the current build.  It'd be better if pgdump
  didn't usurp that name.

- Finally, the "Nothing to do in the default case" comment comes from
  Michael's commit 5e73a6048:

+   /*
+* Custom and directory formats are compressed by default with gzip when
+* available, not the others.
+*/
+   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
+   !user_compression_defined)
{
 #ifdef HAVE_LIBZ
-   if (archiveFormat == archCustom || archiveFormat == archDirectory)
-   compressLevel = Z_DEFAULT_COMPRESSION;
-   else
+   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
+_spec);
+#else
+   /* Nothing to do in the default case */
 #endif
-   compressLevel = 0;
}


As the comment says: for -Fc and -Fd, compression defaults to zlib, when
available and not otherwise specified by the user.

Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, *and* when
zlib was unavailable.

But I'm not sure why there's now an empty "#else".  I also don't know
what "the default case" refers to.

Maybe the best thing here is to move the preprocessor #if, since it's no
longer in the middle of a runtime conditional:

 #ifdef HAVE_LIBZ
+   if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
+   !user_compression_defined)
+   parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
+_spec);
 #endif

...but that elicits a warning about "variable set but not used"...

-- 
Justin
From e31901414a8509317297972d1033c2e08629d903 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Feb 2023 14:07:09 -0600
Subject: [PATCH] f!fixes for LZ4

---
 doc/src/sgml/ref/pg_dump.sgml| 4 ++--
 src/bin/pg_dump/compress_io.c| 2 +-
 src/bin/pg_dump/compress_io.h| 4 ++--
 src/bin/pg_dump/compress_lz4.c   | 4 ++--
 src/bin/pg_dump/pg_backup_archiver.c | 4 ++--
 src/bin/pg_dump/pg_dump.c| 2 --
 src/bin/pg_dump/t/002_pg_dump.pl | 6 +++---
 7 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905fb..6fbe49f7ede 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -331,7 +331,7 @@ PostgreSQL documentation
can read. A directory format archive can be manipulated with
standard Unix tools; for example, files in an uncompressed archive
can be compressed with the gzip or
-   lz4tool.
+   lz4 tools.
This format is compressed by default using gzip
and also supports parallel dumps.
   
@@ -655,7 +655,7 @@ PostgreSQL documentation

 Specify the compression method and/or the compression level to use.
 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.
 A compression detail string can optionally be specified.  If the
 detail string is an integer, it specifies the compression level.
 Otherwise, it should be a comma-separated list of items, each of the
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index ce06f1eac9c..9239dbb2755 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -83,7 +83,7 @@
  * used by the caller in an error message.
  */
 char *
-supports_compression(const pg_compress_specification compression_spec)
+pgdump_supports_compression(const pg_compress_specification compression_spec)
 {
 	const pg_compress_algorithm	algorithm = compression_spec.algorithm;
 	bool		supported = 

Re: Add LZ4 compression in pg_dump

2023-02-24 Thread gkokolatos

--- Original Message ---
On Friday, February 24th, 2023 at 5:35 AM, Michael Paquier wrote:


> 
> 
> On Thu, Feb 23, 2023 at 07:51:16PM -0600, Justin Pryzby wrote:
> 
> > On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> > 
> > > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> > > and marked the CF entry as committed. Thanks for the patch!
> > 
> > A big thanks from me to everyone involved.
> 
> 
> Wow, nice! The APIs are clear to follow.

I am out of words, thank you all so very much. I learned a lot. 

> 
> > I'll send a patch soon. I first submitted patches for that 2 years ago
> > (before PGDG was ready to add zstd).
> > https://commitfest.postgresql.org/31/2888/
> 
> 
> Thanks. It should be straight-forward to see that in 16, I guess.
> --
> Michael




Re: Add LZ4 compression in pg_dump

2023-02-23 Thread Michael Paquier
On Thu, Feb 23, 2023 at 07:51:16PM -0600, Justin Pryzby wrote:
> On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
>> and marked the CF entry as committed. Thanks for the patch!
> 
> A big thanks from me to everyone involved.

Wow, nice!  The APIs are clear to follow.

> I'll send a patch soon.  I first submitted patches for that 2 years ago
> (before PGDG was ready to add zstd).
> https://commitfest.postgresql.org/31/2888/

Thanks.  It should be straight-forward to see that in 16, I guess.
--
Michael




Re: Add LZ4 compression in pg_dump

2023-02-23 Thread Justin Pryzby
On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> and marked the CF entry as committed. Thanks for the patch!

A big thanks from me to everyone involved.

> I wonder how difficult would it be to add the zstd compression, so that
> we don't have the annoying "unsupported" cases.

I'll send a patch soon.  I first submitted patches for that 2 years ago
(before PGDG was ready to add zstd).
https://commitfest.postgresql.org/31/2888/

-- 
Justin



