Re: refactoring basebackup.c

2023-04-12 Thread Robert Haas
On Wed, Apr 12, 2023 at 10:57 AM Justin Pryzby  wrote:
> I think these maybe got forgotten ?

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2023-04-12 Thread Justin Pryzby
On Fri, Mar 24, 2023 at 10:46:37AM -0400, Robert Haas wrote:
> On Thu, Mar 23, 2023 at 4:11 PM Robert Haas  wrote:
> > On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro  
> > wrote:
> > > > BaseBackupSync is not documented
> > > > BaseBackupWrite is not documented
> > >
> > > [Resending with trimmed CC: list, because the mailing list told me to
> > > due to a blocked account, sorry if you already got the above.]
> >
> > Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me 
> > to it.
> 
> Here's a patch for that, and a patch to add the missing error check
> Peter noticed.

I think these maybe got forgotten ?




Re: refactoring basebackup.c

2023-03-24 Thread Robert Haas
On Thu, Mar 23, 2023 at 4:11 PM Robert Haas  wrote:
> On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro  wrote:
> > > BaseBackupSync is not documented
> > > BaseBackupWrite is not documented
> >
> > [Resending with trimmed CC: list, because the mailing list told me to
> > due to a blocked account, sorry if you already got the above.]
>
> Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to 
> it.

Here's a patch for that, and a patch to add the missing error check
Peter noticed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-Add-missing-documentation-entries-for-new-base-backu.patch
Description: Binary data


0002-basebackup_to_shell-Add-missing-error-check.patch
Description: Binary data


Re: refactoring basebackup.c

2023-03-23 Thread Robert Haas
On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro  wrote:
> > BaseBackupSync is not documented
> > BaseBackupWrite is not documented
>
> [Resending with trimmed CC: list, because the mailing list told me to
> due to a blocked account, sorry if you already got the above.]

Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2023-03-22 Thread Thomas Munro
On Thu, Mar 23, 2023 at 2:50 PM Thomas Munro  wrote:
> In rem: commit 3500ccc3,
>
> for X in ` grep -E '^[^*]+event_name = "'
> src/backend/utils/activity/wait_event.c |
>sed 's/^.* = "//;s/";$//;/unknown/d' `
> do
>   if ! git grep "$X" doc/src/sgml/monitoring.sgml > /dev/null
>   then
> echo "$X is not documented"
>   fi
> done
>
> BaseBackupSync is not documented
> BaseBackupWrite is not documented

[Resending with trimmed CC: list, because the mailing list told me to
due to a blocked account, sorry if you already got the above.]




Re: refactoring basebackup.c (zstd workers)

2022-03-30 Thread Justin Pryzby
On Wed, Mar 30, 2022 at 04:14:47PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> >> Maybe if I just put that last sentence into the comment it's clear enough?
> 
> > Done that way, since I thought it was better to fix the bug than wait
> > for more feedback on the wording. We can still adjust the wording, or
> > the coding, if it's not clear enough.
> 
> FWIW, I thought that explanation was fine, but I was deferring to
> Justin who was the one who thought things were unclear.

I still think it's unnecessarily confusing to nest "if" and "?:" conditionals
in one statement, instead of 2 or 3 separate "if"s, or "||"s.
But it's also not worth fussing over any more.




Re: refactoring basebackup.c (zstd workers)

2022-03-30 Thread Tom Lane
Robert Haas  writes:
>> Maybe if I just put that last sentence into the comment it's clear enough?

> Done that way, since I thought it was better to fix the bug than wait
> for more feedback on the wording. We can still adjust the wording, or
> the coding, if it's not clear enough.

FWIW, I thought that explanation was fine, but I was deferring to
Justin who was the one who thought things were unclear.

regards, tom lane




Re: refactoring basebackup.c (zstd workers)

2022-03-30 Thread Robert Haas
On Tue, Mar 29, 2022 at 8:51 AM Robert Haas  wrote:
> On Mon, Mar 28, 2022 at 8:11 PM Tom Lane  wrote:
> > I suspect Robert wrote it that way intentionally --- but if so,
> > I agree it could do with more than zero commentary.
>
> Well, the point is, we stop advancing kwend when we get to the end of
> the keyword, and *vend when we get to the end of the value. If there's
> a value, the end of the keyword can't have been the end of the string,
> but the end of the value might have been. If there's no value, the end
> of the keyword could be the end of the string.
>
> Maybe if I just put that last sentence into the comment it's clear enough?

Done that way, since I thought it was better to fix the bug than wait
for more feedback on the wording. We can still adjust the wording, or
the coding, if it's not clear enough.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-29 Thread Robert Haas
On Mon, Mar 28, 2022 at 8:11 PM Tom Lane  wrote:
> I suspect Robert wrote it that way intentionally --- but if so,
> I agree it could do with more than zero commentary.

Well, the point is, we stop advancing kwend when we get to the end of
the keyword, and *vend when we get to the end of the value. If there's
a value, the end of the keyword can't have been the end of the string,
but the end of the value might have been. If there's no value, the end
of the keyword could be the end of the string.

Maybe if I just put that last sentence into the comment it's clear enough?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-28 Thread Tom Lane
Justin Pryzby  writes:
> Also, why wouldn't *kwend be checked in any case ?

I suspect Robert wrote it that way intentionally --- but if so,
I agree it could do with more than zero commentary.

regards, tom lane




Re: refactoring basebackup.c (zstd workers)

2022-03-28 Thread Justin Pryzby
On Mon, Mar 28, 2022 at 05:39:31PM -0400, Robert Haas wrote:
> On Mon, Mar 28, 2022 at 4:53 PM Justin Pryzby  wrote:
> > I suggest to write it differently, as in 0002.
> 
> That doesn't seem better to me. What's the argument for it?

I find this much easier to understand:

/* If we got an error or have reached the end of the string, 
stop. */
  
-   if (result->parse_error != NULL || *kwend == '\0' || *vend == 
'\0')   
 
+   if (result->parse_error != NULL)

   
+   break;  

   
+   if (*kwend == '\0') 

   
+   break;  

   
+   if (vend != NULL && *vend == '\0')  

   
break;  

   

than

/* If we got an error or have reached the end of the string, 
stop. */
-   if (result->parse_error != NULL || *kwend == '\0' || *vend == 
'\0')
+   if (result->parse_error != NULL ||
+   (vend == NULL ? *kwend == '\0' : *vend == '\0'))

Also, why wouldn't *kwend be checked in any case ?




Re: refactoring basebackup.c (zstd workers)

2022-03-28 Thread Robert Haas
On Mon, Mar 28, 2022 at 4:53 PM Justin Pryzby  wrote:
> I suggest to write it differently, as in 0002.

That doesn't seem better to me. What's the argument for it?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-28 Thread Justin Pryzby
On Mon, Mar 28, 2022 at 03:50:50PM -0400, Robert Haas wrote:
> On Sun, Mar 27, 2022 at 1:47 PM Tom Lane  wrote:
> > Coverity has a nitpick about this:
> >
> > /srv/coverity/git/pgsql-git/postgresql/src/common/backup_compression.c: 194 
> > in parse_bc_specification()
> > 193 /* Advance to next entry and loop around. */
> > >>> CID 1503251:  Null pointer dereferences  (REVERSE_INULL)
> > >>> Null-checking "vend" suggests that it may be null, but it has 
> > >>> already been dereferenced on all paths leading to the check.
> > 194 specification = vend == NULL ? kwend + 1 : vend + 1;
> > 195 }
> > 196 }
> >
> > Not sure if you should remove this null-check or add some other ones,
> > but I think you ought to do one or the other.
> 
> Yes, I think this is buggy.  I think there's only a theoretical bug
> right now, because the only keyword we have is "level" and that
> requires a value. But if I add an example keyword that does not
> require an associated value (as demonstrated in the attached patch)
> and do something like pg_basebackup -cfast -D whatever --compress
> lz4:example, then the present code will dereference "vend" even though
> it's NULL, which is not good. The attached patch also shows how I
> think that should be fixed.
> 
> As I hope is apparent, the first hunk of this patch is not for commit,
> and the second hunk is for commit.

Confirmed that it's a real issue with my patch for zstd long match mode.  But
you need to specify another option after the value-less flag option for it to
crash.

I suggest to write it differently, as in 0002.

This also fixes some rebase-induced errors with my previous patches, and adds
expect_boolean().
>From 9bedbfc6bfa471473a8b3479ffd1888d5da285ab Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Mon, 28 Mar 2022 13:25:44 -0400
Subject: [PATCH 1/4] Allow parallel zstd compression when taking a base
 backup.

libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.

When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.

Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.
---
 doc/src/sgml/protocol.sgml| 12 +++--
 doc/src/sgml/ref/pg_basebackup.sgml   |  4 +-
 src/backend/replication/basebackup_zstd.c | 45 ---
 src/bin/pg_basebackup/bbstreamer_zstd.c   | 40 -
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |  5 +++
 src/bin/pg_verifybackup/t/009_extract.pl  | 29 +++-
 src/bin/pg_verifybackup/t/010_client_untar.pl | 33 --
 src/common/backup_compression.c   | 16 +++
 src/include/common/backup_compression.h   |  2 +
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  3 +-
 10 files changed, 148 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 2fa3cedfe9e..98f0bc3cc34 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2739,17 +2739,23 @@ The commands accepted in replication mode are:
   option.  If the value is an integer, it specifies the compression
   level.  Otherwise, it should be a comma-separated list of items,
   each of the form keyword or
-  keyword=value. Currently, the only supported
-  keyword is level, which sets the compression
-  level.
+  keyword=value. Currently, the supported keywords
+  are level and workers.
 
 
 
+  The level keyword sets the compression level.
   For gzip the compression level should be an
   integer between 1 and 9, for lz4 an integer
   between 1 and 12, and for zstd an integer
   between 1 and 22.
  
+
+
+  The workers keyword sets the number of threads
+  that should be used for parallel compression. Parallel compression
+  is supported only for zstd.
+ 
 

 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index d9233beb8e1..82f5f606250 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -424,8 +424,8 @@ PostgreSQL documentation
 integer, it specifies the compression level.  Otherwise, it should be
 a comma-separated list of items, each of the form
 keyword or keyword=value.
-Currently, the only supported keyword is level,
-which sets the compression 

Re: refactoring basebackup.c (zstd workers)

2022-03-28 Thread Tom Lane
Robert Haas  writes:
> On Sun, Mar 27, 2022 at 1:47 PM Tom Lane  wrote:
>> Not sure if you should remove this null-check or add some other ones,
>> but I think you ought to do one or the other.

> As I hope is apparent, the first hunk of this patch is not for commit,
> and the second hunk is for commit.

Looks plausible to me.

regards, tom lane




Re: refactoring basebackup.c (zstd workers)

2022-03-28 Thread Robert Haas
On Sun, Mar 27, 2022 at 1:47 PM Tom Lane  wrote:
> Coverity has a nitpick about this:
>
> /srv/coverity/git/pgsql-git/postgresql/src/common/backup_compression.c: 194 
> in parse_bc_specification()
> 193 /* Advance to next entry and loop around. */
> >>> CID 1503251:  Null pointer dereferences  (REVERSE_INULL)
> >>> Null-checking "vend" suggests that it may be null, but it has already 
> >>> been dereferenced on all paths leading to the check.
> 194 specification = vend == NULL ? kwend + 1 : vend + 1;
> 195 }
> 196 }
>
> Not sure if you should remove this null-check or add some other ones,
> but I think you ought to do one or the other.

Yes, I think this is buggy.  I think there's only a theoretical bug
right now, because the only keyword we have is "level" and that
requires a value. But if I add an example keyword that does not
require an associated value (as demonstrated in the attached patch)
and do something like pg_basebackup -cfast -D whatever --compress
lz4:example, then the present code will dereference "vend" even though
it's NULL, which is not good. The attached patch also shows how I
think that should be fixed.

As I hope is apparent, the first hunk of this patch is not for commit,
and the second hunk is for commit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


coverity-backup-compression-fix.patch
Description: Binary data


Re: refactoring basebackup.c (zstd workers)

2022-03-27 Thread Tom Lane
Robert Haas  writes:
> [ v5-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch ]

Coverity has a nitpick about this:

/srv/coverity/git/pgsql-git/postgresql/src/common/backup_compression.c: 194 in 
parse_bc_specification()
193 /* Advance to next entry and loop around. */
>>> CID 1503251:  Null pointer dereferences  (REVERSE_INULL)
>>> Null-checking "vend" suggests that it may be null, but it has already 
>>> been dereferenced on all paths leading to the check.
194 specification = vend == NULL ? kwend + 1 : vend + 1;
195 }
196 }

Not sure if you should remove this null-check or add some other ones,
but I think you ought to do one or the other.

regards, tom lane




Re: refactoring basebackup.c (zstd workers)

2022-03-23 Thread Robert Haas
On Tue, Mar 22, 2022 at 11:37 AM Robert Haas  wrote:
> On Mon, Mar 21, 2022 at 2:41 PM Dagfinn Ilmari Mannsåker
>  wrote:
> > This is no longer the accurate. How about something like like "Specifies
> > details of the chosen compression method"?
>
> Good catch. v5 attached.

And committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-22 Thread Robert Haas
On Mon, Mar 21, 2022 at 2:41 PM Dagfinn Ilmari Mannsåker
 wrote:
> This is no longer the accurate. How about something like like "Specifies
> details of the chosen compression method"?

Good catch. v5 attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v5-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Dagfinn Ilmari Mannsåker
Robert Haas  writes:

> On Mon, Mar 21, 2022 at 2:22 PM Justin Pryzby  wrote:
>> + * during parsing, and will otherwise contain a an appropriate error 
>> message.
>
> OK, thanks. v4 attached.

I haven't read the whole patch, but I noticed an omission in the
documentation changes:

> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index 9178c779ba..00c593f1af 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
> 
> -COMPRESSION_LEVEL 
> level
> +COMPRESSION_DETAIL 
> detail
>  
>   
>Specifies the compression level to be used.

This is no longer the accurate. How about something like like "Specifies
details of the chosen compression method"?

- ilmari




Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Robert Haas
On Mon, Mar 21, 2022 at 2:22 PM Justin Pryzby  wrote:
> + * during parsing, and will otherwise contain a an appropriate error message.

OK, thanks. v4 attached.

> I think "algorithm" could be much more nuanced than "lz4", but I also think
> we've spent more than enough time on it now :)

Oh dear. But yes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v4-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Justin Pryzby
On Mon, Mar 21, 2022 at 12:57:36PM -0400, Robert Haas wrote:
> > typo: contain a an
> I searched for the "contain a an" typo that you mentioned but was not able to
> find it. Can you give me a more specific pointer?

Here:

+ * during parsing, and will otherwise contain a an appropriate error message.

> I looked a little bit more at the compression method vs. compression
> algorithm thing. I agree that there is some inconsistency in
> terminology here, but I'm still not sure that we are well-served by
> trying to make it totally uniform, especially if we pick the word
> "method" as the standard rather than "algorithm". In my opinion,
> "method" is less specific than "algorithm". If someone asks me to
> choose a compression algorithm, I know that I should give an answer
> like "lz4" or "zstd". If they ask me to pick a compression method, I'm
> not quite sure whether they want that kind of answer or whether they
> want something more detailed, like "use lz4 with compression level 3
> and a 1MB block size". After all, that is (at least according to my
> understanding of how English works) a perfectly valid answer to the
> question "what method should I use to compress this data?" -- but not
> to the question "what algorithm should I use to compress this data?".
> The latter can ONLY be properly answered by saying something like
> "lz4". And I think that's really the root of my hesitation to make the
> kinds of changes you want here.

I think "algorithm" could be much more nuanced than "lz4", but I also think
we've spent more than enough time on it now :)

-- 
Justin




Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Robert Haas
On Mon, Mar 21, 2022 at 9:18 AM Justin Pryzby  wrote:
> On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > > This patch also needs to update the other user-facing docs.
> >
> > Which ones exactly?
>
> I mean pg_basebackup -Z
>
> -Z level
> -Z [{client|server}-]method[:level]
> --compress=level
> --compress=[{client|server}-]method[:level]

Ah, right. Thanks.

Here's v3. I have updated that section of the documentation. I also
went and added a bunch more test cases for validation of compression
detail strings, many inspired by your examples, and fixed all the bugs
that I found in the process. I think the crashes you complained about
are now fixed, but please let me know if I have missed any. I also
added _() calls as you suggested. I searched for the "contain a an"
typo that you mentioned but was not able to find it. Can you give me a
more specific pointer?

I looked a little bit more at the compression method vs. compression
algorithm thing. I agree that there is some inconsistency in
terminology here, but I'm still not sure that we are well-served by
trying to make it totally uniform, especially if we pick the word
"method" as the standard rather than "algorithm". In my opinion,
"method" is less specific than "algorithm". If someone asks me to
choose a compression algorithm, I know that I should give an answer
like "lz4" or "zstd". If they ask me to pick a compression method, I'm
not quite sure whether they want that kind of answer or whether they
want something more detailed, like "use lz4 with compression level 3
and a 1MB block size". After all, that is (at least according to my
understanding of how English works) a perfectly valid answer to the
question "what method should I use to compress this data?" -- but not
to the question "what algorithm should I use to compress this data?".
The latter can ONLY be properly answered by saying something like
"lz4". And I think that's really the root of my hesitation to make the
kinds of changes you want here. If it's just a question of specifying
a compression algorithm and a level, I don't think using the name
"method" for the algorithm is going to be too bad. But as we enrich
the system with multiple compression algorithms each of which may have
multiple and different parameters, I think the whole thing becomes
murkier and the need for precision in language goes up.

Now that is of course an arguable position and you're welcome to
disagree with it, but I think that's part of why I'm hesitating.
Another part of it, at least for me, is that complete uniformity is
not always a positive. I suppose all of us have had the experience at
some point of reading a manual that says something like "to activate
the boil water function, press and release the 'boil water' button"
and rolled our eyes at how useless it was. It's important to me that
we don't fall into that trap. We clearly don't want to go ballistic
and have random inconsistencies in language for no reason, but at the
same time, it's not useful to tell people that METHOD should be
replaced with a compression method and LEVEL with a compression level.
I mean, if you end up saying something like that interspersed with
non-obvious information, that is OK, and I don't want to overstate the
point I'm trying to make. But it seems to me that if there's a little
variation in phrasing and we end up saying that METHOD means the
compression algorithm or that ALGORITHM means the compression method
or whatever, that can actually make things more clear. Here again it's
debatable: how much variation in phraseology is helpful, and at what
point does it just start to seem inconsistent? Well, everyone may have
their own opinion.

I'm not trying to pretend that this patch (or the existing code base)
gets this all right. But I do think that, to the extent that we have a
considered position on what to do here, we can make that change later,
perhaps even after getting some user feedback on what does and does
not make sense to other people. And I also think that what we end up
doing here may well end up being more nuanced than a blanket
search-and-replace. I'm not saying we couldn't make a blanket
search-and-replace. I just don't see it as necessarily creating value,
or being all that closely connected to the goal of this patch, which
is to quickly clean up a forward-compatibility risk before we hit
feature freeze.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v3-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: refactoring basebackup.c (zstd workers)

2022-03-21 Thread Justin Pryzby
On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > This patch also needs to update the other user-facing docs.
> 
> Which ones exactly?

I mean pg_basebackup -Z

-Z level
-Z [{client|server}-]method[:level]
--compress=level
--compress=[{client|server}-]method[:level]




Re: refactoring basebackup.c (zstd workers)

2022-03-20 Thread Robert Haas
On Sun, Mar 20, 2022 at 9:32 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I think I'm guilty of verbal inexactitude here but not bad coding.
> > Checking for *endptr != '\0', as I did, is not sufficient to detect
> > "whether an error occurred," as I alleged. But, in the part of my
> > response you didn't quote, I believe I made it clear that I only need
> > to detect garbage, not out-of-range values. And I think *endptr !=
> > '\0' will do that.
>
> Hmm ... do you consider an empty string to be valid input?

No, and I thought I had checked properly for that condition before
reaching the point in the code where I call strtol(), but it turns out
I have not, which I guess is what Justin has been trying to tell me
for a few emails now.

I'll send an updated patch tomorrow after looking this all over more carefully.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-20 Thread Robert Haas
On Sun, Mar 20, 2022 at 3:40 PM Justin Pryzby  wrote:
> The user-facing docs are already standardized using "compression method", with
> 2 exceptions, of which one is contrib/ and the other is what I'm suggesting to
> make consistent here.
>
> $ git grep 'compression algorithm' doc
> doc/src/sgml/pgcrypto.sgml:Which compression algorithm to use.  Only 
> available if
> doc/src/sgml/ref/pg_basebackup.sgml:compression algorithm is 
> selected, or if server-side compression

Well, if you just count the number of occurrences of each string in
the documentation, sure. But all of the ones that are talking about a
compression method seem to have to do with configurable TOAST
compression, and the fact that the documentation for that feature is
more extensive than for the pre-existing feature that refers to a
compression algorithm does not, at least in my view, turn it into a
project standard from which no deviation is permitted.

> > Did the latter. The former would need to be fixed in a bunch of places
> > and while I'm happy to accept an expert opinion on exactly what needs
> > to be done here, I don't want to try to do it and do it wrong. Better
> > to let someone with good knowledge of the subject matter patch it up
> > later than do a crummy job now.
>
> I believe it just needs _("foo")
> See git grep '= _('

Hmm. Maybe.

> I mentioned another issue off-list:
> pg_basebackup.c:2741:10: warning: suggest parentheses around assignment used 
> as truth value [-Wparentheses]
>  2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);
>   |  ^~~
> pg_basebackup.c:2741:3: note: in expansion of macro ‘Assert’
>  2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);
>
> This crashes the server using your v2 patch:
>
> src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
> --no-sync --no-manifest --compress=server-zstd:level, |wc -c

Well that's unfortunate. Will fix.

> I wonder whether the syntax should really use both ":" and ",".
> Maybe ":" isn't needed at all.

I don't think we should treat the compression method name in the same
way as a compression algorithm option.

> This patch also needs to update the other user-facing docs.

Which ones exactly?

> typo: contain a an

OK, will fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-20 Thread Tom Lane
Robert Haas  writes:
> I think I'm guilty of verbal inexactitude here but not bad coding.
> Checking for *endptr != '\0', as I did, is not sufficient to detect
> "whether an error occurred," as I alleged. But, in the part of my
> response you didn't quote, I believe I made it clear that I only need
> to detect garbage, not out-of-range values. And I think *endptr !=
> '\0' will do that.

Hmm ... do you consider an empty string to be valid input?

regards, tom lane




Re: refactoring basebackup.c (zstd workers)

2022-03-20 Thread Robert Haas
On Sun, Mar 20, 2022 at 3:11 PM Tom Lane  wrote:
> > Even after reading the man page for strtol, it's not clear to me that
> > this is needed. That page represents checking *endptr != '\0' as
> > sufficient to tell whether an error occurred.
>
> I'm not sure whose man page you looked at, but the POSIX standard [1]
> has a pretty clear opinion about this:
>
> Since 0, {LONG_MIN} or {LLONG_MIN}, and {LONG_MAX} or {LLONG_MAX} are
> returned on error and are also valid returns on success, an
> application wishing to check for error situations should set errno to
> 0, then call strtol() or strtoll(), then check errno.
>
> Checking *endptr != '\0' is for detecting whether there is trailing
> garbage after the number; which may be an error case or not as you
> choose, but it's a different matter.

I think I'm guilty of verbal inexactitude here but not bad coding.
Checking for *endptr != '\0', as I did, is not sufficient to detect
"whether an error occurred," as I alleged. But, in the part of my
response you didn't quote, I believe I made it clear that I only need
to detect garbage, not out-of-range values. And I think *endptr !=
'\0' will do that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-20 Thread Justin Pryzby
On Sun, Mar 20, 2022 at 03:05:28PM -0400, Robert Haas wrote:
> On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby  wrote:
> > -errmsg("unrecognized 
> > compression algorithm: \"%s\"",
> > +errmsg("unrecognized 
> > compression algorithm \"%s\"",
> >
> > Most other places seem to say "compression method".  So I'd suggest to 
> > change
> > that here, and in doc/src/sgml/ref/pg_basebackup.sgml.
> 
> I'm not sure that's really better, and I don't think this patch is
> introducing an altogether novel usage. I think I would probably try to
> standardize on algorithm rather than method if I were standardizing
> the whole source tree, but I think we can leave that discussion for
> another time.

The user-facing docs are already standardized using "compression method", with
2 exceptions, of which one is contrib/ and the other is what I'm suggesting to
make consistent here.

$ git grep 'compression algorithm' doc
doc/src/sgml/pgcrypto.sgml:Which compression algorithm to use.  Only 
available if
doc/src/sgml/ref/pg_basebackup.sgml:compression algorithm is selected, 
or if server-side compression

> > +   result->parse_error =
> > +   pstrdup("found empty string where a 
> > compression option was expected");
> >
> > Needs to be localized with _() ?
> > Also, document that it's pstrdup'd.
> 
> Did the latter. The former would need to be fixed in a bunch of places
> and while I'm happy to accept an expert opinion on exactly what needs
> to be done here, I don't want to try to do it and do it wrong. Better
> to let someone with good knowledge of the subject matter patch it up
> later than do a crummy job now.

I believe it just needs _("foo")
See git grep '= _('

I mentioned another issue off-list:
pg_basebackup.c:2741:10: warning: suggest parentheses around assignment used as 
truth value [-Wparentheses]
 2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);
  |  ^~~
pg_basebackup.c:2741:3: note: in expansion of macro ‘Assert’
 2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);

This crashes the server using your v2 patch:

src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --no-manifest --compress=server-zstd:level, |wc -c

I wonder whether the syntax should really use both ":" and ",".
Maybe ":" isn't needed at all.

This patch also needs to update the other user-facing docs.

typo: contain a an

-- 
Justin




Re: refactoring basebackup.c (zstd workers)

2022-03-20 Thread Tom Lane
Robert Haas  writes:
>> Should this also set/check errno ?
>> And check if value != ivalue_endp ?
>> See strtol(3)

> Even after reading the man page for strtol, it's not clear to me that
> this is needed. That page represents checking *endptr != '\0' as
> sufficient to tell whether an error occurred.

I'm not sure whose man page you looked at, but the POSIX standard [1]
has a pretty clear opinion about this:

Since 0, {LONG_MIN} or {LLONG_MIN}, and {LONG_MAX} or {LLONG_MAX} are
returned on error and are also valid returns on success, an
application wishing to check for error situations should set errno to
0, then call strtol() or strtoll(), then check errno.

Checking *endptr != '\0' is for detecting whether there is trailing
garbage after the number; which may be an error case or not as you
choose, but it's a different matter.

regards, tom lane

[1] https://pubs.opengroup.org/onlinepubs/9699919799/




Re: refactoring basebackup.c (zstd workers)

2022-03-20 Thread Robert Haas
On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby  wrote:
> gzip comma

I think it's fine the way it's written. If we made that change, then
we'd have a comma for gzip and not for the other two algorithms. Also,
I'm just moving that sentence, so any change that there is to be made
here is a job for some other patch.

> alphabetical

Fixed.

> -errmsg("unrecognized 
> compression algorithm: \"%s\"",
> +errmsg("unrecognized 
> compression algorithm \"%s\"",
>
> Most other places seem to say "compression method".  So I'd suggest to change
> that here, and in doc/src/sgml/ref/pg_basebackup.sgml.

I'm not sure that's really better, and I don't think this patch is
introducing an altogether novel usage. I think I would probably try to
standardize on algorithm rather than method if I were standardizing
the whole source tree, but I think we can leave that discussion for
another time.

> -   if (o_compression_level && !o_compression)
> +   if (o_compression_detail && !o_compression)
> ereport(ERROR,
> (errcode(ERRCODE_SYNTAX_ERROR),
>  errmsg("compression level requires 
> compression")));
>
> s/level/detail/

Fixed.


> It'd be great if this were re-usable for wal_compression, which I hope in 
> pg16 will
> support at least level=N.  And eventually pg_dump.  But those clients 
> shouldn't
> accept a client/server prefix.  Maybe the way to handle that is for those 
> tools
> to check locationres and reject it if it was specified.

One thing I forgot to mention in my previous response is that I think
the parsing code is actually well set up for this the way I have it.
server- and client- gets parsed off in a different place than we
interpret the rest, which fits well with your observation that other
cases wouldn't have a client or server prefix.

> sp: sandwich

Fixed.

> star

Fixed.

> should be const ?

OK.

>
> +   /* As a special case, the specification can be a bare integer. */
> +   bare_level = strtol(specification, _level_endp, 10);
>
> Should this call expect_integer_value()?
> See below.

I don't think that would be useful. We have no keyword to pass for the
error message, nor would we use the error message if one got
constructed.

> +   result->parse_error =
> +   pstrdup("found empty string where a 
> compression option was expected");
>
> Needs to be localized with _() ?
> Also, document that it's pstrdup'd.

Did the latter. The former would need to be fixed in a bunch of places
and while I'm happy to accept an expert opinion on exactly what needs
to be done here, I don't want to try to do it and do it wrong. Better
to let someone with good knowledge of the subject matter patch it up
later than do a crummy job now.

> -1 isn't great, since it's also an integer, and, also a valid compression 
> level
> for zstd (did you see my message about that?).  Maybe INT_MIN is ok.

It really doesn't matter. Could just return 42. The client shouldn't
use the value if there's an error.

> +{
> +   int ivalue;
> +   char   *ivalue_endp;
> +
> +   ivalue = strtol(value, _endp, 10);
>
> Should this also set/check errno ?
> And check if value != ivalue_endp ?
> See strtol(3)

Even after reading the man page for strtol, it's not clear to me that
this is needed. That page represents checking *endptr != '\0' as
sufficient to tell whether an error occurred. Maybe it wouldn't catch
an out of range value, but in practice all of the algorithms we
support now and any we support in the future are going to catch
something clamped to LONG_MIN or LONG_MAX as out of range and display
the correct error message. What's your specific thinking here?

> +   unsignedoptions;/* OR of 
> BACKUP_COMPRESSION_OPTION constants */
>
> Should be "unsigned int" or "bits32" ?

I do not see why either of those would be better.

> The server crashes if I send an unknown option - you should hit that in the
> regression tests.

Turns out I was testing this on the client side but not the server
side. Fixed and added more tests.

v2 attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: refactoring basebackup.c (zstd workers)

2022-03-17 Thread Robert Haas
Thanks for the review!

I'll address most of these comments later, but quickly for right now...

On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby  wrote:
> It'd be great if this were re-usable for wal_compression, which I hope in 
> pg16 will
> support at least level=N.  And eventually pg_dump.  But those clients 
> shouldn't
> accept a client/server prefix.  Maybe the way to handle that is for those 
> tools
> to check locationres and reject it if it was specified.
> [...]
> This is confusingly similar to src/include/access/xlog.h:WalCompression.
> I think someone else mentioned this before ?

A couple of people before me have had delusions of grandeur in this
area. We have the WalCompression enum, which has values of the form
COMPRESSION_*, instead of WAL_COMPRESSION_*, as if the WAL were going
to be the only thing that ever got compressed. And pg_dump.h also has
a CompressionAlgorithm enum, with values like COMPR_ALG_*, which isn't
great naming either. Clearly there's some cleanup needed here: if we
can use the same enum for multiple systems, then it can have a name
implying that it's the only game in town, but otherwise both the enum
name and the corresponding value need to use a suitable prefix. I
think that's a job for another patch, probably post-v15. For now I
plan to do the right thing with the new names I'm adding, and leave
the existing names alone. That can be changed in the future, if and
when it seems sensible.

As I said elsewhere, I think the WAL compression stuff is badly
designed and should probably be rewritten completely, maybe to reuse
the bbstreamer stuff. In that case, WalCompressionMethod would
probably go away entirely, making the naming confusion moot, and
picking up zstd and lz4 compression support for free. If that doesn't
happen, we can probably find some way to at least make them share an
enum, but I think that's too hairy to try to clean up right now with
feature freeze pending.

> The server crashes if I send an unknown option - you should hit that in the
> regression tests.
>
> $ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
> --no-sync --no-manifest --compress=server-lz4:a |wc -c
> TRAP: FailedAssertion("pointer != NULL", File: 
> "../../../../src/include/utils/memutils.h", Line: 123, PID: 8627)
> postgres: walsender pryzbyj [local] 
> BASE_BACKUP(ExceptionalCondition+0xa0)[0x560b45d7b64b]
> postgres: walsender pryzbyj [local] BASE_BACKUP(pfree+0x5d)[0x560b45dad1ea]
> postgres: walsender pryzbyj [local] 
> BASE_BACKUP(parse_bc_specification+0x154)[0x560b45dc5d4f]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x43d56c)[0x560b45bc556c]
> postgres: walsender pryzbyj [local] 
> BASE_BACKUP(SendBaseBackup+0x2d)[0x560b45bc85ca]
> postgres: walsender pryzbyj [local] 
> BASE_BACKUP(exec_replication_command+0x3a2)[0x560b45bdddb2]
> postgres: walsender pryzbyj [local] 
> BASE_BACKUP(PostgresMain+0x6b2)[0x560b45c39131]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x40530e)[0x560b45b8d30e]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x408572)[0x560b45b90572]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x4087b9)[0x560b45b907b9]
> postgres: walsender pryzbyj [local] 
> BASE_BACKUP(PostmasterMain+0x1135)[0x560b45b91d9b]
> postgres: walsender pryzbyj [local] BASE_BACKUP(main+0x229)[0x560b45ad0f78]

That's odd - I thought I had tested that case. Will double-check.

> This is interpreted like client-gzip-1; should multiple specifications of
> compress be prohibited ?
>
> | src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
> --no-sync --no-manifest --compress=server-lz4 --compress=1

They're not now and haven't been in the past. I think the last one
should just win (as it apparently does, here). We do that in some
places and throw an error in others and I'm not sure if we have a 100%
consistent rule for it, but flipping one location between one behavior
and the other isn't going to make things more consistent overall.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-17 Thread Justin Pryzby
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
+
+  For gzip the compression level should be an

gzip comma

+++ b/src/backend/replication/basebackup.c
@@ -18,6 +18,7 @@
 
 #include "access/xlog_internal.h"  /* for pg_start/stop_backup */
 #include "common/file_perm.h"
+#include "common/backup_compression.h"

alphabetical

-errmsg("unrecognized 
compression algorithm: \"%s\"",
+errmsg("unrecognized 
compression algorithm \"%s\"",

Most other places seem to say "compression method".  So I'd suggest to change
that here, and in doc/src/sgml/ref/pg_basebackup.sgml.

-   if (o_compression_level && !o_compression)
+   if (o_compression_detail && !o_compression)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
 errmsg("compression level requires 
compression")));

s/level/detail/

 /*
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that all the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
..
  */
 static void
+parse_compress_options(char *option, char **algorithm, char **detail,
+  CompressionLocation *locationres)

It'd be great if this were re-usable for wal_compression, which I hope in pg16 
will
support at least level=N.  And eventually pg_dump.  But those clients shouldn't
accept a client/server prefix.  Maybe the way to handle that is for those tools
to check locationres and reject it if it was specified.

+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwhich, the requested algorithm is "turkey"
+ * and the detail string is "sandwhich". We'll sort out whether that's legal

sp: sandwich

+   WalCompressionMethodwal_compress_method;

This is confusingly similar to src/include/access/xlog.h:WalCompression.
I think someone else mentioned this before ?

+ * A compression specification specifies the parameters that should be used
+ * when * performing compression with a specific algorithm. The simplest

star

+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+char *
+get_bc_algorithm_name(bc_algorithm algorithm)

should be const ?

+   /* As a special case, the specification can be a bare integer. */
+   bare_level = strtol(specification, _level_endp, 10);

Should this call expect_integer_value()?
See below.

+   result->parse_error =
+   pstrdup("found empty string where a compression 
option was expected");

Needs to be localized with _() ?
Also, document that it's pstrdup'd.

+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)

-1 isn't great, since it's also an integer, and, also a valid compression level
for zstd (did you see my message about that?).  Maybe INT_MIN is ok.

+{
+   int ivalue;
+   char   *ivalue_endp;
+
+   ivalue = strtol(value, _endp, 10);

Should this also set/check errno ?
And check if value != ivalue_endp ?
See strtol(3)

+char *
+validate_bc_specification(bc_specification *spec)
...
+   /*
+* If a compression level was specified, check that the algorithm 
expects
+* a compression level and that the level is within the legal range for
+* the algorithm.

It would be nice if this could be shared with wal_compression and pg_dump.
We shouldn't need multiple places with structures giving the algorithms and
range of compression levels.

+   unsignedoptions;/* OR of 
BACKUP_COMPRESSION_OPTION constants */

Should be "unsigned int" or "bits32" ?

The server crashes if I send an unknown option - you should hit that in the
regression tests.

$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --no-manifest 

Re: refactoring basebackup.c (zstd workers)

2022-03-17 Thread Robert Haas
On Mon, Mar 14, 2022 at 1:21 PM Robert Haas  wrote:
> There's some appeal to that, but one downside is that it means that
> the client can't be used to fetch data that is compressed in a way
> that the server knows about and the client doesn't. I don't think
> that's great. Why should, for example, pg_basebackup need to be
> compiled with zstd support in order to request zstd compression on the
> server side? If the server knows about the brand new
> justin-magic-sauce compression algorithm, maybe the client should just
> be able to request it and, when given various .jms files by the
> server, shrug its shoulders and accept them for what they are. That
> doesn't work if -Fp is involved, or similar, but it should work fine
> for simple cases if we set things up right.

Concretely, I propose the attached patch for v15. It renames the
newly-added COMPRESSION_LEVEL option to COMPRESSION_DETAIL, introduces
a flexible syntax for options along the lines you proposed, and
adjusts things so that a client that doesn't support a particular type
of compression can still request that type of compression from the
server.

I think it's important to do this for v15 so that we don't end up with
backward-compatibility problems down the road.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Replace-BASE_BACKUP-COMPRESSION_LEVEL-option-with.patch
Description: Binary data


Re: refactoring basebackup.c (zstd negative compression)

2022-03-16 Thread Justin Pryzby
Should zstd's negative compression levels be supported here ?

Here's a POC patch which is enough to play with it.

$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=zstd |wc -c
12305659
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=zstd:1 |wc -c
13827521
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=zstd:0 |wc -c
12304018
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=zstd:-1 |wc -c
16443893
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=zstd:-2 |wc -c
17349563
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=zstd:-4 |wc -c
19452631
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=zstd:-7 |wc -c
21871505

Also, with a partial regression DB, this crashes when writing to stdout.

$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp 
--no-sync --compress=lz4 |wc -c
pg_basebackup: bbstreamer_lz4.c:172: bbstreamer_lz4_compressor_content: 
Assertion `mystreamer->base.bbs_buffer.maxlen >= out_bound' failed.
24117248

#4  0xe8b4 in bbstreamer_lz4_compressor_content 
(streamer=0x555a5260, member=0x7fffc760, 
data=0x73068010 "{ \"PostgreSQL-Backup-Manifest-Version\": 
1,\n\"Files\": [\n{ \"Path\": \"backup_label\", \"Size\": 227, 
\"Last-Modified\": \"2022-03-16 02:29:11 GMT\", \"Checksum-Algorithm\": 
\"CRC32C\", \"Checksum\": \"46f69d99\" },\n{ \"Pa"..., len=401072, 
context=BBSTREAMER_MEMBER_CONTENTS) at bbstreamer_lz4.c:172
mystreamer = 0x555a5260
next_in = 0x73068010 "{ \"PostgreSQL-Backup-Manifest-Version\": 
1,\n\"Files\": [\n{ \"Path\": \"backup_label\", \"Size\": 227, 
\"Last-Modified\": \"2022-03-16 02:29:11 GMT\", \"Checksum-Algorithm\": 
\"CRC32C\", \"Checksum\": \"46f69d99\" },\n{ \"Pa"...
...

(gdb) p mystreamer->base.bbs_buffer.maxlen
$1 = 524288
(gdb) p (int) LZ4F_compressBound(len, >prefs)
$4 = 524300

This is with: liblz4-1:amd64 1.9.2-2ubuntu0.20.04.1
>From 9a330a3a1801352cef3b5912e31aba61760dac32 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 10 Mar 2022 20:16:19 -0600
Subject: [PATCH] pg_basebackup: support Zstd negative compression levels

"higher than maximum" is bogus

TODO: each compression methods should enforce its own levels
---
 src/backend/replication/basebackup_zstd.c |  8 ++--
 src/backend/replication/repl_scanner.l|  4 +++-
 src/bin/pg_basebackup/bbstreamer_zstd.c   |  7 ++-
 src/bin/pg_basebackup/pg_basebackup.c | 23 ---
 4 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index c0e2be6e27b..4464fcb30e1 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -71,7 +71,7 @@ bbsink_zstd_new(bbsink *next, int compresslevel)
 
 	Assert(next != NULL);
 
-	if (compresslevel < 0 || compresslevel > 22)
+	if (compresslevel < -7 || compresslevel > 22)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("zstd compression level %d is out of range",
@@ -96,13 +96,17 @@ bbsink_zstd_begin_backup(bbsink *sink)
 {
 	bbsink_zstd *mysink = (bbsink_zstd *) sink;
 	size_t		output_buffer_bound;
+	size_t		ret;
 
 	mysink->cctx = ZSTD_createCCtx();
 	if (!mysink->cctx)
 		elog(ERROR, "could not create zstd compression context");
 
-	ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
+	ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_compressionLevel,
 		   mysink->compresslevel);
+	if (ZSTD_isError(ret))
+		elog(ERROR, "could not create zstd compression context: %s",
+ZSTD_getErrorName(ret));
 
 	/*
 	 * We need our own buffer, because we're going to pass different data to
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 4b64c0d768b..05c4ef463a1 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -86,6 +86,7 @@ xdinside		[^"]+
 
 digit			[0-9]
 hexdigit		[0-9A-Fa-f]
+sign			("-"|"+")
 
 ident_start		[A-Za-z\200-\377_]
 ident_cont		[A-Za-z\200-\377_0-9\$]
@@ -127,9 +128,10 @@ NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
 WAIT{ return K_WAIT; }
 
+
 {space}+		{ /* do nothing */ }
 
-{digit}+		{
+{sign}?{digit}+		{
 	yylval.uintval = strtoul(yytext, NULL, 10);
 	return UCONST;
 }
diff --git a/src/bin/pg_basebackup/bbstreamer_zstd.c b/src/bin/pg_basebackup/bbstreamer_zstd.c
index e86749a8fb7..337e789b6a1 100644
--- a/src/bin/pg_basebackup/bbstreamer_zstd.c
+++ b/src/bin/pg_basebackup/bbstreamer_zstd.c
@@ -67,6 +67,7 @@ bbstreamer_zstd_compressor_new(bbstreamer *next, int compresslevel)
 {
 #ifdef 

Re: refactoring basebackup.c

2022-03-15 Thread Robert Haas
On Tue, Mar 15, 2022 at 6:33 AM Jeevan Ladhe  wrote:
> I get following error at my end:
>
> $ pg_basebackup -D /tmp/zstd_bk -Ft -Xfetch --compress=server-zstd:7@4
> pg_basebackup: error: could not initiate base backup: ERROR:  could not 
> compress data: Unsupported parameter
> pg_basebackup: removing data directory "/tmp/zstd_bk"
>
> This is mostly because I have the zstd library version v1.4.4, which
> does not have default support for parallel workers. Maybe we should
> have a better error, something that is hinting that the parallelism is
> not supported by the particular build.

I'm not averse to trying to improve that error message, but honestly
I'd consider that to be good enough already to be acceptable. We could
think about trying to add an errhint() telling you that the problem
may be with your libzstd build.

> The regression for pg_verifybackup test 008_untar.pl also fails with a
> similar error. Here, I think we should have some logic in regression to
> skip the test if the parameter is not supported?

Or at least to have the test not fail.

> Also, just a thought, for the versions where parallelism is not
> supported, should we instead just throw a warning and fall back to
> non-parallel behavior?

I don't think so. I think it's better for the user to get an error and
then change their mind and request something we can do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-03-15 Thread Jeevan Ladhe
Thanks for the patch, Dipesh.
I had a look at the patch and also tried to take the backup. I have
following suggestions and observations:

I get following error at my end:

$ pg_basebackup -D /tmp/zstd_bk -Ft -Xfetch --compress=server-zstd:7@4
pg_basebackup: error: could not initiate base backup: ERROR:  could not
compress data: Unsupported parameter
pg_basebackup: removing data directory "/tmp/zstd_bk"

This is mostly because I have the zstd library version v1.4.4, which
does not have default support for parallel workers. Maybe we should
have a better error, something that is hinting that the parallelism is
not supported by the particular build.

The regression for pg_verifybackup test 008_untar.pl also fails with a
similar error. Here, I think we should have some logic in regression to
skip the test if the parameter is not supported?

+   if (ZSTD_isError(ret))

+   elog(ERROR,

+"could not compress data: %s",

+ZSTD_getErrorName(ret));

I think all of this can go on one line, but anyhow we have to improve
the error message here.

Also, just a thought, for the versions where parallelism is not
supported, should we instead just throw a warning and fall back to
non-parallel behavior?

Regards,
Jeevan Ladhe

On Mon, 14 Mar 2022 at 21:41, Dipesh Pandit  wrote:

> Hi,
>
> I tried to implement support for parallel ZSTD compression. The
> library provides an option (ZSTD_c_nbWorkers) to specify the
> number of compression workers. The number of parallel
> workers can be set as part of compression parameter and if this
> option is specified then the library performs parallel compression
> based on the specified number of workers.
>
> User can specify the number of parallel worker as part of
> --compress option by appending an integer value after at sign (@).
> (-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])
>
> Please find the attached patch v1 with the above changes.
>
> Note: ZSTD library version 1.5.x supports parallel compression
> by default and if the library version is lower than 1.5.x then
> parallel compression is enabled only the source is compiled with build
> macro ZSTD_MULTITHREAD. If the linked library version doesn't
> support parallel compression then setting the value of parameter
> ZSTD_c_nbWorkers to a value other than 0 will be no-op and
> returns an error.
>
> Thanks,
> Dipesh
>


Re: refactoring basebackup.c (zstd workers)

2022-03-14 Thread Robert Haas
On Mon, Mar 14, 2022 at 1:11 PM Justin Pryzby  wrote:
> Internally, I was thinking they'd all be handled as first-class options, with
> separate struct fields and separate replication protocol options.  If an 
> option
> isn't known, it'd be rejected on the client side, rather than causing an error
> on the server.

There's some appeal to that, but one downside is that it means that
the client can't be used to fetch data that is compressed in a way
that the server knows about and the client doesn't. I don't think
that's great. Why should, for example, pg_basebackup need to be
compiled with zstd support in order to request zstd compression on the
server side? If the server knows about the brand new
justin-magic-sauce compression algorithm, maybe the client should just
be able to request it and, when given various .jms files by the
server, shrug its shoulders and accept them for what they are. That
doesn't work if -Fp is involved, or similar, but it should work fine
for simple cases if we set things up right.

> Maybe there'd be an option parser for this in common/ (I think that might
> require having new data structure there too, maybe one for each compression
> method, or maybe a union{} to handles them all).  Most of the ~100 lines to
> support wal_compression='zstd:N' are to parse out the N.

Yes, it's actually a very simple feature now that we've got the rest
of the infrastructure set up correctly for it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-14 Thread Justin Pryzby
On Mon, Mar 14, 2022 at 01:02:20PM -0400, Robert Haas wrote:
> On Mon, Mar 14, 2022 at 12:35 PM Justin Pryzby  wrote:
> > I suggest to use a syntax that's more general than that, maybe something 
> > like
> >
> > :[level=]N,parallel=N,flag,flag,...
> >
> > For example, someone may want to use zstd "long" mode or (when it's 
> > released)
> > rsyncable mode, or specify fine-grained compression parameters (strategy,
> > windowLog, hashLog, etc).
> 
> That's an interesting idea. I wonder what the replication protocol
> ought to look like in that case. Should we have a COMPRESSION_DETAIL
> argument that is just a string, and let the server parse it out? Or
> separate protocol-level options? It does feel reasonable to have both
> COMPRESSION_LEVEL and COMPRESSION_WORKERS as first-class options, but
> I don't know that we want COMPRESSION_HASHLOG true as part of our
> first-class grammar.

I was only referring to the user-facing grammar.

Internally, I was thinking they'd all be handled as first-class options, with
separate struct fields and separate replication protocol options.  If an option
isn't known, it'd be rejected on the client side, rather than causing an error
on the server.

Maybe there'd be an option parser for this in common/ (I think that might
require having new data structure there too, maybe one for each compression
method, or maybe a union{} to handles them all).  Most of the ~100 lines to
support wal_compression='zstd:N' are to parse out the N.

-- 
Justin




Re: refactoring basebackup.c (zstd workers)

2022-03-14 Thread Robert Haas
On Mon, Mar 14, 2022 at 12:35 PM Justin Pryzby  wrote:
> I suggest to use a syntax that's more general than that, maybe something like
>
> :[level=]N,parallel=N,flag,flag,...
>
> For example, someone may want to use zstd "long" mode or (when it's released)
> rsyncable mode, or specify fine-grained compression parameters (strategy,
> windowLog, hashLog, etc).

That's an interesting idea. I wonder what the replication protocol
ought to look like in that case. Should we have a COMPRESSION_DETAIL
argument that is just a string, and let the server parse it out? Or
separate protocol-level options? It does feel reasonable to have both
COMPRESSION_LEVEL and COMPRESSION_WORKERS as first-class options, but
I don't know that we want COMPRESSION_HASHLOG true as part of our
first-class grammar.

> I hope the same syntax will be shared with wal_compression and pg_dump.
> And libpq, if that patch progresses.
>
> BTW, I think this may be better left for PG16.

Possibly so ... but if we're thinking of any revisions to the
newly-added grammar, we had better take care of that now, before it's
set in stone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c (zstd workers)

2022-03-14 Thread Justin Pryzby
On Mon, Mar 14, 2022 at 09:41:35PM +0530, Dipesh Pandit wrote:
> I tried to implement support for parallel ZSTD compression. The
> library provides an option (ZSTD_c_nbWorkers) to specify the
> number of compression workers. The number of parallel
> workers can be set as part of compression parameter and if this
> option is specified then the library performs parallel compression
> based on the specified number of workers.
> 
> User can specify the number of parallel worker as part of
> --compress option by appending an integer value after at sign (@).
> (-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])

I suggest to use a syntax that's more general than that, maybe something like

:[level=]N,parallel=N,flag,flag,...

For example, someone may want to use zstd "long" mode or (when it's released)
rsyncable mode, or specify fine-grained compression parameters (strategy,
windowLog, hashLog, etc).

I hope the same syntax will be shared with wal_compression and pg_dump.
And libpq, if that patch progresses.

BTW, I think this may be better left for PG16.

-- 
Justin




Re: refactoring basebackup.c

2022-03-14 Thread Dipesh Pandit
Hi,

I tried to implement support for parallel ZSTD compression. The
library provides an option (ZSTD_c_nbWorkers) to specify the
number of compression workers. The number of parallel
workers can be set as part of compression parameter and if this
option is specified then the library performs parallel compression
based on the specified number of workers.

User can specify the number of parallel worker as part of
--compress option by appending an integer value after at sign (@).
(-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])

Please find the attached patch v1 with the above changes.

Note: ZSTD library version 1.5.x supports parallel compression
by default and if the library version is lower than 1.5.x then
parallel compression is enabled only the source is compiled with build
macro ZSTD_MULTITHREAD. If the linked library version doesn't
support parallel compression then setting the value of parameter
ZSTD_c_nbWorkers to a value other than 0 will be no-op and
returns an error.

Thanks,
Dipesh
From 688ad1e3f9b43bf911e8c3837497a874e4a6937f Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Mon, 14 Mar 2022 18:39:02 +0530
Subject: [PATCH] support parallel zstd compression

---
 doc/src/sgml/ref/pg_basebackup.sgml   | 11 ++-
 src/backend/replication/basebackup.c  | 14 +++-
 src/backend/replication/basebackup_zstd.c | 15 -
 src/bin/pg_basebackup/bbstreamer.h|  3 +-
 src/bin/pg_basebackup/bbstreamer_zstd.c   | 11 ++-
 src/bin/pg_basebackup/pg_basebackup.c | 97 +--
 src/bin/pg_verifybackup/t/008_untar.pl|  8 +++
 src/bin/pg_verifybackup/t/010_client_untar.pl |  8 +++
 src/include/replication/basebackup_sink.h |  2 +-
 9 files changed, 142 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 4a630b5..87feca0 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -399,9 +399,9 @@ PostgreSQL documentation
 
  
   -Z level
-  -Z [{client|server}-]method[:level]
+  -Z [{client|server}-]method[:level][@workers]
   --compress=level
-  --compress=[{client|server}-]method[:level]
+  --compress=[{client|server}-]method[:level][@workers]
   

 Requests compression of the backup. If client or
@@ -428,6 +428,13 @@ PostgreSQL documentation
 the level is 0.


+Compression workers can be specified optionally by appending the
+number of workers after an at sign (@). It 
+defines the degree of parallelism while compressing the archive.
+Currently, parallel compression is supported only for
+zstd compressed archives.
+   
+   
 When the tar format is used with gzip,
 lz4, or zstd, the suffix
 .gz, .lz4, or
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 2378ce5..8217fa9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -82,6 +82,7 @@ typedef struct
 	backup_manifest_option manifest;
 	basebackup_compression_type	compression;
 	int			compression_level;
+	int			compression_workers;
 	pg_checksum_type manifest_checksum_type;
 } basebackup_options;
 
@@ -718,6 +719,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	char	   *target_str = "compat";	/* placate compiler */
 	bool		o_compression = false;
 	bool		o_compression_level = false;
+	bool		o_compression_workers = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	opt->target = BACKUP_TARGET_CLIENT;
@@ -925,6 +927,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			opt->compression_level = defGetInt32(defel);
 			o_compression_level = true;
 		}
+		else if (strcmp(defel->defname, "compression_workers") == 0)
+		{
+			if (o_compression_workers)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("duplicate option \"%s\"", defel->defname)));
+			opt->compression_workers = defGetInt32(defel);
+			o_compression_workers = true;
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1030,7 +1041,8 @@ SendBaseBackup(BaseBackupCmd *cmd)
 	else if (opt.compression == BACKUP_COMPRESSION_LZ4)
 		sink = bbsink_lz4_new(sink, opt.compression_level);
 	else if (opt.compression == BACKUP_COMPRESSION_ZSTD)
-		sink = bbsink_zstd_new(sink, opt.compression_level);
+		sink = bbsink_zstd_new(sink, opt.compression_level,
+			   opt.compression_workers);
 
 	/* Set up progress reporting. */
 	sink = bbsink_progress_new(sink, opt.progress);
diff --git a/src/backend/replication/basebackup_zstd.c b/src/backend/replication/basebackup_zstd.c
index e3f9b1d..54b91eb 100644
--- a/src/backend/replication/basebackup_zstd.c
+++ b/src/backend/replication/basebackup_zstd.c
@@ -28,6 +28,9 @@ typedef struct bbsink_zstd
 	/* Compression level */
 	int			compresslevel;
 
+	/* Compression workers*/

Re: refactoring basebackup.c

2022-03-14 Thread Robert Haas
On Fri, Mar 11, 2022 at 8:52 PM Andres Freund  wrote:
> You could also just append a manifest as a compresed tar to the compressed tar
> stream. Unfortunately GNU tar requires -i to read concated compressed
> archives, so perhaps that's not quite an alternative.

s/Unfortunately/Fortunately/ :-p

I think we've already gone way too far in the direction of making this
stuff rely on specific details of the tar format. What if someday we
wanted to switch to pax, cpio, zip, 7zip, whatever, or even just have
one of those things as an option? It's not that I'm dying to have
PostgreSQL produce rar or arj files, but I think we box ourselves into
a corner when we just assume tar everywhere. As an example of a
similar issue with real consequences, consider the recent discovery
that we can't easily add support for LZ4 or ZSTD compression of
pg_wal.tar. The problem is that the existing code tells the gzip
library to emit the tar header as part of the compressed stream
without actually compressing it, and then it goes back and overwrites
that data later! Unsurprisingly, that's not a feature every
compression library offers.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-03-11 Thread Andres Freund
Hi,

On 2022-03-11 10:19:29 -0500, Robert Haas wrote:
> Thanks for the report. The problem here is that, when the output is
> standard output (-D -), pg_basebackup can only produce a single output
> file, so the manifest gets injected into the tar file on the client
> side rather than being written separately as we do in normal cases.
> However, that only works if we're receiving a tar file that we can
> parse from the server, and here the server is sending a compressed
> tarfile. The current code mistakely attempts to parse the compressed
> tarfile as if it were an uncompressed tarfile, which causes the error
> messages that you are seeing (and which I can also reproduce here). We
> actually have enough infrastructure available in pg_basebackup now
> that we could do the "right thing" in this case: decompress the data
> received from the server, parse the resulting tar file, inject the
> backup manifest, construct a new tar file, and recompress. However, I
> think that's probably not a good idea, because it's unlikely that the
> user will understand that the data is being compressed on the server,
> then decompressed, and then recompressed again, and the performance of
> the resulting pipeline will probably not be very good. So I think we
> should just refuse this command. Patch for that attached.

You could also just append a manifest as a compresed tar to the compressed tar
stream. Unfortunately GNU tar requires -i to read concated compressed
archives, so perhaps that's not quite an alternative.

Greetings,

Andres Freund




Re: refactoring basebackup.c

2022-03-11 Thread Robert Haas
On Tue, Feb 15, 2022 at 11:26 AM Robert Haas  wrote:
> On Wed, Feb 9, 2022 at 8:41 AM Abhijit Menon-Sen  wrote:
> > It took me a while to assimilate these patches, including the backup
> > targets one, which I hadn't looked at before. Now that I've wrapped my
> > head around how to put the pieces together, I really like the idea. As
> > you say, writing non-trivial integrations in C will take some effort,
> > but it seems worthwhile. It's also nice that one can continue to use
> > pg_basebackup to trigger the backups and see progress information.
>
> Cool. Thanks for having a look.
>
> > Yes, it looks simple to follow the example set by basebackup_to_shell to
> > write a custom target. The complexity will be in whatever we need to do
> > to store/forward the backup data, rather than in obtaining the data in
> > the first place, which is exactly as it should be.
>
> Yeah, that's what made me really happy with how this came out.
>
> Here's v2, rebased and with documentation added.

I don't hear many comments on this, but I'm pretty sure that it's a
good idea, and there haven't been many objections to this patch series
as a whole, so I'd like to proceed with it. If nobody objects
vigorously, I'll commit this next week.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-03-11 Thread Robert Haas
On Fri, Mar 11, 2022 at 11:29 AM Justin Pryzby  wrote:
> Sounds right.

OK, committed.

> Also, I think the magic 8 for .gz should actually be a 7.
>
> I'm not sure why it tests for ".gz" but not ".tar.gz", which would help to 
> make
> them all less magic.
>
> commit 1fb1e21ba7a500bb2b85ec3e65f59130fcdb4a7e
> Author: Justin Pryzby 
> Date:   Thu Mar 10 21:22:16 2022 -0600

Yeah, your patch looks right. Committed that, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-03-11 Thread Justin Pryzby
On Fri, Mar 11, 2022 at 10:19:29AM -0500, Robert Haas wrote:
> So I think we should just refuse this command. Patch for that attached.

Sounds right.

Also, I think the magic 8 for .gz should actually be a 7.

I'm not sure why it tests for ".gz" but not ".tar.gz", which would help to make
them all less magic.

commit 1fb1e21ba7a500bb2b85ec3e65f59130fcdb4a7e
Author: Justin Pryzby 
Date:   Thu Mar 10 21:22:16 2022 -0600

pg_basebackup: make magic numbers less magic

The magic 8 for .gz should actually be a 7.

.tar.gz
1234567

.tar.lz4
.tar.zst
12345678

See d45099425, 751b8d23b, 7cf085f07.

diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
index 9f3ecc60fbe..8dd9721323d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1223,17 +1223,17 @@ CreateBackupStreamer(char *archive_name, char 
*spclocation,
is_tar = (archive_name_len > 4 &&
  strcmp(archive_name + archive_name_len - 4, ".tar") 
== 0);
 
-   /* Is this a gzip archive? */
-   is_tar_gz = (archive_name_len > 8 &&
-strcmp(archive_name + archive_name_len - 3, 
".gz") == 0);
+   /* Is this a .tar.gz archive? */
+   is_tar_gz = (archive_name_len > 7 &&
+strcmp(archive_name + archive_name_len - 7, 
".tar.gz") == 0);
 
-   /* Is this a LZ4 archive? */
+   /* Is this a .tar.lz4 archive? */
is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 4, 
".lz4") == 0);
+ strcmp(archive_name + archive_name_len - 8, 
".tar.lz4") == 0);
 
-   /* Is this a ZSTD archive? */
+   /* Is this a .tar.zst archive? */
is_tar_zstd = (archive_name_len > 8 &&
-  strcmp(archive_name + archive_name_len - 4, 
".zst") == 0);
+  strcmp(archive_name + archive_name_len - 8, 
".tar.zst") == 0);
 
/*
 * We have to parse the archive if (1) we're suppose to extract it, or 
if




Re: refactoring basebackup.c

2022-03-11 Thread Robert Haas
On Thu, Mar 10, 2022 at 8:02 PM Justin Pryzby  wrote:
> I'm getting errors from pg_basebackup when using both -D- and 
> --compress=server-*
> The issue seems to go away if I use --no-manifest.
>
> $ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none 
> --compress=server-gzip >/dev/null ; echo $?
> pg_basebackup: error: tar member has empty name
> 1
>
> $ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none 
> --compress=server-gzip >/dev/null ; echo $?
> NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL 
> segments are copied through other means to complete the backup
> pg_basebackup: error: COPY stream ended before last file was finished
> 1

Thanks for the report. The problem here is that, when the output is
standard output (-D -), pg_basebackup can only produce a single output
file, so the manifest gets injected into the tar file on the client
side rather than being written separately as we do in normal cases.
However, that only works if we're receiving a tar file that we can
parse from the server, and here the server is sending a compressed
tarfile. The current code mistakely attempts to parse the compressed
tarfile as if it were an uncompressed tarfile, which causes the error
messages that you are seeing (and which I can also reproduce here). We
actually have enough infrastructure available in pg_basebackup now
that we could do the "right thing" in this case: decompress the data
received from the server, parse the resulting tar file, inject the
backup manifest, construct a new tar file, and recompress. However, I
think that's probably not a good idea, because it's unlikely that the
user will understand that the data is being compressed on the server,
then decompressed, and then recompressed again, and the performance of
the resulting pipeline will probably not be very good. So I think we
should just refuse this command. Patch for that attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


reject-compressed-inject.patch
Description: Binary data


Re: refactoring basebackup.c

2022-03-10 Thread Justin Pryzby
I'm getting errors from pg_basebackup when using both -D- and 
--compress=server-*
The issue seems to go away if I use --no-manifest.

$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none 
--compress=server-gzip >/dev/null ; echo $?
pg_basebackup: error: tar member has empty name
1

$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none 
--compress=server-gzip >/dev/null ; echo $?
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL 
segments are copied through other means to complete the backup
pg_basebackup: error: COPY stream ended before last file was finished
1




Re: refactoring basebackup.c

2022-03-08 Thread Jeevan Ladhe
ok got it. Thanks for your insights.

Regards,
Jeevan Ladhe

On Tue, 8 Mar 2022 at 22:23, Robert Haas  wrote:

> On Tue, Mar 8, 2022 at 11:32 AM Jeevan Ladhe 
> wrote:
> > I reviewed the patch, and it seems to be capturing and replacing all the
> > places of HAVE_LIB* with USE_* correctly.
> > Just curious, apart from consistency, do you see other problems as well
> > when testing one vs the other?
>
> So, the kind of problem you would worry about in a case like this is:
> suppose that configure detects LIBLZ4, but the user specifies
> --without-lz4. Then maybe there is some way for HAVE_LIBLZ4 to be
> true, while USE_LIBLZ4 is false, and therefore we should not be
> compiling code that uses LZ4 but do anyway. As configure.ac is
> currently coded, I think that's impossible, because we only search for
> liblz4 if the user says --with-lz4, and if they do that, then USE_LZ4
> will be set. Therefore, I don't think there is a live problem here,
> just an inconsistency.
>
> Probably still best to clean it up before an angry Andres chases me
> down, since I know he's working on the build system...
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: refactoring basebackup.c

2022-03-08 Thread Robert Haas
On Tue, Mar 8, 2022 at 11:32 AM Jeevan Ladhe  wrote:
> I reviewed the patch, and it seems to be capturing and replacing all the
> places of HAVE_LIB* with USE_* correctly.
> Just curious, apart from consistency, do you see other problems as well
> when testing one vs the other?

So, the kind of problem you would worry about in a case like this is:
suppose that configure detects LIBLZ4, but the user specifies
--without-lz4. Then maybe there is some way for HAVE_LIBLZ4 to be
true, while USE_LIBLZ4 is false, and therefore we should not be
compiling code that uses LZ4 but do anyway. As configure.ac is
currently coded, I think that's impossible, because we only search for
liblz4 if the user says --with-lz4, and if they do that, then USE_LZ4
will be set. Therefore, I don't think there is a live problem here,
just an inconsistency.

Probably still best to clean it up before an angry Andres chases me
down, since I know he's working on the build system...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-03-08 Thread Jeevan Ladhe
>
> OK, committed all that stuff.
>

Thanks for the commit Robert.


> I think we also need to fix one other thing. Right now, for LZ4
> support we test HAVE_LIBLZ4, but TOAST and XLOG compression are
> testing USE_LZ4, so I think we should be doing the same here. And
> similarly I think we should be testing USE_ZSTD not HAVE_LIBZSTD.
>

I reviewed the patch, and it seems to be capturing and replacing all the
places of HAVE_LIB* with USE_* correctly.
Just curious, apart from consistency, do you see other problems as well
when testing one vs the other?

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2022-03-08 Thread Robert Haas
On Tue, Mar 8, 2022 at 4:49 AM Jeevan Ladhe  wrote:
> I agree with your patch. The patch looks good to me.
> Yes, the LZ4 flush check should also be fixed. Please find the attached
> patch to fix the LZ4 code.

OK, committed all that stuff.

I think we also need to fix one other thing. Right now, for LZ4
support we test HAVE_LIBLZ4, but TOAST and XLOG compression are
testing USE_LZ4, so I think we should be doing the same here. And
similarly I think we should be testing USE_ZSTD not HAVE_LIBZSTD.

Patch for that attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


fix-symbol-tests.patch
Description: Binary data


Re: refactoring basebackup.c

2022-03-08 Thread Jeevan Ladhe
Hi Robert,

My proposed changes are largely cosmetic, but one thing that isn't is
> revising the size - pos <= bound tests to instead check size - pos <
> bound. My reasoning for that change is: if the number of bytes
> remaining in the buffer is exactly equal to the maximum number we can
> write, we don't need to flush it yet. If that sounds correct, we
> should fix the LZ4 code the same way.
>

I agree with your patch. The patch looks good to me.
Yes, the LZ4 flush check should also be fixed. Please find the attached
patch to fix the LZ4 code.

Regards,
Jeevan Ladhe


fix_lz4_flush_logic.patch
Description: Binary data


Re: refactoring basebackup.c

2022-03-07 Thread Robert Haas
On Wed, Feb 16, 2022 at 8:46 PM Jeevan Ladhe  wrote:
> Thanks for the comments Robert. I have addressed your comments in the
> attached patch v13-0002-ZSTD-add-server-side-compression-support.patch.
> Rest of the patches are similar to v12, but just bumped the version number.

OK, here's a consolidated patch with all your changes from 0002-0004
as 0001 plus a few proposed edits of my own in 0002. By and large I
think this is fine.

My proposed changes are largely cosmetic, but one thing that isn't is
revising the size - pos <= bound tests to instead check size - pos <
bound. My reasoning for that change is: if the number of bytes
remaining in the buffer is exactly equal to the maximum number we can
write, we don't need to flush it yet. If that sounds correct, we
should fix the LZ4 code the same way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v14-0002-My-changes.patch
Description: Binary data


v14-0001-Patches-from-JL.patch
Description: Binary data


walmethods.c is kind of a mess (was Re: refactoring basebackup.c)

2022-03-04 Thread Robert Haas
On Fri, Mar 4, 2022 at 3:32 AM Dipesh Pandit  wrote:
> GZIP manages to overcome this problem as it provides an option to turn on/off
> compression on the fly while writing a compressed archive with the help of 
> zlib
> library function deflateParams(). The current gzip implementation for
> CreateWalTarMethod uses this library function to turn off compression just 
> before
> step #1 and it writes the uncompressed header of size equal to TAR_BLOCK_SIZE.
> It uses the same library function to turn on the compression for writing the 
> contents
> of the WAL file as part of step #2. It again turns off the compression just 
> before step
> #3 to overwrite the header. The header is overwritten at the same offset with 
> size
> equal to TAR_BLOCK_SIZE.

This is a real mess. To me, it seems like a pretty big hack to use
deflateParams() to shut off compression in the middle of the
compressed data stream so that we can go back and overwrite that part
of the data later. It appears that the only reason we need that hack
is because we don't know the file size starting out. Except we kind of
do know the size, because pad_to_size specifies a minimum size for the
file. It's true that the maximum file size is unbounded, but I'm not
sure why that's important. I wonder if anyone else has an idea why we
didn't just set the file size to pad_to_size exactly when we write the
tar header the first time, instead of this IMHO kind of nutty approach
where we back up. I'd try to figure it out from the comments, but
there basically aren't any. I also had a look at the relevant commit
messages and didn't see anything relevant there either. If I'm missing
something, please point it out.

While I'm complaining, I noticed while looking at this code that it is
documented that "The caller must ensure that only one method is
instantiated in any given program, and that it's only instantiated
once!" As far as I can see, this is because somebody thought about
putting all of the relevant data into a struct and then decided on an
alternative strategy of storing some of it there, and the rest in a
global variable. I can't quite imagine why anyone would think that was
a good idea. There may be some reason that I can't see right now, but
here again there appear to be no relevant code comments.

I'm somewhat inclined to wonder whether we could just get rid of
walmethods.c entirely and use the new bbstreamer stuff instead. That
code also knows how to write plain files into a directory, and write
tar archives, and compress stuff, but in my totally biased opinion as
the author of most of that code, it's better code. It has no
restriction on using at most one method per program, or of
instantiating that method only once, and it already has LZ4 support,
and there's a pending patch for ZSTD support that I intend to get
committed soon as well. It also has, and I know I might be beating a
dead horse here, comments. Now, admittedly, it does need to know the
size of each archive member up front in order to work, so if we can't
solve the problem then we can't go this route. But if we can't solve
that problem, then we also can't add LZ4 and ZSTD support to
walmethods.c, because random access to compressed data is not really a
thing, even if we hacked it to work for gzip.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-03-04 Thread Dipesh Pandit
Hi,

> > It will be good if we can also fix
> > CreateWalTarMethod to support LZ4 and ZSTD.
> Ok we will see, either Dipesh or I will take care of it.

I took a look at the CreateWalTarMethod to support LZ4 compression
for WAL files. The current implementation involves a 3 step to backup
a WAL file to a tar archive. For each file:

   1. It first writes the header in the function tar_open_for_write,
   flushes the contents of tar to disk and stores the header offset.
   2.  Next, the contents of WAL are written to the tar archive.
   3. In the end, it recalculates the checksum in function tar_close() and
   overwrites the header at an offset stored in step #1.

The need for overwriting header in CreateWalTarMethod is mainly related to
partial WAL files where the size of the WAL file < WalSegSize. The file is
being
padded and checksum is recalculated after adding pad bytes.

If we go ahead and implement LZ4 support for CreateWalTarMethod then
we have a problem here at step #3. In order to achieve better compression
ratio, compressed LZ4 blocks are linked to each other and these blocks
are decoded sequentially. If we overwrite the header as part of step #3 then
it corrupts the link between compressed LZ4 blocks. Although LZ4 provides
an option to write the compressed block independently (using blockMode
option set to LZ4F_blockIndepedent) but it is still a problem because we
don't
know if overwriting the header after recalculating the checksum will not
overlap
the boundary of the next block.

GZIP manages to overcome this problem as it provides an option to turn
on/off
compression on the fly while writing a compressed archive with the help of
zlib
library function deflateParams(). The current gzip implementation for
CreateWalTarMethod uses this library function to turn off compression just
before
step #1 and it writes the uncompressed header of size equal to
TAR_BLOCK_SIZE.
It uses the same library function to turn on the compression for writing
the contents
of the WAL file as part of step #2. It again turns off the compression just
before step
#3 to overwrite the header. The header is overwritten at the same offset
with size
equal to TAR_BLOCK_SIZE.

Since GZIP provides this option to enable/disable compression, it is
possible to
control the size of data we are writing to a compressed archive. Even if we
overwrite
an already written block in a compressed archive there is no risk of it
overlapping
with the boundary of the next block. This mechanism is not available in LZ4
and ZSTD.

In order to support LZ4 and ZSTD compression for CreateWalTarMethod we may
need to refactor this code unless I am missing something. We need to
somehow
add the padding bytes in case of partial WAL before we send it to the
compressed
archive. This will make sure that all files which are being compressed does
not
require any padding as the size is always equal to WalSegSize. There is no
need to
recalculate the checksum and we can avoid overwriting the header as part of
step #3.

Thoughts?

Thanks,
Dipesh


Re: refactoring basebackup.c

2022-02-16 Thread Jeevan Ladhe
Thanks for the comments Robert. I have addressed your comments in the
attached patch v13-0002-ZSTD-add-server-side-compression-support.patch.
Rest of the patches are similar to v12, but just bumped the version number.

> It will be good if we can also fix
> CreateWalTarMethod to support LZ4 and ZSTD.
Ok we will see, either Dipesh or I will take care of it.

Regards,
Jeevan Ladhe


On Thu, 17 Feb 2022 at 02:37, Robert Haas  wrote:

> On Wed, Feb 16, 2022 at 12:46 PM Jeevan Ladhe 
> wrote:
> > So, I went ahead and have now also implemented client side decompression
> > for zstd.
> >
> > Robert separated[1] the ZSTD configure switch from my original patch
> > of server side compression and also added documentation related to
> > the switch. I have included that patch here in the patch series for
> > simplicity.
> >
> > The server side compression patch
> > 0002-ZSTD-add-server-side-compression-support.patch has also taken care
> > of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
> > as suggested by Álvaro Herrera.
>
> The first hunk of the documentation changes is missing a comma between
> gzip and lz4.
>
> + * At the start of each archive we reset the state to start a new
> + * compression operation. The parameters are sticky and they would
> stick
> + * around as we are resetting with option ZSTD_reset_session_only.
>
> I don't think "would" is what you mean here. If you say something
> would stick around, that means it could be that way it isn't. ("I
> would go to the store and buy some apples, but I know they don't have
> any so there's no point.") I think you mean "will".
>
> -printf(_("  -Z,
> --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
> - " compress tar output with given
> compression method or level\n"));
> +printf(_("  -Z,
> --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"));
> +printf(_("  -Z, --compress=none\n"));
>
> You deleted a line that you should have preserved here.
>
> Overall there doesn't seem to be much to complain about here on a
> first read-through. It will be good if we can also fix
> CreateWalTarMethod to support LZ4 and ZSTD.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


v13-0004-ZSTD-add-client-side-decompression-support.patch
Description: Binary data


v13-0001-Add-support-for-building-with-ZSTD.patch
Description: Binary data


v13-0002-ZSTD-add-server-side-compression-support.patch
Description: Binary data


v13-0003-ZSTD-add-client-side-compression-support.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-16 Thread Robert Haas
On Wed, Feb 16, 2022 at 12:46 PM Jeevan Ladhe  wrote:
> So, I went ahead and have now also implemented client side decompression
> for zstd.
>
> Robert separated[1] the ZSTD configure switch from my original patch
> of server side compression and also added documentation related to
> the switch. I have included that patch here in the patch series for
> simplicity.
>
> The server side compression patch
> 0002-ZSTD-add-server-side-compression-support.patch has also taken care
> of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
> as suggested by Álvaro Herrera.

The first hunk of the documentation changes is missing a comma between
gzip and lz4.

+ * At the start of each archive we reset the state to start a new
+ * compression operation. The parameters are sticky and they would stick
+ * around as we are resetting with option ZSTD_reset_session_only.

I don't think "would" is what you mean here. If you say something
would stick around, that means it could be that way it isn't. ("I
would go to the store and buy some apples, but I know they don't have
any so there's no point.") I think you mean "will".

-printf(_("  -Z,
--compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
- " compress tar output with given
compression method or level\n"));
+printf(_("  -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"));
+printf(_("  -Z, --compress=none\n"));

You deleted a line that you should have preserved here.

Overall there doesn't seem to be much to complain about here on a
first read-through. It will be good if we can also fix
CreateWalTarMethod to support LZ4 and ZSTD.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-16 Thread Jeevan Ladhe
Hi Everyone,

So, I went ahead and have now also implemented client side decompression
for zstd.

Robert separated[1] the ZSTD configure switch from my original patch
of server side compression and also added documentation related to
the switch. I have included that patch here in the patch series for
simplicity.

The server side compression patch
0002-ZSTD-add-server-side-compression-support.patch has also taken care
of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
as suggested by Álvaro Herrera.

[1]
https://www.postgresql.org/message-id/CA%2BTgmobRisF-9ocqYDcMng6iSijGj1EZX99PgXA%3D3VVbWuahog%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/20220215175944.GY31460%40telsasoft.com

Regards,
Jeevan Ladhe

On Wed, 16 Feb 2022 at 21:46, Robert Haas  wrote:

> On Wed, Feb 16, 2022 at 11:11 AM Alvaro Herrera 
> wrote:
> > This is hard to interpret for humans though because of the nested
> > brackets and braces.  It gets considerably easier if you split it in
> > separate variants:
> >
> >-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
> >-Z, --compress=LEVEL
> >-Z, --compress=none
> >  compress tar output with given compression
> method or level
> >
> >
> > or, if you choose to leave the level-only variant undocumented, then
> >
> >-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
> >-Z, --compress=none
> >  compress tar output with given compression
> method or level
> >
> > There still are some nested brackets and braces, but the scope is
> > reduced enough that interpreting seems quite a bit simpler.
>
> I could go for that. I'm also just noticing that "none" is not really
> a compression method or level, and the statement that it can only
> compress "tar" output is no longer correct, because server-side
> compression can be used together with -Fp. So maybe we should change
> the sentence afterward to something a bit more generic, like "specify
> whether and how to compress the backup".
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


v12-0003-ZSTD-add-client-side-compression-support.patch
Description: Binary data


v2-0001-Add-support-for-building-with-ZSTD.patch
Description: Binary data


v12-0004-ZSTD-add-client-side-decompression-support.patch
Description: Binary data


v12-0002-ZSTD-add-server-side-compression-support.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-16 Thread Robert Haas
On Wed, Feb 16, 2022 at 11:11 AM Alvaro Herrera  wrote:
> This is hard to interpret for humans though because of the nested
> brackets and braces.  It gets considerably easier if you split it in
> separate variants:
>
>-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
>-Z, --compress=LEVEL
>-Z, --compress=none
>  compress tar output with given compression method or 
> level
>
>
> or, if you choose to leave the level-only variant undocumented, then
>
>-Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
>-Z, --compress=none
>  compress tar output with given compression method or 
> level
>
> There still are some nested brackets and braces, but the scope is
> reduced enough that interpreting seems quite a bit simpler.

I could go for that. I'm also just noticing that "none" is not really
a compression method or level, and the statement that it can only
compress "tar" output is no longer correct, because server-side
compression can be used together with -Fp. So maybe we should change
the sentence afterward to something a bit more generic, like "specify
whether and how to compress the backup".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-16 Thread Alvaro Herrera
On 2022-Feb-14, Robert Haas wrote:

> A more consistent way of writing the supported syntax would be like this:
> 
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}
> 
> I would be somewhat inclined to leave the level-only variant
> undocumented and instead write it like this:
> 
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

This is hard to interpret for humans though because of the nested
brackets and braces.  It gets considerably easier if you split it in
separate variants:

   -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
   -Z, --compress=LEVEL
   -Z, --compress=none
 compress tar output with given compression method or 
level


or, if you choose to leave the level-only variant undocumented, then

   -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
   -Z, --compress=none
 compress tar output with given compression method or 
level

There still are some nested brackets and braces, but the scope is
reduced enough that interpreting seems quite a bit simpler.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: refactoring basebackup.c (zstd)

2022-02-16 Thread Robert Haas
On Tue, Feb 15, 2022 at 12:59 PM Justin Pryzby  wrote:
> There's superfluous changes to ./configure unrelated to the changes in
> configure.ac.  Probably because you're using a different version of autotools,
> or a vendor's patched copy.  You can remove the changes with git checkout -p 
> or
> similar.

I noticed this already and fixed it in the version of the patch I
posted on the other thread.

> +++ b/src/backend/replication/basebackup_zstd.c
> +bbsink *
> +bbsink_zstd_new(bbsink *next, int compresslevel)
> +{
> +#ifndef HAVE_LIBZSTD
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("zstd compression is not supported by this 
> build")));
> +#else
>
> This should have an return; like what's added by 71ce8 and 302612a6c.
> Also, the parens() around errcode aren't needed since last year.

The parens are still acceptable style, though. The return I guess is needed.

> +   bbsink_zstd *sink;
> +
> +   Assert(next != NULL);
> +   Assert(compresslevel >= 0 && compresslevel <= 22);
> +
> +   if (compresslevel < 0 || compresslevel > 22)
> +   ereport(ERROR,
>
> This looks like dead code in assert builds.
> If it's unreachable, it can be elog().

Actually, the right thing to do here is remove the assert, I think. I
don't believe that the code is unreachable. If I'm wrong and it is
unreachable then the test-and-ereport should be removed.

> + * Compress the input data to the output buffer until we run out of input
> + * data. Each time the output buffer falls below the compression bound for
> + * the input buffer, invoke the archive_contents() method for then next sink.
>
> *the next sink ?

Yeah.

> Does anyone plan to include this for pg15 ?  If so, I think at least the WAL
> compression should have support added too.  I'd plan to rebase Michael's 
> patch.
> https://www.postgresql.org/message-id/ynqwd2gsmrnqw...@paquier.xyz

Yes, I'd like to get this into PG15. It's very similar to the LZ4
compression support which was already committed, so it feels like
finishing it up and including it in the release makes a lot of sense.
I'm not against the idea of using ZSTD in other places where it makes
sense as well, but I think that's a separate issue from this patch. As
far as I'm concerned, either basebackup compression with ZSTD or WAL
compression with ZSTD could be committed even if the other is not, and
I plan to spend my time on this project, not that project. However, if
you're saying you want to work on the WAL compression stuff, I've got
no objection to that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-15 Thread Jeevan Ladhe
Thanks Tushar for the testing.

I further worked on ZSTD and now have implemented client side
compression as well. Attached are the patches for both server-side and
client-side compression.

The patch 0001 is a server-side patch, and has not changed since the
last patch version - v10, but, just bumping the version number.

Patch 0002 is the client-side compression patch.

Regards,
Jeevan Ladhe

On Tue, 15 Feb 2022 at 22:24, tushar  wrote:

> On 2/15/22 6:48 PM, Jeevan Ladhe wrote:
> > Please find the attached updated version of patch for ZSTD server side
> Thanks, Jeevan, I again tested with the attached patch, and as mentioned
> the crash is fixed now.
>
> also, I tested with different labels with gzip V/s zstd against data
> directory size which is 29GB and found these results
>
> 
> ./pg_basebackup  -t server:/tmp/
> --compress=server-zstd:  -Xnone -n -N --no-estimate-size -v
>
> --compress=server-zstd:1 =  compress directory size is  1.3GB
> --compress=server-zstd:4 = compress  directory size is  1.3GB
> --compress=server-zstd:7 = compress  directory size is  1.2GB
> --compress=server-zstd:12 = compress directory size is 1.2GB
> 
>
> ===
> ./pg_basebackup  -t server:/tmp/
> --compress=server-gzip:  -Xnone -n -N --no-estimate-size -v
>
> --compress=server-gzip:1 =  compress directory size is  1.8GB
> --compress=server-gzip:4 = compress  directory size is  1.6GB
> --compress=server-gzip:9 = compress  directory size is  1.6GB
> ===
>
> --
> regards,tushar
> EnterpriseDB  https://www.enterprisedb.com/
> The Enterprise PostgreSQL Company
>
>


v11-0001-Add-a-ZSTD-compression-method-for-server-side-compre.patch
Description: Binary data


v11-0002-ZSTD-CLIENT-compression-WIP-working.patch
Description: Binary data


Re: refactoring basebackup.c (zstd)

2022-02-15 Thread Justin Pryzby
+++ b/configure

@@ -801,6 +805,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir

There's superfluous changes to ./configure unrelated to the changes in
configure.ac.  Probably because you're using a different version of autotools,
or a vendor's patched copy.  You can remove the changes with git checkout -p or
similar.

+++ b/src/backend/replication/basebackup_zstd.c
+bbsink *
+bbsink_zstd_new(bbsink *next, int compresslevel)
+{
+#ifndef HAVE_LIBZSTD
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("zstd compression is not supported by this 
build")));
+#else

This should have an return; like what's added by 71ce8 and 302612a6c.
Also, the parens() around errcode aren't needed since last year.

+   bbsink_zstd *sink;
+
+   Assert(next != NULL);
+   Assert(compresslevel >= 0 && compresslevel <= 22);
+
+   if (compresslevel < 0 || compresslevel > 22)
+   ereport(ERROR,

This looks like dead code in assert builds.
If it's unreachable, it can be elog().

+ * Compress the input data to the output buffer until we run out of input
+ * data. Each time the output buffer falls below the compression bound for
+ * the input buffer, invoke the archive_contents() method for then next sink.

*the next sink ?

Does anyone plan to include this for pg15 ?  If so, I think at least the WAL
compression should have support added too.  I'd plan to rebase Michael's patch.
https://www.postgresql.org/message-id/ynqwd2gsmrnqw...@paquier.xyz

-- 
Justin




Re: refactoring basebackup.c

2022-02-15 Thread tushar

On 2/15/22 6:48 PM, Jeevan Ladhe wrote:

Please find the attached updated version of patch for ZSTD server side
Thanks, Jeevan, I again tested with the attached patch, and as mentioned 
the crash is fixed now.


also, I tested with different labels with gzip V/s zstd against data 
directory size which is 29GB and found these results



./pg_basebackup  -t server:/tmp/ 
--compress=server-zstd:  -Xnone -n -N --no-estimate-size -v


--compress=server-zstd:1 =  compress directory size is  1.3GB
--compress=server-zstd:4 = compress  directory size is  1.3GB
--compress=server-zstd:7 = compress  directory size is  1.2GB
--compress=server-zstd:12 = compress directory size is 1.2GB


===
./pg_basebackup  -t server:/tmp/ 
--compress=server-gzip:  -Xnone -n -N --no-estimate-size -v


--compress=server-gzip:1 =  compress directory size is  1.8GB
--compress=server-gzip:4 = compress  directory size is  1.6GB
--compress=server-gzip:9 = compress  directory size is  1.6GB
===

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: refactoring basebackup.c

2022-02-15 Thread Robert Haas
On Wed, Feb 9, 2022 at 8:41 AM Abhijit Menon-Sen  wrote:
> It took me a while to assimilate these patches, including the backup
> targets one, which I hadn't looked at before. Now that I've wrapped my
> head around how to put the pieces together, I really like the idea. As
> you say, writing non-trivial integrations in C will take some effort,
> but it seems worthwhile. It's also nice that one can continue to use
> pg_basebackup to trigger the backups and see progress information.

Cool. Thanks for having a look.

> Yes, it looks simple to follow the example set by basebackup_to_shell to
> write a custom target. The complexity will be in whatever we need to do
> to store/forward the backup data, rather than in obtaining the data in
> the first place, which is exactly as it should be.

Yeah, that's what made me really happy with how this came out.

Here's v2, rebased and with documentation added.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Allow-extensions-to-add-new-backup-targets.patch
Description: Binary data


v2-0002-Add-basebackup_to_shell-contrib-module.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-15 Thread Jeevan Ladhe
Hi,

Please find the attached updated version of patch for ZSTD server side
compression.

This patch has following changes:

- Fixes the issue Tushar reported[1].
- Adds a tap test.
- Makes document changes related to zstd.
- Updates pg_basebackup help for pg_basebackup. Here I have chosen the
suggestion by Robert upthread (as given below):

>> I would be somewhat inclined to leave the level-only variant
>> undocumented and instead write it like this:
>>  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

- pg_indent on basebackup_zstd.c.

Thanks Tushar, for offline help for testing the patch.

[1]
https://www.postgresql.org/message-id/6c3f1558-1e56-9946-78a2-c59340da1dbf%40enterprisedb.com

Regards,
Jeevan Ladhe

On Mon, 14 Feb 2022 at 21:30, Robert Haas  wrote:

> On Sat, Feb 12, 2022 at 1:01 AM Shinoda, Noriyoshi (PN Japan FSIP)
>  wrote:
> > Thank you for developing a great feature.
> > The current help message shown below does not seem to be able to specify
> the 'client-' or 'server-' for lz4 compression.
> >  --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]
> >
> > The attached small patch fixes the help message as follows:
> >  --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]
>
> Hmm. After studying this a bit more closely, I think this might
> actually need a bit more revision than what you propose here. In most
> places, we use vertical bars to separate alternatives:
>
>   -X, --wal-method=none|fetch|stream
>
> But here, we're using commas in some places and the word "or" in one
> case as well:
>
>   -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]
>
> We're also not consistently using braces for grouping, which makes the
> order of operations a bit unclear, and it makes no sense to put
> brackets around LEVEL when it's the only thing that's part of that
> alternative.
>
> A more consistent way of writing the supported syntax would be like this:
>
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}
>
> I would be somewhat inclined to leave the level-only variant
> undocumented and instead write it like this:
>
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


v10-0001-Add-a-ZSTD-compression-method-for-server-side-compre.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-14 Thread Robert Haas
On Sat, Feb 12, 2022 at 1:01 AM Shinoda, Noriyoshi (PN Japan FSIP)
 wrote:
> Thank you for developing a great feature.
> The current help message shown below does not seem to be able to specify the 
> 'client-' or 'server-' for lz4 compression.
>  --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]
>
> The attached small patch fixes the help message as follows:
>  --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]

Hmm. After studying this a bit more closely, I think this might
actually need a bit more revision than what you propose here. In most
places, we use vertical bars to separate alternatives:

  -X, --wal-method=none|fetch|stream

But here, we're using commas in some places and the word "or" in one
case as well:

  -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]

We're also not consistently using braces for grouping, which makes the
order of operations a bit unclear, and it makes no sense to put
brackets around LEVEL when it's the only thing that's part of that
alternative.

A more consistent way of writing the supported syntax would be like this:

  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}

I would be somewhat inclined to leave the level-only variant
undocumented and instead write it like this:

  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-12 Thread Andres Freund
Hi,

On 2022-02-12 15:12:21 -0600, Justin Pryzby wrote:
> I think they would've been visible in the CI environment, too.

Yea, but only if you looked carefully enough. The postgres github repo has CI
enabled, and it's green. But the windows build step does show the warnings:

https://cirrus-ci.com/task/6185407539838976?logs=build#L2066
https://cirrus-ci.com/github/postgres/postgres/

[19:08:09.086] c:\cirrus\src\backend\replication\basebackup_lz4.c(87): warning 
C4715: 'bbsink_lz4_new': not all control paths return a value 
[c:\cirrus\postgres.vcxproj]

Probably worth scripting something to make the windows task error out if there
had been warnings, but only after running the tests.

Greetings,

Andres Freund




Re: refactoring basebackup.c

2022-02-12 Thread Justin Pryzby
The LZ4 patches caused new compiler warnings.
It's the same issue that was fixed at 71ce8 for gzip.
I think they would've been visible in the CI environment, too.

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=wrasse=2022-02-12%2005%3A08%3A48=make
"/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/../pgsql/src/backend/replication/basebackup_lz4.c",
 line 87: warning: Function has no return statement : bbsink_lz4_new

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=bowerbird=2022-02-12%2013%3A11%3A20=make
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hamerkop=2022-02-12%2010%3A04%3A08=make
warning C4715: 'bbsink_lz4_new': not all control paths return a value

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=anole=2022-02-12%2005%3A46%3A44=make
"basebackup_lz4.c", line 87: warning #2940-D: missing return statement at end 
of non-void function "bbsink_lz4_new"

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=wrasse=2022-02-12%2005%3A08%3A48=make
"/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/../pgsql/src/backend/replication/basebackup_lz4.c",
 line 87: warning: Function has no return statement : bbsink_lz4_new




RE: refactoring basebackup.c

2022-02-11 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi, Hackers.
Thank you for developing a great feature.
The current help message shown below does not seem to be able to specify the 
'client-' or 'server-' for lz4 compression.
 --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]

The attached small patch fixes the help message as follows:
 --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]

Regards,
Noriyoshi Shinoda
-Original Message-
From: Robert Haas  
Sent: Saturday, February 12, 2022 12:50 AM
To: Justin Pryzby 
Cc: Jeevan Ladhe ; Dipesh Pandit 
; Abhijit Menon-Sen ; Dmitry Dolgov 
<9erthali...@gmail.com>; Jeevan Ladhe ; Mark 
Dilger ; pgsql-hack...@postgresql.org; tushar 

Subject: Re: refactoring basebackup.c

On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby  wrote:
> FYI: there's a couple typos in the last 2 patches.

Hmm. OK. But I don't consider "can be optionally specified" incorrect or worse 
than "can optionally be specified".

I do agree that spelling words correctly is a good idea.

--
Robert Haas
EDB: http://www.enterprisedb.com 




pg_basebackup_help_v1.diff
Description: pg_basebackup_help_v1.diff


Re: refactoring basebackup.c

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby  wrote:
> FYI: there's a couple typos in the last 2 patches.

Hmm. OK. But I don't consider "can be optionally specified" incorrect
or worse than "can optionally be specified".

I do agree that spelling words correctly is a good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-11 Thread Justin Pryzby
On Fri, Feb 11, 2022 at 08:35:25PM +0530, Jeevan Ladhe wrote:
> Thanks Robert for the bravity :-)

FYI: there's a couple typos in the last 2 patches.

I added them to my typos branch; feel free to wait until April if you'd prefer
to see them fixed in bulk.

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 53aa40dcd19..649b91208f3 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -419,7 +419,7 @@ PostgreSQL documentation

 The compression method can be set to gzip or
 lz4, or none for no
-compression. A compression level can be optionally specified, by
+compression. A compression level can optionally be specified, by
 appending the level number after a colon (:). If no
 level is specified, the default compression level will be used. If
 only a level is specified without mentioning an algorithm,
@@ -440,7 +440,7 @@ PostgreSQL documentation
 -Xstream, pg_wal.tar will
 be compressed using gzip if client-side gzip
 compression is selected, but will not be compressed if server-side
-compresion or LZ4 compresion is selected.
+compression or LZ4 compression is selected.

   
  




Re: refactoring basebackup.c

2022-02-11 Thread Jeevan Ladhe
Thanks Robert for the bravity :-)

Regards,
Jeevan Ladhe


On Fri, 11 Feb 2022, 20:31 Robert Haas,  wrote:

> On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit 
> wrote:
> > > Sure, please find the rebased patch attached.
> >
> > Thanks, I have validated v2 patch on top of rebased patch.
>
> I'm still feeling brave, so I committed this too after fixing a few
> things. In the process I noticed that we don't have support for LZ4
> compression of streamed WAL (cf. CreateWalTarMethod). It would be good
> to fix that. I'm not quite sure whether
>
> http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
> is basically what we need or whether something else is required.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: refactoring basebackup.c

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit  wrote:
> > Sure, please find the rebased patch attached.
>
> Thanks, I have validated v2 patch on top of rebased patch.

I'm still feeling brave, so I committed this too after fixing a few
things. In the process I noticed that we don't have support for LZ4
compression of streamed WAL (cf. CreateWalTarMethod). It would be good
to fix that. I'm not quite sure whether
http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
is basically what we need or whether something else is required.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-11 Thread Robert Haas
On Fri, Feb 11, 2022 at 5:58 AM Jeevan Ladhe  wrote:
> >Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase.
>
> Sure, please find the rebased patch attached.

It's Friday today, but I'm feeling brave, and it's still morning here,
so ... committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-11 Thread Dipesh Pandit
> Sure, please find the rebased patch attached.

Thanks, I have validated v2 patch on top of rebased patch.

Thanks,
Dipesh


Re: refactoring basebackup.c

2022-02-11 Thread Jeevan Ladhe
>Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase.

Sure, please find the rebased patch attached.

Regards,
Jeevan

On Fri, 11 Feb 2022 at 14:13, Dipesh Pandit  wrote:

> Hi,
>
> Thanks for the feedback, I have incorporated the suggestions
> and updated a new patch. PFA v2 patch.
>
> > I think similar to bbstreamer_lz4_compressor_content() in
> > bbstreamer_lz4_decompressor_content() we can change len to avail_in.
>
> In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
> based on the number of bytes decompressed in each iteration. I think
> we cannot replace it with "len" here.
>
> Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase. I have applied it on commit
> 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
> to validate my v2 patch.
>
> Thanks,
> Dipesh
>


v13-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-02-11 Thread Dipesh Pandit
Hi,

Thanks for the feedback, I have incorporated the suggestions
and updated a new patch. PFA v2 patch.

> I think similar to bbstreamer_lz4_compressor_content() in
> bbstreamer_lz4_decompressor_content() we can change len to avail_in.

In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
based on the number of bytes decompressed in each iteration. I think
we cannot replace it with "len" here.

Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase. I have applied it on commit 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
to validate my v2 patch.

Thanks,
Dipesh
From 47a0ef4348747ffa61eccd7954e00f3cf5fc7222 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Thu, 3 Feb 2022 18:31:03 +0530
Subject: [PATCH] support client side compression and decompression using LZ4

---
 src/bin/pg_basebackup/Makefile|   1 +
 src/bin/pg_basebackup/bbstreamer.h|   3 +
 src/bin/pg_basebackup/bbstreamer_lz4.c| 431 ++
 src/bin/pg_basebackup/pg_basebackup.c |  32 +-
 src/bin/pg_verifybackup/t/009_extract.pl  |   7 +-
 src/bin/pg_verifybackup/t/010_client_untar.pl | 111 +++
 src/tools/msvc/Mkvcbuild.pm   |   1 +
 7 files changed, 580 insertions(+), 6 deletions(-)
 create mode 100644 src/bin/pg_basebackup/bbstreamer_lz4.c
 create mode 100644 src/bin/pg_verifybackup/t/010_client_untar.pl

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index ada3a5a..1d0db4f 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -43,6 +43,7 @@ BBOBJS = \
 	bbstreamer_file.o \
 	bbstreamer_gzip.o \
 	bbstreamer_inject.o \
+	bbstreamer_lz4.o \
 	bbstreamer_tar.o
 
 all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fe49ae3..c2de77b 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -206,6 +206,9 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
 			void (*report_output_file) (const char *));
 
 extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
+ int compresslevel);
+extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
new file mode 100644
index 000..f0bc226
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -0,0 +1,431 @@
+/*-
+ *
+ * bbstreamer_lz4.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_basebackup/bbstreamer_lz4.c
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include 
+
+#ifdef HAVE_LIBLZ4
+#include 
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBLZ4
+typedef struct bbstreamer_lz4_frame
+{
+	bbstreamer	base;
+
+	LZ4F_compressionContext_t	cctx;
+	LZ4F_decompressionContext_t	dctx;
+	LZ4F_preferences_t			prefs;
+
+	size_t		bytes_written;
+	bool		header_written;
+} bbstreamer_lz4_frame;
+
+static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+			  bbstreamer_member *member,
+			  const char *data, int len,
+			  bbstreamer_archive_context context);
+static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
+	.content = bbstreamer_lz4_compressor_content,
+	.finalize = bbstreamer_lz4_compressor_finalize,
+	.free = bbstreamer_lz4_compressor_free
+};
+
+static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+bbstreamer_member *member,
+const char *data, int len,
+bbstreamer_archive_context context);
+static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
+	.content = bbstreamer_lz4_decompressor_content,
+	.finalize = bbstreamer_lz4_decompressor_finalize,
+	.free = bbstreamer_lz4_decompressor_free
+};
+#endif
+
+/*
+ * Create a new base backup streamer that performs lz4 compression of tar
+ * blocks.
+ */
+bbstreamer *
+bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel)
+{
+#ifdef HAVE_LIBLZ4
+	bbstreamer_lz4_frame   *streamer;
+	LZ4F_errorCode_t		ctxError;
+	LZ4F_preferences_t	   *prefs;
+	size_t	compressed_bound;
+
+	

Re: refactoring basebackup.c

2022-02-10 Thread Jeevan Ladhe
Thanks for the patch, Dipesh.
With a quick look at the patch I have following observations:

--
In bbstreamer_lz4_compressor_new(), I think this alignment is not needed
on client side:

/* Align the output buffer length. */
compressed_bound += compressed_bound + BLCKSZ - (compressed_bound %
BLCKSZ);
--

bbstreamer_lz4_compressor_content(), avail_in and len variables both are
not changed. I think we can simply change the len to avail_in in the
argument list.
--

Comment:
+* Update the offset and capacity of output buffer based on based
on number
+* of bytes written to output buffer.

I think it is thinko:

+* Update the offset and capacity of output buffer based on number
of
+* bytes written to output buffer.
--

Indentation:

+   if ((mystreamer->base.bbs_buffer.maxlen -
mystreamer->bytes_written) <=
+   footer_bound)

--
I think similar to bbstreamer_lz4_compressor_content() in
bbstreamer_lz4_decompressor_content() we can change len to avail_in.


Regards,
Jeevan Ladhe

On Thu, 10 Feb 2022 at 18:11, Dipesh Pandit  wrote:

> Hi,
>
> > On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <
> jeevan.la...@enterprisedb.com> wrote:
>
>> Hi Robert,
>>
>> I had an offline discussion with Dipesh, and he will be working on the
>> lz4 client side decompression part.
>>
>
> Please find the attached patch to support client side compression
> and decompression using lz4.
>
> Added a new lz4 bbstreamer to compress the archive chunks at
> client if user has specified --compress=clinet-lz4:[LEVEL] option
> in pg_basebackup. The new streamer accepts archive chunks
> compresses it and forwards it to plain-writer.
>
> Similarly, If a user has specified a server compressed lz4 archive
> with plain format (-F p) backup then it requires decompressing
> the compressed archive chunks before forwarding it to tar extractor.
> Added a new bbstreamer to decompress the compressed archive
> and forward it to tar extractor.
>
> Note: This patch can be applied on Jeevan Ladhe's v12 patch
> for lz4 compression.
>
> Thanks,
> Dipesh
>


Re: refactoring basebackup.c

2022-02-10 Thread Dipesh Pandit
Hi,

> On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <
jeevan.la...@enterprisedb.com> wrote:

> Hi Robert,
>
> I had an offline discussion with Dipesh, and he will be working on the
> lz4 client side decompression part.
>

Please find the attached patch to support client side compression
and decompression using lz4.

Added a new lz4 bbstreamer to compress the archive chunks at
client if user has specified --compress=clinet-lz4:[LEVEL] option
in pg_basebackup. The new streamer accepts archive chunks
compresses it and forwards it to plain-writer.

Similarly, If a user has specified a server compressed lz4 archive
with plain format (-F p) backup then it requires decompressing
the compressed archive chunks before forwarding it to tar extractor.
Added a new bbstreamer to decompress the compressed archive
and forward it to tar extractor.

Note: This patch can be applied on Jeevan Ladhe's v12 patch
for lz4 compression.

Thanks,
Dipesh
From 67e47579e119897c66e6f5f7a5e5e9542399072f Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Thu, 3 Feb 2022 18:31:03 +0530
Subject: [PATCH] support client side compression and decompression using LZ4

---
 src/bin/pg_basebackup/Makefile|   1 +
 src/bin/pg_basebackup/bbstreamer.h|   3 +
 src/bin/pg_basebackup/bbstreamer_lz4.c| 436 ++
 src/bin/pg_basebackup/pg_basebackup.c |  32 +-
 src/bin/pg_verifybackup/t/009_extract.pl  |   7 +-
 src/bin/pg_verifybackup/t/010_client_untar.pl | 111 +++
 src/tools/msvc/Mkvcbuild.pm   |   1 +
 7 files changed, 585 insertions(+), 6 deletions(-)
 create mode 100644 src/bin/pg_basebackup/bbstreamer_lz4.c
 create mode 100644 src/bin/pg_verifybackup/t/010_client_untar.pl

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index ada3a5a..1d0db4f 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -43,6 +43,7 @@ BBOBJS = \
 	bbstreamer_file.o \
 	bbstreamer_gzip.o \
 	bbstreamer_inject.o \
+	bbstreamer_lz4.o \
 	bbstreamer_tar.o
 
 all: pg_basebackup pg_receivewal pg_recvlogical
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fe49ae3..c2de77b 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -206,6 +206,9 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
 			void (*report_output_file) (const char *));
 
 extern bbstreamer *bbstreamer_gzip_decompressor_new(bbstreamer *next);
+extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next,
+ int compresslevel);
+extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c
new file mode 100644
index 000..9055a23
--- /dev/null
+++ b/src/bin/pg_basebackup/bbstreamer_lz4.c
@@ -0,0 +1,436 @@
+/*-
+ *
+ * bbstreamer_lz4.c
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_basebackup/bbstreamer_lz4.c
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include 
+
+#ifdef HAVE_LIBLZ4
+#include 
+#endif
+
+#include "bbstreamer.h"
+#include "common/logging.h"
+#include "common/file_perm.h"
+#include "common/string.h"
+
+#ifdef HAVE_LIBLZ4
+typedef struct bbstreamer_lz4_frame
+{
+	bbstreamer	base;
+
+	LZ4F_compressionContext_t	cctx;
+	LZ4F_decompressionContext_t	dctx;
+	LZ4F_preferences_t			prefs;
+
+	size_t		bytes_written;
+	bool		header_written;
+} bbstreamer_lz4_frame;
+
+static void bbstreamer_lz4_compressor_content(bbstreamer *streamer,
+			  bbstreamer_member *member,
+			  const char *data, int len,
+			  bbstreamer_archive_context context);
+static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_compressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_compressor_ops = {
+	.content = bbstreamer_lz4_compressor_content,
+	.finalize = bbstreamer_lz4_compressor_finalize,
+	.free = bbstreamer_lz4_compressor_free
+};
+
+static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer,
+bbstreamer_member *member,
+const char *data, int len,
+bbstreamer_archive_context context);
+static void bbstreamer_lz4_decompressor_finalize(bbstreamer *streamer);
+static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer);
+
+const bbstreamer_ops bbstreamer_lz4_decompressor_ops = {
+	.content = bbstreamer_lz4_decompressor_content,
+	.finalize = bbstreamer_lz4_decompressor_finalize,
+	.free = bbstreamer_lz4_decompressor_free

Re: refactoring basebackup.c

2022-02-09 Thread Abhijit Menon-Sen
At 2022-02-02 10:55:53 -0500, robertmh...@gmail.com wrote:
>
> On Tue, Jan 18, 2022 at 1:55 PM Robert Haas  wrote:
> > 0001 adds "server" and "blackhole" as backup targets. It now has some
> > tests. This might be more or less ready to ship, unless somebody else
> > sees a problem, or I find one.
> 
> I played around with this a bit and it seems quite easy to extend this
> further. So please find attached a couple more patches to generalize
> this mechanism.

It took me a while to assimilate these patches, including the backup
targets one, which I hadn't looked at before. Now that I've wrapped my
head around how to put the pieces together, I really like the idea. As
you say, writing non-trivial integrations in C will take some effort,
but it seems worthwhile. It's also nice that one can continue to use
pg_basebackup to trigger the backups and see progress information.

> Granted, coding up a new base backup target is
> something only experienced C hackers are likely to do, but the fact
> that I was able to throw this together so quickly suggests to me that
> I've got the design basically right, and that anyone who does want to
> plug into the new mechanism shouldn't have too much trouble doing so.
> 
> Thoughts?

Yes, it looks simple to follow the example set by basebackup_to_shell to
write a custom target. The complexity will be in whatever we need to do
to store/forward the backup data, rather than in obtaining the data in
the first place, which is exactly as it should be.

Thanks!

-- Abhijit




Re: refactoring basebackup.c

2022-02-02 Thread Robert Haas
On Tue, Jan 18, 2022 at 1:55 PM Robert Haas  wrote:
> 0001 adds "server" and "blackhole" as backup targets. It now has some
> tests. This might be more or less ready to ship, unless somebody else
> sees a problem, or I find one.

I played around with this a bit and it seems quite easy to extend this
further. So please find attached a couple more patches to generalize
this mechanism.

0001 adds an extensibility framework for backup targets. The idea is
that an extension loaded via shared_preload_libraries can call
BaseBackupAddTarget() to define a new base backup target, which the
user can then access via pg_basebackup --target TARGET_NAME, or if
they want to pass a detail string, pg_basebackup --target
TARGET_NAME:DETAIL. There might be slightly better ways of hooking
this into the system. I'm not unhappy with this approach, but there
might be a better idea out there.

0002 adds an example contrib module called basebackup_to_shell. The
system administrator can set basebackup_to_shell.command='SOMETHING'.
A backup directed to the 'shell' target will cause the server to
execute the configured command once per generated archive, and once
for the backup_manifest, if any. When executing the command, %f gets
replaced with the archive filename (e.g. base.tar) and %d gets
replaced with the detail. The actual contents of the file are passed
to the command's standard input, and it can then do whatever it likes
with that data. Clearly, this is not state of the art; for instance,
if what you really want is to upload the backup files someplace via
HTTP, using this to run 'curl' is probably not so good of an idea as
using an extension module that links with libcurl. That would likely
lead to better error checking, better performance, nicer
configuration, and just generally fewer things that can go wrong. On
the other hand, writing an integration in C is kind of tricky, and
this thing is quite easy to use -- and it does work.

There are a couple of things to be concerned about with 0002 from a
security perspective. First, in a backend environment, we have a
function to spawn a subprocess via popen(), namely OpenPipeStream(),
but there is no function to spawn a subprocess with execve() and end
up with a socket connected to its standard input. And that means that
whatever command the administrator configures is being interpreted by
the shell, which is a potential problem given that we're interpolating
the target detail string supplied by the user, who must have at least
replication privileges but need not be the superuser. I chose to
handle this by allowing the target detail to contain only alphanumeric
characters. Refinement is likely possible, but whether the effort is
worthwhile seems questionable. Second, what if the superuser wants to
allow the use of this module to only some of the users who have
replication privileges? That seems a bit unlikely but it's possible,
so I added a GUC basebackup_to_shell.required_role. If set, the
functionality is only usable by members of the named role. If unset,
anyone with replication privilege can use it. I guess someone could
criticize this as defaulting to the least secure setting, but
considering that you have to have replication privileges to use this
at all, I don't find that argument much to get excited about.

I have to say that I'm incredibly happy with how easy these patches
were to write. I think this is going to make adding new base backup
targets as accessible as we can realistically hope to make it. There
is some boilerplate code, as an examination of the patches will
reveal, but it's not a lot, and at least IMHO it's pretty
straightforward. Granted, coding up a new base backup target is
something only experienced C hackers are likely to do, but the fact
that I was able to throw this together so quickly suggests to me that
I've got the design basically right, and that anyone who does want to
plug into the new mechanism shouldn't have too much trouble doing so.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-Allow-extensions-to-add-new-backup-targets.patch
Description: Binary data


0002-Add-basebackup_to_shell-contrib-module.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-31 Thread Jeevan Ladhe
> I think you are right, I have removed the message and again introduced
> the Assert() back.
>
In my previous version of patch, this was a problem, basically, there should
not be an assert as the code is still reachable be it server-lz4 or
client-lz4.
I removed the assert and added the level range check there similar to gzip.

Regards,
Jeevan Ladhe


v12-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-31 Thread Robert Haas
On Mon, Jan 31, 2022 at 6:11 AM Jeevan Ladhe
 wrote:
> I had an offline discussion with Dipesh, and he will be working on the
> lz4 client side decompression part.

OK. I guess we should also be thinking about client-side LZ4
compression. It's probably best to focus on that before worrying about
ZSTD, even though ZSTD would be really cool to have.

>> - In the new test case you set decompress_flags but according to the
>> documentation I have here, -m is for multiple files (and so should not
>> be needed here) and -d is for decompression (which is what we want
>> here). So I'm confused why this is like this.
>
> As explained earlier in the tap test the 'lz4 -d base.tar.lz4' command was
> throwing the decompression to stdout. Now, I have removed the '-m',
> added '-d' for decompression, and also added the target file explicitly in
> the command.

I don't see the behavior you describe here. For me:

[rhaas ~]$ lz4 q.lz4
Decoding file q
q.lz4: decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -m q.lz4
[rhaas ~]$ ls q
q
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d q.lz4
Decoding file q
q.lz4: decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d -m q.lz4
[rhaas ~]$ ls q
q

In other words, on my system, the file gets decompressed with or
without -d, and with or without -m. The only difference I see is that
using -m makes it happen silently, without printing anything on the
terminal. Anyway, I wasn't saying that using -m was necessarily wrong,
just that I didn't understand why you had it like that. Now that I'm
more informed, I recommend that we use -d -m, the former to be
explicit about wanting to decompress and the latter because it either
makes it less noisy (on my system) or makes it work at all (on yours).
It's surprising that the command behavior would be different like that
on different systems, but it is what it is. I think any set of flags
we put here is better than adding more logical in perl, as it keeps
things simpler.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-01-31 Thread Jeevan Ladhe
Hi Robert,

I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.

Please find the attached patch with the following changes:

- Even if we were going to support LZ4 only on the server side, surely
> it's not right to refuse --compress lz4 and --compress client-lz4 at
> the parsing stage. I don't even think the message you added to main()
> is reachable.
>

I think you are right, I have removed the message and again introduced
the Assert() back.

- In the new test case you set decompress_flags but according to the
> documentation I have here, -m is for multiple files (and so should not
> be needed here) and -d is for decompression (which is what we want
> here). So I'm confused why this is like this.
>

As explained earlier in the tap test the 'lz4 -d base.tar.lz4' command was
throwing the decompression to stdout. Now, I have removed the '-m',
added '-d' for decompression, and also added the target file explicitly in
the command.

Regards,
Jeevan Ladhe


v11-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-28 Thread Jeevan Ladhe
On Sat, Jan 29, 2022 at 1:20 AM Robert Haas  wrote:

> On Fri, Jan 28, 2022 at 12:48 PM Jeevan Ladhe
>  wrote:
> > I have attached the latest rebased version of the LZ4 server-side
> compression
> > patch on the recent commits. This patch also introduces the compression
> level
> > and adds a tap test.
>
> In view of this morning's commit of
> d45099425eb19e420433c9d81d354fe585f4dbd6 I think the threshold for
> committing this patch has gone up. We need to make it support
> decompression with LZ4 on the client side, as we now have for gzip.
>

Fair enough. Makes sense.


> - In the new test case you set decompress_flags but according to the
> documentation I have here, -m is for multiple files (and so should not
> be needed here) and -d is for decompression (which is what we want
> here). So I'm confused why this is like this.
>
>
'-d' is the default when we have a .lz4 extension, which is true in our
case,
hence elimininated that. About, '-m' introduction, without any option, or
even
after providing the explicit '-d' option, weirdly lz4 command was throwing
decompressed tar on the console, that's when in my lz4 man version I saw
these 2 lines and tried adding '-m' option, and it worked:

" It is considered bad practice to rely on implicit output in scripts.
 because the script´s environment may change. Always use explicit
 output in scripts. -c ensures that output will be stdout. Conversely,
 providing a destination name, or using -m ensures that the output will
 be either the specified name, or filename.lz4 respectively."

and

"Similarly, lz4 -m -d can decompress multiple *.lz4 files."


> This part seems clearly correct, so I have committed it.


Thanks for pushing it.

Regards,
Jeevan Ladhe


Re: refactoring basebackup.c

2022-01-28 Thread Robert Haas
On Fri, Jan 28, 2022 at 12:48 PM Jeevan Ladhe
 wrote:
> I have attached the latest rebased version of the LZ4 server-side compression
> patch on the recent commits. This patch also introduces the compression level
> and adds a tap test.

In view of this morning's commit of
d45099425eb19e420433c9d81d354fe585f4dbd6 I think the threshold for
committing this patch has gone up. We need to make it support
decompression with LZ4 on the client side, as we now have for gzip.

Other comments:

- Even if we were going to support LZ4 only on the server side, surely
it's not right to refuse --compress lz4 and --compress client-lz4 at
the parsing stage. I don't even think the message you added to main()
is reachable.

- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.

Other than that this seems like it's in pretty good shape.

> Also, while adding the lz4 case in the pg_verifybackup/t/008_untar.pl, I found
> an unused variable {have_zlib}. I have attached a cleanup patch for that as 
> well.

This part seems clearly correct, so I have committed it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-01-28 Thread Jeevan Ladhe
Hi Robert,

I have attached the latest rebased version of the LZ4 server-side
compression
patch on the recent commits. This patch also introduces the compression
level
and adds a tap test.

Also, while adding the lz4 case in the pg_verifybackup/t/008_untar.pl, I
found
an unused variable {have_zlib}. I have attached a cleanup patch for that as
well.

Please review and let me know your thoughts.

Regards,
Jeevan Ladhe


0001-gzip-tap-test-remove-extra-variable.patch
Description: Binary data


v10-0002-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-28 Thread Robert Haas
On Fri, Jan 28, 2022 at 3:54 AM Dipesh Pandit  wrote:
> Thanks. This makes sense.
>
> +#ifdef HAVE_LIBZ
> +   /*
> +* If the user has requested a server compressed archive along with 
> archive
> +* extraction at client then we need to decompress it.
> +*/
> +   if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
> +   compressloc == COMPRESS_LOCATION_SERVER)
> +   streamer = bbstreamer_gzip_decompressor_new(streamer);
> +#endif
>
> I think it is not required to have HAVE_LIBZ check in pg_basebackup.c
> while creating a new gzip writer/decompressor. This check is already
> in place in bbstreamer_gzip_writer_new() and 
> bbstreamer_gzip_decompressor_new()
> and it throws an error in case the build does not have required library
> support. I have removed this check from pg_basebackup.c and updated
> a delta patch. The patch can be applied on v5 patch.

Right, makes sense. Committed with that change, plus I realized the
skip count in the test case file was wrong after the changes I made
yesterday, so I fixed that as well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-01-28 Thread tushar

On 1/27/22 11:12 PM, Robert Haas wrote:

Well what's weird here is that you are using both --gzip and also
--compress. Those both control the same behavior, so it's a surprising
idea to specify both. But I guess if someone does, we should make the
second one fully override the first one. Here's a patch to try to do
that.

right, the current behavior was  -

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/y101 --gzip -Z 
none  -Xnone

pg_basebackup: error: cannot use compression level with method none
Try "pg_basebackup --help" for more information.

and even this was not matching with PG v14 behavior too
e.g
 ./pg_basebackup -Ft -z -Z none  -D /tmp/test1  ( working in PG v14 but 
throwing above error on PG HEAD)


and somewhere we were breaking the backward compatibility.

now with your patch -this seems working fine

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/y101 --gzip*-Z 
none*  -Xnone
NOTICE:  WAL archiving is not enabled; you must ensure that all required 
WAL segments are copied through other means to complete the backup

[edb@centos7tushar bin]$ ls /tmp/y101
backup_manifest *base.tar*

OR

[edb@centos7tushar bin]$  ./pg_basebackup  -t server:/tmp/y0p -Z none  
-Xfetch *-z*

[edb@centos7tushar bin]$ ls /tmp/y0p
backup_manifest *base.tar.gz*

but what about server-gzip:0? should it allow compressing the directory?

[edb@centos7tushar bin]$  ./pg_basebackup  -t server:/tmp/1 
--compress=server-gzip:0  -Xfetch

[edb@centos7tushar bin]$ ls /tmp/1
backup_manifest  base.tar.gz

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: refactoring basebackup.c

2022-01-28 Thread Dipesh Pandit
Hi,

> I made a pass over these patches today and made a bunch of minor
> corrections. New version attached. The two biggest things I changed
> are (1) s/gzip_extractor/gzip_compressor/, because I feel like you
> extract an archive like a tarfile, but that is not what is happening
> here, this is not an archive and (2) I took a few bits of out of the
> test case that didn't seem to be necessary. There wasn't any reason
> that I could see why testing for PG_VERSION needed to be skipped when
> the compression method is 'none', so my first thought was to just take
> out the 'if' statement around that, but then after more thought that
> test and the one for pg_verifybackup are certainly going to fail if
> those files are not present, so why have an extra test? It might make
> sense if we were only conditionally able to run pg_verifybackup and
> wanted to have some test coverage even when we can't, but that's not
> the case here, so I see no point.

Thanks. This makes sense.

+#ifdef HAVE_LIBZ
+   /*
+* If the user has requested a server compressed archive along with
archive
+* extraction at client then we need to decompress it.
+*/
+   if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
+   compressloc == COMPRESS_LOCATION_SERVER)
+   streamer = bbstreamer_gzip_decompressor_new(streamer);
+#endif

I think it is not required to have HAVE_LIBZ check in pg_basebackup.c
while creating a new gzip writer/decompressor. This check is already
in place in bbstreamer_gzip_writer_new() and
bbstreamer_gzip_decompressor_new()
and it throws an error in case the build does not have required library
support. I have removed this check from pg_basebackup.c and updated
a delta patch. The patch can be applied on v5 patch.

Thanks,
Dipesh
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 46ab60d..1f81bbf 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1199,7 +1199,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
 			compressloc != COMPRESS_LOCATION_CLIENT)
 			streamer = bbstreamer_plain_writer_new(archive_filename,
    archive_file);
-#ifdef HAVE_LIBZ
 		else if (compressmethod == COMPRESSION_GZIP)
 		{
 			strlcat(archive_filename, ".gz", sizeof(archive_filename));
@@ -1207,7 +1206,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
   archive_file,
   compresslevel);
 		}
-#endif
 		else
 		{
 			Assert(false);		/* not reachable */
@@ -1256,7 +1254,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
 	else if (expect_unterminated_tarfile)
 		streamer = bbstreamer_tar_terminator_new(streamer);
 
-#ifdef HAVE_LIBZ
 	/*
 	 * If the user has requested a server compressed archive along with archive
 	 * extraction at client then we need to decompress it.
@@ -1264,7 +1261,6 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
 	if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
 			compressloc == COMPRESS_LOCATION_SERVER)
 		streamer = bbstreamer_gzip_decompressor_new(streamer);
-#endif
 
 	/* Return the results. */
 	*manifest_inject_streamer_p = manifest_inject_streamer;


Re: refactoring basebackup.c

2022-01-27 Thread Robert Haas
On Thu, Jan 27, 2022 at 2:37 AM Dipesh Pandit  wrote:
> I have updated the patches to support server compression (gzip) for
> plain format backup. Please find attached v4 patches.

I made a pass over these patches today and made a bunch of minor
corrections. New version attached. The two biggest things I changed
are (1) s/gzip_extractor/gzip_compressor/, because I feel like you
extract an archive like a tarfile, but that is not what is happening
here, this is not an archive and (2) I took a few bits of out of the
test case that didn't seem to be necessary. There wasn't any reason
that I could see why testing for PG_VERSION needed to be skipped when
the compression method is 'none', so my first thought was to just take
out the 'if' statement around that, but then after more thought that
test and the one for pg_verifybackup are certainly going to fail if
those files are not present, so why have an extra test? It might make
sense if we were only conditionally able to run pg_verifybackup and
wanted to have some test coverage even when we can't, but that's not
the case here, so I see no point.

I studied this a bit to see whether I needed to make any adjustments
along the lines of 4f0bcc735038e96404fae59aa16ef9beaf6bb0aa in order
for this to work on msys. I think I don't, because 002_algorithm.pl
and 003_corruption.pl both pass $backup_path, not $real_backup_path,
to command_ok -- and I think something inside there does the
translation, which is weird, but we might as well be consistent.
008_untar.pl and 4f0bcc735038e96404fae59aa16ef9beaf6bb0aa needed to do
something different because --target server:X confused the msys magic,
but I think that shouldn't be an issue for this patch. However, I
might be wrong.

Barring objections or problems, I plan to commit this version
tomorrow. I'd do it today, but I have plans for tonight that are
incompatible with discovering that the build farm hates this 

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v5-0001-Allow-server-side-compression-to-be-used-with-Fp.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-27 Thread Robert Haas
On Thu, Jan 27, 2022 at 12:08 PM tushar  wrote:
> On 1/27/22 10:17 PM, Robert Haas wrote:
> > Cool. I committed that patch.
> Thanks , Please refer to this scenario  where the label is set to  0 for
> server-gzip but the directory is still  compressed
>
> [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/11 --gzip
> --compress=0 -Xnone
> NOTICE:  all required WAL segments have been archived
> [edb@centos7tushar bin]$ ls /tmp/11
> 16384.tar  backup_manifest  base.tar
>
>
> [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/10 --gzip
> --compress=server-gzip:0 -Xnone
> NOTICE:  all required WAL segments have been archived
> [edb@centos7tushar bin]$ ls /tmp/10
> 16384.tar.gz  backup_manifest  base.tar.gz
>
> 0 is for no compression so the directory should not be compressed if we
> mention server-gzip:0 and both these
> above scenarios should match?

Well what's weird here is that you are using both --gzip and also
--compress. Those both control the same behavior, so it's a surprising
idea to specify both. But I guess if someone does, we should make the
second one fully override the first one. Here's a patch to try to do
that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


overwrite-compression-level.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-27 Thread tushar

On 1/27/22 10:17 PM, Robert Haas wrote:

Cool. I committed that patch.
Thanks , Please refer to this scenario  where the label is set to  0 for 
server-gzip but the directory is still  compressed


[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/11 --gzip 
--compress=0 -Xnone

NOTICE:  all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/11
16384.tar  backup_manifest  base.tar


[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/10 --gzip 
--compress=server-gzip:0 -Xnone

NOTICE:  all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/10
16384.tar.gz  backup_manifest  base.tar.gz

0 is for no compression so the directory should not be compressed if we 
mention server-gzip:0 and both these

above scenarios should match?

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: refactoring basebackup.c

2022-01-27 Thread Robert Haas
On Thu, Jan 27, 2022 at 7:15 AM tushar  wrote:
> On 1/27/22 2:15 AM, Robert Haas wrote:
> > The attached patch should fix this, too.
> Thanks, the issues seem to be fixed now.

Cool. I committed that patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-01-27 Thread tushar

On 1/27/22 2:15 AM, Robert Haas wrote:

The attached patch should fix this, too.

Thanks, the issues seem to be fixed now.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: refactoring basebackup.c

2022-01-26 Thread Dipesh Pandit
Hi,

> It only needed trivial rebasing; I have committed it after doing that.

I have updated the patches to support server compression (gzip) for
plain format backup. Please find attached v4 patches.

Thanks,
Dipesh
From 4d0c84d6fac841aafb757535cc0e48334a251581 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Mon, 24 Jan 2022 15:28:48 +0530
Subject: [PATCH 1/2] Support for extracting gzip compressed archive

pg_basebackup can support server side compression using gzip. In
order to support plain format backup with option '-Fp' we need to
add support for decompressing the compressed blocks at client. This
patch addresses the extraction of gzip compressed blocks at client.
---
 doc/src/sgml/ref/pg_basebackup.sgml |   8 +-
 src/bin/pg_basebackup/Makefile  |   1 +
 src/bin/pg_basebackup/bbstreamer.h  |   1 +
 src/bin/pg_basebackup/bbstreamer_file.c | 182 
 src/bin/pg_basebackup/bbstreamer_gzip.c | 376 
 src/bin/pg_basebackup/pg_basebackup.c   |  19 +-
 6 files changed, 401 insertions(+), 186 deletions(-)
 create mode 100644 src/bin/pg_basebackup/bbstreamer_gzip.c

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1d0df34..19849be 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -428,8 +428,12 @@ PostgreSQL documentation


 When the tar format is used, the suffix .gz will
-automatically be added to all tar filenames. Compression is not
-available in plain format.
+automatically be added to all tar filenames.
+   
+   
+Server compression can be specified with plain format backup. It
+enables compression of the archive at server and extract the
+compressed archive at client.

   
  
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 5b18851..78d96c6 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -38,6 +38,7 @@ OBJS = \
 BBOBJS = \
 	pg_basebackup.o \
 	bbstreamer_file.o \
+	bbstreamer_gzip.o \
 	bbstreamer_inject.o \
 	bbstreamer_tar.o
 
diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h
index fc88b50..270b0df 100644
--- a/src/bin/pg_basebackup/bbstreamer.h
+++ b/src/bin/pg_basebackup/bbstreamer.h
@@ -205,6 +205,7 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath,
 			const char *(*link_map) (const char *),
 			void (*report_output_file) (const char *));
 
+extern bbstreamer *bbstreamer_gzip_extractor_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next);
 extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next);
diff --git a/src/bin/pg_basebackup/bbstreamer_file.c b/src/bin/pg_basebackup/bbstreamer_file.c
index 77ca222..d721f87 100644
--- a/src/bin/pg_basebackup/bbstreamer_file.c
+++ b/src/bin/pg_basebackup/bbstreamer_file.c
@@ -11,10 +11,6 @@
 
 #include "postgres_fe.h"
 
-#ifdef HAVE_LIBZ
-#include 
-#endif
-
 #include 
 
 #include "bbstreamer.h"
@@ -30,15 +26,6 @@ typedef struct bbstreamer_plain_writer
 	bool		should_close_file;
 } bbstreamer_plain_writer;
 
-#ifdef HAVE_LIBZ
-typedef struct bbstreamer_gzip_writer
-{
-	bbstreamer	base;
-	char	   *pathname;
-	gzFile		gzfile;
-}			bbstreamer_gzip_writer;
-#endif
-
 typedef struct bbstreamer_extractor
 {
 	bbstreamer	base;
@@ -62,22 +49,6 @@ const bbstreamer_ops bbstreamer_plain_writer_ops = {
 	.free = bbstreamer_plain_writer_free
 };
 
-#ifdef HAVE_LIBZ
-static void bbstreamer_gzip_writer_content(bbstreamer *streamer,
-		   bbstreamer_member *member,
-		   const char *data, int len,
-		   bbstreamer_archive_context context);
-static void bbstreamer_gzip_writer_finalize(bbstreamer *streamer);
-static void bbstreamer_gzip_writer_free(bbstreamer *streamer);
-static const char *get_gz_error(gzFile gzf);
-
-const bbstreamer_ops bbstreamer_gzip_writer_ops = {
-	.content = bbstreamer_gzip_writer_content,
-	.finalize = bbstreamer_gzip_writer_finalize,
-	.free = bbstreamer_gzip_writer_free
-};
-#endif
-
 static void bbstreamer_extractor_content(bbstreamer *streamer,
 		 bbstreamer_member *member,
 		 const char *data, int len,
@@ -196,159 +167,6 @@ bbstreamer_plain_writer_free(bbstreamer *streamer)
 }
 
 /*
- * Create a bbstreamer that just compresses data using gzip, and then writes
- * it to a file.
- *
- * As in the case of bbstreamer_plain_writer_new, pathname is always used
- * for error reporting purposes; if file is NULL, it is also the opened and
- * closed so that the data may be written there.
- */
-bbstreamer *
-bbstreamer_gzip_writer_new(char *pathname, FILE *file, int compresslevel)
-{
-#ifdef HAVE_LIBZ
-	bbstreamer_gzip_writer *streamer;
-
-	streamer = palloc0(sizeof(bbstreamer_gzip_writer));
-	*((const bbstreamer_ops **) 

Re: refactoring basebackup.c

2022-01-26 Thread Robert Haas
On Tue, Jan 25, 2022 at 11:22 AM tushar  wrote:
> A)Getting syntax error if -z is used along with -t
>
> [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/data902 -z -Xfetch
> pg_basebackup: error: could not initiate base backup: ERROR:  syntax error

Oops. The attached patch should fix this.

> B)No information of "client-gzip" or "server-gzip" added under
> "--compress" option/method of ./pg_basebackup --help.

Already fixed by e1f860f13459e186479319aa9f65ef184277805f.

> C) -R option is silently ignoring

The attached patch should fix this, too.

Thanks for finding these issues.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


fix-mistakes-found-by-tushar.patch
Description: Binary data


Re: refactoring basebackup.c

2022-01-26 Thread Robert Haas
On Tue, Jan 25, 2022 at 8:23 PM Michael Paquier  wrote:
> On Tue, Jan 25, 2022 at 03:54:52AM +, Shinoda, Noriyoshi (PN Japan FSIP) 
> wrote:
> > Michael, I am proposing to that we remove this message as part of
> > this commit:
> >
> > -pg_log_info("no value specified for compression
> > level, switching to default");
> >
> > I think most people won't want to specify a compression level, so
> > emitting a message when they don't seems too verbose.
>
> (Just noticed this message as I am not in CC.)
> Removing this message is fine by me, thanks!

Oh, I thought I'd CC'd you. I know I meant to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




  1   2   >