Re: Teach pg_receivewal to use lz4 compression

2022-03-17 Thread Michael Paquier
On Thu, Mar 17, 2022 at 06:12:20AM -0500, Justin Pryzby wrote:
> I think this should use 
> 
> +#include "lz4frame.h"
> 
> commit ba595d2322da095a1e6703171b3f1f2815cb
> Author: Michael Paquier 
> Date:   Fri Nov 5 11:33:25 2021 +0900
> 
> Add support for LZ4 compression in pg_receivewal

Yes, you are right.  A second thing is that should be declared before
the PG headers.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2022-03-17 Thread Justin Pryzby
On Sat, Feb 12, 2022 at 12:52:40PM +0900, Michael Paquier wrote:
> On Fri, Feb 11, 2022 at 10:07:49AM -0500, Robert Haas wrote:
> > Over in 
> > http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pgebaosgfmzz6ac4_srwom_+u...@mail.gmail.com
> > I was noticing that CreateWalTarMethod doesn't support LZ4
> > compression. It would be nice if it did. I thought maybe the patch on
> > this thread would fix that, but I think maybe it doesn't, because it
> > looks like that's touching the WalDirectoryMethod part of that file,
> > rather than the WalTarMethod part. Is that correct?
> 
> Correct.  pg_receivewal only cares about the directory method, so this
> thread was limited to this part.  Yes, it would be nice to extend
> fully the tar method of walmethods.c to support LZ4, but I was not
> sure what needed to be done, and I am still not sure based on what has
> just been done as of 751b8d23.
> 
> > And, on a related note, Michael, do you plan to get something
> > committed here? 
> 
> Apart from f79962d, ba5 and 50e1441, I don't think that there was
> something left to do for this thread.  Perhaps I am missing something?

I think this should use 

+#include "lz4frame.h"

commit ba595d2322da095a1e6703171b3f1f2815cb
Author: Michael Paquier 
Date:   Fri Nov 5 11:33:25 2021 +0900

Add support for LZ4 compression in pg_receivewal

-- 
Justin




Re: Teach pg_receivewal to use lz4 compression

2022-02-14 Thread Robert Haas
On Fri, Feb 11, 2022 at 10:52 PM Michael Paquier  wrote:
> > And, on a related note, Michael, do you plan to get something
> > committed here?
>
> Apart from f79962d, ba5 and 50e1441, I don't think that there was
> something left to do for this thread.  Perhaps I am missing something?

Oh, my mistake. I didn't realize you'd already committed it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2022-02-11 Thread Michael Paquier
On Fri, Feb 11, 2022 at 10:07:49AM -0500, Robert Haas wrote:
> Over in 
> http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pgebaosgfmzz6ac4_srwom_+u...@mail.gmail.com
> I was noticing that CreateWalTarMethod doesn't support LZ4
> compression. It would be nice if it did. I thought maybe the patch on
> this thread would fix that, but I think maybe it doesn't, because it
> looks like that's touching the WalDirectoryMethod part of that file,
> rather than the WalTarMethod part. Is that correct?

Correct.  pg_receivewal only cares about the directory method, so this
thread was limited to this part.  Yes, it would be nice to extend
fully the tar method of walmethods.c to support LZ4, but I was not
sure what needed to be done, and I am still not sure based on what has
just been done as of 751b8d23.

> And, on a related note, Michael, do you plan to get something
> committed here? 

Apart from f79962d, ba5 and 50e1441, I don't think that there was
something left to do for this thread.  Perhaps I am missing something?
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2022-02-11 Thread Robert Haas
On Thu, Nov 4, 2021 at 10:47 PM Michael Paquier  wrote:
> Indeed.  My rebase was a bit sloppy here.

Hi!

Over in 
http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pgebaosgfmzz6ac4_srwom_+u...@mail.gmail.com
I was noticing that CreateWalTarMethod doesn't support LZ4
compression. It would be nice if it did. I thought maybe the patch on
this thread would fix that, but I think maybe it doesn't, because it
looks like that's touching the WalDirectoryMethod part of that file,
rather than the WalTarMethod part. Is that correct? And, on a related
note, Michael, do you plan to get something committed here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-11-24 Thread Jeevan Ladhe
On Wed, Nov 24, 2021 at 10:55 AM Michael Paquier 
wrote:

> On Mon, Nov 22, 2021 at 09:02:47AM -0500, Robert Haas wrote:
> > On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe
> >  wrote:
> >> Fair enough. But, still I have a doubt in mind what benefit would that
> >> really bring to us here, because we are immediately also freeing the
> >> lz4buf without using it anywhere.
> >
> > Yeah, I'm also doubtful about that. If we're freeng the compression
> > context, we shouldn't need to guarantee that it's in any particular
> > state before doing so. Why would any critical cleanup be part of
> > LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The
> > point of LZ4F_compressEnd() is to make sure all of the output bytes
> > get written, and it would be stupid to force people to write the
> > output bytes even when they've decided that they no longer care about
> > them due to some error.
>
> Hmm.  I have double-checked all that, and I agree that we could just
> skip LZ4F_compressEnd() in this error code path.  From what I can see
> in the upstream code, what we have now is not broken either, but the
> compressEnd() call does some work that's not needed here.


Yes I agree that we are not broken, but as you said we are doing some
an extra bit of work here.

Regards,
Jeevan Ladhe


Re: Teach pg_receivewal to use lz4 compression

2021-11-23 Thread Michael Paquier
On Mon, Nov 22, 2021 at 09:02:47AM -0500, Robert Haas wrote:
> On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe
>  wrote:
>> Fair enough. But, still I have a doubt in mind what benefit would that
>> really bring to us here, because we are immediately also freeing the
>> lz4buf without using it anywhere.
> 
> Yeah, I'm also doubtful about that. If we're freeng the compression
> context, we shouldn't need to guarantee that it's in any particular
> state before doing so. Why would any critical cleanup be part of
> LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The
> point of LZ4F_compressEnd() is to make sure all of the output bytes
> get written, and it would be stupid to force people to write the
> output bytes even when they've decided that they no longer care about
> them due to some error.

Hmm.  I have double-checked all that, and I agree that we could just
skip LZ4F_compressEnd() in this error code path.  From what I can see
in the upstream code, what we have now is not broken either, but the
compressEnd() call does some work that's not needed here.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-11-22 Thread Robert Haas
On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe
 wrote:
> Fair enough. But, still I have a doubt in mind what benefit would that
> really bring to us here, because we are immediately also freeing the
> lz4buf without using it anywhere.

Yeah, I'm also doubtful about that. If we're freeng the compression
context, we shouldn't need to guarantee that it's in any particular
state before doing so. Why would any critical cleanup be part of
LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The
point of LZ4F_compressEnd() is to make sure all of the output bytes
get written, and it would be stupid to force people to write the
output bytes even when they've decided that they no longer care about
them due to some error.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-11-21 Thread Jeevan Ladhe
On Fri, Nov 19, 2021 at 7:37 AM Michael Paquier  wrote:

> On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
> > In dir_open_for_write() I observe that we are writing the header
> > and then calling LZ4F_compressEnd() in case there is an error
> > while writing the buffer to the file, and the output buffer of
> > LZ4F_compressEnd() is not written anywhere. Why should this be
> > necessary? To flush the contents of the internal buffer? But, then we
> > are calling LZ4F_freeCompressionContext() immediately after the
> > LZ4F_compressEnd() call. I might be missing something, will be
> > happy to get more insights.
>
> My concern here was symmetry, where IMO it makes sense to have a
> compressEnd call each time there is a successful compressBegin call
> done for the LZ4 state data, as there is no way to know if in the
> future LZ4 won't change some of its internals to do more than just an
> internal flush.
>

Fair enough. But, still I have a doubt in mind what benefit would that
really bring to us here, because we are immediately also freeing the
lz4buf without using it anywhere.

Regards,
Jeevan


Re: Teach pg_receivewal to use lz4 compression

2021-11-19 Thread gkokolatos
‐‐‐ Original Message ‐‐‐

On Friday, November 19th, 2021 at 3:07 AM, Michael Paquier 
 wrote:

> On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
>
> > In dir_open_for_write() I observe that we are writing the header
> > and then calling LZ4F_compressEnd() in case there is an error
> > while writing the buffer to the file, and the output buffer of
> > LZ4F_compressEnd() is not written anywhere. Why should this be
> > necessary? To flush the contents of the internal buffer? But, then we
> > are calling LZ4F_freeCompressionContext() immediately after the
> > LZ4F_compressEnd() call. I might be missing something, will be
> > happy to get more insights.
>
> My concern here was symmetry, where IMO it makes sense to have a
> compressEnd call each time there is a successful compressBegin call
> done for the LZ4 state data, as there is no way to know if in the
> future LZ4 won't change some of its internals to do more than just an
> internal flush.

Agreed.

Although the library does provide an interface for simply flushing contents, it
also assumes that each initializing call will have a finilizing call. If my
memory serves me right, earlier versions of the patch, did not have this
summetry, but that got ammended.

Cheers,
//Georgios

> ---
> Michael





Re: Teach pg_receivewal to use lz4 compression

2021-11-18 Thread Michael Paquier
On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
> In dir_open_for_write() I observe that we are writing the header
> and then calling LZ4F_compressEnd() in case there is an error
> while writing the buffer to the file, and the output buffer of
> LZ4F_compressEnd() is not written anywhere. Why should this be
> necessary? To flush the contents of the internal buffer? But, then we
> are calling LZ4F_freeCompressionContext() immediately after the
> LZ4F_compressEnd() call. I might be missing something, will be
> happy to get more insights.

My concern here was symmetry, where IMO it makes sense to have a
compressEnd call each time there is a successful compressBegin call
done for the LZ4 state data, as there is no way to know if in the
future LZ4 won't change some of its internals to do more than just an
internal flush.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-11-18 Thread Jeevan Ladhe
In dir_open_for_write() I observe that we are writing the header
and then calling LZ4F_compressEnd() in case there is an error
while writing the buffer to the file, and the output buffer of
LZ4F_compressEnd() is not written anywhere. Why should this be
necessary? To flush the contents of the internal buffer? But, then we
are calling LZ4F_freeCompressionContext() immediately after the
LZ4F_compressEnd() call. I might be missing something, will be
happy to get more insights.

Regards,
Jeevan Ladhe

On Fri, Nov 5, 2021 at 1:21 PM  wrote:

>
>
> ‐‐‐ Original Message ‐‐‐
>
> On Friday, November 5th, 2021 at 3:47 AM, Michael Paquier <
> mich...@paquier.xyz> wrote:
>
> >
> > I have spent an extra couple of hours staring at the code, and the
> > whole looked fine, so applied. While on it, I have tested the new TAP
> > tests with all the possible combinations of --without-zlib and
> > --with-lz4.
>
> Great news. Thank you very much.
>
> Cheers,
> //Georgios
>
> > --
> > Michael
>
>
>


Re: Teach pg_receivewal to use lz4 compression

2021-11-05 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Friday, November 5th, 2021 at 3:47 AM, Michael Paquier  
wrote:

>
> I have spent an extra couple of hours staring at the code, and the
> whole looked fine, so applied. While on it, I have tested the new TAP
> tests with all the possible combinations of --without-zlib and
> --with-lz4.

Great news. Thank you very much.

Cheers,
//Georgios

> --
> Michael




Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread Michael Paquier
On Thu, Nov 04, 2021 at 05:02:28PM +, gkokola...@pm.me wrote:
> Removed an extra condinional check while switching over compression_method.

Indeed.  My rebase was a bit sloppy here.

> because compression_method is the global option exposed to the whereas
> wal_compression_method is the local variable used to figure out what kind of
> file the function is currently working with. This error was existing at least 
> in
> v9-0002 of $subject.

Right.

> I felt that converting it a do {} while () loop instead, will help with
> readability:
> +   do
> +   {
> 
> +   /*
> +* No need to continue reading the file when the uncompressed_size
> +* exceeds WalSegSz, even if there are still data left to read.
> +* However, if uncompressed_size is equal to WalSegSz, it should
> +* verify that there is no more data to read.
> +*/
> +   } while (r > 0 && uncompressed_size <= WalSegSz);

No objections from me to do that.  This makes the code a bit easier to
follow, indeed.

> I would like to have a bit more test coverage in the case for 
> FindStreamingStart().
> Specifically for the case that a lz4-compressed segment larger than WalSegSz 
> exists.

The same could be said for gzip.  I am not sure that this is worth the
extra I/O and pg_receivewal commands, though.

I have spent an extra couple of hours staring at the code, and the
whole looked fine, so applied.  While on it, I have tested the new TAP
tests with all the possible combinations of --without-zlib and
--with-lz4.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Thursday, November 4th, 2021 at 9:21 AM, Michael Paquier 
 wrote:
> > +#ifdef HAVE_LIBLZ4
> > +while (readp < readend)
> > +{
> > +size_tread_size = 1;
> > +size_tout_size = 1;
> > +
> > +status = LZ4F_decompress(ctx, outbuf, _size,
> > + readbuf, _size, NULL);
>
> And...  It happens that the error from v9 is here, where we need to
> read the amount of remaining data from "readp", and not "readbuf" :)
>
> It is already late here, I'll continue on this stuff tomorrow, but
> this looks rather committable overall.

Thank you for v11 of the patch. Please find attached v12 which addresses a few
minor points.

Added an Oxford comma since the list now contains three or more terms:
---with-lz4) and none.
+--with-lz4), and none.

Removed an extra condinional check while switching over compression_method.
Instead of:
+   case COMPRESSION_LZ4:
+#ifdef HAVE_LIBLZ4
+   if (compresslevel != 0)
+   {
+   pg_log_error("cannot use --compress with
--compression-method=%s",
+"lz4");
+   fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"),
+   progname);
+   exit(1);
+   }
+#else
+   if (compression_method == COMPRESSION_LZ4)
+   {
+   pg_log_error("this build does not support compression 
with %s",
+"LZ4");
+   exit(1);
+   }
+   break;
+#endif

I opted for:
+   case COMPRESSION_LZ4:
+#ifdef HAVE_LIBLZ4
+   if (compresslevel != 0)
+   {
+   pg_log_error("cannot use --compress with
--compression-method=%s",
+"lz4");
+   fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"),
+   progname);
+   exit(1);
+   }
+#else
+   pg_log_error("this build does not support compression with 
%s",
+"LZ4");
+   exit(1);
+ #endif

There was an error while trying to find the streaming start. The code read:
+ else if (!ispartial && compression_method == COMPRESSION_LZ4)

where it should be instead:
+ else if (!ispartial && wal_compression_method == COMPRESSION_LZ4)

because compression_method is the global option exposed to the whereas
wal_compression_method is the local variable used to figure out what kind of
file the function is currently working with. This error was existing at least in
v9-0002 of $subject.

The variables readbuf and outbuf, used in the decompression of LZ4 files, are
now heap allocated.

Last, while the following is correct:
+   /*
+* Once we have read enough data to cover one segment, we are
+* done, there is no need to do more.
+*/
+   while (uncompressed_size <= WalSegSz)

I felt that converting it a do {} while () loop instead, will help with
readability:
+   do
+   {

+   /*
+* No need to continue reading the file when the uncompressed_size
+* exceeds WalSegSz, even if there are still data left to read.
+* However, if uncompressed_size is equal to WalSegSz, it should
+* verify that there is no more data to read.
+*/
+   } while (r > 0 && uncompressed_size <= WalSegSz);

of course the check:
+   /* Done reading the file */
+   if (r == 0)
+   break;
midway the loop is no longer needed and thus removed.

I would like to have a bit more test coverage in the case for 
FindStreamingStart().
Specifically for the case that a lz4-compressed segment larger than WalSegSz 
exists.
The attached patch does not contain such test case. I am not very certain on 
how to
create such a test case reliably as it would be mostly based on a warning 
emitted
during the parsing of such a file.

Cheers,
//Georgios

> --
> MichaelFrom 48720e7c6ba771c45d43dc9f5e6833f8bb6715e6 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Thu, 4 Nov 2021 16:05:21 +
Subject: [PATCH v12] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received
WAL.  This commit teaches it to also be able to use LZ4 compression. It
is required that the binary is build using the -llz4 flag. It is enabled
via

Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Thursday, November 4th, 2021 at 9:21 AM, Michael Paquier 
 wrote:

> On Thu, Nov 04, 2021 at 04:31:48PM +0900, Michael Paquier wrote:
> Thanks.  I have looked at 0001 today, and applied it after fixing a
> couple of issues.

Great! Thank you very much.

> From memory:
> - zlib.h was missing from pg_receivewal.c, issue that I noticed after
> removing the redefinition of Z_DEFAULT_COMPRESSION because there was
> no need for it (did a run with a --without-zlib as well).

Yeah, I simply wanted to avoid adding a header. Either way works really.

> - Simplified a bit the error handling for incorrect option
> combinations, using a switch/case while on it.

Much cleaner done this way.

> - Renamed the existing variable "compression" in walmethods.c to
> compression_level, to reduce any confusion with the introduction of
> compression_method.  One thing I have noticed is about the tar method,
> where we rely on the compression level to decide if compression should
> be used or not.  There should be some simplifications possible there
> but there is a huge take in receivelog.c where we use COMPRESSION_NONE
> to track down that we still want to zero a new segment when using tar
> method.

Agreed.

> - Use of 'I' as short option name, err...  After applying the first
> batch..

I left that in just to have the two compression related options next to each
other when switching. I assumed it might help with readability for the next
developer looking at it.

Removing it, is cleaner for the option definifion though, thanks.

>
> Based on the work of 0001, there were some conflicts with 0002.  I
> have solved them while reviewing it, and adapted the code to what got
> already applied.

Thank you very much.

>
> +   header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
> +   if (LZ4F_isError(header_size))
> +   {
> +   pg_free(lz4buf);
> +   close(fd);
> +   return NULL;
> +   }
> In dir_open_for_write(), I guess that this one is missing one
> LZ4F_freeCompressionContext().

Agreed.

>
> +   status = LZ4F_freeDecompressionContext(ctx);
> +   if (LZ4F_isError(status))
> +   {
> +   pg_log_error("could not free LZ4 decompression context: %s",
> +LZ4F_getErrorName(status));
> +   exit(1);
> +   }
> +
> +   if (uncompressed_size != WalSegSz)
> +   {
> +   pg_log_warning("compressed segment file \"%s\" has
> incorrect uncompressed size %ld, skipping",
> +  dirent->d_name, uncompressed_size);
> +   (void) LZ4F_freeDecompressionContext(ctx);
> +   continue;
> +   }
> When the uncompressed size does not match out expected size, the
> second LZ4F_freeDecompressionContext() looks unnecessary to me because
> we have already one a couple of lines above.

Agreed.

>
> +   ctx_out = LZ4F_createCompressionContext(, LZ4F_VERSION);
> +   lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
> +   if (LZ4F_isError(ctx_out))
> +   {
> +   close(fd);
> +   return NULL;
> +   }
> LZ4F_compressBound() can be after the check on ctx_out, here.
>
> +   while (1)
> +   {
> +   char   *readp;
> +   char   *readend;
> Simply looping when decompressing a segment to check its size looks
> rather unsafe to me.  We should leave the loop once uncompressed_size
> is strictly more than WalSegSz.

The loop exits when done reading or when it failed to read:

+   r = read(fd, readbuf, sizeof(readbuf));
+   if (r < 0)
+   {
+   pg_log_error("could not read file \"%s\": %m", fullpath);
+   exit(1);
+   }
+
+   /* Done reading */
+   if (r == 0)
+   break;

Although I do agree that it can exit before that, if the uncompressed size is
greater than WalSegSz.

>
> The amount of TAP tests looks fine, and that's consistent with what we
> do for zlib^D^D^D^Dgzip.  Now, when testing manually pg_receivewal
> with various combinations of gzip, lz4 and none, I can see the
> following failure in the code that calculates the streaming start
> point:
> pg_receivewal: error: could not decompress file
> "wals//00010006.lz4": ERROR_frameType_unknown
>

Hmmm I will look into that.

> In the LZ4 code, this points to lib/lz4frame.c, where we read an
> incorrect header (see the part that does not match LZ4F_MAGICNUMBER).
> The segments written by pg_receivewal are clean.  Please note that
> this shows up as well when manually compressing some segments with a
> simple lz4 command, to simulate for example the case where a user
> compressed some segments by himself/herself before running
> pg_receivewal.
>

Rights, thank you for investigating. I will look further.

> So, tour code does 

Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread Michael Paquier
On Thu, Nov 04, 2021 at 04:31:48PM +0900, Michael Paquier wrote:
> I would have expected read_size to be (readend - readp) to match with
> the remaining data in the read buffer that we still need to read.
> Shouldn't out_size also be LZ4_CHUNK_SZ to match with the size of the
> output buffer where all the contents are read?  By setting it to 1, I
> think that this is doing more loops into LZ4F_decompress() than really
> necessary.  It would not hurt either to memset(0) those buffers before
> they are used, IMO.  I am not completely sure either, but should we
> use the number of bytes returned by LZ4F_decompress() as a hint for
> the follow-up loops?
>
> +#ifdef HAVE_LIBLZ4
> +while (readp < readend)
> +{
> +size_tread_size = 1;
> +size_tout_size = 1;
> +
> +status = LZ4F_decompress(ctx, outbuf, _size,
> + readbuf, _size, NULL);

And...  It happens that the error from v9 is here, where we need to
read the amount of remaining data from "readp", and not "readbuf" :)

It is already late here, I'll continue on this stuff tomorrow, but
this looks rather committable overall. 
--
Michael
From d2d76d8fefb1bad5db611746cf3d0ac89a67de4b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 4 Nov 2021 17:19:41 +0900
Subject: [PATCH v11] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received
WAL.  This commit teaches it to also be able to use LZ4 compression. It
is required that the binary is build using the -llz4 flag. It is enabled
via the --with-lz4 flag on configuration time.

The option `--compression-method` has been expanded to inlude the value
[LZ4].  The option `--compress` can not be used with LZ4 compression.

Under the hood there is nothing exceptional to be noted. Tar based
archives have not yet been taught to use LZ4 compression. If that is
felt useful, then it is easy to be added in the future.

Tests have been added to verify the creation and correctness of the
generated LZ4 files. The later is achieved by the use of LZ4 program, if
present in the installation.
---
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_receivewal.c| 153 ++
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  72 -
 src/bin/pg_basebackup/walmethods.c   | 160 ++-
 src/bin/pg_basebackup/walmethods.h   |   1 +
 doc/src/sgml/ref/pg_receivewal.sgml  |   8 +-
 src/Makefile.global.in   |   1 +
 7 files changed, 384 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..fd920fc197 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -19,6 +19,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 # make these available to TAP test scripts
+export LZ4
 export TAR
 # Note that GZIP cannot be used directly as this environment variable is
 # used by the command "gzip" to pass down options, so stick with a different
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 8acc0fc009..b5c0a98b82 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -32,6 +32,10 @@
 #include "receivelog.h"
 #include "streamutil.h"
 
+#ifdef HAVE_LIBLZ4
+#include "lz4frame.h"
+#endif
+
 /* Time to sleep between reconnection attempts */
 #define RECONNECT_SLEEP_TIME 5
 
@@ -136,6 +140,15 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/* File looks like a completed LZ4-compressed WAL file */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".lz4") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".lz4") == 0)
+	{
+		*ispartial = false;
+		*wal_compression_method = COMPRESSION_LZ4;
+		return true;
+	}
+
 	/* File looks like a partial uncompressed WAL file */
 	if (fname_len == XLOG_FNAME_LEN + strlen(".partial") &&
 		strcmp(filename + XLOG_FNAME_LEN, ".partial") == 0)
@@ -154,6 +167,15 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/* File looks like a partial LZ4-compressed WAL file */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".lz4.partial") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".lz4.partial") == 0)
+	{
+		*ispartial = true;
+		*wal_compression_method = COMPRESSION_LZ4;
+		return true;
+	}
+
 	/* File does not look like something we know */
 	return false;
 }
@@ -284,6 +306,14 @@ FindStreamingStart(uint32 *tli)
 		 * than 4GB, and then compare it to the size of a completed segment.
 		 * The 4 last bytes correspond to the ISIZE member according to
 		 * http://www.zlib.org/rfc-gzip.html.
+		 *
+		 * For LZ4 compressed segments, uncompress 

Re: Teach pg_receivewal to use lz4 compression

2021-11-04 Thread Michael Paquier
43795ec201ed444d4e74014c3a2d3 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 4 Nov 2021 16:24:48 +0900
Subject: [PATCH v10] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received
WAL.  This commit teaches it to also be able to use LZ4 compression. It
is required that the binary is build using the -llz4 flag. It is enabled
via the --with-lz4 flag on configuration time.

The option `--compression-method` has been expanded to inlude the value
[LZ4].  The option `--compress` can not be used with LZ4 compression.

Under the hood there is nothing exceptional to be noted. Tar based
archives have not yet been taught to use LZ4 compression. If that is
felt useful, then it is easy to be added in the future.

Tests have been added to verify the creation and correctness of the
generated LZ4 files. The later is achieved by the use of LZ4 program, if
present in the installation.
---
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_receivewal.c| 144 +
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  72 -
 src/bin/pg_basebackup/walmethods.c   | 160 ++-
 src/bin/pg_basebackup/walmethods.h   |   1 +
 doc/src/sgml/ref/pg_receivewal.sgml  |   8 +-
 src/Makefile.global.in   |   1 +
 7 files changed, 375 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..fd920fc197 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -19,6 +19,7 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 # make these available to TAP test scripts
+export LZ4
 export TAR
 # Note that GZIP cannot be used directly as this environment variable is
 # used by the command "gzip" to pass down options, so stick with a different
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 8acc0fc009..6d090d4c50 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -32,6 +32,10 @@
 #include "receivelog.h"
 #include "streamutil.h"
 
+#ifdef HAVE_LIBLZ4
+#include "lz4frame.h"
+#endif
+
 /* Time to sleep between reconnection attempts */
 #define RECONNECT_SLEEP_TIME 5
 
@@ -136,6 +140,15 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/* File looks like a completed LZ4-compressed WAL file */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".lz4") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".lz4") == 0)
+	{
+		*ispartial = false;
+		*wal_compression_method = COMPRESSION_LZ4;
+		return true;
+	}
+
 	/* File looks like a partial uncompressed WAL file */
 	if (fname_len == XLOG_FNAME_LEN + strlen(".partial") &&
 		strcmp(filename + XLOG_FNAME_LEN, ".partial") == 0)
@@ -154,6 +167,15 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/* File looks like a partial LZ4-compressed WAL file */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".lz4.partial") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".lz4.partial") == 0)
+	{
+		*ispartial = true;
+		*wal_compression_method = COMPRESSION_LZ4;
+		return true;
+	}
+
 	/* File does not look like something we know */
 	return false;
 }
@@ -284,6 +306,14 @@ FindStreamingStart(uint32 *tli)
 		 * than 4GB, and then compare it to the size of a completed segment.
 		 * The 4 last bytes correspond to the ISIZE member according to
 		 * http://www.zlib.org/rfc-gzip.html.
+		 *
+		 * For LZ4 compressed segments, uncompress the file in a throw-away
+		 * buffer keeping track of the uncompressed size, then compare it to
+		 * the size of a completed segment.  Per its protocol, LZ4 does not
+		 * store the uncompressed size of an object by default.  contentSize
+		 * is one possible way to do that, but we need to rely on a method
+		 * where WAL segments could have been compressed by a different
+		 * source than pg_receivewal, like an archive_command.
 		 */
 		if (!ispartial && wal_compression_method == COMPRESSION_NONE)
 		{
@@ -315,6 +345,7 @@ FindStreamingStart(uint32 *tli)
 			snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
 
 			fd = open(fullpath, O_RDONLY | PG_BINARY, 0);
+
 			if (fd < 0)
 			{
 pg_log_error("could not open compressed file \"%s\": %m",
@@ -350,6 +381,98 @@ FindStreamingStart(uint32 *tli)
 continue;
 			}
 		}
+		else if (!ispartial && compression_method == COMPRESSION_LZ4)
+		{
+#ifdef HAVE_LIBLZ4
+#define LZ4_CHUNK_SZ	4096
+			int			fd;
+			int			r;
+			size_t		uncompressed_size = 0;
+			char		fullpath[MAXPGPATH * 2];
+			char		readbuf[LZ4_CHUNK_SZ];
+			char		outbuf[LZ4_CHUNK_SZ];
+			LZ4F_decompressionContext_t ctx = NULL;
+			LZ4F_errorCode_t

Re: Teach pg_receivewal to use lz4 compression

2021-11-03 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Wednesday, November 3rd, 2021 at 9:05 AM,  wrote:

> ‐‐‐ Original Message ‐‐‐
>
> On Wednesday, November 3rd, 2021 at 12:23 AM, Michael Paquier 
> mich...@paquier.xyz wrote:
> > On Tue, Nov 02, 2021 at 12:31:47PM -0400, Robert Haas wrote:
> > > On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander mag...@hagander.net wrote:
> > >
> > > > I think for the end user, it is strictly better to name it "gzip",
> > > > and given that the target of this option is the end user we should
> > > > do so. (It'd be different it we were talking about a build-time
> > > > parameter to configure).
> > >
> > > I agree. Also, I think there's actually a file format called "zlib"
> > > which is slightly different from the "gzip" format, and you have to be
> > > careful not to generate the wrong one.
> >
> > Okay, fine by me. It would be better to be also consistent in
> > WalCompressionMethods once we switch to this option value, then.
>
> I will revert to gzip for version 9. Should be out shortly.

Please find v9 attached.

Cheers,
//Georgios
>
> > MichaelFrom 8e33136f81c3197020053cba0f7f070d594f056e Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Wed, 3 Nov 2021 08:59:58 +
Subject: [PATCH v9 2/2] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received WAL.
This commit teaches it to also be able to use LZ4 compression. It is required
that the binary is build using the -llz4 flag. It is enabled via the --with-lz4
flag on configuration time.

The option `--compression-method` has been expanded to inlude the value [LZ4].
The option `--compress` can not be used with LZ4 compression.

Under the hood there is nothing exceptional to be noted. Tar based archives have
not yet been taught to use LZ4 compression. If that is felt useful, then it is
easy to be added in the future.

Tests have been added to verify the creation and correctness of the generated
LZ4 files. The later is achieved by the use of LZ4 program, if present in the
installation.
---
 doc/src/sgml/ref/pg_receivewal.sgml  |   5 +-
 src/Makefile.global.in   |   1 +
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_receivewal.c| 129 +++
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  72 -
 src/bin/pg_basebackup/walmethods.c   | 159 ++-
 src/bin/pg_basebackup/walmethods.h   |   1 +
 7 files changed, 358 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index cf2eaa1486..411b275de0 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -268,8 +268,11 @@ PostgreSQL documentation
   

 Enables compression of write-ahead logs using the specified method.
-Supported values gzip,
+Supported values are lz4, gzip,
 and none.
+For the LZ4 method to be available,
+PostgreSQL must have been have been compiled
+with --with-lz4.

   
  
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 533c12fef9..05c54b27de 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -350,6 +350,7 @@ XGETTEXT = @XGETTEXT@
 
 GZIP	= gzip
 BZIP2	= bzip2
+LZ4	= lz4
 
 DOWNLOAD = wget -O $@ --no-use-server-timestamps
 #DOWNLOAD = curl -o $@
diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..387d728345 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -24,6 +24,7 @@ export TAR
 # used by the command "gzip" to pass down options, so stick with a different
 # name.
 export GZIP_PROGRAM=$(GZIP)
+export LZ4
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 9449b50868..af3eba8845 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -29,6 +29,10 @@
 #include "receivelog.h"
 #include "streamutil.h"
 
+#ifdef HAVE_LIBLZ4
+#include "lz4frame.h"
+#endif
+
 /* Time to sleep between reconnection attempts */
 #define RECONNECT_SLEEP_TIME 5
 
@@ -137,6 +141,15 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/* File looks like a complete LZ4 compressed XLOG file */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".lz4") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".lz4") == 0)
+	{
+		*ispartial = false;
+		*wal_compression_method = COMPRESSION_LZ4;
+		return true;
+	}
+
 	/* File looks like a partial uncomp

Re: Teach pg_receivewal to use lz4 compression

2021-11-03 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Wednesday, November 3rd, 2021 at 12:23 AM, Michael Paquier 
 wrote:

> On Tue, Nov 02, 2021 at 12:31:47PM -0400, Robert Haas wrote:
>> On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander mag...@hagander.net wrote:
>>> I think for the end user, it is strictly better to name it "gzip",
>>> and given that the target of this option is the end user we should
>>> do so. (It'd be different it we were talking about a build-time
>>> parameter to configure).
>>
>> I agree. Also, I think there's actually a file format called "zlib"
>> which is slightly different from the "gzip" format, and you have to be
>> careful not to generate the wrong one.
>
> Okay, fine by me.  It would be better to be also consistent in
> WalCompressionMethods once we switch to this option value, then.

I will revert to gzip for version 9. Should be out shortly.

Cheers,
//Georgios
>
> Michael




Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread Michael Paquier
On Tue, Nov 02, 2021 at 12:31:47PM -0400, Robert Haas wrote:
> On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander  wrote:
>> I think for the end user, it is strictly better to name it "gzip",
>> and given that the target of this option is the end user we should
>> do so. (It'd be different it we were talking about a build-time
>> parameter to configure). 
> 
> I agree. Also, I think there's actually a file format called "zlib"
> which is slightly different from the "gzip" format, and you have to be
> careful not to generate the wrong one.

Okay, fine by me.  It would be better to be also consistent in
WalCompressionMethods once we switch to this option value, then.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander  wrote:
> Um, why?
>
> That we are using zlib to provide the compression is an implementation 
> detail. Whereas AFAIK "gzip" refers to both the program and the format. And 
> we specifically use the gzxxx() functions in zlib, in order to produce gzip 
> format.
>
> I think for the end user, it is strictly better to name it "gzip", and given 
> that the target of this option is the end user we should do so. (It'd be 
> different it we were talking about a build-time parameter to configure).

I agree. Also, I think there's actually a file format called "zlib"
which is slightly different from the "gzip" format, and you have to be
careful not to generate the wrong one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread Magnus Hagander
On Tue, Nov 2, 2021 at 9:51 AM Michael Paquier  wrote:

> On Tue, Nov 02, 2021 at 07:27:50AM +0900, Michael Paquier wrote:
> > On Mon, Nov 01, 2021 at 08:39:59AM +, gkokola...@pm.me wrote:
> > > Agreed.
> > >
> > > I have already started on v8 of the patch with that technique. I should
> > > be able to update the thread soon.
> >
> > Nice, thanks!
>
> By the way, I was reading the last version of the patch today, and
> it seems to me that we could make the commit history if we split the
> patch into two parts:
> - One that introduces the new option --compression-method and
> is_xlogfilename(), while reworking --compress (including documentation
> changes).
> - One to have LZ4 support.
>
> v7 has been using "gzip" for the option name, but I think that it
> would be more consistent to use "zlib" instead.
>

Um, why?

That we are using zlib to provide the compression is an implementation
detail. Whereas AFAIK "gzip" refers to both the program and the format. And
we specifically use the gzxxx() functions in zlib, in order to produce gzip
format.

I think for the end user, it is strictly better to name it "gzip", and
given that the target of this option is the end user we should do so. (It'd
be different it we were talking about a build-time parameter to configure).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread gkokolatos
TarMethodData
 {
 	char	   *tarfilename;
 	int			fd;
+	WalCompressionMethod compression_method;
 	int			compression;
 	bool		sync;
 	TarMethodFile *currentfile;
@@ -731,10 +737,10 @@ tar_get_file_size(const char *pathname)
 	return -1;
 }
 
-static int
-tar_compression(void)
+static WalCompressionMethod
+tar_compression_method(void)
 {
-	return tar_data->compression;
+	return tar_data->compression_method;
 }
 
 static off_t
@@ -1031,8 +1037,16 @@ tar_finish(void)
 	return true;
 }
 
+/*
+ * The argument compression_method is currently ignored. It is in place for
+ * symmetry with CreateWalDirectoryMethod which uses it for distinguishing
+ * between the different compression methods. CreateWalTarMethod and its family
+ * of functions handle only zlib compression.
+ */
 WalWriteMethod *
-CreateWalTarMethod(const char *tarbase, int compression, bool sync)
+CreateWalTarMethod(const char *tarbase,
+   WalCompressionMethod compression_method,
+   int compression, bool sync)
 {
 	WalWriteMethod *method;
 	const char *suffix = (compression != 0) ? ".tar.gz" : ".tar";
@@ -1043,7 +1057,7 @@ CreateWalTarMethod(const char *tarbase, int compression, bool sync)
 	method->get_current_pos = tar_get_current_pos;
 	method->get_file_size = tar_get_file_size;
 	method->get_file_name = tar_get_file_name;
-	method->compression = tar_compression;
+	method->compression_method = tar_compression_method;
 	method->close = tar_close;
 	method->sync = tar_sync;
 	method->existsfile = tar_existsfile;
@@ -1054,6 +1068,7 @@ CreateWalTarMethod(const char *tarbase, int compression, bool sync)
 	tar_data->tarfilename = pg_malloc0(strlen(tarbase) + strlen(suffix) + 1);
 	sprintf(tar_data->tarfilename, "%s%s", tarbase, suffix);
 	tar_data->fd = -1;
+	tar_data->compression_method = compression_method;
 	tar_data->compression = compression;
 	tar_data->sync = sync;
 #ifdef HAVE_LIBZ
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 4abdfd8333..4fc7b3d2a3 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -19,6 +19,12 @@ typedef enum
 	CLOSE_NO_RENAME
 } WalCloseMethod;
 
+typedef enum
+{
+	COMPRESSION_ZLIB,
+	COMPRESSION_NONE
+} WalCompressionMethod;
+
 /*
  * A WalWriteMethod structure represents the different methods used
  * to write the streaming WAL as it's received.
@@ -58,8 +64,8 @@ struct WalWriteMethod
 	 */
 	char	   *(*get_file_name) (const char *pathname, const char *temp_suffix);
 
-	/* Return the level of compression */
-	int			(*compression) (void);
+	/* Returns the compression method */
+	WalCompressionMethod (*compression_method) (void);
 
 	/*
 	 * Write count number of bytes to the file, and return the number of bytes
@@ -95,8 +101,11 @@ struct WalWriteMethod
  *		   not all those required for pg_receivewal)
  */
 WalWriteMethod *CreateWalDirectoryMethod(const char *basedir,
+		 WalCompressionMethod compression_method,
 		 int compression, bool sync);
-WalWriteMethod *CreateWalTarMethod(const char *tarbase, int compression, bool sync);
+WalWriteMethod *CreateWalTarMethod(const char *tarbase,
+   WalCompressionMethod compression_method,
+   int compression, bool sync);
 
 /* Cleanup routines for previously-created methods */
 void		FreeWalDirectoryMethod(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 734e2f..da6ac8ed83 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2858,6 +2858,7 @@ WaitEventTimeout
 WaitPMResult
 WalCloseMethod
 WalCompression
+WalCompressionMethod
 WalLevel
 WalRcvData
 WalRcvExecResult
-- 
2.25.1

From d936a8621339a86e9151bd5b942e25afd1dabc64 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Tue, 2 Nov 2021 11:26:51 +
Subject: [PATCH v8 2/2] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received WAL.
This commit teaches it to also be able to use LZ4 compression. It is required
that the binary is build using the -llz4 flag. It is enabled via the --with-lz4
flag on configuration time.

The option `--compression-method` has been expanded to inlude the value [LZ4].
The option `--compress` can not be used with LZ4 compression.

Under the hood there is nothing exceptional to be noted. Tar based archives have
not yet been taught to use LZ4 compression. If that is felt useful, then it is
easy to be added in the future.

Tests have been added to verify the creation and correctness of the generated
LZ4 files. The later is achieved by the use of LZ4 program, if present in the
installation.
---
 doc/src/sgml/ref/pg_receivewal.sgml  |   5 +-
 src/Makefile.global.in   |   1 +
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_receivewal.c| 129 +++
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  72 

Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread Michael Paquier
On Tue, Nov 02, 2021 at 07:27:50AM +0900, Michael Paquier wrote:
> On Mon, Nov 01, 2021 at 08:39:59AM +, gkokola...@pm.me wrote:
> > Agreed.
> > 
> > I have already started on v8 of the patch with that technique. I should
> > be able to update the thread soon.
> 
> Nice, thanks!

By the way, I was reading the last version of the patch today, and
it seems to me that we could make the commit history if we split the
patch into two parts:
- One that introduces the new option --compression-method and
is_xlogfilename(), while reworking --compress (including documentation
changes).
- One to have LZ4 support.

v7 has been using "gzip" for the option name, but I think that it
would be more consistent to use "zlib" instead.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-11-01 Thread Michael Paquier
On Mon, Nov 01, 2021 at 08:39:59AM +, gkokola...@pm.me wrote:
> Agreed.
> 
> I have already started on v8 of the patch with that technique. I should
> be able to update the thread soon.

Nice, thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-11-01 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Monday, November 1st, 2021 at 9:09 AM, Michael Paquier  
wrote:

> On Fri, Oct 29, 2021 at 08:38:33PM +0900, Michael Paquier wrote:
>
> It would be good to test with many segments, but could we think about
> just relying on LZ4F_decompress() with a frame and compute the
> decompressed size by ourselves? At least that will never break, and
> that would work in all the cases aimed by pg_receivewal.

Agreed.

I have already started on v8 of the patch with that technique. I should
be able to update the thread soon.

>
> Michael




Re: Teach pg_receivewal to use lz4 compression

2021-11-01 Thread Michael Paquier
On Fri, Oct 29, 2021 at 08:38:33PM +0900, Michael Paquier wrote:
> Why would the header size change between the moment the segment is
> begun and it is finished?  We could store it in memory and write it
> again when the segment is closed instead, even if it means to fseek()
> back to the beginning of the file once the segment is completed.
> Storing WalSegSz from the moment a segment is opened makes the code
> weaker to SIGINTs and the kind, so this does not fix the problem I
> mentioned previously :/

I got to think more on this one, and another argument against storing
an incorrect contentSize while the segment is not completed would
break the case of partial segments with --synchronous, where we should
still be able to recover as much data flushed as possible.  Like zlib,
where one has to complete the partial segment with zeros after
decompression until the WAL segment size is reached, we should be able
to support that with LZ4.  (I have saved some customer data in the
past thanks to this property, btw.)

It is proves to be too fancy to rewrite the header with a correct
contentSize once the segment is completed, another way would be to
enforce a decompression of each segment in-memory.  The advantage of
this method is that we would be a maximum portable.  For example, if
one begins to use pg_receivewal on an archive directory where we used
an archive_command, we would be able to grab the starting LSN.  That's
more costly of course, but the LZ4 protocol does not make that easy
either with its chunk protocol.  By the way, you are right that we
should worry about the variability in size of the header as we only
have the guarantee that it can be within a give window.  I missed
that and lz4frame.h mentions that around LZ4F_headerSize :/

It would be good to test with many segments, but could we think about
just relying on LZ4F_decompress() with a frame and compute the
decompressed size by ourselves?  At least that will never break, and
that would work in all the cases aimed by pg_receivewal.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-10-29 Thread Michael Paquier
On Fri, Oct 29, 2021 at 09:45:41AM +, gkokola...@pm.me wrote:
> On Saturday, September 18th, 2021 at 8:18 AM, Michael Paquier 
>  wrote:
>> We don't really care about contentSize as long as a segment is not
>> completed.  Rather than filling contentSize all the time we write
>> something, we'd better update frameInfo once the segment is
>> completed and closed.  That would also take take of the error as this
>> is not checked if contentSize is 0.  It seems to me that we should
>> fill in the information when doing a CLOSE_NORMAL.
> 
> Thank you for the comment. I think that the opposite should be done. At the 
> time
> that the file is closed, the header is already written to disk. We have no way
> to know that is not. If we need to go back to refill the information, we will
> have to ask for the API to produce a new header. There is little guarantee 
> that
> the header size will be the same and as a consequence we will have to shift
> the actual data around.

Why would the header size change between the moment the segment is
begun and it is finished?  We could store it in memory and write it
again when the segment is closed instead, even if it means to fseek()
back to the beginning of the file once the segment is completed.
Storing WalSegSz from the moment a segment is opened makes the code
weaker to SIGINTs and the kind, so this does not fix the problem I
mentioned previously :/

> In the attached, the header is rewritten only when closing an incomplete
> segment. For all intents and purposes that segment is not usable. However 
> there
> might be custom scripts that might want to attempt to parse even an otherwise
> unusable file.
> 
> A different and easier approach would be to simply prepare the LZ4 context for
> future actions and simply ignore the file.

I am not sure what you mean by "ignore" here.  Do you mean to store 0
in contentSize when opening the segment and rewriting again the header
once the segment is completed?
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-10-29 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Saturday, September 18th, 2021 at 8:18 AM, Michael Paquier 
 wrote:
> On Fri, Sep 17, 2021 at 08:12:42AM +, gkokola...@pm.me wrote:
>
> I have been digging into the issue I saw in the TAP tests when closing
> a segment, and found the problem.  The way you manipulate
> frameInfo.contentSize by just setting it to WalSegSz when *opening*
> a segment causes problems on LZ4F_compressEnd(), making the code
> throw a ERROR_frameSize_wrong.  In lz4frame.c, the end of
> LZ4F_compressEnd() triggers this check and the error:
> if (cctxPtr->prefs.frameInfo.contentSize) {
> if (cctxPtr->prefs.frameInfo.contentSize != cctxPtr->totalInSize)
> return err0r(LZ4F_ERROR_frameSize_wrong);
> }
>
> We don't really care about contentSize as long as a segment is not
> completed.  Rather than filling contentSize all the time we write
> something, we'd better update frameInfo once the segment is
> completed and closed.  That would also take take of the error as this
> is not checked if contentSize is 0.  It seems to me that we should
> fill in the information when doing a CLOSE_NORMAL.

Thank you for the comment. I think that the opposite should be done. At the time
that the file is closed, the header is already written to disk. We have no way
to know that is not. If we need to go back to refill the information, we will
have to ask for the API to produce a new header. There is little guarantee that
the header size will be the same and as a consequence we will have to shift
the actual data around.

In the attached, the header is rewritten only when closing an incomplete
segment. For all intents and purposes that segment is not usable. However there
might be custom scripts that might want to attempt to parse even an otherwise
unusable file.

A different and easier approach would be to simply prepare the LZ4 context for
future actions and simply ignore the file.

>
> -   if (stream->walmethod->compression() == 0 &&
> +   if (stream->walmethod->compression() == COMPRESSION_NONE &&
> stream->walmethod->existsfile(fn))
> This one was a more serious issue, as the compression() callback would
> return an integer for the compression level but v5 compared it to a
> WalCompressionMethod.  In order to take care of this issue, mainly for
> pg_basebackup, I think that we have to update the compression()
> callback to compression_method(), and it is cleaner to save the
> compression method as well as the compression level for the tar data.
>

Agreed.

> I am attaching a new patch, on which I have done many tweaks and
> adjustments while reviewing it.  The attached patch fixes the second
> issue, and I have done nothing about the first issue yet, but that
> should be simple enough to address as this needs an update of the
> frame info when closing a completed segment.  Could you look at it?
>

Thank you. Find v7 attached, rebased to the current head.

Cheers,
//Georgios

> Thanks,
> --
> MichaeFrom c3c2eca22102cd0186eb1975339248a200e1ceb9 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 22 Oct 2021 13:14:15 +
Subject: [PATCH v7] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received WAL.
This commit teaches it to also be able to use LZ4 compression. It is required
that the binary is build using the -llz4 flag. It is enabled via the --with-lz4
flag on configuration time.

Previously, the user had to use the option --compress with a value between [0-9]
to denote that gzip compression was required. This specific behaviour has not
maintained. A newly introduced option --compression-method=[LZ4|gzip] can be
used to ask for the logs to be compressed. Compression values can be selected
only when the compression method is gzip. A compression value of 0 now returns
an error.

Under the hood there is nothing exceptional to be noted. Tar based archives have
not yet been taught to use LZ4 compression. If that is felt useful, then it is
easy to be added in the future.

Tests have been added to verify the creation and correctness of the generated
LZ4 files. The later is achieved by the use of LZ4 program, if present in the
installation.
---
 doc/src/sgml/ref/pg_receivewal.sgml  |  28 +-
 src/Makefile.global.in   |   1 +
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_basebackup.c|   7 +-
 src/bin/pg_basebackup/pg_receivewal.c| 274 +++
 src/bin/pg_basebackup/receivelog.c   |   2 +-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  79 +-
 src/bin/pg_basebackup/walmethods.c   | 263 --
 src/bin/pg_basebackup/walmethods.h   |  16 +-
 src/tools/pgindent/typedefs.list |   1 +
 10 files changed

Re: Teach pg_receivewal to use lz4 compression

2021-09-18 Thread Michael Paquier
On Fri, Sep 17, 2021 at 08:12:42AM +, gkokola...@pm.me wrote:
> Yeah, I was considering it to split them over to a separate commit,
> then decided against it. Will do so in the future.

I have been digging into the issue I saw in the TAP tests when closing
a segment, and found the problem.  The way you manipulate
frameInfo.contentSize by just setting it to WalSegSz when *opening*
a segment causes problems on LZ4F_compressEnd(), making the code
throw a ERROR_frameSize_wrong.  In lz4frame.c, the end of
LZ4F_compressEnd() triggers this check and the error:
if (cctxPtr->prefs.frameInfo.contentSize) {
if (cctxPtr->prefs.frameInfo.contentSize != cctxPtr->totalInSize)
return err0r(LZ4F_ERROR_frameSize_wrong);
}

We don't really care about contentSize as long as a segment is not
completed.  Rather than filling contentSize all the time we write
something, we'd better update frameInfo once the segment is
completed and closed.  That would also take take of the error as this
is not checked if contentSize is 0.  It seems to me that we should
fill in the information when doing a CLOSE_NORMAL.

-   if (stream->walmethod->compression() == 0 &&
+   if (stream->walmethod->compression() == COMPRESSION_NONE &&
stream->walmethod->existsfile(fn))
This one was a more serious issue, as the compression() callback would
return an integer for the compression level but v5 compared it to a
WalCompressionMethod.  In order to take care of this issue, mainly for
pg_basebackup, I think that we have to update the compression()
callback to compression_method(), and it is cleaner to save the
compression method as well as the compression level for the tar data.

I am attaching a new patch, on which I have done many tweaks and
adjustments while reviewing it.  The attached patch fixes the second
issue, and I have done nothing about the first issue yet, but that
should be simple enough to address as this needs an update of the
frame info when closing a completed segment.  Could you look at it?

Thanks,
--
Michael
From 37e3800d279566445864ed82f29e8d650c72d8cd Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 18 Sep 2021 15:11:49 +0900
Subject: [PATCH v6] Teach pg_receivewal to use LZ4 compression

The program pg_receivewal can use gzip compression to store the received WAL.
This commit teaches it to also be able to use LZ4 compression. It is required
that the binary is build using the -llz4 flag. It is enabled via the --with-lz4
flag on configuration time.

Previously, the user had to use the option --compress with a value between [0-9]
to denote that gzip compression was required. This specific behaviour has not
maintained. A newly introduced option --compression-method=[LZ4|gzip] can be
used to ask for the logs to be compressed. Compression values can be selected
only when the compression method is gzip. A compression value of 0 now returns
an error.

Under the hood there is nothing exceptional to be noted. Tar based archives have
not yet been taught to use LZ4 compression. If that is felt useful, then it is
easy to be added in the future.

Tests have been added to verify the creation and correctness of the generated
LZ4 files. The later is achieved by the use of LZ4 program, if present in the
installation.
---
 src/bin/pg_basebackup/Makefile   |   1 +
 src/bin/pg_basebackup/pg_basebackup.c|   7 +-
 src/bin/pg_basebackup/pg_receivewal.c| 278 +++
 src/bin/pg_basebackup/receivelog.c   |   2 +-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  81 +-
 src/bin/pg_basebackup/walmethods.c   | 216 --
 src/bin/pg_basebackup/walmethods.h   |  16 +-
 doc/src/sgml/ref/pg_receivewal.sgml  |  28 +-
 src/Makefile.global.in   |   1 +
 src/tools/pgindent/typedefs.list |   1 +
 10 files changed, 546 insertions(+), 85 deletions(-)

diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile
index 459d514183..387d728345 100644
--- a/src/bin/pg_basebackup/Makefile
+++ b/src/bin/pg_basebackup/Makefile
@@ -24,6 +24,7 @@ export TAR
 # used by the command "gzip" to pass down options, so stick with a different
 # name.
 export GZIP_PROGRAM=$(GZIP)
+export LZ4
 
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
index 669aa207a3..18c6a93cec 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -555,10 +555,13 @@ LogStreamerMain(logstreamer_param *param)
stream.replication_slot = replication_slot;
 
if (format == 'p')
-   stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0,
+

Re: Teach pg_receivewal to use lz4 compression

2021-09-17 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Friday, September 17th, 2021 at 09:39, Michael Paquier  
wrote:

> On Thu, Sep 16, 2021 at 03:17:15PM +, gkokola...@pm.me wrote:
>
> > Hopefully fixed.
>
> Thanks for the new version. I have put my hands on the patch, and
> began reviewing its internals with LZ4. I am not done with it yet,
> and I have noticed some places that could be improved (error handling,
> some uses of LZ4F_flush() that should be replaced LZ4F_compressEnd(),
> and more tweaks). I'll send an updated version once I complete my
> review, but that looks rather solid overall.

Thanks! Looking forward to seeing it!

> The changes done in close_walfile()@receivelog.c are useful taken
> independently, so I have applied these separately.

Yeah, I was considering it to split them over to a separate commit,
then decided against it. Will do so in the future.

Cheers,
//Georgios

> 
> Michael




Re: Teach pg_receivewal to use lz4 compression

2021-09-17 Thread Michael Paquier
On Thu, Sep 16, 2021 at 03:17:15PM +, gkokola...@pm.me wrote:
> Hopefully fixed.

Thanks for the new version.  I have put my hands on the patch, and
began reviewing its internals with LZ4.  I am not done with it yet,
and I have noticed some places that could be improved (error handling,
some uses of LZ4F_flush() that should be replaced LZ4F_compressEnd(),
and more tweaks).  I'll send an updated version once I complete my
review, but that looks rather solid overall.

The changes done in close_walfile()@receivelog.c are useful taken
independently, so I have applied these separately.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-09-16 Thread gkokolatos
‐‐‐ Original Message ‐‐‐

On Wednesday, September 15th, 2021 at 08:46, Michael Paquier 
mich...@paquier.xyz wrote:

Hi,

thank you for the review.

> On Fri, Sep 10, 2021 at 08:21:51AM +, gkokola...@pm.me wrote:
>
> > Agreed. A default value of 5, which is in the middle point of options, has 
> > been
> > defined and used.
> > In addition, the tests have been adjusted to mimic the newly added gzip 
> > tests.
>
> Looking at lz4frame.h, there is LZ4F_flush() that allows to compress
> immediately any data buffered in the frame context but not compressed
> yet. It seems to me that dir_sync() should be extended to support
> LZ4.

Agreed. LZ4F_flush() calls have been added where appropriate.

>
> export GZIP_PROGRAM=$(GZIP)
> +export LZ4
> [...]
> +PGAC_PATH_PROGS(LZ4, lz4)
> -
> PGAC_PATH_BISON
>
> The part of the test assigning LZ4 is fine, but I'd rather switch to a
> logic à-la-gzip, where we just save "lz4" in Makefile.global.in,
> saving cycles in ./configure.

Reluctantly agreed.

>
> +static bool
> +is_xlogfilename(const char *filename, bool *ispartial,
> -   WalCompressionMethod *wal_compression_method)
>
>
> I like the set of simplifications you have done here to detection if a
> segment is partial and which compression method applies to it.

Thank you very much.

>
> +   if (compression_method != COMPRESSION_ZLIB && compresslevel != 0)
> +   {
> +   pg_log_error("can only use --compress together with "
> +"--compression-method=gzip");
> +#ifndef HAVE_LIBLZ4
> +   pg_log_error("this build does not support compression via gzip");
> +#endif
>
> s/HAVE_LIBLZ4/HAVE_LIBZ/.
>

Fixed.

> +$primary->command_fails(
> +   [
> + 'pg_receivewal', '-D', $stream_dir, '--compression-method', 'lz4',
> + '--compress', '1'
> +   ],
> +   'failure if --compression-method=lz4 specified with --compress');
> This would fail when the code is not built with LZ4 with a non-zero
> error code but with an error that is not what we expect. I think that
> you should use $primary->command_fails_like() instead. That's quite
> new, as of de1d4fe. The matching error pattern will need to change
> depending on if we build the code with LZ4 or not. A simpler method
> is to use --compression-method=none, to bypass the first round of
> errors and make that build-independent, but that feels incomplete if
> you want to tie that to LZ4.

Fixed. Now a regex has been added to address both builds.

>
> +   pg_log_warning("compressed segment file "%s" has 
> incorrect header size %lu, skipping",
> +  dirent->d_name, consumed_len);
> +   LZ4F_freeDecompressionContext(ctx);
>
> I agree that skipping all those cases when calculating the streaming
> start point is more consistent.

Thanks.

>
> +   if (r < 0)
> +   pg_log_error("could not read compressed file 
> "%s": %m",
> +fullpath);
> +   else
> +   pg_log_error("could not read compressed file 
> "%s": read %d of %lu",
> +fullpath, r, sizeof(buf));
>
> Let's same in translation effort here by just using "could not read",
> etc. by removing the term "compressed".

The string is also present in the gzip compressed case, i.e. prior to this 
patch.
The translation effort will not be reduced by changing this string only.

> +   pg_log_error("can only use --compress together with "
> +"--compression-method=gzip");
>
> Better to keep these in a single line to ease grepping. We don't care
> if error strings are longer than the 72-80 character limit.

Fixed.

> +/* Size of lz4 input chunk for .lz4 */
> +#define LZ4_IN_SIZE 4096
>
> Why this choice? Does it need to use LZ4_COMPRESSBOUND?

This value is used in order to calculate the bound, before any buffer is
received. Then when we receive buffer, we consume them in LZ4_IN_SIZE chunks.
Note the call to LZ4F_compressBound() in dir_open_for_write().

+   ctx_out = LZ4F_createCompressionContext(, LZ4F_VERSION);
+   lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, );


> -  if (dir_data->compression > 0)
> +  if (dir_data->compression_method == COMPRESSION_ZLIB)
> gzclose(gzfp);
> else
>
> Hm. The addition of the header in dir_open_for_write() uses
> LZ4F_compressBegin. Shouldn't we use LZ4F_compressEnd() if
> fsync_fname() or fsync_parent_path() fail on top of closing the fd?
> That would be more consistent IMO to do so. The patch does that in
> dir_close(). You should do that additionally if there is a failure
> when writing the header.

Fixed. LZ4_flush() have been added where appropriate.

>
> +   pg_log_error("invalid compression-method \"%s\", 
> optarg);
> +   exit(1);
>
> This could be "invalid value \"%s\" 

Re: Teach pg_receivewal to use lz4 compression

2021-09-15 Thread Michael Paquier
On Fri, Sep 10, 2021 at 08:21:51AM +, gkokola...@pm.me wrote:
> Agreed. A default value of 5, which is in the middle point of options, has 
> been
> defined and used.
> 
> In addition, the tests have been adjusted to mimic the newly added gzip tests.

Looking at lz4frame.h, there is LZ4F_flush() that allows to compress
immediately any data buffered in the frame context but not compressed
yet.  It seems to me that dir_sync() should be extended to support
LZ4.

 export GZIP_PROGRAM=$(GZIP)
+export LZ4
[...]
+PGAC_PATH_PROGS(LZ4, lz4)
+
 PGAC_PATH_BISON
The part of the test assigning LZ4 is fine, but I'd rather switch to a
logic à-la-gzip, where we just save "lz4" in Makefile.global.in,
saving cycles in ./configure.

+static bool
+is_xlogfilename(const char *filename, bool *ispartial,
+   WalCompressionMethod *wal_compression_method)
I like the set of simplifications you have done here to detection if a
segment is partial and which compression method applies to it.

+   if (compression_method != COMPRESSION_ZLIB && compresslevel != 0)
+   {
+   pg_log_error("can only use --compress together with "
+"--compression-method=gzip");
+#ifndef HAVE_LIBLZ4
+   pg_log_error("this build does not support compression via gzip");
+#endif

s/HAVE_LIBLZ4/HAVE_LIBZ/.

+$primary->command_fails(
+   [
+ 'pg_receivewal', '-D', $stream_dir, '--compression-method', 'lz4',
+ '--compress', '1'
+   ],
+   'failure if --compression-method=lz4 specified with --compress');

This would fail when the code is not built with LZ4 with a non-zero
error code but with an error that is not what we expect.  I think that
you should use $primary->command_fails_like() instead.  That's quite
new, as of de1d4fe.  The matching error pattern will need to change
depending on if we build the code with LZ4 or not.  A simpler method
is to use --compression-method=none, to bypass the first round of
errors and make that build-independent, but that feels incomplete if
you want to tie that to LZ4.

+   pg_log_warning("compressed segment file \"%s\" has incorrect 
header size %lu, skipping",
+  dirent->d_name, consumed_len);
+   LZ4F_freeDecompressionContext(ctx);
I agree that skipping all those cases when calculating the streaming
start point is more consistent.

+   if (r < 0)
+   pg_log_error("could not read compressed file \"%s\": %m",
+fullpath);
+   else
+   pg_log_error("could not read compressed file \"%s\": read 
%d of %lu",
+fullpath, r, sizeof(buf));
Let's same in translation effort here by just using "could not read",
etc. by removing the term "compressed".

+   pg_log_error("can only use --compress together with "
+"--compression-method=gzip");
Better to keep these in a single line to ease grepping.  We don't care
if error strings are longer than the 72-80 character limit.

+/* Size of lz4 input chunk for .lz4 */
+#define LZ4_IN_SIZE  4096
Why this choice?  Does it need to use LZ4_COMPRESSBOUND?

-   if (dir_data->compression > 0)
+   if (dir_data->compression_method == COMPRESSION_ZLIB)
gzclose(gzfp);
else
Hm.  The addition of the header in dir_open_for_write() uses
LZ4F_compressBegin.  Shouldn't we use LZ4F_compressEnd() if
fsync_fname() or fsync_parent_path() fail on top of closing the fd?
That would be more consistent IMO to do so.  The patch does that in
dir_close().  You should do that additionally if there is a failure
when writing the header.

+   pg_log_error("invalid compression-method \"%s\"", optarg);
+   exit(1);
This could be "invalid value \"%s\" for option %s", see
option_parse_int() in fe_utils/option_utils.c.

After running the TAP tests, the LZ4 section is failing as follows:
pg_receivewal: stopped log streaming at 0/4001950 (timeline 1)
pg_receivewal: not renaming "00010004.partial", segment is not 
complete
pg_receivewal: error: could not close file "00010004": 
Undefined error: 0
ok 26 - streaming some WAL using --compression-method=lz4
The third log line I am quoting here looks unexpected to me.  Saying
that, the tests integrate nicely with the existing code.

As mentioned upthread, LZ4-compressed files don't store the file size
by default.  I think that we should document that better in the code
and the documentation, in two ways at least:
- Add some comments mentioning lz4 --content-size, with at least one
in FindStreamingStart().
- Add a new paragraph in the documentation of --compression-method.

The name of the compression method is "LZ4" with upper-case
characters.  Some comments in the code and the tests, as well as the
docs, are not careful about that.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-09-13 Thread gkokolatos

‐‐‐ Original Message ‐‐‐

On Saturday, September 11th, 2021 at 07:02, Jian Guo  wrote:

Hi,

thank you for looking at the patch.

> - LZ4F_decompressionContext_t ctx = NULL;
> - snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, 
> dirent->d_name);
> - fd = open(fullpath, O_RDONLY | PG_BINARY, 0);
>
> Should close the fd before exit or abort.

You are correct. Please find version 4 attached.

Cheers,
//Georgios

v4-0001-Teach-pg_receivewal-to-use-lz4-compression.patch
Description: Binary data


Re: Teach pg_receivewal to use lz4 compression

2021-09-10 Thread Jian Guo
@@ -250,14 +302,18 @@ FindStreamingStart(uint32 *tli)
/*
 * Check that the segment has the right size, if it's supposed 
to be
 * completed.  For non-compressed segments just check the 
on-disk size
-* and see if it matches a completed segment. For compressed 
segments,
-* look at the last 4 bytes of the compressed file, which is 
where the
-* uncompressed size is located for gz files with a size lower 
than
-* 4GB, and then compare it to the size of a completed segment. 
The 4
-* last bytes correspond to the ISIZE member according to
+* and see if it matches a completed segment. For zlib 
compressed
+* segments, look at the last 4 bytes of the compressed file, 
which is
+* where the uncompressed size is located for gz files with a 
size lower
+* than 4GB, and then compare it to the size of a completed 
segment.
+* The 4 last bytes correspond to the ISIZE member according to
 * http://www.zlib.org/rfc-gzip.html.
+*
+* For lz4 compressed segments read the header using the 
exposed API and
+* compare the uncompressed file size, stored in
+* LZ4F_frameInfo_t{.contentSize}, to that of a completed 
segment.
 */
-   if (!ispartial && !iscompress)
+   if (!ispartial && wal_compression_method == COMPRESSION_NONE)
{
struct stat statbuf;
charfullpath[MAXPGPATH * 2];
@@ -276,7 +332,7 @@ FindStreamingStart(uint32 *tli)
continue;
}
}
-   else if (!ispartial && iscompress)
+   else if (!ispartial && wal_compression_method == 
COMPRESSION_ZLIB)
{
int fd;
charbuf[4];
@@ -322,6 +378,70 @@ FindStreamingStart(uint32 *tli)
continue;
}
}
+   else if (!ispartial && compression_method == COMPRESSION_LZ4)
+   {
+#ifdef HAVE_LIBLZ4
+   int fd;
+   int r;
+   size_t  consumed_len = LZ4F_HEADER_SIZE_MAX;
+   charbuf[LZ4F_HEADER_SIZE_MAX];
+   charfullpath[MAXPGPATH * 2];
+   LZ4F_frameInfo_t frame_info = { 0 };
+   LZ4F_decompressionContext_t ctx = NULL;
+
+   snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, 
dirent->d_name);
+
+   fd = open(fullpath, O_RDONLY | PG_BINARY, 0);

Should close the fd before exit or abort.

Re: Teach pg_receivewal to use lz4 compression

2021-09-10 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier  wrote:

Hi,

please find v3 of the patch attached, rebased to the current head.

> > Michael Paquier wrote:
> >
>
> * http://www.zlib.org/rfc-gzip.html.
>
> -   -   For lz4 compressed segments
> */
> This comment is incomplete.

Fixed.

> +#define IsLZ4CompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
>
> +#define IsLZ4PartialCompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
>
> This is getting complicated. Would it be better to change this stuff
> and switch to a routine that checks if a segment has a valid name, is
> partial, and the type of compression that applied to it? It seems to
> me that we should group iszlibcompress and islz4compress together with
> the options available through compression_method.

Agreed and done.


> -   if (compresslevel != 0)
> -   {
> - if (compression_method == COMPRESSION_NONE)
> - {
> - compression_method = COMPRESSION_ZLIB;
> - }
> - if (compression_method != COMPRESSION_ZLIB)
> - {
> - pg_log_error("cannot use --compress when "
> -  "--compression-method is not gzip");
> - fprintf(stderr, _("Try \\"%s --help\\" for more 
> information.\\n"),
> - progname);
> - exit(1);
> - }
> -   }
>
> For compatibility where --compress enforces the use of zlib that would
> work, but this needs a comment explaining the goal of this block. I
> am wondering if it would be better to break the will and just complain
> when specifying --compress without --compression-method though. That
> would cause compatibility issues, but this would make folks aware of
> the presence of LZ4, which does not sound bad to me either as ZLIB is
> slower than LZ4 on all fronts.

Fair point. In v3 of the patch --compress requires --compression-method. Passing
0 as value errors out.

> -   else if (compression_method == COMPRESSION_ZLIB)
> -   {
> - pg_log_error("cannot use --compression-method gzip when "
> -  "--compression is 0");
> - fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
> - progname);
> - exit(1);
> -   }
>
> Hmm. It would be more natural to enforce a default compression level
> in this case? The user is asking for a compression with zlib here.

Agreed. A default value of 5, which is in the middle point of options, has been
defined and used.

In addition, the tests have been adjusted to mimic the newly added gzip tests.


Cheers,
//Georgios


> ---
>
> Michael

v3-0001-Teach-pg_receivewal-to-use-lz4-compression.patch
Description: Binary data


Re: Teach pg_receivewal to use lz4 compression

2021-07-12 Thread Dilip Kumar
On Mon, Jul 12, 2021 at 3:15 PM Magnus Hagander  wrote:
>
> On Mon, Jul 12, 2021 at 11:33 AM  wrote:
> >
> >
> >
> > ‐‐‐ Original Message ‐‐‐
> >
> > On Monday, July 12th, 2021 at 07:56, Michael Paquier  
> > wrote:
> >
> > > On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> > >
> > > > On Thu, Jul 8, 2021 at 7:48 PM gkokola...@pm.me wrote:
> > > >
> > > > > We can, though I am not in favour of doing so. There is seemingly
> > > > >
> > > > > little benefit for added complexity.
> > > >
> > > > I am really not sure what complexity you are talking about, do you
> > > >
> > > > mean since with pglz we were already providing the compression level
> > > >
> > > > so let it be as it is but with the new compression method you don't
> > > >
> > > > see much benefit of providing compression level or speed?
> > >
> > > You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has
> > >
> > > in mind, but my opinion stands on the latter: there is little benefit
> > >
> > > in making lz4 faster than the default and reduce compression, as the
> > >
> > > default is already a rather low CPU user.
> >
> > Thank you all for your comments. I am sitting on the same side as Micheal
> > on this one. The complexity is not huge, yet there will need to be code to
> > pass this option to the lz4 api and various test cases to verify for
> > correctness and integrity. The burden of maintenance of this code vs the
> > benefit of the option, tilt the scale towards not including the option.
>
> +1 for skipping that one, at least for now, and sticking to
> default-only for lz4.

Okay, fine with me as well.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-07-12 Thread Magnus Hagander
On Mon, Jul 12, 2021 at 11:33 AM  wrote:
>
>
>
> ‐‐‐ Original Message ‐‐‐
>
> On Monday, July 12th, 2021 at 07:56, Michael Paquier  
> wrote:
>
> > On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> >
> > > On Thu, Jul 8, 2021 at 7:48 PM gkokola...@pm.me wrote:
> > >
> > > > We can, though I am not in favour of doing so. There is seemingly
> > > >
> > > > little benefit for added complexity.
> > >
> > > I am really not sure what complexity you are talking about, do you
> > >
> > > mean since with pglz we were already providing the compression level
> > >
> > > so let it be as it is but with the new compression method you don't
> > >
> > > see much benefit of providing compression level or speed?
> >
> > You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has
> >
> > in mind, but my opinion stands on the latter: there is little benefit
> >
> > in making lz4 faster than the default and reduce compression, as the
> >
> > default is already a rather low CPU user.
>
> Thank you all for your comments. I am sitting on the same side as Micheal
> on this one. The complexity is not huge, yet there will need to be code to
> pass this option to the lz4 api and various test cases to verify for
> correctness and integrity. The burden of maintenance of this code vs the
> benefit of the option, tilt the scale towards not including the option.

+1 for skipping that one, at least for now, and sticking to
default-only for lz4.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Teach pg_receivewal to use lz4 compression

2021-07-12 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Monday, July 12th, 2021 at 07:56, Michael Paquier  
wrote:

> On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
>
> > On Thu, Jul 8, 2021 at 7:48 PM gkokola...@pm.me wrote:
> >
> > > We can, though I am not in favour of doing so. There is seemingly
> > >
> > > little benefit for added complexity.
> >
> > I am really not sure what complexity you are talking about, do you
> >
> > mean since with pglz we were already providing the compression level
> >
> > so let it be as it is but with the new compression method you don't
> >
> > see much benefit of providing compression level or speed?
>
> You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has
>
> in mind, but my opinion stands on the latter: there is little benefit
>
> in making lz4 faster than the default and reduce compression, as the
>
> default is already a rather low CPU user.

Thank you all for your comments. I am sitting on the same side as Micheal
on this one. The complexity is not huge, yet there will need to be code to
pass this option to the lz4 api and various test cases to verify for
correctness and integrity. The burden of maintenance of this code vs the
benefit of the option, tilt the scale towards not including the option.

Of course, I will happily provide whatever the community finds beneficial.

Cheers,
//Georgios


> -
>
> Michael




Re: Teach pg_receivewal to use lz4 compression

2021-07-11 Thread Michael Paquier
On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> On Thu, Jul 8, 2021 at 7:48 PM  wrote:
>> We can, though I am not in favour of doing so. There is seemingly
>> little benefit for added complexity.
> 
> I am really not sure what complexity you are talking about, do you
> mean since with pglz we were already providing the compression level
> so let it be as it is but with the new compression method you don't
> see much benefit of providing compression level or speed?

You mean s/pglz/zlib/ here perhaps?  I am not sure what Georgios has
in mind, but my opinion stands on the latter: there is little benefit
in making lz4 faster than the default and reduce compression, as the
default is already a rather low CPU user.
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-07-11 Thread Dilip Kumar
On Thu, Jul 8, 2021 at 7:48 PM  wrote:
>
> Dilip Kumar wrote:
>
> > Wouldn't it be better to call it compression method instead of
> > compression program?
>
> Agreed. This is inline with the suggestions of other reviewers.
> Find the change in the attached patch.

Thanks, I will have a look.

> > I think we can somehow use "acceleration" parameter of lz4 compression
> > to map on compression level, It is not direct mapping but
> > can't we create some internal mapping instead of completely ignoring
> > this option for lz4, or we can provide another option for lz4?
>
> We can, though I am not in favour of doing so. There is seemingly
> little benefit for added complexity.

I am really not sure what complexity you are talking about, do you
mean since with pglz we were already providing the compression level
so let it be as it is but with the new compression method you don't
see much benefit of providing compression level or speed?

> > Should we also support LZ4 compression using dictionary?
>
> I would we should not do that. If my understanding is correct,
> decompression would require the dictionary to be passed along.
> The algorithm seems to be very competitive to the current compression
> as is.

I agree, we might not go for a dictionary because we would need to
dictionary to decompress as well.  So that will add an extra
complexity for user.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-07-09 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier  wrote:

> On Thu, Jul 08, 2021 at 02:18:40PM +, gkokola...@pm.me wrote:
>
> > please find v2 of the patch which tries to address the commends
> >
> > received so far.
>
> Thanks!
>
> > Michael Paquier wrote:
> >
> > > -   system_or_bail('lz4', '-t', $lz4_wals[0]);
> > >
> > > I think that you should just drop this part of the test. The only
> > >
> > > part of LZ4 that we require to be present when Postgres is built with
> > >
> > > --with-lz4 is its library liblz4. Commands associated to it may not
> > >
> > > be around, causing this test to fail. The test checking that one .lz4
> > >
> > > file has been created is good to have. It may be worth adding a test
> > >
> > > with a .lz4.partial segment generated and --endpos pointing to a LSN
> > >
> > > that does not finish the segment that gets switched.
> >
> > I humbly disagree with the need for the test. It is rather easily possible
> >
> > to generate a file that can not be decoded, thus becoming useless. Having 
> > the
> >
> > test will provide some guarantee for the fact. In the current patch, there
> >
> > is code to find out if the program lz4 is available in the system. If it is
> >
> > not available, then that specific test is skipped. The rest remains as it
> >
> > were. Also `system_or_bail` is not used anymore in favour of the 
> > `system_log`
> >
> > so that the test counted and the execution of tests continues upon failure.
>
> Check. I can see what you are using in the new patch. We could live
>
> with that.

Great. Thank you.

>
> > > It seems to me that you are missing some logic in
> > >
> > > FindStreamingStart() to handle LZ4-compressed segments, in relation
> > >
> > > with IsCompressXLogFileName() and IsPartialCompressXLogFileName().
> >
> > Very correct. The logic is now added. Given the lz4 api, one has to fill
> >
> > in the uncompressed size during creation time. Then one can read the
> >
> > headers and verify the expectations.
>
> Yeah, I read that as well with lz4 --list and the kind. That's weird
>
> compared to how ZLIB gives an easy access to it. We may want to do an
>
> effort and tell about more lz4 --content-size/--list, telling that we
>
> add the size in the segment compressed because we have to and LZ4 does
>
> not do it by default?

I am afraid I do not follow. In the patch we do add the uncompressed size
and then, the uncompressed size is checked against a known value WalSegSz.
What the compressed size will be checked against?

>
> > > Should we have more tests for ZLIB, while on it? That seems like a
> > >
> > > good addition as long as we can skip the tests conditionally when
> > >
> > > that's not supported.
> >
> > Agreed. Please allow me to provide a distinct patch for this code.
>
> Thanks. Looking forward to seeing it. That may be better on a
>
> separate thread, even if I digressed in this thread :)

Thank you for the comments. I will sent in a separate thread.

>
> > > I think we can somehow use "acceleration" parameter of lz4 compression
> > >
> > > to map on compression level, It is not direct mapping but
> > >
> > > can't we create some internal mapping instead of completely ignoring
> > >
> > > this option for lz4, or we can provide another option for lz4?
> >
> > We can, though I am not in favour of doing so. There is seemingly
> >
> > little benefit for added complexity.
>
> Agreed.
>
> > > What I think is important for the user when it comes to this
> > >
> > > option is the consistency of its naming across all the tools
> > >
> > > supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
> > >
> > > already use most of the short options you could use for pg_receivewal,
> > >
> > > having only a long one gives a bit more flexibility.
> >
> > Done.
>
> * http://www.zlib.org/rfc-gzip.html.
>
> -   -   For lz4 compressed segments
>
> */
>
> This comment is incomplete.

It is. I will fix.

>
> +#define IsLZ4CompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
>
> +#define IsLZ4PartialCompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
>
> This is getting complicated. Would it be better to change this stuff
>
> and switch to a routine that checks if a segment has a valid name, is
>
> partial, and the type of compression that applied to it? It seems to
>
> me that we should group iszlibcompress and islz4compress together with
>
> the options available through compression_method.

I agree with you. I will refactor.


> -   if (compresslevel != 0)
> -   {
> - if (compression_method == COMPRESSION_NONE)
>

Re: Teach pg_receivewal to use lz4 compression

2021-07-08 Thread Michael Paquier
On Thu, Jul 08, 2021 at 02:18:40PM +, gkokola...@pm.me wrote:
> please find v2 of the patch which tries to address the commends
> received so far.

Thanks!

> Michael Paquier wrote:
>> +   system_or_bail('lz4', '-t', $lz4_wals[0]);
>> I think that you should just drop this part of the test.  The only
>> part of LZ4 that we require to be present when Postgres is built with
>> --with-lz4 is its library liblz4.  Commands associated to it may not
>> be around, causing this test to fail.  The test checking that one .lz4
>> file has been created is good to have.  It may be worth adding a test
>> with a .lz4.partial segment generated and --endpos pointing to a LSN
>> that does not finish the segment that gets switched.
> 
> I humbly disagree with the need for the test. It is rather easily possible
> to generate a file that can not be decoded, thus becoming useless. Having the
> test will provide some guarantee for the fact. In the current patch, there
> is code to find out if the program lz4 is available in the system. If it is
> not available, then that specific test is skipped. The rest remains as it
> were. Also `system_or_bail` is not used anymore in favour of the `system_log`
> so that the test counted and the execution of tests continues upon failure.

Check.  I can see what you are using in the new patch.  We could live
with that.

>> It seems to me that you are missing some logic in
>> FindStreamingStart() to handle LZ4-compressed segments, in relation
>> with IsCompressXLogFileName() and IsPartialCompressXLogFileName().
> 
> Very correct. The logic is now added. Given the lz4 api, one has to fill
> in the uncompressed size during creation time. Then one can read the
> headers and verify the expectations.

Yeah, I read that as well with lz4 --list and the kind.  That's weird
compared to how ZLIB gives an easy access to it.  We may want to do an
effort and tell about more lz4 --content-size/--list, telling that we
add the size in the segment compressed because we have to and LZ4 does
not do it by default?

>> Should we have more tests for ZLIB, while on it?  That seems like a
>> good addition as long as we can skip the tests conditionally when
>> that's not supported.
> 
> Agreed. Please allow me to provide a distinct patch for this code.

Thanks.  Looking forward to seeing it.  That may be better on a
separate thread, even if I digressed in this thread :)

>> I think we can somehow use "acceleration" parameter of lz4 compression
>> to map on compression level, It is not direct mapping but
>> can't we create some internal mapping instead of completely ignoring
>> this option for lz4, or we can provide another option for lz4?
> 
> We can, though I am not in favour of doing so. There is seemingly
> little benefit for added complexity.

Agreed.

>> What I think is important for the user when it comes to this
>> option is the consistency of its naming across all the tools
>> supporting it.  pg_dump and pg_basebackup, where we could plug LZ4,
>> already use most of the short options you could use for pg_receivewal,
>> having only a long one gives a bit more flexibility.
> 
> Done.

 * http://www.zlib.org/rfc-gzip.html.
+* For lz4 compressed segments
 */
This comment is incomplete.   

+#define IsLZ4CompressXLogFileName(fname)\
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
+strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
+strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
+#define IsLZ4PartialCompressXLogFileName(fname)\
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
+strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
+strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
This is getting complicated.  Would it be better to change this stuff
and switch to a routine that checks if a segment has a valid name, is
partial, and the type of compression that applied to it?  It seems to
me that we should group iszlibcompress and islz4compress together with
the options available through compression_method.

+   if (compresslevel != 0)
+   {
+   if (compression_method == COMPRESSION_NONE)
+   {
+   compression_method = COMPRESSION_ZLIB;
+   }
+   if (compression_method != COMPRESSION_ZLIB)
+   {
+   pg_log_error("cannot use --compress when "
+"--compression-method is not gzip");
+   fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+   progname);
+   exit(1);
+   }
+   }
For compatibility where --compress enforces the use of zlib that would
work, but this needs a comment explaining the goal of this block.  I
am wondering if it would be better to break the will and just complain
when specifying --compress without --compression-method though.  That
would cause compatibility issues, but this would make folks aware of
the presence of LZ4, which does not sound bad to me either as ZLIB is
slower than LZ4 on all 

Re: Teach pg_receivewal to use lz4 compression

2021-07-08 Thread gkokolatos

Hi,

please find v2 of the patch which tries to address the commends received so far.

Thank you all for your comments.

Michael Paquier wrote:

> Documentation is missing from the patch.
>
It has now been added.

> +   LZ4F_compressionContext_t ctx;
> +   size_t  outbufCapacity;
> +   void   *outbuf;
> It may be cleaner to refer to lz4 in the name of those variables?

Agreed and done

> +   ctx_out = LZ4F_createCompressionContext(, LZ4F_VERSION);
> +   outbufCapacity = LZ4F_compressBound(LZ4_IN_SIZE, NULL /* default 
> preferences */);
> Interesting.  So this cannot be done at compilation time because of
> the auto-flush mode looking at the LZ4 code.  That looks about right.

This is also my understanding.

> +   system_or_bail('lz4', '-t', $lz4_wals[0]);
> I think that you should just drop this part of the test.  The only
> part of LZ4 that we require to be present when Postgres is built with
> --with-lz4 is its library liblz4.  Commands associated to it may not
> be around, causing this test to fail.  The test checking that one .lz4
> file has been created is good to have.  It may be worth adding a test
> with a .lz4.partial segment generated and --endpos pointing to a LSN
> that does not finish the segment that gets switched.

I humbly disagree with the need for the test. It is rather easily possible
to generate a file that can not be decoded, thus becoming useless. Having the
test will provide some guarantee for the fact. In the current patch, there
is code to find out if the program lz4 is available in the system. If it is
not available, then that specific test is skipped. The rest remains as it
were. Also `system_or_bail` is not used anymore in favour of the `system_log`
so that the test counted and the execution of tests continues upon failure.


> It seems to me that you are missing some logic in
> FindStreamingStart() to handle LZ4-compressed segments, in relation
> with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

Very correct. The logic is now added. Given the lz4 api, one has to fill
in the uncompressed size during creation time. Then one can read the
headers and verify the expectations.


> Should we have more tests for ZLIB, while on it?  That seems like a
> good addition as long as we can skip the tests conditionally when
> that's not supported.

Agreed. Please allow me to provide a distinct patch for this code.


Dilip Kumar wrote:

> Wouldn't it be better to call it compression method instead of
> compression program?

Agreed. This is inline with the suggestions of other reviewers.
Find the change in the attached patch.

> I think we can somehow use "acceleration" parameter of lz4 compression
> to map on compression level, It is not direct mapping but
> can't we create some internal mapping instead of completely ignoring
> this option for lz4, or we can provide another option for lz4?

We can, though I am not in favour of doing so. There is seemingly
little benefit for added complexity.

> Should we also support LZ4 compression using dictionary?

I would we should not do that. If my understanding is correct,
decompression would require the dictionary to be passed along.
The algorithm seems to be very competitive to the current compression
as is.

Magnus Hagander wrote:

> I came here to say exactly that, just had to think up what I thought
> was the better name first. Either method or algorithm, but method
> seems like the much simpler choice and therefore better in this case.
>
> Should is also then not be --compression-method, rather than 
> --compress-method?

Agreed and changed throughout.


Michael Paquier wrote:

> What I think is important for the user when it comes to this
> option is the consistency of its naming across all the tools
> supporting it.  pg_dump and pg_basebackup, where we could plug LZ4,
> already use most of the short options you could use for pg_receivewal,
> having only a long one gives a bit more flexibility.

Done.

Cheers,
//Georgios

v2-0001-Teach-pg_receivewal-to-use-lz4-compression.patch
Description: Binary data


Re: Teach pg_receivewal to use lz4 compression

2021-07-02 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Friday, July 2nd, 2021 at 03:10, Michael Paquier  wrote:

> On Thu, Jul 01, 2021 at 02:10:17PM +, gkokola...@pm.me wrote:
>
> > Micheal suggested on the same thread to move my entry in the help output so 
> > that
> >
> > the output remains ordered. I would like the options for the compression 
> > method and
> >
> > the already existing compression level to next to each other if possible. 
> > Then it
> >
> > should be either 'X' or 'Y'.
>
> Hmm. Grouping these together makes sense for the user. One choice
>
> that we have here is to drop the short option, and just use a long
>
> one. What I think is important for the user when it comes to this
>
> option is the consistency of its naming across all the tools
>
> supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
>
> already use most of the short options you could use for pg_receivewal,
>
> having only a long one gives a bit more flexibility.


Good point. I am going with that one.


> --
>
> Michael




Re: Teach pg_receivewal to use lz4 compression

2021-07-01 Thread Michael Paquier
On Thu, Jul 01, 2021 at 02:10:17PM +, gkokola...@pm.me wrote:
> Micheal suggested on the same thread to move my entry in the help output so 
> that
> the output remains ordered. I would like the options for the compression 
> method and
> the already existing compression level to next to each other if possible. 
> Then it
> should be either 'X' or 'Y'.

Hmm.  Grouping these together makes sense for the user.  One choice
that we have here is to drop the short option, and just use a long
one.  What I think is important for the user when it comes to this
option is the consistency of its naming across all the tools
supporting it.  pg_dump and pg_basebackup, where we could plug LZ4,
already use most of the short options you could use for pg_receivewal,
having only a long one gives a bit more flexibility. 
--
Michael


signature.asc
Description: PGP signature


Re: Teach pg_receivewal to use lz4 compression

2021-07-01 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Thursday, July 1st, 2021 at 15:58, Magnus Hagander  
wrote:

> On Thu, Jul 1, 2021 at 3:39 PM gkokola...@pm.me wrote:
>
> > ‐‐‐ Original Message ‐‐‐
> >
> > On Thursday, July 1st, 2021 at 12:28, Magnus Hagander mag...@hagander.net 
> > wrote:
> >
> > > On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbal...@gmail.com wrote:
> > >
> > > > On Tue, Jun 29, 2021 at 8:15 PM gkokola...@pm.me wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The program pg_receivewal can use gzip compression to store the 
> > > > > received WAL.
> > > > >
> > > > > This patch teaches it to be able to use lz4 compression if the binary 
> > > > > is build
> > > > >
> > > > > using the -llz4 flag.
> > > >
> > > > +1 for the idea
> > > >
> > > > Some comments/suggestions on the patch
> > > >
> > > > @@ -90,7 +91,8 @@ usage(void)
> > > >
> > > > printf((" --synchronous flush write-ahead log immediately
> > > >
> > > > after writing\n"));
> > > >
> > > > printf((" -v, --verbose output verbose messages\n"));
> > > >
> > > > printf(_(" -V, --version output version information, then exit\n"));
> > > >
> > > > -   printf(_(" -Z, --compress=0-9 compress logs with given
> > > >
> > > > compression level\n"));
> > > >
> > > > -   printf(_(" -I, --compress-program use this program for 
> > > > compression\n"));
> > > >
> > > >
> > > > Wouldn't it be better to call it compression method instead of
> > > >
> > > > compression program?
> > >
> > > I came here to say exactly that, just had to think up what I thought
> > >
> > > was the better name first. Either method or algorithm, but method
> > >
> > > seems like the much simpler choice and therefore better in this case.
> > >
> > > Should is also then not be --compression-method, rather than 
> > > --compress-method?
> >
> > Not a problem. To be very transparent, I first looked what was already out 
> > there.
> >
> > For example `tar` is using
> >
> > -I, --use-compress-program=PROG
> >
> > yet the 'use-' bit would push the alignment of the --help output, so I 
> > removed it.
>
> I think the difference there is that tar actually calls an external
>
> program to do the work... And we are using the built-in library,
>
> right?

You are very correct :) I am not objecting the change at all. Just let you know
how I chose that. You know, naming is dead easy and all...

On a more serious note, what about the `-I` short flag? Should we keep it or
is there a better one to be used?

Micheal suggested on the same thread to move my entry in the help output so that
the output remains ordered. I would like the options for the compression method 
and
the already existing compression level to next to each other if possible. Then 
it
should be either 'X' or 'Y'.

Thoughts?



>
> 
>
> Magnus Hagander
>
> Me: https://www.hagander.net/
>
> Work: https://www.redpill-linpro.com/




Re: Teach pg_receivewal to use lz4 compression

2021-07-01 Thread Magnus Hagander
On Thu, Jul 1, 2021 at 3:39 PM  wrote:
>
>
>
> ‐‐‐ Original Message ‐‐‐
>
> On Thursday, July 1st, 2021 at 12:28, Magnus Hagander  
> wrote:
>
> > On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbal...@gmail.com wrote:
> >
> > > On Tue, Jun 29, 2021 at 8:15 PM gkokola...@pm.me wrote:
> > >
> > > > Hi,
> > > >
> > > > The program pg_receivewal can use gzip compression to store the 
> > > > received WAL.
> > > >
> > > > This patch teaches it to be able to use lz4 compression if the binary 
> > > > is build
> > > >
> > > > using the -llz4 flag.
> > >
> > > +1 for the idea
> > >
> > > Some comments/suggestions on the patch
> > >
> > > @@ -90,7 +91,8 @@ usage(void)
> > >
> > > printf((" --synchronous flush write-ahead log immediately
> > >
> > > after writing\n"));
> > >
> > > printf((" -v, --verbose output verbose messages\n"));
> > >
> > > printf(_(" -V, --version output version information, then exit\n"));
> > >
> > > -   printf(_(" -Z, --compress=0-9 compress logs with given
> > >
> > > compression level\n"));
> > >
> > > -   printf(_(" -I, --compress-program use this program for 
> > > compression\n"));
> > >
> > > Wouldn't it be better to call it compression method instead of
> > >
> > > compression program?
> >
> > I came here to say exactly that, just had to think up what I thought
> >
> > was the better name first. Either method or algorithm, but method
> >
> > seems like the much simpler choice and therefore better in this case.
> >
> > Should is also then not be --compression-method, rather than 
> > --compress-method?
>
> Not a problem. To be very transparent, I first looked what was already out 
> there.
> For example `tar` is using
> -I, --use-compress-program=PROG
> yet the 'use-' bit would push the alignment of the --help output, so I 
> removed it.

I think the difference there is that tar actually calls an external
program to do the work... And we are using the built-in library,
right?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Teach pg_receivewal to use lz4 compression

2021-07-01 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Thursday, July 1st, 2021 at 12:28, Magnus Hagander  
wrote:

> On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbal...@gmail.com wrote:
>
> > On Tue, Jun 29, 2021 at 8:15 PM gkokola...@pm.me wrote:
> >
> > > Hi,
> > >
> > > The program pg_receivewal can use gzip compression to store the received 
> > > WAL.
> > >
> > > This patch teaches it to be able to use lz4 compression if the binary is 
> > > build
> > >
> > > using the -llz4 flag.
> >
> > +1 for the idea
> >
> > Some comments/suggestions on the patch
> >
> > @@ -90,7 +91,8 @@ usage(void)
> >
> > printf((" --synchronous flush write-ahead log immediately
> >
> > after writing\n"));
> >
> > printf((" -v, --verbose output verbose messages\n"));
> >
> > printf(_(" -V, --version output version information, then exit\n"));
> >
> > -   printf(_(" -Z, --compress=0-9 compress logs with given
> >
> > compression level\n"));
> >
> > -   printf(_(" -I, --compress-program use this program for compression\n"));
> >
> > Wouldn't it be better to call it compression method instead of
> >
> > compression program?
>
> I came here to say exactly that, just had to think up what I thought
>
> was the better name first. Either method or algorithm, but method
>
> seems like the much simpler choice and therefore better in this case.
>
> Should is also then not be --compression-method, rather than 
> --compress-method?

Not a problem. To be very transparent, I first looked what was already out 
there.
For example `tar` is using
-I, --use-compress-program=PROG
yet the 'use-' bit would push the alignment of the --help output, so I removed 
it.

To me, as a non native English speaker, `--compression-method` does sound 
better.
I can just re-align the rest of the help output.

Updated patch is on the making.

>
> --
>
> Magnus Hagander
>
> Me: https://www.hagander.net/
>
> Work: https://www.redpill-linpro.com/




Re: Teach pg_receivewal to use lz4 compression

2021-07-01 Thread Magnus Hagander
On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar  wrote:
>
> On Tue, Jun 29, 2021 at 8:15 PM  wrote:
> >
> > Hi,
> >
> > The program pg_receivewal can use gzip compression to store the received 
> > WAL.
> > This patch teaches it to be able to use lz4 compression if the binary is 
> > build
> > using the -llz4 flag.
>
> +1 for the idea
>
> Some comments/suggestions on the patch
>
> 1.
> @@ -90,7 +91,8 @@ usage(void)
>   printf(_("  --synchronous  flush write-ahead log immediately
> after writing\n"));
>   printf(_("  -v, --verbose  output verbose messages\n"));
>   printf(_("  -V, --version  output version information, then 
> exit\n"));
> - printf(_("  -Z, --compress=0-9 compress logs with given
> compression level\n"));
> + printf(_("  -I, --compress-program use this program for compression\n"));
>
> Wouldn't it be better to call it compression method instead of
> compression program?

I came here to say exactly that, just had to think up what I thought
was the better name first. Either method or algorithm, but method
seems like the much simpler choice and therefore better in this case.

Should is also then not be --compression-method, rather than --compress-method?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Teach pg_receivewal to use lz4 compression

2021-06-30 Thread Dilip Kumar
On Tue, Jun 29, 2021 at 8:15 PM  wrote:
>
> Hi,
>
> The program pg_receivewal can use gzip compression to store the received WAL.
> This patch teaches it to be able to use lz4 compression if the binary is build
> using the -llz4 flag.

+1 for the idea

Some comments/suggestions on the patch

1.
@@ -90,7 +91,8 @@ usage(void)
  printf(_("  --synchronous  flush write-ahead log immediately
after writing\n"));
  printf(_("  -v, --verbose  output verbose messages\n"));
  printf(_("  -V, --version  output version information, then exit\n"));
- printf(_("  -Z, --compress=0-9 compress logs with given
compression level\n"));
+ printf(_("  -I, --compress-program use this program for compression\n"));

Wouldn't it be better to call it compression method instead of
compression program?

2.
+ printf(_("  -Z, --compress=0-9 compress logs with given
compression level (available only with compress-program=zlib)\n"));

I think we can somehow use "acceleration" parameter of lz4 compression
to map on compression level, It is not direct mapping but
can't we create some internal mapping instead of completely ignoring
this option for lz4, or we can provide another option for lz4?

3. Should we also support LZ4 compression using dictionary?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-06-29 Thread Michael Paquier
On Tue, Jun 29, 2021 at 02:45:17PM +, gkokola...@pm.me wrote:
> The program pg_receivewal can use gzip compression to store the received WAL.
> This patch teaches it to be able to use lz4 compression if the binary is build
> using the -llz4 flag.

Nice.

> Previously, the user had to use the option --compress with a value between 
> [0-9]
> to denote that gzip compression was requested. This specific behaviour is
> maintained. A newly introduced option --compress-program=lz4 can be used to 
> ask
> for the logs to be compressed using lz4 instead. In that case, no compression
> values can be selected as it does not seem too useful.

Yes, I am not convinced either that we should care about making the
acceleration customizable.

> Under the hood there is nothing exceptional to be noted. Tar based archives 
> have
> not yet been taught to use lz4 compression. Those are used by pg_basebackup. 
> If
> is is felt useful, then it is easy to be added in a new patch.

Documentation is missing from the patch.

+   LZ4F_compressionContext_t ctx;
+   size_t  outbufCapacity;
+   void   *outbuf;
It may be cleaner to refer to lz4 in the name of those variables?

+   ctx_out = LZ4F_createCompressionContext(, LZ4F_VERSION);
+   outbufCapacity = LZ4F_compressBound(LZ4_IN_SIZE, NULL /* default 
preferences */);
Interesting.  So this cannot be done at compilation time because of
the auto-flush mode looking at the LZ4 code.  That looks about right.

getopt_long() is forgotting the new option 'I'.

+   system_or_bail('lz4', '-t', $lz4_wals[0]);
I think that you should just drop this part of the test.  The only
part of LZ4 that we require to be present when Postgres is built with
--with-lz4 is its library liblz4.  Commands associated to it may not
be around, causing this test to fail.  The test checking that one .lz4
file has been created is good to have.  It may be worth adding a test
with a .lz4.partial segment generated and --endpos pointing to a LSN
that does not finish the segment that gets switched.

It seems to me that you are missing some logic in
FindStreamingStart() to handle LZ4-compressed segments, in relation
with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

+   pg_log_error("invalid compress-program \"%s\"", optarg);
"compress-program" sounds weird.  Shouldn't that just say "invalid
compression method" or similar?

+   printf(_("  -Z, --compress=0-9 compress logs with given
compression level (available only with compress-program=zlib)\n"));
This line is too long.

Should we have more tests for ZLIB, while on it?  That seems like a
good addition as long as we can skip the tests conditionally when
that's not supported.
--
Michael


signature.asc
Description: PGP signature


Teach pg_receivewal to use lz4 compression

2021-06-29 Thread gkokolatos
Hi,

The program pg_receivewal can use gzip compression to store the received WAL.
This patch teaches it to be able to use lz4 compression if the binary is build
using the -llz4 flag.

Previously, the user had to use the option --compress with a value between [0-9]
to denote that gzip compression was requested. This specific behaviour is
maintained. A newly introduced option --compress-program=lz4 can be used to ask
for the logs to be compressed using lz4 instead. In that case, no compression
values can be selected as it does not seem too useful.

Under the hood there is nothing exceptional to be noted. Tar based archives have
not yet been taught to use lz4 compression. Those are used by pg_basebackup. If
is is felt useful, then it is easy to be added in a new patch.

Cheers,
//Georgios

v1-0001-Teach-pg_receivewal-to-use-lz4-compression.patch
Description: Binary data