Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-27 Thread Michael Harris
Hi All,

I have a working prototype now, but there is one aspect I haven't been
able to find the best solution for.

The CLI so far has the following new option:

-C, --compressprog=PRG use supplied external program for compression

An example usage would be:

pg_basebackup -D /home/harmic/tmp/ -C bzip2 -F t

The command string supplied to -C should be a compression command that
reads from stdin and outputs to stdout.
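A minimal sketch of how the new option might be recognised with getopt_long(), alongside the existing -D and -F options. The function name parse_compressprog() is hypothetical, and every other pg_basebackup option is left out of the table here.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical sketch: recognise -C / --compressprog=PROG next to -D and
 * -F. Returns the compression command string, or NULL if -C was not
 * given. Other options are parsed but ignored in this sketch.
 */
const char *
parse_compressprog(int argc, char **argv)
{
    static const struct option long_options[] = {
        {"pgdata", required_argument, NULL, 'D'},
        {"format", required_argument, NULL, 'F'},
        {"compressprog", required_argument, NULL, 'C'},
        {NULL, 0, NULL, 0}
    };
    const char *compressprog = NULL;
    int         c;

    assert(argc >= 1 && argv != NULL);

    optind = 1;                 /* restart scanning from argv[1] */
    while ((c = getopt_long(argc, argv, "D:F:C:", long_options, NULL)) != -1)
    {
        if (c == 'C')
            compressprog = optarg;  /* e.g. "bzip2" or "pigz -p 8" */
    }
    return compressprog;
}
```

The whole argument is kept as one string, so a command with its own arguments has to be quoted on the shell command line, e.g. -C "pigz -p 8".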

The problem is: when constructing the output filename(s), how can we
give them the correct suffix (.gz, .bz2, .xz, etc.)?

The options I can think of are:

 1. Add yet another command line option to specify a suffix
 2. Some kind of heuristic to work it out from the supplied command
string (based on a list of known compression programs, which will never
be complete)
 3. Don't worry about it, let the user rename them afterwards, in
which case they would be named .tar
 4. Make the compression command a template, e.g. "bzip2 -c > %s.bz2",
so that the template itself adds the suffix
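To make option 2 concrete, here is a minimal sketch of such a heuristic. The function name guess_compress_suffix() and the program table are hypothetical and illustrative only; anything not in the table falls back to an empty suffix, i.e. option 3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for option 2: guess the output suffix from the
 * program name at the start of the -C command string. The table can
 * never be complete; unknown programs get an empty suffix, so the file
 * simply stays named .tar.
 */
const char *
guess_compress_suffix(const char *command)
{
    static const struct
    {
        const char *prog;
        const char *suffix;
    }           known[] = {
        {"gzip", ".gz"}, {"pigz", ".gz"},
        {"bzip2", ".bz2"}, {"pbzip2", ".bz2"},
        {"xz", ".xz"}, {"pxz", ".xz"},
    };
    const char *word;
    const char *base;
    size_t      len;

    assert(command != NULL);

    /* Skip leading blanks and measure the first word (the program). */
    word = command + strspn(command, " \t");
    len = strcspn(word, " \t");

    /* Drop any directory part, so "/usr/bin/pigz" matches "pigz". */
    base = word;
    for (size_t i = 0; i < len; i++)
        if (word[i] == '/')
            base = word + i + 1;
    len -= (size_t) (base - word);

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
        if (strlen(known[i].prog) == len &&
            strncmp(known[i].prog, base, len) == 0)
            return known[i].suffix;
    return "";
}
```

This handles the common cases ("pigz -p 8" and "gzip -9" both map to .gz), but a wrapper script or a less common tool silently gets no suffix, which is the incompleteness noted above.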

#4 might also be more flexible for tools that don't support output to
stdout, but it is a bit more complex to use.
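For option 4, the template has to be expanded before it is run. A minimal sketch, assuming "%s" stands for the output path without the compression suffix and "%%" for a literal percent sign; expand_compress_template() is a hypothetical name. The user's template is deliberately not passed to snprintf() as a format string, since it is untrusted input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper for option 4: expand a template such as
 * "bzip2 -c > %s.bz2", replacing each "%s" with path and "%%" with "%".
 * Returns a malloc'd string (caller frees), or NULL if out of memory.
 */
char *
expand_compress_template(const char *tmpl, const char *path)
{
    size_t      pathlen;
    size_t      outlen = 0;
    char       *out;
    char       *p;

    assert(tmpl != NULL && path != NULL);
    pathlen = strlen(path);

    /* First pass: compute the expanded length. */
    for (const char *t = tmpl; *t; t++)
    {
        if (t[0] == '%' && t[1] == 's')
        {
            outlen += pathlen;
            t++;
        }
        else if (t[0] == '%' && t[1] == '%')
        {
            outlen += 1;
            t++;
        }
        else
            outlen += 1;
    }

    out = malloc(outlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting as we go. */
    p = out;
    for (const char *t = tmpl; *t; t++)
    {
        if (t[0] == '%' && t[1] == 's')
        {
            memcpy(p, path, pathlen);
            p += pathlen;
            t++;
        }
        else if (t[0] == '%' && t[1] == '%')
        {
            *p++ = '%';
            t++;
        }
        else
            *p++ = *t;
    }
    *p = '\0';
    return out;
}
```

For example, "bzip2 -c > %s.bz2" with path "/home/harmic/tmp/base.tar" expands to "bzip2 -c > /home/harmic/tmp/base.tar.bz2". The path is inserted unquoted, so a real implementation would also need shell quoting for paths containing spaces or metacharacters.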

Any other ideas?

Regards // Mike




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-11 Thread Michael Harris
Hi,

Thanks for the feedback!

>> 2) The current logic either uses zlib if compiled in, or offers no
>> compression at all, controlled by a series of #ifdef/#endif. I would
>> prefer that the user can either use zlib or an external program
>> without having to recompile, so I would remove the #ifdefs and replace
>> them with run time branching.
>
>
> Not sure how that would work or be needed. The reasonable thing would be: if
> zlib is available when building, the choices would be "no compression",
> "zlib compression" or "external compression". If zlib was not available
> when building, the choices would be "no compression" or "external
> compression".

That's exactly how I intend it to work. I had thought that the current
structure of the code would not allow that, but looking at it more
closely I see that it does, so I don't have to re-organize the
#ifdefs.
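The three-way choice above can be sketched so that only the zlib branch depends on HAVE_LIBZ, the configure symbol pg_basebackup's existing zlib code is guarded by. The enum, choose_compress_method(), and the decision to reject -Z combined with -C are hypothetical, chosen here only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    COMPRESS_NONE,
    COMPRESS_ZLIB,              /* only selectable when built with zlib */
    COMPRESS_EXTERNAL           /* -C/--compressprog, available in every build */
} CompressMethod;

/*
 * Decide the method from the parsed options: compresslevel is the -Z
 * level (0 = off), compressprog the -C argument (NULL if not given).
 * Returns false if the request cannot be satisfied by this build.
 */
bool
choose_compress_method(int compresslevel, const char *compressprog,
                       CompressMethod *method)
{
    assert(method != NULL);

    if (compressprog != NULL)
    {
        /* Assumption: combining -Z and -C is ambiguous, so reject it. */
        if (compresslevel > 0)
            return false;
        *method = COMPRESS_EXTERNAL;
        return true;
    }
    if (compresslevel > 0)
    {
#ifdef HAVE_LIBZ
        *method = COMPRESS_ZLIB;
        return true;
#else
        return false;           /* zlib compression not built in */
#endif
    }
    *method = COMPRESS_NONE;
    return true;
}
```

The external branch needs no #ifdef at all, which is why the existing guards can stay where they are.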

Regards // Mike




Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-09 Thread Magnus Hagander
On Fri, Apr 7, 2017 at 4:04 AM, Michael Harris  wrote:

> Hello,
>
> Back in pg 9.2, we hacked a copy of pg_basebackup to add a command
> line option which would allow the user to specify an arbitrary
> external program (potentially including arguments) to be used to
> compress the tar backup.
>
> Our motivation was to be able to use pigz (parallel gzip
> implementation) to speed up the compression. It also allows using
> tools like bzip2, xz, etc., instead of the inbuilt zlib.
>
> I never ended up submitting that upstream, but now it looks like I
> will have to repeat the exercise for 9.6, so I was wondering if such a
> feature would be welcomed.
>
> I found one or two references to people asking for this, e.g.:
> https://www.commandprompt.com/blog/a_pg_basebackup_wish_list/
>
> To do it properly would require:
>
> 1) Adding command line option as follows:
>
>   -C, --compressprog=PROG
>  Use supplied program for compression
>
> 2) The current logic either uses zlib if compiled in, or offers no
> compression at all, controlled by a series of #ifdef/#endif. I would
> prefer that the user can either use zlib or an external program
> without having to recompile, so I would remove the #ifdefs and replace
> them with run time branching.
>

Not sure how that would work or be needed. The reasonable thing would be: if
zlib is available when building, the choices would be "no compression",
"zlib compression" or "external compression". If zlib was not available
when building, the choices would be "no compression" or "external
compression".

Or maybe I'm misunderstanding what you're saying?



> 3) When opening the output file, if the -C option was used, use popen
> to open a child process and write to that.
>
> My questions are:
> - Has anything like this already been discussed?
>

I think it has, but not in detail.



> - Would this be a welcome contribution?
>

Yes, I definitely think this would be useful.



> - Can anyone see any problems with the above approach?
>

One thing to consider is the work done recently to ensure that the output
is properly synchronized when written to disk. I don't think it's
reasonable to expect that from an external compression program, but if it
can be made optional, that'd be good. Or at least be careful not to break
the current behavior.
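One way to keep the synchronization with an external program: once pclose() has returned, the compressor has exited and pg_basebackup holds no descriptor for the file it produced, but it can reopen the file, fsync() it, and then fsync the containing directory. A minimal sketch, assuming pg_basebackup knows the output path (as with a "> file" redirection); fsync_path(), sync_compressed_output() and the do_sync flag, which models making the sync optional as suggested above, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open a file (read-write) or directory (read-only) and fsync it. */
static int
fsync_path(const char *path, bool isdir)
{
    int         fd = open(path, isdir ? O_RDONLY : O_RDWR);
    int         rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Call after the compressor has exited. Syncs the output file and then
 * its directory, so the new directory entry is durable as well.
 * Returns 0 on success (or when syncing is disabled), -1 on failure.
 */
int
sync_compressed_output(const char *outpath, const char *outdir, bool do_sync)
{
    assert(outpath != NULL && outdir != NULL);

    if (!do_sync)
        return 0;
    if (fsync_path(outpath, false) != 0)
        return -1;
    return fsync_path(outdir, true);
}
```

This relies on fsync() flushing the file's data regardless of which descriptor wrote it, so the compressor itself needs no changes.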

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-07 Thread Jeff Janes
On Thu, Apr 6, 2017 at 7:04 PM, Michael Harris  wrote:

> Hello,
>
> Back in pg 9.2, we hacked a copy of pg_basebackup to add a command
> line option which would allow the user to specify an arbitrary
> external program (potentially including arguments) to be used to
> compress the tar backup.
>
> Our motivation was to be able to use pigz (parallel gzip
> implementation) to speed up the compression. It also allows using
> tools like bzip2, xz, etc., instead of the inbuilt zlib.
>
> I never ended up submitting that upstream, but now it looks like I
> will have to repeat the exercise for 9.6, so I was wondering if such a
> feature would be welcomed.
>

I would welcome it.  I would really like to be able to use parallel pigz
and pxz.

You can stream the data into a compression tool of your choice as long as
you use tar mode and specify '-D -', but that is incompatible with
tablespaces and with xlog streaming, so it is not a very good solution.

Cheers,

Jeff


[HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-06 Thread Michael Harris
Hello,

Back in pg 9.2, we hacked a copy of pg_basebackup to add a command
line option which would allow the user to specify an arbitrary
external program (potentially including arguments) to be used to
compress the tar backup.

Our motivation was to be able to use pigz (parallel gzip
implementation) to speed up the compression. It also allows using
tools like bzip2, xz, etc., instead of the inbuilt zlib.

I never ended up submitting that upstream, but now it looks like I
will have to repeat the exercise for 9.6, so I was wondering if such a
feature would be welcomed.

I found one or two references to people asking for this, e.g.:
https://www.commandprompt.com/blog/a_pg_basebackup_wish_list/

To do it properly would require:

1) Adding command line option as follows:

  -C, --compressprog=PROG
 Use supplied program for compression

2) The current logic either uses zlib if compiled in, or offers no
compression at all, controlled by a series of #ifdef/#endif. I would
prefer that the user can either use zlib or an external program
without having to recompile, so I would remove the #ifdefs and replace
them with run time branching.

3) When opening the output file, if the -C option was used, use popen
to open a child process and write to that.
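Step 3 might look roughly like the following. This is a minimal sketch, assuming pg_basebackup appends the "> file" redirection itself so that the command only has to filter stdin to stdout (the contract described in the 2017-04-27 message). The function names are hypothetical and error handling is reduced to return codes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Start "compressprog > outpath" and return a stream feeding its stdin. */
FILE *
open_compress_pipe(const char *compressprog, const char *outpath)
{
    size_t      len;
    char       *cmd;
    FILE       *fp;

    assert(compressprog != NULL && outpath != NULL);
    len = strlen(compressprog) + strlen(" > ") + strlen(outpath) + 1;
    cmd = malloc(len);
    if (cmd == NULL)
        return NULL;
    /* The format is a literal; user strings are only arguments. */
    snprintf(cmd, len, "%s > %s", compressprog, outpath);

    fp = popen(cmd, "w");       /* the child reads our writes on stdin */
    free(cmd);
    return fp;
}

/* Write one chunk of tar data received from the server. */
int
write_compress_pipe(FILE *fp, const char *buf, size_t n)
{
    return fwrite(buf, 1, n, fp) == n ? 0 : -1;
}

/*
 * Flush, close the pipe and wait for the compressor. pclose() returns
 * the child's wait status, so a failing compressor is detected here.
 */
int
close_compress_pipe(FILE *fp)
{
    int         status = pclose(fp);

    if (status == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Checking the exit status in close_compress_pipe() matters: a compressor that dies part-way would otherwise leave a truncated backup with no error reported. As with any popen() of a constructed command, the output path would need shell quoting in a real implementation.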

My questions are:
- Has anything like this already been discussed?
- Would this be a welcome contribution?
- Can anyone see any problems with the above approach?

Thanks!

Regards
Mike Harris

