Re: where should I stick that backup?

2020-04-20 Thread Amit Kapila
On Thu, Apr 16, 2020 at 3:44 AM Andres Freund wrote: > > > I think having a simple framework in pg_basebackup for plugging in new > > algorithms would make it noticeably simpler to add LZ4 or whatever > > your favorite compression algorithm is. And I think having that > > framework also be able

Re: where should I stick that backup?

2020-04-20 Thread Amit Kapila
On Mon, Apr 13, 2020 at 5:57 AM Stephen Frost wrote: > > There's a couple of other pieces here that I think bear mentioning. The > first is that pgBackRest has an actual 'restore' command- and that works > with the filters and works with the storage drivers, so what you're > looking at when it

Re: where should I stick that backup?

2020-04-20 Thread Amit Kapila
On Mon, Apr 20, 2020 at 8:18 AM Amit Kapila wrote: > > On Sat, Apr 18, 2020 at 8:35 PM Robert Haas wrote: > > > > On Fri, Apr 17, 2020 at 7:44 PM Andres Freund wrote: > > > This suggests that pipes do have a considerably higher overhead on > > > Windows, but that it's not all that terrible if

Re: where should I stick that backup?

2020-04-19 Thread Amit Kapila
On Sat, Apr 18, 2020 at 5:14 AM Andres Freund wrote: > > zstd -T0 < onegbofrandom > NUL > zstd -T0 < onegbofrandom > /dev/null > linux host: 0.361s > windows guest: 0.602s > > zstd -T0 < onegbofrandom | dd bs=1M of=NUL > zstd -T0 < onegbofrandom | dd bs=1M of=/dev/null > linux host:

Re: where should I stick that backup?

2020-04-19 Thread Amit Kapila
On Sat, Apr 18, 2020 at 8:35 PM Robert Haas wrote: > > On Fri, Apr 17, 2020 at 7:44 PM Andres Freund wrote: > > This suggests that pipes do have a considerably higher overhead on > > Windows, but that it's not all that terrible if one takes care to use > > large buffers in each pipe element. > >

Re: where should I stick that backup?

2020-04-18 Thread Greg Stark
FWIW, it was a common trick in the Oracle world to create a named pipe to gzip and then write your backup to it. I really like that way of doing things but I suppose it's probably too old-fashioned to expect to survive. And in practice while it worked for a manual process for a sysadmin it's pretty
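
The named-pipe trick described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: a POSIX system, a `gzip` binary on PATH, and made-up file names (`backup.fifo`, `backup.gz`). It is not taken from the thread and is not how any PostgreSQL tool works.

```python
import gzip
import os
import shlex
import subprocess
import tempfile


def backup_via_named_pipe(payload: bytes, workdir: str) -> str:
    """Write payload into a FIFO drained by gzip; return the path of the .gz file."""
    fifo = os.path.join(workdir, "backup.fifo")  # hypothetical name
    out = os.path.join(workdir, "backup.gz")     # hypothetical name
    os.mkfifo(fifo)
    try:
        # The shell child opens the FIFO for reading, so the parent does not
        # block while starting the compressor.
        cmd = f"gzip -c < {shlex.quote(fifo)} > {shlex.quote(out)}"
        reader = subprocess.Popen(cmd, shell=True)
        # Opening the FIFO for writing blocks until the reader has opened it.
        with open(fifo, "wb") as sink:
            sink.write(payload)
        # Closing the write end signals end-of-file to gzip.
        if reader.wait() != 0:
            raise RuntimeError("gzip exited with a non-zero status")
    finally:
        os.unlink(fifo)
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = backup_via_named_pipe(b"pretend this is a backup\n" * 1000, tmp)
        with gzip.open(path, "rb") as fh:
            print(len(fh.read()), "bytes recovered")
```

The writer only sees an ordinary file path, which is the appeal of the approach; the cost is that someone has to create the FIFO and start the reader first, which is awkward to automate.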

Re: where should I stick that backup?

2020-04-18 Thread Robert Haas
On Fri, Apr 17, 2020 at 7:44 PM Andres Freund wrote: > This suggests that pipes do have a considerably higher overhead on > Windows, but that it's not all that terrible if one takes care to use > large buffers in each pipe element. > > It's notable though that even the simplest use of a pipe does
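
One rough way to look at the buffer-size effect mentioned in the quoted text is to push a fixed amount of data through an anonymous pipe using different chunk sizes. The sketch below is an assumption-laden illustration, not the benchmark used in the thread: the function name and sizes are invented, and timings will vary by platform and load.

```python
import os
import threading
import time
from typing import Tuple


def pump_through_pipe(total_bytes: int, chunk_size: int) -> Tuple[int, float]:
    """Send total_bytes through an OS pipe; return (bytes received, seconds)."""
    read_fd, write_fd = os.pipe()
    chunk = memoryview(b"\0" * chunk_size)

    def producer() -> None:
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may write fewer bytes than requested, so count them.
                remaining -= os.write(write_fd, chunk[:remaining])
        finally:
            os.close(write_fd)  # lets the reader see end-of-file

    writer = threading.Thread(target=producer)
    start = time.perf_counter()
    writer.start()
    received = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:
            break
        received += len(data)
    writer.join()
    os.close(read_fd)
    return received, time.perf_counter() - start


if __name__ == "__main__":
    for size in (512, 64 * 1024, 1024 * 1024):
        got, secs = pump_through_pipe(64 * 1024 * 1024, size)
        print(f"chunk={size:>8}  bytes={got}  seconds={secs:.3f}")
```

Smaller chunks mean more read and write system calls for the same amount of data, which is where per-call pipe overhead would show up.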

Re: where should I stick that backup?

2020-04-17 Thread Andres Freund
Hi, On 2020-04-17 12:19:32 -0400, Robert Haas wrote: > On Thu, Apr 16, 2020 at 10:22 PM Robert Haas wrote: > > Hmm. Could we learn what we need to know about this by doing something > > as taking a basebackup of a cluster with some data in it (say, created > > by pgbench -i -s 400 or something)

Re: where should I stick that backup?

2020-04-17 Thread Robert Haas
On Thu, Apr 16, 2020 at 10:22 PM Robert Haas wrote: > Hmm. Could we learn what we need to know about this by doing something > as taking a basebackup of a cluster with some data in it (say, created > by pgbench -i -s 400 or something) and then comparing the speed of cat > < base.tar | gzip >

Re: where should I stick that backup?

2020-04-16 Thread Robert Haas
On Wed, Apr 15, 2020 at 7:55 PM Robert Haas wrote: > Yeah. I think we really need to understand the performance > characteristics of pipes better. If they're slow, then anything that > needs to be fast has to work some other way (but we could still > provide a pipe-based slow way for niche uses).

Re: where should I stick that backup?

2020-04-15 Thread Robert Haas
On Wed, Apr 15, 2020 at 6:13 PM Andres Freund wrote: > I guess what I perceived to be the fundamental difference, before this > email, between our positions is that I (still) think that exposing > detailed postprocessing shell fragment style arguments to pg_basebackup, > especially as the only

Re: where should I stick that backup?

2020-04-15 Thread Andres Freund
Hi, On 2020-04-15 09:23:30 -0400, Robert Haas wrote: > On Tue, Apr 14, 2020 at 9:50 PM Andres Freund wrote: > > On 2020-04-14 11:38:03 -0400, Robert Haas wrote: > > > I'm fairly deeply uncomfortable with what Andres is proposing. I see > > > that it's very powerful, and can do a lot of things,

Re: where should I stick that backup?

2020-04-15 Thread Robert Haas
On Tue, Apr 14, 2020 at 9:50 PM Andres Freund wrote: > On 2020-04-14 11:38:03 -0400, Robert Haas wrote: > > I'm fairly deeply uncomfortable with what Andres is proposing. I see > > that it's very powerful, and can do a lot of things, and that if > > you're building something that does

Re: where should I stick that backup?

2020-04-14 Thread Andres Freund
Hi, On 2020-04-14 11:38:03 -0400, Robert Haas wrote: > I'm fairly deeply uncomfortable with what Andres is proposing. I see > that it's very powerful, and can do a lot of things, and that if > you're building something that does sophisticated things with storage, > you probably want an API like

Re: where should I stick that backup?

2020-04-14 Thread Robert Haas
On Tue, Apr 14, 2020 at 11:08 AM Stephen Frost wrote: > Wouldn't it make sense to, given that we have some idea of what we want > it to eventually look like, to make progress in that direction though? Well, yes. :-) > That is- I tend to agree with Andres that having this supported > server-side

Re: where should I stick that backup?

2020-04-14 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Sun, Apr 12, 2020 at 8:27 PM Andres Freund wrote: > > I really think we want the option to eventually do this server-side. And > > I don't quite see it as viable to go for an API that allows to specify > > shell fragments that are

Re: where should I stick that backup?

2020-04-14 Thread Robert Haas
On Sun, Apr 12, 2020 at 8:27 PM Andres Freund wrote: > > That's quite appealing. One downside - IMHO significant - is that you > > have to have a separate process to do *anything*. If you want to add a > > filter that just logs everything it's asked to do, for example, you've > > gotta have a

Re: where should I stick that backup?

2020-04-13 Thread Robert Haas
On Sun, Apr 12, 2020 at 9:18 PM Stephen Frost wrote: > There's two different questions we're talking about here and I feel like > they're being conflated. To try and clarify: > > - Could you implement FDWs with shell scripts, and custom programs? I'm > pretty confident that the answer is yes,

Re: where should I stick that backup?

2020-04-13 Thread Bruce Momjian
On Sun, Apr 12, 2020 at 09:18:28PM -0400, Stephen Frost wrote: > * Bruce Momjian (br...@momjian.us) wrote: > > On Fri, Apr 10, 2020 at 10:54:10AM -0400, Stephen Frost wrote: > > > * Robert Haas (robertmh...@gmail.com) wrote: > > > > On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian wrote: > > > > >

Re: where should I stick that backup?

2020-04-12 Thread Stephen Frost
Greetings, Answering both in one since they're largely the same. * Bruce Momjian (br...@momjian.us) wrote: > On Fri, Apr 10, 2020 at 10:54:10AM -0400, Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > > > On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian wrote: > > > > Good

Re: where should I stick that backup?

2020-04-12 Thread Andres Freund
Hi, On 2020-04-12 20:02:50 -0400, Robert Haas wrote: > On Sun, Apr 12, 2020 at 3:17 PM Andres Freund wrote: > > A huge advantage of a scheme like this would be that it wouldn't have to > > be specific to pg_basebackup. It could just as well work directly on the > > server, avoiding an unnecessary

Re: where should I stick that backup?

2020-04-12 Thread Stephen Frost
Greetings, * David Steele (da...@pgmasters.net) wrote: > On 4/12/20 6:37 PM, Andres Freund wrote: > >On 2020-04-12 17:57:05 -0400, David Steele wrote: > >>On 4/12/20 3:17 PM, Andres Freund wrote: > >>>There's various ways we could address the issue for how the subcommand > >>>can access the file

Re: where should I stick that backup?

2020-04-12 Thread Robert Haas
On Sun, Apr 12, 2020 at 3:17 PM Andres Freund wrote: > A huge advantage of a scheme like this would be that it wouldn't have to > be specific to pg_basebackup. It could just as well work directly on the > server, avoiding an unnecessary loop through the network. Which > e.g. could integrate with

Re: where should I stick that backup?

2020-04-12 Thread David Steele
On 4/12/20 6:37 PM, Andres Freund wrote: Hi, On 2020-04-12 17:57:05 -0400, David Steele wrote: On 4/12/20 3:17 PM, Andres Freund wrote: [proposal outline] This is pretty much what pgBackRest does. We call them "local" processes and they do most of the work during

Re: where should I stick that backup?

2020-04-12 Thread Andres Freund
Hi, On 2020-04-12 17:57:05 -0400, David Steele wrote: > On 4/12/20 3:17 PM, Andres Freund wrote: > > [proposal outline] > > This is pretty much what pgBackRest does. We call them "local" processes and > they do most of the work during backup/restore/archive-get/archive-push. Hah. I swear, I

Re: where should I stick that backup?

2020-04-12 Thread David Steele
On 4/12/20 3:17 PM, Andres Freund wrote: More generally, can you think of any ideas for how to structure an API here that are easier to use than "write some C code"? Or do you think we should tell people to write some C code if they want to compress/encrypt/relocate their backup in some

Re: where should I stick that backup?

2020-04-12 Thread David Steele
On 4/12/20 11:04 AM, Robert Haas wrote: On Sun, Apr 12, 2020 at 10:09 AM Magnus Hagander wrote: There are certainly cases for it. It might not be they have to be the same connection, but still be the same session, meaning before the first time you perform some step of authentication, get a

Re: where should I stick that backup?

2020-04-12 Thread Andres Freund
Hi, On 2020-04-12 11:04:46 -0400, Robert Haas wrote: > I would expect that we would want to provide a flexible way for a > target or filter to be passed options from the pg_basebackup command > line. So one might for example write this: > > pg_basebackup --filter='lz4 -9'

Re: where should I stick that backup?

2020-04-12 Thread Andres Freund
Hi, On 2020-04-11 16:22:09 -0400, Robert Haas wrote: > On Fri, Apr 10, 2020 at 3:38 PM Andres Freund wrote: > > Wouldn't there be state like a S3/ssh/https/... connection? And perhaps > > a 'backup_id' in the backup metadata DB that'd one would want to update > > at the end? > > Good question.

Re: where should I stick that backup?

2020-04-12 Thread Robert Haas
On Sun, Apr 12, 2020 at 10:09 AM Magnus Hagander wrote: > There are certainly cases for it. It might not be they have to be the same > connection, but still be the same session, meaning before the first time you > perform some step of authentication, get a token, and then use that for all >

Re: where should I stick that backup?

2020-04-12 Thread Magnus Hagander
On Sat, Apr 11, 2020 at 10:22 PM Robert Haas wrote: > On Fri, Apr 10, 2020 at 3:38 PM Andres Freund wrote: > > Wouldn't there be state like a S3/ssh/https/... connection? And perhaps > > a 'backup_id' in the backup metadata DB that'd one would want to update > > at the end? > > Good question. I

Re: where should I stick that backup?

2020-04-11 Thread Jose Luis Tallon
On 10/4/20 21:38, Andres Freund wrote: Hi, On 2020-04-10 12:20:01 -0400, Robert Haas wrote: - We're only talking about writing a handful of tar files, and that's in the context of a full-database backup, which is a much heavier-weight operation than a query. - There is not really any state

Re: where should I stick that backup?

2020-04-11 Thread Robert Haas
On Fri, Apr 10, 2020 at 3:38 PM Andres Freund wrote: > Wouldn't there be state like a S3/ssh/https/... connection? And perhaps > a 'backup_id' in the backup metadata DB that'd one would want to update > at the end? Good question. I don't know that there would be but, uh, maybe? It's not obvious

Re: where should I stick that backup?

2020-04-11 Thread Jose Luis Tallon
On 10/4/20 15:49, Robert Haas wrote: On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian wrote: Good point, but if there are multiple APIs, it makes shell script flexibility even more useful. [snip] One thing I do think would be realistic would be to invent a set of tools that are perform certain

Re: where should I stick that backup?

2020-04-10 Thread Andres Freund
Hi, On 2020-04-10 12:20:01 -0400, Robert Haas wrote: > - We're only talking about writing a handful of tar files, and that's > in the context of a full-database backup, which is a much > heavier-weight operation than a query. > - There is not really any state that needs to be maintained across

Re: where should I stick that backup?

2020-04-10 Thread Robert Haas
On Fri, Apr 10, 2020 at 10:54 AM Stephen Frost wrote: > So, this goes to what I was just mentioning to Bruce independently- you > could have made the same argument about FDWs, but it just doesn't > actually hold any water. Sure, some of the FDWs aren't great, but > there's certainly no shortage

Re: where should I stick that backup?

2020-04-10 Thread Bruce Momjian
On Fri, Apr 10, 2020 at 10:54:10AM -0400, Stephen Frost wrote: > Greetings, > > * Robert Haas (robertmh...@gmail.com) wrote: > > On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian wrote: > > > Good point, but if there are multiple APIs, it makes shell script > > > flexibility even more useful. > > >

Re: where should I stick that backup?

2020-04-10 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian wrote: > > Good point, but if there are multiple APIs, it makes shell script > > flexibility even more useful. > > This is really the key point for me. There are so many existing tools > that

Re: where should I stick that backup?

2020-04-10 Thread Stephen Frost
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Thu, Apr 9, 2020 at 04:15:07PM -0400, Stephen Frost wrote: > > * Bruce Momjian (br...@momjian.us) wrote: > > > I think we need to step back and look at the larger issue. The real > > > argument goes back to the Unix command-line API vs

Re: where should I stick that backup?

2020-04-10 Thread Robert Haas
On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian wrote: > Good point, but if there are multiple APIs, it makes shell script > flexibility even more useful. This is really the key point for me. There are so many existing tools that store a file someplace that we really can't ever hope to support them

Re: where should I stick that backup?

2020-04-09 Thread Bruce Momjian
On Thu, Apr 9, 2020 at 04:15:07PM -0400, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > I think we need to step back and look at the larger issue. The real > > argument goes back to the Unix command-line API vs the VMS/Windows API. > > The former has

Re: where should I stick that backup?

2020-04-09 Thread Stephen Frost
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > I think we need to step back and look at the larger issue. The real > argument goes back to the Unix command-line API vs the VMS/Windows API. > The former has discrete parts that can be stitched together, while the > VMS/Windows API

Re: where should I stick that backup?

2020-04-09 Thread Bruce Momjian
On Mon, Apr 6, 2020 at 07:32:45PM +0200, Magnus Hagander wrote: > On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost wrote: > > For my 2c, at least, introducing more shell commands into critical parts > > of the system is absolutely the wrong direction to go in. > > archive_command continues to be a

Re: where should I stick that backup?

2020-04-08 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Apr 8, 2020 at 2:06 PM Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > > > On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost wrote: > > > > What if %f.bz2 already exists? > > > > > > That cannot occur in the

Re: where should I stick that backup?

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 2:06 PM Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost wrote: > > > What if %f.bz2 already exists? > > > > That cannot occur in the scenario I described. > > Of course it can. Not really. The steps I

Re: where should I stick that backup?

2020-04-08 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost wrote: > > What if %f.bz2 already exists? > > That cannot occur in the scenario I described. Of course it can. > > How about if %f has a space in it? > > For a tar-format backup I don't

Re: where should I stick that backup?

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost wrote: > What if %f.bz2 already exists? That cannot occur in the scenario I described. > How about if %f has a space in it? For a tar-format backup I don't think that can happen, because the file names will be base.tar and ${tablespace_oid}.tar. For
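
The two objections raised here (a target that already exists, and a file name containing a space) can be illustrated with a small sketch. Everything in it is hypothetical: the helper names, the `%%` escape, and the idea of quoting during expansion are assumptions for illustration, not the behaviour of pg_basebackup or of the option proposed in the thread.

```python
import os
import shlex
import tempfile


def expand_template(template: str, filename: str) -> str:
    """Replace %f with the shell-quoted file name; %% becomes a literal percent."""
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "f":
                out.append(shlex.quote(filename))
                i += 2
                continue
            if nxt == "%":
                out.append("%")
                i += 2
                continue
        out.append(template[i])
        i += 1
    return "".join(out)


def create_new_file(path: str):
    """Open path for writing, failing if it already exists (mode 'x')."""
    return open(path, "xb")


if __name__ == "__main__":
    print(expand_template("bzip2 > %f.bz2", "base.tar"))
    print(expand_template("bzip2 > %f.bz2", "my file.tar"))
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "base.tar.bz2")
        create_new_file(target).close()
        try:
            create_new_file(target)
        except FileExistsError:
            print("refusing to overwrite", os.path.basename(target))
```

Quoting at expansion time keeps a name with a space as a single shell word, and exclusive creation turns a silent overwrite into an error. Whether either belongs in the tool or in the user's command is the question the thread is debating.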

Re: where should I stick that backup?

2020-04-08 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Apr 6, 2020 at 2:23 PM Stephen Frost wrote: > > So, instead of talking about 'bzip2 > %f.bz2', and then writing into our > > documentation that that's how this feature can be used, what about > > proposing something that would

Re: where should I stick that backup?

2020-04-06 Thread Robert Haas
On Mon, Apr 6, 2020 at 1:32 PM Magnus Hagander wrote: > Now, if we were just talking about compression, it would actually be > interesting to implement some sort of "postgres compression API" if > you will, that is implemented by a shared library. This library could > then be used from

Re: where should I stick that backup?

2020-04-06 Thread Robert Haas
On Mon, Apr 6, 2020 at 2:23 PM Stephen Frost wrote: > So, instead of talking about 'bzip2 > %f.bz2', and then writing into our > documentation that that's how this feature can be used, what about > proposing something that would actually work reliably with this > interface? Which properly

Re: where should I stick that backup?

2020-04-06 Thread Stephen Frost
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost wrote: > > * Noah Misch (n...@leadboat.com) wrote: > > > On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote: > > > > What I'm thinking about is: suppose we add an option to

Re: where should I stick that backup?

2020-04-06 Thread Stephen Frost
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Apr 6, 2020 at 10:45 AM Stephen Frost wrote: > > For my 2c, at least, introducing more shell commands into critical parts > > of the system is absolutely the wrong direction to go in. > > archive_command continues to be a mess

Re: where should I stick that backup?

2020-04-06 Thread Magnus Hagander
On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost wrote: > > Greetings, > > * Noah Misch (n...@leadboat.com) wrote: > > On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote: > > > What I'm thinking about is: suppose we add an option to pg_basebackup > > > with a name like --pipe-output. This

Re: where should I stick that backup?

2020-04-06 Thread Robert Haas
On Mon, Apr 6, 2020 at 10:45 AM Stephen Frost wrote: > For my 2c, at least, introducing more shell commands into critical parts > of the system is absolutely the wrong direction to go in. > archive_command continues to be a mess that we refuse to clean up or > even properly document and the

Re: where should I stick that backup?

2020-04-06 Thread Stephen Frost
Greetings, * Noah Misch (n...@leadboat.com) wrote: > On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote: > > What I'm thinking about is: suppose we add an option to pg_basebackup > > with a name like --pipe-output. This would be mutually exclusive with > > -D, but would work at least

Re: where should I stick that backup?

2020-04-05 Thread Noah Misch
On Fri, Apr 03, 2020 at 10:19:21AM -0400, Robert Haas wrote: > What I'm thinking about is: suppose we add an option to pg_basebackup > with a name like --pipe-output. This would be mutually exclusive with > -D, but would work at least with -Ft and maybe also with -Fp. The > argument to

where should I stick that backup?

2020-04-03 Thread Robert Haas
There are a couple of things that pg_basebackup can't do that might be an issue for some users. One of them is that you might want to do something like encrypt your backup. Another is that you might want to store someplace other than in the filesystem, like maybe S3. We could certainly teach