Re: [HACKERS] Streaming a base backup from master

2010-09-07 Thread Bruce Momjian
Greg Stark wrote: The industry standard solution that we're missing that we *should* be figuring out how to implement is incremental backups. I've actually been thinking about this recently and I think we could do it fairly easily with our existing infrastructure. I was planning on doing it

Re: [HACKERS] Streaming a base backup from master

2010-09-06 Thread Greg Stark
On Sun, Sep 5, 2010 at 4:51 PM, Martijn van Oosterhout klep...@svana.org wrote: If you're working from a known good version of the database at some point, yes you are right you have more interesting options. If you don't you want something that will fix it. Sure, in that case you want to

Re: [HACKERS] Streaming a base backup from master

2010-09-06 Thread Robert Haas
On Mon, Sep 6, 2010 at 10:07 AM, Greg Stark gsst...@mit.edu wrote: I think that description pretty much settles the question in my mind. The implementation choice of scanning the WAL to find all the changed blocks is more relevant to the use cases where incremental backups are useful. If you

Re: [HACKERS] Streaming a base backup from master

2010-09-05 Thread Martijn van Oosterhout
On Sat, Sep 04, 2010 at 02:42:40PM +0100, Greg Stark wrote: On Fri, Sep 3, 2010 at 8:30 PM, Martijn van Oosterhout klep...@svana.org wrote: rsync is not rocket science. All you need is for the receiving end to send a checksum for each block it has. The server side does the same checksum

Re: [HACKERS] Streaming a base backup from master

2010-09-04 Thread Greg Stark
On Fri, Sep 3, 2010 at 8:30 PM, Martijn van Oosterhout klep...@svana.org wrote: rsync is not rocket science. All you need is for the receiving end to send a checksum for each block it has. The server side does the same checksum and for each block sends back same or new data. Well rsync is
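The exchange quoted above (the receiver sends a checksum per block; the server computes the same checksum and sends back either "same" or the new data) can be sketched roughly as follows. This is only an illustration of that description, not code from the thread or from rsync. The 8192-byte block size, the use of SHA-256, and the function names block_checksums, build_delta and apply_delta are all assumptions made for the example. It compares blocks at fixed offsets only, whereas real rsync also uses a rolling checksum to find matches at arbitrary offsets.

```python
import hashlib

# Assumption for illustration: 8192 bytes, PostgreSQL's default page size.
BLOCK_SIZE = 8192


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Receiver side: one checksum per fixed-size block it already has."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def build_delta(server_data: bytes, client_sums: list,
                block_size: int = BLOCK_SIZE) -> list:
    """Server side: for each block, answer 'same' (None) or the new bytes."""
    delta = []
    for index, offset in enumerate(range(0, len(server_data), block_size)):
        block = server_data[offset:offset + block_size]
        unchanged = (
            index < len(client_sums)
            and hashlib.sha256(block).digest() == client_sums[index]
        )
        delta.append((index, None if unchanged else block))
    return delta


def apply_delta(client_data: bytes, delta: list,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Receiver side: rebuild the server's copy from local blocks plus delta."""
    result = bytearray()
    for index, block in delta:
        if block is None:
            # Block is unchanged, so reuse the local copy.
            result += client_data[index * block_size:(index + 1) * block_size]
        else:
            result += block
    return bytes(result)


if __name__ == "__main__":
    old = b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE
    new = b"a" * BLOCK_SIZE + b"X" * BLOCK_SIZE
    delta = build_delta(new, block_checksums(old))
    print([(i, "same" if b is None else "new data") for i, b in delta])
    print(apply_delta(old, delta) == new)
```

Note that even in this simplified form both sides still have to read every block to compute its checksum, which is the cost raised elsewhere in the thread as an argument for incremental backups.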

Re: [HACKERS] Streaming a base backup from master

2010-09-04 Thread Thom Brown
On 4 September 2010 14:42, Greg Stark gsst...@mit.edu wrote: The industry standard solution that we're missing that we *should* be figuring out how to implement is incremental backups. I'll buy you a crate of beer if this gets implemented... although you're in Dublin so would be like buying

Re: [HACKERS] Streaming a base backup from master

2010-09-04 Thread Robert Haas
On Sat, Sep 4, 2010 at 9:42 AM, Greg Stark gsst...@mit.edu wrote: *However* I think you're all headed in the wrong direction here. I don't think rsync is what anyone should be doing with their backups at all. It still requires scanning through *all* your data even if you've only changed a

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Thom Brown
On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism.

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Dave Page
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 13:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's been discussed before that it would be cool if you could stream a new base backup from the master server, via libpq. That way you would not need low-level filesystem access to initialize a new

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 13:25, Thom Brown t...@linux.com wrote: On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Thom Brown
On 3 September 2010 12:30, Magnus Hagander mag...@hagander.net wrote: On Fri, Sep 3, 2010 at 13:25, Thom Brown t...@linux.com wrote: On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this.

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas
On 03/09/10 14:25, Thom Brown wrote: On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas
On 03/09/10 14:28, Dave Page wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 13:48, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 03/09/10 14:28, Dave Page wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Greg Stark
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Well

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas
On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 15:24, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Dave Page
On Fri, Sep 3, 2010 at 2:24 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 9:26 AM, Dave Page dp...@pgadmin.org wrote: rsync? Might be easier to use that from day 1 (well, day 2) than to retrofit later. I'm not sure we want to depend on an external utility like that, particularly one that users may not have installed. And I'm not sure if that

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Dave Page
On Fri, Sep 3, 2010 at 2:29 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 3, 2010 at 9:26 AM, Dave Page dp...@pgadmin.org wrote: rsync? Might be easier to use that from day 1 (well, day 2) than to retrofit later. I'm not sure we want to depend on an external utility like that,

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 9:32 AM, Dave Page dp...@pgadmin.org wrote: No, I agree we don't want an external dependency (I was just bleating about needing tar on Windows). I was assuming/hoping there's a librsync somewhere... The rsync code itself is not modular, I believe. I think the author

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote: The rsync code itself is not modular, I believe. I think the author thereof kind of took the approach of placing efficiency before all. Yeah, I looked into this when discussing this same concept at PGCon with folks. There doesn't appear to be a

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote: there's a heck of a lot of complexity there that we *don't* need. rsync is a great tool, don't get me wrong, but let's not try to go over our heads here. Right -- among other things, it checks for portions of a new file which match the old file at a

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Stephen Frost
Kevin, * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: While 1GB granularity would be OK, I doubt it's optimal; I think CRC checks for smaller chunks might be worthwhile. My gut feel is that somewhere in the 64kB to 1MB range would probably be optimal for us, although the sweet spot

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Thom Brown
On 3 September 2010 16:01, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote: We have something much better, called WAL. If people want to keep their backup current, they should use that after getting the base backup up and working. Unless you want to provide support for Point In Time Recovery without excessive recovery times.

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas
On 03/09/10 18:01, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 11:20 AM, Stephen Frost sfr...@snowman.net wrote: Kevin, * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: While 1GB granularity would be OK, I doubt it's optimal; I think CRC checks for smaller chunks might be worthwhile. My gut feel is that somewhere in the 64kB

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Stephen Frost sfr...@snowman.net wrote: In any case, it's certainly not something required for an initial implementation.. No disagreement there; but sometimes it pays to know where you might want to go, so you don't do something to make

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Stephen Frost sfr...@snowman.net wrote: In any case, it's certainly not something required for an initial implementation.. No disagreement there; but sometimes it pays to

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread David Blewett
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. Personally, I would love to see protocol-level

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. I'd much rather use an existing library to implement it than call out to

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. The complexities of dealing with properly invoking rsync externally could well

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas
On 03/09/10 19:09, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. I'd much rather use an existing

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas
On 03/09/10 18:53, David Blewett wrote: On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf.

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Martijn van Oosterhout
On Fri, Sep 03, 2010 at 09:56:12AM -0400, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: The rsync code itself is not modular, I believe. I think the author thereof kind of took the approach of placing efficiency before all. Yeah, I looked into this when discussing this

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread David Blewett
On Fri, Sep 3, 2010 at 12:23 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 03/09/10 18:53, David Blewett wrote: On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to