Re: [HACKERS] Implementing incremental backup

2013-06-22 · Cédric Villemain
On Saturday 22 June 2013 01:09:20, Jehan-Guillaume (ioguix) de Rorthais wrote: On 20/06/2013 03:25, Tatsuo Ishii wrote: On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote: On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire

Re: [HACKERS] Implementing incremental backup

2013-06-22 · Andres Freund
On 2013-06-22 15:58:35 +0200, Cédric Villemain wrote: A differential backup resulting from a bunch of WAL between W1 and Wn would help recover to the time of Wn much faster than replaying all the WAL between W1 and Wn, and would save a lot of space. I was hoping to find some time to dig
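Cédric's differential-backup idea can be sketched as follows, assuming (as a simplification) that each change is available as a full page image keyed by relation and block number. Actual WAL records mostly describe row-level changes, so a real tool would have to replay them against each page to obtain its final image. The names `build_differential` and `apply_differential` are hypothetical. Because a later write to the same block overwrites an earlier one, the differential holds at most one image per block, which is where both the space saving and the faster recovery come from.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical simplified record: (relation name, block number, full page image).
# Real WAL records are not generally full page images; this is an assumption.
BlockKey = Tuple[str, int]
Record = Tuple[str, int, bytes]


def build_differential(records: Iterable[Record]) -> Dict[BlockKey, bytes]:
    """Collapse an ordered stream of block writes (W1..Wn) into the last image of each block."""
    diff: Dict[BlockKey, bytes] = {}
    for rel, blkno, image in records:
        # A later write to the same block replaces the earlier one.
        diff[(rel, blkno)] = image
    return diff


def apply_differential(base: Dict[BlockKey, bytes],
                       diff: Dict[BlockKey, bytes]) -> Dict[BlockKey, bytes]:
    """Overlay the differential on a copy of the base backup."""
    restored = dict(base)
    restored.update(diff)
    return restored


if __name__ == "__main__":
    records = [("t1", 0, b"v1"), ("t1", 0, b"v2"), ("t2", 5, b"x")]
    print(build_differential(records))  # {('t1', 0): b'v2', ('t2', 5): b'x'}
```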

Re: [HACKERS] Implementing incremental backup

2013-06-21 · Jehan-Guillaume (ioguix) de Rorthais
On 20/06/2013 03:25, Tatsuo Ishii wrote: On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote: On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: [...] The only bottleneck here is WAL archiving. This

Re: [HACKERS] Implementing incremental backup

2013-06-20 · Magnus Hagander
On Thu, Jun 20, 2013 at 12:18 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Claudio Freire wrote: On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: I don't see how this is better than snapshotting at the

[HACKERS] Implementing incremental backup

2013-06-19 · Tatsuo Ishii
Hi, I'm thinking of implementing an incremental backup tool for PostgreSQL. The use case for the tool would be taking a backup of a huge database. For that size of database, pg_dump is too slow, and even WAL archiving is too slow/ineffective. However even in a TB database, sometimes actual

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Stephen Frost
Tatsuo, * Tatsuo Ishii (is...@postgresql.org) wrote: I'm thinking of implementing an incremental backup tool for PostgreSQL. The use case for the tool would be taking a backup of a huge database. For that size of database, pg_dump is too slow, and even WAL archiving is too slow/ineffective.

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Ants Aasma
On Wed, Jun 19, 2013 at 1:13 PM, Tatsuo Ishii is...@postgresql.org wrote: I'm thinking of implementing an incremental backup tool for PostgreSQL. The use case for the tool would be taking a backup of a huge database. For that size of database, pg_dump is too slow, and even WAL archiving is too

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Claudio Freire
On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii is...@postgresql.org wrote: For now, my idea is pretty vague. - Record info about modified blocks. We don't need to remember the whole history of a block if the block was modified multiple times. We just remember that the block was modified
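A minimal sketch of the "remember only that the block was modified" bookkeeping from Tatsuo's proposal, assuming blocks are identified by a relation name and a block number. `ModifiedBlockMap` and its methods are hypothetical names, not an existing PostgreSQL API; a real implementation would more likely keep a compact per-relation bitmap persisted on disk than an in-memory set.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ModifiedBlockMap:
    """Hypothetical tracker: records only *that* a block changed since the last backup."""

    def __init__(self) -> None:
        self._blocks: Dict[str, Set[int]] = defaultdict(set)

    def mark_modified(self, relation: str, blkno: int) -> None:
        # Adding to a set is idempotent, so a block modified many times
        # still produces a single entry; no history is kept.
        self._blocks[relation].add(blkno)

    def blocks_to_copy(self) -> Dict[str, List[int]]:
        # Sorted block numbers let the backup read each relation file sequentially.
        return {rel: sorted(blks) for rel, blks in self._blocks.items()}

    def reset(self) -> None:
        # Intended to be called once an incremental backup has completed.
        self._blocks.clear()


if __name__ == "__main__":
    m = ModifiedBlockMap()
    for rel, blk in [("t1", 7), ("t1", 7), ("t1", 2), ("t2", 0)]:
        m.mark_modified(rel, blk)
    print(m.blocks_to_copy())  # {'t1': [2, 7], 't2': [0]}
```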

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Jim Nasby
On 6/19/13 11:02 AM, Claudio Freire wrote: On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii is...@postgresql.org wrote: For now, my idea is pretty vague. - Record info about modified blocks. We don't need to remember the whole history of a block if the block was modified multiple times. We

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Claudio Freire
On Wed, Jun 19, 2013 at 3:54 PM, Jim Nasby j...@nasby.net wrote: On 6/19/13 11:02 AM, Claudio Freire wrote: On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii is...@postgresql.org wrote: For now, my idea is pretty vague. - Record info about modified blocks. We don't need to remember the

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote: I don't see how this is better than snapshotting at the filesystem level. I have no experience with TB scale databases (I've been limited to only hundreds of GB), but from my limited mid-size db experience, filesystem snapshotting is pretty much

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Claudio Freire
On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: I don't see how this is better than snapshotting at the filesystem level. I have no experience with TB scale databases (I've been limited to only hundreds of GB), but from

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Alvaro Herrera
Claudio Freire wrote: On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: I don't see how this is better than snapshotting at the filesystem level. I have no experience with TB scale databases (I've been limited to

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Claudio Freire
On Wed, Jun 19, 2013 at 7:18 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: If you have the two technologies, you could teach them to work in conjunction: you set up WAL replication, and tell the WAL compressor to prune updates for high-update tables (avoid useless traffic), then use

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Tatsuo Ishii
I'm thinking of implementing an incremental backup tool for PostgreSQL. The use case for the tool would be taking a backup of a huge database. For that size of database, pg_dump is too slow, and even WAL archiving is too slow/ineffective. However even in a TB database, sometimes actual

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Claudio Freire
On Wed, Jun 19, 2013 at 7:39 PM, Tatsuo Ishii is...@postgresql.org wrote: I'm thinking of implementing an incremental backup tool for PostgreSQL. The use case for the tool would be taking a backup of a huge database. For that size of database, pg_dump is too slow, and even WAL archiving is too

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Tatsuo Ishii
I'm trying to figure out how that's actually different from WAL? It sounds like you'd get what you're suggesting by simply increasing the checkpoint timeout until the WAL stream is something which you can keep up with. Of course, the downside there is that you'd have to replay more WAL

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote: Yeah, at first I thought using WAL was a good idea. However I realized that the problem with using WAL is that we cannot back up unlogged tables because they are not written to WAL. Unlogged tables are also nuked on recovery, so I'm not sure why you think an

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Tatsuo Ishii
* Tatsuo Ishii (is...@postgresql.org) wrote: Yeah, at first I thought using WAL was a good idea. However I realized that the problem with using WAL is that we cannot back up unlogged tables because they are not written to WAL. Unlogged tables are also nuked on recovery, so I'm not sure why you think

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote: * Tatsuo Ishii (is...@postgresql.org) wrote: Yeah, at first I thought using WAL was a good idea. However I realized that the problem with using WAL is that we cannot back up unlogged tables because they are not written to WAL. Unlogged tables are

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Tatsuo Ishii
* Tatsuo Ishii (is...@postgresql.org) wrote: * Tatsuo Ishii (is...@postgresql.org) wrote: Yeah, at first I thought using WAL was a good idea. However I realized that the problem with using WAL is that we cannot back up unlogged tables because they are not written to WAL. Unlogged tables are

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Tatsuo Ishii
On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: I don't see how this is better than snapshotting at the filesystem level. I have no experience with TB scale databases (I've been limited to only hundreds of GB), but from

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote: I don't think using rsync (or tar or whatever general file utilities) against a TB database for incremental backup is practical. If it were practical, I would never have proposed my idea. You could use rsync for incremental updates if you wanted, it'd certainly be
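The rsync-style approach Stephen mentions can be approximated for PostgreSQL data files by comparing fixed-size blocks, since a relation file is an array of pages at fixed offsets (8 kB by default). This is a simplified stand-in, not rsync's actual algorithm, which combines a rolling weak checksum with a strong hash so it can match data at arbitrary offsets; `changed_blocks` and `block_digests` are hypothetical helpers. Note that it still reads every block of the current file, which is exactly the I/O cost Tatsuo is concerned about for TB-scale databases.

```python
import hashlib
from typing import List

BLCKSZ = 8192  # PostgreSQL's default block size; assumed, since it is configurable at build time


def block_digests(data: bytes, blcksz: int = BLCKSZ) -> List[bytes]:
    """Strong hash of each fixed-size block (the last block may be short)."""
    return [hashlib.sha256(data[i:i + blcksz]).digest()
            for i in range(0, len(data), blcksz)]


def changed_blocks(old: bytes, new: bytes, blcksz: int = BLCKSZ) -> List[int]:
    """Block numbers of `new` that must be copied to bring `old` up to date.

    Blocks past the end of `old` are always copied. A file that shrank is not
    reported here; a real tool would also record the new file length.
    """
    old_d = block_digests(old, blcksz)
    new_d = block_digests(new, blcksz)
    changed: List[int] = []
    for blkno, digest in enumerate(new_d):
        if blkno >= len(old_d) or old_d[blkno] != digest:
            changed.append(blkno)
    return changed


if __name__ == "__main__":
    old = b"a" * BLCKSZ + b"b" * BLCKSZ
    new = b"a" * BLCKSZ + b"c" * BLCKSZ + b"d" * 100
    print(changed_blocks(old, new))  # [1, 2]
```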

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Stephen Frost
* Tatsuo Ishii (is...@postgresql.org) wrote: Why do you think the WAL compressor idea is more scalable? I really want to know why. Besides the unlogged tables issue, I can accept the idea if a WAL-based solution is much more efficient. If there's no perfect, ideal solution, we need to prioritize

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Claudio Freire
On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote: On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: I don't see how this is better than snapshotting at the filesystem level. I have no experience

Re: [HACKERS] Implementing incremental backup

2013-06-19 · Tatsuo Ishii
On Wed, Jun 19, 2013 at 8:40 PM, Tatsuo Ishii is...@postgresql.org wrote: On Wed, Jun 19, 2013 at 6:20 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: I don't see how this is better than snapshotting at the filesystem level. I have no experience