Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-11-08 Thread Greg Smith
I was curious if anyone running into these problems has gotten a chance to test the 3 fixes committed here. It sounded like Linas even had a repeatable test case? For easier reference the commits are:

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-11-02 Thread Simon Riggs
On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop ch...@replicon.com wrote: looks like the v3 patch re-introduces the pg_subtrans issue... No, I just separated the patches to be clearer about the individual changes. --  Simon Riggs   http://www.2ndQuadrant.com/  PostgreSQL

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-11-02 Thread Simon Riggs
On Wed, Nov 2, 2011 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop ch...@replicon.com wrote: looks like the v3 patch re-introduces the pg_subtrans issue... No, I just separated the patches to be clearer about the individual changes. 3 bug

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-11-02 Thread Chris Redekop
okay, sorry I'm a little confused then. Should I be able to apply both the v2 patch as well as the v3 patch? or is it expected that I'd have to manually do the merge? On Wed, Nov 2, 2011 at 1:34 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-11-01 Thread Chris Redekop
looks like the v3 patch re-introduces the pg_subtrans issue... On Tue, Nov 1, 2011 at 9:33 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs si...@2ndquadrant.com wrote: StartupMultiXact() didn't need changing, I thought, but I will review further.

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Heikki Linnakangas
On 27.10.2011 09:57, Heikki Linnakangas wrote: My suggestion is to fix the CLOG problem in that same way that you fixed the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before CheckPointGuts(). Here's what I imagine CreateCheckPoint() should look like: 1) LogStandbySnapshot() and

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote: This fixes both the subtrans and clog bugs in one patch. I don't see the point of changing StartupCLOG() to

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Heikki Linnakangas
On 27.10.2011 02:29, Florian Pflug wrote: Per my theory about the cause of the problem in my other mail, I think you might see StartupCLOG failures even during crash recovery, provided that wal_level was set to hot_standby when the primary crashed. Here's how 1) We start a checkpoint, and get

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote: This fixes both the subtrans and

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 08:57 , Heikki Linnakangas wrote: On 27.10.2011 02:29, Florian Pflug wrote: Per my theory about the cause of the problem in my other mail, I think you might see StartupCLOG failures even during crash recovery, provided that wal_level was set to hot_standby when the primary

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Oct 26, 2011 at 12:16 PM,

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug f...@phlo.org wrote: Per my theory about the cause of the problem in my other mail, I think you might see StartupCLOG failures even during crash recovery, provided that wal_level was set to hot_standby when the primary crashed. Here's how 1) We

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 15:51 , Simon Riggs wrote: On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug f...@phlo.org wrote: Here's what I imagine CreateCheckPoint() should look like: 1) LogStandbySnapshot() and fill out oldestActiveXid 2) Fill out REDO 3) Wait for concurrent commits 4) Fill out

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote: It's much easier to understand that StartupCLOG() is actually a no-op and that we need to trim the clog

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug f...@phlo.org wrote: I think you make a good case for doing this. However, I'm concerned that moving LogStandbySnapshot() in a backpatch seems more risky than it's worth. We could easily introduce a new bug into what we would all agree is a

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 3:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: However, the obvious next question is whether those other modules don't need to be changed also, and if not why not. Good point. StartupSubtrans() is also changed by this patch, since it will be supplied with an earlier

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 16:30 , Simon Riggs wrote: On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug f...@phlo.org wrote: I think you make a good case for doing this. However, I'm concerned that moving LogStandbySnapshot() in a backpatch seems more risky than it's worth. We could easily introduce

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Tue, Oct 25, 2011 at 10:06 PM, Chris Redekop ch...@replicon.com wrote: Chris, can you rearrange the backup so you copy the pg_control file as the first act after the pg_start_backup? I tried this and it doesn't seem to make any difference. It won't, that was a poor initial diagnosis on my

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct25, 2011, at 14:51 , Simon Riggs wrote: On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org wrote: What I don't understand is how this affects the CLOG. How does oldestActiveXID factor into CLOG initialization? It is an entirely different error. Ah, OK. I assumed that

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct25, 2011, at 13:39 , Florian Pflug wrote: On Oct25, 2011, at 11:13 , Simon Riggs wrote: On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote: We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly. Precisely, the

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 12:16 PM, Florian Pflug f...@phlo.org wrote: Chris' clog error was caused by a file read error. The file was opened, we did a seek within the file and then the call to read() failed to return a complete page from the file. The xid shown is 22811359, which is the

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk
On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs si...@2ndquadrant.com wrote: It's very likely that it's a PostgreSQL problem, though. It's probably not a pilot error since it happens even for backups taken with pg_basebackup(), so the only explanation other than a PostgreSQL bug is broken

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote: The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. You don't actually know that, though I agree it seems a reasonable guess and was my

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct26, 2011, at 15:12 , Simon Riggs wrote: On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote: The read fails because there is no data at the location it's trying to read from, because clog hasn't been extended yet by recovery. You don't actually know that, though I

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Chris Redekop
And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hotstandby, recovery ran, properly extended the clog, and then ran as a new master fine. Yes this is correct...attempting to start as

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct26, 2011, at 15:57 , Florian Pflug wrote: As you said, the CLOG page corresponding to nextId *should* always be accessible at the start of recovery (unless the whole file has been removed by VACUUM, that is). So we shouldn't need to extend CLOG. Yet the error suggests that the CLOG is, in

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk
On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug f...@phlo.org wrote: On Oct26, 2011, at 15:12 , Simon Riggs wrote: On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote: The read fails because there is no data at the location it's trying to read from, because clog hasn't been

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct26, 2011, at 17:36 , Chris Redekop wrote: And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hotstandby, recovery ran, properly extended the clog, and then ran as a new master

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug f...@phlo.org wrote: On Oct26, 2011, at 15:57 , Florian Pflug wrote: As you said, the CLOG page corresponding to nextId *should* always be accessible at the start of recovery (unless the whole file has been removed by VACUUM, that is). So we shouldn't

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote: Brewing a patch now. Latest thinking... confirmations or other error reports please. This fixes both the subtrans and clog bugs in one patch. --  Simon Riggs   http://www.2ndQuadrant.com/  PostgreSQL

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote: Brewing a patch now. Latest thinking... confirmations or other error reports please. This fixes both the subtrans and clog bugs in one patch.

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Chris Redekop
FYI I have given this patch a good test and can now no longer reproduce either the subtrans or the clog error. Thanks guys! On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Oct 26,

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct26, 2011, at 18:08 , Simon Riggs wrote: On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug f...@phlo.org wrote: On Oct26, 2011, at 15:57 , Florian Pflug wrote: Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), but before LogStandbySnapshot(), then we end up with a

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Robert Haas
On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote: Brewing a patch now. Latest thinking... confirmations or other error reports please. This fixes both the subtrans and clog bugs in one patch.

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote: This fixes both the subtrans and clog bugs in one patch. I don't see the point of changing StartupCLOG() to be an empty function and adding a new function TrimCLOG() that does

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Simon Riggs
On Mon, Oct 24, 2011 at 7:13 AM, Florian Pflug f...@phlo.org wrote: I think Simon's theory that we're starting recovery from the wrong place, i.e. should start with an earlier WAL location, is probably correct. The question is, why? Err, that's not what I said and I don't mean that. Having

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Simon Riggs
On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote: We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly. Precisely, the oldestActiveXid is being derived later than it should be, which can cause problems if this then means

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Florian Pflug
On Oct25, 2011, at 11:13 , Simon Riggs wrote: On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote: We are starting recovery at the right place but we are initialising the clog and subtrans incorrectly. Precisely, the oldestActiveXid is being derived later than it should

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Simon Riggs
On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org wrote: What I don't understand is how this affects the CLOG. How does oldestActiveXID factor into CLOG initialization? It is an entirely different error. Chris' clog error was caused by a file read error. The file was opened, we

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Chris Redekop
Chris, can you rearrange the backup so you copy the pg_control file as the first act after the pg_start_backup? I tried this and it doesn't seem to make any difference. I also tried the patch and I can no longer reproduce the subtrans error; however, it now starts up, but never gets

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-25 Thread Chris Redekop
That isn't a Hot Standby problem, a recovery problem nor is it certain it's a PostgreSQL problem. Do you have any theories on this that I could help investigate? It happens even when using pg_basebackup and it persists until another sync is performed, so the files must be in some state that

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-24 Thread Florian Pflug
On Oct24, 2011, at 01:27 , Simon Riggs wrote: FATAL: could not access status of transaction 21110784 which, in pg_subtrans, is the first xid on a new subtrans page. So we have missed zeroing a page. pg_control shows ... Latest checkpoint's oldestActiveXID: 2111 which shows quite

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-24 Thread Florian Pflug
On Oct23, 2011, at 22:48 , Daniel Farina wrote: It doesn't seem meaningful for StartupCLOG (or, indeed, any of the hot-standby path functionality) to be called before that code is executed, but it is anyway right now. I think the idea is to check that the CLOG part which recovery *won't*

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Daniel Farina
On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop ch...@replicon.com wrote: Well, on the other hand maybe there is something wrong with the data.  Here's the test/steps I just did - 1. I do the pg_basebackup when the master is under load, hot slave now will not start up but warm slave will. 2.

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Simon Riggs
On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina dan...@heroku.com wrote: On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop ch...@replicon.com wrote: Well, on the other hand maybe there is something wrong with the data.  Here's the test/steps I just did - 1. I do the pg_basebackup when the master

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Simon Riggs
On Sun, Oct 16, 2011 at 2:33 AM, Chris Redekop ch...@replicon.com wrote: pg_subtrans: http://pastebin.com/qAXEHAQt I confirm this as a HS issue and will investigate from here. FATAL: could not access status of transaction 21110784 which, in pg_subtrans, is the first xid on a new subtrans
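
The "first xid on a new subtrans page" observation can be checked with a little arithmetic. The sketch below is not PostgreSQL code; it assumes the default 8 kB block size and 4-byte transaction IDs, which gives 2048 entries per pg_subtrans page. The function name is made up for illustration.

```python
# Assumptions (not stated in the thread): default BLCKSZ of 8192 bytes and
# 4-byte transaction IDs, so each pg_subtrans page holds 2048 entries.
BLCKSZ = 8192
XID_SIZE = 4
SUBTRANS_XIDS_PER_PAGE = BLCKSZ // XID_SIZE  # 2048


def subtrans_position(xid):
    """Return (page number, slot within that page) for an xid (hypothetical helper)."""
    return divmod(xid, SUBTRANS_XIDS_PER_PAGE)


page, slot = subtrans_position(21110784)
print(page, slot)  # slot 0 means the xid is the first entry on its page
```

Under those assumptions the xid from the FATAL message lands in slot 0, which is consistent with a page that was never zeroed.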

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-23 Thread Simon Riggs
On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina dan...@heroku.com wrote: Having dug into this a little -- but not too much -- the problem seems to be that postgres is reading the commit logs way, way too early, that is to say, before it has played enough WAL to be 'consistent' (the WAL

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-17 Thread Chris Redekop
I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync. The data itself seems to be fine because using the exact same data I can start up a warm standby no problem, it is just the hot standby that will not start up. On Sat, Oct 15, 2011 at

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-17 Thread Chris Redekop
Well, on the other hand maybe there is something wrong with the data. Here's the test/steps I just did - 1. I do the pg_basebackup when the master is under load, hot slave now will not start up but warm slave will. 2. I start a warm slave and let it catch up to current 3. On the slave I change

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-15 Thread Chris Redekop
Linas, could you capture the output of pg_controldata *and* increase the log level to DEBUG1 on the standby? We should then see nextXid value of the checkpoint the recovery is starting from. I'll try to do that whenever I'm in that territory again... Incidentally, recently there was a lot

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-29 Thread Linas Virbalas
Linas, could you capture the output of pg_controldata *and* increase the log level to DEBUG1 on the standby? We should then see nextXid value of the checkpoint the recovery is starting from. I'll try to do that whenever I'm in that territory again... Incidentally, recently there was a lot of

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-29 Thread Florian Pflug
On Sep29, 2011, at 17:44 , Linas Virbalas wrote: I also checked what rsync does when a file vanishes after rsync computed the file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains loudly, and doesn't sync the file. It BTW also exits non-zero, with a special exit code for
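
For readers scripting a base backup, the vanished-file case can be handled explicitly. The sketch below assumes the exit codes listed in the rsync manual page, where 24 means "Partial transfer due to vanished source files"; the helper name is hypothetical, and whether tolerating that code is appropriate depends on the backup procedure in use.

```python
# Exit codes as documented in rsync(1); treat them as assumptions to verify
# against the rsync version actually in use.
RSYNC_SUCCESS = 0
RSYNC_VANISHED_SOURCE_FILES = 24


def base_backup_copy_ok(exit_code):
    """Hypothetical driver-script check: accept success or vanished source files.

    Files that disappear during a base backup are normally harmless, because
    WAL replay reconstructs the final state. Any other non-zero code is treated
    as a failed copy.
    """
    return exit_code in (RSYNC_SUCCESS, RSYNC_VANISHED_SOURCE_FILES)


for code in (0, 23, 24):
    print(code, base_backup_copy_ok(code))
```

This only classifies the exit status; it says nothing about files that grew while rsync was running, which is the other hypothesis raised in the thread.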

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-27 Thread Florian Pflug
On Sep23, 2011, at 21:10 , Robert Haas wrote: So the actual error message in the last test was: 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction 1188673 ...but we can't tell if that was before or after nextXid, which seems like it would be useful to know. If
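
The transaction ID in that FATAL message can be tied to the "pg_clog/0001 at offset 32768" detail quoted elsewhere in the thread. The sketch below is not PostgreSQL source. It assumes the default 8 kB block size, two status bits per transaction (four per byte), 32 pages per segment file, and four-hex-digit segment names; the function name is hypothetical.

```python
# Assumed layout: BLCKSZ = 8192, 4 transaction statuses per byte,
# 32 pages per pg_clog segment file, file names in 4-digit uppercase hex.
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE  # 32768
PAGES_PER_SEGMENT = 32


def clog_location(xid):
    """Return (segment file name, byte offset of the page) for an xid."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    offset = (page % PAGES_PER_SEGMENT) * BLCKSZ
    return "%04X" % segment, offset


print(clog_location(1188673))  # segment file and page offset for the reported xid
```

With these assumptions, xid 1188673 maps to segment 0001 at offset 32768, matching the DETAIL line from the same log timestamp.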

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-24 Thread Daniel Farina
On Fri, Sep 23, 2011 at 9:45 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk ai...@highrise.ca wrote: On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Unfortunately, it's impossible, because the error

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Linas Virbalas
On 9/22/11 6:59 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: If needed, I could do that, if I had the exact procedure... Currently, during the start of the backup I take the following information: Just show us the output of pg_start_backup and part of the standby log with the

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Heikki Linnakangas
On 23.09.2011 11:02, Linas Virbalas wrote: On 9/22/11 6:59 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: If needed, I could do that, if I had the exact procedure... Currently, during the start of the backup I take the following information: Just show us the output of pg_start_backup

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Florian Pflug
On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote: On 23.09.2011 11:02, Linas Virbalas wrote: On 9/22/11 6:59 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: If needed, I could do that, if I had the exact procedure... Currently, during the start of the backup I take the following

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Heikki Linnakangas
On 23.09.2011 11:48, Florian Pflug wrote: On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote: On 23.09.2011 11:02, Linas Virbalas wrote: On 9/22/11 6:59 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: If needed, I could do that, if I had the exact procedure... Currently, during the

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Linas Virbalas
On 9/23/11 12:05 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It looks to me that pg_clog/0001 exists, but it is shorter than recovery expects. Which shouldn't happen, of course, because the start-backup checkpoint should flush all the clog that's needed by recovery to disk
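
One way to picture "exists, but shorter than recovery expects" is to compute the minimum length a segment file needs before the page for a given xid can be read in full. The sketch below uses the same assumed layout as stock PostgreSQL defaults (8 kB pages, 32768 transactions per clog page, 32 pages per segment). The helper names and the sample file sizes are hypothetical, not values reported in the thread.

```python
# Assumed defaults: 8 kB pages, 4 statuses per byte, 32 pages per segment.
BLCKSZ = 8192
CLOG_XACTS_PER_PAGE = BLCKSZ * 4
PAGES_PER_SEGMENT = 32


def min_segment_size(xid):
    """Smallest segment file size (bytes) that fully contains the xid's page."""
    page_in_segment = (xid // CLOG_XACTS_PER_PAGE) % PAGES_PER_SEGMENT
    return (page_in_segment + 1) * BLCKSZ


def segment_is_long_enough(file_size, xid):
    """True if a whole-page read for this xid would succeed (hypothetical check)."""
    return file_size >= min_segment_size(xid)


# Hypothetical sizes: a copy that stops exactly at offset 32768 is too short.
print(min_segment_size(1188673))
print(segment_is_long_enough(32768, 1188673))
print(segment_is_long_enough(40960, 1188673))
```

A file that ends at the offset being read would produce a short read with no errno set, which fits the odd ": Success" suffix in the reported error.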

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Robert Haas
On Fri, Sep 23, 2011 at 8:47 AM, Linas Virbalas linas.virba...@continuent.com wrote: But on the standby its size is the old one (thus, it seems, that the size changed after the rsync transfer and before the pg_stop_backup() was called): Now that seems pretty weird - I don't think that file

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Linas Virbalas
But on the standby its size is the old one (thus, it seems, that the size changed after the rsync transfer and before the pg_stop_backup() was called): Now that seems pretty weird - I don't think that file should ever shrink. It seems, I was not clear in my last example. The pg_clog file

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Aidan Van Dyk
On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Unfortunately, it's impossible, because the error message Could not read from file pg_clog/0001 at offset 32768: Success is shown (and startup aborted) before the turn for redo starts at message

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Alvaro Herrera
Excerpts from Linas Virbalas's message of Fri Sep 23 09:47:20 -0300 2011: On 9/23/11 12:05 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: But on the standby its size is the old one (thus, it seems, that the size changed after the rsync transfer and before the

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Magnus Hagander
On Sep 23, 2011 5:59 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Linas Virbalas's message of Fri Sep 23 09:47:20 -0300 2011: On 9/23/11 12:05 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: But on the standby its size is the old one (thus, it

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Andres Freund
Hi, On Wednesday 21 Sep 2011 16:44:30 Linas Virbalas wrote: 2011-09-21 13:41:05 CEST DETAIL: Could not read from file pg_clog/0001 at offset 32768: Success. Any chance you can attach gdb to the startup process and provide a backtrace from the place where this message is printed? Greetings,

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Heikki Linnakangas
On 23.09.2011 19:03, Magnus Hagander wrote: On Sep 23, 2011 5:59 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Linas Virbalas's message of Fri Sep 23 09:47:20 -0300 2011: On 9/23/11 12:05 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: But on the

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Robert Haas
On Fri, Sep 23, 2011 at 11:43 AM, Aidan Van Dyk ai...@highrise.ca wrote: On Fri, Sep 23, 2011 at 4:41 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Unfortunately, it's impossible, because the error message Could not read from file pg_clog/0001 at offset 32768: Success is

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Florian Pflug
On Sep23, 2011, at 18:03 , Magnus Hagander wrote: On Sep 23, 2011 5:59 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Sounds like rsync is caching the file size at the start of the run, and then copying that many bytes, ignoring the growth that occurred after it started. That

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Florian Pflug
On Sep23, 2011, at 18:45 , Robert Haas wrote: Ah. I think you are right - Heikki made the same point. Maybe some of the stuff that happens just after this comment: /* * Initialize for Hot Standby, if enabled. We won't let backends in * yet, not until we've reached

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Heikki Linnakangas
On 23.09.2011 19:49, Florian Pflug wrote: On Sep23, 2011, at 18:45 , Robert Haas wrote: Ah. I think you are right - Heikki made the same point. Maybe some of the stuff that happens just after this comment: /* * Initialize for Hot Standby, if enabled. We won't let backends in

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-23 Thread Robert Haas
On Fri, Sep 23, 2011 at 12:58 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: There are pretty clear rules on what state clog can be in. When you launch postmaster in a standby: * Any clog preceding the nextXid from the checkpoint record we start recovery from, must either

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-22 Thread Linas Virbalas
2.2. pg_start_backup('backup_under_load') on the master (this will take a while as master is loaded up); No. If you use pg_start_backup('foo', true) it will be fast. Check the manual. If the server is sufficiently heavily loaded that a checkpoint takes a nontrivial amount of time, the OP

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-22 Thread Euler Taveira de Oliveira
On 22-09-2011 11:24, Linas Virbalas wrote: In order to check more cases, I have changed the procedure to force an immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With the same load generator running, pg_start_backup returned almost instantaneously compared to how long it

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-22 Thread Robert Haas
2011/9/22 Euler Taveira de Oliveira eu...@timbira.com: On 22-09-2011 11:24, Linas Virbalas wrote: In order to check more cases, I have changed the procedure to force an immediate checkpoint, i.e. pg_start_backup('backup_under_load', true). With the same load generator running,

[HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-21 Thread Linas Virbalas
Hello, * Context * I'm observing problems with provisioning a standby from the master by following a basic and documented Making a Base Backup [1] procedure with rsync if, in the meantime, heavy load is applied on the master. After searching the archives, the only more discussed and similar

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-21 Thread Euler Taveira de Oliveira
On 21-09-2011 11:44, Linas Virbalas wrote: [This question doesn't belong to -hackers. Please post it in -general or -admin] Procedure: 1. Start load generator on the master (WAL archiving enabled). 2. Prepare a Streaming Replication standby (accepting WAL files too): 2.1. pg_switch_xlog() on

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-21 Thread Florian Pflug
On Sep21, 2011, at 16:44 , Linas Virbalas wrote: After searching the archives, the only more discussed and similar issue I found hit was by Daniel Farina in a thread hot backups: am I doing it wrong, or do we have a problem with pg_clog? [2], but, it seems, the issue was discarded because of a

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-09-21 Thread Robert Haas
On Wed, Sep 21, 2011 at 12:22 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: [This question doesn't belong to -hackers. Please post it in -general or -admin] -hackers or -bugs seems appropriate to me; I think this is a bug. 2.2. pg_start_backup('backup_under_load') on the master (this