Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-30 Thread Heikki Linnakangas
Fujii Masao wrote: * Small code changes to handling of failedSources, inspired by your comment. No change in functionality. This is also available in my git repository at git://git.postgresql.org/git/users/heikki/postgres.git, branch xlogchanges I looked at the patch and was not able to find

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-30 Thread Fujii Masao
On Wed, Mar 31, 2010 at 1:28 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Fujii Masao wrote: * Small code changes to handling of failedSources, inspired by your comment. No change in functionality. This is also available in my git repository at

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote: On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs si...@2ndquadrant.com wrote: PANICing won't change the situation, so it just destroys server availability. If we had 1 master and 42 slaves then this behaviour would take down almost the whole

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Tom Lane wrote: Fujii Masao masao.fu...@gmail.com writes: OK. How about making the startup process emit WARNING, stop WAL replay and wait for the presence of trigger file, when an invalid record is found? Which keeps the server up for readonly queries. And if the trigger file is found, I

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: PANIC seems like the appropriate solution for now. It definitely is not. Think some more. -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Simon Riggs wrote: On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote: And if the trigger file is found, I think that the startup process should emit a FATAL, i.e., the server should exit immediately, to prevent the server from becoming the primary in a half-finished state. Please

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
(cc'ing docs list) Simon Riggs wrote: The lack of docs begins to show a lack of coherent high-level design here. Yeah, I think you're right. It's becoming hard to keep track of how it's supposed to behave. By now, I've forgotten what this thread was even about. The major design decision in

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Simon Riggs wrote: On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: PANIC seems like the appropriate solution for now. It definitely is not. Think some more. Well, what happens now in previous versions with pg_standby et al is that the standby starts up. That doesn't seem

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Heikki Linnakangas wrote: Simon Riggs wrote: On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: PANIC seems like the appropriate solution for now. It definitely is not. Think some more. Well, what happens now in previous versions with pg_standby et al is that the standby starts

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Fujii Masao wrote: sources = ~failedSources; failedSources |= readSource; The above lines in XLogPageRead() seem not to be required in normal recovery case (i.e., standby_mode = off). So how about the attached patch? *** 9050,9056 next_record_is_invalid: --- 9047,9056

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Fujii Masao wrote: On second thought, the following lines seem to be necessary just after calling XLogPageRead() since it reads new WAL file from another source. if (readSource == XLOG_FROM_STREAM || readSource == XLOG_FROM_ARCHIVE) emode = PANIC; else

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Robert Haas
On Thu, Mar 25, 2010 at 8:55 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * If a corrupt WAL record is found in archive or streamed from master in standby mode, throw WARNING instead of PANIC, and keep trying. In archive recovery (ie. standby_mode=off) it's still a PANIC.

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 12:15 +0200, Heikki Linnakangas wrote: (cc'ing docs list) Simon Riggs wrote: The lack of docs begins to show a lack of coherent high-level design here. Yeah, I think you're right. It's becoming hard to keep track of how it's supposed to behave. Thank you for

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 12:26 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: PANIC seems like the appropriate solution for now. It definitely is not. Think some more. Well, what happens now in previous versions with

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Fujii Masao
On Thu, Mar 25, 2010 at 9:55 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * Fix the bug of a spurious PANIC in archive recovery, if the WAL ends in the middle of a WAL record that continues over a WAL segment boundary. * If a corrupt WAL record is found in archive or

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Heikki Linnakangas
Fujii Masao wrote: But in the current (v8.4 or before) behavior, recovery ends normally when an invalid record is found in an archived WAL file. Otherwise, the server would never be able to start normal processing when there is a corrupted archived file for some reasons. So, that invalid

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao
On Wed, Mar 24, 2010 at 9:31 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Hmm, true, this changes behavior over previous releases. I tend to think that it's always an error if there's a corrupt file in the archive, though, and PANIC is appropriate. If the administrator

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao
On Wed, Mar 24, 2010 at 10:20 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks. That's easily fixable (applies over the previous patch): --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3773,7 +3773,7 @@ retry: pagelsn.xrecoff = 0;

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Simon Riggs
On Wed, 2010-03-24 at 14:31 +0200, Heikki Linnakangas wrote: Fujii Masao wrote: But in the current (v8.4 or before) behavior, recovery ends normally when an invalid record is found in an archived WAL file. Otherwise, the server would never be able to start normal processing when there is

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao
On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs si...@2ndquadrant.com wrote: PANICing won't change the situation, so it just destroys server availability. If we had 1 master and 42 slaves then this behaviour would take down almost the whole server farm at once. Very uncool. You might have reason

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes: OK. How about making the startup process emit WARNING, stop WAL replay and wait for the presence of trigger file, when an invalid record is found? Which keeps the server up for readonly queries. And if the trigger file is found, I think that the

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-23 Thread Fujii Masao
Sorry for the delay. On Fri, Mar 19, 2010 at 8:37 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a patch I've been playing with. Thanks! I'm reading the patch. The idea is that in standby mode, the server keeps trying to make progress in the recovery by: a)

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Simon Riggs
On Thu, 2010-03-18 at 23:27 +0900, Fujii Masao wrote: I agree that this is a bigger problem. Since the standby always starts walreceiver before replaying any WAL files in pg_xlog, walreceiver tries to receive the WAL files following the REDO starting point even if they have already been in

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas
Simon Riggs wrote: On Thu, 2010-03-18 at 23:27 +0900, Fujii Masao wrote: I agree that this is a bigger problem. Since the standby always starts walreceiver before replaying any WAL files in pg_xlog, walreceiver tries to receive the WAL files following the REDO starting point even if they

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Simon Riggs wrote: We might also have written half a file many times. The files in pg_xlog are suspect whereas the files in the archive are not. If we have both we should prefer the archive. Yep. Really? That will result in a

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Simon Riggs wrote: We might also have written half a file many times. The files in pg_xlog are suspect whereas the files in the archive are not. If we have both we should prefer the archive. Yep. Really?

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Alvaro Herrera
Heikki Linnakangas wrote: When recovery reaches an invalid WAL record, typically caused by a half-written WAL file, it closes the file and moves to the next source. If an error is found in a file restored from archive or in a portion just streamed from master, however, a PANIC is thrown,

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas
Alvaro Herrera wrote: Heikki Linnakangas wrote: When recovery reaches an invalid WAL record, typically caused by a half-written WAL file, it closes the file and moves to the next source. If an error is found in a file restored from archive or in a portion just streamed from master,

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-18 Thread Fujii Masao
On Wed, Mar 17, 2010 at 7:35 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Fujii Masao wrote: I found another missing feature in new file-based log shipping (i.e., standby_mode is enabled and 'cp' is used as restore_command). After the trigger file is found, the startup

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-17 Thread Heikki Linnakangas
Fujii Masao wrote: I found another missing feature in new file-based log shipping (i.e., standby_mode is enabled and 'cp' is used as restore_command). After the trigger file is found, the startup process with pg_standby tries to replay all of the WAL files in both pg_xlog and the archive.

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-17 Thread Simon Riggs
On Wed, 2010-03-17 at 12:35 +0200, Heikki Linnakangas wrote: Looking into this, I realized that we have a bigger problem... A lot of this would be easier if you do the docs first, then work through the problems. The new system is more complex, since it has two modes rather than one and also

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-18 Thread Fujii Masao
On Fri, Feb 12, 2010 at 2:29 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: So the only major feature we're missing is the ability to clean up old files. I found another missing feature in new file-based log shipping (i.e., standby_mode is enabled and 'cp' is used as

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-14 Thread Fujii Masao
On Sat, Feb 13, 2010 at 1:10 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Are you thinking of a scenario where remove_command gets stuck, and prevents bgwriter from performing restartpoints while it's stuck? Yes. If there is the archive in the remote server and the network

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Simon Riggs
On Fri, 2010-02-12 at 14:38 +0900, Fujii Masao wrote: On Thu, Feb 11, 2010 at 11:22 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Heikki Linnakangas
Simon Riggs wrote: In 8.4 it is pg_standby that was responsible for clearing down the archive, which is why I suggested using pg_standby for that again. I agree that will not work. The important thing is not pg_standby but that we have a valid mechanism for clearing down the archive. Good

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Simon Riggs
On Fri, 2010-02-12 at 12:54 +0000, Simon Riggs wrote: So I suggest that you have a new action that gets called after every checkpoint to clear down the archive. It will remove all files from the archive prior to %r. We can implement that as a sequence of unlink()s from within the server, or

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Fujii Masao
On Fri, Feb 12, 2010 at 10:10 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: So I suggest that you have a new action that gets called after every checkpoint to clear down the archive. It will remove all files from the archive prior to %r. We can implement that as a sequence

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Heikki Linnakangas
Fujii Masao wrote: On Fri, Feb 12, 2010 at 10:10 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: So I suggest that you have a new action that gets called after every checkpoint to clear down the archive. It will remove all files from the archive prior to %r. We can implement

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes: Attached patch implements pg_standby for use as an archive_cleanup_command, reusing existing code with new -a option. Happy to add the archive_cleanup_command into main server as well, if you like. Won't take long. Would it be possible to have the

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Greg Stark
So I don't like having the server doing the cleanup because it doesn't necessarily have the whole picture. It doesn't know if it is the only replica reading these log files or if the site policy is to keep them for disaster recovery purposes. I like having this as an external command though.

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: Fujii Masao wrote: As I pointed out previously, the standby might restore a partially-filled WAL file that is being archived by the primary, and cause a FATAL error. And this happened in my box when I was testing the SR.

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: Hmm, so after running restore_command, check the file size and if it's too short, treat it the same as if restore_command returned non-zero? And it will be retried on the next iteration. Works for me, though OTOH

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: Hmm, so after running restore_command, check the file size and if it's too short, treat it the same as if restore_command returned non-zero? And it

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: Hmm, so after running restore_command, check the file size and if it's too short, treat it the same as if restore_command returned

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 14:44 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: Hmm, so after running restore_command, check the file size and if

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: If you were running pg_standby as the restore_command then this error wouldn't happen. So you need to explain why running pg_standby cannot solve your problem and why we must fix it by replicating code that has previously existed elsewhere. pg_standby cannot be used with

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes: If you were running pg_standby as the restore_command then this error wouldn't happen. So you need to explain why running pg_standby cannot solve your problem and why we must fix it by replicating code that has previously existed elsewhere. Let me

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 15:28 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: If you were running pg_standby as the restore_command then this error wouldn't happen. So you need to explain why running pg_standby cannot solve your problem and why we must fix it by replicating code that has

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 14:41 +0100, Dimitri Fontaine wrote: Simon Riggs si...@2ndquadrant.com writes: If you were running pg_standby as the restore_command then this error wouldn't happen. So you need to explain why running pg_standby cannot solve your problem and why we must fix it by

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: One question then: how do we ensure that the archive does not grow too big? pg_standby cleans down the archive using %R. That function appears to not exist anymore. You can still use %R. Of course, plain 'cp' won't know what to do with it, so a script will then be required.

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk
* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 08:29]: To support a restore_command that does the sleeping itself, like pg_standby, would require a major rearchitecting of the retry logic. And I don't see why that'd be desirable anyway. It's easier for the admin to set up

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 15:55 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: One question then: how do we ensure that the archive does not grow too big? pg_standby cleans down the archive using %R. That function appears to not exist anymore. You can still use %R. Of course, plain

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: But colour me confused, I'm still not understanding why this is any different than with normal PITR recovery. So even with a plain cp in your recovery command instead of a sleep+copy (a la pg_standby, or PITR tools, or all the home-grown solutions out there), I'm not

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server which is calling it so it can decide how to act in each case. That would work too, but it doesn't seem any simpler to me. On the contrary. -- Heikki

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 16:22 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server which is calling it so it can decide how to act in each case. That would work too, but

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk
* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]: If the file is just being copied to the archive when restore_command ('cp', say) is launched, it will copy a half file. That's not a problem for PITR, because PITR will end at the end of valid WAL anyway, but returning a

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Greg Smith
Heikki Linnakangas wrote: Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server which is calling it so it can decide how to act in each case. That would work too, but it doesn't seem any simpler to

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: On Thu, 2010-02-11 at 16:22 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server which is calling it so it can decide how to act in each case. That would

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Euler Taveira de Oliveira
Simon Riggs wrote: It would mean that pg_standby would act appropriately according to the setting of standby_mode. So you wouldn't need multiple examples of use, it would all just work whatever the setting of standby_mode. Nice simple entry in the docs. +1. I like the %s idea. IMHO fixing

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]: If the file is just being copied to the archive when restore_command ('cp', say) is launched, it will copy a half file. That's not a problem for PITR, because PITR will end at the end of valid WAL

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]: Yeah, if you're careful about that, then this change isn't required. But pg_standby protects against that, so I think it'd be reasonable to have the same level of protection built-in. It's not a lot

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk
* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 12:04]: But it can be a problem - without the last WAL (or at least enough of it) the master switched and archived, you have no guarantee of being consistent again (I'm thinking specifically of recovering from a fresh

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: -1. it isn't necessary for PITR. It's a new requirement for standby_mode='on', unless we add the file size check into the backend. I think we should add the file size check to the backend instead and save admins the headache. I

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 12:04]: But it can be a problem - without the last WAL (or at least enough of it) the master switched and archived, you have no guarantee of being consistent again (I'm thinking specifically of

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 13:08 -0500, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: -1. it isn't necessary for PITR. It's a new requirement for standby_mode='on', unless we add the file size check into the backend. I think we should add the file size check to

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think 'rsync' has the same problem. There is a switch you can use to create the problem under rsync, but by default rsync copies to a temporary file name and moves the completed file to the target name. -Kevin

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Garick Hamlin
On Thu, Feb 11, 2010 at 01:22:44PM -0500, Kevin Grittner wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think 'rsync' has the same problem. There is a switch you can use to create the problem under rsync, but by default rsync copies to a temporary file name and

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 19:29 +0200, Heikki Linnakangas wrote: Aidan Van Dyk wrote: * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]: Yeah, if you're careful about that, then this change isn't required. But pg_standby protects against that, so I think it'd be

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Fujii Masao
On Thu, Feb 11, 2010 at 11:22 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server which is calling it so it can decide how to act in each

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: On Thu, 2010-02-11 at 13:08 -0500, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: -1. it isn't necessary for PITR. It's a new requirement for standby_mode='on', unless we add the file size check into the backend. I think we should add the

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Fujii Masao
On Wed, Feb 10, 2010 at 4:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Hmm, so after running restore_command, check the file size and if it's too short, treat it the same as if restore_command returned non-zero? Yes, only in standby mode case. OTOH I think that normal

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Aidan Van Dyk
* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100210 02:33]: Hmm, so after running restore_command, check the file size and if it's too short, treat it the same as if restore_command returned non-zero? And it will be retried on the next iteration. Works for me, though OTOH it

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Heikki Linnakangas
Aidan Van Dyk wrote: * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100210 02:33]: Hmm, so after running restore_command, check the file size and if it's too short, treat it the same as if restore_command returned non-zero? And it will be retried on the next iteration. Works for

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-09 Thread Fujii Masao
On Thu, Jan 28, 2010 at 12:27 AM, Heikki Linnakangas hei...@postgresql.org wrote: Log Message: --- Make standby server continuously retry restoring the next WAL segment with restore_command, if the connection to the primary server is lost. This ensures that the standby can recover

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-09 Thread Heikki Linnakangas
Fujii Masao wrote: As I pointed out previously, the standby might restore a partially-filled WAL file that is being archived by the primary, and cause a FATAL error. And this happened in my box when I was testing the SR. sby [20088] FATAL: archive file 00010087 has wrong