Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-07 Thread Simon Riggs
On Sun, 2004-11-07 at 00:16, Tom Lane wrote:
 I wrote:
  A possibly more reliable interlock would involve having the postmaster
  probe during normal startup to see if there is already an archived WAL
  segment for what it thinks is the current segment.
 
 Another and simpler way is to recommend that people use archive_command
 strings that won't overwrite an existing archive file.
 

Yes, exactly what I meant by saying I'd document the warning.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-07 Thread Simon Riggs
On Sun, 2004-11-07 at 00:54, Mark Kirkwood wrote:
 While this is nice, it will not help you if the restoration directory is 
 different from your archive directory. That is : restore_command in 
 recovery.conf fetches from somewhere other than where archive_command in 
 postgresql.conf copied.
 

...hmmm, this doesn't look like an issue to me.

If the archive is protected when it's written, then it will be safe to
use later for restoration. The path to the archive may differ, but the
archive itself *needs* to be the same archive.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-07 Thread Simon Riggs
On Sat, 2004-11-06 at 23:29, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  If a further pg_ctl mode, recover, were implemented, this would allow a
  fail safe mode for recovery.
 
  e.g.pg_ctl -D datadir recover
 
  pg_ctl could then check for the existence of a recovery.conf file and
  return an error if none were found.
 

...

 A possibly more reliable interlock would involve having the postmaster
 probe during normal startup to see if there is already an archived WAL
 segment for what it thinks is the current segment.  However there are
 several issues here: one is that if you're doing partial-log-file
 shipping, that isn't necessarily an error condition; another is that
 we don't know how to do such a probe unless more information is added
 to postgresql.conf.  We could imagine adding another shell command
 string (something like test -f ... perhaps) but if the user gets it
 wrong he may still be left with no protection.
 

Yes, checking the archive is the safe way, but we don't know how to do
that unless restore_command has been successfully read in (currently
from recovery.conf). Putting it in postgresql.conf is the wrong place,
because that will likely be wrongly set when we restore, and I'm against
editing that file as part of a recovery...once edited, you could lose
all context and thus completely screw the recovery.

All the suggested change is about is trying to find a safe way to fail
if restore_command has not been set because recovery.conf is missing for
whatever reason.

I can't get very excited about this approach, because it only protects
 those people who (a) use pg_ctl to start the postmaster (not everyone)
 and (b) carefully follow the recovery directions (which the people you
 are worried about are very bad at, by hypothesis).
 

Well, I was trying to find a least-risk approach. Touching pg_ctl code
at this stage of beta-ness seemed more reliable than touching postmaster
code. pg_ctl doesn't catch everyone.

Placing the test in postmaster is the natural place for it, yes.

This additional keyword should never be placed in any startup script, so
the commonly used interface would not change, just the one used at
recovery time.

Documentation changes are one thing - and as you point out, if they miss
one thing in the manual, they'll miss three. I'm not expecting many
people to actually use an archive created by using copy or cp.

I'm with Mark on this: you need all the help you can get at 2am when you
just got called out of bed with a phone call from your boss *requesting*
you to recover your incredibly important production system.

I'll cut the doc changes first, then produce a slim patch on postmaster.
 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-07 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Sat, 2004-11-06 at 23:29, Tom Lane wrote:
 A possibly more reliable interlock would involve having the postmaster
 probe during normal startup to see if there is already an archived WAL
 segment for what it thinks is the current segment.

 Yes, checking the archive is the safe way, but we don't know how to do
 that unless restore_command has been successfully read in (currently
 from recovery.conf). Putting it in postgresql.conf is the wrong place,

Agreed; we left it out of postgresql.conf for good reasons.  I was
thinking in terms of adding a third command string, perhaps like

test_archive_file = 'test -f /mnt/server/archive/%f'

But it's probably best just to tell people to write their
archive_commands in a non-overwrite style.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-06 Thread Simon Riggs

Joachim Wieland has diligently and sensibly pointed out a potential for
user error with the current PITR implementation. This is not a bug *per
se*, but is a design flaw that more than one person could fall into. It
is a minor issue and not that likely, since the manual describes what to
do...but I'm proposing a possible solution nonetheless, since this error
is likely to occur whilst learning the PITR functionality and might
potentially dissuade users from using the approach.

Following restoration of a base backup, archive recovery is entered by
placing a recovery.conf file in the data directory and then restarting
the server using pg_ctl start. If the recovery.conf file is misnamed,
e.g. Recovery.conf, or if it is misplaced, or simply absent, then the
server start will not enter archive recovery, yet will start normally.
The precise difference might well not be apparent. Subsequent server
activity could potentially overwrite archived log files in a poorly
managed archive. Both of those situations are likely to occur
simultaneously with inexperienced users.

The first fix to this is clearly to document the possibility and warn
the user of this possibility - i.e. describe what NOT to do when
invoking archive recovery. This will be submitted shortly.

A further fix is to implement a fail safe mode for invoking recovery.
Rather than making the recovery a normal server start, which then
searches for recovery.conf, it would be better to indicate that the
server start is expected to be a recovery and so the absence of a
recovery.conf should be regarded as an error.

If a further pg_ctl mode, recover, were implemented, this would allow a
fail safe mode for recovery.

e.g.pg_ctl -D datadir recover

pg_ctl could then check for the existence of a recovery.conf file and
return an error if none were found. This mode would not then need to be
passed through to the postmaster as an option, which could accidentally
be re-invoked later should a crash recovery occur (the main reason to
avoid adding recovery.conf options to postgresql.conf in the first
place).

This mode would do nothing more than:
- check for recovery.conf, if not found, return error
- invoke a start normally, as if mode=start had been requested

The doc for invoking recovery could then be changed to include this new
mode, and the potential for error would be removed.

This is a change requested for 8.0, to ensure that PITR doesn't fall
into disrepute by novice users shooting themselves in the foot. It is a
minor change, effecting only PITR, and only to the extent of a
documentation change and a file existence check in pg_ctl. No server
code would be changed.

Alternative suggestions are welcome. Thoughts?

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-06 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 If a further pg_ctl mode, recover, were implemented, this would allow a
 fail safe mode for recovery.

 e.g.  pg_ctl -D datadir recover

 pg_ctl could then check for the existence of a recovery.conf file and
 return an error if none were found.

I can't get very excited about this approach, because it only protects
those people who (a) use pg_ctl to start the postmaster (not everyone)
and (b) carefully follow the recovery directions (which the people you
are worried about are very bad at, by hypothesis).

A possibly more reliable interlock would involve having the postmaster
probe during normal startup to see if there is already an archived WAL
segment for what it thinks is the current segment.  However there are
several issues here: one is that if you're doing partial-log-file
shipping, that isn't necessarily an error condition; another is that
we don't know how to do such a probe unless more information is added
to postgresql.conf.  We could imagine adding another shell command
string (something like test -f ... perhaps) but if the user gets it
wrong he may still be left with no protection.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-06 Thread Mark Kirkwood
I like it - nice and simple, but targets a large (and likely) foot gun 
situation.

regards
Mark
Simon Riggs wrote:
If a further pg_ctl mode, recover, were implemented, this would allow a
fail safe mode for recovery.
e.g.pg_ctl -D datadir recover
pg_ctl could then check for the existence of a recovery.conf file and
return an error if none were found. This mode would not then need to be
passed through to the postmaster as an option, which could accidentally
be re-invoked later should a crash recovery occur (the main reason to
avoid adding recovery.conf options to postgresql.conf in the first
place).
This mode would do nothing more than:
- check for recovery.conf, if not found, return error
- invoke a start normally, as if mode=start had been requested
The doc for invoking recovery could then be changed to include this new
mode, and the potential for error would be removed.
This is a change requested for 8.0, to ensure that PITR doesn't fall
into disrepute by novice users shooting themselves in the foot. It is a
minor change, effecting only PITR, and only to the extent of a
documentation change and a file existence check in pg_ctl. No server
code would be changed.
Alternative suggestions are welcome. Thoughts?
 

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-06 Thread Tom Lane
I wrote:
 A possibly more reliable interlock would involve having the postmaster
 probe during normal startup to see if there is already an archived WAL
 segment for what it thinks is the current segment.

Another and simpler way is to recommend that people use archive_command
strings that won't overwrite an existing archive file.

For instance instead of showing the example
archive_command = 'cp %p /mnt/server/archivedir/%f'
we could show
archive_command = 'test ! -f /mnt/server/archivedir/%f  cp %p 
/mnt/server/archivedir/%f'
or on some machines
archive_command = 'cp -i %p /mnt/server/archivedir/%f /dev/null'

One tricky point is to make sure that the command will return nonzero
status if the target exists.  In some quick testing I found that
the cp -i method doesn't have portable behavior on this point.
I think that the test method is portable across Unixen, but no idea
what to do on Windows.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-06 Thread Mark Kirkwood
While this is nice, it will not help you if the restoration directory is 
different from your archive directory. That is : restore_command in 
recovery.conf fetches from somewhere other than where archive_command in 
postgresql.conf copied.

I am not sure how likely this situation is, but setting up log shipping, 
or maybe recovering from disk failure *might* mean you need to bring the 
saved archive files back from somewhere else.

regards
Mark
Tom Lane wrote:
Another and simpler way is to recommend that people use archive_command
strings that won't overwrite an existing archive file.
For instance instead of showing the example
archive_command = 'cp %p /mnt/server/archivedir/%f'
we could show
archive_command = 'test ! -f /mnt/server/archivedir/%f  cp %p 
/mnt/server/archivedir/%f'
 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Proposal for Recover mode in pg_ctl (in 8.0)

2004-11-06 Thread Mark Kirkwood
I was thinking that even mildly experienced folks could benefit from a 
helpful sanity check. Typically the need to recover a system never comes 
at a good time, and features that help prevent silly mistakes are a 
great stress saver.

As an aside, while testing recovery during pre beta, I think I probably 
forgot to put in a recovery.conf about 1 time in 10. Now I was using a 
small database cluster tar'ed up in /tmp, so no big deal, but if it had 
been a 100G beast that had to come off tape 

regards
Mark
Tom Lane wrote:
I can't get very excited about this approach, because it only protects
those people who (a) use pg_ctl to start the postmaster (not everyone)
and (b) carefully follow the recovery directions (which the people you
are worried about are very bad at, by hypothesis).
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org