Re: [PATCHES] [BUGS] Incomplete docs for restore_command for hot standby

2008-03-28 Simon Riggs
On Mon, 2008-02-25 at 17:56 +0600, Markus Bertheau wrote:
 2008/2/22, Simon Riggs [EMAIL PROTECTED]:

  If you have some suggested changes, I'd be happy to hear them.
 
   Probably additions are better than just changes though.
 
 What about this:
 
 *** a/doc/src/sgml/backup.sgml
 --- b/doc/src/sgml/backup.sgml
 ***

...

 The FIXME of course needs replacement by someone in the know.

Doc patch edited to include all of Markus' points, tidy up some related
text and fix typos.
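The standby rules the patch below documents can be sketched as a small script:

- copy the file if it is already in the archive;
- wait only for WAL segment files;
- return nonzero at once for a missing `.backup` or `.history` file;
- stop waiting and return nonzero when failover is triggered.

This is a minimal Python illustration, not pg_standby. The script name `wait_restore.py`, its argument order, the trigger-file path and the one-second poll interval are all hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical waiting restore_command for a warm standby (sketch only).

Assumed recovery.conf usage (script name and paths are placeholders):
    restore_command = 'python3 wait_restore.py /mnt/server/archivedir %f %p /tmp/failover.trigger'
"""
import os
import shutil
import sys
import time

POLL_SECONDS = 1.0  # assumption: how often to re-check the archive


def restore(archive_dir, fname, dest_path, trigger_file, poll=POLL_SECONDS):
    """Return the exit status the restore_command should report."""
    src = os.path.join(archive_dir, fname)
    # Only WAL segment files are worth waiting for; a missing .history or
    # .backup file gets a nonzero status straight away.
    wait_for_it = not fname.endswith((".history", ".backup"))
    while True:
        if os.path.exists(src):
            # A failed copy raises, and Python then exits nonzero.
            shutil.copy(src, dest_path)
            return 0
        if not wait_for_it:
            return 1
        if os.path.exists(trigger_file):
            # Failover requested: report "file not found" so recovery ends.
            return 1
        time.sleep(poll)


def main(argv):
    if len(argv) != 4:
        print("usage: wait_restore.py ARCHIVE_DIR %f %p TRIGGER_FILE", file=sys.stderr)
        return 2
    return restore(*argv)


# Run only when given arguments, so the functions can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A real script must also avoid copying a WAL segment that the primary's archive_command has not finished writing. Checking the file size, or having archive_command copy under a temporary name and then rename, are two ways to do that. This sketch does not handle it.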

Good to apply to HEAD.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Index: doc/src/sgml/backup.sgml
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.115
diff -c -r2.115 backup.sgml
*** doc/src/sgml/backup.sgml	7 Mar 2008 01:46:41 -	2.115
--- doc/src/sgml/backup.sgml	28 Mar 2008 13:08:38 -
***
*** 577,587 
 <para>
  It is important that the archive command return zero exit status if and
  only if it succeeded.  Upon getting a zero result,
! <productname>PostgreSQL</> will assume that the WAL segment file has been
! successfully archived, and will remove or recycle it.
! However, a nonzero status tells
! <productname>PostgreSQL</> that the file was not archived; it will try
! again periodically until it succeeds.
 </para>
  
 <para>
--- 577,586 
 <para>
  It is important that the archive command return zero exit status if and
  only if it succeeded.  Upon getting a zero result,
! <productname>PostgreSQL</> will assume that the file has been
! successfully archived, and will remove or recycle it.  However, a nonzero 
! status tells <productname>PostgreSQL</> that the file was not archived; 
! it will try again periodically until it succeeds.
 </para>
  
 <para>
***
*** 1001,1011 
  
 <para>
  It is important that the command return nonzero exit status on failure.
! The command <emphasis>will</> be asked for log files that are not present
  in the archive; it must return nonzero when so asked.  This is not an
! error condition.  Be aware also that the base name of the <literal>%p</>
! path will be different from <literal>%f</>; do not expect them to be
! interchangeable.
 </para>
  
 <para>
--- 1000,1012 
  
 <para>
  It is important that the command return nonzero exit status on failure.
! The command <emphasis>will</> be asked for files that are not present
  in the archive; it must return nonzero when so asked.  This is not an
! error condition.  Not all of the requested files will be WAL segment
! files. You should also expect requests for files with a suffix of 
! <literal>.backup</> or <literal>.history</>. Also be aware that
! the base name of the <literal>%p</> path will be different from 
! <literal>%f</>; do not expect them to be interchangeable.
 </para>
  
 <para>
***
*** 1576,1594 
  
 <para>
  The magic that makes the two loosely coupled servers work together is
! simply a <varname>restore_command</> used on the standby that waits
! for the next WAL file to become available from the primary. The
! <varname>restore_command</> is specified in the
  <filename>recovery.conf</> file on the standby server. Normal recovery
  processing would request a file from the WAL archive, reporting failure
  if the file was unavailable.  For standby processing it is normal for
! the next file to be unavailable, so we must be patient and wait for
! it to appear. A waiting <varname>restore_command</> can be written as
! a custom script that loops after polling for the existence of the next
! WAL file. There must also be some way to trigger failover, which should
! interrupt the <varname>restore_command</>, break the loop and return
! a file-not-found error to the standby server. This ends recovery and
! the standby will then come up as a normal server.
 </para>
  
 <para>
--- 1577,1598 
  
 <para>
  The magic that makes the two loosely coupled servers work together is
! simply a <varname>restore_command</> used on the standby that,
! when asked for the next WAL file, waits for it to become available from
! the primary. The <varname>restore_command</> is specified in the
  <filename>recovery.conf</> file on the standby server. Normal recovery
  processing would request a file from the WAL archive, reporting failure
  if the file was unavailable.  For standby processing it is normal for
! the next WAL file to be unavailable, so we must be patient and wait for
! it to appear. For files ending in <literal>.backup</> or 
! <literal>.history</> there is no need to wait, though a nonzero return
! code must still be returned if the file is not present. A waiting 
! <varname>restore_command</> can be written as a custom script that loops
! 

Re: [PATCHES] [BUGS] Incomplete docs for restore_command for hot standby

2008-03-03 Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Markus Bertheau wrote:
 2008/2/22, Simon Riggs [EMAIL PROTECTED]:
  On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
   
Section 24.3.3.1 states about restore_command:
   
The command will be asked for file names that are not present in the
archive; it must return nonzero when so asked.
   
Section 24.4.1 further states:
   
The magic that makes the two loosely coupled servers work together is
simply a restore_command used on the standby that waits for the next
WAL file to become available from the primary.
   
It is not clear from the first paragraph whether the non-existent
file that restore_command is asked for is a not-yet-generated WAL
file or something else. If it is a not-yet-generated WAL file,
restore_command for replication has to wait for it to appear. If it
is something else, restore_command for replication has to return an
error right away (otherwise it would hang indefinitely, waiting for a
file that is never going to appear). Yet I couldn't find any hint in
the documentation as to how restore_command can tell these two cases
apart, i.e. how it should distinguish a request for a WAL file from a
request for a non-WAL file.
 
 
  The two sentences aren't mutually exclusive, especially when you
   consider they are discussing two different use cases. Why not read up on
   pg_standby anyway?
 
 I read about pg_standby, but this is not about solving a particular problem
 but about missing information in the docs.
 
Practice (http://archives.postgresql.org/sydpug/2006-10/msg1.php)
shows that this is a problem, and people resort to unproven heuristics
(checking for a 'history' substring in the requested file name).
 
 
  Old email written during beta. Read at your own peril.
 
 The email may be old, but the problem at hand is still relevant.
 
Additionally, 24.3.3 contains slightly misleading information:
   
It is important that the command return nonzero exit status on
failure. The command will be asked for log files that are not present
in the archive; it must return nonzero when so asked. This is not an
error condition.
   
This suggests that all non-existent files restore_command will be
asked for are log files. One could therefore reasonably assume that
restore_command for replication should wait for every non-existent file.
24.3.3.1 later corrects this by stating that not only log files may be
requested, but the sentence in 24.3.3 is misleading nevertheless.
 
 
  If you have some suggested changes, I'd be happy to hear them.
 
   Probably additions are better than just changes though.
 
 What about this:
 
 *** a/doc/src/sgml/backup.sgml
 --- b/doc/src/sgml/backup.sgml
 ***
 *** 1001,1011  restore_command = 'cp /mnt/server/archivedir/%f %p'
 
  <para>
   It is important that the command return nonzero exit status on failure.
 ! The command <emphasis>will</> be asked for log files that are not present
 ! in the archive; it must return nonzero when so asked.  This is not an
 ! error condition.  Be aware also that the base name of the <literal>%p</>
 ! path will be different from <literal>%f</>; do not expect them to be
 ! interchangeable.
  </para>
 
  <para>
 --- 1001,1011 
 
  <para>
   It is important that the command return nonzero exit status on failure.
 ! The command <emphasis>will</> be asked for log and other files that are
 ! not present in the archive; it must return nonzero when so asked.  This is
 ! not an error condition.  Be aware also that the base name of the
 ! <literal>%p</> path will be different from <literal>%f</>; do not expect
 ! them to be interchangeable.
  </para>
 
  <para>
 ***
 *** 1576,1594  archive_command = 'local_backup_script.sh'
 
  <para>
   The magic that makes the two loosely coupled servers work together is
 ! simply a <varname>restore_command</> used on the standby that waits
 ! for the next WAL file to become available from the primary. The
 ! <varname>restore_command</> is specified in the
   <filename>recovery.conf</> file on the standby server. Normal recovery
   processing would request a file from the WAL archive, reporting failure
   if the file was unavailable.  For standby processing it is normal for
 ! the next file to be unavailable, so we must be patient and wait for
 ! it to appear. A waiting <varname>restore_command</> can be written as
 ! a custom script that loops after polling for the existence of the next
 ! WAL file. There must also be some way to trigger failover, which should
 ! interrupt the