A range of minor enhancements, plus some stronger doc around:

- warnings about overwrites
- additional documentation of recovery.conf parameters

To be frank, it's still about 50 pages short of a full discussion of how
it works, when to use it, various scenarios, best practice, integration
with XA (or current lack of it), tuning etc... all of which we can
expect to improve over time with help from the community.

-- 
Best Regards, Simon Riggs
Index: backup.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.48
diff -d -c -r2.48 backup.sgml
*** backup.sgml	30 Sep 2004 10:30:10 -0000	2.48
--- backup.sgml	7 Nov 2004 22:55:52 -0000
***************
*** 416,422 ****
       Since we can string together an indefinitely long sequence of WAL files
       for replay, continuous backup can be had simply by continuing to archive
       the WAL files.  This is particularly valuable for large databases, where
!      making a full backup may take an unreasonable amount of time.
      </para>
     </listitem>
     <listitem>
--- 416,422 ----
       Since we can string together an indefinitely long sequence of WAL files
       for replay, continuous backup can be had simply by continuing to archive
       the WAL files.  This is particularly valuable for large databases, where
!      it may not be convenient to take a full backup frequently.
      </para>
     </listitem>
     <listitem>
***************
*** 443,449 ****
    <para>
     As with the plain filesystem-backup technique, this method can only
     support restoration of an entire database cluster, not a subset.
!    Also, it requires a lot of archival storage: the base backup is bulky,
     and a busy system will generate many megabytes of WAL traffic that
     have to be archived.  Still, it is the preferred backup technique in
     many situations where high reliability is needed.
--- 443,449 ----
    <para>
     As with the plain filesystem-backup technique, this method can only
     support restoration of an entire database cluster, not a subset.
!    Also, it requires a lot of archival storage: the base backup may be bulky,
     and a busy system will generate many megabytes of WAL traffic that
     have to be archived.  Still, it is the preferred backup technique in
     many situations where high reliability is needed.
***************
*** 496,512 ****
      The shell command to use is specified by the <xref
      linkend="guc-archive-command"> configuration parameter, which in practice
      will always be placed in the <filename>postgresql.conf</filename> file.
!     In this string,
      any <literal>%p</> is replaced by the absolute path of the file to
      archive, while any <literal>%f</> is replaced by the file name only.
      Write <literal>%%</> if you need to embed an actual <literal>%</>
      character in the command.  The simplest useful command is something
      like
  <programlisting>
! archive_command = 'cp %p /mnt/server/archivedir/%f'	
  </programlisting>
      which will copy archivable WAL segments to the directory
!     <literal>/mnt/server/archivedir</>.
     </para>
  
     <para>
--- 496,525 ----
      The shell command to use is specified by the <xref
      linkend="guc-archive-command"> configuration parameter, which in practice
      will always be placed in the <filename>postgresql.conf</filename> file.
!    </para>
! 
!    <para>
!     If you are using a backup and recovery application that has an
!     interface specifically designed to work with <productname>PostgreSQL</> 8.0,
!     then you should consult that application's documentation to
!     see how to set <varname>archive_command</> and to understand any other
!     required actions. If you are developing your own site-specific
!     archive mechanism, then read on to understand the guidelines,
!     restrictions and warnings you will need to follow.
!    </para>
! 
!    <para>
!     In the <varname>archive_command</> string,
      any <literal>%p</> is replaced by the absolute path of the file to
      archive, while any <literal>%f</> is replaced by the file name only.
      Write <literal>%%</> if you need to embed an actual <literal>%</>
      character in the command.  The simplest useful command is something
      like
  <programlisting>
! archive_command = 'cp -i %p /mnt/server/archivedir/%f'	
  </programlisting>
      which will copy archivable WAL segments to the directory
!     <literal>/mnt/server/archivedir</>. This is an example, not a 
!     recommendation, and even this may not work on all platforms.
     </para>
  
     <para>
***************
*** 522,539 ****
      It is important that the archive command return zero exit status if and
      only if it succeeded.  Upon getting a zero result,
      <productname>PostgreSQL</> will assume that the WAL segment file has been
!     successfully archived, and it may be overwritten with new data very
!     soon thereafter.  However, a nonzero status tells
      <productname>PostgreSQL</> that the file was not archived; it will try
      again periodically until it succeeds.
     </para>
  
     <para>
      Speed of the archiving command is not important, so long as it can keep up
!     with the average rate at which your server generates WAL data.  It is okay
!     if the archiving process falls a little behind (or even a lot behind, if
!     you don't mind the <literal>pg_xlog/</> directory filling up with
!     not-yet-archived segment files).
     </para>
  
     <para>
--- 535,603 ----
      It is important that the archive command return zero exit status if and
      only if it succeeded.  Upon getting a zero result,
      <productname>PostgreSQL</> will assume that the WAL segment file has been
!     successfully archived, and that file can then be recycled for reuse within
!     the <literal>pg_xlog/</> directory. However, a nonzero status tells
      <productname>PostgreSQL</> that the file was not archived; it will try
      again periodically until it succeeds.
     </para>
  
     <para>
+     Management of the archive is entirely the administrator's
+     responsibility. In particular, you should ensure that the following
+     situations are not allowed to occur:
+    </para>
+ 
+   <itemizedlist>
+    <listitem>
+     <para>The <varname>archive_command</> fails repeatedly because the archive
+      runs out of space or some step requires operator intervention. This could
+      occur if you write directly to tape without an autochanger: when the tape
+      fills, nothing further can be archived until the tape is swapped. Ensure
+      that any error condition or request to a human operator is reported
+      appropriately, so that the situation can be resolved quickly. Until it
+      is resolved, the <literal>pg_xlog/</> directory will continue to fill
+      with WAL segment files.
+     </para>
+    </listitem>
+    <listitem>
+     <para>You archive two or more servers to the same archive directory
+      or storage area. WAL segment file names from different servers can
+      collide, so always ensure that files from different servers can be
+      distinguished, for example by giving each server its own directory.
+     </para>
+    </listitem>
+    <listitem>
+     <para>You restore a base backup and then start the server without a
+      <filename>recovery.conf</> file. The server may start, but it will not
+      roll forward using the archived WAL files. If the base backup also
+      included the then-current WAL files, startup will succeed, and
+      subsequent server operations will then attempt to overwrite
+      previously archived WAL files.
+     </para>
+    </listitem>
+   </itemizedlist>
+ 
+    <para>
+     The last two situations can only cause damage if you use an
+     <varname>archive_command</> that overwrites existing files without
+     reporting an error. If you have written your own
+     <varname>archive_command</>, test it on its own to check whether an
+     attempt to overwrite an existing file returns a nonzero exit status.
+     If it does not, add a test to your <varname>archive_command</> that
+     checks whether the file already exists in the archive before the
+     copy to the archive takes place.
+    </para>
+ 
+    <para>
      Speed of the archiving command is not important, so long as it can keep up
!     with the average rate at which your server generates WAL data.  Normal
!     operation continues even if the archiving process falls a little behind.
!     If archiving falls significantly behind, this increases the amount of
!     data that would be lost in the event of a disaster. It also means that
!     the <literal>pg_xlog/</> directory will contain large numbers of
!     not-yet-archived segment files, which could eventually exhaust the
!     available disk space. You are advised to monitor the archiving process
!     to ensure that it is working as you intend.
     </para>
  
     <para>
***************
*** 812,824 ****
      get given the available WAL segments).  But if you want to recover to
      some previous point in time (say, right before the junior DBA dropped your
      main transaction table), just specify the required stopping point in
!     <literal>recovery.conf</>.  You can specify the stop point either by
!     date/time or completion of a specific transaction ID.  The stop
!     specification can be inclusive or exclusive.  As of this writing 
      only the date/time option is very usable, since there are no tools 
!     to help you identify which transaction ID to use.  Keep in mind 
!     that while transaction IDs are asigned sequentially at transaction 
!     start, transactions can complete in a different numeric order.
     </para>
     <para>
      Note that the stop point must be after the ending time of the backup
--- 876,887 ----
      get given the available WAL segments).  But if you want to recover to
      some previous point in time (say, right before the junior DBA dropped your
      main transaction table), just specify the required stopping point in
!     <literal>recovery.conf</>.  You can specify the stop point, known as the
!     recovery target, either by date/time or by completion of a specific
!     transaction ID.  The stop specification can be inclusive or exclusive.
!     As of this writing only the date/time option is very usable, since
!     there are no tools to help you identify with any accuracy which
!     transaction ID to use.
     </para>
     <para>
      Note that the stop point must be after the ending time of the backup
***************
*** 827,832 ****
--- 890,1014 ----
      recover to such a time, you must go back to your previous base backup
      and roll forward from there.)
     </para>
+ 
+     <sect3 id="recovery-config-settings">
+      <title>Recovery Settings</title>
+ 
+        <para>
+         These settings can only be made in the 
+         <filename>recovery.conf</filename> file, and apply only for the
+         duration of the recovery. They must be reset for any subsequent 
+         recovery you wish to perform. They cannot be changed once recovery 
+         has begun.
+        </para>
+ 
+      <variablelist>
+ 
+      <varlistentry id="restore-command" xreflabel="restore_command">
+       <term><varname>restore_command</varname> (<type>string</type>)</term>
+       <listitem>
+        <para>
+         The shell command to execute to restore an archived segment of
+         the WAL file series. This parameter is required.
+         Any <literal>%p</> in the string is
+         replaced by the absolute path to copy the file to, and any
+         <literal>%f</> is replaced by the file name only. Use
+         <literal>%%</> to embed an actual <literal>%</> character in the
+         command. 
+        </para>
+        <para>
+         It is important for the command to return a zero exit status only if
+         it succeeds. When recovering to the end of the logs, the command
+         <emphasis>will</> eventually be asked for a WAL file that is not
+         present in the archive; it must return nonzero when so asked.
+         Examples:
+ <programlisting>
+ restore_command = 'cp /mnt/server/archivedir/"%f" "%p"'
+ restore_command = 'copy /mnt/server/archivedir/"%f" "%p"'  # Win32
+ </programlisting>
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="recovery-target-time" xreflabel="recovery_target_time">
+       <term><varname>recovery_target_time</varname> 
+            (<type>timestamp</type>)
+       </term>
+       <listitem>
+        <para>
+         Every commit and abort is recorded in the WAL with a timestamp.
+         This parameter specifies the timestamp up to which commits and
+         aborts will be recovered. It is compared only with the timestamps
+         recorded in the WAL; no other time or timezone matters, so be
+         careful to specify the timezone the database server was in, rather
+         than the one the recovery system is in. The precision of the
+         recorded timestamps may differ by platform, though you might expect
+         it to be in milliseconds.
+         This parameter is optional; by default, recovery proceeds to the end
+         of the logs. If set, it is mutually exclusive with
+         <xref linkend="recovery-target-xid">. The precise stopping point is
+         also influenced by <xref linkend="recovery-target-inclusive">.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="recovery-target-xid" xreflabel="recovery_target_xid">
+       <term><varname>recovery_target_xid</varname> (<type>string</type>)</term>
+       <listitem>
+        <para>
+         This parameter specifies the transaction ID up to which recovery
+         will proceed. Keep in mind that while transaction IDs are assigned
+         sequentially at transaction start, transactions can complete in a
+         different numeric order; recovery applies records in the order in
+         which transactions complete.
+         This parameter is optional; by default, recovery proceeds to the end
+         of the logs. If set, it is mutually exclusive with
+         <xref linkend="recovery-target-time">. The precise stopping point is
+         also influenced by <xref linkend="recovery-target-inclusive">.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="recovery-target-inclusive" 
+                    xreflabel="recovery_target_inclusive">
+       <term><varname>recovery_target_inclusive</varname> 
+         (<type>boolean</type>)
+       </term>
+       <listitem>
+        <para>
+         Specifies whether recovery stops just after the specified recovery
+         target (inclusive, <literal>true</literal>), or just before the
+         recovery target (exclusive, <literal>false</literal>).
+         Applies to both <xref linkend="recovery-target-time">
+         and <xref linkend="recovery-target-xid">, whichever one is
+         specified for this recovery. The default recovery is to the end of
+         the logs, and the default for this parameter is
+         <literal>true</literal>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="recovery-target-timeline" 
+                    xreflabel="recovery_target_timeline">
+       <term><varname>recovery_target_timeline</varname> 
+         (<type>string</type>)
+       </term>
+       <listitem>
+        <para>
+         An optional parameter for use in some recovery situations that are
+         not normally expected to occur. The default setting is usually the
+         correct one, and you need not set this parameter when first
+         performing a recovery. It may become important to set it when
+         performing a re-recovery.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+    </variablelist>
+ 
+    </sect3>
+ 
+ 
    </sect2>
  
    <sect2 id="backup-timelines">
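The new text advises testing a site-specific archive_command on its own, to see whether an attempt to overwrite an existing archive file returns an error. Below is a minimal sketch of such a test. It assumes a POSIX shell with `mktemp -d` available. The check_overwrite function, the ARCHIVEDIR placeholder and the scratch directory layout are hypothetical and exist only for this illustration. The %p/%f expansion only approximates what the server does, and it ignores %%.

```sh
#!/bin/sh
# Hypothetical helper: check_overwrite TEMPLATE
#   TEMPLATE is an archive_command string using %p and %f; the word
#   ARCHIVEDIR stands for the archive directory (a placeholder used only
#   by this sketch). Prints "refuses" if archiving the same file name a
#   second time fails, or "overwrites" if the second run exits zero.
check_overwrite() {
    work=$(mktemp -d) || return 2
    mkdir "$work/archive" "$work/pg_xlog"
    name=000000010000000000000001          # a WAL-style segment file name
    seg="$work/pg_xlog/$name"
    echo first > "$seg"

    # Expand %p (absolute path) and %f (file name only), roughly as the
    # server would; %% handling is deliberately omitted.
    cmd=$(printf '%s\n' "$1" | sed -e "s|%p|$seg|g" -e "s|%f|$name|g" \
                                   -e "s|ARCHIVEDIR|$work/archive|g")

    # The first archiving of the segment must succeed.
    sh -c "$cmd" || { rm -rf "$work"; return 2; }

    # Same file name, new content: a safe command must exit nonzero here.
    echo second > "$seg"
    if sh -c "$cmd" 2>/dev/null; then
        result=overwrites
    else
        result=refuses
    fi
    rm -rf "$work"
    echo "$result"
}

check_overwrite 'cp %p ARCHIVEDIR/%f'                           # prints "overwrites"
check_overwrite 'test ! -f ARCHIVEDIR/%f && cp %p ARCHIVEDIR/%f' # prints "refuses"
```

As written, the sketch checks two commands. A plain cp silently overwrites the existing file. The same copy guarded by `test ! -f` exits nonzero once the file is already in the archive, so the server keeps retrying instead of recycling the segment. To see how a different command string behaves on your platform (the `cp -i` example, for instance), pass it to check_overwrite in place of these two.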