This patch improves the backup & restore and bulk-loading documentation
somewhat: I added a few more cross-references, documented increasing
checkpoint_segments during large data loads, and made a few other minor
improvements.

I intend to apply this within 24 hours barring any complaints.

-Neil

Index: doc/src/sgml/backup.sgml
===================================================================
RCS file: /var/lib/cvs/pgsql-server/doc/src/sgml/backup.sgml,v
retrieving revision 2.38
diff -c -r2.38 backup.sgml
*** a/doc/src/sgml/backup.sgml	9 Mar 2004 16:57:46 -0000	2.38
--- b/doc/src/sgml/backup.sgml	19 Apr 2004 17:01:14 -0000
***************
*** 30,36 ****
     commands that, when fed back to the server, will recreate the
     database in the same state as it was at the time of the dump.
     <productname>PostgreSQL</> provides the utility program
!    <application>pg_dump</> for this purpose. The basic usage of this
     command is:
  <synopsis>
  pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable class="parameter">outfile</replaceable>
--- 30,36 ----
     commands that, when fed back to the server, will recreate the
     database in the same state as it was at the time of the dump.
     <productname>PostgreSQL</> provides the utility program
!    <xref linkend="app-pgdump"> for this purpose. The basic usage of this
     command is:
  <synopsis>
  pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable class="parameter">outfile</replaceable>
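
(As an aside, the usage described in that section boils down to something
like the following in practice; database and file names are just
placeholders, and I'm assuming the standard client programs are on the
PATH:

    pg_dump mydb > mydb.sql
    createdb mydb           # on the machine you are restoring to
    psql mydb < mydb.sql
)
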
***************
*** 126,135 ****
     </para>
  
     <para>
!     Once restored, it is wise to run <command>ANALYZE</> on each
!     database so the optimizer has useful statistics. You
!     can also run <command>vacuumdb -a -z</> to <command>ANALYZE</> all
!     databases.
     </para>
  
     <para>
--- 126,136 ----
     </para>
  
     <para>
!     Once restored, it is wise to run <xref linkend="sql-analyze"
!     endterm="sql-analyze-title"> on each database so the optimizer has
!     useful statistics. You can also run <command>vacuumdb -a -z</> to
!     <command>VACUUM ANALYZE</> all databases; this is equivalent to
!     running <command>VACUUM ANALYZE</command> in each database by hand.
     </para>
  
     <para>
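
(To spell out the vacuumdb invocation mentioned above -- after a restore,
something like the following will VACUUM ANALYZE every database in the
cluster, or just one; the database name is a placeholder:

    vacuumdb -a -z          # VACUUM ANALYZE all databases
    vacuumdb -z mydb        # or only the freshly restored database
)
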
***************
*** 153,165 ****
      </para>
     </important>
  
!    <tip>
!     <para>
!     Restore performance can be improved by increasing the
!     configuration parameter <xref
!     linkend="guc-maintenance-work-mem">.
!     </para>
!    </tip>
    </sect2>
  
    <sect2 id="backup-dump-all">
--- 154,164 ----
      </para>
     </important>
  
!    <para>
!     For advice on how to load large amounts of data into
!     <productname>PostgreSQL</productname> efficiently, refer to <xref
!     linkend="populate">.
!    </para>
    </sect2>
  
    <sect2 id="backup-dump-all">
***************
*** 167,178 ****
  
     <para>
      The above mechanism is cumbersome and inappropriate when backing
!     up an entire database cluster. For this reason the
!     <application>pg_dumpall</> program is provided.
      <application>pg_dumpall</> backs up each database in a given
!     cluster, and also preserves cluster-wide data such as
!     users and groups. The call sequence for
!     <application>pg_dumpall</> is simply
  <synopsis>
  pg_dumpall &gt; <replaceable>outfile</>
  </synopsis>
--- 166,176 ----
  
     <para>
      The above mechanism is cumbersome and inappropriate when backing
!     up an entire database cluster. For this reason the <xref
!     linkend="app-pg-dumpall"> program is provided.
      <application>pg_dumpall</> backs up each database in a given
!     cluster, and also preserves cluster-wide data such as users and
!     groups. The basic usage of this command is:
  <synopsis>
  pg_dumpall &gt; <replaceable>outfile</>
  </synopsis>
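
(For example, the resulting script can be fed back with psql; connecting
to template1 works because the script creates the other databases itself.
The file name is just an example:

    pg_dumpall > db.out
    psql -f db.out template1
)
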
***************
*** 195,201 ****
      Since <productname>PostgreSQL</productname> allows tables larger
      than the maximum file size on your system, it can be problematic
      to dump such a table to a file, since the resulting file will likely
!     be larger than the maximum size allowed by your system. As
      <application>pg_dump</> can write to the standard output, you can
      just use standard Unix tools to work around this possible problem.
     </para>
--- 193,199 ----
      Since <productname>PostgreSQL</productname> allows tables larger
      than the maximum file size on your system, it can be problematic
      to dump such a table to a file, since the resulting file will likely
!     be larger than the maximum size allowed by your system. Since
      <application>pg_dump</> can write to the standard output, you can
      just use standard Unix tools to work around this possible problem.
     </para>
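
(For instance, the dump can be compressed or split on the way out and
reassembled on the way in; the sizes and file names below are only
examples:

    pg_dump mydb | gzip > mydb.sql.gz
    gunzip -c mydb.sql.gz | psql mydb

    pg_dump mydb | split -b 1m - mydb.sql.
    cat mydb.sql.* | psql mydb
)
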
***************
*** 274,280 ****
      For reasons of backward compatibility, <application>pg_dump</>
      does not dump large objects by default.<indexterm><primary>large
      object</primary><secondary>backup</secondary></indexterm> To dump
!     large objects you must use either the custom or the TAR output
      format, and use the <option>-b</> option in
      <application>pg_dump</>. See the reference pages for details.  The
      directory <filename>contrib/pg_dumplo</> of the
--- 272,278 ----
      For reasons of backward compatibility, <application>pg_dump</>
      does not dump large objects by default.<indexterm><primary>large
      object</primary><secondary>backup</secondary></indexterm> To dump
!     large objects you must use either the custom or the tar output
      format, and use the <option>-b</> option in
      <application>pg_dump</>. See the reference pages for details.  The
      directory <filename>contrib/pg_dumplo</> of the
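
(A concrete example of a custom-format dump that includes large objects,
and its restore -- names are placeholders:

    pg_dump -Fc -b mydb > mydb.dump
    pg_restore -d mydb mydb.dump
)
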
***************
*** 315,325 ****
       <para>
        The database server <emphasis>must</> be shut down in order to
        get a usable backup. Half-way measures such as disallowing all
!       connections will not work as there is always some buffering
!       going on. Information about stopping the server can be
!       found in <xref linkend="postmaster-shutdown">.  Needless to say
!       that you also need to shut down the server before restoring the
!       data.
       </para>
      </listitem>
  
--- 313,324 ----
       <para>
        The database server <emphasis>must</> be shut down in order to
        get a usable backup. Half-way measures such as disallowing all
!       connections will <emphasis>not</emphasis> work
!       (<command>tar</command> and similar tools do not take an atomic
!       snapshot of the state of the filesystem at a point in
!       time). Information about stopping the server can be found in
!       <xref linkend="postmaster-shutdown">.  Needless to say, you also
!       need to shut down the server before restoring the data.
       </para>
      </listitem>
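
(As a rough sketch of what that means in practice -- the path is
hypothetical, and I'm assuming PGDATA is set so pg_ctl knows where the
data directory is:

    pg_ctl stop
    tar -cf /backups/pgdata.tar /usr/local/pgsql/data
    pg_ctl start
)
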
  
Index: doc/src/sgml/perform.sgml
===================================================================
RCS file: /var/lib/cvs/pgsql-server/doc/src/sgml/perform.sgml,v
retrieving revision 1.43
diff -c -r1.43 perform.sgml
*** a/doc/src/sgml/perform.sgml	25 Mar 2004 18:57:57 -0000	1.43
--- b/doc/src/sgml/perform.sgml	21 Apr 2004 17:11:22 -0000
***************
*** 28,35 ****
      plan</firstterm> for each query it is given.  Choosing the right
      plan to match the query structure and the properties of the data
      is absolutely critical for good performance.  You can use the
!     <command>EXPLAIN</command> command to see what query plan the system
!     creates for any query.
      Plan-reading is an art that deserves an extensive tutorial, which
      this is not; but here is some basic information.
     </para>
--- 28,35 ----
      plan</firstterm> for each query it is given.  Choosing the right
      plan to match the query structure and the properties of the data
      is absolutely critical for good performance.  You can use the
!     <xref linkend="sql-explain" endterm="sql-explain-title"> command
!     to see what query plan the system creates for any query.
      Plan-reading is an art that deserves an extensive tutorial, which
      this is not; but here is some basic information.
     </para>
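
(For readers who haven't used it, a trivial invocation looks something
like this; the table name is made up, and the interesting part is the
plan that gets printed:

    psql -c 'EXPLAIN SELECT * FROM mytable WHERE id = 42;' mydb
)
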
***************
*** 638,667 ****
     </indexterm>
  
     <para>
!     Turn off autocommit and just do one commit at
!     the end.  (In plain SQL, this means issuing <command>BEGIN</command>
!     at the start and <command>COMMIT</command> at the end.  Some client
!     libraries may do this behind your back, in which case you need to
!     make sure the library does it when you want it done.)
!     If you allow each insertion to be committed separately,
!     <productname>PostgreSQL</productname> is doing a lot of work for each
!     row that is added.
!     An additional benefit of doing all insertions in one transaction
!     is that if the insertion of one row were to fail then the
!     insertion of all rows inserted up to that point would be rolled
!     back, so you won't be stuck with partially loaded data.
     </para>
    </sect2>
  
    <sect2 id="populate-copy-from">
!    <title>Use <command>COPY FROM</command></title>
  
     <para>
!     Use <command>COPY FROM STDIN</command> to load all the rows in one
!     command, instead of using a series of <command>INSERT</command>
!     commands.  This reduces parsing, planning, etc.  overhead a great
!     deal. If you do this then it is not necessary to turn off
!     autocommit, since it is only one command anyway.
     </para>
    </sect2>
  
--- 638,688 ----
     </indexterm>
  
     <para>
!     Turn off autocommit and just do one commit at the end.  (In plain
!     SQL, this means issuing <command>BEGIN</command> at the start and
!     <command>COMMIT</command> at the end.  Some client libraries may
!     do this behind your back, in which case you need to make sure the
!     library does it when you want it done.)  If you allow each
!     insertion to be committed separately,
!     <productname>PostgreSQL</productname> is doing a lot of work for
!     each row that is added.  An additional benefit of doing all
!     insertions in one transaction is that if the insertion of one row
!     were to fail then the insertion of all rows inserted up to that
!     point would be rolled back, so you won't be stuck with partially
!     loaded data.
!    </para>
! 
!    <para>
!     If you are issuing a large sequence of <command>INSERT</command>
!     commands to bulk load some data, also consider using <xref
!     linkend="sql-prepare" endterm="sql-prepare-title"> to create a
!     prepared <command>INSERT</command> statement. Since you are
!     executing the same command multiple times, it is more efficient to
!     prepare the command once and then use <command>EXECUTE</command>
!     as many times as required.
     </para>
    </sect2>
  
    <sect2 id="populate-copy-from">
!    <title>Use <command>COPY</command></title>
! 
!    <para>
!     Use <xref linkend="sql-copy" endterm="sql-copy-title"> to load
!     all the rows in one command, instead of using a series of
!     <command>INSERT</command> commands.  The <command>COPY</command>
!     command is optimized for loading large numbers of rows; it is less
!     flexible than <command>INSERT</command>, but incurs significantly
!     less overhead for large data loads. Since <command>COPY</command>
!     is a single command, there is no need to disable autocommit if you
!     use this method to populate a table.
!    </para>
  
     <para>
!     Note that loading a large number of rows using
!     <command>COPY</command> is almost always faster than using
!     <command>INSERT</command>, even if multiple
!     <command>INSERT</command> commands are batched into a single
!     transaction.
     </para>
    </sect2>
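
(To make the suggestions above concrete, here is a rough sketch of both
approaches; table, column, and database names are made up, and the COPY
data uses a comma delimiter only so the example is easy to read:

    psql mydb <<'EOF'
    -- batched INSERTs: one transaction, one prepared statement
    BEGIN;
    PREPARE bulk_ins (integer, text) AS
        INSERT INTO mytable (id, name) VALUES ($1, $2);
    EXECUTE bulk_ins(1, 'one');
    EXECUTE bulk_ins(2, 'two');
    COMMIT;

    -- or, usually faster: a single COPY
    COPY mytable (id, name) FROM STDIN WITH DELIMITER ',';
    3,three
    4,four
    \.
    EOF
)
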
  
***************
*** 678,688 ****
  
     <para>
      If you are augmenting an existing table, you can drop the index,
!     load the table, then recreate the index. Of
!     course, the database performance for other users may be adversely 
!     affected during the time that the index is missing.  One should also
!     think twice before dropping unique indexes, since the error checking
!     afforded by the unique constraint will be lost while the index is missing.
     </para>
    </sect2>
  
--- 699,710 ----
  
     <para>
      If you are augmenting an existing table, you can drop the index,
!     load the table, and then recreate the index. Of course, the
!     database performance for other users may be adversely affected
!     during the time that the index is missing.  One should also think
!     twice before dropping unique indexes, since the error checking
!     afforded by the unique constraint will be lost while the index is
!     missing.
     </para>
    </sect2>
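
(e.g., roughly -- index, table, and column names are invented, and the
middle step stands in for whatever bulk-load method is used:

    psql mydb <<'EOF'
    DROP INDEX mytable_id_idx;
    -- ... bulk load the data here, e.g. with COPY ...
    CREATE INDEX mytable_id_idx ON mytable (id);
    EOF
)
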
  
***************
*** 701,716 ****
     </para>
    </sect2>
  
    <sect2 id="populate-analyze">
     <title>Run <command>ANALYZE</command> Afterwards</title>
  
     <para>
!     It's a good idea to run <command>ANALYZE</command> or <command>VACUUM
!     ANALYZE</command> anytime you've added or updated a lot of data,
!     including just after initially populating a table.  This ensures that
!     the planner has up-to-date statistics about the table.  With no statistics
!     or obsolete statistics, the planner may make poor choices of query plans,
!     leading to bad performance on queries that use your table.
     </para>
    </sect2>
    </sect1>
--- 723,761 ----
     </para>
    </sect2>
  
+   <sect2 id="populate-checkpoint-segments">
+    <title>Increase <varname>checkpoint_segments</varname></title>
+ 
+    <para>
+     Temporarily increasing the <xref
+     linkend="guc-checkpoint-segments"> configuration variable can also
+     make large data loads faster.  This is because loading a large
+     amount of data into <productname>PostgreSQL</productname> can
+     cause checkpoints to occur more often than the normal checkpoint
+     frequency (specified by the <varname>checkpoint_timeout</varname>
+     configuration variable). Whenever a checkpoint occurs, all dirty
+     pages must be flushed to disk. By increasing
+     <varname>checkpoint_segments</varname> temporarily during bulk
+     data loads, the number of checkpoints that are required can be
+     reduced.
+    </para>
+   </sect2>
+ 
    <sect2 id="populate-analyze">
     <title>Run <command>ANALYZE</command> Afterwards</title>
  
     <para>
!     Whenever you have significantly altered the distribution of data
!     within a table, running <xref linkend="sql-analyze"
!     endterm="sql-analyze-title"> is strongly recommended. This
!     includes when bulk loading large amounts of data into
!     <productname>PostgreSQL</productname>.  Running
!     <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
!     ensures that the planner has up-to-date statistics about the
!     table.  With no statistics or obsolete statistics, the planner may
!     make poor decisions during query planning, leading to poor
!     performance on queries that use tables whose statistics are missing
!     or out of date.
     </para>
    </sect2>
    </sect1>
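
(To make the last two additions concrete: before a big load one might
temporarily raise checkpoint_segments and tell the server to re-read its
configuration, then ANALYZE once the load is done. The value and names
below are only examples, not recommendations; if memory serves a reload
is enough to pick up the new setting, and a restart certainly is:

    # in postgresql.conf, for the duration of the bulk load:
    #   checkpoint_segments = 30

    pg_ctl reload                 # or restart the server
    # ... perform the bulk load ...
    vacuumdb -z mydb              # VACUUM ANALYZE the loaded database
)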