Hey,
I originally sent this to -hackers but Bruce kindly pointed me this way.
Attached is my rework of the backup.sgml. I have only worked on the SQL
DUMP and related sections. I left archiving etc... alone. I also didn't
do much with pg_dumpall. I focused on adding up-to-date information and
removing conversational tone.
The rest is pretty obvious. I am interested in feedback before I continue.
Yes, I added -C to the basic pg_dump command. Without it, a newbie will
get an error on restore. I also removed all redirection etc... I know it
is the unix/linux way but flags are what the normies are going to
understand.
Thanks!
JD
--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index def43a2..cdc288b 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -6,19 +6,17 @@
<indexterm zone="backup"><primary>backup</></>
<para>
- As with everything that contains valuable data, <productname>PostgreSQL</>
- databases should be backed up regularly. While the procedure is
- essentially simple, it is important to have a clear understanding of
+ <productname>PostgreSQL</> databases should be backed up regularly.
+ The procedure is simple, but it is important to have a clear understanding of
the underlying techniques and assumptions.
</para>
<para>
- There are three fundamentally different approaches to backing up
+ There are three different approaches to backing up
 <productname>PostgreSQL</> data:
 <itemizedlist>
 <listitem><para><acronym>SQL</> dump</para></listitem>
 <listitem><para>File system level backup</para></listitem>
- <listitem><para>Continuous archiving</para></listitem>
+ <listitem><para>Continuous archiving, also known as Point in Time Recovery (<acronym>PITR</>)</para></listitem>
</itemizedlist>
Each has its own strengths and weaknesses; each is discussed in turn
in the following sections.
@@ -28,141 +26,176 @@
<title><acronym>SQL</> Dump</title>
<para>
- The idea behind this dump method is to generate a file with SQL
- commands that, when fed back to the server, will recreate the
- database in the same state as it was at the time of the dump.
- <productname>PostgreSQL</> provides the utility program
- <xref linkend="app-pgdump"> for this purpose. The basic usage of this
- command is:
+ <productname>PostgreSQL</> provides the program <xref linkend="app-pgdump"> for
+ generating a backup file with SQL commands that, when fed back to the server,
+ will recreate the database in the same state as it was at the time of the dump.
+ The basic usage of <xref linkend="app-pgdump"> is:
<synopsis>
-pg_dump <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">outfile</replaceable>
+pg_dump <option>-C</option> <option>-F p</option> <option>-f</option> <replaceable class="parameter">outfile</replaceable> <replaceable class="parameter">dbname</replaceable>
</synopsis>
- As you see, <application>pg_dump</> writes its result to the
- standard output. We will see below how this can be useful.
- While the above command creates a text file, <application>pg_dump</>
- can create files in other formats that allow for parallism and more
- fine-grained control of object restoration.
+
+ The <option>-C</option> option ensures that the dump will
+ contain the requisite <xref linkend="sql-createdatabase"> command. The
+ <option>-F p</option> option selects the plain
+ text format, and <option>-f</option> allows you
+ to specify the name of the file the dump will be written to. It is also possible
+ for <application>pg_dump</> to create files in other formats that allow for parallelism
+ and fine-grained control of object backup or restoration. For more details on all options
+ available to <application>pg_dump</> please refer to the <xref linkend="app-pgdump"> reference page.
</para>
<para>
- <application>pg_dump</> is a regular <productname>PostgreSQL</>
- client application (albeit a particularly clever one). This means
- that you can perform this backup procedure from any remote host that has
- access to the database. But remember that <application>pg_dump</>
- does not operate with special permissions. In particular, it must
- have read access to all tables that you want to back up, so in order
- to back up the entire database you almost always have to run it as a
- database superuser. (If you do not have sufficient privileges to back up
- the entire database, you can still back up portions of the database to which
- you do have access using options such as
- <option>-n <replaceable>schema</replaceable></option>
- or <option>-t <replaceable>table</replaceable></option>.)
+ The <application>pg_dump</> application requires read access to all objects within the
+ database that it will be operating with. This generally requires database
+ superuser access. However, any database user can use
+ <application>pg_dump</> to back up the objects that they own, regardless of superuser access.
+ This can be achieved using options such as <option>-n</option> <replaceable class="parameter">schema</replaceable>
+ or <option>-t</option> <replaceable class="parameter">table</replaceable>.
</para>
+
<para>
- To specify which database server <application>pg_dump</> should
- contact, use the command line options <option>-h
- <replaceable>host</></> and <option>-p <replaceable>port</></>. The
- default host is the local host or whatever your
- <envar>PGHOST</envar> environment variable specifies. Similarly,
- the default port is indicated by the <envar>PGPORT</envar>
- environment variable or, failing that, by the compiled-in default.
- (Conveniently, the server will normally have the same compiled-in
- default.)
+ The primary advantage of using <application>pg_dump</> over the other backup
+ methods described is that <application>pg_dump</> output is architecture
+ independent. A backup made with <application>pg_dump</> can generally
+ be moved between operating systems and different architectures (32-bit, 64-bit,
+ Sparc, Intel), whereas file-level backups and continuous archiving are both
+ server-version-specific.
+ </para>
+
+ <para>
+ The text files created by <application>pg_dump</> are internally consistent,
+ meaning that the dump represents a snapshot of the database at the time
+ <application>pg_dump</> began running. <application>pg_dump</> does not
+ block other operations on the database while it is working.
+ (Exceptions are those operations that need to operate with an
+ exclusive lock, such as most forms of <command>ALTER TABLE</command>.)
</para>
+<note>
<para>
Like any other <productname>PostgreSQL</> client application,
<application>pg_dump</> will by default connect with the database
user name that is equal to the current operating system user name. To override
- this, either specify the <option>-U</option> option or set the
- environment variable <envar>PGUSER</envar>. Remember that
- <application>pg_dump</> connections are subject to the normal
- client authentication mechanisms (which are described in <xref
- linkend="client-authentication">).
+ this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>.
</para>
+</note>
+<sect2 id="advanced-pgdump">
+ <title>Advanced pg_dump</title>
+
<para>
- An important advantage of <application>pg_dump</> over the other backup
- methods described later is that <application>pg_dump</>'s output can
- generally be re-loaded into newer versions of <productname>PostgreSQL</>,
- whereas file-level backups and continuous archiving are both extremely
- server-version-specific. <application>pg_dump</> is also the only method
- that will work when transferring a database to a different machine
- architecture, such as going from a 32-bit to a 64-bit server.
+ The <application>pg_dump</application> application provides other output formats. The most notable
+ are the <literal>custom</literal> format (<option>-F c</option>) and the
+ <literal>directory</literal> format (<option>-F d</option>). The
+ <literal>custom</literal> format
+ is an excellent option for smaller databases when you need fine-grained control of the objects
+ you choose to restore. The <literal>directory</literal> format
+ allows for parallel, connection-based backups.
+ If you are performing a backup with many objects using <application>pg_dump</application>, the
+ <literal>directory</literal> format will be the most efficient. This format also allows for fine-grained
+ control of the objects you choose to restore.
</para>
<para>
- Dumps created by <application>pg_dump</> are internally consistent,
- meaning, the dump represents a snapshot of the database at the time
- <application>pg_dump</> began running. <application>pg_dump</> does not
- block other operations on the database while it is working.
- (Exceptions are those operations that need to operate with an
- exclusive lock, such as most forms of <command>ALTER TABLE</command>.)
+ If <productname>PostgreSQL</productname> was built on a system with the
+ <application>zlib</> compression library installed, the custom dump
+ format will compress data as it writes it to the output file. This will
+ produce dump file sizes similar to using <command>gzip</command>, but it
+ has the added advantage that tables can be restored selectively.
+ </para>
+
+ <example><title>Backup a single table</title>
+ <para>
+<synopsis>
+pg_dump <option>-U</option> <replaceable class="parameter">user</replaceable> <option>-h</option> <replaceable class="parameter">host1</replaceable> <option>-F c</option> <option>-f</option> <replaceable class="parameter">outfile</replaceable> <option>-t</option> <replaceable class="parameter">table1</replaceable> <replaceable class="parameter">dbname</replaceable>
+</synopsis>
</para>
+ </example>
+ <example><title>Using wildcards with table list</title>
+ <para>
+<synopsis>
+pg_dump <option>-U</option> <replaceable class="parameter">user</replaceable> <option>-h</option> <replaceable class="parameter">host1</replaceable> <option>-F c</option> <option>-f</option> <replaceable class="parameter">outfile</replaceable> <option>-t</option> <replaceable class="parameter">table*</replaceable> <replaceable class="parameter">dbname</replaceable>
+</synopsis>
+ </para>
+ </example>
+
+ <example><title>Using parallelism and a wildcard table list</title>
+ <para>
+<synopsis>
+pg_dump <option>-U</option> <replaceable class="parameter">user</replaceable> <option>-h</option> <replaceable class="parameter">host1</replaceable> <option>-F d</option> <option>-f</option> <replaceable class="parameter">outdir</replaceable> <option>-t</option> <replaceable class="parameter">table*</replaceable> <option>-j</option> <replaceable class="parameter">8</replaceable> <replaceable class="parameter">dbname</replaceable>
+</synopsis>
+ </para>
+ </example>
+
+ <note>
+ <para>
+ The use of the <literal>custom</literal> or <literal>directory</literal> <application>pg_dump</application> formats
+ requires the use of <xref linkend="app-pgrestore"> and will not work with <xref linkend="app-psql">. There is
+ more information on using <application>pg_restore</application> in <xref linkend="advanced-restore-pgdump">.
+ </para>
+ </note>
+ </sect2>
+
<sect2 id="backup-dump-restore">
<title>Restoring the Dump</title>
<para>
- Text files created by <application>pg_dump</> are intended to
- be read in by the <application>psql</application> program. The
- general command form to restore a dump is
+ The <application>psql</application> application is the default client that ships with
+ <productname>PostgreSQL</productname>. It is also the default application used to restore
+ text-based dumps created by the <application>pg_dump</application> application. For
+ detailed information on <application>psql</application> please see <xref linkend="app-psql">.
+ For the purposes of restoring a dump, the basic usage is:
+ </para>
+
<synopsis>
-psql <replaceable class="parameter">dbname</replaceable> < <replaceable class="parameter">infile</replaceable>
+psql <option>-f</option> <replaceable class="parameter">infile</replaceable> <option>-d</option> <replaceable class="parameter">dbname</replaceable>
</synopsis>
- where <replaceable class="parameter">infile</replaceable> is the
- file output by the <application>pg_dump</> command. The database <replaceable
- class="parameter">dbname</replaceable> will not be created by this
- command, so you must create it yourself from <literal>template0</>
- before executing <application>psql</> (e.g., with
- <literal>createdb -T template0 <replaceable
- class="parameter">dbname</></literal>). <application>psql</>
- supports options similar to <application>pg_dump</> for specifying
- the database server to connect to and the user name to use. See
- the <xref linkend="app-psql"> reference page for more information.
- Non-text file dumps are restored using the <xref
- linkend="app-pgrestore"> utility.
- </para>
- <para>
- Before restoring an SQL dump, all the users who own objects or were
- granted permissions on objects in the dumped database must already
- exist. If they do not, the restore will fail to recreate the
- objects with the original ownership and/or permissions.
- (Sometimes this is what you want, but usually it is not.)
- </para>
-
- <para>
- By default, the <application>psql</> script will continue to
- execute after an SQL error is encountered. You might wish to run
- <application>psql</application> with
- the <literal>ON_ERROR_STOP</> variable set to alter that
- behavior and have <application>psql</application> exit with an
- exit status of 3 if an SQL error occurs:
-<programlisting>
-psql --set ON_ERROR_STOP=on dbname < infile
-</programlisting>
- Either way, you will only have a partially restored database.
- Alternatively, you can specify that the whole dump should be
- restored as a single transaction, so the restore is either fully
- completed or fully rolled back. This mode can be specified by
- passing the <option>-1</> or <option>--single-transaction</>
- command-line options to <application>psql</>. When using this
- mode, be aware that even a minor error can rollback a
- restore that has already run for many hours. However, that might
- still be preferable to manually cleaning up a complex database
- after a partially restored dump.
- </para>
-
- <para>
- The ability of <application>pg_dump</> and <application>psql</> to
- write to or read from pipes makes it possible to dump a database
- directly from one server to another, for example:
-<programlisting>
-pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>host2</> <replaceable>dbname</>
-</programlisting>
+ <note>
+ <para>
+ If you omitted <option>-C</option> when
+ executing <application>pg_dump</application>, the <command>CREATE DATABASE</command>
+ command will not be in the dump file. You will need to create the database
+ yourself from <literal>template0</> before executing the restore
+ (e.g., with
+ <application>createdb</application> <option>-T</option> <literal>template0</literal> <replaceable class="parameter">dbname</replaceable>).
+ </para>
+ </note>
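The note above amounts to a two-step restore whenever the dump was taken without -C. A minimal sketch, assuming a running local server with sufficient privileges; the database name "mydb" and dump file "mydb.sql" are hypothetical placeholders, not from the patch:

```shell
# Sketch only: recreate an empty database from template0, then
# replay the plain-text dump into it. Names are placeholders.
createdb -T template0 mydb
psql -f mydb.sql -d mydb
```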
+
+ <warning>
+ <para>
+ <application>pg_dump</application> does not back up users, roles and
+ other global objects. To properly back up global objects you must use
+ <xref linkend="app-pg-dumpall"> with the <option>-g</option> parameter. If you do not restore the globals before
+ restoring the text-based dump, the database will implicitly assign ownership of all objects to the
+ owner passed by <option>-U</option>
+ <replaceable class="parameter">username</replaceable>. If <option>-U</option>
+ is not passed then the operating system user executing <application>psql</application>
+ will be used.
</para>
+ </warning>
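The ordering described in the warning can be sketched as follows; filenames are hypothetical, and superuser access plus a running server are assumed:

```shell
# Sketch only: capture roles and tablespaces separately, restore them
# first, then restore the per-database dump so ownership is preserved.
# "globals.sql", "mydb.sql" and "mydb" are placeholder names.
pg_dumpall -g > globals.sql       # global objects only
psql -f globals.sql -d postgres   # restore globals before the database dump
psql -f mydb.sql -d mydb          # objects now restore with original owners
```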
<important>
<para>
@@ -177,7 +210,7 @@ pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>h
</important>
<para>
- After restoring a backup, it is wise to run <xref
+ After restoring a backup, one should execute <xref
linkend="sql-analyze"> on each
database so the query optimizer has useful statistics;
see <xref linkend="vacuum-for-statistics">
@@ -187,7 +220,60 @@ pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>h
linkend="populate">.
</para>
</sect2>
-
+
+ <sect2 id="advanced-restore-pgdump">
+ <title>Advanced restore</title>
+ <example><title>Using pipes to restore to new server</title>
+ <para>
+<synopsis>
+pg_dump <option>-h</option> <replaceable class="parameter">host1</replaceable> <option>-d</option> <replaceable class="parameter">dbname</replaceable> | psql <option>-h</option> <replaceable class="parameter">host2</replaceable> <option>-d</option> <replaceable class="parameter">dbname</replaceable>
+</synopsis>
+ </para>
+ </example>
+
+ <para>
+ If one is using the <literal>custom</literal>, <literal>directory</literal> or <literal>tar</literal> formats, the restore command is <xref linkend="app-pgrestore">.
+ The <application>pg_restore</application> program has many benefits over the use of <application>psql</application>, including fine-grained object restore and parallelism.
+ </para>
+ <example><title>Extracting a text dump from a custom format backup</title>
+ <para> The following will extract the backup to the standard output. The use
+ of <option>-F</option> is optional as
+ <application>pg_restore</application> should be able to detect the format.
+ </para>
+<synopsis>
+pg_restore <option>-F c</option> <replaceable class="parameter">infile</replaceable>
+</synopsis>
+ </example>
+
+ <example><title>Restoring a single table</title>
+<synopsis>
+pg_restore <option>-U</option> <replaceable class="parameter">username</replaceable> <option>-h</option> <replaceable class="parameter">host1</replaceable> <option>-d</option> <replaceable class="parameter">dbname</replaceable> <option>-t</option> <replaceable class="parameter">table</replaceable> <replaceable class="parameter">infile</replaceable>
+</synopsis>
+ </example>
+
+ <example><title>Using parallelism to restore databases</title>
+ <para>The use of parallelism will normally allow databases to restore much
+ faster than a single connection based restore. The restore will only execute
+ as quickly as it can restore your largest table, but for databases with many
+ objects it is the fastest <application>pg_dump</application> based restore.
+ </para>
+<synopsis>
+pg_restore <option>-U</option> <replaceable class="parameter">username</replaceable> <option>-h</option> <replaceable class="parameter">host1</replaceable> <option>-d</option> <replaceable class="parameter">dbname</replaceable> <option>-j</option> <replaceable class="parameter">8</replaceable> <replaceable class="parameter">infile</replaceable>
+</synopsis>
+ </example>
+ </sect2>
+
<sect2 id="backup-dump-all">
<title>Using <application>pg_dumpall</></title>
@@ -207,13 +293,9 @@ pg_dumpall > <replaceable>outfile</>
<synopsis>
psql -f <replaceable class="parameter">infile</replaceable> postgres
</synopsis>
- (Actually, you can specify any existing database name to start from,
- but if you are loading into an empty cluster then <literal>postgres</>
- should usually be used.) It is always necessary to have
- database superuser access when restoring a <application>pg_dumpall</>
- dump, as that is required to restore the role and tablespace information.
- If you use tablespaces, make sure that the tablespace paths in the
- dump are appropriate for the new installation.
+ It is necessary to have database superuser access when restoring a
+ <application>pg_dumpall</> dump. The superuser access is required
+ to restore the role and tablespace information.
</para>
<para>
@@ -230,231 +312,47 @@ psql -f <replaceable class="parameter">infile</replaceable> postgres
This is necessary to fully backup the cluster if running the
<application>pg_dump</> command on individual databases.
</para>
- </sect2>
+
+ <note><para>
+ If you use tablespaces, make sure that the tablespace paths in the
+ dump are appropriate for the new installation.
+ </para>
+ </note>
+</sect2>
<sect2 id="backup-dump-large">
<title>Handling Large Databases</title>
-
- <para>
- Some operating systems have maximum file size limits that cause
- problems when creating large <application>pg_dump</> output files.
- Fortunately, <application>pg_dump</> can write to the standard
- output, so you can use standard Unix tools to work around this
- potential problem. There are several possible methods:
- </para>
-
- <formalpara>
- <title>Use compressed dumps.</title>
- <para>
- You can use your favorite compression program, for example
- <application>gzip</application>:
-
-<programlisting>
-pg_dump <replaceable class="parameter">dbname</replaceable> | gzip > <replaceable class="parameter">filename</replaceable>.gz
-</programlisting>
-
- Reload with:
-
-<programlisting>
-gunzip -c <replaceable class="parameter">filename</replaceable>.gz | psql <replaceable class="parameter">dbname</replaceable>
-</programlisting>
-
- or:
-
-<programlisting>
-cat <replaceable class="parameter">filename</replaceable>.gz | gunzip | psql <replaceable class="parameter">dbname</replaceable>
-</programlisting>
- </para>
- </formalpara>
-
- <formalpara>
- <title>Use <command>split</>.</title>
- <para>
- The <command>split</command> command
- allows you to split the output into smaller files that are
- acceptable in size to the underlying file system. For example, to
- make chunks of 1 megabyte:
-
-<programlisting>
-pg_dump <replaceable class="parameter">dbname</replaceable> | split -b 1m - <replaceable class="parameter">filename</replaceable>
-</programlisting>
-
- Reload with:
-
-<programlisting>
-cat <replaceable class="parameter">filename</replaceable>* | psql <replaceable class="parameter">dbname</replaceable>
-</programlisting>
- </para>
- </formalpara>
-
- <formalpara>
- <title>Use <application>pg_dump</>'s custom dump format.</title>
- <para>
- If <productname>PostgreSQL</productname> was built on a system with the
- <application>zlib</> compression library installed, the custom dump
- format will compress data as it writes it to the output file. This will
- produce dump file sizes similar to using <command>gzip</command>, but it
- has the added advantage that tables can be restored selectively. The
- following command dumps a database using the custom dump format:
-
-<programlisting>
-pg_dump -Fc <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">filename</replaceable>
-</programlisting>
-
- A custom-format dump is not a script for <application>psql</>, but
- instead must be restored with <application>pg_restore</>, for example:
-
-<programlisting>
-pg_restore -d <replaceable class="parameter">dbname</replaceable> <replaceable class="parameter">filename</replaceable>
-</programlisting>
-
- See the <xref linkend="app-pgdump"> and <xref
- linkend="app-pgrestore"> reference pages for details.
- </para>
- </formalpara>
-
<para>
- For very large databases, you might need to combine <command>split</>
- with one of the other two approaches.
+ Backing up a normal-sized database is relatively simple.
+ Backing up a large database (&gt;500GB) can be challenging.
+ Fortunately, <productname>PostgreSQL</productname> is very flexible
+ in its ability to provide a reliable backup. Here is a list of things
+ you might want to consider when backing up a large database.
</para>
-
- <formalpara>
- <title>Use <application>pg_dump</>'s parallel dump feature.</title>
- <para>
- To speed up the dump of a large database, you can use
- <application>pg_dump</application>'s parallel mode. This will dump
- multiple tables at the same time. You can control the degree of
- parallelism with the <command>-j</command> parameter. Parallel dumps
- are only supported for the "directory" archive format.
-
-<programlisting>
-pg_dump -j <replaceable class="parameter">num</replaceable> -F d -f <replaceable class="parameter">out.dir</replaceable> <replaceable class="parameter">dbname</replaceable>
-</programlisting>
-
- You can use <command>pg_restore -j</command> to restore a dump in parallel.
- This will work for any archive of either the "custom" or the "directory"
- archive mode, whether or not it has been created with <command>pg_dump -j</command>.
- </para>
- </formalpara>
- </sect2>
- </sect1>
-
- <sect1 id="backup-file">
- <title>File System Level Backup</title>
-
- <para>
- An alternative backup strategy is to directly copy the files that
- <productname>PostgreSQL</> uses to store the data in the database;
- <xref linkend="creating-cluster"> explains where these files
- are located. You can use whatever method you prefer
- for doing file system backups; for example:
-
-<programlisting>
-tar -cf backup.tar /usr/local/pgsql/data
-</programlisting>
- </para>
-
- <para>
- There are two restrictions, however, which make this method
- impractical, or at least inferior to the <application>pg_dump</>
- method:
-
<orderedlist>
- <listitem>
- <para>
- The database server <emphasis>must</> be shut down in order to
- get a usable backup. Half-way measures such as disallowing all
- connections will <emphasis>not</emphasis> work
- (in part because <command>tar</command> and similar tools do not take
- an atomic snapshot of the state of the file system,
- but also because of internal buffering within the server).
- Information about stopping the server can be found in
- <xref linkend="server-shutdown">. Needless to say, you
- also need to shut down the server before restoring the data.
- </para>
- </listitem>
-
- <listitem>
- <para>
- If you have dug into the details of the file system layout of the
- database, you might be tempted to try to back up or restore only certain
- individual tables or databases from their respective files or
- directories. This will <emphasis>not</> work because the
- information contained in these files is not usable without
- the commit log files,
- <filename>pg_clog/*</filename>, which contain the commit status of
- all transactions. A table file is only usable with this
- information. Of course it is also impossible to restore only a
- table and the associated <filename>pg_clog</filename> data
- because that would render all other tables in the database
- cluster useless. So file system backups only work for complete
- backup and restoration of an entire database cluster.
- </para>
- </listitem>
+ <listitem><para>Use the <literal>directory</literal>
+ format and the <option>-j</option> <replaceable class="parameter">NUM</replaceable>
+ option. This will ensure the quickest and most flexible <application>pg_dump</> style backup.
+ </para></listitem>
+ <listitem><para>Use <literal>continuous archiving</literal> as described in
+ <xref linkend="continuous-archiving">. You can then back up the replica without putting
+ load on the master.
+ </para></listitem>
+ <listitem><para>Use <application>pg_basebackup</application> as described in <xref
+ linkend="backup-base-backup">.</para></listitem>
</orderedlist>
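The first recommendation above can be sketched as a dump/restore pair; "bigdb", "bigdb.dir" and the job count are placeholders, and a running server with sufficient privileges is assumed:

```shell
# Sketch only: directory-format dump with 8 parallel jobs, followed by
# a parallel restore. Tune -j to the cores and I/O capacity available.
pg_dump -F d -j 8 -f bigdb.dir bigdb
pg_restore -j 8 -d bigdb bigdb.dir
```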
- </para>
-
- <para>
- An alternative file-system backup approach is to make a
- <quote>consistent snapshot</quote> of the data directory, if the
- file system supports that functionality (and you are willing to
- trust that it is implemented correctly). The typical procedure is
- to make a <quote>frozen snapshot</> of the volume containing the
- database, then copy the whole data directory (not just parts, see
- above) from the snapshot to a backup device, then release the frozen
- snapshot. This will work even while the database server is running.
- However, a backup created in this way saves
- the database files in a state as if the database server was not
- properly shut down; therefore, when you start the database server
- on the backed-up data, it will think the previous server instance
- crashed and will replay the WAL log. This is not a problem; just
- be aware of it (and be sure to include the WAL files in your backup).
- You can perform a <command>CHECKPOINT</command> before taking the
- snapshot to reduce recovery time.
- </para>
-
- <para>
- If your database is spread across multiple file systems, there might not
- be any way to obtain exactly-simultaneous frozen snapshots of all
- the volumes. For example, if your data files and WAL log are on different
- disks, or if tablespaces are on different file systems, it might
- not be possible to use snapshot backup because the snapshots
- <emphasis>must</> be simultaneous.
- Read your file system documentation very carefully before trusting
- the consistent-snapshot technique in such situations.
- </para>
-
- <para>
- If simultaneous snapshots are not possible, one option is to shut down
- the database server long enough to establish all the frozen snapshots.
- Another option is to perform a continuous archiving base backup (<xref
- linkend="backup-base-backup">) because such backups are immune to file
- system changes during the backup. This requires enabling continuous
- archiving just during the backup process; restore is done using
- continuous archive recovery (<xref linkend="backup-pitr-recovery">).
- </para>
-
- <para>
- Another option is to use <application>rsync</> to perform a file
- system backup. This is done by first running <application>rsync</>
- while the database server is running, then shutting down the database
- server long enough to do an <command>rsync --checksum</>.
- (<option>--checksum</> is necessary because <command>rsync</> only
- has file modification-time granularity of one second.) The
- second <application>rsync</> will be quicker than the first,
- because it has relatively little data to transfer, and the end result
- will be consistent because the server was down. This method
- allows a file system backup to be performed with minimal downtime.
- </para>
-
- <para>
- Note that a file system backup will typically be larger
- than an SQL dump. (<application>pg_dump</application> does not need to dump
- the contents of indexes for example, just the commands to recreate
- them.) However, taking a file system backup might be faster.
- </para>
+
+ <warning><para>
+ The <application>pg_dump</> methods utilize at least one connection, if not many
+ (via <option>-j</option>). They also utilize long-running
+ transactions. This can cause problems with maintenance. If you find that your database
+ contains a lot of growing bloat, consider using a backup method on the master that does
+ not require <application>pg_dump</>.
+ </para></warning>
+ </sect2>
</sect1>
+
<sect1 id="continuous-archiving">
<title>Continuous Archiving and Point-in-Time Recovery (PITR)</title>
--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs