[DOCS] Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
(cc'ing docs list)

Simon Riggs wrote:
> The lack of docs begins to show a lack of coherent high-level design
> here.

Yeah, I think you're right. It's becoming hard to keep track of how it's
supposed to behave.

> By now, I've forgotten what this thread was even about. The major
> design decision in this that keeps showing up is "remove pg_standby, at
> all costs" but no reason has ever been given for that. I do believe
> there is a "better way", but we won't find it by trial and error, even
> if we had time to do so.

This has nothing to do with pg_standby.

> Please work on some clear docs for the failure modes in this system.
> That way we can all read them and understand them, or point out further
> issues. Moving straight to code is not a solution to this, since what we
> need now is to all agree on the way forwards. If we ignore this, then
> there is considerable risk that streaming rep will have a fatal
> operational flaw.
> 
> Please just document/diagram how it works now, highlighting the problems
> that still remain to be solved. We're all behind you and I'm helping
> wherever I can.

Ok, here's my attempt at the docs. Read it as a replacement for the
"High Availability, Load Balancing, and Replication" chapter, but of
course many of the sections will be unchanged, as indicated below.

-
Chapter 25. High Availability, Load Balancing, and Replication

25.1 Comparison of different solutions




25.2 Log-Shipping Standby servers



A standby server can also be used for read-only queries. This is called
Hot Standby mode; see chapter XXX.

25.2.1 Planning

Set up two servers with identical hardware ...



25.2.3 Standby mode

In standby mode, the server continuously applies WAL received from the
master server. The standby server can receive WAL from a WAL archive
(see restore_command) or directly from the master over a TCP connection
(streaming replication). The standby server will also attempt to restore
any WAL found in the standby's pg_xlog directory. That typically happens
after a server restart, to replay again any WAL that was streamed from
the master before the restart, but you can also manually copy files into
pg_xlog at any time to have them replayed.

At startup, the standby proceeds as follows:

1. It restores all WAL available in the archive location by calling
   restore_command, until it reaches the end of the WAL available there
   and restore_command fails.

2. It tries to restore any WAL available in the pg_xlog directory
   (possibly stored there by streaming replication before a restart).

3. If that fails and streaming replication has been configured, it
   connects to the master server and streams WAL from it.

If streaming fails or is not configured, or if the connection is lost
later on, the standby goes back to step 1 and tries to restore the file
from the archive again. This loop of retries from the archive, pg_xlog,
and streaming replication continues until the server is stopped or
failover is triggered by a trigger file.
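To make the retry order concrete, here is a rough Python model of the
loop for a single WAL file. It is only a sketch, not the server's code:
the three source callables (from_archive, from_pg_xlog, from_stream) and
trigger_present are hypothetical stand-ins for restore_command, reading
pg_xlog, the streaming connection, and the trigger_file check. The
max_rounds bound exists only so the model terminates; the server itself
keeps retrying until it is stopped or failover is triggered.

```python
from typing import Callable, Optional

# Hypothetical source: fetch the named WAL file, or return None if unavailable.
Source = Callable[[str], Optional[bytes]]


def next_wal(
    filename: str,
    from_archive: Source,
    from_pg_xlog: Source,
    from_stream: Optional[Source],
    trigger_present: Callable[[], bool],
    max_rounds: int = 1000,
) -> Optional[bytes]:
    """Fetch `filename`, cycling archive -> pg_xlog -> streaming.

    Returns None if the trigger file appears or max_rounds is exhausted.
    """
    for _ in range(max_rounds):
        if trigger_present():
            return None  # failover requested: stop retrying
        # Step 1: restore_command against the WAL archive.
        data = from_archive(filename)
        if data is not None:
            return data
        # Step 2: files already present in pg_xlog.
        data = from_pg_xlog(filename)
        if data is not None:
            return data
        # Step 3: stream from the master, only if streaming is configured.
        if from_stream is not None:
            data = from_stream(filename)
            if data is not None:
                return data
        # Every source failed: go back to step 1 (a real server would
        # presumably wait a little before retrying).
    return None
```

The model treats each source as all-or-nothing for one file; in reality
a streaming connection delivers a continuous sequence of WAL, so this
only illustrates the order in which the sources are tried.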

A corrupt or half-finished WAL file in the archive, or one streamed from
the master, causes a PANIC and immediate shutdown of the standby server.
A corrupt WAL file is always a serious event that requires administrator
action. If you want to replay a WAL file known to be corrupt as far as
possible, you can copy the file manually into pg_xlog.

Standby mode ends, and the server switches to normal operation, when the
trigger file (see trigger_file) is found. Before failing over, the server
restores any WAL available in the archive or in pg_xlog, but it does not
try to connect to the master or wait for new files to appear in the
archive.
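The final catch-up before failover can be modelled the same way. Again
this is a hypothetical sketch, not the server's implementation:
from_archive and from_pg_xlog stand in for restore_command and pg_xlog,
and next_name is an assumed helper that yields the following WAL file
name. The point it shows is that nothing is streamed and nothing is
waited for; the first file missing from both places ends recovery.

```python
from typing import Callable, List, Optional

Source = Callable[[str], Optional[bytes]]


def drain_before_failover(
    first_file: str,
    next_name: Callable[[str], str],
    from_archive: Source,
    from_pg_xlog: Source,
) -> List[str]:
    """Replay every WAL file still available locally, then stop.

    Returns the names of the files that were replayed, in order.
    """
    replayed: List[str] = []
    name = first_file
    while True:
        # Prefer the archive, fall back to pg_xlog; never contact the master.
        data = from_archive(name)
        if data is None:
            data = from_pg_xlog(name)
        if data is None:
            return replayed  # end of available WAL: switch to normal operation
        replayed.append(name)  # a real server would apply `data` here
        name = next_name(name)
```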


25.2.4 Preparing Master for Standby servers

Set up continuous archiving to a WAL archive on the master, as described
in the chapter "Continuous Archiving and Point-In-Time Recovery". The
archive location should be accessible from the standby even when the
master is down, i.e., it should reside on the standby server itself or
on another trusted server, not on the master server.

If you want to use streaming replication, set up authentication to allow
streaming replication connections, and set max_wal_senders high enough
for all standby servers to connect at the same time.

Take a base backup as described in the section "Making a Base Backup" of
the chapter "Continuous Archiving and Point-In-Time Recovery".

25.2.4.1 Authentication for streaming replication

Ensure that listen_addresses allows connections from the standby server.




25.2.5 Setting up the standby server

1. Take a base backup, and copy it to the standby.

2. Set restore_command to restore files from the WAL archive.

3. Set standby_mode=on.

4. If you want to use streaming replication, set primary_conninfo.
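As an illustration of steps 2 to 4, the following Python sketch renders
the three settings as configuration lines. It assumes they are written
to the standby's recovery.conf as single-quoted values, as for archive
recovery; the function name standby_recovery_conf is hypothetical, and
because the exact quoting rules are an assumption here, values that
contain a single quote are simply rejected.

```python
from typing import Optional


def standby_recovery_conf(restore_command: str,
                          primary_conninfo: Optional[str] = None) -> str:
    """Render standby settings as recovery.conf-style lines (a sketch)."""
    for value in (restore_command, primary_conninfo or ""):
        if "'" in value:
            # Assumption: escaping of embedded quotes is not handled here.
            raise ValueError("embedded single quotes are not supported")
    lines = [
        "standby_mode = 'on'",                      # step 3
        f"restore_command = '{restore_command}'",   # step 2
    ]
    if primary_conninfo is not None:                # step 4, streaming only
        lines.append(f"primary_conninfo = '{primary_conninfo}'")
    return "\n".join(lines) + "\n"
```

For example, standby_recovery_conf("cp /mnt/archive/%f %p",
"host=master port=5432") would produce all three lines; the archive path
and connection string are made-up values for illustration.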


You can use restartpoint_command to prune the archive of files no longer
needed by the standby.
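A pruning command boils down to deciding which archived segments are
older than the oldest one the standby still needs. The Python sketch
below assumes the command is told that file name (called oldest_needed
here, a hypothetical parameter) and relies on WAL segment names being 24
hex digits: 8 for the timeline, then 16 identifying the segment. It
compares only the last 16 digits, so older-timeline segments at earlier
positions are pruned too, and anything that is not a plain segment name
(history or backup label files, for instance) is left alone.

```python
import re
from typing import Iterable, List

# Plain WAL segment file name: 8 hex digits of timeline + 16 of position.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def files_to_prune(archive_files: Iterable[str], oldest_needed: str) -> List[str]:
    """Return archived WAL segments that precede `oldest_needed`."""
    cutoff = oldest_needed[8:]  # ignore the timeline part when comparing
    return sorted(
        name for name in archive_files
        if WAL_NAME.match(name) and name[8:] < cutoff
    )
```

The actual deletion is deliberately left out, so the decision logic can
be checked on its own before anything is removed from the archive.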

You can have any number of standby servers as long as you set
max_wal_senders high enough in the master to allow them to be connected
simultaneously.

25.2.6 Archive recovery based log shipping

An alternative to the built-in standby mode described in the previous
sections is

[DOCS] Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 12:15 +0200, Heikki Linnakangas wrote:
> (cc'ing docs list)
> 
> Simon Riggs wrote:
> > The lack of docs begins to show a lack of coherent high-level design
> > here.
> 
> Yeah, I think you're right. It's becoming hard to keep track of how it's
> supposed to behave.

Thank you for responding to that comment positively, I am relieved.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-docs mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs


[DOCS] docs cleanup patch

2010-03-25 Thread Josh Kupershmidt
Hi all,
Here's a patch which mostly fixes broken URLs in code comments.

Summary of doc. changes:
 * heapfuncs.c: fix awkward comment phrasing

I also tried to fix as many broken URLs as I could find.
 * imath.h, imath.c: homepage for M.J. Fromberger moved
 * sha1.c, sha1.h: new location of FIPS pub 180-1
 * sha2.c: changed URL of PDF describing SHA-256/384/512 to one of the
only active links I could find. archive.org still has the old copy at
http://web.archive.org/web/*/http://csrc.nist.gov/cryptval/shs/sha256-384-512.pdf
 * spell.h: I think this was just pointing to the manpage for
hunspell. I changed to the project page for hunspell, though maybe
it'd be better to point to some other copy of the man page somewhere,
e.g. http://pwet.fr/man/linux/fichiers_speciaux/hunspell
 * dirmod.c: codeproject.com has shuffled pages

Didn't change:
 * be-secure.c: the URL http://www.skip-vpn.org/spec/numbers.html is
down, and I can't find an active copy anywhere else, so I didn't try
to change this. archive.org still has it, though:
http://web.archive.org/web/20011212141438/http://www.skip-vpn.org/spec/numbers.html

Josh


docs_cleanup.patch
Description: Binary data
