PITR Archive Recovery, 28 June 2004

What's in this patch?

- All of what was in previous PITR Archival patch, including reworking
of all the archiver startup/shutdown code to match that of pgstat
- New code to perform Archive Recovery mode, which streams xlogs
straight from archive to allow "infinite" recovery

[This is a full, working patch for discussion and testing, with a few
days left before 7.5dev freeze for changes and corrections]

Archive Recovery Overview

The rule is: when you startup with archive_mode on, take a backup.

This patch provides an additional mode for transactional log recovery.
The current mode provides Crash Recovery, allowing the database to
restart and to read the last few online transaction logs to recover. The
additional mode is known as Archive Recovery, enabled in postgresql.conf
using archive_mode=true.

The administrator also specifies an archive location and a program to
perform the copying (discussed later). As database writes occur and
transaction logs fill they are copied to the archive location by a new
sub-process of the postmaster, called the archiver process.

If the database crashes when in archive_mode, it will recover just as
before. If a major system crash occurs, or there is some other reason to
restore, then the new Archive Recovery mode can be invoked by:
- restoring the full physical backup
- creating DataDir/recovery.conf
- restarting PostgreSQL

PostgreSQL will recover using almost the same code path as before,
except: all transaction logs will be restored for use directly from
archive in a stream that prevents disk space from overflowing. Extended
recovery is possible, over many transaction logs - far more than was
previously possible.

The essential aspect to recovery is maintaining a solid backup/log
chain. This must consist of:
i) a full backup of every file in PostgreSQL - miss none!
ii) archived transaction logs from just before backup taken to whenever
you want to recover to...

If you "break the chain" i.e. the two sets of files don't match, or
you've deleted a transaction log or lost a tape - then you will only be
able to recover up to the point the chain broke. The longer the chain,
the easier it will be to break, so backup regularly and work out a
robust archiving policy.

Extensibility
Archive Recovery allows the administrator to specify a program or script
to use when both archiving and restoring transaction log files. As a
result, the administrator can choose to store transaction log files on
any drive or any other system, or to integrate with any Backup Archive
Recovery (BAR) software.

The Archive Program is provided with 2 parameters, which can be placed
anywhere within the supplied command string. Each parameter is
represented by a "%s" character string.
- The first parameter is replaced with the full path of the transaction
log to be archived.
- The second parameter is replaced with the value of the archive_dest
parameter
e.g. with these settings
archive_mode = true
archive_dest = '/mount/disk2/pgarchive'
archive_program = 'cp %s %s'
with transaction log 00000000000003A4 to archive
with DataDir of web123

would execute the following command
'cp /usr/local/pgsql/web123/00000000000003A4 /mount/disk2/pgarchive'
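
As an illustration of the substitution described above, here is a
minimal sketch in C. The helper name build_archive_command is
hypothetical and is not taken from the patch; it only assumes that the
two "%s" markers are filled in order using snprintf, and it reports
truncation rather than running a cut-off command.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): expand an archive_program
 * template that contains two "%s" markers. The first marker receives the
 * full path of the transaction log, the second the archive_dest value.
 * Returns 0 on success, -1 if the expanded command does not fit in buf.
 */
int
build_archive_command(char *buf, size_t buflen, const char *archive_program,
                      const char *xlog_path, const char *archive_dest)
{
    int n = snprintf(buf, buflen, archive_program, xlog_path, archive_dest);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

Because the template is used directly as a format string, it must
contain exactly the "%s" markers described and no other "%" sequences;
a production version would probably want to validate that first.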

To enter Archive Recovery the administrator must create a recovery.conf
file. [Currently an empty file will do, though intended as the one place
where all recovery options and parameters would be set]

The Restore Program is provided with 3 parameters:
- The first parameter is replaced with the value of the archive_dest
parameter
- The second parameter is replaced with the value of the transaction log
which recovery is requesting should now be restored
- The third parameter is replaced with the full path of the filename
which should be the target of the single file recovery operation.
[In this patch, XlogArchRestoreProgram is hardcoded to "cp %s/%s %s",
pending short discussion on how to specify this...]
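
To make the three restore parameters concrete, here is a small sketch
that formats a segment file name the way the patch does ("%08X%08X")
and expands the hardcoded "cp %s/%s %s" template. Both function names
are hypothetical, and any target path passed in is only an assumed
example location.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Format the 16-hex-digit segment file name from its log id and segment
 * number, using the same "%08X%08X" pattern as the patch.
 * Returns 0 on success, -1 if buf is too small.
 */
int
xlog_segment_name(char *buf, size_t buflen, uint32_t log, uint32_t seg)
{
    int n = snprintf(buf, buflen, "%08X%08X",
                     (unsigned int) log, (unsigned int) seg);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Hypothetical helper: expand a restore template with three "%s" markers:
 * archive_dest, the requested segment name, and the target file path.
 * Returns 0 on success, -1 if the expanded command does not fit in buf.
 */
int
build_restore_command(char *buf, size_t buflen, const char *restore_program,
                      const char *archive_dest, const char *xlog_name,
                      const char *target_path)
{
    int n = snprintf(buf, buflen, restore_program,
                     archive_dest, xlog_name, target_path);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

With log 0 and segment 0x3A4 the name comes out as 00000000000003A4,
matching the archive example earlier in this note.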

Possibilities
Using the current features, it is possible to implement an Automated
Standby Database. This is an active-passive High Availability option. In
this mode, the main server sends archived log files to a second, standby
server. The standby server is set up to be in "permanent recovery", by
using a RestoreProgram that waits for each file to be shipped to it. The
standby system receives each file, then recovers up to that point - so
the standby system is always a few seconds from completing its startup
should it be required.
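
The "permanent recovery" idea depends on a restore program that waits
for the next file instead of failing at once. The following is only a
sketch of that waiting step, under assumptions: the name
wait_for_segment, the retry count and the polling interval are all
hypothetical, and nothing like this is included in the patch.

```c
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper for a standby restore program: check whether `path`
 * exists, retrying up to `max_retries` more times and sleeping
 * `interval_s` seconds between checks.
 * Returns 0 as soon as the file is visible, -1 if it never appears.
 */
int
wait_for_segment(const char *path, int max_retries, unsigned int interval_s)
{
    struct stat st;
    int         i;

    for (i = 0;; i++)
    {
        if (stat(path, &st) == 0)
            return 0;           /* file has arrived */
        if (i >= max_retries)
            return -1;          /* give up: recovery will then end */
        sleep(interval_s);
    }
}
```

A real standby script would also need to make sure the file has been
fully shipped before copying it, for example by having the sender
write under a temporary name and rename it when complete. Giving up
(returning -1) is what lets the standby finish its startup.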

Patch Status & Current Caveats
- A number of recovery scenarios have been tested and the patch is
believed to be stable and ready for others to begin commentary and
testing...please understand that there are many scenarios that work and
many that do not...these last are not bugs

- Recovery to a specific point in time is not yet implemented. The
administrator has the following recovery options:
--can recover the system fully, all the way to the end of logs
--stop recovery by withholding log files at an appropriate place,
thereby forcing the termination of recovery

How it Works

archive_debug is a hidden postgresql.conf setting which can be used to
show more debug output for the archive recovery facilities.
...archive_debug = true

[This currently generates an additional log file called recovery.log,
showing the record headers and types of all restored xlog records.]

Each PostgreSQL user has a corresponding backend process performing work
for them. These backends write any transaction log data to the log as
each operation occurs, then mark transactions as either committed or
aborted. The log is split into log segment files. When each log file
fills, a notification file is created in archive_status, and at the same
time a signal is sent to the archiver (via the postmaster) to begin
archiving.

The archiver executes the administrator's program or script using a
system(3) call, archiving each file one at a time.
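
Since pgarch.c is not shown in this diff, the following is only a
sketch of how a system(3) result might be interpreted, not the
archiver's actual code. The function name run_archive_command is
hypothetical; the point is that only a normal exit with status 0
should be treated as "file archived".

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run an already-expanded archive command through the
 * shell. Returns 0 only if the command exited normally with status 0;
 * returns -1 if the shell could not be started, the command was killed by
 * a signal, or it exited with a non-zero status.
 */
int
run_archive_command(const char *command)
{
    int rc = system(command);

    if (rc == -1)
        return -1;              /* could not create the child process */
    if (WIFEXITED(rc) && WEXITSTATUS(rc) == 0)
        return 0;               /* command reported success */
    return -1;
}
```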

The notification file provides a number of features. First, its
simplicity ensures that we don't need a shared memory link between each
backend and the archiver, which would then be prone to failure - which
is exactly what we do not want. Second, the archive_status file is written
whether or not the postmaster is still up, so will still function
correctly even when the worst has happened. Third, as a persistent
record, it will allow archiving to restart at the same point later on
(even after a backup/restore), allowing it to recover gracefully from
administrator's immediate shutdown requests as well as postmaster or
archive system failures.
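
The notification-file life cycle can be sketched outside the server as
follows. The function names and the fixed path limit are hypothetical;
the .ready and .done suffixes and the rename step follow the comments
in the patch. Plain stdio calls stand in for the server's own file
routines.

```c
#include <stdio.h>

#define STATUS_PATH_MAX 1024

/* Build "<status_dir>/<xlog><suffix>"; returns 0, or -1 if it is too long. */
static int
status_path(char *buf, size_t buflen, const char *status_dir,
            const char *xlog, const char *suffix)
{
    int n = snprintf(buf, buflen, "%s/%s%s", status_dir, xlog, suffix);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Backend side: create an otherwise empty <xlog>.ready file. */
int
notify_ready(const char *status_dir, const char *xlog)
{
    char  path[STATUS_PATH_MAX];
    FILE *fp;

    if (status_path(path, sizeof(path), status_dir, xlog, ".ready") != 0)
        return -1;
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}

/* Archiver side: after a successful copy, rename <xlog>.ready to <xlog>.done. */
int
mark_done(const char *status_dir, const char *xlog)
{
    char ready[STATUS_PATH_MAX];
    char done[STATUS_PATH_MAX];

    if (status_path(ready, sizeof(ready), status_dir, xlog, ".ready") != 0 ||
        status_path(done, sizeof(done), status_dir, xlog, ".done") != 0)
        return -1;
    return rename(ready, done) == 0 ? 0 : -1;
}

/* Checkpoint side: returns 1 if <xlog>.done exists, 0 otherwise. */
int
archive_done(const char *status_dir, const char *xlog)
{
    char  path[STATUS_PATH_MAX];
    FILE *fp;

    if (status_path(path, sizeof(path), status_dir, xlog, ".done") != 0)
        return 0;
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}
```

Because the state lives entirely in file names, an archiver that starts
later only has to scan the directory for .ready files to find its
outstanding work.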

When recovery completes, archive_status files are written to ensure
cleanup of all transaction logs leaves the database in a fit state for
production. The recovery.conf file is removed to ensure that a
subsequent crash doesn't accidentally begin recovery from archives...

The archiver process starts only when recovery has fully completed.

How to Fail

All of the following ways to fail have already occurred in testing...

1. if you do all 3 of these, you will not have a backup/log chain:
a) use the 'cp' command, or any other command that can write to either a
file or a directory
b) use archive_dest to specify a directory
c) forget to create the directory, which the command in a) then
interprets as a file and then all archived logs overwrite each other and
break the chain. You will be unable to rollforward AT ALL.

2. if you do all 5 of these, you will not rollforward very far, if at all.
a) restore the database backup
b) forget to specify a recovery.conf file
c) startup and let database recover using only the xlogs that were
present when the backup was taken
d) then immediately after recovery run a job which causes xlog to be
written, and then a file to fill
e) use an archive command that allows overwriting of previously archived
files
This sequence causes the database to recover successfully, but not using
your full archive chain. The current redo pointer points half way
through your backup/log chain, so when the file is archived, it
overwrites a file in the middle of the chain, thus breaking the chain.
There isn't an option on cp that will prevent this...

3. turn archive_mode on and off...
Turning archive_mode off means that when an xlog fills, it won't EVER
get archived. This will break the backup/log chain. If you turn
archive_mode back on, and an xlog filled, then it will not be picked up
as having filled and you will be missing a link in the chain. Sometimes
you'll get away with it and then you'll think it will work all the time.
The rule is: when you startup with archive_mode on, take a backup.

4. Forget that xlogs are the same size, whether they are full or
not...you can't tell (yet) by looking at one whether it has a record
within it that will break the chain
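
Failure 2 depends on an archive command that silently overwrites an
existing file. One possible guard, sketched here under assumptions, is
a small copy routine that creates the destination with O_CREAT | O_EXCL
so that a second attempt to archive the same name fails. The name
copy_no_overwrite is hypothetical and this is not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst, refusing to overwrite an existing
 * dst. Returns 0 on success, -1 on failure; errno is EEXIST when dst was
 * already present. dst must be the full file name, not a directory.
 */
int
copy_no_overwrite(const char *src, const char *dst)
{
    char    buf[8192];
    ssize_t nread;
    int     in, out;
    int     rc = 0;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    /* O_EXCL makes open() fail if dst already exists */
    out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (out < 0)
    {
        int saved = errno;

        close(in);
        errno = saved;
        return -1;
    }

    while (rc == 0 && (nread = read(in, buf, sizeof(buf))) > 0)
    {
        ssize_t off = 0;

        while (off < nread)
        {
            ssize_t nw = write(out, buf + off, (size_t) (nread - off));

            if (nw < 0)
            {
                rc = -1;
                break;
            }
            off += nw;
        }
    }
    if (rc == 0 && nread < 0)
        rc = -1;                /* read error */

    close(in);
    if (close(out) != 0)
        rc = -1;
    if (rc != 0)
        unlink(dst);            /* do not leave a partial archive file */
    return rc;
}
```

Passing the full destination file name also avoids the ambiguity in
failure 1, where a missing directory is treated as a file name. A real
archive program would probably also want to fsync the copy before
reporting success.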

How to succeed
- Test your recovery procedures before you need them
- When you're in a recovery situation, back up everything you can lay your
hands on, to make sure you have a known position to return to while you
ATTEMPT recovery - you may make a mistake and need to retry
- be cool - we all make mistakes, just don't let errors multiply

- disable listen_addresses in postgresql.conf when you recover, to give
yourself some breathing space to check things, when you finally do get
the database ready
- recover with archive_debug = true to give me/others a chance to debug
any problems or answer any questions you may have

The rule is: when you startup with archive_mode on, take a backup. 

When enabling archive_mode for the first time, if you have run
previously without enabling archive_mode, then you will inevitably
get messages saying "cannot find archive_status file". After you have
taken a backup, you can manually create <xlog>.done files in the
archive_status directory, which will then allow the bgwriter to clean
them up when it next takes a checkpoint.

Additional Work

- recovery.conf needs some minor work to specify recovery options. It is
envisaged that this would be a short bison grammar, very very similar to
postgresql.conf

- It seems possible to easily work around the requirement to take the
backup while the database is open. It would be a good option to have
both hot and cold backup options...

- full documentation will also follow shortly

Credits

This work is the final push in a long series of patches and discussions
about how to achieve archive recovery with PostgreSQL. The work of
J.R.Nield and Patrick MacDonald has provided the detailed underpinnings
for this recent work, which in turn rests upon Vadim Mikheev's original
work on WAL and MVCC. Tom Lane has provided considerable technical
assistance and quality review, whilst Bruce Momjian has provided many of
the ideas and smoothed the way for much of the work. Thanks, all.

Simon Riggs, [EMAIL PROTECTED]

? GNUmakefile
? config.log
? config.status
? src/Makefile.global
? src/backend/postgres
? src/backend/catalog/postgres.bki
? src/backend/catalog/postgres.description
? src/backend/postmaster/pgarch.c
? src/backend/utils/mb/conversion_procs/conversion_create.sql
? src/backend/utils/mb/conversion_procs/ascii_and_mic/libascii_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/libcyrillic_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/libeuc_cn_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/libeuc_jp_and_sjis.so.0.0
? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/libeuc_kr_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/libeuc_tw_and_big5.so.0.0
? src/backend/utils/mb/conversion_procs/latin2_and_win1250/liblatin2_and_win1250.so.0.0
? src/backend/utils/mb/conversion_procs/latin_and_mic/liblatin_and_mic.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_ascii/libutf8_and_ascii.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_big5/libutf8_and_big5.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/libutf8_and_cyrillic.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/libutf8_and_euc_cn.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/libutf8_and_euc_jp.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/libutf8_and_euc_kr.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/libutf8_and_euc_tw.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_gb18030/libutf8_and_gb18030.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_gbk/libutf8_and_gbk.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859/libutf8_and_iso8859.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/libutf8_and_iso8859_1.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_johab/libutf8_and_johab.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_sjis/libutf8_and_sjis.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_tcvn/libutf8_and_tcvn.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_uhc/libutf8_and_uhc.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_win1250/libutf8_and_win1250.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_win1256/libutf8_and_win1256.so.0.0
? src/backend/utils/mb/conversion_procs/utf8_and_win874/libutf8_and_win874.so.0.0
? src/bin/initdb/initdb
? src/bin/ipcclean/ipcclean
? src/bin/pg_config/pg_config
? src/bin/pg_controldata/pg_controldata
? src/bin/pg_ctl/pg_ctl
? src/bin/pg_dump/pg_dump
? src/bin/pg_dump/pg_dumpall
? src/bin/pg_dump/pg_restore
? src/bin/pg_resetxlog/pg_resetxlog
? src/bin/psql/psql
? src/bin/scripts/clusterdb
? src/bin/scripts/createdb
? src/bin/scripts/createlang
? src/bin/scripts/createuser
? src/bin/scripts/dropdb
? src/bin/scripts/droplang
? src/bin/scripts/dropuser
? src/bin/scripts/vacuumdb
? src/include/pg_config.h
? src/include/pgarch.h
? src/include/stamp-h
? src/interfaces/ecpg/compatlib/libecpg_compat.so.1.1
? src/interfaces/ecpg/ecpglib/libecpg.so.4.2
? src/interfaces/ecpg/pgtypeslib/libpgtypes.so.1.2
? src/interfaces/ecpg/preproc/ecpg
? src/interfaces/libpq/libpq.so.3.2
? src/pl/plpgsql/src/libplpgsql.so.1.0
? src/port/pg_config_paths.h
? src/timezone/zic
Index: src/backend/access/nbtree/nbtsort.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.82
diff -c -r1.82 nbtsort.c
*** src/backend/access/nbtree/nbtsort.c	2 Jun 2004 17:28:17 -0000	1.82
--- src/backend/access/nbtree/nbtsort.c	28 Jun 2004 20:11:09 -0000
***************
*** 67,72 ****
--- 67,73 ----
  #include "miscadmin.h"
  #include "storage/smgr.h"
  #include "utils/tuplesort.h"
+ #include "access/xlog.h"
  
  
  /*
***************
*** 220,235 ****
  
  	wstate.index = btspool->index;
  	/*
! 	 * We need to log index creation in WAL iff WAL archiving is enabled
  	 * AND it's not a temp index.
- 	 *
- 	 * XXX when WAL archiving is actually supported, this test will likely
- 	 * need to change; and the hardwired extern is cruddy anyway ...
  	 */
  	{
! 		extern char XLOG_archive_dir[];
! 
! 		wstate.btws_use_wal = XLOG_archive_dir[0] && !wstate.index->rd_istemp;
  	}
  	/* reserve the metapage */
  	wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
--- 221,231 ----
  
  	wstate.index = btspool->index;
  	/*
! 	 * We need to log index creation in WAL if WAL archiving is enabled
  	 * AND it's not a temp index.
  	 */
  	{
!  		wstate.btws_use_wal = XLogArchiveMode && !wstate.index->rd_istemp;
  	}
  	/* reserve the metapage */
  	wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/access/transam/xlog.c,v
retrieving revision 1.146
diff -c -r1.146 xlog.c
*** src/backend/access/transam/xlog.c	3 Jun 2004 02:08:00 -0000	1.146
--- src/backend/access/transam/xlog.c	28 Jun 2004 20:11:14 -0000
***************
*** 35,46 ****
  #include "storage/proc.h"
  #include "storage/sinval.h"
  #include "storage/spin.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
  #include "utils/relcache.h"
  #include "miscadmin.h"
  
- 
  /*
   * This chunk of hackery attempts to determine which file sync methods
   * are available on the current platform, and to choose an appropriate
--- 35,46 ----
  #include "storage/proc.h"
  #include "storage/sinval.h"
  #include "storage/spin.h"
+ #include "storage/pmsignal.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
  #include "utils/relcache.h"
  #include "miscadmin.h"
  
  /*
   * This chunk of hackery attempts to determine which file sync methods
   * are available on the current platform, and to choose an appropriate
***************
*** 84,95 ****
  
  
  /* User-settable parameters */
  int			CheckPointSegments = 3;
  int			XLOGbuffers = 8;
  char	   *XLOG_sync_method = NULL;
  const char	XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
- char		XLOG_archive_dir[MAXPGPATH];		/* null string means
- 												 * delete 'em */
  
  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
--- 84,97 ----
  
  
  /* User-settable parameters */
+ bool 			XLogArchiveMode = false;
+ bool 			XLogArchiveDEBUG = false;
+ char 			*XLogArchiveDest;
+ char 			*XLogArchiveProgram;
  int			CheckPointSegments = 3;
  int			XLOGbuffers = 8;
  char	   *XLOG_sync_method = NULL;
  const char	XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
  
  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
***************
*** 126,131 ****
--- 128,138 ----
  
  /* Are we doing recovery by reading XLOG? */
  bool		InRecovery = false;
+ bool        InArchiveRecovery = false;
+ bool        UseArchiveFirst = false;
+ bool        InRecoveryCleanup = false;
+ 
+ static  char XLogArchRestoreProgram[MAXPGPATH];
  
  /*
   * MyLastRecPtr points to the start of the last XLOG record inserted by the
***************
*** 392,397 ****
--- 399,405 ----
  
  /* File path names */
  static char XLogDir[MAXPGPATH];
+ static char RLogDir[MAXPGPATH];
  static char ControlFilePath[MAXPGPATH];
  
  /*
***************
*** 433,438 ****
--- 441,449 ----
  
  static bool InRedo = false;
  
+ static bool XLogArchiveNotify(uint32 log, uint32 seg);
+ static bool XLogArchiveDone(char xlog[MAXPGPATH]);
+ static void XLogArchiveCleanup(char xlog[32]);
  
  static bool AdvanceXLInsertBuffer(void);
  static bool WasteXLInsertBuffer(void);
***************
*** 443,448 ****
--- 454,460 ----
  					   bool find_free, int max_advance,
  					   bool use_lock);
  static int	XLogFileOpen(uint32 log, uint32 seg, bool econt);
+ static void RestoreRecoveryXlog(char *path, uint32 log, uint32 seg);
  static void PreallocXlogFiles(XLogRecPtr endptr);
  static void MoveOfflineLogs(uint32 log, uint32 seg, XLogRecPtr endptr);
  static XLogRecord *ReadRecord(XLogRecPtr *RecPtr, int emode, char *buffer);
***************
*** 454,463 ****
  static void ReadControlFile(void);
  static char *str_time(time_t tnow);
  static void issue_xlog_fsync(void);
- #ifdef WAL_DEBUG
  static void xlog_outrec(char *buf, XLogRecord *record);
- #endif
- 
  
  /*
   * Insert an XLOG record having the specified RMID and info bytes,
--- 466,472 ----
***************
*** 911,916 ****
--- 920,1056 ----
  }
  
  /*
+  * XLogArchiveNotify
+  *
+  * Writes an archive notification file to the RLogDir
+  *
+  * The name of the notification file is the message that will be picked up
+  * by the archiver, e.g. we write RLogDir/00000001000000C6.ready 
+  * and the archiver then knows to archive XLogDir/00000001000000C6,
+  * then when complete, rename it to RLogDir/00000001000000C6.done
+  *
+  * Called only when in XLogArchiveMode by one backend process
+  */
+ static bool 
+ XLogArchiveNotify(uint32 log, uint32 seg)
+ {
+ 	char		rlog[32];
+ 	char		rlogpath[MAXPGPATH];
+ 	FILE	   	*rlogFD;
+ 
+ /* insert an otherwise empty file called <XLOG>.ready */
+ 	sprintf(rlog, "%08X%08X.ready", log, seg);
+ 	snprintf(rlogpath, MAXPGPATH, "%s/%s", RLogDir, rlog);
+ 	rlogFD = AllocateFile(rlogpath, "w");
+ 	if (rlogFD == NULL)
+ 		ereport(ERROR,
+ 				(errcode_for_file_access(),
+ 			errmsg("could not write archive_status file \"%s\" ",
+ 				   rlogpath)));
+ 	FreeFile(rlogFD);
+ 
+ /* the existence of this file is the message to the archiver to identify
+  * which files require archiving
+  *
+  * if this file is written OK, we then signal the ARCHIVER to do its thang
+  */
+ 
+ 	if (XLogArchiveDEBUG)
+ 		elog(LOG, "backend: written %s", rlogpath );
+ 
+     /*
+      * don't send the signal if we know that the archiver isn't there (yet)
+      * - the archiver will see the archive_status file as soon as it starts 
+      */
+     if (!InArchiveRecovery)
+         SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER);
+ 
+ 	return true;
+ }
+ 
+ /*
+  * XLogArchiveDone
+  *
+  * Searches for an archive notification file in RLogDir
+  * 
+  * Reads RLogDir looking for a specific filename. If that filename ends with .done
+  * then we know that the filename refers to an xlog in XLogDir that is safe to
+  * recycle. If the filename ends .ready then thats OK, else we have an error.
+  * 
+  * Called only when in XLogArchiveMode by bgwriter (when performing checkpoint)
+  *
+  * XXX code is rehacked from an earlier version, so needs streamlining
+  */
+ static bool 
+ XLogArchiveDone(char xlog[32])
+ {
+ 	char		rlogpath[MAXPGPATH];
+ 	FILE	   	*rlogFD;
+ 
+ 	if (XLogArchiveDEBUG)
+ 		elog(LOG, "chkpt: checking for log file %s",
+ 						   xlog);
+ 
+ /* If <XLOG>.done exists then return true
+  */
+ 	snprintf(rlogpath, MAXPGPATH, "%s/%s.done", RLogDir, xlog);
+ 	rlogFD = AllocateFile(rlogpath, "r");
+ 	if (rlogFD != NULL) {
+ 		FreeFile(rlogFD);
+ 		if (XLogArchiveDEBUG)
+ 			elog(LOG, "chkpt: archiving done for log file %s",
+ 						   xlog);
+ 		return true;
+ 	} 
+ 	else
+ 		{
+ /*
+  * else if <XLOG>.ready exists then return false and issue WARNING
+  * ...this indicates archiver is either not working at all or
+  * if it is, then its just way too slow or incorrectly configured
+  */
+ 			snprintf(rlogpath, MAXPGPATH, "%s/%s.ready", RLogDir, xlog);
+ 			rlogFD = AllocateFile(rlogpath, "r");
+ 			if (rlogFD != NULL) {
+ 			    FreeFile(rlogFD);
+ 		 	    elog(WARNING, "chkpt: archiving not yet started for log file %s", 
+ 						xlog);
+ 			    return false;
+ 			}
+ 			else
+ 			{
+ /* else issue a WARNING.... a notification file SHOULD exist... 
+  */ 
+ 			    ereport(WARNING,
+ 				(errcode_for_file_access(),
+ 			 	errmsg("chkpt: cannot find archive_status file: %s ",
+ 						rlogpath)));
+ 			    return false;
+ 			}
+ 		}
+ }
+ 
+ /*
+  * XLogArchiveCleanup
+  *
+  * Cleanup an archive notification file for a particular xlog in XLogDir
+  * 
+  * Called only when in XLogArchiveMode by bgwriter (when performing checkpoint)
+  *
+  */
+ static void
+ XLogArchiveCleanup(char xlog[32])
+ {
+ 	char	rlogpath[MAXPGPATH];
+ 
+ 	snprintf(rlogpath, MAXPGPATH, "%s/%s.done", RLogDir, xlog);
+ 	unlink(rlogpath);
+ 
+ }
+ 
+ /*
   * Advance the Insert state to the next buffer page, writing out the next
   * buffer if it still contains unwritten data.
   *
***************
*** 1259,1264 ****
--- 1399,1412 ----
  		{
  			issue_xlog_fsync();
  			LogwrtResult.Flush = LogwrtResult.Write;	/* end of current page */
+ 
+             /* 
+              * Notify xlog ready to archive?
+              */
+             if (XLogArchiveMode && !XLogArchiveNotify(openLogId, openLogSeg))
+ 				elog(WARNING, "could not set notify for archiver to read log file %u, segment %u",
+ 					   openLogId, openLogSeg);
+ 
  		}
  
  		if (ispartialpage)
***************
*** 1685,1691 ****
  	char		path[MAXPGPATH];
  	int			fd;
  
! 	XLogFileName(path, log, seg);
  
  	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | XLOG_SYNC_BIT,
  					   S_IRUSR | S_IWUSR);
--- 1833,1841 ----
  	char		path[MAXPGPATH];
  	int			fd;
  
!     XLogFileName(path, log, seg);
!  	if (UseArchiveFirst)
!         RestoreRecoveryXlog(path, log, seg);
  
  	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | XLOG_SYNC_BIT,
  					   S_IRUSR | S_IWUSR);
***************
*** 1704,1714 ****
  			errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
  				   path, log, seg)));
  	}
! 
  	return (fd);
  }
  
  /*
   * Preallocate log files beyond the specified log endpoint, according to
   * the XLOGfile user parameter.
   */
--- 1854,2044 ----
  			errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
  				   path, log, seg)));
  	}
!  
  	return (fd);
  }
  
  /*
+  * Get next logfile segment to allow recovery
+  *
+  */
+ static void
+ RestoreRecoveryXlog(char *path, uint32 log, uint32 seg)
+ {
+     char tmpXlog[32];
+     char restoreXlog[32];
+     char tmppath[MAXPGPATH];
+     char xlogRestoreCmd[MAXPGPATH];
+     char recoveryXlog[MAXPGPATH];
+     char lastrecoXlog[MAXPGPATH];
+     int         rc;
+ 	struct stat stat_buf;
+     uint32 prevlog, prevseg;
+ 	FILE	   	*rlogFD;
+ 
+     /* 
+      * If a RecoveryFile exists, then we know we are in media recovery
+      * in which case we choose to recover files from archive, even
+      * though a file of that name may already exist in XLogDir
+      *
+      * By doing this, we do not effect crash recovery code path
+      * when we are not in archive_mode
+      *
+      * We take the archived file because, at the point we took backup,
+      * the current xlog will most probably be only partially full, 
+      * so we MUST refer to the full version of this file and 
+      * NOT the version of the file that exists with the backup.
+      *
+      * We could try to optimize this slightly by checking the local
+      * copy lastchange timestamp against the archived copy, 
+      * but we have no API to do this, nor can we guarantee that the
+      * lastchange timestamp was preserved correctly when we copied
+      * to archive. Our aim is robustness, so we elect not to do this.
+      *
+      * Try to copy full xlog from archive to pg_xlog, if it is available
+      * If that succeeds, we pass the RecoveryXlog filepath back for opening
+      * If that fails, then we try to read a local file if one exists.
+      * This allows us to cater for situations where the current xlog
+      * is still available locally and hasn't yet made it to archive.
+      * This could happen if:
+      * - we decide to recover database to undo user data changes
+      * - we have XLogDir on a different disk and the main DataDir drive
+      *   fails, leaving us with just the XLogDir
+      *
+      * Notice that we don't actually overwrite any files when we copy back
+      * from archive because the XLogArchRestoreProgram may inadvertently
+      * restore inappropriate xlogs, or they may be corrupt, so we may
+      * have to fallback to the segments remaining in current XLogDir later.
+      * The copy-from-archive xlog is always the same, ensuring that we
+      * don't run out of disk space on long recoveries.
+      *
+      * [EMAIL PROTECTED]
+      */
+     
+         snprintf(recoveryXlog, MAXPGPATH, "%s/RECOVERYXLOG", XLogDir);
+     	snprintf(lastrecoXlog, MAXPGPATH, "%s/LASTRECOXLOG", XLogDir);
+ 
+         if (stat(recoveryXlog, &stat_buf) == 0) {            
+             /*
+              * save a copy of the last xlog, before we try to restore
+              * if the restore fails, we will need it to become current xlog
+              */
+             rc = rename(recoveryXlog, lastrecoXlog);
+             if (rc !=0)
+         		elog(LOG, "rename failed %s %s",recoveryXlog, lastrecoXlog);
+             /*
+              * if it fails, ignore it - we'll create one soon...
+              */
+         }
+ 
+         /*
+          * Copy xlog from archive_dest to XLogDir
+          */
+         sprintf(restoreXlog, "%08X%08X", log, seg);
+       	snprintf(xlogRestoreCmd, MAXPGPATH, XLogArchRestoreProgram, 
+                    XLogArchiveDest, restoreXlog, recoveryXlog);
+         if (XLogArchiveDEBUG)
+     		elog(LOG, "redo: system(%s)", xlogRestoreCmd);
+ 
+         rc = system(xlogRestoreCmd);
+         if (rc!=0) {
+             /*
+              * remember, we rollforward UNTIL the restore fails
+              * so failure here is just part of the process...
+              * that makes it difficult to determine whether the restore
+              * failed because there isn't an archive to restore, or
+              * because the administrator has specified the restore
+              * program incorrectly...
+              * we could try to restore the testfile that the archiver writes
+              * when it starts up, but the absence of that file isn't
+              * very reliable evidence that the restore itself is broken,
+              * so just trust that the administrator has it correctly,
+              * XXX enhance that later
+              */
+  	        elog(LOG, "redo: cannot restore \"%s\" from archive", restoreXlog);
+             /*
+              * if an archived file is not available, there might just be 
+              * a partially full version of this file still in XLogDir
+              * so return this as the filename to open.
+              * In many recovery scenarios we expect this to fail also...
+              */
+             snprintf(recoveryXlog, MAXPGPATH, "%s/%s", XLogDir, restoreXlog);
+             UseArchiveFirst = false;
+             if (stat(recoveryXlog, &stat_buf) == 0) {
+      	        elog(LOG, "redo: archive chain ends; using local copy of \"%s\"", restoreXlog);
+             }
+             /*
+              * if this file isn't available, then we need to setup the previous
+              * restored xlog to be the last and current xlog, if it exists
+              * remember: we've been restoring from recoverXlog, which isn't
+              * named the same as the normal xlog chain...
+              * also remember to output a corresponding archive_status of .done
+              */
+             else if ((stat(lastrecoXlog, &stat_buf) == 0) && log==0 && seg > 0) {
+                 prevlog = log;
+                 prevseg = seg;
+         	    PrevLogSeg(prevlog, prevseg);
+                 XLogFileName(tmppath, prevlog, prevseg);
+            		elog(LOG, "redo: moving last restored xlog to %s", tmppath);
+                 rc = rename(lastrecoXlog, tmppath);
+                 if (rc!=0) {
+                	    elog(LOG, "redo: rename failed");
+             	    ereport(PANIC,
+         		        (errcode_for_file_access(),
+             	        errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
+                         tmppath, prevlog, prevseg)));
+                 }
+ 
+                 /* 
+                  * write out an archive_status file for previous xlog
+                  * to allow xlog to be recycled when recovered database
+                  * is all up and working again
+                  * ...looks wrong, but checkpointer is smart enough
+                  * not to archive the current xlog!
+                  */
+             	sprintf(tmpXlog, "%08X%08X", prevlog, prevseg);
+             	snprintf(tmppath, MAXPGPATH, "%s/%s.done", RLogDir, tmpXlog);
+             	rlogFD = AllocateFile(tmppath, "w");
+ 	            if (rlogFD == NULL)
+                     ereport(ERROR,
+ 	       			    (errcode_for_file_access(),
+ 	       		         errmsg("could not write archive_status file \"%s\" ",
+ 	       			        tmppath)));
+ 	            FreeFile(rlogFD);
+             }
+             /* 
+              * there is NO else here...we just return the filename
+              * knowing that it isn't there...which then throws the usual error,
+              * will end with a clear message as to why...but not a problem
+              */
+         }
+         else {
+         /* restore success */
+             /*
+              * if the backup restored an xlog but we didn't use that local
+              * copy, because we took the xlog of the same name from the
+              * archive instead, we need to write out an archive_status
+              * file for it to show that it can be recycled later
+              */
+             XLogFileName(tmppath, log, seg);
+             if (stat(tmppath, &stat_buf) == 0) {
+                	sprintf(tmpXlog, "%08X%08X", log, seg);
+                	snprintf(tmppath, MAXPGPATH, "%s/%s.done", RLogDir, tmpXlog);
+                	rlogFD = AllocateFile(tmppath, "w");
+         	    if (rlogFD == NULL)
+                     ereport(ERROR,
+         			    (errcode_for_file_access(),
+         		         errmsg("could not write archive_status file \"%s\": %m",
+     	  			        tmppath)));
+     	        FreeFile(rlogFD);
+             }
+  	        elog(LOG, "redo: restored \"%s\" from archive", restoreXlog);
+         }
+         strcpy(path, recoveryXlog);                
+         return;
+ }
+ 
+ /*
   * Preallocate log files beyond the specified log endpoint, according to
   * the XLOGfile user parameter.
   */
***************
*** 1746,1751 ****
--- 2076,2082 ----
  	struct dirent *xlde;
  	char		lastoff[32];
  	char		path[MAXPGPATH];
+     bool        recycle=false;
  
  	XLByteToPrevSeg(endptr, endlogId, endlogSeg);
  
***************
*** 1761,1785 ****
  	errno = 0;
  	while ((xlde = readdir(xldir)) != NULL)
  	{
  		if (strlen(xlde->d_name) == 16 &&
  			strspn(xlde->d_name, "0123456789ABCDEF") == 16 &&
  			strcmp(xlde->d_name, lastoff) <= 0)
  		{
  			snprintf(path, MAXPGPATH, "%s/%s", XLogDir, xlde->d_name);
! 			if (XLOG_archive_dir[0])
! 			{
! 				ereport(LOG,
! 						(errmsg("archiving transaction log file \"%s\"",
! 								xlde->d_name)));
! 				elog(WARNING, "archiving log files is not implemented");
! 			}
! 			else
  			{
  				/*
  				 * Before deleting the file, see if it can be recycled as
  				 * a future log segment.  We allow recycling segments up
! 				 * to XLOGfileslop segments beyond the current XLOG
! 				 * location.
  				 */
  				if (InstallXLogFileSegment(endlogId, endlogSeg, path,
  										   true, XLOGfileslop,
--- 2092,2134 ----
  	errno = 0;
  	while ((xlde = readdir(xldir)) != NULL)
  	{
+ 		/*
+ 		 * If the name has the right length and is all upper-case hex, rely
+ 		 * on the filenames sorting in log order to pick out segments at or
+ 		 * before lastoff. XXX: maybe read lastoff's last-write time and only
+ 		 * remove/recycle files last written earlier than that.
+ 		 */
  		if (strlen(xlde->d_name) == 16 &&
  			strspn(xlde->d_name, "0123456789ABCDEF") == 16 &&
  			strcmp(xlde->d_name, lastoff) <= 0)
  		{
  			snprintf(path, MAXPGPATH, "%s/%s", XLogDir, xlde->d_name);
! 			if (XLogArchiveMode) {
!                 if (InRecoveryCleanup)
!                     /*
!                      * this allows recycling of transaction logs
!                      * during the shutdown checkpoint at end of recovery
!                      * - we may have restored logs that were not used
!                      * in the recovery sequence, and so will not have
!                      * had an archive_status file written for them. 
!                      * - end-of-recovery doesn't clean up ALL xlogs,
!                      * which is why we also write archive_status files
!                      * as well as doing this
!                      */
!                     recycle=true;
!                 else
!                     recycle=XLogArchiveDone(xlde->d_name);
!             }
!             else
!                 recycle=true;	/* not archiving: nothing to wait for */
! 
! 			if ( recycle )
  			{
  				/*
  				 * Before deleting the file, see if it can be recycled as
  				 * a future log segment.  We allow recycling segments up
! 				 * until there are XLOGfileslop segments beyond the
! 				 * current XLOG location, otherwise they are removed.
  				 */
  				if (InstallXLogFileSegment(endlogId, endlogSeg, path,
  										   true, XLOGfileslop,
***************
*** 1793,1802 ****
  				{
  					/* No need for any more future segments... */
  					ereport(LOG,
! 						  (errmsg("removing transaction log file \"%s\"",
  								  xlde->d_name)));
  					unlink(path);
  				}
  			}
  		}
  		errno = 0;
--- 2142,2152 ----
  				{
  					/* No need for any more future segments... */
  					ereport(LOG,
! 						  (errmsg("too many transaction log files, removing \"%s\"",
  								  xlde->d_name)));
  					unlink(path);
  				}
+                 XLogArchiveCleanup(xlde->d_name);
  			}
  		}
  		errno = 0;
***************
*** 2254,2259 ****
--- 2604,2610 ----
  {
  	/* Init XLOG file paths */
  	snprintf(XLogDir, MAXPGPATH, "%s/pg_xlog", DataDir);
+ 	snprintf(RLogDir, MAXPGPATH, "%s/archive_status", XLogDir);
  	snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
  }
  
***************
*** 2785,2790 ****
--- 3136,3143 ----
  	XLogRecord *record;
  	char	   *buffer;
  	uint32		freespace;
+    	char		recoveryCommandFile[MAXPGPATH];
+    	struct stat stat_buf;
  
  	/* Use malloc() to ensure record buffer is MAXALIGNED */
  	buffer = (char *) malloc(_INTL_MAXLOGRECSZ);
***************
*** 2831,2836 ****
--- 3184,3216 ----
  		pg_usleep(60000000L);
  #endif
  
+     /*
+      * Check now for recovery.conf
+      *
+      * if this file exists, it demonstrates the intention of the administrator
+      * to recover this database using archived xlogs
+      *
+      * we do this now because the first xlog is about to be opened for the
+      * first time. We've read the checkpoint pointer from the control file
+      * and we are about to use that to open the xlog it points to, and
+      * will begin rollforward recovery from that point
+      */
+   	snprintf(recoveryCommandFile, MAXPGPATH, "%s/recovery.conf", DataDir);
+     if (stat(recoveryCommandFile, &stat_buf) == 0) {
+      	strcpy(XLogArchRestoreProgram, "cp %s/%s %s");
+         /*
+          * clearly indicate our state
+          */
+         InArchiveRecovery = true;
+         /*
+          * set initial state for checking transaction logs
+          * this may change if the archive runs dry while still InArchiveRecovery
+          */
+         UseArchiveFirst = true;
+     	ereport(LOG,
+     		(errmsg("recovery.conf found...starting archive recovery")));
+     }
+ 
  	/*
  	 * Get the last valid checkpoint record.  If the latest one according
  	 * to pg_control is broken, try the next-to-last one.
***************
*** 2861,2872 ****
  	LastRec = RecPtr = checkPointLoc;
  	memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
  	wasShutdown = (record->xl_info == XLOG_CHECKPOINT_SHUTDOWN);
! 
  	ereport(LOG,
  			(errmsg("redo record is at %X/%X; undo record is at %X/%X; shutdown %s",
  					checkPoint.redo.xlogid, checkPoint.redo.xrecoff,
  					checkPoint.undo.xlogid, checkPoint.undo.xrecoff,
! 					wasShutdown ? "TRUE" : "FALSE")));
  	ereport(LOG,
  			(errmsg("next transaction ID: %u; next OID: %u",
  					checkPoint.nextXid, checkPoint.nextOid)));
--- 3241,3260 ----
  	LastRec = RecPtr = checkPointLoc;
  	memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
  	wasShutdown = (record->xl_info == XLOG_CHECKPOINT_SHUTDOWN);
!     /*
!      * We report the state recorded in the control file rather than the
!      * checkpoint type. wasShutdown only says whether the last checkpoint
!      * was a shutdown checkpoint, NOT whether the control file shows a
!      * clean shutdown. The distinction matters only InArchiveRecovery,
!      * where we could otherwise report that the database was shut down
!      * when the control file disagrees.
!      */
  	ereport(LOG,
  			(errmsg("redo record is at %X/%X; undo record is at %X/%X; shutdown %s",
  					checkPoint.redo.xlogid, checkPoint.redo.xrecoff,
  					checkPoint.undo.xlogid, checkPoint.undo.xrecoff,
!                     (ControlFile->state == DB_SHUTDOWNED) ? "TRUE" : "FALSE")));
! 
  	ereport(LOG,
  			(errmsg("next transaction ID: %u; next OID: %u",
  					checkPoint.nextXid, checkPoint.nextOid)));
***************
*** 2914,2919 ****
--- 3302,3311 ----
  	if (InRecovery)
  	{
  		int			rmid;
+     	char		reclogpath[MAXPGPATH];
+         bool        recovery_debug_log = false;
+         int         reclogFD = -1;
+         char        *recbuf = NULL;
  
  		ereport(LOG,
  				(errmsg("database system was not properly shut down; "
***************
*** 2946,2951 ****
--- 3339,3361 ----
  			ereport(LOG,
  					(errmsg("redo starts at %X/%X",
  							ReadRecPtr.xlogid, ReadRecPtr.xrecoff)));
+ #ifdef WAL_DEBUG
+             if (XLOG_DEBUG)
+                recovery_debug_log = true;
+ #endif
+             if (XLogArchiveDEBUG)            
+                recovery_debug_log = true;
+ 
+             if (recovery_debug_log) {
+            		recbuf = (char *) malloc(BLCKSZ);
+                 snprintf(reclogpath, MAXPGPATH, "%s/recovery.log", DataDir);
+                 unlink(reclogpath);
+                 reclogFD = BasicOpenFile(reclogpath, O_RDWR | O_CREAT | O_EXCL,
+ 					S_IRUSR | S_IWUSR);
+                 if (reclogFD < 0)
+                     recovery_debug_log = false;
+             }
+ 
  			do
  			{
  				/* nextXid must be beyond record's xid */
***************
*** 2956,2976 ****
  					TransactionIdAdvance(ShmemVariableCache->nextXid);
  				}
  
! #ifdef WAL_DEBUG
! 				if (XLOG_DEBUG)
  				{
! 					char		buf[8192];
! 
! 					sprintf(buf, "REDO @ %X/%X; LSN %X/%X: ",
  							ReadRecPtr.xlogid, ReadRecPtr.xrecoff,
  							EndRecPtr.xlogid, EndRecPtr.xrecoff);
! 					xlog_outrec(buf, record);
! 					strcat(buf, " - ");
! 					RmgrTable[record->xl_rmid].rm_desc(buf,
  								record->xl_info, XLogRecGetData(record));
! 					elog(LOG, "%s", buf);
  				}
- #endif
  
  				if (record->xl_info & XLR_BKP_BLOCK_MASK)
  					RestoreBkpBlocks(record, EndRecPtr);
--- 3366,3383 ----
  					TransactionIdAdvance(ShmemVariableCache->nextXid);
  				}
  
! 				if (recovery_debug_log)
  				{
! 					sprintf(recbuf, "\nREDO @ %X/%X; LSN %X/%X: ",
  							ReadRecPtr.xlogid, ReadRecPtr.xrecoff,
  							EndRecPtr.xlogid, EndRecPtr.xrecoff);
! 					xlog_outrec(recbuf, record);
! 					strcat(recbuf, " - ");
! 					RmgrTable[record->xl_rmid].rm_desc(recbuf,
  								record->xl_info, XLogRecGetData(record));
!                     
!                     write(reclogFD, recbuf, strlen(recbuf));
  				}
  
  				if (record->xl_info & XLR_BKP_BLOCK_MASK)
  					RestoreBkpBlocks(record, EndRecPtr);
***************
*** 2978,2988 ****
--- 3385,3405 ----
  				RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
  				record = ReadRecord(NULL, LOG, buffer);
  			} while (record != NULL);
+ 
+             if (reclogFD >= 0)
+                 close(reclogFD);
+             if (recbuf != NULL)
+                 free(recbuf);
+ 
  			ereport(LOG,
  					(errmsg("redo done at %X/%X",
  							ReadRecPtr.xlogid, ReadRecPtr.xrecoff)));
  			LastRec = ReadRecPtr;
  			InRedo = false;
+             if (InArchiveRecovery)
+                 InRecoveryCleanup = true;
+             UseArchiveFirst = false;
+             InArchiveRecovery = false;
  		}
  		else
  			ereport(LOG,
***************
*** 3147,3152 ****
--- 3564,3575 ----
  	 * Okay, we're officially UP.
  	 */
  	InRecovery = false;
+     if (InRecoveryCleanup) {
+         unlink(recoveryCommandFile);
+         InRecoveryCleanup = false;
+ 		ereport(LOG,
+ 			(errmsg("archive recovery complete")));
+     }
  
  	ControlFile->state = DB_IN_PRODUCTION;
  	ControlFile->time = time(NULL);
***************
*** 3701,3707 ****
  		strcat(buf, "UNKNOWN");
  }
  
- #ifdef WAL_DEBUG
  static void
  xlog_outrec(char *buf, XLogRecord *record)
  {
--- 4124,4129 ----
***************
*** 3726,3733 ****
  	sprintf(buf + strlen(buf), ": %s",
  			RmgrTable[record->xl_rmid].rm_name);
  }
- #endif /* WAL_DEBUG */
- 
  
  /*
   * GUC support
--- 4148,4153 ----
Index: src/backend/postmaster/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/Makefile,v
retrieving revision 1.15
diff -c -r1.15 Makefile
*** src/backend/postmaster/Makefile	29 May 2004 22:48:19 -0000	1.15
--- src/backend/postmaster/Makefile	28 Jun 2004 20:11:14 -0000
***************
*** 12,18 ****
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = postmaster.o bgwriter.o pgstat.o
  
  all: SUBSYS.o
  
--- 12,18 ----
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = postmaster.o bgwriter.o pgstat.o pgarch.o
  
  all: SUBSYS.o
  
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/postmaster.c,v
retrieving revision 1.405
diff -c -r1.405 postmaster.c
*** src/backend/postmaster/postmaster.c	24 Jun 2004 21:02:55 -0000	1.405
--- src/backend/postmaster/postmaster.c	28 Jun 2004 20:11:18 -0000
***************
*** 117,123 ****
  #include "utils/ps_status.h"
  #include "bootstrap/bootstrap.h"
  #include "pgstat.h"
! 
  
  /*
   * List of active backends (or child processes anyway; we don't actually
--- 117,123 ----
  #include "utils/ps_status.h"
  #include "bootstrap/bootstrap.h"
  #include "pgstat.h"
! #include "pgarch.h"
  
  /*
   * List of active backends (or child processes anyway; we don't actually
***************
*** 198,203 ****
--- 198,204 ----
  /* PIDs of special child processes; 0 when not running */
  static pid_t StartupPID = 0,
  			BgWriterPID = 0,
+             PgArchPID = 0,
  			PgStatPID = 0;
  
  /* Startup/shutdown state */
***************
*** 1147,1152 ****
--- 1148,1158 ----
  				kill(BgWriterPID, SIGUSR2);
  		}
  
+ 		/* If we have lost the archiver, try to start a new one */
+ 		if (XLogArchiveMode && PgArchPID == 0 && 
+             StartupPID == 0 && !FatalError && Shutdown == NoShutdown)
+ 			PgArchPID = pgarch_start();
+  
  		/* If we have lost the stats collector, try to start a new one */
  		if (PgStatPID == 0 &&
  			StartupPID == 0 && !FatalError && Shutdown == NoShutdown)
***************
*** 1751,1756 ****
--- 1757,1765 ----
  			/* Tell pgstat to shut down too; nothing left for it to do */
  			if (PgStatPID != 0)
  				kill(PgStatPID, SIGQUIT);
+ 			/* Tell pgarch to shut down too; nothing left for it to do */
+ 			if (PgArchPID != 0)
+ 				kill(PgArchPID, SIGQUIT);
  			break;
  
  		case SIGINT:
***************
*** 1795,1800 ****
--- 1804,1812 ----
  			/* Tell pgstat to shut down too; nothing left for it to do */
  			if (PgStatPID != 0)
  				kill(PgStatPID, SIGQUIT);
+ 			/* Tell pgarch to shut down too; nothing left for it to do */
+ 			if (PgArchPID != 0)
+ 				kill(PgArchPID, SIGQUIT);
  			break;
  
  		case SIGQUIT:
***************
*** 1812,1817 ****
--- 1824,1831 ----
  				kill(BgWriterPID, SIGQUIT);
  			if (PgStatPID != 0)
  				kill(PgStatPID, SIGQUIT);
+ 			if (PgArchPID != 0)
+ 				kill(PgArchPID, SIGQUIT);
  			if (DLGetHead(BackendList))
  				SignalChildren(SIGQUIT);
  			ExitPostmaster(0);
***************
*** 1901,1908 ****
  			 */
  			if (Shutdown > NoShutdown && BgWriterPID != 0)
  				kill(BgWriterPID, SIGUSR2);
! 			else if (PgStatPID == 0 && Shutdown == NoShutdown)
! 				PgStatPID = pgstat_start();
  
  			continue;
  		}
--- 1915,1926 ----
  			 */
  			if (Shutdown > NoShutdown && BgWriterPID != 0)
  				kill(BgWriterPID, SIGUSR2);
! 			else if (Shutdown == NoShutdown) {
!                     if (PgStatPID == 0)
!         				PgStatPID = pgstat_start();
!                     if (PgArchPID == 0)
!         				PgArchPID = pgarch_start();
!             }
  
  			continue;
  		}
***************
*** 1951,1956 ****
--- 1969,1990 ----
  		}
  
  		/*
+ 		 * Was it the archiver?  If so, just try to start a new
+ 		 * one; no need to force reset of the rest of the system.  (If that
+ 		 * fails, we'll try again in future cycles of the main loop.)
+ 		 */
+ 		if (PgArchPID != 0 && pid == PgArchPID)
+ 		{
+ 			PgArchPID = 0;
+ 			if (exitstatus != 0)
+ 				LogChildExit(LOG, gettext("archiver process"),
+ 							 pid, exitstatus);
+ 			if (StartupPID == 0 && !FatalError && Shutdown == NoShutdown)
+ 				PgArchPID = pgarch_start();
+ 			continue;
+ 		}
+ 
+ 		/*
  		 * Else do standard backend child cleanup.
  		 */
  		CleanupProc(pid, exitstatus);
***************
*** 2132,2137 ****
--- 2166,2182 ----
  		kill(PgStatPID, SIGQUIT);
  	}
  
+ 	/* Force a power-cycle of the pgarch processes too */
+ 	/* (Shouldn't be necessary, but just for luck) */
+ 	if (PgArchPID != 0 && !FatalError)
+ 	{
+ 		ereport(DEBUG2,
+ 				(errmsg_internal("sending %s to process %d",
+ 								 "SIGQUIT",
+ 								 (int) PgArchPID)));
+ 		kill(PgArchPID, SIGQUIT);
+ 	}
+ 
  	FatalError = true;
  }
  
***************
*** 2881,2886 ****
--- 2926,2945 ----
  		if (Shutdown <= SmartShutdown)
  			SignalChildren(SIGUSR1);
  	}
+  
+  	if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER))
+  	{
+  		/*
+  		 * Send SIGUSR1 to ARCHIVER process, to wake it up and begin
+  		 * archiving next transaction log file. Backend should only
+          * send if in XLogArchiveMode...
+  		 */
+  		if (XLogArchiveMode && PgArchPID != 0 && Shutdown == NoShutdown) {
+             if (XLogArchiveDEBUG)
+        	        elog(LOG, "postmaster: WAKEN_ARCHIVER received, sending SIGUSR1 to archiver");
+             kill(PgArchPID,SIGUSR1);
+ 		}
+     }
  
  	PG_SETMASK(&UnBlockSig);
  
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/misc/guc.c,v
retrieving revision 1.211
diff -c -r1.211 guc.c
*** src/backend/utils/misc/guc.c	11 Jun 2004 03:54:54 -0000	1.211
--- src/backend/utils/misc/guc.c	28 Jun 2004 20:11:37 -0000
***************
*** 371,376 ****
--- 371,392 ----
  
  static struct config_bool ConfigureNamesBool[] =
  {
+  	{
+  		{"archive_mode", PGC_POSTMASTER, WAL_SETTINGS,
+  			gettext_noop("Enable archiving of full transaction log files to a specified archival destination."),
+  			NULL
+  		},
+  		&XLogArchiveMode,
+  		false, NULL, NULL
+  	},
+  	{
+  		{"archive_debug", PGC_SIGHUP, WAL_SETTINGS,
+  			gettext_noop("Provide debug output for archive activities."),
+  			NULL
+  		},
+  		&XLogArchiveDEBUG,
+  		false, NULL, NULL
+  	},
  	{
  		{"enable_seqscan", PGC_USERSET, QUERY_TUNING_METHOD,
  			gettext_noop("Enables the planner's use of sequential-scan plans."),
***************
*** 1400,1405 ****
--- 1416,1439 ----
  
  static struct config_string ConfigureNamesString[] =
  {
+  	{
+  		{"archive_dest", PGC_POSTMASTER, WAL_SETTINGS,
+  			gettext_noop("Specifies where to archive transaction log files."),
+  			gettext_noop("A directory or specific location for archiving transaction log files from PostgreSQL")
+  		},
+  		&XLogArchiveDest,
+  		"", NULL, NULL
+  	},
+  
+  	{
+  		{"archive_program", PGC_POSTMASTER, WAL_SETTINGS,
+  			gettext_noop("Archive program"),
+  			gettext_noop("The external program that will be called to execute the archival process")
+  		},
+  		&XLogArchiveProgram,
+  		"", NULL, NULL
+  	},
+ 
  	{
  		{"client_encoding", PGC_USERSET, CLIENT_CONN_LOCALE,
  			gettext_noop("Sets the client's character set encoding."),
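
[Note on archive_program: the sample configuration in this patch suggests
archive_program = 'cp %s %s'. The sketch below shows one bounded way to
expand such a template into a command line. It is only an illustration:
expand_command() is a hypothetical helper, not a function in this patch,
and the meaning of the two placeholders (source file, then destination)
is an assumption taken from the 'cp' example.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy tmpl into out, replacing each "%s" with the
 * next entry of args[].  Returns 0 on success, -1 if the result does not
 * fit in outsz bytes or the number of placeholders differs from nargs.
 */
static int
expand_command(char *out, size_t outsz, const char *tmpl,
			   const char *const args[], size_t nargs)
{
	size_t		used = 0;
	size_t		next = 0;

	assert(out != NULL && tmpl != NULL && outsz > 0);

	while (*tmpl != '\0')
	{
		if (tmpl[0] == '%' && tmpl[1] == 's')
		{
			size_t		len;

			if (next >= nargs)
				return -1;		/* more placeholders than arguments */
			len = strlen(args[next]);
			if (used + len >= outsz)
				return -1;		/* no room left for text plus NUL */
			memcpy(out + used, args[next], len);
			used += len;
			next++;
			tmpl += 2;
		}
		else
		{
			if (used + 1 >= outsz)
				return -1;		/* no room left for char plus NUL */
			out[used++] = *tmpl++;
		}
	}
	out[used] = '\0';
	return (next == nargs) ? 0 : -1;	/* unused arguments are an error */
}
```

[Expanding the template by hand, rather than passing an administrator
supplied string to snprintf() as the format, keeps a stray conversion
such as %d or %n in the setting from being interpreted.]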
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.113
diff -c -r1.113 postgresql.conf.sample
*** src/backend/utils/misc/postgresql.conf.sample	7 Apr 2004 05:05:50 -0000	1.113
--- src/backend/utils/misc/postgresql.conf.sample	28 Jun 2004 20:11:37 -0000
***************
*** 103,108 ****
--- 103,119 ----
  
  
  #---------------------------------------------------------------------------
+ # ARCHIVING
+ #---------------------------------------------------------------------------
+ 
+ # - Settings -
+ 
+ #archive_mode = true		# enables archiving of full txn log files
+ #archive_dest = '/tmp'        # specifies destination of archive files
+ #archive_program = 'cp %s %s'   # external archiving program command line
+ 
+ 
+ #---------------------------------------------------------------------------
  # QUERY TUNING
  #---------------------------------------------------------------------------
  
Index: src/bin/initdb/initdb.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/bin/initdb/initdb.c,v
retrieving revision 1.40
diff -c -r1.40 initdb.c
*** src/bin/initdb/initdb.c	24 Jun 2004 19:26:59 -0000	1.40
--- src/bin/initdb/initdb.c	28 Jun 2004 20:11:37 -0000
***************
*** 1828,1834 ****
  	char	   *pgdenv;			/* PGDATA value got from sent to
  								 * environment */
  	char	   *subdirs[] =
! 	{"global", "pg_xlog", "pg_clog", "base", "base/1", "pg_tblspc"};
  
  	progname = get_progname(argv[0]);
  	set_pglocale_pgservice(argv[0], "initdb");
--- 1828,1834 ----
  	char	   *pgdenv;			/* PGDATA value got from sent to
  								 * environment */
  	char	   *subdirs[] =
! 	{"global", "pg_xlog", "pg_xlog/archive_status", "pg_clog", "base", "base/1", "pg_tblspc"};
  
  	progname = get_progname(argv[0]);
  	set_pglocale_pgservice(argv[0], "initdb");
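
[Note on pg_xlog/archive_status: the xlog.c hunks write a
"<segment>.done" file into this directory and consult the archive status
before recycling a segment. The sketch below restates that decision in
isolation. All three function names are hypothetical, and it is an
assumption that XLogArchiveDone() amounts to checking for the .done
file. It also assumes that with archive_mode off a segment is always
recyclable, which is how the pre-patch code behaved.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: true if name looks like a 16-hex-digit xlog segment name. */
static bool
is_xlog_segment_name(const char *name)
{
	return strlen(name) == 16 &&
		strspn(name, "0123456789ABCDEF") == 16;
}

/*
 * Hypothetical: build "<status_dir>/<segment>.done" in out.
 * Returns 0 on success, -1 if the path does not fit in outsz bytes.
 */
static int
archive_done_path(char *out, size_t outsz,
				  const char *status_dir, const char *segment)
{
	int			n;

	assert(is_xlog_segment_name(segment));
	n = snprintf(out, outsz, "%s/%s.done", status_dir, segment);
	return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}

/*
 * Hypothetical: may an old segment be recycled or removed?
 *  - not archiving: yes (assumption, matches pre-patch behaviour)
 *  - end-of-recovery cleanup: yes, even without a status file
 *  - otherwise: only once the archiver has marked it done
 */
static bool
may_recycle(bool archive_mode, bool in_recovery_cleanup, bool archive_done)
{
	if (!archive_mode)
		return true;
	return in_recovery_cleanup || archive_done;
}
```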
Index: src/include/access/xlog.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/access/xlog.h,v
retrieving revision 1.51
diff -c -r1.51 xlog.h
*** src/include/access/xlog.h	29 May 2004 22:48:22 -0000	1.51
--- src/include/access/xlog.h	28 Jun 2004 20:11:37 -0000
***************
*** 210,215 ****
--- 210,219 ----
  extern int	XLOGbuffers;
  extern char *XLOG_sync_method;
  extern const char XLOG_sync_method_default[];
+ extern bool 			XLogArchiveMode;
+ extern bool 			XLogArchiveDEBUG;
+ extern char 			*XLogArchiveDest;
+ extern char 			*XLogArchiveProgram;
  
  #ifdef WAL_DEBUG
  extern bool	XLOG_DEBUG;
Index: src/include/storage/pmsignal.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/storage/pmsignal.h,v
retrieving revision 1.8
diff -c -r1.8 pmsignal.h
*** src/include/storage/pmsignal.h	29 May 2004 22:48:23 -0000	1.8
--- src/include/storage/pmsignal.h	28 Jun 2004 20:11:38 -0000
***************
*** 24,29 ****
--- 24,30 ----
  {
  	PMSIGNAL_PASSWORD_CHANGE,	/* pg_pwd file has changed */
  	PMSIGNAL_WAKEN_CHILDREN,	/* send a SIGUSR1 signal to all backends */
+   	PMSIGNAL_WAKEN_ARCHIVER,	/* send a SIGUSR1 signal to the archiver */
  
  	NUM_PMSIGNALS				/* Must be last value of enum! */
  } PMSignalReason;
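
[Note on PMSIGNAL_WAKEN_ARCHIVER: the postmaster forwards this wakeup to
the archiver with kill(). POSIX defines kill() with a pid of 0 as
signalling every process in the caller's process group, and PgArchPID is
0 whenever no archiver is running. The sketch below shows the condition
under which forwarding the wakeup is safe. archiver_wakeup_allowed() and
wake_archiver() are hypothetical names, not functions in this patch.]

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical: forward the wakeup only when archiving is enabled, an
 * archiver process actually exists, and no shutdown is in progress.
 */
static bool
archiver_wakeup_allowed(bool archive_mode, pid_t archiver_pid,
						bool shutting_down)
{
	/* pid 0 means "no archiver"; kill(0, ...) would hit the whole group */
	return archive_mode && archiver_pid > 0 && !shutting_down;
}

/*
 * Hypothetical: returns 1 if SIGUSR1 was sent, 0 if the wakeup was
 * skipped, -1 if kill() itself failed.
 */
static int
wake_archiver(bool archive_mode, pid_t archiver_pid, bool shutting_down)
{
	if (!archiver_wakeup_allowed(archive_mode, archiver_pid, shutting_down))
		return 0;
	assert(archiver_pid > 0);
	return (kill(archiver_pid, SIGUSR1) == 0) ? 1 : -1;
}
```

[A skipped wakeup is harmless as long as the archiver scans for pending
segments when it is next started, which is an assumption about pgarch.c
rather than something shown in this part of the patch.]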