Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
> 
> > "Timelines only change at shutdown checkpoints".
> 
> Hmm.  I *think* that that is just a debugging crosscheck rather than a
> critical property.  But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

The important thing is that everybody uses the new timelineid.

AFAICS we actually write new timelineids into the WAL file if the
previous timelineid, so it can't be that risky.

Ignore v5 then and look for v6 on Monday. Hopefully the final one :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
>> Do we really need a checkpoint there at all?

> "Timelines only change at shutdown checkpoints".

Hmm.  I *think* that that is just a debugging crosscheck rather than a
critical property.  But yeah, it would take some close investigation,
which maybe isn't warranted if you have a less-invasive solution.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Having some trouble trying to get a clean state change from recovery to
> > normal mode.
> 
> > Startup needs to be able to write WAL at the end of recovery so it can
> > write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> > that. Other services are still not fully up yet. So every other process
> > apart from Startup musn't write WAL until Startup has fully completed
> > its actions, which is sometime later.
> > ...
> > We *might* solve this by making the final checkpoint that Startup writes
> > into an online checkpoint.
> 
> Do we really need a checkpoint there at all?  I can't recall right now
> if there was a good reason why Vadim made it do that.  I have a feeling
> that it might have had to do with an assumption that the recovery
> environment was completely distinct from live environment; which is a
> statement that's certainly not going to be true in a query-answering
> slave.

Hmm, a question I hadn't considered.

We definitely don't want to do PreallocXlogFiles().

"Timelines only change at shutdown checkpoints". But its just as easy to
write a short timeline change record rather than the checkpoint itself,
so we can sort that out.

It should be sufficient to request bgwriter to perform an immediate
checkpoint, rather than have startup process perform it. That way
startup can exit and we can change to normal processing faster.

If we crash before the next checkpoint then we would revert to archive
recovery or crash recovery. The last restartpoint might well be
somewhere else. In streaming mode we would presumably have that data
locally, so not really a problem. But we musn't mark control file in
production yet, but we can do so following the next checkpoint.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Having some trouble trying to get a clean state change from recovery to
> normal mode.

> Startup needs to be able to write WAL at the end of recovery so it can
> write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> that. Other services are still not fully up yet. So every other process
> apart from Startup musn't write WAL until Startup has fully completed
> its actions, which is sometime later.
> ...
> We *might* solve this by making the final checkpoint that Startup writes
> into an online checkpoint.

Do we really need a checkpoint there at all?  I can't recall right now
if there was a good reason why Vadim made it do that.  I have a feeling
that it might have had to do with an assumption that the recovery
environment was completely distinct from live environment; which is a
statement that's certainly not going to be true in a query-answering
slave.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 09:05 +0100, Simon Riggs wrote:

> Feels like I should shutdown the bgwriter after recovery and then
> allow it to be cranked up again after normal processing starts, and do
> all of this through postmaster state changes. That way bgwriter
> doesn't need to do a dynamic state change.

This approach appears to be working nicely so far. Some ugly bits of
former patch removed.

Patch passes basic tests and changes state cleanly. 

Restarting test cycle on this patch now, confirm tomorrow.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Index: src/backend/access/transam/xlog.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.317
diff -c -r1.317 xlog.c
*** src/backend/access/transam/xlog.c	11 Aug 2008 11:05:10 -	1.317
--- src/backend/access/transam/xlog.c	18 Sep 2008 11:39:07 -
***
*** 119,124 
--- 119,126 
  
  /* Are we doing recovery from XLOG? */
  bool		InRecovery = false;
+ bool		reachedSafeStopPoint = false;
+ bool		StartupCanWriteWAL = false;
  
  /* Are we recovering using offline XLOG archives? */
  static bool InArchiveRecovery = false;
***
*** 131,137 
  static bool recoveryTarget = false;
  static bool recoveryTargetExact = false;
  static bool recoveryTargetInclusive = true;
- static bool recoveryLogRestartpoints = false;
  static TransactionId recoveryTargetXid;
  static TimestampTz recoveryTargetTime;
  static TimestampTz recoveryLastXTime = 0;
--- 133,138 
***
*** 286,295 
--- 287,298 
  /*
   * Total shared-memory state for XLOG.
   */
+ #define	XLOGCTL_BUFFER_SPACING	128
  typedef struct XLogCtlData
  {
  	/* Protected by WALInsertLock: */
  	XLogCtlInsert Insert;
+ 	char	InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
  
  	/* Protected by info_lck: */
  	XLogwrtRqst LogwrtRqst;
***
*** 297,305 
--- 300,315 
  	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
  	TransactionId ckptXid;
  	XLogRecPtr	asyncCommitLSN; /* LSN of newest async commit */
+ 	/* add data structure padding for above info_lck declarations */
+ 	char	InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst) 
+ - sizeof(XLogwrtResult)
+ - sizeof(uint32)
+ - sizeof(TransactionId)
+ - sizeof(XLogRecPtr)];
  
  	/* Protected by WALWriteLock: */
  	XLogCtlWrite Write;
+ 	char	WritePadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlWrite)];
  
  	/*
  	 * These values do not change after startup, although the pointed-to pages
***
*** 311,316 
--- 321,337 
  	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
  	TimeLineID	ThisTimeLineID;
  
+ 	/*
+ 	 * canWriteWAL changes at the end of recovery only and is only ever set
+ 	 * by the Startup process. We assume that changes to it are atomic,
+ 	 * so accesses to it is never locked. When it does change bgwriter
+ 	 * must immediately begin using it, since this helps it decide whether
+ 	 * to flush WAL or not when it writes dirty blocks. If bgwriter does
+ 	 * it too soon, we will write invalid WAL records and if it reflects the
+ 	 * change too late it could skip flushing WAL for a data block change.
+ 	 */
+ 	bool		canWriteWAL;
+ 
  	slock_t		info_lck;		/* locks shared variables shown above */
  } XLogCtlData;
  
***
*** 480,485 
--- 501,510 
  	bool		doPageWrites;
  	bool		isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
  
+ 	/* cross-check on whether we should be here or not */
+ 	if (!(canWriteWAL() || (InRecovery && StartupCanWriteWAL)))
+ 		elog(FATAL, "cannot make new WAL entries during recovery");
+ 
  	/* info's high bits are reserved for use by me */
  	if (info & XLR_INFO_MASK)
  		elog(PANIC, "invalid xlog info mask %02X", info);
***
*** 1677,1684 
  	XLogRecPtr	WriteRqstPtr;
  	XLogwrtRqst WriteRqst;
  
! 	/* Disabled during REDO */
! 	if (InRedo)
  		return;
  
  	/* Quick exit if already known flushed */
--- 1702,1709 
  	XLogRecPtr	WriteRqstPtr;
  	XLogwrtRqst WriteRqst;
  
! 	/* Disabled during StartupXLog */
! 	if (!canWriteWAL())
  		return;
  
  	/* Quick exit if already known flushed */
***
*** 1766,1774 
  	 * the bad page is encountered again during recovery then we would be
  	 * unable to restart the database at all!  (This scenario has actually
  	 * happened in the field several times with 7.1 releases. Note that we
! 	 * cannot get here while InRedo is true, but if the bad page is brought in
! 	 * and marked dirty during recovery then CreateCheckPoint will try to
! 	 * flush it at the end of recovery.)
  	 *
  	 * The current approach is to ERROR under normal conditions, but only
  	 * WARNING during recovery, so that the system can be brought up even if
--- 1791,1799 
  	 * the bad page is encountered again 

Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Tue, 2008-09-16 at 15:45 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> wrote:
> > Testing takes a while on this, I probably won't complete it until
> > Friday. So enclosed patch is for eyeballs only at this stage.
> 
> What's the status on that patch?

Having some trouble trying to get a clean state change from recovery to
normal mode.

Startup needs to be able to write WAL at the end of recovery so it can
write a ShutdownCheckpoint, yet must not be allowed to write WAL before
that. Other services are still not fully up yet. So every other process
apart from Startup musn't write WAL until Startup has fully completed
its actions, which is sometime later.

bgwriter needs to know about the impending state change so it can finish
off any checkpoint its currently working on. But then can't start doing
normal processing yet either. So it must have a third state that is
between recovery and normal processing. When normal processing is
reached, it *must* write WAL immediately from that point onwards, yet
using the new timeline that startup decides upon.

So the idea of a single database-wide boolean state seems impossible. We
need a local canInsertWAL flag that is set at different times in
different processes.

Global states changes are now

not started
started - postmaster process then startup process
safestoppingpoint - bgwriter starts
startup_does_shutdown_checkpoint - bgwriter finishes up, just waits,
   startup is now allowed to insert WAL for checkpoint record
startup completes StartupXLog - bgwriter begins normal processing
startup exits - postmaster in normal state

We *might* solve this by making the final checkpoint that Startup writes
into an online checkpoint. That would then allow a straight and safe
transition for bgwriter from just one state to the other. But that
leaves some other ugliness that I don't want to deal with, 'cos there's
other fish frying.

Feels like I should shutdown the bgwriter after recovery and then allow
it to be cranked up again after normal processing starts, and do all of
this through postmaster state changes. That way bgwriter doesn't need to
do a dynamic state change.

Comments anyone?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches