Re: [HACKERS] Refactoring standby mode logic

2012-12-03 Thread Simon Riggs
On 29 November 2012 09:06, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 The code that reads the WAL from the archive, from pg_xlog, and from a
 master server via walreceiver, is quite complicated. I'm talking about the
 WaitForWALToBecomeAvailable() function in xlog.c. I got frustrated with that
 while working on the switching timeline over streaming replication patch.

 Attached is a patch to refactor that logic into a more straightforward state
 machine. It's always been a kind of a state machine, but it's been hard to
 see, as the code wasn't explicitly written that way. Any objections?

 The only user-visible effect is that this slightly changes the order that
 recovery tries to read files from the archive, and pg_xlog, in the presence
 of multiple timelines. At the moment, if recovery fails to find a file on
 current timeline in the archive, it then tries to find it in pg_xlog. If
 it's not found there either, it checks if the file on next timeline exists
 in the archive, and then checks if exists in pg_xlog. For example, if we're
 currently recovering timeline 2, and target timeline is 4, and we're looking
 for WAL file A, the files are searched for in this order:

 1. File 0004000A in archive
 2. File 0004000A in pg_xlog
 3. File 0003000A in archive
 4. File 0003000A in pg_xlog
 5. File 0002000A in archive
 6. File 0002000A in pg_xlog

 With this patch, the order is:

 1. File 0004000A in archive
 2. File 0003000A in archive
 3. File 0002000A in archive
 4. File 0004000A in pg_xlog
 5. File 0003000A in pg_xlog
 6. File 0002000A in pg_xlog

 This change should have no effect in normal restore scenarios. It'd only
 make a difference if some files in the middle of the sequence of WAL files
 are missing from the archive, but have been copied to pg_xlog manually, and
 only if that file contains a timeline switch. Even then, I think I like the
 new order better; it's easier to explain if nothing else.

Sorry, forgot to say fine by me.

This probably helps the avoidance of shutdown checkpoints, since for
that, we need to skip retrieving from archive once we're up.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring standby mode logic

2012-12-03 Thread Heikki Linnakangas

On 30.11.2012 13:11, Dimitri Fontaine wrote:

Hi,

Heikki Linnakangashlinnakan...@vmware.com  writes:

Attached is a patch to refactor that logic into a more straightforward state
machine. It's always been a kind of a state machine, but it's been hard to
see, as the code wasn't explicitly written that way. Any objections?


On a quick glance over, looks good to me. Making that code simpler to
read and reason about seems a good goal.


Thanks.


This change should have no effect in normal restore scenarios. It'd only
make a difference if some files in the middle of the sequence of WAL files
are missing from the archive, but have been copied to pg_xlog manually, and
only if that file contains a timeline switch. Even then, I think I like the
new order better; it's easier to explain if nothing else.


I'm not understanding the sequence difference well enough to comment
here, but I think some people are currently playing tricks in their
failover scripts with moving files directly to the pg_xlog of the server
to be promoted.


That's still perfectly ok. It's only if you have a diverged timeline 
history, and you have files from one timeline in the archive and files 
from another in pg_xlog that you'll see a difference. But in such a 
split situation, it's quite arbitrary which timeline recovery will 
follow anyway, I don't think anyone can sanely rely on either behavior.



Is it possible for your refactoring to keep the old sequence?


Hmm, perhaps, but I think it would complicate the logic a bit. Doesn't 
seem worth it.


Committed..

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring standby mode logic

2012-12-03 Thread Dimitri Fontaine
Heikki Linnakangas hlinnakan...@vmware.com writes:
 I'm not understanding the sequence difference well enough to comment
 here, but I think some people are currently playing tricks in their
 failover scripts with moving files directly to the pg_xlog of the server
 to be promoted.

 That's still perfectly ok. It's only if you have a diverged timeline
 history, and you have files from one timeline in the archive and files from
 another in pg_xlog that you'll see a difference. But in such a split
 situation, it's quite arbitrary which timeline recovery will follow anyway,
 I don't think anyone can sanely rely on either behavior.

Fair enough.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring standby mode logic

2012-11-30 Thread Dimitri Fontaine
Hi,

Heikki Linnakangas hlinnakan...@vmware.com writes:
 Attached is a patch to refactor that logic into a more straightforward state
 machine. It's always been a kind of a state machine, but it's been hard to
 see, as the code wasn't explicitly written that way. Any objections?

On a quick glance over, looks good to me. Making that code simpler to
read and reason about seems a good goal.

 This change should have no effect in normal restore scenarios. It'd only
 make a difference if some files in the middle of the sequence of WAL files
 are missing from the archive, but have been copied to pg_xlog manually, and
 only if that file contains a timeline switch. Even then, I think I like the
 new order better; it's easier to explain if nothing else.

I'm not understanding the sequence difference well enough to comment
here, but I think some people are currently playing tricks in their
failover scripts with moving files directly to the pg_xlog of the server
to be promoted.

Is it possible for your refactoring to keep the old sequence?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Refactoring standby mode logic

2012-11-29 Thread Heikki Linnakangas
The code that reads the WAL from the archive, from pg_xlog, and from a 
master server via walreceiver, is quite complicated. I'm talking about 
the WaitForWALToBecomeAvailable() function in xlog.c. I got frustrated 
with that while working on the switching timeline over streaming 
replication patch.


Attached is a patch to refactor that logic into a more straightforward 
state machine. It's always been a kind of a state machine, but it's been 
hard to see, as the code wasn't explicitly written that way. Any objections?


The only user-visible effect is that this slightly changes the order 
that recovery tries to read files from the archive, and pg_xlog, in the 
presence of multiple timelines. At the moment, if recovery fails to find 
a file on current timeline in the archive, it then tries to find it in 
pg_xlog. If it's not found there either, it checks if the file on next 
timeline exists in the archive, and then checks if exists in pg_xlog. 
For example, if we're currently recovering timeline 2, and target 
timeline is 4, and we're looking for WAL file A, the files are searched 
for in this order:


1. File 0004000A in archive
2. File 0004000A in pg_xlog
3. File 0003000A in archive
4. File 0003000A in pg_xlog
5. File 0002000A in archive
6. File 0002000A in pg_xlog

With this patch, the order is:

1. File 0004000A in archive
2. File 0003000A in archive
3. File 0002000A in archive
4. File 0004000A in pg_xlog
5. File 0003000A in pg_xlog
6. File 0002000A in pg_xlog

This change should have no effect in normal restore scenarios. It'd only 
make a difference if some files in the middle of the sequence of WAL 
files are missing from the archive, but have been copied to pg_xlog 
manually, and only if that file contains a timeline switch. Even then, I 
think I like the new order better; it's easier to explain if nothing else.


- Heikki
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 512,523  static XLogwrtResult LogwrtResult = {0, 0};
  
  /*
   * Codes indicating where we got a WAL file from during recovery, or where
!  * to attempt to get one.  These are chosen so that they can be OR'd together
!  * in a bitmask state variable.
   */
! #define XLOG_FROM_ARCHIVE		(10)	/* Restored using restore_command */
! #define XLOG_FROM_PG_XLOG		(11)	/* Existing file in pg_xlog */
! #define XLOG_FROM_STREAM		(12)	/* Streamed from master */
  
  /*
   * openLogFile is -1 or a kernel FD for an open log file segment.
--- 512,529 
  
  /*
   * Codes indicating where we got a WAL file from during recovery, or where
!  * to attempt to get one.
   */
! typedef enum
! {
! 	XLOG_FROM_ANY = 0,		/* request to read WAL from any source */
! 	XLOG_FROM_ARCHIVE,		/* restored using restore_command */
! 	XLOG_FROM_PG_XLOG,		/* existing file in pg_xlog */
! 	XLOG_FROM_STREAM,		/* streamed from master */
! } XLogSource;
! 
! /* human-readable names for XLogSources, for debugging output */
! static const char *xlogSourceNames[] = { any, archive, pg_xlog, stream };
  
  /*
   * openLogFile is -1 or a kernel FD for an open log file segment.
***
*** 542,563  static XLogSegNo readSegNo = 0;
  static uint32 readOff = 0;
  static uint32 readLen = 0;
  static bool	readFileHeaderValidated = false;
! static int	readSource = 0;		/* XLOG_FROM_* code */
  
  /*
!  * Keeps track of which sources we've tried to read the current WAL
!  * record from and failed.
   */
! static int	failedSources = 0;	/* OR of XLOG_FROM_* codes */
  
  /*
   * These variables track when we last obtained some WAL data to process,
   * and where we got it from.  (XLogReceiptSource is initially the same as
   * readSource, but readSource gets reset to zero when we don't have data
!  * to process right now.)
   */
  static TimestampTz XLogReceiptTime = 0;
! static int	XLogReceiptSource = 0;		/* XLOG_FROM_* code */
  
  /* Buffer for currently read page (XLOG_BLCKSZ bytes) */
  static char *readBuf = NULL;
--- 548,575 
  static uint32 readOff = 0;
  static uint32 readLen = 0;
  static bool	readFileHeaderValidated = false;
! static XLogSource readSource = 0;		/* XLOG_FROM_* code */
  
  /*
!  * Keeps track of which source we're currently reading from. This is
!  * different from readSource in that this is always set, even when we don't
!  * currently have a WAL file open. If lastSourceFailed is set, our last
!  * attempt to read from currentSource failed, and we should try another source
!  * next.
   */
! static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
! static bool	lastSourceFailed = false;
  
  /*
   * These variables track when we last obtained some WAL data to process,
   * and where we got it from.  (XLogReceiptSource is initially the same as
   * readSource, but readSource gets reset to zero