subject:"Re\: \[HACKERS\] Re\: \[COMMITTERS\] pgsql\: Make standby server continuously retry restoring the next WAL"

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-30 Thread Heikki Linnakangas

Fujii Masao wrote:
 * Small code changes to handling of failedSources, inspired by your
 comment. No change in functionality.

 This is also available in my git repository at
 git://git.postgresql.org/git/users/heikki/postgres.git, branch xlogchanges
 
 I looked the patch and was not able to find any big problems until now.
 The attached small patch fixes the typo.

Thanks. Committed with that typo-fix, and I also added a comment
explaining how failedSources and retrying XLogPageRead() works.

I'm now happy with the standby mode logic. It was a bigger struggle than
I anticipated back in January/February, I hope others now find it
intuitive as well. I'm going to work on the documentation of this, along
the lines of the draft I posted last week.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-30 Thread Fujii Masao

On Wed, Mar 31, 2010 at 1:28 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Fujii Masao wrote:
 * Small code changes to handling of failedSources, inspired by your
 comment. No change in functionality.

 This is also available in my git repository at
 git://git.postgresql.org/git/users/heikki/postgres.git, branch xlogchanges

 I looked the patch and was not able to find any big problems until now.
 The attached small patch fixes the typo.

 Thanks. Committed with that typo-fix, and I also added a comment
 explaining how failedSources and retrying XLogPageRead() works.

Great! Thanks a lot!

This change affects various recovery patterns that we support now.
For example, normal crash recovery, archive recovery, recovery using
pg_standby, recovery in standby mode, and so on. So we would need to
test whether all of those patterns work fine with the change.
I'll do it as much as possible.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs

On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote:
 On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
  PANICing won't change the situation, so it just destroys server
  availability. If we had 1 master and 42 slaves then this behaviour would
  take down almost the whole server farm at once. Very uncool.
 
  You might have reason to prevent the server starting up at that point,
  when in standby mode, but that is not a reason to PANIC. We don't really
  want all of the standbys thinking they can be the master all at once
  either. Better to throw a serious ERROR and have the server still up and
  available for reads.
 
 OK. How about making the startup process emit WARNING, stop WAL replay and
 wait for the presence of trigger file, when an invalid record is found?
 Which keeps the server up for readonly queries. And if the trigger file is
 found, I think that the startup process should emit a FATAL, i.e., the
 server should exit immediately, to prevent the server from becoming the
 primary in a half-finished state. Also to allow such a halfway failover,
 we should provide fast failover mode as pg_standby does?

The lack of docs begins to show a lack of coherent high-level design
here. By now, I've forgotten what this thread was even about. The major
design decision in this that keeps showing up is remove pg_standby, at
all costs but no reason has ever been given for that. I do believe
there is a better way, but we won't find it by trial and error, even
if we had time to do so.

Please work on some clear docs for the failure modes in this system.
That way we can all read them and understand them, or point out further
issues. Moving straight to code is not a solution to this, since what we
need now is to all agree on the way forwards. If we ignore this, then
there is considerable risk that streaming rep will have a fatal
operational flaw.

Please just document/diagram how it works now, highlighting the problems
that still remain to be solved. We're all behind you and I'm helping
wherever I can.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas

Tom Lane wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 OK. How about making the startup process emit WARNING, stop WAL replay and
 wait for the presence of trigger file, when an invalid record is found?
 Which keeps the server up for readonly queries. And if the trigger file is
 found, I think that the startup process should emit a FATAL, i.e., the
 server should exit immediately, to prevent the server from becoming the
 primary in a half-finished state. Also to allow such a halfway failover,
 we should provide fast failover mode as pg_standby does?
 
 I find it extremely scary to read this sort of blue-sky design
 discussion going on now, two months after we were supposedly
 feature-frozen for 9.0.  We need to be looking for the *rock bottom
 minimum* amount of work to do to get 9.0 out the door in a usable
 state; not what would be nice to have later on.

Agreed, this is getting complicated. I'm already worried about the
amount of changes needed to make it work, I don't want to add any new
modes. PANIC seems like the appropriate solution for now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs

On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote:
 On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
  PANICing won't change the situation, so it just destroys server
  availability. If we had 1 master and 42 slaves then this behaviour would
  take down almost the whole server farm at once. Very uncool.
 
  You might have reason to prevent the server starting up at that point,
  when in standby mode, but that is not a reason to PANIC. We don't really
  want all of the standbys thinking they can be the master all at once
  either. Better to throw a serious ERROR and have the server still up and
  available for reads.
 
 OK. How about making the startup process emit WARNING, stop WAL replay and
 wait for the presence of trigger file, when an invalid record is found?
 Which keeps the server up for readonly queries. 

Yes. Receiving new WAL records is a completely separate activity from
running the rest of the server (in this release...).

 And if the trigger file is
 found, I think that the startup process should emit a FATAL, i.e., the
 server should exit immediately, to prevent the server from becoming the
 primary in a half-finished state. 

Please remember that half-finished is your judgment on what has
happened in the particular scenario you are considering. In many cases,
an invalid WAL record clearly and simply indicates the end of WAL and we
should start up normally.

State is a good word here. I'd like to see the server have a clear
state model with well documented transitions between them. The state
should also be externally queriable, so we can work out what its doing
and how long we can expect it to keep doing it for.

I don't want to be in a position where we are waiting for the server to
sort itself out from a complex set of retries. 

 Also to allow such a halfway failover,
 we should provide fast failover mode as pg_standby does?

Yes, we definitely need a JFDI solution for immediate failover.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs

On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote:

 PANIC seems like the appropriate solution for now.

It definitely is not. Think some more.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas

Simon Riggs wrote:
 On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote:
 And if the trigger file is
 found, I think that the startup process should emit a FATAL, i.e., the
 server should exit immediately, to prevent the server from becoming the
 primary in a half-finished state. 
 
 Please remember that half-finished is your judgment on what has
 happened in the particular scenario you are considering. In many cases,
 an invalid WAL record clearly and simply indicates the end of WAL and we
 should start up normally.

Not in the archive. An invalid WAL record in a file in pg_xlog is
usually an indication of end-of-WAL, but there should be no invalid
records in the archived WAL files, or streamed from the master.

 State is a good word here. I'd like to see the server have a clear
 state model with well documented transitions between them. The state
 should also be externally queriable, so we can work out what its doing
 and how long we can expect it to keep doing it for.

Agreed.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas

(cc'ing docs list)

Simon Riggs wrote:
 The lack of docs begins to show a lack of coherent high-level design
 here.

Yeah, I think you're right. It's becoming hard to keep track of how it's
supposed to behave.

 By now, I've forgotten what this thread was even about. The major
 design decision in this that keeps showing up is remove pg_standby, at
 all costs but no reason has ever been given for that. I do believe
 there is a better way, but we won't find it by trial and error, even
 if we had time to do so.

This has nothing to do with pg_standby.

 Please work on some clear docs for the failure modes in this system.
 That way we can all read them and understand them, or point out further
 issues. Moving straight to code is not a solution to this, since what we
 need now is to all agree on the way forwards. If we ignore this, then
 there is considerable risk that streaming rep will have a fatal
 operational flaw.
 
 Please just document/diagram how it works now, highlighting the problems
 that still remain to be solved. We're all behind you and I'm helping
 wherever I can.

Ok, here's my attempt at the docs. Read it as a replacement for the
High Availability, Load Balancing, and Replication chapter, but of
course many of the sections will be unchanged, as indicated below.

-
Chapter 25. High Availability, Load Balancing, and Replication

25.1 Comparison of different solutions

no changes


25.2 Log-Shipping Standby servers

overview from current File-based Log Shipping section. With small
changes so that it applies to the built-in standby mode as well as
pg_standby like solutions

A standby server can also be used for read-only queries. This is called
Hot Standby mode, see chapter XXX

25.2.1 Planning

Set up two servers with identical hardware ...

two first paragraphs of current File-based log-shipping / Planning section

25.2.3 Standby mode

In standby mode, the server continously applies WAL received from the
master server. The standby server can receive WAL from a WAL archive
(see restore_command) or directly from the master over a TCP connection
(streaming replication). The standby server will also attempt to restore
any WAL found in the standby's pg_xlog. That typically happens after a
server restart, to replay again any WAL streamed from the master before
the restart, but you can also manually copy files to pg_xlog at any time
to have them replayed.

At startup, the standby begins by restoring all WAL available in the
archive location, calling restore_command. Once it reaches the end of
WAL available there and restore_command fails, it tries to restore any
WAL available in the pg_xlog directory (possibly stored there by
streaming replication before restart). If that fails, and streaming
replication has been configured, the standby tries to connect to the
master server and stream WAL from it. If that fails or streaming
replication is not configured, or if the connection is disconnected
later on, the standby goes back to step 1 and tries to restoring the
file from the archive again. This loop of retries from the archive,
pg_xlog, and via streaming replication goes on until the server is
stopped or failover is triggered by a trigger file.

A corrupt or half-finished WAL file in the archive, or streamed from the
master, causes a PANIC and immediate shutdown of the standby server. A
corrupt WAL file is always a serious event which requires administrator
action. If you want to recover a WAL file known to be corrupt as far as
it can be, you can copy the file manually into pg_xlog.

Standby mode is exited and the server switches to normal operation, when
a trigger file is found (trigger_file). Before failover, it will restore
any WAL available in the archive or in pg_xlog, but won't try to connect
to the master or wait for files to become available in the archive.


25.2.4 Preparing Master for Standby servers

Set up continous archiving to a WAL archive on the master, as described
in the chapter Continous Archiving and Point-In-Time_recovery. The
archive location should be accessible from the standby even when the
master is down, ie. it should reside on the standby server itself or
another trusted server, not on the master server.

If you want to use streaming replication, set up authentication to allow
streaming replication connections. Set max_wal_senders.

Take a base backup as described in chapter Continous Archiving and
Point-In-Time_recovery / Making a Base Backup.

25.2.4.1 Authentication for streaming replication

Ensure that listen_addresses allows connections from the standby server.

current Streaming Replication / Authentication section, describing
pg_hba.conf


25.2.5 Setting up the standby server

1. Take a base backup, and copy it to the standby

2. Create a restore_command to restore files from the WAL archive.

3. Set standby_mode=on

4. If you want to use streaming replicaton, set primary_conninfo


You can use restartpoint_command to prune the archive of files no longer

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas

Simon Riggs wrote:
 On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote:
 
 PANIC seems like the appropriate solution for now.
 
 It definitely is not. Think some more.

Well, what happens now in previous versions with pg_standby et al is
that the standby starts up. That doesn't seem appropriate either.

Hmm, it would be trivial to just stay in the standby mode at a corrupt
file, continuously retrying to restore it and continue replay. If it's
genuinely corrupt, it will never succeed and the standby gets stuck at
that point. Maybe that's better; it's close to what Fujii suggested
except that you don't need a new mode for it.

I'm worried that the administrator won't notice the error promptly
because at a quick glance the server is up and running, while it's
actually stuck at the error and falling indefinitely behind the master.
Maybe if we make it a WARNING, that's enough to alleviate that. It's
true that if the standby is actively being used for read-only queries,
shutting it down to just get the administrators attention isn't good either.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
 Simon Riggs wrote:
 On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote:

 PANIC seems like the appropriate solution for now.
 It definitely is not. Think some more.
 
 Well, what happens now in previous versions with pg_standby et al is
 that the standby starts up. That doesn't seem appropriate either.
 
 Hmm, it would be trivial to just stay in the standby mode at a corrupt
 file, continuously retrying to restore it and continue replay. If it's
 genuinely corrupt, it will never succeed and the standby gets stuck at
 that point. Maybe that's better; it's close to what Fujii suggested
 except that you don't need a new mode for it.

On second thought, that's easy for the built-in standby mode, but not
for archive recovery where we're not currently retrying anything. In
archive recovery, we could throw a WARNING and start up, which would be
like the current behavior in older versions except you get a WARNING
instead of LOG, or we could PANIC. I'm leaning towards PANIC, which
makes most sense for genuine point-in-time or archive recovery (ie. not
a standby server), but I can see the rationale for WARNING too, for
pg_standby and for the sake of preserving old behavior.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas

Fujii Masao wrote:
 sources = ~failedSources;
 failedSources |= readSource;
 
 The above lines in XLogPageRead() seem not to be required in normal
 recovery case (i.e., standby_mode = off). So how about the attached
 patch?

 *** 9050,9056  next_record_is_invalid:
 --- 9047,9056 
   readSource = 0;
   
   if (StandbyMode)
 + {
 + failedSources |= readSource;
   goto retry;
 + }
   else
   return false;

That doesn't work because readSource is cleared above. But yeah,
failedSources is not needed in archive recovery, so that line can be
removed.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas

Fujii Masao wrote:
 On second thought, the following lines seem to be necessary just after
 calling XLogPageRead() since it reads new WAL file from another source.
 
  if (readSource == XLOG_FROM_STREAM || readSource == XLOG_FROM_ARCHIVE)
  emode = PANIC;
  else
  emode = emode_arg;

Yep.

Here's an updated patch, with these changes since the last patch:

* Fix the bug of a spurious PANIC in archive recovery, if the WAL ends
in the middle of a WAL record that continues over a WAL segment boundary.

* If a corrupt WAL record is found in archive or streamed from master in
standby mode, throw WARNING instead of PANIC, and keep trying. In
archive recovery (ie. standby_mode=off) it's still a PANIC. We can make
it a WARNING too, which gives the pre-9.0 behavior of starting up the
server on corruption. I prefer PANIC but the discussion is still going on.

* Small code changes to handling of failedSources, inspired by your
comment. No change in functionality.

This is also available in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch xlogchanges

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e57f22e..4aa1870 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -450,20 +450,33 @@ static uint32 openLogSeg = 0;
 static uint32 openLogOff = 0;
 
 /*
+ * Codes indicating where we got a WAL file from during recovery, or where
+ * to attempt to get one.
+ */
+#define XLOG_FROM_ARCHIVE		(10)	/* Restored using restore_command */
+#define XLOG_FROM_PG_XLOG		(11)	/* Existing file in pg_xlog */
+#define XLOG_FROM_STREAM		(12)	/* Streamed from master */
+
+/*
  * These variables are used similarly to the ones above, but for reading
  * the XLOG.  Note, however, that readOff generally represents the offset
  * of the page just read, not the seek position of the FD itself, which
  * will be just past that page. readLen indicates how much of the current
- * page has been read into readBuf.
+ * page has been read into readBuf, and readSource indicates where we got
+ * the currently open file from.
  */
 static int	readFile = -1;
 static uint32 readId = 0;
 static uint32 readSeg = 0;
 static uint32 readOff = 0;
 static uint32 readLen = 0;
+static int readSource = 0;		/* XLOG_FROM_* code */
 
-/* Is the currently open segment being streamed from primary? */
-static bool readStreamed = false;
+/*
+ * Keeps track of which sources we've tried to read the current WAL
+ * record from and failed.
+ */
+static int failedSources = 0;
 
 /* Buffer for currently read page (XLOG_BLCKSZ bytes) */
 static char *readBuf = NULL;
@@ -517,11 +530,12 @@ static bool InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
 	   bool find_free, int *max_advance,
 	   bool use_lock);
 static int XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
-			 bool fromArchive, bool notexistOk);
+			 int source, bool notexistOk);
 static int XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode,
-   bool fromArchive);
+   int sources);
 static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
 			 bool randAccess);
+static int emode_for_corrupt_record(int endofwalmode);
 static void XLogFileClose(void);
 static bool RestoreArchivedFile(char *path, const char *xlogfname,
 	const char *recovername, off_t expectedSize);
@@ -2573,7 +2587,7 @@ XLogFileOpen(uint32 log, uint32 seg)
  */
 static int
 XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
-			 bool fromArchive, bool notfoundOk)
+			 int source, bool notfoundOk)
 {
 	char		xlogfname[MAXFNAMELEN];
 	char		activitymsg[MAXFNAMELEN + 16];
@@ -2582,23 +2596,28 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 
 	XLogFileName(xlogfname, tli, log, seg);
 
-	if (fromArchive)
+	switch (source)
 	{
-		/* Report recovery progress in PS display */
-		snprintf(activitymsg, sizeof(activitymsg), waiting for %s,
- xlogfname);
-		set_ps_display(activitymsg, false);
+		case XLOG_FROM_ARCHIVE:
+			/* Report recovery progress in PS display */
+			snprintf(activitymsg, sizeof(activitymsg), waiting for %s,
+	 xlogfname);
+			set_ps_display(activitymsg, false);
 
-		restoredFromArchive = RestoreArchivedFile(path, xlogfname,
-  RECOVERYXLOG,
-  XLogSegSize);
-		if (!restoredFromArchive)
-			return -1;
-	}
-	else
-	{
-		XLogFilePath(path, tli, log, seg);
-		restoredFromArchive = false;
+			restoredFromArchive = RestoreArchivedFile(path, xlogfname,
+	  RECOVERYXLOG,
+	  XLogSegSize);
+			if (!restoredFromArchive)
+return -1;
+			break;
+
+		case XLOG_FROM_PG_XLOG:
+			XLogFilePath(path, tli, log, seg);
+			restoredFromArchive = false;
+			break;
+
+		default:
+			elog(ERROR, invalid XLogFileRead source %d, source);
 	}
 
 	fd = BasicOpenFile(path, O_RDONLY | PG_BINARY, 0);
@@

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Robert Haas

On Thu, Mar 25, 2010 at 8:55 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 * If a corrupt WAL record is found in archive or streamed from master in
 standby mode, throw WARNING instead of PANIC, and keep trying. In
 archive recovery (ie. standby_mode=off) it's still a PANIC. We can make
 it a WARNING too, which gives the pre-9.0 behavior of starting up the
 server on corruption. I prefer PANIC but the discussion is still going on.

I don't think we should be changing pre-9.0 behavior more than necessary.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs

On Thu, 2010-03-25 at 12:15 +0200, Heikki Linnakangas wrote:
 (cc'ing docs list)
 
 Simon Riggs wrote:
  The lack of docs begins to show a lack of coherent high-level design
  here.
 
 Yeah, I think you're right. It's becoming hard to keep track of how it's
 supposed to behave.

Thank you for responding to that comment positively, I am relieved.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs

On Thu, 2010-03-25 at 12:26 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote:
  
  PANIC seems like the appropriate solution for now.
  
  It definitely is not. Think some more.
 
 Well, what happens now in previous versions with pg_standby et al is
 that the standby starts up. That doesn't seem appropriate either.

Agreed. I said that also, immediately upthread.

Bottom line is I am against anyone being allowed to PANIC the server
just because their piece of it ain't working. The whole purpose of all
of this is High Availability and we don't get that if everybody keeps
stopping for a tea break every time things get tricky. Staying up when
problems occur is the only way to avoid a falling domino taking out the
whole farm.

 I'm worried that the administrator won't notice the error promptly
 because at a quick glance the server is up and running, while it's
 actually stuck at the error and falling indefinitely behind the master.
 Maybe if we make it a WARNING, that's enough to alleviate that. It's
 true that if the standby is actively being used for read-only queries,
 shutting it down to just get the administrators attention isn't good either.

That's what monitoring is for. Let's just make sure this state is
accessible, so people will notice.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Fujii Masao

On Thu, Mar 25, 2010 at 9:55 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 * Fix the bug of a spurious PANIC in archive recovery, if the WAL ends
 in the middle of a WAL record that continues over a WAL segment boundary.

 * If a corrupt WAL record is found in archive or streamed from master in
 standby mode, throw WARNING instead of PANIC, and keep trying. In
 archive recovery (ie. standby_mode=off) it's still a PANIC. We can make
 it a WARNING too, which gives the pre-9.0 behavior of starting up the
 server on corruption. I prefer PANIC but the discussion is still going on.

Seems reasonable for me.

 * Small code changes to handling of failedSources, inspired by your
 comment. No change in functionality.

 This is also available in my git repository at
 git://git.postgresql.org/git/users/heikki/postgres.git, branch xlogchanges

I looked the patch and was not able to find any big problems until now.
The attached small patch fixes the typo.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


typo.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Heikki Linnakangas

Fujii Masao wrote:
 But in the current (v8.4 or before) behavior, recovery ends normally
 when an invalid record is found in an archived WAL file. Otherwise,
 the server would never be able to start normal processing when there
 is a corrupted archived file for some reasons. So, that invalid record
 should not be treated as a PANIC if the server is not in standby mode
 or the trigger file has been created. Thought?

Hmm, true, this changes behavior over previous releases. I tend to think
that it's always an error if there's a corrupt file in the archive,
though, and PANIC is appropriate. If the administrator wants to start up
the database anyway, he can remove the corrupt file from the archive and
place it directly in pg_xlog instead.

 When I tested the patch, the following PANIC error was thrown in the
 normal archive recovery. This seems to derive from the above change.
 The detail error sequence:
 1. In ReadRecord(), emode was set to PANIC after 0001000B
was read.
 2. 0001000C including the contrecord tried to be read
by using the emode (= PANIC). But since 0001000C did
not exist, PANIC error was thrown.
 
 -
 LOG:  restored log file 0001000B from archive
 cp: cannot stat `../data.arh/0001000C': No such file
 or directory
 PANIC:  could not open file pg_xlog/0001000C (log
 file 0, segment 12): No such file or directory
 LOG:  startup process (PID 17204) was terminated by signal 6: Aborted
 LOG:  terminating any other active server processes
 -

Thanks. That's easily fixable (applies over the previous patch):

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3773,7 +3773,7 @@ retry:
pagelsn.xrecoff = 0;
}
/* Wait for the next page to become available */
-   if (!XLogPageRead(pagelsn, emode, false, false))
+   if (!XLogPageRead(pagelsn, emode_arg, false, false))
return NULL;

/* Check that the continuation record looks valid */

Perhaps the emode/emode_arg convention is a bit hard to read.

I'll go through the patch myself once more, and commit later today or
tomorrow if now new issues crop up.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao

On Wed, Mar 24, 2010 at 9:31 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Hmm, true, this changes behavior over previous releases. I tend to think
 that it's always an error if there's a corrupt file in the archive,
 though, and PANIC is appropriate. If the administrator wants to start up
 the database anyway, he can remove the corrupt file from the archive and
 place it directly in pg_xlog instead.

Okay.

 Thanks. That's easily fixable (applies over the previous patch):

 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -3773,7 +3773,7 @@ retry:
                pagelsn.xrecoff = 0;
            }
            /* Wait for the next page to become available */
 -           if (!XLogPageRead(pagelsn, emode, false, false))
 +           if (!XLogPageRead(pagelsn, emode_arg, false, false))
                return NULL;

            /* Check that the continuation record looks valid */

Seems correct.

 sources = ~failedSources;
 failedSources |= readSource;

The above lines in XLogPageRead() seem not to be required in normal
recovery case (i.e., standby_mode = off). So how about the attached
patch?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


failedSource_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao

On Wed, Mar 24, 2010 at 10:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks. That's easily fixable (applies over the previous patch):

 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -3773,7 +3773,7 @@ retry:
                pagelsn.xrecoff = 0;
            }
            /* Wait for the next page to become available */
 -           if (!XLogPageRead(pagelsn, emode, false, false))
 +           if (!XLogPageRead(pagelsn, emode_arg, false, false))
                return NULL;

            /* Check that the continuation record looks valid */

 Seems correct.

On second thought, the following lines seem to be necessary just after
calling XLogPageRead() since it reads new WAL file from another source.

   if (readSource == XLOG_FROM_STREAM || readSource == XLOG_FROM_ARCHIVE)
   emode = PANIC;
   else
   emode = emode_arg;

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Simon Riggs

On Wed, 2010-03-24 at 14:31 +0200, Heikki Linnakangas wrote:
 Fujii Masao wrote:
  But in the current (v8.4 or before) behavior, recovery ends normally
  when an invalid record is found in an archived WAL file. Otherwise,
  the server would never be able to start normal processing when there
  is a corrupted archived file for some reasons. So, that invalid record
  should not be treated as a PANIC if the server is not in standby mode
  or the trigger file has been created. Thought?
 
 Hmm, true, this changes behavior over previous releases. I tend to think
 that it's always an error if there's a corrupt file in the archive,
 though, and PANIC is appropriate. If the administrator wants to start up
 the database anyway, he can remove the corrupt file from the archive and
 place it directly in pg_xlog instead.

I don't agree with changing the behaviour from previous releases.

PANICing won't change the situation, so it just destroys server
availability. If we had 1 master and 42 slaves then this behaviour would
take down almost the whole server farm at once. Very uncool.

You might have reason to prevent the server starting up at that point,
when in standby mode, but that is not a reason to PANIC. We don't really
want all of the standbys thinking they can be the master all at once
either. Better to throw a serious ERROR and have the server still up and
available for reads.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao

On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
 PANICing won't change the situation, so it just destroys server
 availability. If we had 1 master and 42 slaves then this behaviour would
 take down almost the whole server farm at once. Very uncool.

 You might have reason to prevent the server starting up at that point,
 when in standby mode, but that is not a reason to PANIC. We don't really
 want all of the standbys thinking they can be the master all at once
 either. Better to throw a serious ERROR and have the server still up and
 available for reads.

OK. How about making the startup process emit WARNING, stop WAL replay and
wait for the presence of trigger file, when an invalid record is found?
Which keeps the server up for readonly queries. And if the trigger file is
found, I think that the startup process should emit a FATAL, i.e., the
server should exit immediately, to prevent the server from becoming the
primary in a half-finished state. Also to allow such a halfway failover,
we should provide fast failover mode as pg_standby does?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Tom Lane

Fujii Masao masao.fu...@gmail.com writes:
 OK. How about making the startup process emit WARNING, stop WAL replay and
 wait for the presence of trigger file, when an invalid record is found?
 Which keeps the server up for readonly queries. And if the trigger file is
 found, I think that the startup process should emit a FATAL, i.e., the
 server should exit immediately, to prevent the server from becoming the
 primary in a half-finished state. Also to allow such a halfway failover,
 we should provide fast failover mode as pg_standby does?

I find it extremely scary to read this sort of blue-sky design
discussion going on now, two months after we were supposedly
feature-frozen for 9.0.  We need to be looking for the *rock bottom
minimum* amount of work to do to get 9.0 out the door in a usable
state; not what would be nice to have later on.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-23 Thread Fujii Masao

Sorry for the delay.

On Fri, Mar 19, 2010 at 8:37 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Here's a patch I've been playing with.

Thanks! I'm reading the patch.

 The idea is that in standby mode,
 the server keeps trying to make progress in the recovery by:

 a) restoring files from archive
 b) replaying files from pg_xlog
 c) streaming from master

 When recovery reaches an invalid WAL record, typically caused by a
 half-written WAL file, it closes the file and moves to the next source.
 If an error is found in a file restored from archive or in a portion
 just streamed from master, however, a PANIC is thrown, because it's not
 expected to have errors in the archive or in the master.

But in the current (v8.4 or before) behavior, recovery ends normally
when an invalid record is found in an archived WAL file. Otherwise,
the server would never be able to start normal processing when there
is a corrupted archived file for some reasons. So, that invalid record
should not be treated as a PANIC if the server is not in standby mode
or the trigger file has been created. Thought?

When I tested the patch, the following PANIC error was thrown in the
normal archive recovery. This seems to derive from the above change.
The detail error sequence:
1. In ReadRecord(), emode was set to PANIC after 0001000B
   was read.
2. 0001000C including the contrecord tried to be read
   by using the emode (= PANIC). But since 0001000C did
   not exist, PANIC error was thrown.

-
LOG:  restored log file 0001000B from archive
cp: cannot stat `../data.arh/0001000C': No such file
or directory
PANIC:  could not open file pg_xlog/0001000C (log
file 0, segment 12): No such file or directory
LOG:  startup process (PID 17204) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
-

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Simon Riggs

On Thu, 2010-03-18 at 23:27 +0900, Fujii Masao wrote:

 I agree that this is a bigger problem. Since the standby always starts
 walreceiver before replaying any WAL files in pg_xlog, walreceiver tries
 to receive the WAL files following the REDO starting point even if they
 have already been in pg_xlog. IOW, the same WAL files might be shipped
 from the primary to the standby many times. This behavior is unsmart,
 and should be addressed.

We might also have written half a file many times. The files in pg_xlog
are suspect whereas the files in the archive are not. If we have both we
should prefer the archive.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas

Simon Riggs wrote:
 On Thu, 2010-03-18 at 23:27 +0900, Fujii Masao wrote:
 
 I agree that this is a bigger problem. Since the standby always starts
 walreceiver before replaying any WAL files in pg_xlog, walreceiver tries
 to receive the WAL files following the REDO starting point even if they
 have already been in pg_xlog. IOW, the same WAL files might be shipped
 from the primary to the standby many times. This behavior is unsmart,
 and should be addressed.
 
 We might also have written half a file many times. The files in pg_xlog
 are suspect whereas the files in the archive are not. If we have both we
 should prefer the archive.

Yep.

Here's a patch I've been playing with. The idea is that in standby mode,
the server keeps trying to make progress in the recovery by:

a) restoring files from archive
b) replaying files from pg_xlog
c) streaming from master

When recovery reaches an invalid WAL record, typically caused by a
half-written WAL file, it closes the file and moves to the next source.
If an error is found in a file restored from archive or in a portion
just streamed from master, however, a PANIC is thrown, because it's not
expected to have errors in the archive or in the master.

When a file is streamed from master, it's left in pg_xlog, so it's found
there after a standby restart, and recovery can progress to the same
point as before restart. It also means that you can copy partial WAL
files to pg_xlog at any time and have them replayed in a few seconds.

The code structure is a bit spaghetti-like, I'm afraid. Any suggestions
on how to improve that are welcome..

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 445,464  static uint32 openLogSeg = 0;
  static uint32 openLogOff = 0;
  
  /*
   * These variables are used similarly to the ones above, but for reading
   * the XLOG.  Note, however, that readOff generally represents the offset
   * of the page just read, not the seek position of the FD itself, which
   * will be just past that page. readLen indicates how much of the current
!  * page has been read into readBuf.
   */
  static int	readFile = -1;
  static uint32 readId = 0;
  static uint32 readSeg = 0;
  static uint32 readOff = 0;
  static uint32 readLen = 0;
  
! /* Is the currently open segment being streamed from primary? */
! static bool readStreamed = false;
  
  /* Buffer for currently read page (XLOG_BLCKSZ bytes) */
  static char *readBuf = NULL;
--- 445,477 
  static uint32 openLogOff = 0;
  
  /*
+  * Codes indicating where we got a WAL file from during recovery, or where
+  * to attempt to get one.
+  */
+ #define XLOG_FROM_ARCHIVE		(10)	/* Restored using restore_command */
+ #define XLOG_FROM_PG_XLOG		(11)	/* Existing file in pg_xlog */
+ #define XLOG_FROM_STREAM		(12)	/* Streamed from master */
+ 
+ /*
   * These variables are used similarly to the ones above, but for reading
   * the XLOG.  Note, however, that readOff generally represents the offset
   * of the page just read, not the seek position of the FD itself, which
   * will be just past that page. readLen indicates how much of the current
!  * page has been read into readBuf, and readSource indicates where we got
!  * the currently open file from.
   */
  static int	readFile = -1;
  static uint32 readId = 0;
  static uint32 readSeg = 0;
  static uint32 readOff = 0;
  static uint32 readLen = 0;
+ static int readSource = 0;		/* XLOG_FROM_* code */
  
! /*
!  * Keeps track of which sources we've tried to read the current WAL
!  * record from and failed.
!  */
! static int failedSources = 0;
  
  /* Buffer for currently read page (XLOG_BLCKSZ bytes) */
  static char *readBuf = NULL;
***
*** 512,520  static bool InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
  	   bool find_free, int *max_advance,
  	   bool use_lock);
  static int XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
! 			 bool fromArchive, bool notexistOk);
  static int XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode,
!    bool fromArchive);
  static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
  			 bool randAccess);
  static void XLogFileClose(void);
--- 525,533 
  	   bool find_free, int *max_advance,
  	   bool use_lock);
  static int XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
! 			 int source, bool notexistOk);
  static int XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode,
!    int sources);
  static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
  			 bool randAccess);
  static void XLogFileClose(void);
***
*** 2567,2573  XLogFileOpen(uint32 log, uint32 seg)
   */
  static int
  XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
! 			 bool fromArchive, bool notfoundOk)
  {
  	char		xlogfname[MAXFNAMELEN];
  	char		activitymsg[MAXFNAMELEN + 16];

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Simon Riggs wrote:
 We might also have written half a file many times. The files in pg_xlog
 are suspect whereas the files in the archive are not. If we have both we
 should prefer the archive.

 Yep.

Really?  That will result in a change in the longstanding behavior of
ordinary recovery.  I'm unconvinced that this is wise.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas

Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Simon Riggs wrote:
 We might also have written half a file many times. The files in pg_xlog
 are suspect whereas the files in the archive are not. If we have both we
 should prefer the archive.
 
 Yep.
 
 Really?  That will result in a change in the longstanding behavior of
 ordinary recovery.

Really? Not as far as I can see. Recovery has always tried to restore
WAL files from archive first, falling back to the copy in pg_xlog only
if restore_command returns failure.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Alvaro Herrera

Heikki Linnakangas escribió:

 When recovery reaches an invalid WAL record, typically caused by a
 half-written WAL file, it closes the file and moves to the next source.
 If an error is found in a file restored from archive or in a portion
 just streamed from master, however, a PANIC is thrown, because it's not
 expected to have errors in the archive or in the master.

Hmm, I think I've heard that tools like walmgr do incremental copies of
the current WAL segment to the archive.  Doesn't this change break that?
(Maybe I'm confused and it doesn't work that way)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas

Alvaro Herrera wrote:
 Heikki Linnakangas escribió:
 
 When recovery reaches an invalid WAL record, typically caused by a
 half-written WAL file, it closes the file and moves to the next source.
 If an error is found in a file restored from archive or in a portion
 just streamed from master, however, a PANIC is thrown, because it's not
 expected to have errors in the archive or in the master.
 
 Hmm, I think I've heard that tools like walmgr do incremental copies of
 the current WAL segment to the archive.  Doesn't this change break that?

Hmm, you could have a restore_command that checks the size before
restoring to make it still work. I note that pg_standby does that, but
of course you can't use pg_standby with the built-in standby mode. Or
maybe we should modify the built-in standby mode to handle partial files
coming from restore_command by not throwing an error but recovering to
the end of the partial file, and then retrying restore_command again
with the same filename until the whole file is recovered (or the missing
WAL is received through other means, ie. streaming replication).

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-18 Thread Fujii Masao

On Wed, Mar 17, 2010 at 7:35 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Fujii Masao wrote:
 I found another missing feature in new file-based log shipping (i.e.,
 standby_mode is enabled and 'cp' is used as restore_command).

 After the trigger file is found, the startup process with pg_standby
 tries to replay all of the WAL files in both pg_xlog and the archive.
 So, when the primary fails, if the latest WAL file in pg_xlog of the
 primary can be read, we can prevent the data loss by copying it to
 pg_xlog of the standby before creating the trigger file.

 On the other hand, the startup process with standby mode doesn't
 replay the WAL files in pg_xlog after the trigger file is found. So
 failover always causes the data loss even if the latest WAL file can
 be read from the primary. And if the latest WAL file is copied to the
 archive instead, it can be replayed but a PANIC error would happen
 because it's not filled.

 We should remove this restriction?

 Looking into this, I realized that we have a bigger problem related to
 this. Although streaming replication stores the streamed WAL files in
 pg_xlog, so that they can be re-replayed after a standby restart without
 connecting to the master, we don't try to replay those either. So if you
 restart standby, it will fail to start up if the WAL it needs can't be
 found in archive or by connecting to the master. That must be fixed.

I agree that this is a bigger problem. Since the standby always starts
walreceiver before replaying any WAL files in pg_xlog, walreceiver tries
to receive the WAL files following the REDO starting point even if they
have already been in pg_xlog. IOW, the same WAL files might be shipped
from the primary to the standby many times. This behavior is unsmart,
and should be addressed.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-17 Thread Heikki Linnakangas

Fujii Masao wrote:
 I found another missing feature in new file-based log shipping (i.e.,
 standby_mode is enabled and 'cp' is used as restore_command).
 
 After the trigger file is found, the startup process with pg_standby
 tries to replay all of the WAL files in both pg_xlog and the archive.
 So, when the primary fails, if the latest WAL file in pg_xlog of the
 primary can be read, we can prevent the data loss by copying it to
 pg_xlog of the standby before creating the trigger file.
 
 On the other hand, the startup process with standby mode doesn't
 replay the WAL files in pg_xlog after the trigger file is found. So
 failover always causes the data loss even if the latest WAL file can
 be read from the primary. And if the latest WAL file is copied to the
 archive instead, it can be replayed but a PANIC error would happen
 because it's not filled.
 
 We should remove this restriction?

Looking into this, I realized that we have a bigger problem related to
this. Although streaming replication stores the streamed WAL files in
pg_xlog, so that they can be re-replayed after a standby restart without
connecting to the master, we don't try to replay those either. So if you
restart standby, it will fail to start up if the WAL it needs can't be
found in archive or by connecting to the master. That must be fixed.

I'd imagine that the ability to restore WAL files manually copied to
pg_xlog will fall out of that fix too.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-17 Thread Simon Riggs

On Wed, 2010-03-17 at 12:35 +0200, Heikki Linnakangas wrote:

 Looking into this, I realized that we have a bigger problem...

A lot of this would be easier if you do the docs first, then work
through the problems. The new system is more complex, since it has two
modes rather than one and also multiple processes and a live connection.
The number of failure cases must be higher than previously.

Documenting how it is supposed to work in the event of failure will help
everyone check those and comment on them.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-18 Thread Fujii Masao

On Fri, Feb 12, 2010 at 2:29 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 So the only major feature we're missing is the ability to clean up old
 files.

I found another missing feature in new file-based log shipping (i.e.,
standby_mode is enabled and 'cp' is used as restore_command).

After the trigger file is found, the startup process with pg_standby
tries to replay all of the WAL files in both pg_xlog and the archive.
So, when the primary fails, if the latest WAL file in pg_xlog of the
primary can be read, we can prevent the data loss by copying it to
pg_xlog of the standby before creating the trigger file.

On the other hand, the startup process with standby mode doesn't
replay the WAL files in pg_xlog after the trigger file is found. So
failover always causes the data loss even if the latest WAL file can
be read from the primary. And if the latest WAL file is copied to the
archive instead, it can be replayed but a PANIC error would happen
because it's not filled.

We should remove this restriction?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-14 Thread Fujii Masao

On Sat, Feb 13, 2010 at 1:10 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Are you thinking of a scenario where remove_command gets stuck, and
 prevents bgwriter from performing restartpoints while it's stuck?

Yes. If there is the archive in the remote server and the network outage
happens, remove_command might get stuck, I'm afraid.

 You
 have trouble if restore_command gets stuck like that as well, so I think
 we can require that the remove_command returns in a reasonable period of
 time, ie. in a few minutes.

Oh, you are right!

BTW, we need to note that remove_command approach would be useless if one
archive is shared by multiple standbys. One standby might wrongly remove
the archived WAL file that has been still required for another standby.
In this case, we need to have the job script that calculates the archived
WAL files that are required by no standbys, and removes them.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Simon Riggs

On Fri, 2010-02-12 at 14:38 +0900, Fujii Masao wrote:
 On Thu, Feb 11, 2010 at 11:22 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  Simon Riggs wrote:
  Might it not be simpler to add a parameter onto pg_standby?
  We send %s to tell pg_standby the standby_mode of the server which is
  calling it so it can decide how to act in each case.
 
  That would work too, but it doesn't seem any simpler to me. On the contrary.
 
 Agreed.
 
 There could be three kinds of SR configurations. Let's think of them 
 separately.
 
 (1) SR without restore_command
 (2) SR with 'cp'
 (3) SR with pg_standby

Thanks for the explanation.

 (1) is the straightforward configuration. In this case the standby replays 
 only
 the WAL files in pg_xlog directory, and starts SR when it has found the 
 invalid
 record or been able to find no more WAL file. Then if SR is terminated for 
 some
 reasons, the standby would periodically try to connect to the primary and 
 start
 SR again. If you choose this, you don't need to care about the problem 
 discussed
 on the thread.
 
 In the (2) case the standby replays the WAL files in not only pg_xlog but also
 the archive, and starts SR when it has found the invalid record or been able 
 to
 find no more WAL file. If the archive is shared between the primary and the
 standby, the standby might restore the partial WAL file being archived 
 (copied)
 by the primary. This could happen because 'cp' is not an atomic operation.
 
 Currently when the standby finds the WAL file whose file size is less than 
 16MB,
 it emits the FATAL error. This is the problem that I presented upthread. That 
 is
 undesirable behavior, so I proposed to just treat that case the same as if no
 more WAL file is found. If so, the standby can start SR instead of emitting 
 the
 FATAL error. (2) is useful configuration as described in Heikki's
 commig message.
 http://archives.postgresql.org/pgsql-committers/2010-01/msg00395.php

 (3) was unexpected configuration (at least for me). This would work fine as a
 *file-based* log shipping but not SR. Since pg_standby doesn't return when no
 more WAL file is found in the archive (i.e., it waits until the next complete
 WAL file is available), SR will never start. OTOH, since pg_standby treats the
 partial file as nonexistence, the problem discussed on this thread doesn't
 happen.

When we refer to pg_standby we mean any script. pg_standby is just a
reference implementation of a useful script. The discussion shouldn't
really focus on pg_standby, nor should we think of it as a different
approach. My original question was whether we are seeking to remove
pg_standby and, if so, have we implemented all of the technical features
that pg_standby provides? Worryingly the answer seems to be Yes and No.
I don't care if we get rid of pg_standby as long as we retain all the
things it does. *Losing* features is not acceptable.

 Questions:
 (A) Is my proposal for (2) reasonable? For me, Yes.
 (B) Should we allow (3) to work as streaming replication? In fact, we should
 create the new mode that makes pg_standby return as soon as it doesn't 
 find
 a complete WAL file in the archive? I agree with Heikki, i.e., don't think
 it's worth doing. Though pg_standby already has the capability to remove 
 the
 old WAL files, we would still need the cron job that removes them
 periodically
 because pg_standby is not executed during SR is running normally.

Yes, I realised that myself overnight and was going to raise this point
with you today. 

In 8.4 it is pg_standby that was responsible for clearing down the
archive, which is why I suggested using pg_standby for that again. I
agree that will not work. The important thing is not pg_standby but that
we have a valid mechanism for clearing down the archive.

If (2) is a fully supported replication method then we cannot rely on
the existence of an external cron job to clear down the archive. Most
importantly, that cron job would not know the point up to which to clear
down the archive, the information given to pg_standby by %r.

So I suggest that you have a new action that gets called after every
checkpoint to clear down the archive. It will remove all files from the
archive prior to %r. We can implement that as a sequence of unlink()s
from within the server, or we can just call a script to do it. I prefer
the latter approach. However we do it, we need something initiated by
the server to maintain the archive and stop it from overflowing. 

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Heikki Linnakangas

Simon Riggs wrote:
 In 8.4 it is pg_standby that was responsible for clearing down the
 archive, which is why I suggested using pg_standby for that again. I
 agree that will not work. The important thing is not pg_standby but that
 we have a valid mechanism for clearing down the archive.

Good point. When streaming from the master, the standby doesn't call
restore_command, and restore_command doesn't get a chance to clean up
old files.

 So I suggest that you have a new action that gets called after every
 checkpoint to clear down the archive. It will remove all files from the
 archive prior to %r. We can implement that as a sequence of unlink()s
 from within the server, or we can just call a script to do it. I prefer
 the latter approach. However we do it, we need something initiated by
 the server to maintain the archive and stop it from overflowing. 

+1

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Simon Riggs

On Fri, 2010-02-12 at 12:54 +, Simon Riggs wrote:

 So I suggest that you have a new action that gets called after every
 checkpoint to clear down the archive. It will remove all files from the
 archive prior to %r. We can implement that as a sequence of unlink()s
 from within the server, or we can just call a script to do it. I prefer
 the latter approach. However we do it, we need something initiated by
 the server to maintain the archive and stop it from overflowing. 

Attached patch implements pg_standby for use as an
archive_cleanup_command, reusing existing code with new -a option.

e.g.

archive_cleanup_command = 'pg_standby -a -d /mnt/server/archiverdir %r'

Happy to add the archive_cleanup_command into main server as well, if
you like. Won't take long.

-- 
 Simon Riggs   www.2ndQuadrant.com
Index: contrib/pg_standby/pg_standby.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/contrib/pg_standby/pg_standby.c,v
retrieving revision 1.27
diff -c -r1.27 pg_standby.c
*** contrib/pg_standby/pg_standby.c	4 Nov 2009 12:51:30 -	1.27
--- contrib/pg_standby/pg_standby.c	12 Feb 2010 14:26:42 -
***
*** 53,58 
--- 53,59 
  int			keepfiles = 0;		/* number of WAL files to keep, 0 keep all */
  int			maxretries = 3;		/* number of retries on restore command */
  bool		debug = false;		/* are we debugging? */
+ bool		cleanup_only = false;		/* -a option new for 9.0 */
  bool		need_cleanup = false;		/* do we need to remove files from
  		 * archive? */
  
***
*** 243,249 
  	/*
  	 * Work out name of prior file from current filename
  	 */
! 	if (nextWALFileType == XLOG_DATA)
  	{
  		int			rc;
  		DIR		   *xldir;
--- 244,250 
  	/*
  	 * Work out name of prior file from current filename
  	 */
! 	if (cleanup_only || nextWALFileType == XLOG_DATA)
  	{
  		int			rc;
  		DIR		   *xldir;
***
*** 334,340 
  		 * these files from archive. This shouldn't happen, but better safe
  		 * than sorry.
  		 */
! 		if (strcmp(restartWALFileName, nextWALFileName)  0)
  			return false;
  
  		strcpy(exclusiveCleanupFileName, restartWALFileName);
--- 335,341 
  		 * these files from archive. This shouldn't happen, but better safe
  		 * than sorry.
  		 */
! 		if (!cleanup_only  strcmp(restartWALFileName, nextWALFileName)  0)
  			return false;
  
  		strcpy(exclusiveCleanupFileName, restartWALFileName);
***
*** 512,518 
  static void
  usage(void)
  {
! 	printf(%s allows PostgreSQL warm standby servers to be configured.\n\n, progname);
  	printf(Usage:\n);
  	printf(  %s [OPTION]... ARCHIVELOCATION NEXTWALFILE XLOGFILEPATH [RESTARTWALFILE]\n, progname);
  	printf(\n
--- 513,519 
  static void
  usage(void)
  {
! 	printf(%s allows PostgreSQL standby servers to be configured.\n\n, progname);
  	printf(Usage:\n);
  	printf(  %s [OPTION]... ARCHIVELOCATION NEXTWALFILE XLOGFILEPATH [RESTARTWALFILE]\n, progname);
  	printf(\n
***
*** 520,526 
--- 521,534 
  		 restore_command = 'pg_standby [OPTION]... ARCHIVELOCATION %%f %%p %%r'\n
  		   e.g.\n
  		 restore_command = 'pg_standby -l /mnt/server/archiverdir %%f %%p %%r'\n);
+ 	printf(  %s -a ARCHIVELOCATION RESTARTWALFILE\n, progname);
+ 	printf(\n
+ 		with main intended use as an archive_cleanup_command in the recovery.conf:\n
+ 		 archive_cleanup_command = 'pg_standby -a ARCHIVELOCATION %%r'\n
+ 		   e.g.\n
+ 		 archive_cleanup_command = 'pg_standby -a -d /mnt/server/archiverdir %%r'\n);
  	printf(\nOptions:\n);
+ 	printf(  -a cleans archive only\n);
  	printf(  -c copies file from archive (default)\n);
  	printf(  -d generate lots of debugging output (testing only)\n);
  	printf(  -k NUMFILESTOKEEP  if RESTARTWALFILE not used, removes files prior to limit\n
***
*** 595,604 
  	(void) signal(SIGQUIT, sigquit_handler);
  #endif
  
! 	while ((c = getopt(argc, argv, cdk:lr:s:t:w:)) != -1)
  	{
  		switch (c)
  		{
  			case 'c':			/* Use copy */
  restoreCommandType = RESTORE_COMMAND_COPY;
  break;
--- 603,615 
  	(void) signal(SIGQUIT, sigquit_handler);
  #endif
  
! 	while ((c = getopt(argc, argv, acdk:lr:s:t:w:)) != -1)
  	{
  		switch (c)
  		{
+ 			case 'a':			/* Cleanup only */
+ cleanup_only = true;
+ break;
  			case 'c':			/* Use copy */
  restoreCommandType = RESTORE_COMMAND_COPY;
  break;
***
*** 684,734 
  		exit(2);
  	}
  
! 	if (optind  argc)
! 	{
! 		nextWALFileName = argv[optind];
! 		optind++;
! 	}
! 	else
  	{
! 		fprintf(stderr, %s: use %%f to specify nextWALFileName\n, progname);
! 		fprintf(stderr, Try \%s --help\ for more information.\n, progname);
! 		exit(2);
  	}
  
  	if (optind  argc)
  	{
! 		xlogFilePath = argv[optind];
  		optind++;
  	}
! 	else
  	{
! 		fprintf(stderr, %s: use %%p to specify xlogFilePath\n, progname);
  		fprintf(stderr, Try \%s

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Fujii Masao

On Fri, Feb 12, 2010 at 10:10 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 So I suggest that you have a new action that gets called after every
 checkpoint to clear down the archive. It will remove all files from the
 archive prior to %r. We can implement that as a sequence of unlink()s
 from within the server, or we can just call a script to do it. I prefer
 the latter approach. However we do it, we need something initiated by
 the server to maintain the archive and stop it from overflowing.

 +1

If we leave executing the remove_command to the bgwriter, the restartpoint
might not happen unfortunately for a long time. To prevent that situation, the
archiver should execute the command, I think. Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Heikki Linnakangas

Fujii Masao wrote:
 On Fri, Feb 12, 2010 at 10:10 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 So I suggest that you have a new action that gets called after every
 checkpoint to clear down the archive. It will remove all files from the
 archive prior to %r. We can implement that as a sequence of unlink()s
 from within the server, or we can just call a script to do it. I prefer
 the latter approach. However we do it, we need something initiated by
 the server to maintain the archive and stop it from overflowing.
 +1
 
 If we leave executing the remove_command to the bgwriter, the restartpoint
 might not happen unfortunately for a long time. 

Are you thinking of a scenario where remove_command gets stuck, and
prevents bgwriter from performing restartpoints while it's stuck? You
have trouble if restore_command gets stuck like that as well, so I think
we can require that the remove_command returns in a reasonable period of
time, ie. in a few minutes.

 To prevent that situation, the
 archiver should execute the command, I think. Thought?

The archiver isn't running in standby, so that's not going to work. And
it's not connected to shared memory either, so it doesn't know what the
latest restartpoint is.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Dimitri Fontaine

Simon Riggs si...@2ndquadrant.com writes:
 Attached patch implements pg_standby for use as an
 archive_cleanup_command, reusing existing code with new -a option.

 Happy to add the archive_cleanup_command into main server as well, if
 you like. Won't take long.

Would it be possible to have the server do the cleanup (and the cp for
that matter) on its own or use a script?

Either have archive_cleanup = on and either an commented out (empty)
archive_cleanup_command and the same for the restore_command, or a
special magic value, like 'postgresql' to activate the internal one.

This way we have both a zero config working default setup and the
flexibility to adapt to any archiving solution our user might already be
using.

A default archive_command would be nice too. Like you give it a
directory name or a rsync path and it does the basic right thing.

I'm not sure my detailed approach is the right one, but the birdview is
to have the simple case really simple to setup, and the complex cases
still possible. Default included archive, restore and cleanup commands
would be awesome.

Oh, and the basic simple case actually is with a remote standby. Without
NFS or the like, no shared mount point for archiving. 

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote:
 Fujii Masao wrote:
  As I pointed out previously, the standby might restore a partially-filled
  WAL file that is being archived by the primary, and cause a FATAL error.
  And this happened in my box when I was testing the SR.
  
sby [20088] FATAL:  archive file 00010087 has
  wrong size: 14139392 instead of 16777216
sby [20076] LOG:  startup process (PID 20088) exited with exit code 1
sby [20076] LOG:  terminating any other active server processes
act [18164] LOG:  received immediate shutdown request
  
  If the startup process is in standby mode, I think that it should retry
  starting replication instead of emitting an error when it finds a
  partially-filled file in the archive. Then if the replication has been
  terminated, it has only to restore the archived file again. Thought?
 
 Hmm, so after running restore_command, check the file size and if it's
 too short, treat it the same as if restore_command returned non-zero?
 And it will be retried on the next iteration. Works for me, though OTOH
 it will then fail to complain about a genuinely WAL file that's
 truncated for some reason. I guess there's no way around that, even if
 you have a script as restore_command that does the file size check, it
 will have the same problem.

Are we trying to re-invent pg_standby here?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Simon Riggs wrote:
 On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote:
 Hmm, so after running restore_command, check the file size and if it's
 too short, treat it the same as if restore_command returned non-zero?
 And it will be retried on the next iteration. Works for me, though OTOH
 it will then fail to complain about a genuinely WAL file that's
 truncated for some reason. I guess there's no way around that, even if
 you have a script as restore_command that does the file size check, it
 will have the same problem.
 
 Are we trying to re-invent pg_standby here?

That's not the goal, but we seem to need some of the same functionality
in the backend now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote:
  Hmm, so after running restore_command, check the file size and if it's
  too short, treat it the same as if restore_command returned non-zero?
  And it will be retried on the next iteration. Works for me, though OTOH
  it will then fail to complain about a genuinely WAL file that's
  truncated for some reason. I guess there's no way around that, even if
  you have a script as restore_command that does the file size check, it
  will have the same problem.
  
  Are we trying to re-invent pg_standby here?
 
 That's not the goal, but we seem to need some of the same functionality
 in the backend now.

I think you need to say why...

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Simon Riggs wrote:
 On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote:
 Hmm, so after running restore_command, check the file size and if it's
 too short, treat it the same as if restore_command returned non-zero?
 And it will be retried on the next iteration. Works for me, though OTOH
 it will then fail to complain about a genuinely WAL file that's
 truncated for some reason. I guess there's no way around that, even if
 you have a script as restore_command that does the file size check, it
 will have the same problem.
 Are we trying to re-invent pg_standby here?
 That's not the goal, but we seem to need some of the same functionality
 in the backend now.
 
 I think you need to say why...

See the quoted paragraph above. We should check the file size, so that
we will not fail if the WAL file is just being copied into the archive
directory.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 14:44 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote:
  Simon Riggs wrote:
  On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote:
  Hmm, so after running restore_command, check the file size and if it's
  too short, treat it the same as if restore_command returned non-zero?
  And it will be retried on the next iteration. Works for me, though OTOH
  it will then fail to complain about a genuinely WAL file that's
  truncated for some reason. I guess there's no way around that, even if
  you have a script as restore_command that does the file size check, it
  will have the same problem.
  Are we trying to re-invent pg_standby here?
  That's not the goal, but we seem to need some of the same functionality
  in the backend now.
  
  I think you need to say why...
 
 See the quoted paragraph above. We should check the file size, so that
 we will not fail if the WAL file is just being copied into the archive
 directory.

We can read, but that's not an explanation. By giving terse answers in
that way you are giving the impression that you don't want discussion on
these points.

If you were running pg_standby as the restore_command then this error
wouldn't happen. So you need to explain why running pg_standby cannot
solve your problem and why we must fix it by replicating code that has
previously existed elsewhere.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Simon Riggs wrote:
 If you were running pg_standby as the restore_command then this error
 wouldn't happen. So you need to explain why running pg_standby cannot
 solve your problem and why we must fix it by replicating code that has
 previously existed elsewhere.

pg_standby cannot be used with streaming replication.

I guess you're next question is: why not?

The startup process alternates between streaming, and restoring files
from archive using restore_command. It will progress using streaming as
long as it can, but if the connection is lost, it will try to poll the
archive until the connection is established again. The startup process
expects the restore_command to try to restore the file and fail if it's
not found. If the restore_command goes into sleep, waiting for the file
to arrive, that will defeat the retry logic in the server because the
startup process won't get control again to retry establishing the
connection.

That's the the essence of my proposal here:
http://archives.postgresql.org/message-id/4b50afb4.4060...@enterprisedb.com
which is what has now been implemented.

To suppport a restore_command that does the sleeping itself, like
pg_standby, would require a major rearchitecting of the retry logic. And
I don't see why that'd desirable anyway. It's easier for the admin to
set up using simple commands like 'cp' or 'scp', than require him/her to
write scripts that handle the sleeping and retry logic.


The real problem we have right now is missing documentation. It's
starting to hurt us more and more every day, as more people start to
test this. As shown by this thread and some other recent posts.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Dimitri Fontaine

Simon Riggs si...@2ndquadrant.com writes:
 If you were running pg_standby as the restore_command then this error
 wouldn't happen. So you need to explain why running pg_standby cannot
 solve your problem and why we must fix it by replicating code that has
 previously existed elsewhere.

Let me try.

pg_standby will not let the server get back to streaming replication
mode once it's done with driving the replay of all the WAL files
available in the archive, but will have the server sits there waiting
for the next file.

The way we want that is implemented now is to have the server switch
back and forth between replaying from the archive and streaming from the
master. So we want the server to restore from the archive the same way
pg_standby used to, except that if the archive does not contain the next
WAL files, we want to get back to streaming.

And the archive reading will resume at next network glitch.

I think it's the reasonning, I hope it explains what you see happening.
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 15:28 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  If you were running pg_standby as the restore_command then this error
  wouldn't happen. So you need to explain why running pg_standby cannot
  solve your problem and why we must fix it by replicating code that has
  previously existed elsewhere.
 
 pg_standby cannot be used with streaming replication.

 I guess you're next question is: why not?
 
 The startup process alternates between streaming, and restoring files
 from archive using restore_command. It will progress using streaming as
 long as it can, but if the connection is lost, it will try to poll the
 archive until the connection is established again. The startup process
 expects the restore_command to try to restore the file and fail if it's
 not found. If the restore_command goes into sleep, waiting for the file
 to arrive, that will defeat the retry logic in the server because the
 startup process won't get control again to retry establishing the
 connection.

Why does the startup process need to regain control? Why not just let it
sit and wait? Have you seen that if someone does use pg_standby or
similar scripts in the restore_command that the server will never regain
control in the way you hope. Would that cause a sporadic hang?

The overall design was previously that the solution implementor was in
charge of the archive and only they knew its characteristics.

It seems strange that we will be forced to explicitly ban people from
using a utility they were previously used to using and is still included
with the distro. Then we implement in the server the very things the
utility did. Only this time the solution implementor will not be in
control.

I would not be against implementing all aspects of pg_standby into the
server. It would make life easier in some ways. I am against
implementing only a *few* of the aspects because that leaves solution
architects in a difficult position to know what to do.

Please lay out some options here for discussion by the community. This
seems like a difficult area and not one to be patched up quickly.

 That's the the essence of my proposal here:
 http://archives.postgresql.org/message-id/4b50afb4.4060...@enterprisedb.com
 which is what has now been implemented.
 
 To suppport a restore_command that does the sleeping itself, like
 pg_standby, would require a major rearchitecting of the retry logic. And
 I don't see why that'd desirable anyway. It's easier for the admin to
 set up using simple commands like 'cp' or 'scp', than require him/her to
 write scripts that handle the sleeping and retry logic.
 
 
 The real problem we have right now is missing documentation. It's
 starting to hurt us more and more every day, as more people start to
 test this. As shown by this thread and some other recent posts.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 14:41 +0100, Dimitri Fontaine wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  If you were running pg_standby as the restore_command then this error
  wouldn't happen. So you need to explain why running pg_standby cannot
  solve your problem and why we must fix it by replicating code that has
  previously existed elsewhere.
 
 Let me try.
 
 pg_standby will not let the server get back to streaming replication
 mode once it's done with driving the replay of all the WAL files
 available in the archive, but will have the server sits there waiting
 for the next file.
 
 The way we want that is implemented now is to have the server switch
 back and forth between replaying from the archive and streaming from the
 master. So we want the server to restore from the archive the same way
 pg_standby used to, except that if the archive does not contain the next
 WAL files, we want to get back to streaming.
 
 And the archive reading will resume at next network glitch.
 
 I think it's the reasonning, I hope it explains what you see happening.

OK, thanks.

One question then: how do we ensure that the archive does not grow too
big? pg_standby cleans down the archive using %R. That function appears
to not exist anymore. 

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Simon Riggs wrote:
 One question then: how do we ensure that the archive does not grow too
 big? pg_standby cleans down the archive using %R. That function appears
 to not exist anymore. 

You can still use %R. Of course, plain 'cp' won't know what to do with
it, so a script will then be required. We should probably provide a
sample of that in the docs, or even a ready-made tool similar to
pg_standby but without the waiting.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk

* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 08:29]:
 
 To suppport a restore_command that does the sleeping itself, like
 pg_standby, would require a major rearchitecting of the retry logic. And
 I don't see why that'd desirable anyway. It's easier for the admin to
 set up using simple commands like 'cp' or 'scp', than require him/her to
 write scripts that handle the sleeping and retry logic.

But colour me confused, I'm still not understanding why this is any
different that with normal PITR recovery.

So even with a plain cp in your recovery command instead of a
sleep+copy (a la pg_standby, or PITR tools, or all the home-grown
solutions out thery), I'm not seeing how it's going to get half files.
The only way I can see that is if you're out of disk space in your
recovering pg_xlog.

It's well know in PostgreSQL wal archivne - you don't just shove files
into the archive, you make sure they appear there with the right name
atomically.  And if the master is only running the archive command on
whole WAL files, I just don't understand this whole short wal problem.

And don't try and tell me your just poaching files from a running
cluster's pg_xlog directory, because I'm going to cry...

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 15:55 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  One question then: how do we ensure that the archive does not grow too
  big? pg_standby cleans down the archive using %R. That function appears
  to not exist anymore. 
 
 You can still use %R. Of course, plain 'cp' won't know what to do with
 it, so a script will then be required. We should probably provide a
 sample of that in the docs, or even a ready-made tool similar to
 pg_standby but without the waiting.

So we still need a script but it can't be pg_standby? Hmmm, OK...

Might it not be simpler to add a parameter onto pg_standby?
We send %s to tell pg_standby the standby_mode of the server which is
calling it so it can decide how to act in each case.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Aidan Van Dyk wrote:
 But colour me confused, I'm still not understanding why this is any
 different that with normal PITR recovery.
 
 So even with a plain cp in your recovery command instead of a
 sleep+copy (a la pg_standby, or PITR tools, or all the home-grown
 solutions out thery), I'm not seeing how it's going to get half files.

If the file is just being copied to the archive when restore_command
('cp', say) is launched, it will copy a half file. That's not a problem
for PITR, because PITR will end at the end of valid WAL anyway, but
returning a half WAL file in standby mode is a problem.

 It's well know in PostgreSQL wal archivne - you don't just shove files
 into the archive, you make sure they appear there with the right name
 atomically.  And if the master is only running the archive command on
 whole WAL files, I just don't understand this whole short wal problem.

Yeah, if you're careful about that, then this change isn't required. But
pg_standby protects against that, so I think it'd be reasonable to have
the same level of protection built-in. It's not a lot of code.

We could well just document that you should do that, ie. make sure the
file appears in the archive atomically with the right size.

 And don't try and tell me your just poaching files from a running
 cluster's pg_xlog directory, because I'm going to cry...

No :-).

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Simon Riggs wrote:
 Might it not be simpler to add a parameter onto pg_standby?
 We send %s to tell pg_standby the standby_mode of the server which is
 calling it so it can decide how to act in each case.

That would work too, but it doesn't seem any simpler to me. On the contrary.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 16:22 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  Might it not be simpler to add a parameter onto pg_standby?
  We send %s to tell pg_standby the standby_mode of the server which is
  calling it so it can decide how to act in each case.
 
 That would work too, but it doesn't seem any simpler to me. On the contrary.

It would mean that pg_standby would act appropriately according to the
setting of standby_mode. So you wouldn't need multiple examples of use,
it would all just work whatever the setting of standby_mode. Nice simple
entry in the docs.

We've said we need a script and pg_standby is it. Having a second
script, pg_standby2, rather than adding something to the existing tools
just seems confusing and easier to get wrong.

I'm happy to do the patch if you like.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk

* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]:

 If the file is just being copied to the archive when restore_command
 ('cp', say) is launched, it will copy a half file. That's not a problem
 for PITR, because PITR will end at the end of valid WAL anyway, but
 returning a half WAL file in standby mode is a problem.

But it can be a problem - without the last WAL (or at least enough of
it) the master switched and archived, you have no guarantee of having
being consistent again (I'm thinking specifically of recovering from a
fresh backup)

 Yeah, if you're careful about that, then this change isn't required. But
 pg_standby protects against that, so I think it'd be reasonable to have
 the same level of protection built-in. It's not a lot of code.

This 1 check isn't, but what about the rest of the things pg_standby
does.  How much functionality should we bring it?  Ideally, all of it.

 We could well just document that you should do that, ie. make sure the
 file appears in the archive atomically with the right size.

I have to admit, today was the first time I went and re-read the PITR
docs, and no, the docs don't seem to talk about that... Maybe it was
just plain obvious to me because it (the atomic apperance) is something
unix devloppers have always had to deal with, so it's ingrained in me.
But I'm *sure* that I've seen that bandied around as common knowledge on
the lists, and one of the reasons we alway see warnings about using
rsync instead of plain SCP, etc.

So ya, we should probably mention that somewhere in the docs.  Section
24.3.6. Caveats?

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Greg Smith


Heikki Linnakangas wrote:

Simon Riggs wrote:
  

Might it not be simpler to add a parameter onto pg_standby?
We send %s to tell pg_standby the standby_mode of the server which is
calling it so it can decide how to act in each case.



That would work too, but it doesn't seem any simpler to me. On the contrary.
  


I don't know that the ideas are mutually exclusive.  Should the server 
internals do a basic check that the files they're about to process seem 
sane before processing them?  Yes.  Should we provide a full-featured 
program that knows how far back it can delete archives rather than 
expecting people to write their own?  Yes.  And a modified pg_standby 
seems the easiest way to do that, since it already knows how to do the 
computations, among other benefits.


I already have a work in progress patch to pg_standby I'm racing to 
finish here that cleans up some of the UI warts in the program, 
including the scary sounding and avoidable warnings Selena submitted a 
patch to improve.  I think I'm going to add this feature to it, too, 
whether or not it's deemed necessary.  I'm really tired of example 
setups provided that say just write a simple script using cp like 
trivial example and then supporting those non-production quality 
messes in the field.  My scripts for this sort of thing have more error 
checking in them than functional code.  And pg_standby already has a 
healthy sense of paranoia built into it.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Simon Riggs wrote:
 On Thu, 2010-02-11 at 16:22 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 Might it not be simpler to add a parameter onto pg_standby?
 We send %s to tell pg_standby the standby_mode of the server which is
 calling it so it can decide how to act in each case.
 That would work too, but it doesn't seem any simpler to me. On the contrary.
 
 It would mean that pg_standby would act appropriately according to the
 setting of standby_mode. So you wouldn't need multiple examples of use,
 it would all just work whatever the setting of standby_mode. Nice simple
 entry in the docs.
 
 We've said we need a script and pg_standby is it. Having a second
 script, pg_standby2, rather than adding something to the existing tools
 just seems confusing and easier to get wrong.
 
 I'm happy to do the patch if you like.

Well, doesn't really seem that useful to me, but I won't object if you
want to do it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Euler Taveira de Oliveira

Simon Riggs escreveu:
 It would mean that pg_standby would act appropriately according to the
 setting of standby_mode. So you wouldn't need multiple examples of use,
 it would all just work whatever the setting of standby_mode. Nice simple
 entry in the docs.
 
+1. I like the %s idea. IMHO fixing pg_standby for SR is a must-fix. I'm
foreseeing a lot of users asking why pg_standby doesn't work on SR mode ...


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Aidan Van Dyk wrote:
 * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]:
 
 If the file is just being copied to the archive when restore_command
 ('cp', say) is launched, it will copy a half file. That's not a problem
 for PITR, because PITR will end at the end of valid WAL anyway, but
 returning a half WAL file in standby mode is a problem.
 
 But it can be a problem - without the last WAL (or at least enough of
 it) the master switched and archived, you have no guarantee of having
 being consistent again (I'm thinking specifically of recovering from a
 fresh backup)

You have to wait for the last WAL file required by the backup to be
archived before starting recovery. Otherwise there's no guarantee anyway.

 We could well just document that you should do that, ie. make sure the
 file appears in the archive atomically with the right size.
 
 I have to admit, today was the first time I went and re-read the PITR
 docs, and no, the docs don't seem to talk about that... Maybe it was
 just plain obvious to me because it (the atomic apperance) is something
 unix devloppers have always had to deal with, so it's ingrained in me.
 But I'm *sure* that I've seen that bandied around as common knowledge on
 the lists, and one of the reasons we alway see warnings about using
 rsync instead of plain SCP, etc.
 
 So ya, we should probably mention that somewhere in the docs.  Section
 24.3.6. Caveats?

-1. it isn't necessary for PITR. It's a new requirement for
standby_mode='on', unless we add the file size check into the backend. I
think we should add the file size check to the backend instead and save
admins the headache.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Aidan Van Dyk wrote:
 * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]:
 
 Yeah, if you're careful about that, then this change isn't required. But
 pg_standby protects against that, so I think it'd be reasonable to have
 the same level of protection built-in. It's not a lot of code.
 
 This 1 check isn't, but what about the rest of the things pg_standby
 does.  How much functionality should we bring it?  Ideally, all of it.

Well, how about we bite the bullet then and add enough bells and
whistles to the backend that pg_standby really isn't needed anymore, and
remove it from contrib?

Looking at the options to pg_standby, we're not missing much:

 Options:
   -c copies file from archive (default)
   -l links into archive (leaves file in archive)

Obsolete (link mode not supported anymore)

   -d generate lots of debugging output (testing only)

We have DEBUG statements in the server...

   -k NUMFILESTOKEEP  if RESTARTWALFILE not used, removes files prior to limit
  (0 keeps all)

This is dangerous, and obsoleted by the RESTARTWALFILE option (%r).

   -r MAXRETRIES  max number of times to retry, with progressive wait
  (default=3)

Frankly this seems pretty useless, but it would be easy to implement

   -s SLEEPTIME   seconds to wait between file checks (min=1, max=60,
  default=5)

The sleep time in the backend is currently hard-coded at 5 s. Should we
make it configurable?

   -t TRIGGERFILE defines a trigger file to initiate failover (no default)

We have this in the backend already.

   -w MAXWAITTIME max seconds to wait for a file (0=no limit) (default=0)

We don't have a timeout in the backend. Should we? (the timeout in
pg_standby won't work, even if we add the option to use pg_standby with
standby_mode='on' as Simon suggested)


So the only major feature we're missing is the ability to clean up old
files.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk

* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 12:04]:

  But it can be a problem - without the last WAL (or at least enough of
  it) the master switched and archived, you have no guarantee of having
  being consistent again (I'm thinking specifically of recovering from a
  fresh backup)
 
 You have to wait for the last WAL file required by the backup to be
 archived before starting recovery. Otherwise there's no guarantee anyway.

Right, but now define wait for.  If you pull only half the last WAL
(because you've accepted that you *can* have short WAL files in the
archive), you have the problem with PITR.

Is it wait for it to be in the archive, or wait for it to be in the
archive, and be sure that the contents satisfy some criteria.

I've always made my PITR such that in the archive (i.e. the first
moment a recovery can see it) implies that it's bit-for-bit identical to
the original (or at least as bit-for-bit I can assume by checking
various hashes I can afford to).  I just assumed that was kind of common
practice.

I'm amazed that partial WAL files are every available in anyones
archive, for anyone's  restore command to actually pull.  I find that
scarry, and sure, probably won't regularly be noticed... But man, I'ld
hate the time I need that emergency PITR restore to be the one time when
it needs that WAL, pulls it slightly before the copy has finished (i.e.
the master is pushing the WAL over a WAN to a 2nd site), and have my
restore complete consistently...

a.




-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Tom Lane

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 -1. it isn't necessary for PITR. It's a new requirement for
 standby_mode='on', unless we add the file size check into the backend. I
 think we should add the file size check to the backend instead and save
 admins the headache.

I think the file size check needs to be in the backend purely on safety
grounds.  Whether we make pg_standby useful for this context is a
different discussion.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Aidan Van Dyk wrote:
 * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 12:04]:
 
 But it can be a problem - without the last WAL (or at least enough of
 it) the master switched and archived, you have no guarantee of having
 being consistent again (I'm thinking specifically of recovering from a
 fresh backup)
 You have to wait for the last WAL file required by the backup to be
 archived before starting recovery. Otherwise there's no guarantee anyway.
 
 Right, but now define wait for.

As in don't start postmaster until the last WAL file needed by backup
has been fully copied to the archive.

 (because you've accepted that you *can* have short WAL files in the
 archive)

Only momentarily, while the copy is in progress.

 I've always made my PITR such that in the archive (i.e. the first
 moment a recovery can see it) implies that it's bit-for-bit identical to
 the original (or at least as bit-for-bit I can assume by checking
 various hashes I can afford to).  I just assumed that was kind of common
 practice.

It's certainly good practice, agreed, but hasn't been absolutely required.

 I'm amazed that partial WAL files are every available in anyones
 archive, for anyone's  restore command to actually pull.  I find that
 scarry, and sure, probably won't regularly be noticed... But man, I'ld
 hate the time I need that emergency PITR restore to be the one time when
 it needs that WAL, pulls it slightly before the copy has finished (i.e.
 the master is pushing the WAL over a WAN to a 2nd site), and have my
 restore complete consistently...

It's not as dramatic as you make it sound. We're only talking about the
last WAL file, and only when it's just being copied to the archive. If
you have a archive_command like 'cp', and you look at the archive at the
same millisecond that 'cp' runs, then yes you will see that the latest
WAL file in the archive is only partially copied. It's not a problem for
robustness; if you had looked one millisecond earlier you would not have
seen the file there at all.

Windows 'copy' command preallocates the whole file, which poses a
different problem: if you look at the file while it's being copied, the
file has the right length, but isn't in fact fully copied yet. I think
'rsync' has the same problem. To avoid that issue, you have to use
something like copy+rename to make it atomic. There isn't much we can do
in the server (or in pg_standby) to work around that, because there's no
way to distinguish a file that's being copied from a fully-copied
corrupt file.

We do advise to set up an archive_command that doesn't overwrite
existing files. That together with a partial WAL segment can cause a
problem: if archive_command crashes while it's writing the file, leaving
a partial file in the archive, the subsequent run of archive_command
won't overwrite it and will get stuck trying. However, there's a small
window for that even if you put the file into the archive atomically: if
you crash just after fully copying the file, but before the .done file
is created, upon restart the server will also try to copy the file to
archive, find that it already exists, and fail.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 13:08 -0500, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  -1. it isn't necessary for PITR. It's a new requirement for
  standby_mode='on', unless we add the file size check into the backend. I
  think we should add the file size check to the backend instead and save
  admins the headache.
 
 I think the file size check needs to be in the backend purely on safety
 grounds.  Whether we make pg_standby useful for this context is a
 different discussion.

Happy with that.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Kevin Grittner

Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 
 I think 'rsync' has the same problem.
 
There is a switch you can use to create the problem under rsync, but
by default rsync copies to a temporary file name and moves the
completed file to the target name.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Garick Hamlin

On Thu, Feb 11, 2010 at 01:22:44PM -0500, Kevin Grittner wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
  
  I think 'rsync' has the same problem.
  
 There is a switch you can use to create the problem under rsync, but
 by default rsync copies to a temporary file name and moves the
 completed file to the target name.
  
 -Kevin

I don't use PITR, So I don't know any of the well understood facts about 
PITR with postgres, but my understanding with rsync is ...

It doesn't fsync data before rename, its something like: open / write / 
close / rename, which could lead to zero length files on some filesystems.  
(are there other anomalies to worry about here?)

Would that be a concern?  

Garick

 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs

On Thu, 2010-02-11 at 19:29 +0200, Heikki Linnakangas wrote:
 Aidan Van Dyk wrote:
  * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100211 09:17]:
  
  Yeah, if you're careful about that, then this change isn't required. But
  pg_standby protects against that, so I think it'd be reasonable to have
  the same level of protection built-in. It's not a lot of code.
  
  This 1 check isn't, but what about the rest of the things pg_standby
  does.  How much functionality should we bring it?  Ideally, all of it.
 
 Well, how about we bite the bullet then and add enough bells and
 whistles to the backend that pg_standby really isn't needed anymore, and
 remove it from contrib?
 
 Looking at the options to pg_standby, we're not missing much:

You're attitude to this makes me laugh, and cry. 

Removing pg_standby can be done and if you manage it well, I would be
happy. I laugh because it seems like you think I have some ego tied up
in it. It's hardly my finest hour of coding, especially since I've not
had the ability to improve it much before now.

Not missing *much*?! I cry because you're so blase about missing one or
two essential features that took years to put in place. Keeping
pg_standby at least guarantees we can do something about it when we
discover you didn't do a good enough job removing it.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Fujii Masao

On Thu, Feb 11, 2010 at 11:22 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Simon Riggs wrote:
 Might it not be simpler to add a parameter onto pg_standby?
 We send %s to tell pg_standby the standby_mode of the server which is
 calling it so it can decide how to act in each case.

 That would work too, but it doesn't seem any simpler to me. On the contrary.

Agreed.

There could be three kinds of SR configurations. Let's think of them separately.

(1) SR without restore_command
(2) SR with 'cp'
(3) SR with pg_standby

(1) is the straightforward configuration. In this case the standby replays only
the WAL files in pg_xlog directory, and starts SR when it has found the invalid
record or been able to find no more WAL file. Then if SR is terminated for some
reasons, the standby would periodically try to connect to the primary and start
SR again. If you choose this, you don't need to care about the problem discussed
on the thread.

In the (2) case the standby replays the WAL files in not only pg_xlog but also
the archive, and starts SR when it has found the invalid record or been able to
find no more WAL file. If the archive is shared between the primary and the
standby, the standby might restore the partial WAL file being archived (copied)
by the primary. This could happen because 'cp' is not an atomic operation.

Currently when the standby finds the WAL file whose file size is less than 16MB,
it emits the FATAL error. This is the problem that I presented upthread. That is
undesirable behavior, so I proposed to just treat that case the same as if no
more WAL file is found. If so, the standby can start SR instead of emitting the
FATAL error. (2) is useful configuration as described in Heikki's
commig message.
http://archives.postgresql.org/pgsql-committers/2010-01/msg00395.php

(3) was unexpected configuration (at least for me). This would work fine as a
*file-based* log shipping but not SR. Since pg_standby doesn't return when no
more WAL file is found in the archive (i.e., it waits until the next complete
WAL file is available), SR will never start. OTOH, since pg_standby treats the
partial file as nonexistence, the problem discussed on this thread doesn't
happen.

Questions:
(A) Is my proposal for (2) reasonable? For me, Yes.
(B) Should we allow (3) to work as streaming replication? In fact, we should
create the new mode that makes pg_standby return as soon as it doesn't find
a complete WAL file in the archive? I agree with Heikki, i.e., don't think
it's worth doing. Though pg_standby already has the capability to remove the
old WAL files, we would still need the cron job that removes them
periodically
because pg_standby is not executed during SR is running normally.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas

Simon Riggs wrote:
 On Thu, 2010-02-11 at 13:08 -0500, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 -1. it isn't necessary for PITR. It's a new requirement for
 standby_mode='on', unless we add the file size check into the backend. I
 think we should add the file size check to the backend instead and save
 admins the headache.
 I think the file size check needs to be in the backend purely on safety
 grounds.  Whether we make pg_standby useful for this context is a
 different discussion.
 
 Happy with that.

Committed such a check now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Aidan Van Dyk

* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100210 02:33]:
 
 Hmm, so after running restore_command, check the file size and if it's
 too short, treat it the same as if restore_command returned non-zero?
 And it will be retried on the next iteration. Works for me, though OTOH
 it will then fail to complain about a genuinely WAL file that's
 truncated for some reason. I guess there's no way around that, even if
 you have a script as restore_command that does the file size check, it
 will have the same problem.

But isn't this something every current PITR archive already works
around...  Everybody doing PITR archives already know the importance of
making the *appearance* of the WAL filename in the archive atomic.

Don't docs warn about plain cp not being atomic and allowing short
files to appear in the archive... 

I'm not sure why streaming recovery suddenly changes the requirements...

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Heikki Linnakangas

Aidan Van Dyk wrote:
 * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100210 02:33]:
  
 Hmm, so after running restore_command, check the file size and if it's
 too short, treat it the same as if restore_command returned non-zero?
 And it will be retried on the next iteration. Works for me, though OTOH
 it will then fail to complain about a genuinely WAL file that's
 truncated for some reason. I guess there's no way around that, even if
 you have a script as restore_command that does the file size check, it
 will have the same problem.
 
 But isn't this something every current PITR archive already works
 around...  Everybody doing PITR archives already know the importance of
 making the *appearance* of the WAL filename in the archive atomic.

Well, pg_standby does defend against that, but you don't use pg_standby
with the built-in standby mode anymore. It would be reasonable to have
the same level of defenses built-in. It's essentially a one-line change,
and saves a lot of trouble and risk of subtle misconfiguration for admins.

 Don't docs warn about plain cp not being atomic and allowing short
 files to appear in the archive... 

Hmm, I don't see anything about that at quick glance. Besides, normal
PITR doesn't have a problem with that, because it will stop when it
reaches the end of archived WAL anyway.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

72 matches

Mail list logo