Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-13 Thread Robert Haas
On Mon, May 11, 2015 at 12:00 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 And here is a new version of the patch. I kept the approach of using pgstat,
 but it now only polls pgstat every 10 seconds, and doesn't block to wait for
 updated stats.

It's not entirely a new problem, but this error message has gotten pretty crazy:

+   (errmsg("WAL archival
(archive_mode=on/always/shared) requires wal_level \"archive\",
\"hot_standby\", or \"logical\"")));

Maybe: WAL archival cannot be enabled when wal_level is "minimal"
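
For illustration, the simplified check might look like this in C (a sketch
only; the enum and variable names are assumptions modeled on the existing
GUCs, not the patch's actual code):

    if (XLogArchiveMode != ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("WAL archival cannot be enabled when wal_level is \"minimal\"")));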

I think the documentation should be explicit about what happens if the
primary archives a file and dies before the standby gets notified that
the archiving happened.  The standby, running in shared mode, is then
promoted.  My first guess would be that the standby will end up with
files that it thinks it needs to archive but, being unable to do so
because they're already there, will live forever in pg_xlog.  I
hope that's not the case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-13 Thread Robert Haas
On Wed, May 13, 2015 at 8:53 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 Our manual says that archive_command should refuse to overwrite an existing
 file. But to work around the double-archival problem, where the same file is
 archived twice, it would be even better if it would simply return success if
 the file exists, *and has identical contents*. I don't know how to code that
 logic in a simple one-liner though.

This is why we really, really need that pg_copy command that was
proposed a while back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-13 Thread Heikki Linnakangas

On 05/13/2015 03:36 PM, Robert Haas wrote:

On Mon, May 11, 2015 at 12:00 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

And here is a new version of the patch. I kept the approach of using pgstat,
but it now only polls pgstat every 10 seconds, and doesn't block to wait for
updated stats.


It's not entirely a new problem, but this error message has gotten pretty crazy:

+   (errmsg("WAL archival
(archive_mode=on/always/shared) requires wal_level \"archive\",
\"hot_standby\", or \"logical\"")));

Maybe: WAL archival cannot be enabled when wal_level is "minimal"

I think the documentation should be explicit about what happens if the
primary archives a file and dies before the standby gets notified that
the archiving happened.


Yes, good point.


 The standby, running in shared mode, is then
promoted.  My first guess would be that the standby will end up with
files that it thinks it needs to archive but, being unable to do so
because they're already there, will live forever in pg_xlog.  I
hope that's not the case.


Hmm. That is exactly what happens. The standby will attempt to archive 
them, which will fail, so the archiver will get stuck retrying.


That's not actually a new problem though. Even with a single server 
doing archiving, it's possible that you crash just after archive_command 
has archived a file, but before it has created the .done file. After 
restart, the server will try to archive the file again, which will fail. 
But yeah, with this patch, that's much more likely to happen after a 
promotion.


Our manual says that archive_command should refuse to overwrite an 
existing file. But to work around the double-archival problem, where the 
same file is archived twice, it would be even better if it would simply 
return success if the file exists, *and has identical contents*. I don't 
know how to code that logic in a simple one-liner though.
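
For illustration, that logic fits in a short script, if not a one-liner (a
sketch, assuming an archive directory of /mnt/archive and a script invoked
as archive_command = 'archive-wal.sh %p %f'; it is still not atomic or
fsync-safe):

#!/bin/sh
# $1 = full path to the WAL segment (%p), $2 = its file name (%f)
ARCHIVE=/mnt/archive
if [ -f "$ARCHIVE/$2" ]; then
    # Already archived: report success only if the contents are
    # identical, so a retried archival of the same file is harmless.
    exec cmp -s "$1" "$ARCHIVE/$2"
fi
cp "$1" "$ARCHIVE/$2"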


- Heikki




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-13 Thread Heikki Linnakangas

On 05/13/2015 04:29 PM, Robert Haas wrote:

On Wed, May 13, 2015 at 8:53 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

Our manual says that archive_command should refuse to overwrite an existing
 file. But to work around the double-archival problem, where the same file is
archived twice, it would be even better if it would simply return success if
the file exists, *and has identical contents*. I don't know how to code that
logic in a simple one-liner though.


This is why we really, really need that pg_copy command that was
proposed a while back.


Yeah..

I took a step back and looked at the big picture again:

If we just implement the "always" mode, and you have a pg_copy command 
or similar that handles duplicates correctly, you don't necessarily need 
the "shared" mode at all. You can just set archive_mode='always', and 
have the master and standby archive to the same location. As long as the 
archive_command works correctly and is race-free, that should work.
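
For example, both servers could then run with settings like these (a
sketch; /mnt/archive is assumed to be mounted on both servers, and the
script is the assumed duplicate-tolerant command from above):

archive_mode = always
archive_command = '/usr/local/bin/archive-wal.sh %p %f'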


I cut back the patch to implement just the "always" mode. The "shared" 
mode might still make sense as a future patch, as I think it's easier to 
understand and has less strict requirements for the archive_command, but 
let's take one step at a time.


So attached is a patch that just adds the "always" mode. This is pretty 
close to what Fujii submitted long ago.


- Heikki

From 71332900247a8c68a61fcf60782cb35cf662b756 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Thu, 16 Apr 2015 14:40:24 +0300
Subject: [PATCH 1/1] Add archive_mode='always' option.

In 'always' mode, the standby's WAL archive is taken to be separate from the
primary's, and the standby independently archives all files it receives from
the primary.

Fujii Masao and me.
---
 doc/src/sgml/config.sgml  | 13 +++--
 doc/src/sgml/high-availability.sgml   | 39 +++
 src/backend/access/transam/xlog.c | 22 +--
 src/backend/access/transam/xlogarchive.c  |  5 +++-
 src/backend/postmaster/postmaster.c   | 37 ++---
 src/backend/replication/walreceiver.c | 10 +--
 src/backend/utils/misc/guc.c  | 21 ---
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/include/access/xlog.h | 13 +++--
 9 files changed, 133 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0d8624a..5549b7d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2521,7 +2521,7 @@ include_dir 'conf.d'
 
 variablelist
 <varlistentry id="guc-archive-mode" xreflabel="archive_mode">
-  <term><varname>archive_mode</varname> (<type>boolean</type>)
+  <term><varname>archive_mode</varname> (<type>enum</type>)
   <indexterm>
    <primary><varname>archive_mode</> configuration parameter</primary>
   </indexterm>
@@ -2530,7 +2530,16 @@ include_dir 'conf.d'
 <para>
  When <varname>archive_mode</> is enabled, completed WAL segments
  are sent to archive storage by setting
- <xref linkend="guc-archive-command">.
+ <xref linkend="guc-archive-command">. In addition to <literal>off</>,
+ to disable, there are two modes: <literal>on</>, and
+ <literal>always</>. During normal operation, there is no
+ difference between the two modes, but when set to <literal>always</>
+ the WAL archiver is enabled also during archive recovery or standby
+ mode. In <literal>always</> mode, all files restored from the archive
+ or streamed with streaming replication will be archived (again). See
+ <xref linkend="continuous-archiving-in-standby"> for details.
+</para>
+<para>
  <varname>archive_mode</> and <varname>archive_command</> are
  separate variables so that <varname>archive_command</> can be
  changed without leaving archiving mode.
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index a17f555..e93b711 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1220,6 +1220,45 @@ primary_slot_name = 'node_a_slot'
 
/sect3
   /sect2
+
+  <sect2 id="continuous-archiving-in-standby">
+   <title>Continuous archiving in standby</title>
+
+   <indexterm>
+     <primary>continuous archiving</primary>
+     <secondary>in standby</secondary>
+   </indexterm>
+
+   <para>
+     When continuous WAL archiving is used in a standby, there are two
+     different scenarios: the WAL archive can be shared between the primary
+     and the standby, or the standby can have its own WAL archive. When
+     the standby has its own WAL archive, set <varname>archive_mode</varname>
+     to <literal>always</literal>, and the standby will call the archive
+     command for every WAL segment it receives, whether it's by restoring
+     from the archive or by streaming replication. The shared archive can
+     be handled similarly, but the archive_command should test if the file
+     being archived 

Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-11 Thread Heikki Linnakangas

On 05/08/2015 04:21 PM, Heikki Linnakangas wrote:

On 04/22/2015 10:07 AM, Michael Paquier wrote:

On Wed, Apr 22, 2015 at 3:38 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

I feel that the best approach is to archive the last, partial segment, but
with the .partial suffix. I don't see any plausible real-world setup where
the current behavior would be better. I don't really see much need to
archive the partial segment at all, but there's also no harm in doing it, as
long as it's clearly marked with the .partial suffix.


Well, as long as it is clearly archived at promotion, even with a
suffix, I guess that I am fine... This will need some tweaking on
restore_command for existing applications, but as long as it is
clearly documented I am fine. Shouldn't this be a different patch
though?


Ok, I came up with the attached, which adds the .partial suffix to the
partial WAL segment that's archived after promotion. I couldn't find any
natural place to talk about it in the docs, though. I think after the
docs changes from the main patch are applied, it would be natural to
mention this in the "Continuous archiving in standby" section, so I think I'll
add that later.

Barring objections, I'll push this later tonight.


Applied that part.


Now that we got this last-partial-segment problem out of the way, I'm
going to try fixing the problem you (Michael) pointed out about relying
on pgstat file. Meanwhile, I'd love to get more feedback on the rest of
the patch, and the documentation.


And here is a new version of the patch. I kept the approach of using 
pgstat, but it now only polls pgstat every 10 seconds, and doesn't block 
to wait for updated stats.


- Heikki

From 08ca3cc7b9824503b793e149247ea9c6d3a7f323 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Thu, 16 Apr 2015 14:40:24 +0300
Subject: [PATCH v3 1/1] Make WAL archival behave more sensibly in standby
 mode.

This adds two new archive_modes, 'shared' and 'always', to indicate whether
the WAL archive is shared between the primary and standby, or not. In
shared mode, the standby tracks which files have been archived by the
primary. The standby refrains from recycling files that the primary has not
yet archived, and at failover, the standby archives all those files too
from the old timeline. In 'always' mode, the standby's WAL archive is taken
to be separate from the primary's, and the standby independently archives
all files it receives from the primary.

This adds a new archival status message to the protocol. WAL sender sends
one automatically, when the last archived WAL file, as reported in pgstat,
changes. (Or rather, some time after it changes. We're not in a hurry, the
standby doesn't need an up-to-the-second status)

Fujii Masao and me.
---
 doc/src/sgml/config.sgml  |  12 +-
 doc/src/sgml/high-availability.sgml   |  48 +++
 doc/src/sgml/protocol.sgml|  31 +
 src/backend/access/transam/xlog.c |  29 +++-
 src/backend/postmaster/pgstat.c   |  44 ++
 src/backend/postmaster/postmaster.c   |  37 +++--
 src/backend/replication/walreceiver.c | 172 +++-
 src/backend/replication/walsender.c   | 186 ++
 src/backend/utils/misc/guc.c  |  21 +--
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/access/xlog.h |  14 +-
 src/include/pgstat.h  |   2 +
 12 files changed, 513 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0d8624a..ac845e0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2521,7 +2521,7 @@ include_dir 'conf.d'
 
 variablelist
 <varlistentry id="guc-archive-mode" xreflabel="archive_mode">
-  <term><varname>archive_mode</varname> (<type>boolean</type>)
+  <term><varname>archive_mode</varname> (<type>enum</type>)
   <indexterm>
    <primary><varname>archive_mode</> configuration parameter</primary>
   </indexterm>
@@ -2530,7 +2530,15 @@ include_dir 'conf.d'
 <para>
  When <varname>archive_mode</> is enabled, completed WAL segments
  are sent to archive storage by setting
- <xref linkend="guc-archive-command">.
+ <xref linkend="guc-archive-command">. In addition to <literal>off</>,
+ to disable, there are three modes: <literal>on</>, <literal>shared</>,
+ and <literal>always</>. During normal operation, there is no
+ difference between the three modes, but in archive recovery or
+ standby mode, it indicates whether the WAL archive is shared between
+ the primary and the standby server or not. See
+ <xref linkend="continuous-archiving-in-standby"> for details.
+</para>
+<para>
  <varname>archive_mode</> and <varname>archive_command</> are
  separate variables so that <varname>archive_command</> can be
  changed without leaving archiving mode.
diff --git 

Re: [HACKERS] Streaming replication and WAL archive interactions

2015-05-08 Thread Heikki Linnakangas

On 04/22/2015 10:07 AM, Michael Paquier wrote:

On Wed, Apr 22, 2015 at 3:38 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

I feel that the best approach is to archive the last, partial segment, but
with the .partial suffix. I don't see any plausible real-world setup where
the current behavior would be better. I don't really see much need to
archive the partial segment at all, but there's also no harm in doing it, as
long as it's clearly marked with the .partial suffix.


Well, as long as it is clearly archived at promotion, even with a
suffix, I guess that I am fine... This will need some tweaking on
restore_command for existing applications, but as long as it is
clearly documented I am fine. Shouldn't this be a different patch
though?


Ok, I came up with the attached, which adds the .partial suffix to the 
partial WAL segment that's archived after promotion. I couldn't find any 
natural place to talk about it in the docs, though. I think after the 
docs changes from the main patch are applied, it would be natural to 
mention this in the "Continuous archiving in standby" section, so I think I'll 
add that later.


Barring objections, I'll push this later tonight.

Now that we got this last-partial-segment problem out of the way, I'm 
going to try fixing the problem you (Michael) pointed out about relying 
on pgstat file. Meanwhile, I'd love to get more feedback on the rest of 
the patch, and the documentation.


- Heikki

From 15c123141d1eef0d6b05a384d1c5c202ffa04a84 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Fri, 8 May 2015 12:04:46 +0300
Subject: [PATCH 1/2] Add macros to check if a filename is a WAL segment or
 other such file.

We had many instances of the strlen + strspn combination to check for that.
This makes the code a bit easier to read.
---
 src/backend/access/transam/xlog.c  | 11 +++
 src/backend/replication/basebackup.c   |  7 ++-
 src/bin/pg_basebackup/pg_receivexlog.c | 16 ++--
 src/bin/pg_resetxlog/pg_resetxlog.c|  8 ++--
 src/include/access/xlog_internal.h | 18 ++
 5 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92822a1..5097173 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3577,8 +3577,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
 	{
 		/* Ignore files that are not XLOG segments */
-		if (strlen(xlde->d_name) != 24 ||
-			strspn(xlde->d_name, "0123456789ABCDEF") != 24)
+		if (!IsXLogFileName(xlde->d_name))
 			continue;
 
 		/*
@@ -3650,8 +3649,7 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 	while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
 	{
 		/* Ignore files that are not XLOG segments */
-		if (strlen(xlde->d_name) != 24 ||
-			strspn(xlde->d_name, "0123456789ABCDEF") != 24)
+		if (!IsXLogFileName(xlde->d_name))
 			continue;
 
 		/*
@@ -3839,10 +3837,7 @@ CleanupBackupHistory(void)
 
 	while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
 	{
-		if (strlen(xlde->d_name) > 24 &&
-			strspn(xlde->d_name, "0123456789ABCDEF") == 24 &&
-			strcmp(xlde->d_name + strlen(xlde->d_name) - strlen(".backup"),
-				   ".backup") == 0)
+		if (IsBackupHistoryFileName(xlde->d_name))
 		{
 			if (XLogArchiveCheckDone(xlde->d_name))
 			{
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3563fd9..de103c6 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -350,17 +350,14 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 		while ((de = ReadDir(dir, "pg_xlog")) != NULL)
 		{
 			/* Does it look like a WAL segment, and is it in the range? */
-			if (strlen(de->d_name) == 24 &&
-				strspn(de->d_name, "0123456789ABCDEF") == 24 &&
+			if (IsXLogFileName(de->d_name) &&
 				strcmp(de->d_name + 8, firstoff + 8) >= 0 &&
 				strcmp(de->d_name + 8, lastoff + 8) <= 0)
 			{
 				walFileList = lappend(walFileList, pstrdup(de->d_name));
 			}
 			/* Does it look like a timeline history file? */
-			else if (strlen(de->d_name) == 8 + strlen(".history") &&
-					 strspn(de->d_name, "0123456789ABCDEF") == 8 &&
-					 strcmp(de->d_name + 8, ".history") == 0)
+			else if (IsTLHistoryFileName(de->d_name))
 			{
 				historyFileList = lappend(historyFileList, pstrdup(de->d_name));
 			}
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index e77d2b6..53802af 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -188,23 +188,11 @@ FindStreamingStart(uint32 *tli)
 
 		/*
 		 * Check if the filename looks like an xlog file, or a .partial file.
-		 * Xlog files are always 24 characters, and .partial files are 32
-		 * characters.
 		 */
-		if (strlen(dirent->d_name) == 24)
-		{
-			if (strspn(dirent->d_name, "0123456789ABCDEF") != 24)
-				continue;
+		if 

Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-23 Thread Heikki Linnakangas

On 04/22/2015 11:58 PM, Robert Haas wrote:

On Wed, Apr 22, 2015 at 3:34 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

On 04/22/2015 10:21 PM, Robert Haas wrote:

On Wed, Apr 22, 2015 at 3:01 PM, Heikki Linnakangas hlinn...@iki.fi
wrote:

For example, imagine that we perform point-in-time recovery to WAL position
0/1237E568, on timeline 1. That falls within segment
00010012. Then we end recovery, and switch to timeline 2.
After the switch, and some more WAL-logged actions, we'll have these
files
in pg_xlog:

00010011
00010012
00020012
00020013
00020014



Is the 00010012 file a partial segment of the sort
you're proposing to no longer archive?


If you did pure archive recovery, with no streaming replication involved,
then no. If it was created by streaming replication, and the replication had
not filled the whole segment yet, then yes, it would be a partial segment.


Why the difference?


Because we don't archive partial segments, except for the last one at a 
timeline switch, and there was no timeline switch to timeline 1 within 
that segment.


It doesn't really matter, though. The behaviour at the switch from 
timeline 1 to 2 works the same, whether the 00010012 
segment is complete or not.


- Heikki





Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-22 Thread Heikki Linnakangas

On 04/22/2015 09:30 PM, Robert Haas wrote:

On Wed, Apr 22, 2015 at 2:17 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

Note that it's a bit complicated to set up that scenario today. Archiving is
never enabled in recovery mode, so you'll need to use a custom cron job or
something to maintain the archive that C uses. The files will not
automatically flow from B to the second archive. With the patch we're
discussing, however, it would be easy: just set archive_mode='always' in B.


Hmm, I see.  But if C never replays the last, partial segment from the
old timeline, how does it follow the timeline switch?


At timeline switch, we copy the old segment to the new timeline, and 
start writing where we left off. So the WAL from the old timeline is 
found in the segment nominally belonging to the new timeline.


For example, imagine that we perform point-in-time recovery to WAL position 
0/1237E568, on timeline 1. That falls within segment 
00010012. Then we end recovery, and switch to timeline 
2. After the switch, and some more WAL-logged actions, we'll have these 
files in pg_xlog:


00010011
00010012
00020012
00020013
00020014

Note that there are two segments ending in 12. They both have the same 
contents up to offset 0x37E568, corresponding to the switch point 
0/1237E568. After that, the contents diverge: the segment on the new 
timeline contains a checkpoint/end-of-recovery record at that point, 
followed by new WAL belonging to the new timeline.


Recovery knows about that, so that if you set recovery target to 
timeline 2, and it needs the WAL at the beginning of segment 12 (still 
belonging to timeline 1), it will try restoring both 
00010012 and 00020012.
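
Concretely, with a plain restore_command = 'cp /mnt/archive/%f %p' (the
archive path here is an assumption), recovery would fetch both candidates
for that segment, each restored under a temporary name (sketch):

cp /mnt/archive/00020012 pg_xlog/RECOVERYXLOG
cp /mnt/archive/00010012 pg_xlog/RECOVERYXLOG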


- Heikki





Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-22 Thread Robert Haas
On Wed, Apr 22, 2015 at 2:17 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 Note that it's a bit complicated to set up that scenario today. Archiving is
 never enabled in recovery mode, so you'll need to use a custom cron job or
 something to maintain the archive that C uses. The files will not
 automatically flow from B to the second archive. With the patch we're
 discussing, however, it would be easy: just set archive_mode='always' in B.

Hmm, I see.  But if C never replays the last, partial segment from the
old timeline, how does it follow the timeline switch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-22 Thread Robert Haas
On Tue, Apr 21, 2015 at 8:30 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 This .partial segment renaming is something that we
 should let the archive_command manage with its internal logic.

This strikes me as equivalent to saying "we don't know how to make
this work right, but maybe our users will know."  That never works
out.  As things stand, we have a situation where the archive_command
examples in our documentation are known to be flawed.  They don't
fsync the file, and they'll write a partial file and then, when rerun,
fail to copy the full file because there's already something there.
Efforts have been made to fix these problems (see the pg_copy thread),
but they haven't been completed yet, nor have we even documented the
issues with the commands recommended by the documentation.  Let's
please not throw anything else on the pile of things we're expecting
users to somehow get right.
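
For reference, even a careful shell command only gets partway there (a
sketch with assumed paths; GNU dd's conv=fsync flushes the file data, but
the rename and the directory entry are still not fsynced, which is part of
what a proper pg_copy would fix):

#!/bin/sh
# Copy under a temporary name, flush, then rename, so a rerun never
# sees a partial file under the final name.
set -e
ARCHIVE=/mnt/archive
dd if="$1" of="$ARCHIVE/$2.tmp" conv=fsync status=none
mv "$ARCHIVE/$2.tmp" "$ARCHIVE/$2"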

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-22 Thread Heikki Linnakangas

On 04/22/2015 10:21 PM, Robert Haas wrote:

On Wed, Apr 22, 2015 at 3:01 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

For example, imagine that we perform point-in-time recovery to WAL position
0/1237E568, on timeline 1. That falls within segment
00010012. Then we end recovery, and switch to timeline 2.
After the switch, and some more WAL-logged actions, we'll have these files
in pg_xlog:

00010011
00010012
00020012
00020013
00020014


Is the 00010012 file a partial segment of the sort
you're proposing to no longer archive?


If you did pure archive recovery, with no streaming replication 
involved, then no. If it was created by streaming replication, and the 
replication had not filled the whole segment yet, then yes, it would be 
a partial segment.



Note that there are two segments ending in 12. They both have the same
contents up to offset 0x37E568, corresponding to the switch point 0/1237E568.
After that, the contents diverge: the segment on the new timeline contains a
checkpoint/end-of-recovery record at that point, followed by new WAL
belonging to the new timeline.


Check.


Recovery knows about that, so that if you set recovery target to timeline 2,
and it needs the WAL at the beginning of segment 12 (still belonging to
timeline 1), it will try restoring both 00010012 and
00020012.


What if you set the recovery target to timeline 3?


It depends how timeline 3 was created. If timeline 3 was forked off from 
timeline 2, then recovery would find it. If it was forked off directly 
from timeline 1, then no.


- Heikki





Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-22 Thread Robert Haas
On Wed, Apr 22, 2015 at 3:01 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 04/22/2015 09:30 PM, Robert Haas wrote:
 On Wed, Apr 22, 2015 at 2:17 AM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 Note that it's a bit complicated to set up that scenario today. Archiving
 is
 never enabled in recovery mode, so you'll need to use a custom cron job
 or
 something to maintain the archive that C uses. The files will not
 automatically flow from B to the second archive. With the patch we're
 discussing, however, it would be easy: just set archive_mode='always' in
 B.


 Hmm, I see.  But if C never replays the last, partial segment from the
 old timeline, how does it follow the timeline switch?

 At timeline switch, we copy the old segment to the new timeline, and start
 writing where we left off. So the WAL from the old timeline is found in the
 segment nominally belonging to the new timeline.

Check.

 For example, imagine that we perform point-in-time recovery to WAL position
 0/1237E568, on timeline 1. That falls within segment
 00010012. Then we end recovery, and switch to timeline 2.
 After the switch, and some more WAL-logged actions, we'll have these files
 in pg_xlog:

 00010011
 00010012
 00020012
 00020013
 00020014

Is the 00010012 file a partial segment of the sort
you're proposing to no longer archive?

 Note that there are two segments ending in 12. They both have the same
 contents up to offset 0x37E568, corresponding to the switch point 0/1237E568.
 After that, the contents diverge: the segment on the new timeline contains a
 checkpoint/end-of-recovery record at that point, followed by new WAL
 belonging to the new timeline.

Check.

 Recovery knows about that, so that if you set recovery target to timeline 2,
 and it needs the WAL at the beginning of segment 12 (still belonging to
 timeline 1), it will try restoring both 00010012 and
 00020012.

What if you set the recovery target to timeline 3?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-22 Thread Robert Haas
On Wed, Apr 22, 2015 at 3:34 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 04/22/2015 10:21 PM, Robert Haas wrote:
 On Wed, Apr 22, 2015 at 3:01 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:
 For example, imagine that we perform point-in-time recovery to WAL position
 0/1237E568, on timeline 1. That falls within segment
 00010012. Then we end recovery, and switch to timeline 2.
 After the switch, and some more WAL-logged actions, we'll have these
 files
 in pg_xlog:

 00010011
 00010012
 00020012
 00020013
 00020014


 Is the 00010012 file a partial segment of the sort
 you're proposing to no longer archive?

 If you did pure archive recovery, with no streaming replication involved,
 then no. If it was created by streaming replication, and the replication had
 not filled the whole segment yet, then yes, it would be a partial segment.

Why the difference?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-21 Thread Michael Paquier
On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakangas wrote:
 Oh, hang on, that's not necessarily true. On promotion, the standby
 archives the last, partial WAL segment from the old timeline. That's just
 wrong (http://www.postgresql.org/message-id/52fcd37c.3070...@vmware.com),
 and in fact I somehow thought I changed that already, but apparently not.
 So let's stop doing that.

Er. Are you planning to prevent the standby from archiving the last partial
segment from the old timeline at promotion? I thought from previous
discussions that we should do it, as the master (be it crashed, burned, buried
or dead) may not have the occasion to do it. By preventing its archiving
you close the door to the case where the master did not have the occasion to
archive it.

+/* */
+static char primary_last_archived[MAX_XFN_CHARS + 1];
This is visibly missing a comment.

As primary_last_archived is used only by ProcessArchivalReport(), wouldn't
it be better to pass it as an argument to this function?

+   /* Check that the filename the primary reported looks valid */
+   if (strlen(primary_last_archived) != 24 ||
+       strspn(primary_last_archived, "0123456789ABCDEF") != 24)
+   return;
Not related to this patch, but we had better have a macro doing this job I
think... It keeps spreading around.
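
(The 2015-05-08 patch earlier in this digest adds such macros to
src/include/access/xlog_internal.h; roughly:

#define XLOG_FNAME_LEN	24
#define IsXLogFileName(fname) \
	(strlen(fname) == XLOG_FNAME_LEN && \
	 strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN)

The exact definition there may differ slightly; this is the strlen + strspn
idiom folded into one place.)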

People may be surprised that a base backup taken from a node that has
archive_mode = on set (that's the case in a very large number of cases)
will not work as-is, as node startup will fail as follows:
FATAL:  archive_mode='on' cannot be used in archive recovery
HINT:  Use 'shared' or 'always' mode instead.
One idea would be to simply ignore the fact that archive_mode = on on nodes
in recovery instead of raising an error. Note that I like the fact that it
raises an error as that's clear; I just point out that people may be
surprised that base backups no longer work as-is in this case.

Are both WalSndArchivalReport() and WalSndArchivalReportIfNecessary()
really necessary? I think that for simplicity you could merge them and use
last_archival_report as a local variable.

Creating a dependency between the pgstat machinery and the WAL sender looks
weak to me. For example with this patch a master cannot stop, as it waits
indefinitely:
LOG:  using stale statistics instead of current ones because stats
collector is not responding
LOG:  sending archival report:
You could scan archive_status/ but that would be costly if there are many
entries to scan and I think that walsender should be highly responsive. Or
you could directly store the name of the most recently archived WAL segment
marked as .done in, let's say, archive_status/last_archived. An entry for that in
the control file does not seem the right place as a node may not have
archive_mode enabled; that's why I am not mentioning it.

Regards,
-- 
Michael


Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-21 Thread Michael Paquier
On Tue, Apr 21, 2015 at 4:38 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 04/21/2015 09:53 AM, Michael Paquier wrote:

 On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakangas wrote:

 Oh, hang on, that's not necessarily true. On promotion, the standby
 archives the last, partial WAL segment from the old timeline. That's just
 wrong (http://www.postgresql.org/message-id/52fcd37c.3070...@vmware.com),
 and in fact I somehow thought I changed that already, but apparently not.
 So let's stop doing that.


 Er. Are you planning to prevent the standby from archiving the last
 partial segment from the old timeline at promotion?


 Yes.

  I thought from previous discussions that we should do it, as the master
 (be it crashed, burned, buried or dead) may not have the occasion to
 do it. By preventing its archiving you close the door to the case
 where the master did not have the occasion to archive it.


 The current situation is a mess:

 1. Even though we archive the last segment in the standby, there is no
 guarantee that the master had archived all the previous segments already.

2. If the master is not totally dead, it might try to archive the same file
 with more WAL in it, at the same time or just afterwards, or even just
 before the standby has completed promotion. Which copy do you keep in the
 archive? Having to deal with that makes the archive_command more
 complicated.

 Note that even though we don't archive the partial last segment on the
 previous timeline, the same WAL is copied to the first segment on the new
 timeline. So the WAL isn't lost.


But if the failed master has archived those segments safely, we may need
them, no? I am not sure we can ignore a user who would want to do a PITR
with recovery_target_timeline pointing to the timeline of the failed master.



  People may be surprised that a base backup taken from a node that has
 archive_mode = on set (that's the case in a very large number of cases)
 will not work as-is, as node startup will fail as follows:
 FATAL:  archive_mode='on' cannot be used in archive recovery
 HINT:  Use 'shared' or 'always' mode instead.


 Hmm, good point.

  One idea would be to simply ignore the fact that archive_mode = on on
 nodes in recovery instead of raising an error. Note that I like the fact
 that it raises an error as that's clear; I just point out that people may
 be surprised that base backups no longer work as-is in this case.


 By "ignore", what behaviour do you mean? Would "on" be equivalent to
 "shared", "always", or something else?


I meant something backward-compatible, with files marked as .done when they
are finished replaying... But now my words *are* weird as on != off ;)

Or we could keep the current behaviour with archive_mode=on (except for the
 last segment thing, which is just wrong), where the standby only archives
 the new timeline, and nothing from the previous timelines.


I guess this would solve the issue here then, which is not a bad thing in
itself:
http://www.postgresql.org/message-id/20140918180734.361021e1@erg
We would need to check if the situation improves with the 'always' mode btw.


 Are there use cases where you'd want that, rather than the new "shared"
 mode? I wanted to keep the 'on' mode for backwards-compatibility, but if
 that causes more problems, it might be better to just remove it and force
 the admin to choose what kind of a setup he has, with "shared" or "always".


The 'on' mode is still useful IMO to get a behavior as close as possible to what
previous releases did.
Regards,
-- 
Michael


Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-21 Thread Heikki Linnakangas

On 04/21/2015 09:53 AM, Michael Paquier wrote:

On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakangas wrote:

Oh, hang on, that's not necessarily true. On promotion, the standby
archives the last, partial WAL segment from the old timeline. That's just
wrong (http://www.postgresql.org/message-id/52fcd37c.3070...@vmware.com),
and in fact I somehow thought I changed that already, but apparently not.
So let's stop doing that.


Er. Are you planning to prevent the standby from archiving the last partial
segment from the old timeline at promotion?


Yes.


I thought from previous discussions that we should do it, as the master
(be it crashed, burned, buried or dead) may not have the occasion to
do it. By preventing its archiving you close the door to the case
where the master did not have the occasion to archive it.


The current situation is a mess:

1. Even though we archive the last segment in the standby, there is no 
guarantee that the master had archived all the previous segments already.


2. If the master is not totally dead, it might try to archive the same 
file with more WAL in it, at the same time or just afterwards, or even 
just before the standby has completed promotion. Which copy do you keep 
in the archive? Having to deal with that makes the archive_command more 
complicated.


Note that even though we don't archive the partial last segment on the 
previous timeline, the same WAL is copied to the first segment on the 
new timeline. So the WAL isn't lost.



People may be surprised that a base backup taken from a node that has
archive_mode = on set (that's the case in a very large number of cases)
will not work as-is, as node startup will fail as follows:
FATAL:  archive_mode='on' cannot be used in archive recovery
HINT:  Use 'shared' or 'always' mode instead.


Hmm, good point.


One idea would be to simply ignore the fact that archive_mode = on on nodes
in recovery instead of raising an error. Note that I like the fact that it
raises an error as that's clear; I just point out that people may be
surprised that base backups no longer work as-is in this case.


By "ignore", what behaviour do you mean? Would "on" be equivalent to 
"shared", "always", or something else?


Or we could keep the current behaviour with archive_mode=on (except for 
the last segment thing, which is just wrong), where the standby only 
archives the new timeline, and nothing from the previous timelines. Are 
there use cases where you'd want that, rather than the new "shared" mode? 
I wanted to keep the 'on' mode for backwards-compatibility, but if that 
causes more problems, it might be better to just remove it and force the 
admin to choose what kind of a setup he has, with "shared" or "always".



Creating a dependency between the pgstat machinery and the WAL sender looks
weak to me. For example with this patch a master cannot stop, as it waits
indefinitely:
LOG:  using stale statistics instead of current ones because stats
collector is not responding
LOG:  sending archival report:


Hmm, yeah, having the walsender wait for the stats file to appear is not 
good.



You could scan archive_status/ but that would be costly if there are many
entries to scan and I think that walsender should be highly responsive. Or
you could directly store the name of the most recently archived WAL segment
marked as .done in, let's say, archive_status/last_archived. An entry for that in
the control file does not seem the right place as a node may not have
archive_mode enabled; that's why I am not mentioning it.


The ways that the archiver process can communicate with the rest of the 
system are limited, for the sake of robustness. Writing to the control 
file is definitely not OK. I think using the stats collector is OK for 
this, but we'll have to arrange it so that the walsender doesn't block 
on it, and should probably not force new stat file so often. A 5-10 
seconds old stats file would be perfectly fine for this purpose.


- Heikki





Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-21 Thread Heikki Linnakangas

On 04/21/2015 12:04 PM, Michael Paquier wrote:

On Tue, Apr 21, 2015 at 4:38 PM, Heikki Linnakangas hlinn...@iki.fi wrote:


Note that even though we don't archive the partial last segment on the
previous timeline, the same WAL is copied to the first segment on the new
timeline. So the WAL isn't lost.


But if the failed master has archived those segments safely, we may need
them, no? I am not sure we can ignore a user who would want to do a PITR
with recovery_target_timeline pointing to the timeline of the failed master.


I think it would be acceptable. If you want to maintain an 
up-to-the-second archive, you can use pg_receivexlog. Mind you, if the 
standby wasn't promoted, the partial segment would not be present in the 
archive anyway. And you can copy the WAL segment manually from 
000200XX to pg_xlog/000100XX before 
starting PITR.
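
For example (a sketch; the archive and data directory paths are assumed):

cp /mnt/archive/000200XX $PGDATA/pg_xlog/000100XX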


Another thought is that we could archive the partial file, but with a 
different name to avoid confusing it with the full segment. For example, 
we could archive a partial 00010012 segment as 
00020012.0128.partial, where 0128 indicates 
how far that file is valid (this naming is similar to how the backup 
history files are named). Recovery wouldn't automatically pick up those 
files, but the DBA could easily copy the partial file into pg_xlog with 
the full segment's name, if he wants to do PITR to that piece of WAL.



Are there use cases where you'd want that, rather than the new "shared"
mode? I wanted to keep the 'on' mode for backwards-compatibility, but if
that causes more problems, it might be better to just remove it and force
the admin to choose what kind of a setup he has, with "shared" or "always".


The 'on' mode is still useful IMO to get a behavior as close as possible to what
previous releases did.


But would you ever want the old behaviour, rather than the new shared or 
always behaviour?


- Heikki





Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-21 Thread Robert Haas
On Tue, Apr 21, 2015 at 6:55 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 04/21/2015 12:04 PM, Michael Paquier wrote:
 On Tue, Apr 21, 2015 at 4:38 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:
 Note that even though we don't archive the partial last segment on the
 previous timeline, the same WAL is copied to the first segment on the new
 timeline. So the WAL isn't lost.

 But if the failed master has archived those segments safely, we may need
 them, no? I am not sure we can ignore a user who would want to do a PITR
 with recovery_target_timeline pointing to the timeline of the failed master.

 I think it would be acceptable. If you want to maintain an up-to-the-second
 archive, you can use pg_receivexlog. Mind you, if the standby wasn't
 promoted, the partial segment would not be present in the archive anyway.
 And you can copy the WAL segment manually from 000200XX to
 pg_xlog/000100XX before starting PITR.

 Another thought is that we could archive the partial file, but with a
 different name to avoid confusing it with the full segment. For example, we
 could archive a partial 00010012 segment as
 00020012.0128.partial, where 0128 indicates how
 far that file is valid (this naming is similar to how the backup history
 files are named). Recovery wouldn't automatically pick up those files, but
 the DBA could easily copy the partial file into pg_xlog with the full
 segment's name, if he wants to do PITR to that piece of WAL.

So, suppose you have A replicating to B (via an archive) replicating to C
(via a separate archive); A dies, B is promoted.  It sounds to me like
today this will work, and with your proposed change it will require
manual intervention.  I don't think that's OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Streaming replication and WAL archive interactions

2015-04-16 Thread Heikki Linnakangas

On 03/01/2015 12:36 AM, Venkata Balaji N wrote:

Patch did not apply successfully to the latest master. Can you please
rebase.


Here you go.

On 01/31/2015 03:07 PM, Andres Freund wrote:

On 2014-12-19 22:56:40 +0200, Heikki Linnakangas wrote:

This adds two new archive_modes, 'shared' and 'always', to indicate whether
the WAL archive is shared between the primary and standby, or not. In
shared mode, the standby tracks which files have been archived by the
primary. The standby refrains from recycling files that the primary has
not yet archived, and at failover, the standby archives all those files too
from the old timeline. In 'always' mode, the standby's WAL archive is
taken to be separate from the primary's, and the standby independently
archives all files it receives from the primary.


I don't really like this approach. Sharing an archive is rather dangerous
in my experience - if your old master comes up again (and writes in the
last wal file) or similar, you can get into really bad situations.


It doesn't have to actually be shared. The master and standby could 
archive to different locations, but the responsibility of archiving is 
shared, so that on promotion, the standby ensures that every WAL file 
gets archived. If the master didn't do it, then the standby will.


Yes, if the master comes up again, it might try to archive a file that 
the standby already archived. But that's not so bad. Both copies of the 
file will be identical. You could put logic in archive_command to check, 
if the file already exists in the archive, whether the contents are 
identical, and return success without doing anything if they are.


Oh, hang on, that's not necessarily true. On promotion, the standby 
archives the last, partial WAL segment from the old timeline. That's 
just wrong 
(http://www.postgresql.org/message-id/52fcd37c.3070...@vmware.com), and 
in fact I somehow thought I changed that already, but apparently not. So 
let's stop doing that.



What I was thinking about was instead trying to detect the point up to
which files were safely archived by running restore command to check for
the presence of archived files. Then archive anything that has valid
content and isn't yet archived. That doesn't sound particularly
complicated to me.


Hmm. That assumes that the standby has a valid restore_command, and can 
access the WAL archive. Not a too unreasonable requirement I guess, but 
with the scheme I proposed, it's not necessary. Seems a bit silly to 
copy a whole segment from the archive just to check if it exists, though.


- Heikki

From db5c4311baf4e3a2ae3308c4d0d9975ee3692a18 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Thu, 16 Apr 2015 14:40:24 +0300
Subject: [PATCH v2 1/1] Make WAL archival behave more sensibly in standby
 mode.

This adds two new archive_modes, 'shared' and 'always', to indicate whether
the WAL archive is shared between the primary and standby, or not. In
shared mode, the standby tracks which files have been archived by the
primary. The standby refrains from recycling files that the primary has
not yet archived, and at failover, the standby archives all those files too
from the old timeline. In 'always' mode, the standby's WAL archive is
taken to be separate from the primary's, and the standby independently
archives all files it receives from the primary.

Fujii Masao and me.
---
 doc/src/sgml/config.sgml  |  12 +-
 doc/src/sgml/high-availability.sgml   |  48 +++
 doc/src/sgml/protocol.sgml|  31 +
 src/backend/access/transam/xlog.c |  29 -
 src/backend/postmaster/postmaster.c   |  37 --
 src/backend/replication/walreceiver.c | 172 --
 src/backend/replication/walsender.c   |  47 +++
 src/backend/utils/misc/guc.c  |  21 ++--
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/access/xlog.h |  14 ++-
 10 files changed, 351 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b30c68d..e352b8e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2521,7 +2521,7 @@ include_dir 'conf.d'
 
 variablelist
  varlistentry id=guc-archive-mode xreflabel=archive_mode
-  termvarnamearchive_mode/varname (typeboolean/type)
+  termvarnamearchive_mode/varname (typeenum/type)
   indexterm
primaryvarnamearchive_mode/ configuration parameter/primary
   /indexterm
@@ -2530,7 +2530,15 @@ include_dir 'conf.d'
para
 When varnamearchive_mode/ is enabled, completed WAL segments
 are sent to archive storage by setting
-xref linkend=guc-archive-command.
+xref linkend=guc-archive-command. In addition to literaloff/,
+to disable, there are three modes: literalon/, literalshared/,
+and literalalways/. During normal operation, there is no
+

Re: [HACKERS] Streaming replication and WAL archive interactions

2015-02-28 Thread Venkata Balaji N


 Here's a first cut at this. It includes the changes from your
 standby_wal_archiving_v1.patch, so you get that behaviour if you set
 archive_mode='always', and the new behaviour I wanted with
 archive_mode='shared'. I wrote it on top of the other patch I posted
 recently to not archive bogus recycled WAL segments after promotion (
 http://www.postgresql.org/message-id/549489fa.4010...@vmware.com), but it
 seems to apply without it too.

 I suggest reading the documentation changes first, it hopefully explains
 pretty well how to use this. The code should work too, and comments on that
 are welcome too, but I haven't tested it much. I'll do more testing next
 week.


Patch did not apply successfully to the latest master. Can you please
rebase.

Regards,
Venkata Balaji N


[HACKERS] Re: [HACKERS] Streaming replication and WAL archive interactions

2015-02-11 Thread Миша Тюрин

  This should be a very common setup in the field, so how are people doing it 
in practice?

One possible workaround with archive and streaming was to use pg_receivexlog 
from the standby to copy/save WALs to the archive, but pg_receivexlog also 
had an issue with fsync.


[ master ] -- streaming -- [ standby ] -- pg_receivexlog -- [ /archive ]


In that case the archive is always in a pre-standby state, and that could be 
better than having the archive broken on promote.
--
Misha

Re: [HACKERS] Streaming replication and WAL archive interactions

2015-01-31 Thread Andres Freund
Hi,

On 2014-12-19 22:56:40 +0200, Heikki Linnakangas wrote:
 This adds two new archive_modes, 'shared' and 'always', to indicate whether
 the WAL archive is shared between the primary and standby, or not. In
 shared mode, the standby tracks which files have been archived by the
 primary. The standby refrains from recycling files that the primary has
 not yet archived, and at failover, the standby archives all those files too
 from the old timeline. In 'always' mode, the standby's WAL archive is
 taken to be separate from the primary's, and the standby independently
 archives all files it receives from the primary.

I don't really like this approach. Sharing an archive is rather dangerous
in my experience - if your old master comes up again (and writes in the
last wal file) or similar, you can get into really bad situations.

What I was thinking about was instead trying to detect the point up to
which files were safely archived by running restore command to check for
the presence of archived files. Then archive anything that has valid
content and isn't yet archived. That doesn't sound particularly
complicated to me.

Greetings,

Andres Freund




Re: [HACKERS] Streaming replication and WAL archive interactions

2014-12-19 Thread Heikki Linnakangas

On 12/18/2014 12:32 PM, Fujii Masao wrote:

On Wed, Dec 17, 2014 at 4:11 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

On 12/16/2014 10:24 AM, Borodin Vladimir wrote:


On 12 Dec 2014, at 16:46, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


There have been a few threads on the behavior of WAL archiving,
after a standby server is promoted [1] [2]. In short, it doesn't
work as you might expect. The standby will start archiving after
it's promoted, but it will not archive files that were replicated
from the old master via streaming replication. If those files were
not already archived in the master before the promotion, they are
not archived at all. That's not good if you wanted to restore from
a base backup + the WAL archive later.

The basic setup is a master server, a standby, a WAL archive that's
shared by both, and streaming replication between the master and
standby. This should be a very common setup in the field, so how
are people doing it in practice? Just live with the risk that you
might miss some files in the archive if you promote? Don't even
realize there's a problem? Something else?



Yes, I do live like that (with streaming replication and shared
archive between master and replicas) and don’t even realize there’s a
problem :( And I think I’m not the only one. Maybe at least a note
should be added to the documentation?



Let's try to figure out a way to fix this in master, but yeah, a note in the
documentation is in order.


+1


And how would we like it to work?



Here's a plan:

Have a mechanism in the standby, to track how far the master has archived
its WAL, and don't throw away WAL in the standby that hasn't been archived
in the master yet. This is similar to the physical replication slots, which
prevent the master from recycling WAL that a standby hasn't received yet,
but in reverse. I think we can use the .done and .ready files for this.
Whenever a file is streamed (completely) from the master, create a .ready
file for it. When we get an acknowledgement from the master that it has
archived it, create a .done file for it. To get the information from the
master, add the last archived WAL segment e.g. in the streaming
replication keep-alive message, or invent a new message type for it.


Sounds OK to me.

How does this work in cascade replication case? The cascading walsender
just relays the archive location to the downstream standby?


Hmm. Yeah, I guess so.


What happens when WAL streaming is terminated and the startup process starts to
read the WAL file from the archive? After reading the WAL file from the archive,
probably we would need to change the .ready files of every older WAL file to .done.


I suppose. Although there's no big harm in leaving them in .ready state. 
As soon as you reconnect, the primary will tell if they were archived. 
If the server is promoted before reconnecting, it will try to archive 
the files and archive_command will see that they are already in the 
archive. It has to be prepared for that situation anyway, so that's OK too.
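
To illustrate the proposed tracking, the standby's archive_status directory 
would hold entries like these (a sketch, with file names abbreviated):

pg_xlog/archive_status/00010011.done    # primary confirmed it archived this
pg_xlog/archive_status/00010012.ready   # streamed in full, not yet confirmed

On promotion, anything still in .ready state would be archived by the 
standby itself.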


Here's a first cut at this. It includes the changes from your 
standby_wal_archiving_v1.patch, so you get that behaviour if you set 
archive_mode='always', and the new behaviour I wanted with 
archive_mode='shared'. I wrote it on top of the other patch I posted 
recently to not archive bogus recycled WAL segments after promotion 
(http://www.postgresql.org/message-id/549489fa.4010...@vmware.com), but 
it seems to apply without it too.


I suggest reading the documentation changes first, it hopefully explains 
pretty well how to use this. The code should work too, and comments on 
that are welcome too, but I haven't tested it much. I'll do more testing 
next week.


- Heikki

From 03dced40178c0a0b7c28ff630a15cf664995525d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Tue, 16 Dec 2014 23:09:03 +0200
Subject: [PATCH 1/1] Make WAL archival behave more sensibly in standby mode.

This adds two new archive_modes, 'shared' and 'always', to indicate whether
the WAL archive is shared between the primary and standby, or not. In
shared mode, the standby tracks which files have been archived by the
primary. The standby refrains from recycling files that the primary has
not yet archived, and at failover, the standby archives all those files too
from the old timeline. In 'always' mode, the standby's WAL archive is
taken to be separate from the primary's, and the standby independently
archives all files it receives from the primary.

Fujii Masao and me.
---
 doc/src/sgml/config.sgml  |  12 +-
 doc/src/sgml/high-availability.sgml   |  48 +++
 doc/src/sgml/protocol.sgml|  31 +
 src/backend/access/transam/xlog.c |  29 -
 src/backend/postmaster/postmaster.c   |  37 --
 src/backend/replication/walreceiver.c | 172 --
 src/backend/replication/walsender.c   |  47 +++
 

Re: [HACKERS] Streaming replication and WAL archive interactions

2014-12-18 Thread Fujii Masao
On Wed, Dec 17, 2014 at 4:11 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 12/16/2014 10:24 AM, Borodin Vladimir wrote:

 On 12 Dec 2014, at 16:46, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 There have been a few threads on the behavior of WAL archiving,
 after a standby server is promoted [1] [2]. In short, it doesn't
 work as you might expect. The standby will start archiving after
 it's promoted, but it will not archive files that were replicated
 from the old master via streaming replication. If those files were
 not already archived in the master before the promotion, they are
 not archived at all. That's not good if you wanted to restore from
 a base backup + the WAL archive later.

 The basic setup is a master server, a standby, a WAL archive that's
 shared by both, and streaming replication between the master and
 standby. This should be a very common setup in the field, so how
 are people doing it in practice? Just live with the risk that you
 might miss some files in the archive if you promote? Don't even
 realize there's a problem? Something else?


 Yes, I do live like that (with streaming replication and shared
 archive between master and replicas) and don’t even realize there’s a
 problem :( And I think I’m not the only one. Maybe at least a note
 should be added to the documentation?


 Let's try to figure out a way to fix this in master, but yeah, a note in the
 documentation is in order.

+1

 And how would we like it to work?


 Here's a plan:

 Have a mechanism in the standby to track how far the master has archived
 its WAL, and don't throw away WAL in the standby that hasn't been archived
 in the master yet. This is similar to the physical replication slots, which
 prevent the master from recycling WAL that a standby hasn't received yet,
 but in reverse. I think we can use the .done and .ready files for this.
 Whenever a file is streamed (completely) from the master, create a .ready
 file for it. When we get an acknowledgement from the master that it has
 archived it, create a .done file for it. To get the information from the
 master, add the last archived WAL segment e.g. in the streaming
 replication keep-alive message, or invent a new message type for it.

Sounds OK to me.

How does this work in the cascading replication case? The cascading walsender
just relays the archive location to the downstream standby?

What happens when WAL streaming is terminated and the startup process starts to
read the WAL file from the archive? After reading the WAL file from the archive,
probably we would need to change the .ready files of every older WAL file to .done.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Streaming replication and WAL archive interactions

2014-12-16 Thread Borodin Vladimir

On 12 Dec 2014, at 16:46, Heikki Linnakangas hlinnakan...@vmware.com 
wrote:

 There have been a few threads on the behavior of WAL archiving, after a 
 standby server is promoted [1] [2]. In short, it doesn't work as you might 
 expect. The standby will start archiving after it's promoted, but it will not 
 archive files that were replicated from the old master via streaming 
 replication. If those files were not already archived in the master before 
 the promotion, they are not archived at all. That's not good if you wanted to 
 restore from a base backup + the WAL archive later.
 
 The basic setup is a master server, a standby, a WAL archive that's shared by 
 both, and streaming replication between the master and standby. This should 
 be a very common setup in the field, so how are people doing it in practice? 
 Just live with the risk that you might miss some files in the archive if you 
 promote? Don't even realize there's a problem? Something else?

Yes, I do live like that (with streaming replication and shared archive between 
master and replicas) and don’t even realize there’s a problem :( And I think 
I’m not the only one. Maybe at least a note should be added to the 
documentation?

 
 And how would we like it to work?
 
 There was some discussion in August on enabling WAL archiving in the standby, 
 always [3]. That's a related idea, but it assumes that you have a separate 
 archive in the master and the standby. The problem at promotion happens when 
 you have a shared archive between the master and standby.

AFAIK most people use the scheme with a shared archive.

 
 [1] 
 http://www.postgresql.org/message-id/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com
 
 [2] http://www.postgresql.org/message-id/20140904175036.310c6466@erg
 
 [3] 
 http://www.postgresql.org/message-id/CAHGQGwHNMs-syU=mevsesthna+exd9pfo_ohhfpjcwovayr...@mail.gmail.com.
 
 - Heikki
 
 


--
Vladimir






Re: [HACKERS] Streaming replication and WAL archive interactions

2014-12-16 Thread Heikki Linnakangas

On 12/16/2014 10:24 AM, Borodin Vladimir wrote:

On 12 Dec 2014, at 16:46, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


There have been a few threads on the behavior of WAL archiving,
after a standby server is promoted [1] [2]. In short, it doesn't
work as you might expect. The standby will start archiving after
it's promoted, but it will not archive files that were replicated
from the old master via streaming replication. If those files were
not already archived in the master before the promotion, they are
not archived at all. That's not good if you wanted to restore from
a base backup + the WAL archive later.

The basic setup is a master server, a standby, a WAL archive that's
shared by both, and streaming replication between the master and
standby. This should be a very common setup in the field, so how
are people doing it in practice? Just live with the risk that you
might miss some files in the archive if you promote? Don't even
realize there's a problem? Something else?


Yes, I do live like that (with streaming replication and shared
archive between master and replicas) and don’t even realize there’s a
problem :( And I think I’m not the only one. Maybe at least a note
should be added to the documentation?


Let's try to figure out a way to fix this in master, but yeah, a note in 
the documentation is in order.



And how would we like it to work?


Here's a plan:

Have a mechanism in the standby to track how far the master has 
archived its WAL, and don't throw away WAL in the standby that hasn't 
been archived in the master yet. This is similar to the physical 
replication slots, which prevent the master from recycling WAL that a 
standby hasn't received yet, but in reverse. I think we can use the 
.done and .ready files for this. Whenever a file is streamed 
(completely) from the master, create a .ready file for it. When we get 
an acknowledgement from the master that it has archived it, create a 
.done file for it. To get the information from the master, add the last 
archived WAL segment e.g. in the streaming replication keep-alive 
message, or invent a new message type for it.


At promotion, archive all the WAL from the old timeline that the master 
hadn't already archived. While doing this, the archive_command can be 
called for files that have in fact already been archived in the master, 
so the command needs to return success if it's asked to archive a file 
and an identical file already exists in the archive. That's a bit 
difficult to write into a one-liner, but hopefully we can still provide 
an example of this. Or have another command, e.g. 
promotion_archive_command, which can just assume that everything is OK 
if the file already exists.
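
For example, as a small wrapper script rather than a true one-liner. 
The script name and archive path are made up for illustration, and a 
production version should also fsync the copied file and its 
containing directory before reporting success:

    #!/bin/bash
    # Hypothetical archive_command wrapper, used as e.g.:
    #   archive_command = '/usr/local/bin/archive-wal.sh %p %f'
    # Returns success if an identical copy is already in the archive,
    # fails if a different file with the same name is there, and
    # otherwise copies the segment in via a temporary name.
    src="$1"               # %p: path to the WAL segment to archive
    dst="/mnt/archive/$2"  # %f: the segment's file name
    if [ -f "$dst" ]; then
        cmp -s "$src" "$dst" && exit 0  # already archived, identical
        exit 1                          # same name, different contents
    fi
    cp "$src" "$dst.tmp" && mv "$dst.tmp" "$dst"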


To enable this new mode, let's add a third option to archive_mode, 
besides on/off. Or just make this the default; I'm not sure if anyone 
would want the old behavior.
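
With the patch's settings, the choice would look something like this in 
postgresql.conf. The spellings follow the commit message; treat it as a 
sketch of intent rather than final syntax, and the wrapper script is 
the hypothetical one sketched above:

    # Archive shared between primary and standby: the standby tracks
    # what the primary has archived and fills in the rest at failover.
    archive_mode = 'shared'

    # Or: every node, standbys included, archives to its own archive.
    #archive_mode = 'always'

    # Either way, archiving itself is configured as before:
    archive_command = '/usr/local/bin/archive-wal.sh %p %f'
    wal_level = archive   # or hot_standby / logical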



There was some discussion in August on enabling WAL archiving in
the standby, always [3]. That's a related idea, but it assumes that
you have a separate archive in the master and the standby. The
problem at promotion happens when you have a shared archive between
the master and standby.


AFAIK most people use the scheme with a shared archive.


Yeah. Anyway, we can support both scenarios.

- Heikki





[HACKERS] Streaming replication and WAL archive interactions

2014-12-12 Thread Heikki Linnakangas
There have been a few threads on the behavior of WAL archiving, after a 
standby server is promoted [1] [2]. In short, it doesn't work as you 
might expect. The standby will start archiving after it's promoted, but 
it will not archive files that were replicated from the old master via 
streaming replication. If those files were not already archived in the 
master before the promotion, they are not archived at all. That's not 
good if you wanted to restore from a base backup + the WAL archive later.


The basic setup is a master server, a standby, a WAL archive that's 
shared by both, and streaming replication between the master and 
standby. This should be a very common setup in the field, so how are 
people doing it in practice? Just live with the risk that you might miss 
some files in the archive if you promote? Don't even realize there's a 
problem? Something else?


And how would we like it to work?

There was some discussion in August on enabling WAL archiving in the 
standby, always [3]. That's a related idea, but it assumes that you have 
a separate archive in the master and the standby. The problem at 
promotion happens when you have a shared archive between the master and 
standby.


[1] 
http://www.postgresql.org/message-id/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com


[2] http://www.postgresql.org/message-id/20140904175036.310c6466@erg

[3] 
http://www.postgresql.org/message-id/CAHGQGwHNMs-syU=mevsesthna+exd9pfo_ohhfpjcwovayr...@mail.gmail.com.


- Heikki

