Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-21 Thread Heikki Linnakangas

On 21.02.2013 02:59, Daniel Farina wrote:

On Fri, Feb 15, 2013 at 9:29 AM, Simon Riggssi...@2ndquadrant.com  wrote:

On 15 February 2013 17:07, Heikki Linnakangashlinnakan...@vmware.com  wrote:


Unfortunately in HEAD, xxx.done file is not created when restoring
archived
file because of absence of the patch. We need to implement that first.



Ah yeah, that thing again..
(http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going
to forward-port that patch now, before it's forgotten again. It's not clear
to me what the holdup was on this, but whatever the bigger patch we've been
waiting for is, it can just as well be done on top of the forward-port.


Agreed. I wouldn't wait for a better version now.


Related to this, how is this going to affect point releases, and are
there any lingering doubts about the mechanism of the fix?


Are you talking about the patch to avoid restored WAL segments from 
being re-archived (commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d), or 
the bug that that unarchived WALs were deleted after crash (commit 
b5ec56f664fa20d80fe752de494ec96362eff520)? The former was included in 
9.2.0 already, and the latter will be included in the next point release.


I have no lingering doubts about this. There was some plans to do bigger 
changes for the re-archiving issue 
(6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d), which is why it was 
initially left out from master. But that didn't happen, and I believe 
everyone is happy with the current state of things.



This is
quite serious given my reliance on archiving, so unless the thinking
for point releases is 'real soon' I must backpatch and release it on
my own accord until then.


I don't know what the release schedule is. I take that to be a request 
to put out a new minor release ASAP.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-21 Thread Daniel Farina
On Thu, Feb 21, 2013 at 12:39 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 21.02.2013 02:59, Daniel Farina wrote:

 On Fri, Feb 15, 2013 at 9:29 AM, Simon Riggssi...@2ndquadrant.com
 wrote:

 On 15 February 2013 17:07, Heikki Linnakangashlinnakan...@vmware.com
 wrote:

 Unfortunately in HEAD, xxx.done file is not created when restoring
 archived
 file because of absence of the patch. We need to implement that first.



 Ah yeah, that thing again..
 (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm
 going
 to forward-port that patch now, before it's forgotten again. It's not
 clear
 to me what the holdup was on this, but whatever the bigger patch we've
 been
 waiting for is, it can just as well be done on top of the forward-port.


 Agreed. I wouldn't wait for a better version now.


 Related to this, how is this going to affect point releases, and are
 there any lingering doubts about the mechanism of the fix?


 Are you talking about the patch to avoid restored WAL segments from being
 re-archived (commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d), or the bug
 that that unarchived WALs were deleted after crash (commit
 b5ec56f664fa20d80fe752de494ec96362eff520)? The former was included in 9.2.0
 already, and the latter will be included in the next point release.

Unarchived WALs being deleted after a crash is the one that worries
me.  I actually presume re-archivals will happen anyway because I may
lose connection to archive storage after the WAL has already been
committed, hence b5ec56f664fa20d80fe752de494ec96362eff520.

 This is
 quite serious given my reliance on archiving, so unless the thinking
 for point releases is 'real soon' I must backpatch and release it on
 my own accord until then.


 I don't know what the release schedule is. I take that to be a request to
 put out a new minor release ASAP.

Perhaps, but it's more of a concrete evaluation of how important
archiving is to me and my affiliated operation.  An acceptable answer
might be yeah, backpatch if you feel it's that much of a rush.
Clearly, my opinion is that a gap in the archives is pretty
cringe-inducing.  I hit it from an out of disk case, and you'd be
surprised (or perhaps not?) how many people like to kill -9 processes
on a whim.

I already maintain other backpatches (not related to fixes), and this
one is only temporary, so it's not too much trouble for me.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-21 Thread Jeff Janes
On Thu, Feb 21, 2013 at 12:39 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

 Are you talking about the patch to avoid restored WAL segments from being
 re-archived (commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d), or the bug
 that that unarchived WALs were deleted after crash (commit
 b5ec56f664fa20d80fe752de494ec96362eff520)? The former was included in 9.2.0
 already, and the latter will be included in the next point release.

...

 I don't know what the release schedule is. I take that to be a request to
 put out a new minor release ASAP.

+1 from me.  I'm rather uncomfortable running a system with this
unarchived deletion bug.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-20 Thread Jehan-Guillaume de Rorthais
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Just a quick top-post to thank you all for this fix guys !

Cheers,

On 15/02/2013 18:43, Heikki Linnakangas wrote:
 On 15.02.2013 19:16, Fujii Masao wrote:
 On Sat, Feb 16, 2013 at 2:07 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com  wrote:
 On 15.02.2013 18:10, Fujii Masao wrote:
 
 At least in 9.2, when the archived file is restored into
 pg_xlog, its xxx.done archive status file is created. So we
 don't need to check InArchiveRecovery when deleting old WAL
 files. Checking whether xxx.done exists is enough.
 
 Hmm, what about streamed WAL files? I guess we could go back to
 the pre-9.2 coding, and check WalRcvInProgress(). But I didn't
 actually like that too much, it seems rather random that old
 streamed files are recycled when wal receiver is running at the
 time of restartpoint, and otherwise not. Because whether wal
 receiver is running at the time the restartpoint happens has 
 little to do with which files were created by streaming
 replication. With the right pattern of streaming files from the
 master, but always being teporarily disconnected when the
 restartpoint runs, you could still accumulate WAL files
 infinitely.
 
 Walreceiver always creates .done file when it closes the 
 already-flushed WAL file and switches WAL file to next. So we
 also don't  need to check WalRcvInProgress().
 
 Ah, I missed that part of the patch.
 
 Okay, agreed, that's a better fix. I committed your forward-port of
 the 9.2 patch to master, reverted my earlier fix for this bug, and
 simply removed the 
 InArchiveRecovery/ArchiveRecoveryInProgress()/RecoveryInProgress() 
 condition from RemoveOldXlogFiles().
 
 - Heikki

- -- 
Jehan-Guillaume de Rorthais
http://www.dalibo.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlEk5hwACgkQXu9L1HbaT6JZ3wCg4h7QT+wRMT8KZAA/PjOjZcCV
CS4AnRFeGdXIgklo1/RD2hi+e98pNBEe
=voW3
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-20 Thread Daniel Farina
On Fri, Feb 15, 2013 at 9:29 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 February 2013 17:07, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Unfortunately in HEAD, xxx.done file is not created when restoring
 archived
 file because of absence of the patch. We need to implement that first.


 Ah yeah, that thing again..
 (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going
 to forward-port that patch now, before it's forgotten again. It's not clear
 to me what the holdup was on this, but whatever the bigger patch we've been
 waiting for is, it can just as well be done on top of the forward-port.

 Agreed. I wouldn't wait for a better version now.

Related to this, how is this going to affect point releases, and are
there any lingering doubts about the mechanism of the fix?  This is
quite serious given my reliance on archiving, so unless the thinking
for point releases is 'real soon' I must backpatch and release it on
my own accord until then.

Thanks for the attention paid to the bug report, as always.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Heikki Linnakangas

On 14.02.2013 17:45, Jehan-Guillaume de Rorthais wrote:

I am facing an unexpected behavior on a 9.2.2 cluster that I can
reproduce on current HEAD.

On a cluster with archive enabled but failing, after a crash of
postmaster, the checkpoint occurring before leaving the recovery mode
deletes any additional WALs, even those waiting to be archived.

 ...
 Is it expected ?

No, it's a bug. Ouch. It was introduced in 9.2, by commit 
5286105800c7d5902f98f32e11b209c471c0c69c:



-  /*
-   * Normally we don't delete old XLOG files during recovery to
-   * avoid accidentally deleting a file that looks stale due to a
-   * bug or hardware issue, but in fact contains important data.
-   * During streaming recovery, however, we will eventually fill the
-   * disk if we never clean up, so we have to. That's not an issue
-   * with file-based archive recovery because in that case we
-   * restore one XLOG file at a time, on-demand, and with a
-   * different filename that can't be confused with regular XLOG
-   * files.
-   */
-   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde-d_name))
+   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
 [ delete the file ]


With that commit, we started to keep WAL segments restored from the 
archive in pg_xlog, so we needed to start deleting old segments during 
archive recovery, even when streaming replication was not active. But 
the above change was to broad; we started to delete old segments also 
during crash recovery.


The above should check InArchiveRecovery, ie. only delete old files when 
in archive recovery, not when in crash recovery. But there's one little 
complication: InArchiveRecovery is currently only valid in the startup 
process, so we'll need to also share it in shared memory, so that the 
checkpointer process can access it.


I propose the attached patch to fix it.

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30d877b..dabb094 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -418,6 +418,7 @@ typedef struct XLogCtlData
 	 * recovery.  Protected by info_lck.
 	 */
 	bool		SharedRecoveryInProgress;
+	bool		SharedInArchiveRecovery;
 
 	/*
 	 * SharedHotStandbyActive indicates if we're still in crash or archive
@@ -622,6 +623,7 @@ static void XLogArchiveCleanup(const char *xlog);
 static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI,
 	uint32 endLogId, uint32 endLogSeg);
+static bool ArchiveRecoveryInProgress(void);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
 static void recoveryPausesHere(void);
 static void SetLatestXTime(TimestampTz xtime);
@@ -3571,7 +3573,7 @@ RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
 			strspn(xlde-d_name, 0123456789ABCDEF) == 24 
 			strcmp(xlde-d_name + 8, lastoff + 8) = 0)
 		{
-			if (RecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
+			if (ArchiveRecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
 			{
 snprintf(path, MAXPGPATH, XLOGDIR /%s, xlde-d_name);
 
@@ -5289,6 +5291,7 @@ XLOGShmemInit(void)
 	 */
 	XLogCtl-XLogCacheBlck = XLOGbuffers - 1;
 	XLogCtl-SharedRecoveryInProgress = true;
+	XLogCtl-SharedInArchiveRecovery = false;
 	XLogCtl-SharedHotStandbyActive = false;
 	XLogCtl-WalWriterSleeping = false;
 	XLogCtl-Insert.currpage = (XLogPageHeader) (XLogCtl-pages);
@@ -5680,6 +5683,7 @@ readRecoveryCommandFile(void)
 
 	/* Enable fetching from archive recovery area */
 	InArchiveRecovery = true;
+	XLogCtl-SharedInArchiveRecovery = true;
 
 	/*
 	 * If user specified recovery_target_timeline, validate it or compute the
@@ -5718,11 +5722,16 @@ exitArchiveRecovery(TimeLineID endTLI, uint32 endLogId, uint32 endLogSeg)
 {
 	char		recoveryPath[MAXPGPATH];
 	char		xlogpath[MAXPGPATH];
+	/* use volatile pointer to prevent code rearrangement */
+	volatile XLogCtlData *xlogctl = XLogCtl;
 
 	/*
 	 * We are no longer in archive recovery state.
 	 */
 	InArchiveRecovery = false;
+	SpinLockAcquire(xlogctl-info_lck);
+	xlogctl-SharedInArchiveRecovery = false;
+	SpinLockRelease(xlogctl-info_lck);
 
 	/*
 	 * Update min recovery point one last time.
@@ -7315,6 +7324,25 @@ RecoveryInProgress(void)
 }
 
 /*
+ * Are we currently in archive recovery? In the startup process, you can just
+ * check InArchiveRecovery variable instead.
+ */
+static bool
+ArchiveRecoveryInProgress()
+{
+	bool		result;
+	/* use volatile pointer to prevent code rearrangement */
+	volatile XLogCtlData *xlogctl = XLogCtl;
+
+	/* spinlock is essential on machines with weak memory ordering! */
+	SpinLockAcquire(xlogctl-info_lck);
+	result = xlogctl-SharedInArchiveRecovery;
+	SpinLockRelease(xlogctl-info_lck);
+
+	return result;
+}
+
+/*
  * Is HotStandby active yet? This is only important in special backends
  * since normal backends won't ever be able to connect until this returns
  * true. Postmaster knows this by way of signal, not via shared 

Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Simon Riggs
On 15 February 2013 14:31, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 14.02.2013 17:45, Jehan-Guillaume de Rorthais wrote:

 I am facing an unexpected behavior on a 9.2.2 cluster that I can
 reproduce on current HEAD.

 On a cluster with archive enabled but failing, after a crash of
 postmaster, the checkpoint occurring before leaving the recovery mode
 deletes any additional WALs, even those waiting to be archived.

 ...
 Is it expected ?

 No, it's a bug. Ouch. It was introduced in 9.2, by commit
 5286105800c7d5902f98f32e11b209c471c0c69c:

Thanks for tracking that down.

 -  /*
 -   * Normally we don't delete old XLOG files during recovery to
 -   * avoid accidentally deleting a file that looks stale due to a
 -   * bug or hardware issue, but in fact contains important data.
 -   * During streaming recovery, however, we will eventually fill the
 -   * disk if we never clean up, so we have to. That's not an issue
 -   * with file-based archive recovery because in that case we
 -   * restore one XLOG file at a time, on-demand, and with a
 -   * different filename that can't be confused with regular XLOG
 -   * files.
 -   */
 -   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde-d_name))
 +   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
  [ delete the file ]


 With that commit, we started to keep WAL segments restored from the archive
 in pg_xlog, so we needed to start deleting old segments during archive
 recovery, even when streaming replication was not active. But the above
 change was to broad; we started to delete old segments also during crash
 recovery.

 The above should check InArchiveRecovery, ie. only delete old files when in
 archive recovery, not when in crash recovery. But there's one little
 complication: InArchiveRecovery is currently only valid in the startup
 process, so we'll need to also share it in shared memory, so that the
 checkpointer process can access it.

 I propose the attached patch to fix it.

Agree with your diagnosis and fix.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Heikki Linnakangas

On 15.02.2013 17:12, Simon Riggs wrote:

On 15 February 2013 14:31, Heikki Linnakangashlinnakan...@vmware.com  wrote:

-  /*
-   * Normally we don't delete old XLOG files during recovery to
-   * avoid accidentally deleting a file that looks stale due to a
-   * bug or hardware issue, but in fact contains important data.
-   * During streaming recovery, however, we will eventually fill the
-   * disk if we never clean up, so we have to. That's not an issue
-   * with file-based archive recovery because in that case we
-   * restore one XLOG file at a time, on-demand, and with a
-   * different filename that can't be confused with regular XLOG
-   * files.
-   */
-   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde-d_name))
+   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
  [ delete the file ]


With that commit, we started to keep WAL segments restored from the archive
in pg_xlog, so we needed to start deleting old segments during archive
recovery, even when streaming replication was not active. But the above
change was to broad; we started to delete old segments also during crash
recovery.

The above should check InArchiveRecovery, ie. only delete old files when in
archive recovery, not when in crash recovery. But there's one little
complication: InArchiveRecovery is currently only valid in the startup
process, so we'll need to also share it in shared memory, so that the
checkpointer process can access it.

I propose the attached patch to fix it.


Agree with your diagnosis and fix.


Ok, committed. For the sake of the archives, attached is a script based 
on Jehan-Guillaume's description that I used for testing (incidentally 
based on Kyotaro's script to reproduce an unrelated problem in another 
thread).


Thanks for the report!

- Heikki


unarchived-wal-removed.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Fujii Masao
On Fri, Feb 15, 2013 at 11:31 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 14.02.2013 17:45, Jehan-Guillaume de Rorthais wrote:

 I am facing an unexpected behavior on a 9.2.2 cluster that I can
 reproduce on current HEAD.

 On a cluster with archive enabled but failing, after a crash of
 postmaster, the checkpoint occurring before leaving the recovery mode
 deletes any additional WALs, even those waiting to be archived.

 ...
 Is it expected ?

 No, it's a bug. Ouch. It was introduced in 9.2, by commit
 5286105800c7d5902f98f32e11b209c471c0c69c:

Oh, sorry for my mistake.


 -  /*
 -   * Normally we don't delete old XLOG files during recovery to
 -   * avoid accidentally deleting a file that looks stale due to a
 -   * bug or hardware issue, but in fact contains important data.
 -   * During streaming recovery, however, we will eventually fill the
 -   * disk if we never clean up, so we have to. That's not an issue
 -   * with file-based archive recovery because in that case we
 -   * restore one XLOG file at a time, on-demand, and with a
 -   * different filename that can't be confused with regular XLOG
 -   * files.
 -   */
 -   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde-d_name))
 +   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
  [ delete the file ]


 With that commit, we started to keep WAL segments restored from the archive
 in pg_xlog, so we needed to start deleting old segments during archive
 recovery, even when streaming replication was not active. But the above
 change was to broad; we started to delete old segments also during crash
 recovery.

 The above should check InArchiveRecovery, ie. only delete old files when in
 archive recovery, not when in crash recovery. But there's one little
 complication: InArchiveRecovery is currently only valid in the startup
 process, so we'll need to also share it in shared memory, so that the
 checkpointer process can access it.

 I propose the attached patch to fix it.

At least in 9.2, when the archived file is restored into pg_xlog, its xxx.done
archive status file is created. So we don't need to check InArchiveRecovery
when deleting old WAL files. Checking whether xxx.done exists is enough.

Unfortunately in HEAD, xxx.done file is not created when restoring archived
file because of absence of the patch. We need to implement that first.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Heikki Linnakangas

On 15.02.2013 18:10, Fujii Masao wrote:

On Fri, Feb 15, 2013 at 11:31 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:

-  /*
-   * Normally we don't delete old XLOG files during recovery to
-   * avoid accidentally deleting a file that looks stale due to a
-   * bug or hardware issue, but in fact contains important data.
-   * During streaming recovery, however, we will eventually fill the
-   * disk if we never clean up, so we have to. That's not an issue
-   * with file-based archive recovery because in that case we
-   * restore one XLOG file at a time, on-demand, and with a
-   * different filename that can't be confused with regular XLOG
-   * files.
-   */
-   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde-d_name))
+   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
  [ delete the file ]


With that commit, we started to keep WAL segments restored from the archive
in pg_xlog, so we needed to start deleting old segments during archive
recovery, even when streaming replication was not active. But the above
change was to broad; we started to delete old segments also during crash
recovery.

The above should check InArchiveRecovery, ie. only delete old files when in
archive recovery, not when in crash recovery. But there's one little
complication: InArchiveRecovery is currently only valid in the startup
process, so we'll need to also share it in shared memory, so that the
checkpointer process can access it.

I propose the attached patch to fix it.


At least in 9.2, when the archived file is restored into pg_xlog, its xxx.done
archive status file is created. So we don't need to check InArchiveRecovery
when deleting old WAL files. Checking whether xxx.done exists is enough.


Hmm, what about streamed WAL files? I guess we could go back to the 
pre-9.2 coding, and check WalRcvInProgress(). But I didn't actually like 
that too much, it seems rather random that old streamed files are 
recycled when wal receiver is running at the time of restartpoint, and 
otherwise not. Because whether wal receiver is running at the time the 
restartpoint happens has little to do with which files were created by 
streaming replication. With the right pattern of streaming files from 
the master, but always being teporarily disconnected when the 
restartpoint runs, you could still accumulate WAL files infinitely.



Unfortunately in HEAD, xxx.done file is not created when restoring archived
file because of absence of the patch. We need to implement that first.


Ah yeah, that thing again.. 
(http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm 
going to forward-port that patch now, before it's forgotten again. It's 
not clear to me what the holdup was on this, but whatever the bigger 
patch we've been waiting for is, it can just as well be done on top of 
the forward-port.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Fujii Masao
On Sat, Feb 16, 2013 at 2:07 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 15.02.2013 18:10, Fujii Masao wrote:

 On Fri, Feb 15, 2013 at 11:31 PM, Heikki Linnakangas
 hlinnakan...@vmware.com  wrote:

 -  /*

 -   * Normally we don't delete old XLOG files during recovery to
 -   * avoid accidentally deleting a file that looks stale due to a
 -   * bug or hardware issue, but in fact contains important data.
 -   * During streaming recovery, however, we will eventually fill the
 -   * disk if we never clean up, so we have to. That's not an issue
 -   * with file-based archive recovery because in that case we
 -   * restore one XLOG file at a time, on-demand, and with a
 -   * different filename that can't be confused with regular XLOG
 -   * files.
 -   */
 -   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde-d_name))
 +   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde-d_name))
   [ delete the file ]


 With that commit, we started to keep WAL segments restored from the
 archive
 in pg_xlog, so we needed to start deleting old segments during archive
 recovery, even when streaming replication was not active. But the above
 change was to broad; we started to delete old segments also during crash
 recovery.

 The above should check InArchiveRecovery, ie. only delete old files when
 in
 archive recovery, not when in crash recovery. But there's one little
 complication: InArchiveRecovery is currently only valid in the startup
 process, so we'll need to also share it in shared memory, so that the
 checkpointer process can access it.

 I propose the attached patch to fix it.


 At least in 9.2, when the archived file is restored into pg_xlog, its
 xxx.done
 archive status file is created. So we don't need to check
 InArchiveRecovery
 when deleting old WAL files. Checking whether xxx.done exists is enough.


 Hmm, what about streamed WAL files? I guess we could go back to the pre-9.2
 coding, and check WalRcvInProgress(). But I didn't actually like that too
 much, it seems rather random that old streamed files are recycled when wal
 receiver is running at the time of restartpoint, and otherwise not. Because
 whether wal receiver is running at the time the restartpoint happens has
 little to do with which files were created by streaming replication. With
 the right pattern of streaming files from the master, but always being
 teporarily disconnected when the restartpoint runs, you could still
 accumulate WAL files infinitely.

Walreceiver always creates .done file when it closes the
already-flushed WAL file
and switches WAL file to next. So we also don't  need to check
WalRcvInProgress().

 Unfortunately in HEAD, xxx.done file is not created when restoring
 archived
 file because of absence of the patch. We need to implement that first.


 Ah yeah, that thing again..
 (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going
 to forward-port that patch now, before it's forgotten again. It's not clear
 to me what the holdup was on this, but whatever the bigger patch we've been
 waiting for is, it can just as well be done on top of the forward-port.

I posted the patch to that thread.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Simon Riggs
On 15 February 2013 17:07, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Unfortunately in HEAD, xxx.done file is not created when restoring
 archived
 file because of absence of the patch. We need to implement that first.


 Ah yeah, that thing again..
 (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going
 to forward-port that patch now, before it's forgotten again. It's not clear
 to me what the holdup was on this, but whatever the bigger patch we've been
 waiting for is, it can just as well be done on top of the forward-port.

Agreed. I wouldn't wait for a better version now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Heikki Linnakangas

On 15.02.2013 19:16, Fujii Masao wrote:

On Sat, Feb 16, 2013 at 2:07 AM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:

On 15.02.2013 18:10, Fujii Masao wrote:


At least in 9.2, when the archived file is restored into pg_xlog, its
xxx.done
archive status file is created. So we don't need to check
InArchiveRecovery
when deleting old WAL files. Checking whether xxx.done exists is enough.


Hmm, what about streamed WAL files? I guess we could go back to the pre-9.2
coding, and check WalRcvInProgress(). But I didn't actually like that too
much, it seems rather random that old streamed files are recycled when wal
receiver is running at the time of restartpoint, and otherwise not. Because
whether wal receiver is running at the time the restartpoint happens has
little to do with which files were created by streaming replication. With
the right pattern of streaming files from the master, but always being
teporarily disconnected when the restartpoint runs, you could still
accumulate WAL files infinitely.


Walreceiver always creates .done file when it closes the
already-flushed WAL file
and switches WAL file to next. So we also don't  need to check
WalRcvInProgress().


Ah, I missed that part of the patch.

Okay, agreed, that's a better fix. I committed your forward-port of the 
9.2 patch to master, reverted my earlier fix for this bug, and simply 
removed the 
InArchiveRecovery/ArchiveRecoveryInProgress()/RecoveryInProgress() 
condition from RemoveOldXlogFiles().


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-15 Thread Simon Riggs
On 15 February 2013 16:10, Fujii Masao masao.fu...@gmail.com wrote:

 I propose the attached patch to fix it.

 At least in 9.2, when the archived file is restored into pg_xlog, its xxx.done
 archive status file is created. So we don't need to check InArchiveRecovery
 when deleting old WAL files. Checking whether xxx.done exists is enough.

I don't agree. The extra test Heikki put in was useful and helps avoid
issues when we get the .done creation wrong, or when people delete
them.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unarchived WALs deleted after crash

2013-02-14 Thread Jehan-Guillaume de Rorthais
Hi,

I am facing an unexpected behavior on a 9.2.2 cluster that I can
reproduce on current HEAD.

On a cluster with archive enabled but failing, after a crash of
postmaster, the checkpoint occurring before leaving the recovery mode
deletes any additional WALs, even those waiting to be archived.

Because of this, after recovering from the crash, previous PITR backup
can not be used to restore the instance to a time where archiving was
failing. Any slaves fed by WAL or lagging in SR need to be recreated.

AFAICT, this is not documented and I would expect the WALs to be
archived by the archiver process when the cluster exits the recovery step.

Here is a simple scenario to reproduce this.

Configuration:

  wal_level = archive
  archive_mode = on
  archive_command = '/bin/false'
  log_checkpoints = on

Scenario:
  createdb test
  psql -c 'create table test as select i, md5(i::text) from
generate_series(1,300) as i;' test
  kill -9 $(head -1 $PGDATA/postmaster.pid)
  pg_ctl start

Using this scenario, log files shows:

  LOG:  archive command failed with exit code 1
  DETAIL:  The failed archive command was: /bin/false
  WARNING:  transaction log file 00010001 could not
be archived: too many failures
  LOG:  database system was interrupted; last known up at 2013-02-14
16:12:58 CET
  LOG:  database system was not properly shut down; automatic recovery
in progress
  LOG:  crash recovery starts in timeline 1 and has target timeline 1
  LOG:  redo starts at 0/11400078
  LOG:  record with zero length at 0/13397190
  LOG:  redo done at 0/13397160
  LOG:  last completed transaction was at log time 2013-02-14
16:12:58.49303+01
  LOG:  checkpoint starting: end-of-recovery immediate
  LOG:  checkpoint complete: wrote 2869 buffers (17.5%); 0 transaction
log file(s) added, 9 removed, 7 recycled; write=0.023 s, sync=0.468 s,
total=0.739 s; sync files=2, longest=0.426 s, average=0.234 s
  LOG:  autovacuum launcher started
  LOG:  database system is ready to accept connections
  LOG:  archive command failed with exit code 1
  DETAIL:  The failed archive command was: /bin/false
  LOG:  archive command failed with exit code 1
  DETAIL:  The failed archive command was: /bin/false
  LOG:  archive command failed with exit code 1
  DETAIL:  The failed archive command was: /bin/false
  WARNING:  transaction log file 00010011 could not
be archived: too many failures

Before the kill, 00010001 was the WAL to archive.
After the kill, the checkpoint deleted 9 files before exiting recovery
mode and 00010011 become the first WAL to archive.
00010001 through 00010010 were
removed or recycled.

Is it expected ?
-- 
Jehan-Guillaume de Rorthais
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unarchived WALs deleted after crash

2013-02-14 Thread Daniel Farina
On Thu, Feb 14, 2013 at 7:45 AM, Jehan-Guillaume de Rorthais
j...@dalibo.com wrote:
 Hi,

 I am facing an unexpected behavior on a 9.2.2 cluster that I can
 reproduce on current HEAD.

 On a cluster with archive enabled but failing, after a crash of
 postmaster, the checkpoint occurring before leaving the recovery mode
 deletes any additional WALs, even those waiting to be archived.

I believe I have encountered this recently, but didn't get enough
chance to work with it to correspond.  For me, the cause was
out-of-disk on the file system that exclusively contained WAL,
backlogged because archiving fell behind writing.  This causes the
cluster to crash -- par for the course -- but also an archive gap was
created.  At the time I thought there was some kind of bug in dealing
with out of space issues in the archiver (the .ready bookkeeping), but
the symptoms I saw seem like they might be explained by your report,
too.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers