Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery

MauMau Wed, 12 Feb 2014 04:25:08 -0800

From: "Andres Freund" <and...@2ndquadrant.com>

On 2014-02-02 23:50:40 +0900, Fujii Masao wrote:

Right. If standby_mode is enabled, checkpoint_segments can trigger
the restartpoint. But the problem is that the timing of restartpoints
depends not only on the checkpoint parameters (i.e.,
checkpoint_timeout and checkpoint_segments) that are used during
archive recovery but also on the checkpoint WAL records that were
generated by the master.


Sure. But we really *need* all the WAL since the last checkpoint's redo
location locally to be safe.

For example, imagine the case where the master generated only one
checkpoint WAL record since the last backup and then crashed with
database corruption, and the DBA decided to perform normal archive
recovery using the last backup. In this case, even if the DBA reduces
both checkpoint_timeout and checkpoint_segments, only one
restartpoint can occur during recovery. This low frequency of
restartpoints might fill up the disk with lots of WAL files.
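
(A rough illustration of the scenario above, not PostgreSQL code. It is a toy model under stated assumptions: every restored segment stays in pg_xlog until a restartpoint, a restartpoint can happen only when a checkpoint record from the master is replayed, and it then leaves just one segment behind. The function and parameter names are hypothetical, and checkpoint_timeout and checkpoint_segments are deliberately left out of the model.)

```python
def peak_segments_kept(total_segments, checkpoint_at):
    """Return the largest number of restored segments held at once.

    total_segments: number of WAL segments replayed, numbered from 1.
    checkpoint_at: set of segment numbers that contain a checkpoint
                   record written by the master (assumed input).
    """
    kept = 0
    peak = 0
    for seg in range(1, total_segments + 1):
        kept += 1                 # the restored segment lands in pg_xlog
        peak = max(peak, kept)
        if seg in checkpoint_at:
            # Assumption: a restartpoint runs here and keeps only the
            # segment that holds the new redo location.
            kept = 1
    return peak


if __name__ == "__main__":
    # One checkpoint record in 100 segments versus one every 10 segments.
    print(peak_segments_kept(100, {50}))
    print(peak_segments_kept(100, set(range(10, 101, 10))))
```

In this model the peak is bounded by the longest gap between checkpoint records in the replayed WAL, whatever the recovery-side settings are.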


I am not sure I understand the point of this scenario. If the primary
crashed after a checkpoint, there won't be that much WAL since it
happened...

> If the issue is that you're not using standby_mode (if so, why?), then
> the fix maybe is to make that apply to a wider range of situations.

I guess that he is not using standby_mode because, according to
his first email in this thread, he said he would like to prevent WAL
from accumulating in pg_xlog during normal archive recovery (i.e., PITR).


Well, that doesn't necessarily prevent you from using
standby_mode... But yes, that might be the case.

I wonder if we shouldn't just always look at checkpoint segments during
!crash recovery.

Maybe we could consider that direction, but there is a problem: archive recovery becomes slower than in 9.1 because of repeated restartpoints. Archive recovery should be as fast as possible, because it typically applies dozens or hundreds of WAL files and the DBA wants operation to resume immediately.

So, I think we should restore the 9.1 behavior for archive recovery. The attached patch keeps restored archived WAL in pg_xlog/ only during standby recovery. It is based on Fujii-san's revision of the patch, with the AllowCascadeReplication() condition removed from two if statements.
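
To illustrate the intended effect, here is a minimal sketch. It is not the patch itself and not PostgreSQL source; the function name is hypothetical, and it assumes that outside standby mode each restored file simply replaces the previously restored one.

```python
def restore_segments(segment_names, standby_mode):
    """Return the restored segments still held in a toy pg_xlog."""
    pg_xlog = []
    for name in segment_names:
        if standby_mode:
            # Standby recovery: keep the restored file under its own name.
            pg_xlog.append(name)
        else:
            # Assumption: in plain archive recovery the newly restored
            # file replaces the one restored before it.
            pg_xlog = [name]
    return pg_xlog


if __name__ == "__main__":
    segments = ["seg1", "seg2", "seg3"]  # placeholder names
    print(restore_segments(segments, standby_mode=True))
    print(restore_segments(segments, standby_mode=False))
```

With this condition, the number of restored files in pg_xlog/ during PITR no longer depends on how often restartpoints occur.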


Regards
MauMau

wal_increase_in_pitr_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
