Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery

Andres Freund Sat, 01 Feb 2014 12:45:41 -0800

On 2014-01-21 23:37:43 +0200, Heikki Linnakangas wrote:
> On 01/21/2014 07:31 PM, Fujii Masao wrote:
> >On Fri, Dec 20, 2013 at 9:21 PM, MauMau <[email protected]> wrote:
> >>From: "Fujii Masao" <[email protected]>
> >>
> >>>!     if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
> >>>
> >>>Even when standby_mode is not enabled, we can use cascade replication and
> >>>it needs the accumulated WAL files. So I think that
> >>>AllowCascadeReplication()
> >>>should be added into this condition.
> >>>
> >>>!       snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
> >>>!       XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
> >>>!
> >>>!       if (restoredFromArchive)
> >>>
> >>>Don't we need to check !StandbyModeRequested and
> >>>!AllowCascadeReplication()
> >>>here?
> >>
> >>Oh, you are correct.  Okay, done.
> >
> >Thanks! The patch looks good to me. Attached is the updated version of
> >the patch. I added the comments.
> 
> Sorry for reacting so slowly, but I'm not sure I like this patch. It's a
> quite useful property that all the WAL files that are needed for recovery
> are copied into pg_xlog, even when restoring from archive, even when not
> doing cascading replication. It guarantees that you can restart the standby,
> even if the connection to the archive is lost for some reason. I
> intentionally changed the behavior for archive recovery too, when it was
> introduced for cascading replication. Also, I think it's good that the
> behavior does not depend on whether cascading replication is enabled - it's
> a quite subtle difference.
> 
> So, IMHO this is not a bug, it's a feature.


Very much seconded. With the old behaviour it's possible to get into the
situation that you cannot get your standby productive anymore if the
archive server died. Since the archive server is often the primary
that's imo unacceptable.

I'd even go as far as saying the previous behaviour is a bug.

> To solve the original problem of running out of disk space in archive
> recovery, I wonder if we should perform restartpoints more aggressively. We
> intentionally don't trigger restatpoings by checkpoint_segments, only
> checkpoint_timeout, but I wonder if there should be an option for that.
> MauMau, did you try simply reducing checkpoint_timeout, while doing
> recovery?

Hm, don't we actually do cause trigger restartpoints based on checkpoint
segments?

static int
XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
             XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
{
...

    if (readFile >= 0 && !XLByteInSeg(targetPagePtr, readSegNo))
    {
        /*
         * Request a restartpoint if we've replayed too much xlog since the
         * last one.
         */
        if (StandbyModeRequested && bgwriterLaunched)
        {
            if (XLogCheckpointNeeded(readSegNo))
            {
                (void) GetRedoRecPtr();
                if (XLogCheckpointNeeded(readSegNo))
                    RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
            }
        }
...

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery

Reply via email to