On Fri, 2008-08-08 at 11:47 +0900, Fujii Masao wrote:
> On Thu, Aug 7, 2008 at 11:34 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> >

> In this situation, the history file (000000010000000000000004.00000020.backup)
> is behind the stoppoint (000000010000000000000004) in the alphabetic order.
> So, pg_stop_backup should wait for both the stoppoint and the history
> file, I think.

OK, I see that now.
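
For reference, the name ordering can be checked with a plain bytewise comparison. This is only a standalone sketch: sorts_after() is a hypothetical helper made up for illustration, not anything in the backend, and it says nothing about the order in which the archiver actually processes files.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): returns true if name a
 * sorts strictly after name b in bytewise (strcmp) order.
 */
static bool
sorts_after(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) > 0;
}
```

With the two names from the example above, sorts_after("000000010000000000000004.00000020.backup", "000000010000000000000004") is true, because the segment name is a strict prefix of the history file name.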

> 
> > !   while (!XLogArchiveCheckDone(stopxlogfilename, false))
> 
> If a concurrent checkpoint removes the status file before
> XLogArchiveCheckDone, pg_stop_backup continues waiting forever. This is
> undesirable behavior.

I think it will only get removed by the second checkpoint, not the
first, so the chance of that happening seems very small.
But we'll put in a check just in case.

> Yes, statement_timeout may help. But, I don't want to use it, because the
> *successful* backup is canceled.
> 
> How about checking whether the stoppoint was archived by comparing with
> the last WAL archived. The archiver process can tell the last WAL archived.
> Or, we can calculate it from the status file.

I think it's easier to test whether the stopxlogfilename still exists in
pg_xlog. If not, we know it has been archived away. We can add that as
an extra condition inside the loop.
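
A rough standalone sketch of that existence test, assuming a POSIX stat(). The helper name wal_file_exists() and the fixed 1024-byte buffer are hypothetical, chosen for illustration; the backend would use MAXPGPATH and XLOGDIR instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not a PostgreSQL function): returns true if
 * "dir/name" can be stat()ed. A false result is taken to mean the WAL
 * file is no longer present under that name.
 */
static bool
wal_file_exists(const char *dir, const char *name)
{
    char        path[1024];     /* assumption: stand-in for MAXPGPATH */
    struct stat stat_buf;
    int         len;

    assert(dir != NULL && name != NULL);

    len = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (len < 0 || (size_t) len >= sizeof(path))
        return false;           /* formatting failed or path was truncated */

    return stat(path, &stat_buf) == 0;
}
```

Note that this treats any stat() failure as "file is gone", not just ENOENT, so a permissions problem would look the same as a removed file.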

So I'm thinking we should test XLogArchiveCheckDone() for both
stopxlogfilename and the history file, and then stat the stop WAL file:



        BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg,
                                                  startpoint.xrecoff % XLogSegSize);

        seconds_before_warning = 60;
        waits = 0;

        while (!XLogArchiveCheckDone(histfilepath, false) || 
                   !XLogArchiveCheckDone(stopxlogfilename, false))
        {
                struct stat stat_buf;
                char            xlogpath[MAXPGPATH];

                /*
                 * Check to see if file has already been archived and WAL file
                 * removed by a concurrent checkpoint
                 */
                snprintf(xlogpath, MAXPGPATH, XLOGDIR "/%s", stopxlogfilename);
                if (XLogArchiveCheckDone(histfilepath, false) &&
                        stat(xlogpath, &stat_buf) != 0)
                        break;

                CHECK_FOR_INTERRUPTS();

                pg_usleep(1000000L);

                if (++waits >= seconds_before_warning)
                {
                        seconds_before_warning *= 2;     /* This wraps in >10 years... */
                        elog(WARNING, "pg_stop_backup() waiting for archive to complete "
                                                        "(%d seconds delay)", waits);
                }
        }
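
To make the warning schedule concrete: with one iteration per second, the loop above warns at 60, 120, 240, 480, ... seconds. The sketch below simulates only that counter logic without sleeping. simulate_warnings() is a hypothetical helper for illustration, and it assumes total_seconds is small enough that the doubling does not overflow an int.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: simulate the warning schedule for a wait of
 * total_seconds, one loop iteration per second, without sleeping.
 * Stores the second at which each warning fires in out[] (at most
 * max_out entries) and returns the number of warnings emitted.
 */
static int
simulate_warnings(int total_seconds, int *out, int max_out)
{
    int seconds_before_warning = 60;
    int waits = 0;
    int nwarnings = 0;

    assert(out != NULL || max_out == 0);

    while (waits < total_seconds)
    {
        if (++waits >= seconds_before_warning)
        {
            seconds_before_warning *= 2;    /* back off: next warning twice as far out */
            if (nwarnings < max_out)
                out[nwarnings] = waits;
            nwarnings++;
        }
    }
    return nwarnings;
}
```

So a backup that waits 500 seconds for the archiver would see four warnings, while one that finishes in under a minute sees none.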


-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)