Re: [HACKERS] hot standby startup, visibility map, clog

2011-06-11 Thread Robert Haas
On Thu, Jun 9, 2011 at 5:14 AM, Daniel Farina dan...@heroku.com wrote:
 The first fact is that turning off hot standby will let the cluster
 start up, but only after seeing a spate of messages like these (a
 dozen or dozens, not thousands):

 2011-06-09 08:02:32 UTC  LOG:  restored log file
 000000020000002C000000C0 from archive
 2011-06-09 08:02:33 UTC  WARNING:  xlog min recovery request
 2C/C1F09658 is past current point 2C/C037B278
 2011-06-09 08:02:33 UTC  CONTEXT:  writing block 0 of relation
 base/16385/16784_vm
        xlog redo insert: rel 1663/16385/128029; tid 114321/63
 2011-06-09 08:02:33 UTC  LOG:  restartpoint starting: xlog

 Most importantly, *all* such messages are in visibility map forks
 (_vm).  I am reasonably confident that my code does not start reading
 data until pg_start_backup() has returned, and blocks on
 pg_stop_backup() after having read all the data.  Also, the mailing
 list correspondence at
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg02034.php
 suggests that the visibility map is not flushed at checkpoints, so
 perhaps with some poor timing an old page can wander onto disk even
 after a checkpoint barrier that pg_start_backup waits for.  (I have
 not yet found the critical section that makes visibilitymap buffers
 immune to checkpoints, though.)

I don't believe there's anything that makes visibilitymap buffers
immune to checkpoint.  I might be wrong.  You could probably find out
by inserting a few lines of code into the buffer manager to inspect
each buffer tag as the buffer is written out and elog() if the fork
number is VISIBILITYMAP_FORKNUM.  There is some weirdness around how
the LSNs of visibility map pages are set, however.  See
visibilitymap_set()/clear().  Clearing a visibility map bit is done
without touching the LSN at all; setting the bit advances the LSN to
that of the heap page, if it currently precedes that value.  It's not
obvious to me whether or how that's related.
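
Something along these lines, dropped into the buffer write-out path,
ought to do it.  Placement and variable names here are from memory,
so treat it as a sketch against 9.0-era bufmgr.c rather than a patch:

    /* Sketch only: e.g. near the top of FlushBuffer() in
     * src/backend/storage/buffer/bufmgr.c, where "buf" is the
     * BufferDesc about to be written out. */
    if (buf->tag.forkNum == VISIBILITYMAP_FORKNUM)
        elog(LOG, "writing out vm buffer: rel %u/%u/%u, block %u",
             buf->tag.rnode.spcNode,
             buf->tag.rnode.dbNode,
             buf->tag.rnode.relNode,
             buf->tag.blockNum);

That would settle whether vm pages are reaching disk through the
ordinary checkpoint/bgwriter path.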

I'm a bit puzzled by the warning message itself, too.  I think that in
order to generate that warning, you need to try to flush a page to
disk whose LSN is higher than the current minimum recovery point.  And
the minimum recovery point is initially set to point to the redo
pointer from the last checkpoint.  But during an online backup, it's
expected that there may be pages on disk with LSNs higher than the
redo pointer from the last checkpoint.  So actually now I'm not sure
why people don't see this warning whenever a lengthy recovery is
required to reach consistency, especially with small values of
shared_buffers.  I'm probably missing something here...
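
For reference, the warning itself comes, if I recall correctly, from
UpdateMinRecoveryPoint() in xlog.c, which fires when a buffer flush
asks for a minimum recovery point beyond the last WAL record replayed
so far.  Very roughly (simplified from memory, details may be off):

    /* "lsn" is the flush request, i.e. the LSN of the page being
     * written; the minimum recovery point is clamped to the end of
     * replayed WAL, and we warn if the request is ahead of that. */
    newMinRecoveryPoint = xlogctl->replayEndRecPtr;
    if (XLByteLT(newMinRecoveryPoint, lsn))
        elog(WARNING,
             "xlog min recovery request %X/%X is past current point %X/%X",
             lsn.xlogid, lsn.xrecoff,
             newMinRecoveryPoint.xlogid, newMinRecoveryPoint.xrecoff);

So to hit it you need a page whose LSN is ahead of everything replayed
so far, which at least fits the theory of a too-new vm page escaping
to disk during the backup.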

One thing that might make it easier to understand what's going on here
is to crank the log level up to DEBUG2 and post the full output from a
recovery that exhibits this problem, rather than just a few lines.
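
In postgresql.conf terms, that's:

    log_min_messages = debug2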

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] hot standby startup, visibility map, clog

2011-06-09 Thread Daniel Farina
Hello list,

A little while ago I posted about how my ... exciting backup
procedure caused occasional problems at startup, due to the clog not
being big enough.
(http://archives.postgresql.org/pgsql-hackers/2011-04/msg01148.php) I
recently had a reproduction and a little bit of luck, and I think I
have a slightly better idea of what may be causing this.

The first fact is that turning off hot standby will let the cluster
start up, but only after seeing a spate of messages like these (a
dozen or dozens, not thousands):

2011-06-09 08:02:32 UTC  LOG:  restored log file
000000020000002C000000C0 from archive
2011-06-09 08:02:33 UTC  WARNING:  xlog min recovery request
2C/C1F09658 is past current point 2C/C037B278
2011-06-09 08:02:33 UTC  CONTEXT:  writing block 0 of relation
base/16385/16784_vm
xlog redo insert: rel 1663/16385/128029; tid 114321/63
2011-06-09 08:02:33 UTC  LOG:  restartpoint starting: xlog

Most importantly, *all* such messages are in visibility map forks
(_vm).  I am reasonably confident that my code does not start reading
data until pg_start_backup() has returned, and blocks on
pg_stop_backup() after having read all the data.  Also, the mailing
list correspondence at
http://archives.postgresql.org/pgsql-hackers/2010-11/msg02034.php
suggests that the visibility map is not flushed at checkpoints, so
perhaps with some poor timing an old page can wander onto disk even
after a checkpoint barrier that pg_start_backup waits for.  (I have
not yet found the critical section that makes visibilitymap buffers
immune to checkpoints, though.)
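
For concreteness, the bracketing I mean is just the standard one (the
label is arbitrary):

    SELECT pg_start_backup('base-backup');
    -- ... read the entire data directory here ...
    SELECT pg_stop_backup();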

Given all that, if the smgr's generic read path, which checks the LSN
and possibly the clog (but apparently only in hot standby mode, since
before hot standby the clog's intermediate states were not so
interesting...), has a problem with such uncheckpointed pages, then it
would seem reasonable that the system now refuses to start where it
once did not.

FWIW, letting recovery run without hot standby for a little while,
canceling, and then starting again after the danger zone had passed
would allow recovery to proceed correctly, as one might expect.
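
In configuration terms the workaround amounted to nothing more than
(9.0 settings):

    # postgresql.conf on the standby, first start:
    hot_standby = off   # replay past the danger zone, then stop
    # ...and on the second start:
    hot_standby = on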

Thoughts?

-- 
fdr
