On Tue, 2007-07-10 at 02:41 +0200, Florian G. Pflug wrote:
> After struggling with understanding xlog.c and friends far enough to
> be able to refactor StartupXLOG to suit the needs of concurrent recovery,
> I think I've finally reached a workable (but still a bit hacky) solution.
Sounds like great progress.
> My design is centered around the idea of a bgreplay process that
> takes over the role of the bgwriter in readonly mode, and continuously
> replays WALs as they arrive. But since recovery during startup is
> still necessary (we need to bring a filesystem-level backup into a
> consistent state - past minRecoveryLoc - before allowing connections),
> this means doing recovery in two steps, from two different processes.
> I've changed StartupXLOG to only recover up to minRecoveryLoc in readonly
> mode, and to skip all steps that are not required if no writes to
> the database will be done later (especially creating a checkpoint at
> the end of recovery). Instead, it posts the pointer to the last recovered
> xlog record to shared memory.
The split of processing sounds correct. I think we need to consider
whether the startup process should stay around longer or we go for
bgreplay. Having said that, it isn't important right now, so let's bypass
that and carry on with the investigation into other unknowns.
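To make the split concrete, here is a minimal sketch in C of the startup
half: replay until minRecoveryLoc has been passed, skip the end-of-recovery
checkpoint, and post the last replayed position for the second process to
pick up. Everything named here (WalPos, ReplayHandoff, startup_recover) is
hypothetical and only stands in for whatever the patch really does in
xlog.c. WAL positions are flattened to a single integer, and each array
entry is assumed to be the end position of one record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat WAL position; NOT the real XLogRecPtr. */
typedef uint64_t WalPos;

/* Hypothetical shared-memory slot the startup process fills in. */
typedef struct
{
    WalPos last_replayed;   /* end of the last record replayed */
    bool   valid;           /* true once minRecoveryLoc has been passed */
} ReplayHandoff;

/*
 * Startup phase only: replay records in order and stop at the first one
 * that ends at or past min_recovery_loc. No checkpoint is written.
 * Returns false if the available WAL ends before that point.
 */
static bool
startup_recover(const WalPos *record_ends, int nrecords,
                WalPos min_recovery_loc, ReplayHandoff *handoff)
{
    assert(nrecords >= 0);
    handoff->valid = false;

    for (int i = 0; i < nrecords; i++)
    {
        /* A real implementation would apply the record's changes here. */
        handoff->last_replayed = record_ends[i];

        if (record_ends[i] >= min_recovery_loc)
        {
            handoff->valid = true;  /* consistent: safe to hand off */
            return true;
        }
    }
    return false;
}
```

Whichever process continues from there only needs handoff->last_replayed,
which is why the startup-versus-bgreplay question can wait.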
> bgreplay then uses that pointer for an initial call to ReadRecord to
> set up WAL reading for the bgreplay process. Afterwards, it repeatedly
> calls ReplayXLOG (new function), which always replays at least
> one record (if there is one; otherwise it returns false), until
> it reaches a safe restart point.
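Just to check my reading of that loop, here is a rough sketch in C. It is
one interpretation of the description, not the patch itself: replay_one
stands in for ReplayXLOG, and WalStream and bgreplay_cycle are names
invented for illustration. The restart-point flags are assumed to be known
per record, which glosses over how the real code would detect them.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the WAL stream: one flag per record saying
 * whether replaying it leaves us at a safe restart point.
 */
typedef struct
{
    const bool *restart_point_after;
    int         nrecords;   /* records that have arrived so far */
    int         next;       /* index of the next record to replay */
} WalStream;

/*
 * Stand-in for ReplayXLOG: replays exactly one record if one is
 * available. Returns false, replaying nothing, if no record has arrived.
 */
static bool
replay_one(WalStream *wal, bool *at_restart_point)
{
    assert(wal->next <= wal->nrecords);

    if (wal->next == wal->nrecords)
        return false;

    /* A real implementation would apply the record's changes here. */
    *at_restart_point = wal->restart_point_after[wal->next];
    wal->next++;
    return true;
}

/*
 * One bgreplay cycle: keep replaying until a safe restart point is
 * reached or the WAL runs dry. Returns the number of records replayed.
 */
static int
bgreplay_cycle(WalStream *wal)
{
    int  replayed = 0;
    bool at_restart_point = false;

    while (!at_restart_point && replay_one(wal, &at_restart_point))
        replayed++;

    return replayed;
}
```

A cycle that returns 0 means no new WAL has arrived, so the caller would
sleep or wait for the next segment before trying again.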
> Currently, in my test setup, I can start a slave in readonly mode and
> it will do initial recovery, bring postgres online, and continuously
> recover from inside bgreplay. There isn't yet any locking between
> wal replay and queries.
I think we need a way of signalling to backends whether recovery is
still in progress. New transactions should then be readonly or
readwrite as appropriate. I think maybe you've thought of this already
in your earlier posts?
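Something as simple as a shared flag checked at transaction start might be
enough for a first cut. A minimal sketch in C follows. It assumes C11
atomics purely to keep the example short; a real patch would presumably
keep the flag in shared memory under one of the existing locking
primitives. All names here (recovery_in_progress, XactMode,
mode_for_new_transaction, recovery_finished) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state; set while WAL replay is still running. */
static atomic_bool recovery_in_progress = true;

typedef enum
{
    XACT_MODE_READ_ONLY,
    XACT_MODE_READ_WRITE
} XactMode;

/* Called by a backend when it starts a new transaction. */
static XactMode
mode_for_new_transaction(void)
{
    return atomic_load(&recovery_in_progress)
        ? XACT_MODE_READ_ONLY
        : XACT_MODE_READ_WRITE;
}

/* Called once by the replay process when replay comes to a halt. */
static void
recovery_finished(void)
{
    /* Recovery can only end once. */
    assert(atomic_load(&recovery_in_progress));
    atomic_store(&recovery_in_progress, false);
}
```

Note this only affects transactions that start after the flag flips.
Transactions already running as readonly stay that way, which seems
acceptable for a prototype.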
> I'll add that locking during the next few days, which should result
> in a very early prototype. The next steps will then be finding a way
> to flush backend caches after replaying code that modified system
> tables, and (related) finding a way to deal with the flatfiles.
Seems like the replay process should do that itself and then send out a
signal when replay comes to a halt. This sounds like the major area of
complexity, so stay away from that minefield for now.
> I'd appreciate any comments on this, especially those pointing
> out problems that I overlooked.
Right now, we're looking to uncover problems, not solve them.
You're blazing a trail here, so let's isolate problems and return to
them later. The faster we get to the point where we can run a real SELECT
query, the better this will be. That is the halfway point, so get there
as fast as you can, and then we can re-evaluate the issues it raises.
Much better to achieve the goal with a long list of caveats than to fall
short, yet have solved a number of smaller problems. We may need to
start again from scratch for the final version. PITR as originally
committed was version 5, and we're probably around version 20 now.