Hi

After struggling with understanding xlog.c and friends far enough to
be able to refactor StartupXLOG to suit the needs of concurrent recovery,
I think I've finally reached a workable (but still a bit hacky) solution.

My design is centered around the idea of a bgreplay process that
takes over the role of the bgwriter in readonly mode, and continuously
replays WAL records as they arrive. But since recovery during startup
is still necessary (we need to bring a filesystem-level backup into a
consistent state - past minRecoveryLoc - before allowing connections),
this means doing recovery in two steps, from two different processes.

I've changed StartupXLOG to only recover up to minRecoveryLoc in readonly
mode, and to skip all steps that are not required if no writes to
the database will be done later (especially creating a checkpoint at
the end of recovery). Instead, it posts the pointer to the last recovered
xlog record to shared memory.
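
A rough sketch of that startup split, not the actual patch: all names
below (StartupSketchShmem, startup_sketch) are made up for illustration,
and log positions are simplified to plain ints instead of real xlog
pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared-memory slot; the real structure is not shown here. */
typedef struct {
    int  last_recovered;   /* position of the last record replayed at startup */
    bool checkpoint_done;  /* whether an end-of-recovery checkpoint was written */
} StartupSketchShmem;

/*
 * Replay records, given as an ascending array of positions.
 * In readonly mode, stop as soon as min_recovery_loc has been reached
 * and skip the end-of-recovery checkpoint. Otherwise replay everything
 * and checkpoint as usual.
 */
void startup_sketch(const int *positions, size_t count,
                    int min_recovery_loc, bool readonly,
                    StartupSketchShmem *shmem)
{
    shmem->last_recovered = 0;
    shmem->checkpoint_done = false;

    for (size_t i = 0; i < count; i++) {
        /* Readonly: the backup is consistent once we are past this point. */
        if (readonly && shmem->last_recovered >= min_recovery_loc)
            break;
        shmem->last_recovered = positions[i];   /* "replay" the record */
    }

    /* Only a server that will accept writes needs the checkpoint. */
    if (!readonly)
        shmem->checkpoint_done = true;
}
```

The value left in last_recovered is what a later process would pick up
to continue replay from.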

bgreplay then uses that pointer for an initial call to ReadRecord to
set up WAL reading for the bgreplay process. Afterwards, it repeatedly
calls ReplayXLOG (new function), which always replays at least
one record (if there is one, otherwise it returns false), until
it reaches a safe restart point.
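
To make the control flow concrete, here is a minimal sketch under my own
assumptions. It is not the real ReplayXLOG or ReadRecord: the Sketch*
types and sketch_* functions are hypothetical stand-ins, the WAL is just
an in-memory array, and I am assuming that one call keeps replaying
until it hits a safe restart point or runs out of records.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real xlog.c types. */
typedef struct {
    int  lsn;               /* simplified log position */
    bool is_restart_point;  /* true if stopping after this record is safe */
} SketchRecord;

typedef struct {
    const SketchRecord *records;
    size_t              count;
    size_t              next;      /* index of the next record to replay */
    int                 last_lsn;  /* position of the last replayed record */
} SketchWal;

/* Position the reader just past the record the startup process replayed last. */
void sketch_init_reader(SketchWal *wal, int last_recovered_lsn)
{
    wal->next = 0;
    while (wal->next < wal->count &&
           wal->records[wal->next].lsn <= last_recovered_lsn)
        wal->next++;
    wal->last_lsn = last_recovered_lsn;
}

/*
 * Replay at least one record if one is available, and keep going until
 * a safe restart point. Returns false if there was nothing to replay.
 */
bool sketch_replay_xlog(SketchWal *wal)
{
    if (wal->next >= wal->count)
        return false;

    while (wal->next < wal->count) {
        const SketchRecord *rec = &wal->records[wal->next++];
        wal->last_lsn = rec->lsn;      /* "apply" the record */
        if (rec->is_restart_point)
            break;
    }
    return true;
}
```

The background process would call sketch_init_reader once with the
position taken from shared memory, then call sketch_replay_xlog in a
loop, sleeping or waiting for new WAL whenever it returns false.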

Currently, in my test setup, I can start a slave in readonly mode and
it will do initial recovery, bring postgres online, and continuously
recover from inside bgreplay. There isn't yet any locking between
WAL replay and queries.

I'll add that locking during the next few days, which should result
in a very early prototype. The next steps will then be finding a way
to flush backend caches after replaying WAL records that modified system
tables, and (related) finding a way to deal with the flatfiles.

I'd appreciate any comments on this, especially those pointing
out problems that I overlooked.

greetings, Florian Pflug
