On 05/26/2014 02:26 PM, Greg Stark wrote:
On Mon, May 26, 2014 at 1:22 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:


The second record is generated before the checkpoint is finished and the
checkpoint record is written.  So it will be there.

(if you crash before the checkpoint is finished, the in-progress
checkpoint is no good for recovery anyway, and won't be used)

Another idea would be to have separate checkpoints for each buffer
partition. You would have to start recovery from the oldest checkpoint of
any of the partitions.

Yeah. Simon suggested that when we talked about this, but I didn't understand how that works at the time. I think I do now. The key to making it work is distinguishing, when starting recovery from the latest checkpoint, whether a record for a given page can be replayed safely. I used flags on WAL records in my proposal to achieve this, but using buffer partitions is simpler.

For simplicity, let's imagine that we have two redo pointers for each checkpoint record: one for even-numbered pages, and another for odd-numbered pages. When a checkpoint begins, we first update the Even-redo pointer to the current WAL insert location, and then flush all the even-numbered buffers in the buffer cache. Then we do the same for the odd pages: update the Odd-redo pointer to the then-current WAL insert location, and flush all the odd-numbered buffers.

Recovery begins at the Even-redo pointer. Replay works as normal, but until you reach the Odd-redo pointer, you refrain from replaying any changes to odd-numbered pages. After reaching the Odd-redo pointer, you replay everything as normal.
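
To make the replay rule concrete, here is a minimal Python sketch of it. This is a toy model, not PostgreSQL code: the names WalRecord, is_odd and recover are hypothetical, LSNs are small integers, each "page" is a single counter, and a redo action just adds a delta to it. The action is deliberately non-idempotent, so replaying a change that has already been flushed gives a visibly wrong result. Each flush is assumed to happen atomically at the moment its redo pointer is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    lsn: int    # WAL position of this record (toy integer LSN)
    page: int   # page number the record modifies
    delta: int  # toy redo action: add delta to the page's counter


def is_odd(page: int) -> bool:
    return page % 2 == 1


def recover(disk, wal, even_redo, odd_redo):
    """Replay from even_redo, skipping odd-page records older than odd_redo."""
    pages = dict(disk)
    for rec in wal:  # wal is assumed to be ordered by LSN
        if rec.lsn < even_redo:
            continue  # before the recovery start point
        if is_odd(rec.page) and rec.lsn < odd_redo:
            continue  # change is already included in the odd-page flush
        pages[rec.page] = pages.get(rec.page, 0) + rec.delta
    return pages


# Toy timeline: the checkpoint sets even_redo = 3 and flushes even pages,
# then sets odd_redo = 5 and flushes odd pages. The crash comes after LSN 6.
wal = [
    WalRecord(1, 0, 1), WalRecord(2, 1, 1),      # before the checkpoint
    WalRecord(3, 0, 10), WalRecord(4, 1, 10),    # between the two redo pointers
    WalRecord(5, 1, 100), WalRecord(6, 0, 100),  # after the Odd-redo pointer
]
disk = {0: 1, 1: 11}  # page 0 as flushed at LSN 3, page 1 as flushed at LSN 5

recovered = recover(disk, wal, even_redo=3, odd_redo=5)
naive = recover(disk, wal, even_redo=3, odd_redo=3)  # ignores the Odd-redo pointer
```

In this model, recovered brings both pages back to their pre-crash value of 111, while naive applies LSN 4 to page 1 a second time and ends up with 121 there. With more than two partitions, the same check would compare each record against the redo pointer of the partition its page belongs to.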

Hmm, that seems actually doable...

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
