On Wed, Apr 18, 2018 at 2:22 AM, Michael Paquier <mich...@paquier.xyz> wrote:
> I was thinking about this problem, and it looks that one approach for
> double-writes would be to introduce it as a secondary WAL stream
> independent from the main one:
> - Once a buffer is evicted from shared buffers and is dirty, write it to
> double-write stream and to the data file, and only sync it to the
> double-write stream.
> - At recovery, replay the WAL stream for double-writes first.

I don't really think that this can work. If we're in archive recovery
(i.e. recovery of *indefinite* duration), what does it mean to replay
the double-writes "first"?

What I think probably needs to happen instead is that the secondary WAL
stream contains a bunch of records of the form < LSN, block ID, page
image >. When recovery replays the WAL record for an LSN, it also
restores any double-write images for that LSN. So in effect the WAL
format stays the way it is now, but the full page images are moved out
of line. If this is all done right, the standby should be able to
regenerate the double-write stream without receiving it from the
master. That would be good, because then the volume of WAL from master
to standby would drop by a large amount.

However, it's hard to see how this would perform well. The double-write
stream would have to obey the WAL-before-data rule; that is, every
eviction from shared buffers would have to flush the WAL *and the
double-write buffer*. Unless we're running on hardware where fsync() is
very cheap, such as NVRAM, that increase in the total number of fsyncs
is probably going to pinch. You'd probably want to have a dwbuf_writer
process like wal_writer so that the fsyncs can be issued concurrently,
but I suspect that the filesystem will execute them sequentially
anyway, hence the pinch.

I think this is an interesting topic, but I don't plan to work on it
because I have no confidence that I could do it well enough to come out
ahead vs. the status quo.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
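
To make the out-of-line layout described above concrete, here is a
minimal Python sketch of a secondary stream of < LSN, block ID, page
image > records being consulted during replay. It is only an
illustration, not PostgreSQL code: the names DoubleWriteRecord,
index_by_lsn and replay, the string block IDs, and the in-memory "pages"
dict are all hypothetical. It also assumes that a restored image already
contains the effect of the WAL record at that LSN, so redo is skipped
for that block, which is how in-line full page images behave today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleWriteRecord:
    """One entry of the hypothetical secondary (double-write) stream."""
    lsn: int           # LSN of the main-stream WAL record it belongs to
    block_id: str      # hypothetical block identifier
    page_image: bytes  # full image of the page


def index_by_lsn(dw_stream):
    """Group double-write records by LSN for lookup during replay."""
    index = {}
    for rec in dw_stream:
        index.setdefault(rec.lsn, []).append(rec)
    return index


def replay(wal_stream, dw_stream, pages):
    """Replay (lsn, block_id, redo) tuples in LSN order.

    For each WAL record, any page images recorded for the same LSN are
    restored first. Assumption: a restored image already reflects the
    record's change, so redo is not applied to that block.
    """
    images = index_by_lsn(dw_stream)
    for lsn, block_id, redo in wal_stream:
        restored = set()
        for rec in images.get(lsn, ()):
            pages[rec.block_id] = rec.page_image
            restored.add(rec.block_id)
        if block_id not in restored:
            pages[block_id] = redo(pages.get(block_id, b""))
    return pages
```

Note that nothing here is replayed "first": the images are interleaved
with the main stream by LSN, so the same loop works for a recovery of
indefinite duration. The sketch says nothing about the fsync cost, which
is the real concern.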