Hi,
On Fri, Jul 26, 2013 at 11:42 AM, Greg Smith <g...@2ndquadrant.com> wrote:

> On 7/25/13 6:02 PM, didier wrote:
>
>> It was surely already discussed but why isn't PostgreSQL writing
>> sequentially its cache in a temporary file?
>
> If you do that, reads of the data will have to traverse that temporary
> file to assemble their data. You'll make every later reader pay the random
> I/O penalty that's being avoided right now. Checkpoints are already
> postponing these random writes as long as possible. You have to take care
> of them eventually though.

No, the log file is only used at recovery time.

In the checkpoint code:
- Loop over the cache and mark dirty buffers with BM_CHECKPOINT_NEEDED, as
  in the current code.
- Other workers can't write and evict these marked buffers to disk; there's
  a race with fsync.
- The checkpoint fsyncs now, or after the next step.
- The checkpoint loops again and saves these buffers to the log. It clears
  BM_CHECKPOINT_NEEDED but *doesn't* clear BM_DIRTY. Of course many buffers
  will be written again, as they are when a checkpoint isn't running.
- Checkpoint done.

During recovery you have to load the log into the cache first, before
applying WAL.

Didier
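
P.S. A rough sketch of the steps above, as a toy Python simulation rather
than PostgreSQL code. The names Buffer, checkpoint and recover are made up
for illustration, the two boolean fields only stand in for BM_DIRTY and
BM_CHECKPOINT_NEEDED, and WAL replay is simplified to overwriting a whole
page. The fsync and the rule that other workers leave marked buffers alone
are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    page_id: int
    data: str
    dirty: bool = False              # stands in for BM_DIRTY
    checkpoint_needed: bool = False  # stands in for BM_CHECKPOINT_NEEDED


def checkpoint(cache, log):
    """Toy two-pass checkpoint: mark dirty buffers, then log them in order."""
    # Pass 1: mark every buffer that is dirty right now.
    for buf in cache:
        if buf.dirty:
            buf.checkpoint_needed = True

    # Pass 2: append the marked buffers to the log in one sequential run.
    for buf in cache:
        if buf.checkpoint_needed:
            log.append((buf.page_id, buf.data))
            # Clear only the checkpoint mark. The buffer stays dirty, so it
            # is still written to its real location later, as usual.
            buf.checkpoint_needed = False


def recover(log, wal):
    """Load the checkpoint log into the cache first, then replay WAL on top."""
    cache = {page_id: data for page_id, data in log}
    for page_id, data in wal:
        cache[page_id] = data  # simplification: a WAL record replaces the page
    return cache


cache = [Buffer(1, "a", dirty=True), Buffer(2, "b"), Buffer(3, "c", dirty=True)]
log = []
checkpoint(cache, log)
recovered = recover(log, wal=[(3, "c2")])
```

Only the two dirty pages reach the log, both stay dirty afterwards, and the
recovered cache is the log contents with the WAL record applied on top.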