On Sunday, July 14, 2013, Greg Smith wrote:

> On 7/14/13 5:28 PM, james wrote:
>
>> Some random seeks during sync can't be helped, but if they are done when
>> we aren't waiting for sync completion then they are in effect free.
>>
>
> That happens sometimes, but if you measure you'll find this doesn't
> actually occur usefully in the situation everyone dislikes.  In a write
> heavy environment where the database doesn't fit in RAM, backends and/or
> the background writer are constantly writing data out to the OS.  WAL is
> going out constantly as well, and in many cases that's competing for the
> disks too.

While I think it is probably true that many systems don't separate WAL from
non-WAL onto different IO controllers, is it true that many systems that
need heavy IO tuning don't do so?  I thought that would be the first
stop for any DBA of a highly IO-write-constrained database.

> The most popular blocks in the database get high usage counts and they
> never leave shared_buffers except at checkpoint time.  That's easy to prove
> to yourself with pg_buffercache.
>
> And once the write cache fills, every I/O operation is now competing.
> There is nothing happening for free.  You're stealing I/O from something
> else any time you force a write out.  The optimal throughput path for
> checkpoints turns out to be delaying every single bit of I/O as long as
> possible, in favor of the [backend|bgwriter] writes and WAL.  Whenever you
> delay a buffer write, you have increased the possibility that someone else
> will write the same block again.  And the buffers being written by the
> checkpointer are, on average, the most popular ones in the database.
> Writing any of them to disk pre-emptively has high odds of writing the
> same block more than once per checkpoint.

Should the checkpointer make multiple passes over the buffer pool, writing
out the high usage_count buffers first, because no one else is going to do
it, and then going back for the low usage_count buffers in the hope they
were already written out?

On the other hand, if the checkpointer writes out a low-usage buffer, why
would anyone else need to write it again soon?  If it were likely to get
dirtied often, it wouldn't be low usage.  If it was dirtied rarely, it
wouldn't be dirty anymore once written.

Cheers,

Jeff
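
To make the multi-pass question concrete, here is a rough toy model in Python. It is not PostgreSQL code, and everything in it is made up for illustration: the Buffer class, the two_pass_checkpoint function, the hot_threshold cutoff, and the cleaned_between_passes set that stands in for whatever the backends or bgwriter happen to write while the first pass is running. It only shows the write order such a scheme would produce, assuming nothing gets re-dirtied in the meantime.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical stand-in for a shared buffer header."""
    buf_id: int
    usage_count: int  # PostgreSQL caps this at 5; the toy model does not enforce it
    dirty: bool


def two_pass_checkpoint(buffers, hot_threshold=3, cleaned_between_passes=()):
    """Return buffer ids in the order a two-pass checkpointer would write them.

    Pass 1 writes dirty buffers with usage_count >= hot_threshold, since
    nobody else is likely to evict and write those.  Pass 2 writes whatever
    is still dirty afterwards.  cleaned_between_passes lists buffer ids
    assumed to have been written by someone else in between (an assumption
    of this sketch, not something measured).
    """
    written = []

    # Pass 1: the popular buffers that only the checkpointer will write.
    for buf in buffers:
        if buf.dirty and buf.usage_count >= hot_threshold:
            written.append(buf.buf_id)
            buf.dirty = False

    # Simulated activity by backends / bgwriter during pass 1.
    for buf in buffers:
        if buf.buf_id in cleaned_between_passes:
            buf.dirty = False

    # Pass 2: the remaining, colder buffers that are still dirty.
    for buf in buffers:
        if buf.dirty:
            written.append(buf.buf_id)
            buf.dirty = False

    return written


if __name__ == "__main__":
    pool = [
        Buffer(0, 5, True),
        Buffer(1, 1, True),
        Buffer(2, 4, True),
        Buffer(3, 0, True),
        Buffer(4, 2, False),
    ]
    print(two_pass_checkpoint(pool, cleaned_between_passes={1}))
```

In this model the only saving from the second pass is the buffers listed in cleaned_between_passes, so whether the extra pass is worth it depends entirely on how often that set is non-empty in practice.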