On Tue, Feb 28, 2012 at 9:49 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Feb 28, 2012 at 11:46 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>
>> This is an interesting hypothesis which I think we can test. I'm
>> thinking of writing a quick patch (just for testing, not for commit)
>> to set a new buffer flag BM_BGWRITER_CLEANED to every buffer the
>> background writer cleans. Then we can keep a count of how often such
>> buffers are dirtied before they're evicted, vs. how often they're
>> evicted before they're dirtied. If any significant percentage of them
>> are redirtied before they're evicted, that would confirm this
>> hypothesis. At any rate I think the numbers would be interesting to
>> see.
>
> Patch attached.
>
> ...
>
> That doesn't look bad at all. Then I reset the stats, tried it again,
> and got this:
>
> LOG: bgwriter_clean: 3863 evict-before-dirty, 198 dirty-before-evict
> LOG: bgwriter_clean: 3861 evict-before-dirty, 199 dirty-before-evict
> LOG: bgwriter_clean: 3978 evict-before-dirty, 218 dirty-before-evict
> LOG: bgwriter_clean: 3928 evict-before-dirty, 204 dirty-before-evict
> LOG: bgwriter_clean: 3956 evict-before-dirty, 207 dirty-before-evict
> LOG: bgwriter_clean: 3906 evict-before-dirty, 222 dirty-before-evict
> LOG: bgwriter_clean: 3912 evict-before-dirty, 197 dirty-before-evict
> LOG: bgwriter_clean: 3853 evict-before-dirty, 200 dirty-before-evict
>
> OK, that's not so good, but I don't know why it's different.
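A toy model may help make the proposed counters concrete. The Python sketch below is neither Robert's patch nor PostgreSQL code: the names (`Buf`, `simulate`, `bg_cleaned`, `lead`, `bg_every`) and the skewed workload are hypothetical. A clock-sweep hand evicts the first buffer whose usage count has reached zero, and a simplified "bgwriter" periodically cleans dirty, zero-usage buffers within `lead` slots ahead of the hand, marking them with a flag that stands in for BM_BGWRITER_CLEANED. Each cleaning then ends in at most one of two counters: evict-before-dirty if the buffer is evicted while still clean, or dirty-before-evict if a later write re-dirties it.

```python
import random
from dataclasses import dataclass

MAX_USAGE = 5  # PostgreSQL caps usage_count at BM_MAX_USAGE_COUNT = 5


@dataclass
class Buf:
    page: int = -1            # page held by this buffer (-1 = empty)
    usage: int = 0            # clock-sweep usage count
    dirty: bool = False
    bg_cleaned: bool = False  # hypothetical stand-in for BM_BGWRITER_CLEANED


def simulate(nbuffers=1000, npages=5000, steps=200_000, lead=100,
             bg_every=200, write_fraction=0.2, seed=0):
    """Toy model only. Returns (evict_before_dirty, dirty_before_evict)."""
    rng = random.Random(seed)
    bufs = [Buf() for _ in range(nbuffers)]
    where = {}  # page -> buffer index
    hand = 0    # clock-sweep hand
    evict_before_dirty = dirty_before_evict = 0

    def next_victim():
        nonlocal hand, evict_before_dirty
        while True:
            i = hand
            hand = (hand + 1) % nbuffers
            b = bufs[i]
            if b.usage == 0:
                if b.bg_cleaned:
                    evict_before_dirty += 1  # the cleaning write paid off
                if b.page >= 0:
                    del where[b.page]
                return i
            b.usage -= 1  # second chance: decrement and move on

    for step in range(steps):
        # Skewed workload: 80% of accesses go to the first 10% of pages.
        if rng.random() < 0.8:
            page = rng.randrange(npages // 10)
        else:
            page = rng.randrange(npages)
        i = where.get(page)
        if i is None:  # miss: load the page into the clock-sweep victim
            i = next_victim()
            bufs[i] = Buf(page=page)
            where[page] = i
        b = bufs[i]
        b.usage = min(b.usage + 1, MAX_USAGE)
        if rng.random() < write_fraction:
            if b.bg_cleaned:
                dirty_before_evict += 1  # the cleaning write was wasted
                b.bg_cleaned = False
            b.dirty = True

        # Every bg_every accesses, clean the `lead` buffers ahead of the
        # hand, writing only dirty buffers whose usage count is zero.
        if step % bg_every == 0:
            for k in range(lead):
                c = bufs[(hand + k) % nbuffers]
                if c.dirty and c.usage == 0:
                    c.dirty = False
                    c.bg_cleaned = True

    return evict_before_dirty, dirty_before_evict


print(simulate(lead=50), simulate(lead=500))
```

Comparing a small and a large `lead` is one way to probe the explanation in the reply that a bgwriter far ahead of the clock sweep leaves more room for re-dirtying hits; the model makes no claim about the magnitudes in the LOG output.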
I don't think resetting the stats has anything to do with it; it is just that shared_buffers warmed up over time.

In my testing, the dirty-before-evict comes from the bgwriter riding too far ahead of the clock sweep, because of scan_whole_pool_milliseconds. Being that far ahead leaves a lot of room between the two pointers for cache hits that re-dirty already-cleaned buffers to land.

Not only is 2 minutes likely to be too small a value for large shared_buffers, but min_scan_buffers doesn't live up to its name. It is not the minimum number of buffers to scan; it is the minimum number to find or make reusable. If lots of buffers have a nonzero usagecount (and if your data doesn't fit in shared_buffers, it is hard to see how more than half of the buffers can have a zero usagecount) or are pinned, you are scanning a lot more than min_scan_buffers.

If I disable that, the bgwriter remains "just in time", just slightly ahead of the clock sweep, and the dirty-before-evict count drops a lot.

If scan_whole_pool_milliseconds is to be used at all, it seems like it should not be less than checkpoint_timeout. If I don't want checkpoints trashing my IO, why would I want someone else to do it instead?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
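The point that min_scan_buffers bounds the number of reusable buffers found, not the number scanned, can be illustrated with a small Python sketch. The function `cleaning_scan` and its parameters are hypothetical, pins are ignored, and the real loop's other stopping conditions are collapsed into a single `max_scan` cap; only the stopping rule is modeled, in which the scan continues until `target_reusable` zero-usage buffers have been found.

```python
def cleaning_scan(usage, dirty, start, target_reusable, max_scan):
    """Hypothetical model of one bgwriter cleaning pass.

    Scans the buffer ring from `start`, "writing" dirty buffers whose usage
    count is zero, and stops once `target_reusable` reusable buffers have
    been found or `max_scan` buffers have been examined.
    Returns (scanned, reusable, written). Mutates `dirty` in place.
    """
    n = len(usage)
    scanned = reusable = written = 0
    i = start
    while scanned < max_scan and reusable < target_reusable:
        if usage[i] == 0:         # only zero-usage buffers count as reusable
            if dirty[i]:
                dirty[i] = False  # stands in for writing the page out
                written += 1
            reusable += 1
        scanned += 1
        i = (i + 1) % n
    return scanned, reusable, written


# One buffer in four has a zero usage count; every buffer is dirty.
usage = [0 if k % 4 == 0 else 1 for k in range(1000)]
dirty = [True] * 1000
print(cleaning_scan(usage, dirty, start=0, target_reusable=50, max_scan=1000))
# prints (197, 50, 50)
```

With one buffer in four at zero usage, reaching a target of 50 reusable buffers takes 197 buffers scanned; in general, when a fraction f of buffers is reusable, a pass examines roughly target/f buffers, so the scan runs further ahead the fewer zero-usage buffers there are.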