                        imola-336       imola-337       imola-340
writes by checkpoint      38302           30410           39529
writes by bgwriter       350113         2205782         1418672
writes by backends      1834333          265755          787633
writes total            2222748         2501947         2245834
allocations             2683170         2657896         2699974

It looks like Tom's idea is not a winner; it leads to more writes than necessary.

The increase in total writes is not that large, only about 10%.
The interesting thing is that those "extra" writes must represent
buffers that were re-touched after their usage_count went to zero, but
before they could be recycled by the clock sweep.  While you'd certainly
expect some of that, I'm surprised it is as much as 10%.  Maybe we need
to play with the buffer allocation strategy some more.

The very small difference in NOTPM among the three runs says that either
this whole area is unimportant, or DBT2 isn't a good test case for it;
or maybe that there's something wrong with the patches?

On imola-340, there's still a significant number of backend writes, and I'm not sure what we should be aiming for. Is zero backend writes our goal?

Well, the lower the better, but not at the cost of a very large increase
in total writes.

Imola-340 was with a patch along the lines of Itagaki's original patch, ensuring that there are as many clean pages in front of the clock head as were consumed by backends since the last bgwriter iteration.

This seems intuitively wrong, since in the presence of bursty request
behavior it'll constantly be getting caught short of buffers.  I think
you need a safety margin and a moving-average decay factor.  Possibly
something like

        buffers_to_clean = Max(buffers_used * 1.1,
                               buffers_to_clean * 0.999);

where buffers_used is the current observation of demand.  This would
give us a safety margin such that buffers_to_clean is not less than
the largest demand observed in the last 100 iterations (0.999 ^ 100
is about 0.90, cancelling out the initial 10% safety margin), and it
takes quite a while for the memory of a demand spike to be forgotten.

That would be overly aggressive on a workload that's steady on average but consists of small bursts, like this: 0 0 0 0 100 0 0 0 0 100 0 0 0 0 100. You'd end up writing ~100 pages on every bgwriter round, but you only need an average of 20 pages per round. That would effectively be the same as keeping all buffers with usage_count=0 clean.

BTW, I believe that kind of workload is actually very common. That's what you get if one transaction causes, say, 10-100 buffer allocations, and you execute one such transaction every few seconds.

