On Thu, 23 Aug 2007, Tom Lane wrote:
> It is doubtless true in a lightly loaded system, but once the kernel is under any kind of memory pressure I think it's completely wrong.
The fact that so many tests I've done or seen get maximum throughput in terms of straight TPS with the background writer turned completely off is why I stated that so explicitly. I understand what you're saying in terms of memory pressure, all I'm suggesting is that the empirical tests suggest the current background writer even with moderate improvements doesn't necessarily help when you get there. If writes are blocking, whether the background writer does them slightly ahead of time or whether the backend does them itself doesn't seem to matter very much. On a heavily loaded system, your throughput is bottlenecked at the disk either way--and therefore it's all the more important in those cases to never do a write until you absolutely have to, lest it be wasted.
> If you're still fiddling with it then you probably aren't going to get it right in the next few days.
The implementation is fine most of the time; I've just found some corner cases in testing where I'd like to improve stability (mainly how best to handle the case where no buffers were allocated during the previous period, plus some small concerns about the first pass over the pool). What I'm thinking of doing is taking a couple of my assumptions/techniques and turning them into things that can be switched on or off with a #define, so that the parts of the code people don't like are easy to identify and pull out. I've already done that with one section.
> Maybe you need to put back the eliminated tuning parameter in the form of the scaling factor to be used here. I don't like 1.0, mainly because I don't believe your assumption (2). I'm willing to concede that 2.0 might be too much, but I don't know where in between is the sweet spot.
That would be easy to implement and add some flexibility, so I'll do that. bgwriter_lru_percent becomes bgwriter_lru_multiplier, possibly to be renamed later if someone comes up with a snappier name.
> Also, we might need a tuning parameter for the reaction speed of the moving average --- what are you using for that?
It's hard-coded at 16 samples. It seemed stable anywhere in the 10-20 range; I picked 16 so that the division may optimize usefully into a bit shift. On the reaction side, it actually reacts faster than that--if the most recent allocation is greater than the average, it uses that instead. The number of samples has more of an impact on the trailing side, and accordingly isn't that critical.
--
* Greg Smith  [EMAIL PROTECTED]  http://www.gregsmith.com  Baltimore, MD