On 6/27/13 11:08 AM, Robert Haas wrote:
> I'm pretty sure Greg Smith tried the fixed-sleep thing before and it didn't work that well.
That's correct, I spent about a year whipping that particular horse and submitted improvements on it to the community. http://www.postgresql.org/message-id/4d4f9a3d.5070...@2ndquadrant.com and its updates downthread are good ones to compare this current work against.
The important thing to realize about just delaying fsync calls is that it *cannot* increase TPS throughput. Not possible in theory, obviously doesn't happen in practice. The most efficient way to write things out is to delay those writes as long as possible. The longer you postpone a write, the more elevator sorting and write combining you get out of the OS. This is why operating systems like Linux come tuned for such delayed writes in the first place. Throughput and latency are linked; any patch that aims to decrease latency will probably slow throughput.
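The write-combining point can be shown with a toy model. This is only a sketch under simplifying assumptions, not a model of the Linux I/O scheduler: the `physical_writes` function, the `window` parameter, and the block numbers in `workload` are all made up for illustration. It counts how many writes reach disk when repeated writes to the same block within one flush window are merged.

```python
def physical_writes(block_writes, window):
    """Count writes reaching disk when dirty blocks are flushed every
    `window` logical writes (toy model, hypothetical names).

    Repeated writes to the same block inside one window are combined
    into a single physical write.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    total = 0
    for start in range(0, len(block_writes), window):
        # A set keeps one entry per distinct block in this window.
        total += len(set(block_writes[start:start + window]))
    return total


# Hypothetical workload: logical writes by block number, hot blocks repeated.
workload = [1, 2, 1, 3, 1, 2, 4, 1, 2, 3, 1, 5]

for window in (1, 4, 12):
    print(window, physical_writes(workload, window))
```

With this made-up workload, flushing after every write costs 12 physical writes, a window of 4 costs 10, and holding everything until the end costs 5. Postponing longer means less total work, which is the sense in which the no-delay-before-fsync, late-write behavior favors throughput.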
Accordingly, the current behavior--no delay--is already the best possible throughput. If you apply a write timing change and it seems to increase TPS, that's almost certainly because it executed fewer checkpoint writes. It's not a fair comparison. You have to adjust any delaying to still hit the same end point on the checkpoint schedule. That's what my later submissions did, and under that sort of controlled condition most of the improvements went away.
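One way to picture "still hit the same end point" is to derive the pause from a fixed deadline instead of using a fixed sleep. The following is a minimal sketch of that idea, not the code from the submitted patches; `fsync_pause` and its parameters (`seconds_left`, `files_left`, `expected_fsync_seconds`) are hypothetical names.

```python
def fsync_pause(seconds_left, files_left, expected_fsync_seconds=0.0):
    """Pause to insert before each remaining fsync so that the sync
    phase still ends at the scheduled time (hypothetical helper).

    Returns 0.0 when there is no slack, which degenerates to the
    current no-delay behavior.
    """
    if files_left <= 0:
        return 0.0
    # Time not already needed by the fsync calls themselves.
    slack = seconds_left - files_left * expected_fsync_seconds
    return max(0.0, slack / files_left)


# 30 s left, 10 files, each fsync assumed to take about 1 s: pause 2 s each.
print(fsync_pause(30.0, 10, 1.0))
# Already behind schedule: no pause at all.
print(fsync_pause(5.0, 10, 1.0))
```

Because the pause shrinks to zero as the deadline approaches, a scheme like this cannot quietly stretch the checkpoint and do fewer writes, which is what makes a before/after TPS comparison fair.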
Now, I still do really believe that better spacing of fsync calls helps latency in the real world. Far as I know the server that I developed that patch for originally in 2010 is still running with that change. The result is not a throughput change though; there is a throughput drop with a latency improvement. That is the unbreakable trade-off in this area if all you touch is scheduling.
The reason why I was ignoring this discussion and working on pgbench throttling until now is that you need to measure latency at a constant throughput to make progress on this topic, and that's exactly what the new pgbench feature enables. If we can take the current checkpoint scheduler and an altered one, run both at exactly the same rate, and one gives lower latency, now we're onto something. It's possible to do that with DBT-2 as well, but I wanted something really simple that people could replicate results with in pgbench.
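The comparison rule described here can be sketched as a small check on two sets of results. This is an illustration only: the run dictionaries, the 2% rate tolerance, and the choice of the 90th percentile are assumptions, and the numbers are invented, not measurements.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def compare_at_same_rate(run_a, run_b, pct=90, tolerance=0.02):
    """Return (latency_a, latency_b) at the given percentile, but only
    if both runs sustained about the same TPS; otherwise raise.

    Each run is a dict with "tps" and "latencies_ms" (hypothetical layout).
    """
    tps_a, tps_b = run_a["tps"], run_b["tps"]
    if abs(tps_a - tps_b) > tolerance * max(tps_a, tps_b):
        raise ValueError("runs did not sustain the same rate; "
                         "latencies are not comparable")
    return (percentile(run_a["latencies_ms"], pct),
            percentile(run_b["latencies_ms"], pct))


# Invented numbers for two schedulers driven at the same target rate.
current = {"tps": 500.0, "latencies_ms": [2, 3, 4, 5, 50, 3, 2, 4, 3, 120]}
altered = {"tps": 498.0, "latencies_ms": [2, 3, 4, 5, 9, 3, 2, 4, 3, 20]}
print(compare_at_same_rate(current, altered))
```

Refusing to compare runs whose achieved rates differ is the point: a latency win only means something if throughput was held constant.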
--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com