Jeff Janes wrote:
And for very large memory
systems, even 1% may be too much to cache (dirty*_ratio can only be
set in integer percentage points), so recent kernels introduced the
dirty*_bytes parameters.  I like these better because they do what
they say.  With dirty*_ratio, I could never figure out what it was
a ratio of, and the results were unpredictable without extensive
experimentation.
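
To put numbers on that granularity complaint, here's a quick back-of-the-envelope sketch; the 64 GB RAM and 512 MB controller cache figures below are illustrative assumptions, not measurements from this thread:

    ram_bytes = 64 * 1024**3          # hypothetical large-memory server
    raid_cache_bytes = 512 * 1024**2  # hypothetical battery-backed write cache

    # dirty_background_ratio only accepts whole percentage points,
    # so 1% is the smallest nonzero threshold you can ask for:
    min_dirty_target = ram_bytes // 100

    print(min_dirty_target // 1024**2)          # 655 MB of dirty data allowed
    print(min_dirty_target > raid_cache_bytes)  # True: more than the cache can absorb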

Right, you can't set dirty_background_ratio low enough to make this problem go away. Even attempts to set it to 1%, back when that was the right size for it, seem to be defeated by other mechanisms within the kernel. The last time I looked at the related source code, the "congestion control" logic that kicks in to throttle writes seemed a likely suspect. This is why I'm not very optimistic that newer mechanisms like the dirty_background_bytes setting added in 2.6.29 will help here; that just provides a way to set lower values, while the same basic logic remains underneath.
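
If you do have a kernel with the bytes-based interface (2.6.29 or later), setting it is just a write to /proc. A minimal sketch, with the 64 MB value picked arbitrarily for illustration rather than as a tuning recommendation:

    # Needs root.  Setting dirty_background_bytes makes the kernel ignore
    # dirty_background_ratio and use this absolute threshold instead.
    DIRTY_BG_BYTES = 64 * 1024 * 1024  # arbitrary example value

    with open("/proc/sys/vm/dirty_background_bytes", "w") as f:
        f.write(str(DIRTY_BG_BYTES))

Per the above, though, I wouldn't expect even a small value here to fully fix things, since the same congestion logic is still underneath.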

Like Jeff, I've never seen dirty_expire_centisecs help at all, possibly due to the same congestion mechanism.

Yes, but how much work do we want to put into redoing the checkpoint
logic so that a sysadmin on a particular OS, configuration, and
filesystem can avoid having to change the kernel parameters away from
their defaults?  (Assuming, of course, that I am correctly
understanding the problem; always a dangerous assumption.)

I've been trying to make this problem go away using just the available kernel tunables since 2006. On the server whose problems motivated the submitted patch, I adjusted them carefully months before the issue got bad, and it didn't help. Maybe if that system had been running a later kernel that supported dirty_background_bytes, this would have worked better. Over the last few years, the only thing that has consistently helped in every case is the checkpoint spreading logic that went into 8.3. I no longer expect that the kernel developers will ever make this problem go away given the way checkpoints are written out right now, whereas the last round of PostgreSQL work in this area definitely helped.
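
For anyone who hasn't looked at it, the 8.3 logic paces the buffer writes across the checkpoint interval, aiming to finish them at checkpoint_completion_target (default 0.5) of the way through. A toy Python model of just the pacing idea, not the actual server code; write_buffer here is a hypothetical stand-in:

    import time

    def write_buffer(buf):
        # Stand-in for pushing one dirty buffer out to the OS.
        pass

    def spread_checkpoint_writes(dirty_buffers, checkpoint_timeout_s,
                                 completion_target=0.5):
        budget = checkpoint_timeout_s * completion_target
        start = time.monotonic()
        for i, buf in enumerate(dirty_buffers, 1):
            write_buffer(buf)
            # If we're ahead of the schedule implied by the budget, nap
            # so the writes trickle out instead of arriving in a burst.
            on_schedule_at = start + budget * i / len(dirty_buffers)
            delay = on_schedule_at - time.monotonic()
            if delay > 0:
                time.sleep(delay)

(The real scheduling is more involved; it also tracks progress against WAL consumption, not just elapsed time.)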

The basic premise of the current checkpoint code is that if you write all of the buffers out early enough, then by the time the syncs execute, enough of the data should already have gone out that they don't take very long to process. That was usually true over the last few years on systems with a battery-backed write cache, where the amount of dirty data cached by the OS was small relative to the size of the RAID cache. That's not the case anymore, and the divergence is growing.
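
In rough form, that premise looks like this; a minimal sketch with hypothetical file descriptors, not the server's actual storage-manager code:

    import os

    def naive_checkpoint(dirty_pages, fds):
        # Phase 1: write every dirty buffer into the OS page cache early.
        for fd, offset, page in dirty_pages:
            os.pwrite(fd, page, offset)

        # Phase 2: sync each file, assuming the writeback mostly already
        # happened.  Once the OS is caching far more dirty data than the
        # controller's write cache can absorb, these fsync calls stall
        # instead of returning quickly.
        for fd in fds:
            os.fsync(fd)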

The idea that the checkpoint sync code can run in a relatively tight loop, without stopping to do the normal background writer cleanup work, is also busted by that observation.
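
Which suggests the sync phase needs the same kind of spreading the write phase got. A sketch of the general shape only; the pause length and cleanup hook are hypothetical, and this isn't the submitted patch:

    import os
    import time

    def spread_syncs(fds, pause_s, do_cleanup):
        # Instead of fsync()ing everything back to back, yield between
        # syncs so the usual cleanup work isn't starved.
        for fd in fds:
            os.fsync(fd)
            do_cleanup()         # hypothetical hook for the normal
                                 # background writer cleanup work
            time.sleep(pause_s)  # let the OS drain writeback between syncs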

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

