On Fri, 7 Sep 2007, Simon Riggs wrote:
I think we should do some more basic tests to see where those outliers come from. We need to establish a clear link between number of dirty writes and response time.
With the test I'm running, which is specifically designed to aggravate this behavior, the outliers on my system come from how Linux buffers writes. I can adjust them a bit by playing with the parameters as described at http://www.westnet.com/~gsmith/content/linux-pdflush.htm but on the hardware I've got here (single 7200RPM disk for database, another for WAL) they don't move much. Once /proc/meminfo shows enough Dirty memory that pdflush starts blocking writes, game over; you're looking at multi-second delays before my plain old IDE disks clear enough debris out to start responding to new requests, even with the Areca controller I'm using.
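If you want to watch that Dirty figure yourself while a test runs, here is a minimal sketch. It assumes a Linux /proc/meminfo where Dirty and Writeback are reported in kB. The helper names parse_meminfo, dirty_snapshot and watch_dirty are made up for illustration and are not part of PostgreSQL or any existing tool.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are "<number> kB"; a few (e.g. HugePages_Total) are bare
    counts. Either way the first token after the colon is the number.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        fields[name.strip()] = int(parts[0])
    return fields


def dirty_snapshot(path="/proc/meminfo"):
    """Return (Dirty, Writeback) in kB; 0 if a field is missing."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty", 0), info.get("Writeback", 0)


def watch_dirty(samples=10, interval=1.0, path="/proc/meminfo"):
    """Print a timestamped Dirty/Writeback line once per interval (Linux only)."""
    for _ in range(samples):
        dirty, writeback = dirty_snapshot(path)
        print(f"{time.strftime('%H:%M:%S')} Dirty={dirty} kB Writeback={writeback} kB")
        time.sleep(interval)
```

Running watch_dirty(samples=600) in another terminal during a benchmark gives a per-second trace you can line up against the response-time outliers.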
Perhaps output the number of dirty blocks written on the same line as the output of log_min_duration_statement so that we can correlate response time to dirty-block-writes on that statement.
On Linux at least, I'd expect this won't reveal much. There, the interesting correlation is with how much dirty data is in the underlying OS buffer cache, and exactly how that plays into things is a bit strange sometimes. If you go back to Heikki's DBT2 tests of the various background writer schemes, he got frustrated enough with that disconnect that he wrote a little test program just to map out the underlying weirdness: http://archives.postgresql.org/pgsql-hackers/2007-07/msg00261.php
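Whichever quantity you pick, checking it against response time comes down to pairing the samples up and computing a correlation coefficient. That could be per-statement dirty-buffer writes, as Simon suggests, or OS Dirty memory sampled over time. Below is a plain Pearson sketch. How the pairs get pulled out of the server log or the meminfo trace is left open, and the function name pearson is just illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series.

    xs might be dirty writes (or Dirty kB) per sample and ys the matching
    response times; both pairings are hypothetical here.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

Pearson only measures a linear relationship. The pdflush blocking described above looks more like a threshold effect, so a weak coefficient wouldn't by itself rule out a link. Plotting the pairs is probably worth doing alongside it.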
I've confirmed his results on my system and made some improvements to that program myself, but pushed further work on it to the side to finish up the main background writer task instead. I may circle back to that. I'd really like to run all this on another OS as well (I have Solaris 10 on my server box but it's not fully set up yet), but I can only volunteer so much time to work on all this right now.
If there's anything that needs to be looked at more carefully during tests in this area, it's getting more data about just what the underlying OS is doing while all this is going on. Just the output from vmstat/iostat is very informative. Those using DBT2 for their tests get some nice graphs of this already. I've previously run some pgbench-based tests that collected this data, and they were very enlightening, but sadly that system isn't available to me anymore.
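For anyone wanting to line vmstat output up against their own benchmark timings, here is a minimal sketch of turning its text into per-interval records. It assumes the usual Linux procps layout: a "procs ..." banner, a column header row starting with "r b", then numeric rows. Keying on the header names rather than fixed positions is meant to cope with versions that add or omit columns such as "st". The function name parse_vmstat is illustrative only.

```python
def parse_vmstat(text):
    """Turn `vmstat <interval>` text output into a list of {column: int} dicts.

    Useful columns for this problem include "bo" (blocks written out) and
    "wa" (percent of CPU time spent waiting on I/O).
    """
    header = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "procs":
            # Group banner line ("procs ---memory--- ..."); carries no data.
            continue
        if tokens[0] == "r" and "bo" in tokens:
            # Column header; vmstat may reprint it periodically.
            header = tokens
            continue
        if header and len(tokens) == len(header) and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(header, map(int, tokens))))
    return rows
```

Something like `[(r["bo"], r["wa"]) for r in parse_vmstat(output)]` then gives a write-volume and I/O-wait series to set against the timestamps of slow statements.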
--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD