On Mon, 7 Jul 2008, ITAGAKI Takahiro wrote:

> I plan to test it on RAID-5 disks, where sequential writes are
> much better than random writes. I'll send the results as evidence.

If you're running more tests here, please turn on log_checkpoints and collect the logs while the test is running. I'm really curious if there's any significant difference in what that reports here in the sorted case vs. the regular one.
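For anyone collecting those logs, here is a rough sketch of pulling the per-checkpoint numbers out of a server log so the sorted and unsorted runs can be compared side by side. It assumes log_checkpoints = on in postgresql.conf and the 8.3-style "checkpoint complete: wrote N buffers ... write=X s, sync=Y s, total=Z s" message; the function name checkpoint_summary and the log file argument are made up for illustration, and the pattern will need adjusting if your log lines look different.

```shell
# Print "buffers write_s sync_s total_s" for each completed checkpoint.
# Assumption: lines follow the 8.3-style "checkpoint complete" format.
checkpoint_summary() {
    sed -n '/checkpoint complete/s/.*wrote \([0-9]*\) buffers.*write=\([0-9.]*\) s, sync=\([0-9.]*\) s, total=\([0-9.]*\) s.*/\1 \2 \3 \4/p' "$1"
}
```

Running it once per test run and comparing the sync column is probably the quickest way to see whether sorting changes how long the fsync phase takes.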

> Smoothed checkpoints in 8.3 spread out the write() calls, but call fsync() all at once. With sorted writes, we can call fsync() segment by segment, right after writing the dirty pages contained in each segment. That could improve worst-case response time during checkpoints.

Further decreasing the amount of data that is fsync'd at any point in time might be a bigger improvement than just the sorting itself is doing (so far I haven't seen anything really significant just from the sort but am still testing).
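To make the segment-by-segment idea concrete, here is a toy model of the ordering, not the patch itself. It reads hypothetical "relfilenode blockno" pairs for dirty pages, sorts them, and prints a plan with one fsync after the writes for each segment. It assumes the default 8 kB block size and 1 GB segments (131072 blocks); the function name, the input format, and the "relfilenode.segment" labels are invented for illustration.

```shell
# Toy model: print a write/fsync plan with one fsync per segment.
# Input file: one "relfilenode blockno" pair per line (hypothetical format).
plan_sorted_checkpoint() {
    sort -k1,1n -k2,2n "$1" | awk -v seg_blocks=131072 '
        {
            seg = int($2 / seg_blocks)      # segment the block falls in
            key = $1 "." seg                # label only, not a real file name
            if (NR > 1 && key != prev) print "fsync " prev
            print "write " $1 " block " $2
            prev = key
        }
        END { if (NR > 0) print "fsync " prev }'
}
```

The point of the sketch is only that each fsync covers the writes for a single segment, so less dirty data is waiting to be synced at any one moment than with a single sync pass at the end.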

One thing I didn't see any comments from you on is how/if the sorted writes patch lowers worst-case latency. That's the area I'd hope an improved fsync protocol would help most with, rather than TPS, which might even go backwards because writes won't be as bunched and therefore will have more seeking. It's easy enough to analyze the data coming from "pgbench -l" to figure that out; example shell snippet that shows just the worst ones:

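# replace <db> with your database name; run in the background so that
# $! holds the pgbench PID, which is also the suffix of its log file
pgbench -l -N <db> &
p=$!
wait $p
mv pgbench_log.${p} pgbench.log
# field 3 of each log line is the transaction latency
cut -f 3 -d " " pgbench.log | sort -n | tail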
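A slightly fuller variant of the snippet above, as a sketch: it reports the transaction count, a nearest-rank 90th percentile, and the maximum from the same log. It assumes the third space-separated field of each line is the latency, as in the snippet above; the function name latency_summary and the choice of the 90th percentile are arbitrary.

```shell
# Summarize latencies from a "pgbench -l" log: count, 90th percentile, max.
# Assumption: field 3 of each line is the per-transaction latency.
latency_summary() {
    cut -d ' ' -f 3 "$1" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            i = int((NR * 90 + 99) / 100)   # nearest rank: ceil(0.9 * NR)
            printf "count=%d p90=%d max=%d\n", NR, v[i], v[NR]
        }'
}
```

Comparing the p90 and max values between a sorted and an unsorted run shows whether the tail moves even when TPS stays about the same.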

Actually graphing the latencies can be even more instructive; I have some examples of that on my web page that you may have seen before.

> In addition, the current smgr layer is completely useless because
> it cannot be extended dynamically and cannot handle multiple md-layer
> modules. I would rather merge current smgr and part of bufmgr into
> a new smgr and add smgr_hook() than bulk_io_hook().

I don't have a firm enough opinion about the code to comment on this specific suggestion, but I will say that the amount of layering in this area sometimes makes it difficult to understand just what's going on (especially when you're new to it). A lot of that abstraction felt a bit pass-through to me, and anything that collapses it a bit would help streamline instrumenting the code with things like dtrace.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
