In our last installment, we saw that JFS provides higher pgbench
performance than either XFS or ext3.  Using a direct-I/O patch stolen
from 8.1, JFS achieved 105 tps with 100 clients.

To refresh, the machine in question has 5 7200RPM SATA disks, an Areca
RAID controller with 128MB cache, and 1GB of main memory.  pgbench is
being run with a scale factor of 1000 and 100000 total transactions.
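For reference, that setup corresponds to roughly the following invocation (database name is hypothetical; in pgbench of this vintage, -t is transactions per client, so 100 clients x 1000 each gives the 100000 total):

```shell
# Initialize the test database at scale factor 1000
# (far larger than the machine's 1GB of RAM).
createdb pgbench
pgbench -i -s 1000 pgbench

# Run with 100 clients, 1000 transactions per client = 100000 total.
pgbench -c 100 -t 1000 pgbench
```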

At the suggestion of Andreas Dilger of clusterfs, I tried modulating the
size of the ext3 journal, and the mount options (data=journal,
writeback, and ordered).  It turns out that you can achieve a substantial
improvement (almost 50%) by simply mounting the ext3 volume with
data=writeback instead of data=ordered (the default).  Changing the
journal size did not seem to make a difference, except that 256MB is for
some reason pathological (9% slower than the best time).  128MB, the
default for a large volume, gave the same performance as 400MB (the max)
or 32MB.

In the end, the ext3 volume mounted with -o noatime,data=writeback
yielded 88 tps with 100 clients.  This is about 16% off the performance
of JFS with default options.
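For anyone wanting to reproduce this, the journal size is fixed at filesystem creation time and the data mode at mount time.  A sketch, with a hypothetical device and mount point:

```shell
# Create ext3 with an explicit 128MB journal (the -J size= unit is MB).
mke2fs -j -J size=128 /dev/sda1

# Mount with the options that gave the best results above.
mount -o noatime,data=writeback /dev/sda1 /mnt/pgdata
```

Note that data=writeback relaxes the ordering guarantee between data and metadata writes; the database's own WAL is what protects the data here.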

Andreas pointed me to experimental patches to ext3's block allocation
code and writeback strategy.  I will test these, but I expect the
database community, which seems so attached to its data, will be wary of
code that has not yet entered mainstream use.

Another frequent suggestion is to put the xlog on a separate device.  I
tried this, and, for a given number of disks, it appears to be
counter-productive.  A 5-disk RAID5 holding both logs and data is about
15% faster than a 3-disk RAID5 holding the data plus a 2-disk mirror
holding the xlog.
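For completeness, the usual way to relocate the xlog is a symlink (paths here are hypothetical, and the server must be stopped first):

```shell
# Move pg_xlog to a filesystem on the dedicated mirror,
# then leave a symlink in its place.
pg_ctl stop -D /var/lib/pgsql/data
mv /var/lib/pgsql/data/pg_xlog /mnt/xlog/pg_xlog
ln -s /mnt/xlog/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl start -D /var/lib/pgsql/data
```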

Here are the pgbench results for each permutation of ext3:

Journal Size (MB) | Journal Mode | 1 Client | 10 Clients | 100 Clients  (all tps)
 32                 ordered        28         51           57
 32                 writeback      34         70           88
 64                 ordered        29         52           61
 64                 writeback      32         69           87
128                 ordered        32         54           62
128                 writeback      34         70           88
256                 ordered        28         51           60
256                 writeback      29         64           79
400                 ordered        26         49           59
400                 writeback      32         70           87

