I'm not sure exactly which kernels from kernel.org are affected, but we
ran into a serious problem on the 2.6 kernel that we were using in RHEL4
(2.6.9-22.0.1.ELsmp). The symptoms occur during write-heavy
workloads. From the PVFS2 point of view, write throughput on one or
more servers will slow to just a few KB/s, and the AIO thread will
consume 99% of cpu time.
This problem has bitten some Lustre users as well:
https://mail.clusterfs.com/pipermail/lustre-discuss/2006-March/001214.html
https://mail.clusterfs.com/pipermail/lustre-discuss/2006-March/001299.html
RedHat has a few bugzilla entries for it (the first includes a standalone
test program to trigger the problem):
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175140
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156437
The problem was in the new "reservation" code for EXT3, which is an
optimization intended to reduce fragmentation on the file system. The
algorithm it was using had a bug that could be triggered with the kind
of write workload that PVFS2 happens to generate when a lot of clients
are writing to it.
There are two ways to fix the problem:
- mount ext3 volumes with the "noreservation" option
- upgrade to a kernel that doesn't exhibit the problem (for RHEL4 this
means upgrading to Update 3)
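
If you go the "noreservation" route, it is easy to miss a volume. Here is a
rough sketch (not something we actually ran) of a Python helper that scans
mount-table text in the usual fstab/mtab layout and lists the ext3 mount
points that do not carry the option. The function name, the sample devices,
and the sample mount points are all made up for illustration, and it assumes
"noreservation" appears in the options field of whatever table you feed it,
which may not be true of /proc/mounts on every kernel.

```python
def ext3_mounts_without_noreservation(mount_table_text):
    """Return mount points of ext3 entries that lack 'noreservation'.

    Expects text in the fstab/mtab layout:
        device  mount_point  fs_type  options  [dump  pass]
    Assumption: the option shows up literally in the options field.
    """
    flagged = []
    for line in mount_table_text.splitlines():
        line = line.strip()
        # Skip blank lines and fstab-style comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type != "ext3":
            continue
        if "noreservation" not in options.split(","):
            flagged.append(mount_point)
    return flagged


# Hypothetical sample table; devices and mount points are invented.
SAMPLE = """\
# device   mount point     type  options            dump pass
/dev/sda1  /               ext3  rw                 0 0
/dev/sdb1  /pvfs2-storage  ext3  rw,noreservation   0 0
/dev/sdc1  /scratch        ext3  rw,data=ordered    0 0
none       /proc           proc  rw                 0 0
"""

if __name__ == "__main__":
    for mount_point in ext3_mounts_without_noreservation(SAMPLE):
        print(mount_point)
```

To check a real machine, pass in the contents of /etc/fstab or /etc/mtab
instead of SAMPLE.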
We happened to be able to reproducibly trigger the problem on SAN, but I
don't think the bug is necessarily limited to SAN volumes.
David Metheny and I spent a long time tracking this down. I highly
recommend oprofile for debugging performance bugs like this. That was
what finally clued us in to where the problem was after a variety of
wrong turns :)
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers