I'm not sure exactly which kernels from kernel.org are affected, but we ran into a serious problem on the 2.6 kernel that we were using in RHEL4 (2.6.9-22.0.1.ELsmp). The symptoms occur during write-heavy workloads. From the PVFS2 point of view, write throughput on one or more servers will slow to just a few KB/s, and the AIO thread will consume 99% of CPU time.

This problem has bitten some Lustre users as well:
https://mail.clusterfs.com/pipermail/lustre-discuss/2006-March/001214.html
https://mail.clusterfs.com/pipermail/lustre-discuss/2006-March/001299.html

Red Hat has a few bugzilla entries for it (the first includes a standalone test program to trigger the problem):
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175140
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156437

The problem was in the new "reservation" code for EXT3, which is an optimization intended to reduce fragmentation on the file system. The algorithm it was using had a bug that could be triggered by the kind of write workload that PVFS2 happens to generate when a lot of clients are writing to it.

There are two ways to fix the problem:
- mount ext3 volumes with the "noreservation" option
- upgrade to a kernel that doesn't exhibit the problem (for RHEL4 this means upgrading to Update 3)
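
For the first option, here is a rough sketch of how one might audit a server's configuration for ext3 volumes that are not yet set up with "noreservation". It only parses fstab-style text (device, mount point, type, options), so it reflects what is configured rather than what the running kernel reports. The function name and the sample paths are made up for illustration; they are not part of PVFS2 or any existing tool.

```python
def ext3_entries_without_noreservation(fstab_text):
    """Return (device, mount_point) pairs for ext3 entries in
    fstab-style text whose option list lacks 'noreservation'.

    Hypothetical helper for illustration only.
    """
    flagged = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # An fstab entry needs at least: device, mount point, type, options.
        if len(fields) < 4:
            continue
        device, mount_point, fs_type, options = fields[:4]
        if fs_type == "ext3" and "noreservation" not in options.split(","):
            flagged.append((device, mount_point))
    return flagged
```

Passing it the contents of /etc/fstab (for example `ext3_entries_without_noreservation(open("/etc/fstab").read())`) would list the ext3 entries that still need the option added. Whether that is worth doing instead of simply upgrading the kernel depends on your setup.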

We happened to be able to reproducibly trigger the problem on a SAN, but I don't think the bug is necessarily limited to SAN volumes.

David Metheny and I spent a long time tracking this down. I highly recommend oprofile for debugging performance bugs like this. That was what finally clued us in to where the problem was after a variety of wrong turns :)

-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers