I am currently testing ocfs2 for use in a two-node cluster that will run the Cyrus imapd, and I am having problems that seem to come from the software occasionally blocking for a long time while it waits to get a write lock via the 'fcntl' system call. I am aware that the current ocfs2 supports neither a writable mmap nor a cluster-aware flock, so in my tests all writing is done on only one node of the cluster, and Cyrus is configured so that none of its databases requires a writable 'mmap' (i.e. all databases are skiplist, not Berkeley DB).
I am using drbd to make the disks on the two nodes behave as a shared resource; as permitted by drbd version 8, the disks on both nodes are drbd primaries and are mounted on their respective machines. I am testing by having modest-sized mail messages delivered to just one of the machines at a rate of 1/sec. The system will run fine in this mode, sometimes for days, but then gets hopelessly wedged, with many 'lmtpd' processes waiting to get exclusive locks on the various Cyrus databases. As the system approaches this deadlock condition, 'strace' shows many seconds being spent in 'fcntl' waiting for the lock, and the load average skyrockets because of all the 'lmtpd' processes. Since mail is being delivered at an essentially constant rate and there is no other activity on the systems, I'm confused as to how the machines can run for extended periods before suddenly getting into this pathological state.
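
For anyone who wants to look at the lock latency without Cyrus in the picture, the wait can be measured with a small probe along the following lines. This is only a minimal sketch, not what lmtpd itself does: the function names (timed_write_lock, probe) and the probe file path are made up for illustration, and it assumes Python's fcntl.lockf, which takes fcntl-style locks and blocks when LOCK_NB is not given.

```python
import fcntl
import os
import tempfile
import time


def timed_write_lock(path):
    """Take a blocking exclusive fcntl-style lock on path.

    Returns the number of seconds spent waiting for the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        # LOCK_EX without LOCK_NB blocks until the lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        waited = time.monotonic() - start
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return waited
    finally:
        os.close(fd)


def probe(path, samples, interval):
    """Collect `samples` lock-wait timings, sleeping `interval` seconds between them."""
    waits = []
    for _ in range(samples):
        waits.append(timed_write_lock(path))
        time.sleep(interval)
    return waits


if __name__ == "__main__":
    # Hypothetical probe file in the temp directory; to test the cluster
    # filesystem, point this at a file on the ocfs2 mount instead.
    lock_path = os.path.join(tempfile.gettempdir(), "fcntl-probe.lock")
    # Raise interval to 1.0 to mimic the one-message-per-second test load.
    for waited in probe(lock_path, samples=3, interval=0.2):
        print("waited %.6f s for the write lock" % waited)
```

Run in a loop on the writing node, with the path on the ocfs2 volume, this would show whether the wait time creeps up gradually or jumps suddenly as the system nears the wedged state.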

I realize that my setup uses several complex layers (the full storage stack is md->drbd->lvm->ocfs2->Cyrus imapd), so I will also consult the drbd and Cyrus mailing lists, but I'm hoping that someone on this list might have some insight into how fcntl-based locking is implemented under ocfs2 that may help point the way to what is causing the deadlock after many days of running well.

The machines are both running CentOS 4.4 with a 2.6.19 kernel; the ocfs2 code is that included with the kernel sources; drbd is version 8.0 and the Cyrus version is 2.3.8.

Thank you for any thoughts on this matter.

Jeff Fookson

--
Jeffrey E. Fookson, PhD                 Phone: (520) 621 3091
Support Systems Analyst, Principal      [EMAIL PROTECTED]
Steward Observatory
University of Arizona


_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
