
I have an active-active DRBD+OCFS2 cluster running Dovecot as an IMAP
server. Currently, our load balancer is configured to forward
connections to only one of the nodes, as explained in bug #1297. The
shared filesystems are still mounted on both machines, though.

Today we had an issue on the machine that is not receiving IMAP
connections, which caused it to reboot. Immediately, the load average on
the other machine started to climb very quickly, reaching 680. At that
point I tried to stop Dovecot and noticed that some of its processes
would not die. This resembles what happens when a process is doing I/O
and the device has a problem, such as a disk failure: the process stays
stuck in the kernel waiting for the I/O to complete. Because of this, I
had to reboot this machine too, and our customers experienced some
downtime.

Is this expected behavior? It seems quite fragile that rebooting a
cluster node that has no processes doing I/O, or even simply unmounting
a filesystem on it, would affect processes on other cluster nodes.

Is there anything I can do to prevent this behavior in the future?


Ocfs2-users mailing list