Hello,

I have an active-active drbd+ocfs2 cluster running dovecot as an IMAP server. Currently our load balancer is configured to forward connections to only one of the nodes, as explained in bug #1297. The shared filesystems are still mounted on both machines, though.
Today we had an issue on the machine that is not receiving IMAP connections, which caused it to reboot. Immediately, the load average on the other machine started to climb very quickly, reaching 680. At that point I tried to stop dovecot and noticed that some of its processes wouldn't die. This is similar to what happens when a process is doing I/O and the device has a problem such as a disk failure, which leaves the process stuck in the kernel waiting for the I/O to complete. Because of this, I had to reboot this machine as well, and our customers experienced some downtime.

Is this expected behavior? It seems quite fragile that rebooting, or even simply unmounting a filesystem, on a cluster node that doesn't have any processes doing I/O would affect processes on other cluster nodes.

Is there anything I can do to prevent this behavior in the future?

Thanks,
Andre