The config error I imagine is that you defined two different clusters, each lacking the other node, with the two nodes having the same node number in their respective clusters. If so, the disk heartbeat would have detected this error and spewed error messages indicating that "some other nodes is heart beating in my slot". But yes, it would not have fenced... well, I'll need to read the code to confirm.
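To illustrate the kind of check meant here, the following is a minimal Python sketch, not the actual OCFS2 heartbeat code. The `Node` class, the `(name, seq)` stamp, and the dict standing in for the on-disk heartbeat region are all hypothetical. The idea is that each node remembers what it last wrote to its slot; if it reads back something else on the next beat, another node is writing to the same slot. As in the situation described, the sketch only reports the conflict and does not fence.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical stamp written into a heartbeat slot: (writer name, sequence number).
Stamp = Tuple[str, int]


@dataclass
class Node:
    name: str
    slot: int  # node number, used as the index of the heartbeat slot
    seq: int = 0
    last_written: Optional[Stamp] = None

    def heartbeat(self, disk: Dict[int, Stamp]) -> bool:
        """Write one heartbeat and return True if the slot looks contested."""
        current = disk.get(self.slot)
        # If the slot no longer holds what this node wrote last time,
        # someone else has been heartbeating in it.
        contested = self.last_written is not None and current != self.last_written
        if contested:
            print(f"{self.name}: another node is heartbeating in slot {self.slot}")
        self.seq += 1
        self.last_written = (self.name, self.seq)
        disk[self.slot] = self.last_written
        return contested


if __name__ == "__main__":
    disk: Dict[int, Stamp] = {}  # stand-in for the shared heartbeat region
    a = Node("node-a", slot=0)
    b = Node("node-b", slot=0)  # misconfigured: same node number as node-a
    for _ in range(3):
        a.heartbeat(disk)
        b.heartbeat(disk)
```

With both nodes on slot 0, each one flags the conflict from its second beat onward; giving `node-b` its own slot makes the warnings disappear. Whether the real code does anything beyond logging in this case is exactly what would need to be confirmed by reading it.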
Alexei_Roudnev wrote:
Just one for your collection of _strange_ situations. I saw it a few months ago. We built a 2-node cluster with iSCSI shared disks. Due to a configuration error, the servers got the same nodeID, and that resulted in the connection to the shared disk flip-flopping between them: the first server caught the disk for 5 - 10 seconds, then the second caught it, then the first, and so on. Result: OCFSv2 assigned the same node slot to both nodes, never recognized that the other node was active, and never fenced or diagnosed any problem (except unsynchronized IO, of course, which broke the file system after some time). It looks as if the heartbeat algorithm has a flaw and doesn't detect some failures.
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users