The config error, I would imagine, is that you defined two
different clusters, each missing the other node, and that the two
nodes have the same node number in both clusters. If so, the disk
heartbeat would have detected this error. It would have spewed error
messages indicating that "some other node is heartbeating in my slot".
But yes, it would not have fenced... well, I'll need to read the code
to confirm.
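
To make the idea concrete, here is a rough sketch of how a node could
notice that someone else is writing to its heartbeat slot: before each
heartbeat write, read the slot back and compare it with what this node
wrote last time. This is not the OCFS2 code. SlotRecord, SlotCollision,
heartbeat_once and the dict standing in for the heartbeat region are all
hypothetical names made up for illustration, and the record layout is an
assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SlotRecord:
    """What a node last wrote into a heartbeat slot (assumed layout)."""
    generation: int  # value picked once when the node starts heartbeating
    sequence: int    # counter bumped on every heartbeat write


class SlotCollision(Exception):
    """Raised when the slot holds a record this node did not write."""


def heartbeat_once(disk: Dict[int, SlotRecord], slot: int,
                   last_written: Optional[SlotRecord],
                   generation: int) -> SlotRecord:
    """Do one heartbeat: verify our slot, then write the next record.

    `disk` is a plain dict standing in for the shared heartbeat region.
    """
    if last_written is not None:
        on_disk = disk.get(slot)
        if on_disk != last_written:
            # Someone else has written to our slot since our last write.
            raise SlotCollision(
                f"slot {slot}: expected {last_written}, found {on_disk}")
    next_seq = 0 if last_written is None else last_written.sequence + 1
    record = SlotRecord(generation, next_seq)
    disk[slot] = record
    return record
```

Note that a check like this only reports the conflict. Whether the node
should then fence itself is a separate decision, which is exactly the
part I still need to confirm in the real code.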

Alexei_Roudnev wrote:
Just one for your collection of _strange_ situations. I saw it a few months ago.

We built a 2-node cluster with iSCSI shared disks. Due to a configuration
error, both servers got the same nodeID, and
this resulted in the connection to the shared disk flip-flopping between them:
the first server caught the disk for 5 - 10 seconds, then
the second caught it, then the first, and so on.

Result: OCFSv2 assigned the same node slot to both nodes, never recognized
that the other node was active, and
never fenced or diagnosed any problem (except unsynchronized IO, of course,
which broke the file system after some time).
It looks as if the heartbeat algorithm has a flaw and doesn't detect some failures.



_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
