I read it. He writes that:
 - if he unplugs node1, node1 reboots and node0 replays the journal (this is what he wants);
 - if he unplugs node0, node1 reboots and node0 replays the journal, which is bad because node0 is not on the network (he wants node1 to replay instead).
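To make the two cases concrete, here is a toy model of the behaviour he describes. It is only a sketch and not o2cb's actual code: the function names are made up for illustration, and the "lowest node number survives" rule is an assumption taken from the outcome he reports.

```python
def survivor_after_split(node_numbers):
    """Toy tie-break (assumption): when the nodes lose contact with each
    other, the lowest-numbered node stays up."""
    return min(node_numbers)


def outcome(unplugged_node, nodes=(0, 1)):
    """Return who replays the journal and who reboots after a cable is pulled.

    unplugged_node is deliberately ignored: each node only sees
    "peer unreachable" and cannot tell whose cable was pulled.
    """
    survivor = survivor_after_split(nodes)
    rebooted = [n for n in nodes if n != survivor]
    return {"replays_journal": survivor, "reboots": rebooted}


if __name__ == "__main__":
    for unplugged in (0, 1):
        print("unplug node%d ->" % unplugged, outcome(unplugged))
```

Both calls give the same result, because the information about which node was unplugged never enters the decision.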
The problem is that there is no way, in a primitive o2cb cluster, to distinguish between these 2 cases (that is why we use heartbeat, which can be configured to handle this much better). So in all cases, whether you unplug node0 OR node1, node1 always reboots and node0 replays the journal.

In a good cluster (heartbeat, for example) we configure an additional 'ping' to determine whether a node is still on the network (so a node can distinguish between _other node lost_ and _network connection lost_), and we configure an additional serial connection (so that the nodes can communicate even if the network switch goes down). Without such redundancy, you will always get incorrect behavior in a 2-node cluster.

>>>> I unplug network connection from node0 and get e1000 driver "Tx Unit ... Hang"
>>>> messages on node0 console
>>>> node1 console displays "o2net_idle_timer:1309 here are some times to
>>>>
>>>> two nodes which doesn't include the lowest active node 0"
>>>> node 0 replays node 1's journal, too bad it still isn't on the network
>>>>
>>>> this is in node 1 /var/log/messages after reboot
>>>>
>>>> Nov 14 23:55:56 FTP02 kernel: o2net: connection to node FTP01.mydomain.net
>>>> (num 0) at 10.xxx.0.45:7777 has been idle for 10 seconds, shutting it
