>Hello Sunil,
>
>Any thoughts on how to avoid this behavior? It was impossible to resume the OCFS2 service on the other node, as it seemed that access to the shared storage could not be managed consistently.
>
>On the first node, the system didn't hang, but the cluster databases did.
>
>Another observation on the first node, which hung first (where did the 10 seconds come from? See config below):
>
>Feb 3 12:27:06 oracle2d kernel: o2net: connection to node oracle1d (num 1) at 172.20.1.1:7777 has been idle for 10 seconds, shutting it down.
>Feb 3 12:27:06 oracle2d kernel: o2net: no longer connected to node oracle1d (num 1) at 172.20.1.1:7777
>Feb 3 14:02:39 oracle2d ntpd[16839]: Listening on interface eth2, 172.20.1.2#123
>Feb 3 14:03:01 oracle2d kernel: o2net: connected to node oracle1d (num 1) at 172.20.1.1:7777
>
>oracle2d:~ # cat /etc/sysconfig/o2cb
>#
># This is a configuration file for automatic startup of the O2CB
># driver. It is generated by running /etc/init.d/o2cb configure.
># Please use that method to modify this file
>#
>
># O2CB_ENABELED: 'true' means to load the driver on boot.
>O2CB_ENABLED=true
>
># O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
>O2CB_BOOTCLUSTER=racdb1
>
># O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
>O2CB_HEARTBEAT_THRESHOLD=601
>
># O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
>O2CB_IDLE_TIMEOUT_MS=30000
>
># O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
>O2CB_KEEPALIVE_DELAY_MS=2000
>
># O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
>O2CB_RECONNECT_DELAY_MS=2000
>
>There was no physical link outage? What else do you recommend to verify?
>
> Thanks for your time
> Best regards,
> Karim

-----Original Message-----
From: Sunil Mushran [mailto:[email protected]]
Sent: Tuesday, February 03, 2009 8:36 PM
To: Karim Alkhayer
Cc: [email protected]
Subject: Re: [Ocfs2-users] o2quo_make_decision

Means the network connection between two nodes, in a two node cluster, broke. In such a case, we fence off one of the nodes. The FAQ and 1.4 user's guide talk about quorum.

Karim Alkhayer wrote:
>
> Sunil,
>
> Any clue what this means?
>
> Feb 3 12:47:12 oracle2d kernel: (19,1):o2quo_make_decision:144 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 1
>
> Feb 3 12:47:12 oracle2d kernel: (19,1):o2hb_stop_all_regions:1889 ERROR: stopping heartbeat on all active regions.
>
> Feb 3 12:47:12 oracle2d kernel: Kernel panic: ocfs2 is very sorry to be fencing this system by panicing
>
> Thanks,
>
> Karim
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
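
For readers following the thread, the quorum rule behind the "half-quorum of 1 out of 2 nodes" message can be sketched roughly as below. This is only an illustration of the rule as described in the OCFS2 FAQ (a node keeps quorum if it can reach more than half of the heartbeating nodes, or exactly half when that half includes the lowest-numbered node), not the kernel's o2quo code. The function name has_quorum and its arguments are hypothetical, and oracle2d being node number 2 is an assumption, since the logs only show that oracle1d is node 1.

```python
def has_quorum(heartbeating, connected, self_node):
    """Illustrative quorum check, not the actual o2quo implementation.

    heartbeating: set of node numbers seen heartbeating on disk (includes self_node)
    connected:    set of node numbers reachable over the network (excludes self_node)
    self_node:    this node's number
    """
    # A node always counts itself among the nodes it can reach.
    reachable = (connected & heartbeating) | {self_node}
    total = len(heartbeating)

    if 2 * len(reachable) > total:
        # Strict majority: quorum is held.
        return True
    if 2 * len(reachable) == total:
        # Exactly half: the tie goes to the half that contains
        # the lowest-numbered heartbeating node.
        return min(heartbeating) in reachable
    return False


# Two-node cluster with the interconnect down.
# Assumption: oracle1d is node 1 (from the logs), oracle2d is node 2 (guess).
print(has_quorum({1, 2}, set(), self_node=2))  # oracle2d's view
print(has_quorum({1, 2}, set(), self_node=1))  # oracle1d's view
```

Under these assumptions the node that does not hold the lowest node number loses the tie and fences itself, which matches the message logged on oracle2d.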
