One of the nodes in a two-node system keeps crashing every couple of
weeks. Each time, I get the following errors in /var/log/messages:

Nov  7 14:21:40 cib-sim-wec-04 kernel: (0,0):o2net_idle_timer:1306 connection to node cib-sim-wec-03 (num 0) at 162.111.10.230:7777 has been idle for 10 seconds, shutting it down.
Nov  7 14:21:40 cib-sim-wec-04 kernel: (0,0):o2net_idle_timer:1317 here are some times that might help debug the situation: (tmr 1162927290.922909 now 1162927300.921182 dr 1162927290.922894 adv 1162927290.922916:1162927290.922917 func (06c6e508:504) 1162914724.648412:1162914724.648415)
Nov  7 14:21:40 cib-sim-wec-04 kernel: (9397,0):o2net_set_nn_state:407 no longer connected to node cib-sim-wec-03 (num 0) at 162.111.10.230:7777
Nov  7 14:23:16 cib-sim-wec-04 kernel: (6,0):o2quo_make_decision:144 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Nov  7 14:23:16 cib-sim-wec-04 kernel: (6,0):o2hb_stop_all_regions:1728 ERROR: stopping heartbeat on all active regions.
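
As a sanity check on the "idle for 10 seconds" message, the gap between the tmr and now values in the o2net_idle_timer:1317 line can be worked out directly. This is only a rough sketch: it assumes tmr is the time the idle timer was last armed and now is the time it fired, which is a reading of the log and not something confirmed by the OCFS2 source. The function name idle_seconds is made up for illustration.

```python
from decimal import Decimal

# Epoch timestamps copied from the o2net_idle_timer:1317 log line.
# Assumption: "tmr" is when the idle timer was last armed and
# "now" is when it fired.
tmr = Decimal("1162927290.922909")
now = Decimal("1162927300.921182")


def idle_seconds(start, end):
    """Return the elapsed time in seconds between two epoch timestamps."""
    return end - start


# Decimal keeps the microsecond digits exact, unlike binary floats.
print(idle_seconds(tmr, now))  # prints 9.998273
```

That comes out just under 10 seconds, which matches the idle period reported in the first message.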

I've already changed my threshold settings to the following:

cib-sim-wec-04:/proc/fs/ocfs2_nodemanager # more hb_dead_threshold
46

Here's a little background: this is a two-node setup on DL580s, with
crossover cables for the heartbeats, running SLES9 SP3 and OCFS2
version 1.1.7.

Any help would be greatly appreciated.
Thanks

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
