Hi, periodically one node of my two-node cluster is fenced. Here are the logs:
Jan 14 07:01:44 nvr1-rc kernel: o2net: no longer connected to node nvr2-rc.minint.it (num 0) at 1.1.1.6:7777
Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_do_master_request:1334 ERROR: link to 0 went down!
Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458 ERROR: status = -112
Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_flush_asts:600 ERROR: status = -112
Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_get_lock_resource:917 ERROR: status = -112
Jan 14 07:02:19 nvr1-rc kernel: (3950,5):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 35.0 seconds, giving up and returning errors.
Jan 14 07:02:54 nvr1-rc kernel: (3950,5):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 35.0 seconds, giving up and returning errors.
Jan 14 07:03:10 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458 ERROR: status = -107
Jan 14 07:03:10 nvr1-rc kernel: (4007,4):dlm_flush_asts:600 ERROR: status = -107
Jan 14 07:03:29 nvr1-rc kernel: (3950,5):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 35.0 seconds, giving up and returning errors.
Jan 14 07:03:50 nvr1-rc kernel: (31,5):o2quo_make_decision:146 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Jan 14 07:03:50 nvr1-rc kernel: (31,5):o2hb_stop_all_regions:1967 ERROR: stopping heartbeat on all active regions.

I'm sure there is no network connectivity problem, but it is possible that there are heavy I/O loads. Is this the intended behaviour? Why is the loaded node fenced under heavy load?

I'm using ocfs2-1.4.4 on RHEL 5, kernel-2.6.18-164.6.1.el5.

Thanks,
Nicola

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users