Which version of OCFS2 are you running? Make sure you are on 1.2; I definitely remember this bug being fixed.
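For what it's worth, the o2net_idle_timer debug line in your log already shows the timeout firing: the gap between its tmr and now fields is just under 10 seconds. Here is a minimal Python sketch of that arithmetic. It assumes both fields are seconds.microseconds timestamps, which is my reading of the format rather than anything the log states, and idle_seconds and LOG_LINE are hypothetical names used only for illustration.

```python
import re

# The o2net_idle_timer debug line from the quoted log, with the mail
# wrapping removed. Only the "tmr" and "now" fields are used below.
LOG_LINE = (
    "(0,2):o2net_idle_timer:1321 here are some times that might help debug "
    "the situation: (tmr 1141573746.685964 now 1141573756.684348 "
    "dr 1141573746.685955 adv 1141573746.685968:1141573746.685968 "
    "func (beddbae4:504) 1141573746.685776:1141573746.685824)"
)


def idle_seconds(line: str) -> float:
    """Return now - tmr in seconds.

    Assumption: both fields are seconds.microseconds timestamps.
    Integer microseconds are used to avoid float rounding on the
    large epoch values.
    """
    m = re.search(r"tmr (\d+)\.(\d{6}) now (\d+)\.(\d{6})", line)
    if m is None:
        raise ValueError("no tmr/now fields found")
    tmr_us = int(m.group(1)) * 1_000_000 + int(m.group(2))
    now_us = int(m.group(3)) * 1_000_000 + int(m.group(4))
    return (now_us - tmr_us) / 1_000_000


if __name__ == "__main__":
    print(f"idle for {idle_seconds(LOG_LINE):.6f} s")
```

Run against your line it prints an idle time of 9.998384 s, which matches the "idle for 10 seconds" message, so the network timeout itself behaved as designed; the crash came afterwards in the node-down handling.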
doof wrote:
> Hi
>
> I have been using OCFS2 (on RHEL4) for a few days and I have a problem. I set up an
> OCFS2 cluster with 2 nodes.
>
> Sometimes one node panics because it loses its connection to the other node:
>
> Mar 5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1310 connection to node node2 (num 0) at 10.150.28.67:7777 has been idle for 10 seconds, shutting it down.
> Mar 5 16:49:16 node1 kernel: (0,2):o2net_idle_timer:1321 here are some times that might help debug the situation: (tmr 1141573746.685964 now 1141573756.684348 dr 1141573746.685955 adv 1141573746.685968:1141573746.685968 func (beddbae4:504) 1141573746.685776:1141573746.685824)
> Mar 5 16:49:16 node1 kernel: (2222,2):o2net_set_nn_state:411 no longer connected to node node2 (num 0) at 10.150.28.67:7777
> Mar 5 16:49:16 node1 kernel: (2263,7):dlm_send_proxy_ast_msg:448 ERROR: status = -112
> Mar 5 16:49:16 node1 kernel: (2263,7):dlm_flush_asts:556 ERROR: status = -112
> Mar 5 16:49:20 node1 kernel: eip: f8b40ba2
> Mar 5 16:49:20 node1 kernel: ------------[ cut here ]------------
> Mar 5 16:49:20 node1 kernel: kernel BUG at include/asm/spinlock.h:133!
> Mar 5 16:49:20 node1 kernel: invalid operand: 0000 [#1]
> Mar 5 16:49:20 node1 kernel: SMP
> Mar 5 16:49:20 node1 kernel: Modules linked in: md5 ipv6 parport_pc lp parport autofs4 ocfs2(U) debugfs(U) nfs lockd ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) sunrpc microcode dm_mirror dm_mod button battery ac ohci_hcd cpqphp e1000 e100 mii tg3 floppy ext3 jbd qla6312(U) qla2300(U) qla2xxx(U) scsi_transport_fc qla2xxx_conf(U) cciss sd_mod scsi_mod
> Mar 5 16:49:20 node1 kernel: CPU: 6
> Mar 5 16:49:20 node1 kernel: EIP: 0060:[<c02cff11>] Not tainted VLI
> Mar 5 16:49:20 node1 kernel: EFLAGS: 00010216 (2.6.9-22.0.2.ELsmp)
> Mar 5 16:49:20 node1 kernel: EIP is at _spin_lock+0x1c/0x34
> Mar 5 16:49:20 node1 kernel: eax: c02e3869 ebx: d36c7994 ecx: f654ee50 edx: f8b40ba2
> Mar 5 16:49:20 node1 kernel: esi: d36c7980 edi: 00000000 ebp: 00000000 esp: f654ee54
> Mar 5 16:49:20 node1 kernel: ds: 007b es: 007b ss: 0068
> Mar 5 16:49:20 node1 kernel: Process o2hb-1C0CB88CEF (pid: 2258, threadinfo=f654e000 task=f72f6730)
> Mar 5 16:49:20 node1 kernel: Stack: 00000000 f8b40ba2 d36c7988 f7043400 f8b40b88 00000000 00000000 f7043400
> Mar 5 16:49:20 node1 kernel: 00000000 00000000 f8b50684 f7043430 f7043400 f8b5076a f704355c f7043558
> Mar 5 16:49:20 node1 kernel: f8c21920 f8c0b8f7 f7e7f880 00000000 f654eedc f654eedc f8c1f8a0 f8c0ba27
> Mar 5 16:49:20 node1 kernel: Call Trace:
> Mar 5 16:49:20 node1 kernel: [<f8b40ba2>] dlm_mle_node_down+0x10/0x73 [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8b40b88>] dlm_hb_event_notify_attached+0x6e/0x78 [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8b50684>] __dlm_hb_node_down+0x1a6/0x267 [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8b5076a>] dlm_hb_node_down_cb+0x25/0x3a [ocfs2_dlm]
> Mar 5 16:49:20 node1 kernel: [<f8c0b8f7>] o2hb_fire_callbacks+0x62/0x6c [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0ba27>] o2hb_run_event_list+0x126/0x162 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c0f9>] o2hb_check_slot+0x4d2/0x4e7 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<c022370a>] submit_bio+0xca/0xd2
> Mar 5 16:49:20 node1 kernel: [<f8c0c3ed>] o2hb_do_disk_heartbeat+0x2b4/0x325 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c4e2>] o2hb_thread+0x0/0x291 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c56b>] o2hb_thread+0x89/0x291 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<f8c0c4e2>] o2hb_thread+0x0/0x291 [ocfs2_nodemanager]
> Mar 5 16:49:20 node1 kernel: [<c0133a9d>] kthread+0x73/0x9b
> Mar 5 16:49:20 node1 kernel: [<c0133a2a>] kthread+0x0/0x9b
> Mar 5 16:49:20 node1 kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
> Mar 5 16:49:20 node1 kernel: Code: 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 53 89 c3 81 78 04 ad 4e ad de 74 18 ff 74 24 04 68 69 38 2e c0 e8 33 23 e5 ff 58 5a <0f> 0b 85 00 23 29 2e c0 f0 fe 0b 79 09 f3 90 80 3b 00 7e f9 eb
> Mar 5 16:49:20 node1 kernel: <0>Fatal exception: panic in 5 seconds
>
> The problem is that this panic also causes a panic on the second node. How can I
> prevent the panic? Should I add another node?
>
> thanks
> Fred
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
