thanks for quick response! http://oss.oracle.com/bugzilla/show_bug.cgi?id=1098
forget to say that we use default timeouts Heartbeat dead threshold = 31 Network idle timeout: 30000 Network keepalive delay: 2000 Network reconnect delay: 2000 2009/4/21 Sunil Mushran <sunil.mush...@oracle.com>: > File a bugzilla (oss.oracle.com/bugzilla) for this issue. Attach > the stack trace. > > Also attach the output of the following. > $ find /lib/modules/`uname -r`/kernel/fs/ocfs2 -name \*.ko -exec objdump > -DSl {} >/tmp/ocfs2.syms \; > > Sunil > > Konstantin Tikhonov wrote: >> >> Нi, >> I have a cluster with 5 nodes hosting web application. All web servers >> save log info into shared access.log file. There is awstats log >> analyzer on the first node. Sometimes this node fails with the >> following messages (captured on another server) >> >> Apr 20 17:31:16 um-be-2 [145813.022112] o2net: connection to node >> um-fe-1 (num 1) at 192.168.10.10:7777 has been idle for 30.0 seconds, >> shutting it down. >> Apr 20 17:31:16 um-be-2 [145813.022397] o2net: no longer connected to >> node um-fe-1 (num 1) at 192.168.10.10:7777 >> Apr 20 17:31:16 um-fe-1 [ 9087.529912] o2net: connection to node >> um-be-1 (num 3) at 192.168.10.20:7777 has been idle for 30.0 seconds, >> shutting it down. >> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468 >> here are some times that might help debug the situation: (tmr >> 1240219828.837488 now 1240219858.837654 dr 1240219858.834946 adv >> 1240219828.837494:1240219828.837496 func (d5a868ed:502) >> 1240219802.621728:1240219802.621733) >> Apr 20 17:31:16 um-fe-1 [ 9087.529971] o2net: connection to node >> um-be-2 (num 4) at 192.168.10.21:7777 has been idle for 30.0 seconds, >> shutting it down. >> Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468 >> here are some times that might help debug the situation: (tmr >> 1240219828.837608 now 1240219858.837859 dr 1240219858.837122 adv >> 1240219828.837614:1240219828.837616 func (d5a868ed:502) >> 1240219798.977129:1240219798.977131) >> Apr 20 17:31:16 um-fe-1 [ 9088.977942] o2net: connection to node >> um-be-3 (num 5) at 192.168.10.22:7777 has been idle for 30.0 seconds, >> shutting it down. >> Apr 20 17:31:16 um-fe-1 [ 9088.977971] (4614,1):o2net_idle_timer:1468 >> here are some times that might help debug the situation: (tmr >> 1240219830.287721 now 1240219860.285653 dr 1240219832.288107 adv >> 1240219830.287733:1240219830.287734 func (d5a868ed:502) >> 1240219830.287723:1240219830.287725) >> Apr 20 17:31:16 um-fe-1 [ 9089.041942] o2net: connection to node >> um-fe-2 (num 2) at 192.168.10.11:7777 has been idle for 30.0 seconds, >> shutting it down. >> Apr 20 17:31:16 um-fe-1 [ 9089.041971] (4614,1):o2net_idle_timer:1468 >> here are some times that might help debug the situation: (tmr >> 1240219830.350706 now 1240219860.349653 dr 1240219832.350084 adv >> 1240219830.350720:1240219830.350721 func (d5a868ed:505) >> 1240219830.350707:1240219830.350712) >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] BUG: soft lockup - CPU#1 stuck >> for 61s! [awstats.pl:4614] >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in: >> Apr 20 17:31:52 um-fe-1 netconsole >> Apr 20 17:31:52 um-fe-1 nfsd >> Apr 20 17:31:52 um-fe-1 lockd >> Apr 20 17:31:52 um-fe-1 nfs_acl >> Apr 20 17:31:52 um-fe-1 auth_rpcgss >> Apr 20 17:31:52 um-fe-1 sunrpc >> Apr 20 17:31:52 um-fe-1 exportfs >> Apr 20 17:31:52 um-fe-1 sctp >> Apr 20 17:31:52 um-fe-1 ipv6 >> Apr 20 17:31:52 um-fe-1 libcrc32c >> Apr 20 17:31:52 um-fe-1 ocfs2 >> Apr 20 17:31:52 um-fe-1 ocfs2_dlmfs >> Apr 20 17:31:52 um-fe-1 ocfs2_stack_o2cb >> Apr 20 17:31:52 um-fe-1 ocfs2_dlm >> Apr 20 17:31:52 um-fe-1 ocfs2_nodemanager >> Apr 20 17:31:52 um-fe-1 ocfs2_stackglue >> Apr 20 17:31:52 um-fe-1 bonding >> Apr 20 17:31:52 um-fe-1 ipt_MASQUERADE >> Apr 20 17:31:52 um-fe-1 iptable_nat >> Apr 20 17:31:52 um-fe-1 nf_nat >> Apr 20 17:31:52 um-fe-1 nf_conntrack_ipv4 >> Apr 20 17:31:52 um-fe-1 nf_conntrack >> Apr 20 17:31:52 um-fe-1 iptable_filter >> Apr 20 17:31:52 um-fe-1 ip_tables >> Apr 20 17:31:52 um-fe-1 x_tables >> Apr 20 17:31:52 um-fe-1 fuse >> Apr 20 17:31:52 um-fe-1 dlm >> Apr 20 17:31:52 um-fe-1 configfs >> Apr 20 17:31:52 um-fe-1 tun >> Apr 20 17:31:52 um-fe-1 loop >> Apr 20 17:31:52 um-fe-1 psmouse >> Apr 20 17:31:52 um-fe-1 serio_raw >> Apr 20 17:31:52 um-fe-1 k8temp >> Apr 20 17:31:52 um-fe-1 snd_pcsp >> Apr 20 17:31:52 um-fe-1 snd_pcm >> Apr 20 17:31:52 um-fe-1 snd_timer >> Apr 20 17:31:52 um-fe-1 snd >> Apr 20 17:31:52 um-fe-1 soundcore >> Apr 20 17:31:52 um-fe-1 snd_page_alloc >> Apr 20 17:31:52 um-fe-1 i2c_piix4 >> Apr 20 17:31:52 um-fe-1 i2c_core >> Apr 20 17:31:52 um-fe-1 shpchp >> Apr 20 17:31:52 um-fe-1 pci_hotplug >> Apr 20 17:31:52 um-fe-1 button >> Apr 20 17:31:52 um-fe-1 evdev >> Apr 20 17:31:52 um-fe-1 ext3 >> Apr 20 17:31:52 um-fe-1 jbd >> Apr 20 17:31:52 um-fe-1 mbcache >> Apr 20 17:31:52 um-fe-1 dm_mirror >> Apr 20 17:31:52 um-fe-1 dm_log >> Apr 20 17:31:52 um-fe-1 dm_snapshot >> Apr 20 17:31:52 um-fe-1 dm_mod >> Apr 20 17:31:52 um-fe-1 ses >> Apr 20 17:31:52 um-fe-1 enclosure >> Apr 20 17:31:52 um-fe-1 sd_mod >> Apr 20 17:31:52 um-fe-1 serverworks >> Apr 20 17:31:52 um-fe-1 ide_pci_generic >> Apr 20 17:31:52 um-fe-1 ide_core >> Apr 20 17:31:52 um-fe-1 lpfc >> Apr 20 17:31:52 um-fe-1 scsi_transport_fc >> Apr 20 17:31:52 um-fe-1 scsi_tgt >> Apr 20 17:31:52 um-fe-1 tg3 >> Apr 20 17:31:52 um-fe-1 sata_promise >> Apr 20 17:31:52 um-fe-1 ata_generic >> Apr 20 17:31:52 um-fe-1 ehci_hcd >> Apr 20 17:31:52 um-fe-1 ohci_hcd >> Apr 20 17:31:52 um-fe-1 libata >> Apr 20 17:31:52 um-fe-1 scsi_mod >> Apr 20 17:31:52 um-fe-1 dock >> Apr 20 17:31:52 um-fe-1 thermal >> Apr 20 17:31:52 um-fe-1 processor >> Apr 20 17:31:52 um-fe-1 fan >> Apr 20 17:31:52 um-fe-1 thermal_sys >> Apr 20 17:31:52 um-fe-1 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CPU 1: >> Apr 20 17:31:52 um-fe-1 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in: >> Apr 20 17:31:52 um-fe-1 netconsole >> Apr 20 17:31:52 um-fe-1 nfsd >> Apr 20 17:31:52 um-fe-1 lockd >> Apr 20 17:31:52 um-fe-1 nfs_acl >> Apr 20 17:31:52 um-fe-1 auth_rpcgss >> Apr 20 17:31:52 um-fe-1 sunrpc >> Apr 20 17:31:52 um-fe-1 exportfs >> Apr 20 17:31:52 um-fe-1 sctp >> Apr 20 17:31:52 um-fe-1 ipv6 >> Apr 20 17:31:52 um-fe-1 libcrc32c >> Apr 20 17:31:52 um-fe-1 ocfs2 >> Apr 20 17:31:52 um-fe-1 ocfs2_dlmfs >> Apr 20 17:31:52 um-fe-1 ocfs2_stack_o2cb >> Apr 20 17:31:52 um-fe-1 ocfs2_dlm >> Apr 20 17:31:52 um-fe-1 ocfs2_nodemanager >> Apr 20 17:31:52 um-fe-1 ocfs2_stackglue >> Apr 20 17:31:52 um-fe-1 bonding >> Apr 20 17:31:52 um-fe-1 ipt_MASQUERADE >> Apr 20 17:31:52 um-fe-1 iptable_nat >> Apr 20 17:31:52 um-fe-1 nf_nat >> Apr 20 17:31:52 um-fe-1 nf_conntrack_ipv4 >> Apr 20 17:31:52 um-fe-1 nf_conntrack >> Apr 20 17:31:52 um-fe-1 iptable_filter >> Apr 20 17:31:52 um-fe-1 ip_tables >> Apr 20 17:31:52 um-fe-1 x_tables >> Apr 20 17:31:52 um-fe-1 fuse >> Apr 20 17:31:52 um-fe-1 dlm >> Apr 20 17:31:52 um-fe-1 configfs >> Apr 20 17:31:52 um-fe-1 tun >> Apr 20 17:31:52 um-fe-1 loop >> Apr 20 17:31:52 um-fe-1 psmouse >> Apr 20 17:31:52 um-fe-1 serio_raw >> Apr 20 17:31:52 um-fe-1 k8temp >> Apr 20 17:31:52 um-fe-1 snd_pcsp >> Apr 20 17:31:52 um-fe-1 snd_pcm >> Apr 20 17:31:52 um-fe-1 snd_timer >> Apr 20 17:31:52 um-fe-1 snd >> Apr 20 17:31:52 um-fe-1 soundcore >> Apr 20 17:31:52 um-fe-1 snd_page_alloc >> Apr 20 17:31:52 um-fe-1 i2c_piix4 >> Apr 20 17:31:52 um-fe-1 i2c_core >> Apr 20 17:31:52 um-fe-1 shpchp >> Apr 20 17:31:52 um-fe-1 pci_hotplug >> Apr 20 17:31:52 um-fe-1 button >> Apr 20 17:31:52 um-fe-1 evdev >> Apr 20 17:31:52 um-fe-1 ext3 >> Apr 20 17:31:52 um-fe-1 jbd >> Apr 20 17:31:52 um-fe-1 mbcache >> Apr 20 17:31:52 um-fe-1 dm_mirror >> Apr 20 17:31:52 um-fe-1 dm_log >> Apr 20 17:31:52 um-fe-1 dm_snapshot >> Apr 20 17:31:52 um-fe-1 dm_mod >> Apr 20 17:31:52 um-fe-1 ses >> Apr 20 17:31:52 um-fe-1 enclosure >> Apr 20 17:31:52 um-fe-1 sd_mod >> Apr 20 17:31:52 um-fe-1 serverworks >> Apr 20 17:31:52 um-fe-1 ide_pci_generic >> Apr 20 17:31:52 um-fe-1 ide_core >> Apr 20 17:31:52 um-fe-1 lpfc >> Apr 20 17:31:52 um-fe-1 scsi_transport_fc >> Apr 20 17:31:52 um-fe-1 scsi_tgt >> Apr 20 17:31:52 um-fe-1 tg3 >> Apr 20 17:31:52 um-fe-1 sata_promise >> Apr 20 17:31:52 um-fe-1 ata_generic >> Apr 20 17:31:52 um-fe-1 ehci_hcd >> Apr 20 17:31:52 um-fe-1 ohci_hcd >> Apr 20 17:31:52 um-fe-1 libata >> Apr 20 17:31:52 um-fe-1 scsi_mod >> Apr 20 17:31:52 um-fe-1 dock >> Apr 20 17:31:52 um-fe-1 thermal >> Apr 20 17:31:52 um-fe-1 processor >> Apr 20 17:31:52 um-fe-1 fan >> Apr 20 17:31:52 um-fe-1 thermal_sys >> Apr 20 17:31:52 um-fe-1 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Pid: 4614, comm: awstats.pl Not >> tainted 2.6.26-2-amd64 #1 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RIP: 0010:[<ffffffffa032b0ca>] >> Apr 20 17:31:52 um-fe-1 [<ffffffffa032b0ca>] >> :ocfs2:ocfs2_cluster_lock+0x87/0x73a >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RSP: 0018:ffff81001fd81af8 >> EFLAGS: 00000282 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RAX: ffffffffa036ff60 RBX: >> ffff81003180a858 RCX: 0000000000000000 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RDX: 0000000000000003 RSI: >> ffff81003180a858 RDI: ffff81001fd81b58 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] RBP: 0000000000000001 R08: >> 0000000000000004 R09: ffff81003180ac38 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R10: 00000000005b3292 R11: >> 0000000000000001 R12: ffff81003180a858 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] R13: 0000000000000000 R14: >> 00000000ffffffff R15: 0000080dadfb67c0 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] FS: 00007f8c311116e0(0000) >> GS:ffff81003f9f99c0(0000) knlGS:0000000000000000 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CS: 0010 DS: 0000 ES: 0000 >> CR0: 0000000080050033 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] CR2: 00007f8c3111c000 CR3: >> 00000000195d4000 CR4: 00000000000006e0 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR0: 0000000000000000 DR1: >> 0000000000000000 DR2: 0000000000000000 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR3: 0000000000000000 DR6: >> 00000000ffff0ff0 DR7: 0000000000000400 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] Call Trace: >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032b0c6>] ? >> :ocfs2:ocfs2_cluster_lock+0x83/0x73a >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa033cde2>] ? >> :ocfs2:ocfs2_wait_for_recovery+0x10/0x83 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032bd2c>] ? >> :ocfs2:ocfs2_inode_lock_full+0x17a/0xd08 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8042957d>] ? >> _spin_unlock_irqrestore+0x7/0xe >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032915e>] ? >> :ocfs2:ocfs2_cluster_unlock+0x21d/0x289 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032c8d9>] ? >> :ocfs2:ocfs2_inode_lock_with_page+0x1f/0x5f >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032311f>] ? >> :ocfs2:ocfs2_readpage+0x7e/0x33f >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff80270eef>] ? >> find_get_page+0x4a/0x54 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff80272bdc>] ? >> generic_file_aio_read+0x31f/0x4a9 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa0337182>] ? >> :ocfs2:ocfs2_file_aio_read+0x282/0x395 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8020bf79>] ? >> sysret_signal+0x2b/0x45 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029ad13>] ? >> do_sync_read+0xc9/0x10c >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802461b1>] ? >> autoremove_wake_function+0x0/0x2e >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802460db>] ? >> bit_waitqueue+0x10/0x97 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802461a0>] ? >> wake_up_bit+0x11/0x22 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802aa5da>] ? >> d_kill+0x44/0x59 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029b504>] ? >> vfs_read+0xaa/0x152 >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029b8e5>] ? >> sys_read+0x45/0x6e >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8020beca>] ? >> system_call_after_swapgs+0x8a/0x8f >> Apr 20 17:31:52 um-fe-1 [ 9123.585958] >> Apr 20 17:31:52 um-fe-1 [ 9123.585972] BUG: soft lockup - CPU#0 stuck >> for 61s! [awstats.pl:4620] >> >> What can I do resolv this problem. Thanks. >> >> > > -- С уважением, Константин Тихонов. _______________________________________________ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users