File a bugzilla (oss.oracle.com/bugzilla) for this issue. Attach the stack trace.
Also attach the output of the following. $ find /lib/modules/`uname -r`/kernel/fs/ocfs2 -name \*.ko -exec objdump -DSl {} >/tmp/ocfs2.syms \; Sunil Konstantin Tikhonov wrote: > Нi, > I have a cluster with 5 nodes hosting web application. All web servers > save log info into shared access.log file. There is awstats log > analyzer on the first node. Sometimes this node fails with the > following messages (captured on another server) > > Apr 20 17:31:16 um-be-2 [145813.022112] o2net: connection to node > um-fe-1 (num 1) at 192.168.10.10:7777 has been idle for 30.0 seconds, > shutting it down. > Apr 20 17:31:16 um-be-2 [145813.022397] o2net: no longer connected to > node um-fe-1 (num 1) at 192.168.10.10:7777 > Apr 20 17:31:16 um-fe-1 [ 9087.529912] o2net: connection to node > um-be-1 (num 3) at 192.168.10.20:7777 has been idle for 30.0 seconds, > shutting it down. > Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468 > here are some times that might help debug the situation: (tmr > 1240219828.837488 now 1240219858.837654 dr 1240219858.834946 adv > 1240219828.837494:1240219828.837496 func (d5a868ed:502) > 1240219802.621728:1240219802.621733) > Apr 20 17:31:16 um-fe-1 [ 9087.529971] o2net: connection to node > um-be-2 (num 4) at 192.168.10.21:7777 has been idle for 30.0 seconds, > shutting it down. > Apr 20 17:31:16 um-fe-1 [ 9087.529971] (4614,1):o2net_idle_timer:1468 > here are some times that might help debug the situation: (tmr > 1240219828.837608 now 1240219858.837859 dr 1240219858.837122 adv > 1240219828.837614:1240219828.837616 func (d5a868ed:502) > 1240219798.977129:1240219798.977131) > Apr 20 17:31:16 um-fe-1 [ 9088.977942] o2net: connection to node > um-be-3 (num 5) at 192.168.10.22:7777 has been idle for 30.0 seconds, > shutting it down. > Apr 20 17:31:16 um-fe-1 [ 9088.977971] (4614,1):o2net_idle_timer:1468 > here are some times that might help debug the situation: (tmr > 1240219830.287721 now 1240219860.285653 dr 1240219832.288107 adv > 1240219830.287733:1240219830.287734 func (d5a868ed:502) > 1240219830.287723:1240219830.287725) > Apr 20 17:31:16 um-fe-1 [ 9089.041942] o2net: connection to node > um-fe-2 (num 2) at 192.168.10.11:7777 has been idle for 30.0 seconds, > shutting it down. > Apr 20 17:31:16 um-fe-1 [ 9089.041971] (4614,1):o2net_idle_timer:1468 > here are some times that might help debug the situation: (tmr > 1240219830.350706 now 1240219860.349653 dr 1240219832.350084 adv > 1240219830.350720:1240219830.350721 func (d5a868ed:505) > 1240219830.350707:1240219830.350712) > Apr 20 17:31:52 um-fe-1 [ 9123.585958] BUG: soft lockup - CPU#1 stuck > for 61s! [awstats.pl:4614] > Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in: > Apr 20 17:31:52 um-fe-1 netconsole > Apr 20 17:31:52 um-fe-1 nfsd > Apr 20 17:31:52 um-fe-1 lockd > Apr 20 17:31:52 um-fe-1 nfs_acl > Apr 20 17:31:52 um-fe-1 auth_rpcgss > Apr 20 17:31:52 um-fe-1 sunrpc > Apr 20 17:31:52 um-fe-1 exportfs > Apr 20 17:31:52 um-fe-1 sctp > Apr 20 17:31:52 um-fe-1 ipv6 > Apr 20 17:31:52 um-fe-1 libcrc32c > Apr 20 17:31:52 um-fe-1 ocfs2 > Apr 20 17:31:52 um-fe-1 ocfs2_dlmfs > Apr 20 17:31:52 um-fe-1 ocfs2_stack_o2cb > Apr 20 17:31:52 um-fe-1 ocfs2_dlm > Apr 20 17:31:52 um-fe-1 ocfs2_nodemanager > Apr 20 17:31:52 um-fe-1 ocfs2_stackglue > Apr 20 17:31:52 um-fe-1 bonding > Apr 20 17:31:52 um-fe-1 ipt_MASQUERADE > Apr 20 17:31:52 um-fe-1 iptable_nat > Apr 20 17:31:52 um-fe-1 nf_nat > Apr 20 17:31:52 um-fe-1 nf_conntrack_ipv4 > Apr 20 17:31:52 um-fe-1 nf_conntrack > Apr 20 17:31:52 um-fe-1 iptable_filter > Apr 20 17:31:52 um-fe-1 ip_tables > Apr 20 17:31:52 um-fe-1 x_tables > Apr 20 17:31:52 um-fe-1 fuse > Apr 20 17:31:52 um-fe-1 dlm > Apr 20 17:31:52 um-fe-1 configfs > Apr 20 17:31:52 um-fe-1 tun > Apr 20 17:31:52 um-fe-1 loop > Apr 20 17:31:52 um-fe-1 psmouse > Apr 20 17:31:52 um-fe-1 serio_raw > Apr 20 17:31:52 um-fe-1 k8temp > Apr 20 17:31:52 um-fe-1 snd_pcsp > Apr 20 17:31:52 um-fe-1 snd_pcm > Apr 20 17:31:52 um-fe-1 snd_timer > Apr 20 17:31:52 um-fe-1 snd > Apr 20 17:31:52 um-fe-1 soundcore > Apr 20 17:31:52 um-fe-1 snd_page_alloc > Apr 20 17:31:52 um-fe-1 i2c_piix4 > Apr 20 17:31:52 um-fe-1 i2c_core > Apr 20 17:31:52 um-fe-1 shpchp > Apr 20 17:31:52 um-fe-1 pci_hotplug > Apr 20 17:31:52 um-fe-1 button > Apr 20 17:31:52 um-fe-1 evdev > Apr 20 17:31:52 um-fe-1 ext3 > Apr 20 17:31:52 um-fe-1 jbd > Apr 20 17:31:52 um-fe-1 mbcache > Apr 20 17:31:52 um-fe-1 dm_mirror > Apr 20 17:31:52 um-fe-1 dm_log > Apr 20 17:31:52 um-fe-1 dm_snapshot > Apr 20 17:31:52 um-fe-1 dm_mod > Apr 20 17:31:52 um-fe-1 ses > Apr 20 17:31:52 um-fe-1 enclosure > Apr 20 17:31:52 um-fe-1 sd_mod > Apr 20 17:31:52 um-fe-1 serverworks > Apr 20 17:31:52 um-fe-1 ide_pci_generic > Apr 20 17:31:52 um-fe-1 ide_core > Apr 20 17:31:52 um-fe-1 lpfc > Apr 20 17:31:52 um-fe-1 scsi_transport_fc > Apr 20 17:31:52 um-fe-1 scsi_tgt > Apr 20 17:31:52 um-fe-1 tg3 > Apr 20 17:31:52 um-fe-1 sata_promise > Apr 20 17:31:52 um-fe-1 ata_generic > Apr 20 17:31:52 um-fe-1 ehci_hcd > Apr 20 17:31:52 um-fe-1 ohci_hcd > Apr 20 17:31:52 um-fe-1 libata > Apr 20 17:31:52 um-fe-1 scsi_mod > Apr 20 17:31:52 um-fe-1 dock > Apr 20 17:31:52 um-fe-1 thermal > Apr 20 17:31:52 um-fe-1 processor > Apr 20 17:31:52 um-fe-1 fan > Apr 20 17:31:52 um-fe-1 thermal_sys > Apr 20 17:31:52 um-fe-1 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] CPU 1: > Apr 20 17:31:52 um-fe-1 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] Modules linked in: > Apr 20 17:31:52 um-fe-1 netconsole > Apr 20 17:31:52 um-fe-1 nfsd > Apr 20 17:31:52 um-fe-1 lockd > Apr 20 17:31:52 um-fe-1 nfs_acl > Apr 20 17:31:52 um-fe-1 auth_rpcgss > Apr 20 17:31:52 um-fe-1 sunrpc > Apr 20 17:31:52 um-fe-1 exportfs > Apr 20 17:31:52 um-fe-1 sctp > Apr 20 17:31:52 um-fe-1 ipv6 > Apr 20 17:31:52 um-fe-1 libcrc32c > Apr 20 17:31:52 um-fe-1 ocfs2 > Apr 20 17:31:52 um-fe-1 ocfs2_dlmfs > Apr 20 17:31:52 um-fe-1 ocfs2_stack_o2cb > Apr 20 17:31:52 um-fe-1 ocfs2_dlm > Apr 20 17:31:52 um-fe-1 ocfs2_nodemanager > Apr 20 17:31:52 um-fe-1 ocfs2_stackglue > Apr 20 17:31:52 um-fe-1 bonding > Apr 20 17:31:52 um-fe-1 ipt_MASQUERADE > Apr 20 17:31:52 um-fe-1 iptable_nat > Apr 20 17:31:52 um-fe-1 nf_nat > Apr 20 17:31:52 um-fe-1 nf_conntrack_ipv4 > Apr 20 17:31:52 um-fe-1 nf_conntrack > Apr 20 17:31:52 um-fe-1 iptable_filter > Apr 20 17:31:52 um-fe-1 ip_tables > Apr 20 17:31:52 um-fe-1 x_tables > Apr 20 17:31:52 um-fe-1 fuse > Apr 20 17:31:52 um-fe-1 dlm > Apr 20 17:31:52 um-fe-1 configfs > Apr 20 17:31:52 um-fe-1 tun > Apr 20 17:31:52 um-fe-1 loop > Apr 20 17:31:52 um-fe-1 psmouse > Apr 20 17:31:52 um-fe-1 serio_raw > Apr 20 17:31:52 um-fe-1 k8temp > Apr 20 17:31:52 um-fe-1 snd_pcsp > Apr 20 17:31:52 um-fe-1 snd_pcm > Apr 20 17:31:52 um-fe-1 snd_timer > Apr 20 17:31:52 um-fe-1 snd > Apr 20 17:31:52 um-fe-1 soundcore > Apr 20 17:31:52 um-fe-1 snd_page_alloc > Apr 20 17:31:52 um-fe-1 i2c_piix4 > Apr 20 17:31:52 um-fe-1 i2c_core > Apr 20 17:31:52 um-fe-1 shpchp > Apr 20 17:31:52 um-fe-1 pci_hotplug > Apr 20 17:31:52 um-fe-1 button > Apr 20 17:31:52 um-fe-1 evdev > Apr 20 17:31:52 um-fe-1 ext3 > Apr 20 17:31:52 um-fe-1 jbd > Apr 20 17:31:52 um-fe-1 mbcache > Apr 20 17:31:52 um-fe-1 dm_mirror > Apr 20 17:31:52 um-fe-1 dm_log > Apr 20 17:31:52 um-fe-1 dm_snapshot > Apr 20 17:31:52 um-fe-1 dm_mod > Apr 20 17:31:52 um-fe-1 ses > Apr 20 17:31:52 um-fe-1 enclosure > Apr 20 17:31:52 um-fe-1 sd_mod > Apr 20 17:31:52 um-fe-1 serverworks > Apr 20 17:31:52 um-fe-1 ide_pci_generic > Apr 20 17:31:52 um-fe-1 ide_core > Apr 20 17:31:52 um-fe-1 lpfc > Apr 20 17:31:52 um-fe-1 scsi_transport_fc > Apr 20 17:31:52 um-fe-1 scsi_tgt > Apr 20 17:31:52 um-fe-1 tg3 > Apr 20 17:31:52 um-fe-1 sata_promise > Apr 20 17:31:52 um-fe-1 ata_generic > Apr 20 17:31:52 um-fe-1 ehci_hcd > Apr 20 17:31:52 um-fe-1 ohci_hcd > Apr 20 17:31:52 um-fe-1 libata > Apr 20 17:31:52 um-fe-1 scsi_mod > Apr 20 17:31:52 um-fe-1 dock > Apr 20 17:31:52 um-fe-1 thermal > Apr 20 17:31:52 um-fe-1 processor > Apr 20 17:31:52 um-fe-1 fan > Apr 20 17:31:52 um-fe-1 thermal_sys > Apr 20 17:31:52 um-fe-1 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] Pid: 4614, comm: awstats.pl Not > tainted 2.6.26-2-amd64 #1 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] RIP: 0010:[<ffffffffa032b0ca>] > Apr 20 17:31:52 um-fe-1 [<ffffffffa032b0ca>] > :ocfs2:ocfs2_cluster_lock+0x87/0x73a > Apr 20 17:31:52 um-fe-1 [ 9123.585958] RSP: 0018:ffff81001fd81af8 > EFLAGS: 00000282 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] RAX: ffffffffa036ff60 RBX: > ffff81003180a858 RCX: 0000000000000000 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] RDX: 0000000000000003 RSI: > ffff81003180a858 RDI: ffff81001fd81b58 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] RBP: 0000000000000001 R08: > 0000000000000004 R09: ffff81003180ac38 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] R10: 00000000005b3292 R11: > 0000000000000001 R12: ffff81003180a858 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] R13: 0000000000000000 R14: > 00000000ffffffff R15: 0000080dadfb67c0 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] FS: 00007f8c311116e0(0000) > GS:ffff81003f9f99c0(0000) knlGS:0000000000000000 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] CS: 0010 DS: 0000 ES: 0000 > CR0: 0000000080050033 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] CR2: 00007f8c3111c000 CR3: > 00000000195d4000 CR4: 00000000000006e0 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR0: 0000000000000000 DR1: > 0000000000000000 DR2: 0000000000000000 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] DR3: 0000000000000000 DR6: > 00000000ffff0ff0 DR7: 0000000000000400 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] > Apr 20 17:31:52 um-fe-1 [ 9123.585958] Call Trace: > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032b0c6>] ? > :ocfs2:ocfs2_cluster_lock+0x83/0x73a > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa033cde2>] ? > :ocfs2:ocfs2_wait_for_recovery+0x10/0x83 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032bd2c>] ? > :ocfs2:ocfs2_inode_lock_full+0x17a/0xd08 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8042957d>] ? > _spin_unlock_irqrestore+0x7/0xe > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032915e>] ? > :ocfs2:ocfs2_cluster_unlock+0x21d/0x289 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032c8d9>] ? > :ocfs2:ocfs2_inode_lock_with_page+0x1f/0x5f > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa032311f>] ? > :ocfs2:ocfs2_readpage+0x7e/0x33f > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff80270eef>] ? > find_get_page+0x4a/0x54 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff80272bdc>] ? > generic_file_aio_read+0x31f/0x4a9 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffffa0337182>] ? > :ocfs2:ocfs2_file_aio_read+0x282/0x395 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8020bf79>] ? > sysret_signal+0x2b/0x45 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029ad13>] ? > do_sync_read+0xc9/0x10c > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802461b1>] ? > autoremove_wake_function+0x0/0x2e > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802460db>] ? > bit_waitqueue+0x10/0x97 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802461a0>] ? > wake_up_bit+0x11/0x22 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff802aa5da>] ? > d_kill+0x44/0x59 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029b504>] ? > vfs_read+0xaa/0x152 > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8029b8e5>] ? > sys_read+0x45/0x6e > Apr 20 17:31:52 um-fe-1 [ 9123.585958] [<ffffffff8020beca>] ? > system_call_after_swapgs+0x8a/0x8f > Apr 20 17:31:52 um-fe-1 [ 9123.585958] > Apr 20 17:31:52 um-fe-1 [ 9123.585972] BUG: soft lockup - CPU#0 stuck > for 61s! [awstats.pl:4620] > > What can I do resolv this problem. Thanks. > > _______________________________________________ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users