Do you have the full oops trace?
Nathan Ehresman wrote:
I have a strange OCFS2 problem that has been plaguing me. I have 2
separate OCFS2 clusters, each consisting of 3 machines. One is an
Oracle RAC, the other is used as a shared DocumentRoot for a web
cluster. All 6 machines are in an IBM Bladecenter and thus are nearly
identical hardware and use the same ethernet switch and FC switch.
All 6 machines connect to the same SAN but mount completely different
partitions (LVMed). The 3 RAC nodes are running RHEL kernel
2.6.9-34.0.2.ELsmp and the 3 web heads are running kernel
2.6.9-42.0.3. All 6 machines are running OCFS2 1.2.4. Also, all 6
nodes have their O2CB_HEARTBEAT_THRESHOLD set to 31, since it appears the
timeout on my HBAs is set to 60 seconds.
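For reference, the OCFS2 1.2 documentation relates the threshold to a disk heartbeat timeout of (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, so a value of 31 corresponds to 60 seconds. The sketch below shows that arithmetic under the assumption of a 2-second heartbeat interval. The function names are hypothetical helpers for illustration, not part of any OCFS2 tool.

```python
# Assumption: the o2hb disk heartbeat fires every 2 seconds, and the
# timeout implied by O2CB_HEARTBEAT_THRESHOLD is (threshold - 1) * 2 s.
HEARTBEAT_INTERVAL_S = 2


def heartbeat_timeout_seconds(threshold: int) -> int:
    """Disk heartbeat timeout (seconds) implied by a given threshold."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def threshold_for_timeout(timeout_s: int) -> int:
    """Smallest threshold whose heartbeat timeout covers timeout_s seconds."""
    # Ceiling division, so the heartbeat never gives up before the HBA does.
    return -(-timeout_s // HEARTBEAT_INTERVAL_S) + 1


if __name__ == "__main__":
    print(heartbeat_timeout_seconds(31))  # 60
    print(threshold_for_timeout(60))      # 31
```

With a 60-second HBA timeout, the rounded-up threshold comes out to 31, which matches the value used on all 6 nodes.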
Every once in a while, if two of the web heads are powered on at the
same time and begin to mount the shared OCFS2 partition, one of my
Oracle nodes will complain that OCFS2 is self-fencing and then
reboot (thanks to the hangcheck timer). It is always the 2nd
node in the RAC cluster that does this while nodes 1 and 3 stay up
just fine. I have the following stack trace taken from a netdump of
the kernel on RAC node 2 when it goes down, but I am not familiar
enough with OCFS2 internals to read it. Can anybody read this and
give me any insight into what might be causing this problem?
[<c0129a20>] check_timer_failed+0x3c/0x58
[<c0129c7d>] del_timer+0x12/0x65
[<f88f326b>] qla2x00_done+0x2c6/0x37a [qla2xxx]
[<f88fe7f6>] qla2300_intr_handler+0x25a/0x267 [qla2xxx]
[<c0107472>] handle_IRQ_event+0x25/0x4f
[<c01079d2>] do_IRQ+0x11c/0x1ae
=======================
[<c02d304c>] common_interrupt+0x18/0x20
[<f8c9007b>] ocfs2_do_truncate+0x37a/0xb84 [ocfs2]
[<c02d122b>] _spin_lock+0x27/0x34
[<f8c9700c>] ocfs2_cluster_lock+0xf2/0x894 [ocfs2]
[<f8c96ea1>] ocfs2_status_completion_cb+0x0/0xa [ocfs2]
[<f8c99444>] ocfs2_meta_lock_full+0x1e7/0x57e [ocfs2]
[<c016e4c0>] dput+0x34/0x1a7
[<c01668c8>] link_path_walk+0x94/0xbe
[<c01672e3>] open_namei+0x99/0x579
[<f8ca7625>] ocfs2_inode_revalidate+0x11a/0x1f9 [ocfs2]
[<f8ca3808>] ocfs2_getattr+0x0/0x14d [ocfs2]
[<f8ca386b>] ocfs2_getattr+0x63/0x14d [ocfs2]
[<f8ca3808>] ocfs2_getattr+0x0/0x14d [ocfs2]
[<c0161fa2>] vfs_getattr+0x35/0x88
[<c016201d>] vfs_stat+0x28/0x3a
[<c01672e3>] open_namei+0x99/0x579
[<c015990b>] filp_open+0x66/0x70
[<c0162612>] sys_stat64+0xf/0x23
[<c02d0ca2>] __cond_resched+0x14/0x39
[<c01c23c2>] direct_strncpy_from_user+0x3e/0x5d
[<c0159c7f>] sys_open+0x6a/0x7d
[<c02d268f>] syscall_call+0x7/0xb
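Each frame in a trace like this follows the standard Linux format `[<address>] symbol+offset/size [module]`. The offset is the position within the function and the size is the function's total length. Frames without a bracketed module are in the core kernel. The minimal sketch below splits frames into those fields, which makes it easier to see which lines belong to qla2xxx, which to ocfs2, and which to the core kernel. It uses one hypothetical heuristic: a `+0x0` offset cannot be the return address of a real call, so such entries (for example `ocfs2_getattr+0x0`) are more likely stale function pointers left on the stack. On i386 kernels of this era, the unwinder prints every kernel-text address it finds on the stack. The function name `parse_trace` and the `likely_stale` flag are illustrative only.

```python
import re
from collections import Counter

# Matches frames such as "[<f8c9700c>] ocfs2_cluster_lock+0xf2/0x894 [ocfs2]":
# address, symbol, offset into the symbol, symbol size, optional module name.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[\w.]+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_trace(text):
    """Return one dict per stack frame found in an oops/netdump trace."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # skips separators such as "======================="
        offset = int(m["off"], 16)
        frames.append({
            "addr": int(m["addr"], 16),
            "symbol": m["sym"],
            "offset": offset,
            "size": int(m["size"], 16),
            "module": m["mod"] or "kernel",  # no [module] means core kernel
            # Hypothetical heuristic: offset 0 cannot be a return address.
            "likely_stale": offset == 0,
        })
    return frames


SAMPLE = """\
[<c0129a20>] check_timer_failed+0x3c/0x58
[<f88f326b>] qla2x00_done+0x2c6/0x37a [qla2xxx]
=======================
[<f8c9700c>] ocfs2_cluster_lock+0xf2/0x894 [ocfs2]
[<f8ca3808>] ocfs2_getattr+0x0/0x14d [ocfs2]
[<c02d268f>] syscall_call+0x7/0xb
"""

if __name__ == "__main__":
    frames = parse_trace(SAMPLE)
    print(Counter(f["module"] for f in frames))
    for f in frames:
        mark = " (likely stale)" if f["likely_stale"] else ""
        print(f"{f['module']:>8}  {f['symbol']}+{f['offset']:#x}{mark}")
```

Pasting the full trace into `SAMPLE` gives a per-module summary. Keep in mind that the heuristic only flags candidates and does not prove which frames are real.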
Thanks,
Nathan
_______________________________________________
Ocfs2-users mailing list
http://oss.oracle.com/mailman/listinfo/ocfs2-users