Hello,

We have a 2-node OCFS2 cluster with the following (pretty standard) configuration:
cluster:
	node_count = 2
	name = icitcluster

node:
	ip_port = 7777
	ip_address = x.x.x.91
	number = 0
	name = app01
	cluster = icitcluster

node:
	ip_port = 7777
	ip_address = x.x.x.92
	number = 1
	name = app02
	cluster = icitcluster

Both machines are running a vanilla 2.6.39.1 kernel, and ocfs2-tools is version 1.6.4.

One of the nodes crashed when the load was somewhat higher than usual. The trace is as follows:

Aug 10 05:37:23 app02 kernel: [4261471.313660] ------------[ cut here ]------------
Aug 10 05:37:23 app02 kernel: [4261471.313693] kernel BUG at fs/jbd2/journal.c:1610!
Aug 10 05:37:23 app02 kernel: [4261471.313719] invalid opcode: 0000 [#1] SMP
Aug 10 05:37:23 app02 kernel: [4261471.313747] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Aug 10 05:37:23 app02 kernel: [4261471.313792] CPU 2
Aug 10 05:37:23 app02 kernel: [4261471.313798] Modules linked in: ip_vs nf_conntrack ocfs2 jbd2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs xfs sd_mod crc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding ipv6 ipmi_devintf cpufreq_ondemand freq_table mperf loop snd_pcm snd_timer snd soundcore serio_raw psmouse tpm_tis tpm evdev hpilo tpm_bios ipmi_si ipmi_msghandler pcspkr rng_core i5000_edac snd_page_alloc edac_core container i5k_amb processor button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata ide_pci_generic usbhid hid piix ide_core hpsa scsi_mod ehci_hcd uhci_hcd bnx2 cciss thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Aug 10 05:37:23 app02 kernel: [4261471.314234]
Aug 10 05:37:23 app02 kernel: [4261471.314255] Pid: 3660, comm: ocfs2cmt Not tainted 2.6.39.1 #5 HP ProLiant DL360 G5
Aug 10 05:37:23 app02 kernel: [4261471.314303] RIP: 0010:[<ffffffffa066cead>] [<ffffffffa066cead>] jbd2_journal_flush+0x16d/0x1a0 [jbd2]
Aug 10 05:37:23 app02 kernel: [4261471.314360] RSP: 0018:ffff88005ce6de00 EFLAGS: 00010286
Aug 10 05:37:23 app02 kernel: [4261471.314392] RAX: 0000000000000029 RBX: 00000000000082a3 RCX: 0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314436] RDX: 0000000000000002 RSI: ffff88005ce6dd50 RDI: ffff8800697f2824
Aug 10 05:37:23 app02 kernel: [4261471.314480] RBP: ffff8800697f2b9c R08: ffff88005ce6c000 R09: 0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314529] R10: 0000000000000000 R11: ffffffffa00652a0 R12: ffff8800697f2800
Aug 10 05:37:23 app02 kernel: [4261471.314572] R13: ffff8800697f28f8 R14: ffff8800697f2824 R15: ffff880069f91000
Aug 10 05:37:23 app02 kernel: [4261471.314617] FS: 0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314663] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 10 05:37:23 app02 kernel: [4261471.314690] CR2: 00007f0190f0d000 CR3: 0000000001419000 CR4: 00000000000406e0
Aug 10 05:37:23 app02 kernel: [4261471.314734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314778] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 10 05:37:23 app02 kernel: [4261471.314822] Process ocfs2cmt (pid: 3660, threadinfo ffff88005ce6c000, task ffff88005dcda100)
Aug 10 05:37:23 app02 kernel: [4261471.314868] Stack:
Aug 10 05:37:23 app02 kernel: [4261471.314889] ffff8800378c83c0 ffff8800378c83f8 0000000000000000 ffff8800378c83e8
Aug 10 05:37:23 app02 kernel: [4261471.314939] ffff88005dcda100 ffffffffa06b6ac2 0000000000000082 ffffffff8103576d
Aug 10 05:37:23 app02 kernel: [4261471.314995] ffff88005dcda3f0 ffff88005dcda100 ffff88005dcda3f0 ffff88005dcda100
Aug 10 05:37:23 app02 kernel: [4261471.315045] Call Trace:
Aug 10 05:37:23 app02 kernel: [4261471.315079] [<ffffffffa06b6ac2>] ? ocfs2_commit_thread+0x82/0x360 [ocfs2]
Aug 10 05:37:23 app02 kernel: [4261471.315113] [<ffffffff8103576d>] ? try_to_wake_up+0xed/0x2b0
Aug 10 05:37:23 app02 kernel: [4261471.315142] [<ffffffff81053970>] ? wake_up_bit+0x40/0x40
Aug 10 05:37:23 app02 kernel: [4261471.315175] [<ffffffffa06b6a40>] ? ocfs2_journal_load+0x240/0x240 [ocfs2]
Aug 10 05:37:23 app02 kernel: [4261471.315205] [<ffffffff810534e6>] ? kthread+0x96/0xb0
Aug 10 05:37:23 app02 kernel: [4261471.315235] [<ffffffff812dcdd4>] ? kernel_thread_helper+0x4/0x10
Aug 10 05:37:23 app02 kernel: [4261471.315264] [<ffffffff81053450>] ? kthread_worker_fn+0x130/0x130
Aug 10 05:37:23 app02 kernel: [4261471.315293] [<ffffffff812dcdd0>] ? gs_change+0xb/0xb
Aug 10 05:37:23 app02 kernel: [4261471.315319] Code: 41 5c 41 5d 41 5e c3 0f 1f 80 00 00 00 00 49 8b 5c 24 58 48 85 db 0f 85 d2 fe ff ff f0 41 81 44 24 24 00 00 00 01 e9 d8 fe ff ff <0f> 0b eb fe 0f 1f 80 00 00 00 00 0f 0b eb fe 0f 1f 40 00 0f 0b
Aug 10 05:37:23 app02 kernel: [4261471.315520] RIP [<ffffffffa066cead>] jbd2_journal_flush+0x16d/0x1a0 [jbd2]
Aug 10 05:37:23 app02 kernel: [4261471.315553] RSP <ffff88005ce6de00>
Aug 10 05:37:23 app02 kernel: [4261471.315810] ---[ end trace d969eb2580157a71 ]---

I found another report of a crash at the same source location, posted to the ocfs2-devel list by someone running a 2.6.38 kernel. There was no reply to that e-mail, though:
http://www.mail-archive.com/ocfs2-devel@oss.oracle.com/msg07126.html

Is this a known bug, and if so, is there a fix? Or is it new?

Regards,
Ronald.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users