[Ocfs2-users] OCFS2 oops/bug in fs/jbd2/journal.c:1610, 2.6.39.1

Ronald Moesbergen Mon, 15 Aug 2011 01:10:19 -0700

Hello,

We have a two-node OCFS2 cluster with the following (fairly standard)
configuration:

cluster:
        node_count = 2
        name = icitcluster
node:
        ip_port = 7777
        ip_address = x.x.x.91
        number = 0
        name = app01
        cluster = icitcluster
node:
        ip_port = 7777
        ip_address = x.x.x.92
        number = 1
        name = app02
        cluster = icitcluster
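
For reference, the stanzas above follow the o2cb cluster.conf layout (normally /etc/ocfs2/cluster.conf). There is one cluster: stanza giving node_count and the cluster name, followed by one node: stanza per member. Each member's number must be unique, and its cluster entry must name that cluster. The following is a minimal consistency check in C. It is a sketch only and assumes that node_count should equal the number of node: stanzas. The function check_cluster_conf and its status codes are hypothetical and are not part of ocfs2-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 255            /* arbitrary limit chosen for this sketch */

/* Hypothetical result codes for check_cluster_conf(). */
enum conf_status {
    CONF_OK = 0,
    CONF_COUNT_MISMATCH,         /* node_count != number of node: stanzas */
    CONF_DUP_NUMBER,             /* two nodes share the same number */
    CONF_WRONG_CLUSTER,          /* a node's cluster differs from the cluster name */
    CONF_PARSE_ERROR             /* a line that is neither a header nor key = value */
};

/* Copy s[0..len) into out (size outsz), dropping surrounding blanks. */
static void trim_copy(const char *s, size_t len, char *out, size_t outsz)
{
    while (len > 0 && (*s == ' ' || *s == '\t')) { s++; len--; }
    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t' || s[len - 1] == '\r'))
        len--;
    if (len >= outsz)
        len = outsz - 1;         /* truncate over-long fields */
    memcpy(out, s, len);
    out[len] = '\0';
}

/* Check stanza text like the configuration above for basic consistency. */
enum conf_status check_cluster_conf(const char *text)
{
    char section[16] = "", cluster_name[64] = "";
    char node_cluster[MAX_NODES][64];
    int node_number[MAX_NODES];
    int declared = -1, nodes = 0;

    assert(text != NULL);
    while (*text) {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);
        char line[256], key[64], val[64];

        trim_copy(text, len, line, sizeof line);
        text += len + (nl ? 1 : 0);
        if (line[0] == '\0')
            continue;                               /* blank line */

        size_t n = strlen(line);
        if (line[n - 1] == ':') {                   /* "cluster:" or "node:" */
            line[n - 1] = '\0';
            snprintf(section, sizeof section, "%s", line);
            if (strcmp(section, "node") == 0) {
                if (nodes == MAX_NODES)
                    return CONF_PARSE_ERROR;
                node_number[nodes] = -1;
                node_cluster[nodes][0] = '\0';
                nodes++;
            }
            continue;
        }

        char *eq = strchr(line, '=');
        if (eq == NULL)
            return CONF_PARSE_ERROR;
        trim_copy(line, (size_t)(eq - line), key, sizeof key);
        trim_copy(eq + 1, strlen(eq + 1), val, sizeof val);

        if (strcmp(section, "cluster") == 0) {
            if (strcmp(key, "node_count") == 0)
                declared = atoi(val);
            else if (strcmp(key, "name") == 0)
                snprintf(cluster_name, sizeof cluster_name, "%s", val);
        } else if (strcmp(section, "node") == 0) {
            if (strcmp(key, "number") == 0)
                node_number[nodes - 1] = atoi(val);
            else if (strcmp(key, "cluster") == 0)
                snprintf(node_cluster[nodes - 1], sizeof node_cluster[0], "%s", val);
        }
    }

    if (declared != nodes)
        return CONF_COUNT_MISMATCH;
    for (int i = 0; i < nodes; i++) {
        if (strcmp(node_cluster[i], cluster_name) != 0)
            return CONF_WRONG_CLUSTER;
        for (int j = i + 1; j < nodes; j++)
            if (node_number[i] == node_number[j])
                return CONF_DUP_NUMBER;
    }
    return CONF_OK;
}
```

Given the configuration above as a string, check_cluster_conf returns CONF_OK. It returns CONF_DUP_NUMBER if both nodes use the same number, and CONF_COUNT_MISMATCH if node_count is set to 3.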

Both machines run a vanilla 2.6.39.1 kernel with ocfs2-tools 1.6.4. One
of the nodes (app02) crashed while the load was somewhat higher than
usual. The trace is as follows:

Aug 10 05:37:23 app02 kernel: [4261471.313660] ------------[ cut here ]------------
Aug 10 05:37:23 app02 kernel: [4261471.313693] kernel BUG at fs/jbd2/journal.c:1610!
Aug 10 05:37:23 app02 kernel: [4261471.313719] invalid opcode: 0000 [#1] SMP
Aug 10 05:37:23 app02 kernel: [4261471.313747] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Aug 10 05:37:23 app02 kernel: [4261471.313792] CPU 2
Aug 10 05:37:23 app02 kernel: [4261471.313798] Modules linked in: ip_vs nf_conntrack ocfs2 jbd2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs xfs sd_mod crc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding ipv6 ipmi_devintf cpufreq_ondemand freq_table mperf loop snd_pcm snd_timer snd soundcore serio_raw psmouse tpm_tis tpm evdev hpilo tpm_bios ipmi_si ipmi_msghandler pcspkr rng_core i5000_edac snd_page_alloc edac_core container i5k_amb processor button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata ide_pci_generic usbhid hid piix ide_core hpsa scsi_mod ehci_hcd uhci_hcd bnx2 cciss thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Aug 10 05:37:23 app02 kernel: [4261471.314234]
Aug 10 05:37:23 app02 kernel: [4261471.314255] Pid: 3660, comm: ocfs2cmt Not tainted 2.6.39.1 #5 HP ProLiant DL360 G5
Aug 10 05:37:23 app02 kernel: [4261471.314303] RIP: 0010:[<ffffffffa066cead>]  [<ffffffffa066cead>] jbd2_journal_flush+0x16d/0x1a0 [jbd2]
Aug 10 05:37:23 app02 kernel: [4261471.314360] RSP: 0018:ffff88005ce6de00  EFLAGS: 00010286
Aug 10 05:37:23 app02 kernel: [4261471.314392] RAX: 0000000000000029 RBX: 00000000000082a3 RCX: 0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314436] RDX: 0000000000000002 RSI: ffff88005ce6dd50 RDI: ffff8800697f2824
Aug 10 05:37:23 app02 kernel: [4261471.314480] RBP: ffff8800697f2b9c R08: ffff88005ce6c000 R09: 0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314529] R10: 0000000000000000 R11: ffffffffa00652a0 R12: ffff8800697f2800
Aug 10 05:37:23 app02 kernel: [4261471.314572] R13: ffff8800697f28f8 R14: ffff8800697f2824 R15: ffff880069f91000
Aug 10 05:37:23 app02 kernel: [4261471.314617] FS:  0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314663] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 10 05:37:23 app02 kernel: [4261471.314690] CR2: 00007f0190f0d000 CR3: 0000000001419000 CR4: 00000000000406e0
Aug 10 05:37:23 app02 kernel: [4261471.314734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 10 05:37:23 app02 kernel: [4261471.314778] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 10 05:37:23 app02 kernel: [4261471.314822] Process ocfs2cmt (pid: 3660, threadinfo ffff88005ce6c000, task ffff88005dcda100)
Aug 10 05:37:23 app02 kernel: [4261471.314868] Stack:
Aug 10 05:37:23 app02 kernel: [4261471.314889]  ffff8800378c83c0 ffff8800378c83f8 0000000000000000 ffff8800378c83e8
Aug 10 05:37:23 app02 kernel: [4261471.314939]  ffff88005dcda100 ffffffffa06b6ac2 0000000000000082 ffffffff8103576d
Aug 10 05:37:23 app02 kernel: [4261471.314995]  ffff88005dcda3f0 ffff88005dcda100 ffff88005dcda3f0 ffff88005dcda100
Aug 10 05:37:23 app02 kernel: [4261471.315045] Call Trace:
Aug 10 05:37:23 app02 kernel: [4261471.315079]  [<ffffffffa06b6ac2>] ? ocfs2_commit_thread+0x82/0x360 [ocfs2]
Aug 10 05:37:23 app02 kernel: [4261471.315113]  [<ffffffff8103576d>] ? try_to_wake_up+0xed/0x2b0
Aug 10 05:37:23 app02 kernel: [4261471.315142]  [<ffffffff81053970>] ? wake_up_bit+0x40/0x40
Aug 10 05:37:23 app02 kernel: [4261471.315175]  [<ffffffffa06b6a40>] ? ocfs2_journal_load+0x240/0x240 [ocfs2]
Aug 10 05:37:23 app02 kernel: [4261471.315205]  [<ffffffff810534e6>] ? kthread+0x96/0xb0
Aug 10 05:37:23 app02 kernel: [4261471.315235]  [<ffffffff812dcdd4>] ? kernel_thread_helper+0x4/0x10
Aug 10 05:37:23 app02 kernel: [4261471.315264]  [<ffffffff81053450>] ? kthread_worker_fn+0x130/0x130
Aug 10 05:37:23 app02 kernel: [4261471.315293]  [<ffffffff812dcdd0>] ? gs_change+0xb/0xb
Aug 10 05:37:23 app02 kernel: [4261471.315319] Code: 41 5c 41 5d 41 5e c3 0f 1f 80 00 00 00 00 49 8b 5c 24 58 48 85 db 0f 85 d2 fe ff ff f0 41 81 44 24 24 00 00 00 01 e9 d8 fe ff ff <0f> 0b eb fe 0f 1f 80 00 00 00 00 0f 0b eb fe 0f 1f 40 00 0f 0b
Aug 10 05:37:23 app02 kernel: [4261471.315520] RIP  [<ffffffffa066cead>] jbd2_journal_flush+0x16d/0x1a0 [jbd2]
Aug 10 05:37:23 app02 kernel: [4261471.315553]  RSP <ffff88005ce6de00>
Aug 10 05:37:23 app02 kernel: [4261471.315810] ---[ end trace d969eb2580157a71 ]---
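
Two fields in the trace are worth decoding. The RIP entry jbd2_journal_flush+0x16d/0x1a0 means the fault occurred 0x16d (365) bytes into jbd2_journal_flush, a function 0x1a0 (416) bytes long. On x86, the kernel's BUG() is built on the ud2 instruction (bytes 0f 0b), which deliberately raises an invalid-opcode exception. That is why the report says "invalid opcode" and why the Code: line marks <0f> 0b at the faulting address. The C helpers below are a hypothetical sketch for extracting those two facts. The names parse_sym_loc and faulting_insn_is_ud2 are illustrative only and are not part of any kernel or distribution tool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of a "function+0xOFF/0xSIZE" location from an oops. */
struct sym_loc {
    char func[128];
    unsigned long offset;    /* bytes into the function */
    unsigned long size;      /* total length of the function in bytes */
};

/* Parse e.g. "jbd2_journal_flush+0x16d/0x1a0" (a trailing " [module]" is
 * allowed). Returns 0 on success, -1 if the text does not match. */
int parse_sym_loc(const char *s, struct sym_loc *out)
{
    assert(s != NULL && out != NULL);
    const char *plus = strrchr(s, '+');
    if (plus == NULL || plus == s || (size_t)(plus - s) >= sizeof out->func)
        return -1;
    memcpy(out->func, s, (size_t)(plus - s));
    out->func[plus - s] = '\0';

    char *end;
    out->offset = strtoul(plus + 1, &end, 16);   /* base 16 accepts "0x" */
    if (*end != '/')
        return -1;
    out->size = strtoul(end + 1, &end, 16);
    if (*end != '\0' && *end != ' ')
        return -1;
    return 0;
}

/* In a "Code:" line the byte at the faulting RIP is shown in <..>.
 * Returns 1 if the instruction there is ud2 (0f 0b), 0 if it is something
 * else, and -1 if no marked byte pair can be found. */
int faulting_insn_is_ud2(const char *code_line)
{
    assert(code_line != NULL);
    const char *lt = strchr(code_line, '<');
    unsigned int b0, b1;
    if (lt == NULL || sscanf(lt, "<%2x> %2x", &b0, &b1) != 2)
        return -1;
    return b0 == 0x0f && b1 == 0x0b;
}
```

For example, parse_sym_loc("jbd2_journal_flush+0x16d/0x1a0", &loc) fills in an offset of 365 and a size of 416. Passing the Code: line above to faulting_insn_is_ud2 returns 1.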

I found another report of a crash at the same source location, posted
to the ocfs2-devel list by someone running a 2.6.38 kernel. That e-mail
received no reply, though:

http://www.mail-archive.com/ocfs2-devel@oss.oracle.com/msg07126.html

Is this a known bug, and if so, is there a fix? Or is it new?

Regards,
Ronald.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
