For issues on SLES, please file a bug/SR with Novell. The issue here is insufficient journal credits. It _could_ be that this version is missing mainline git commit e051fda4fd14fe878e6d2183b3a4640febe9e9a8, but I don't know for certain. Novell Support will be better placed to track down the issue.
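To give a rough idea of what "insufficient journal credits" means: when ocfs2 starts a journalled operation it reserves a fixed number of buffer credits from jbd, and every metadata buffer it dirties consumes one of them. If the reservation is smaller than the number of blocks the operation actually touches (here, __ocfs2_add_entry during an NFS mkdir), jbd's "handle->h_buffer_credits > 0" assertion fires, which is the BUG in the oops below. The following is only a minimal userspace sketch of that accounting, not the actual fs/jbd or ocfs2 code; the names (sketch_handle, start_handle, dirty_metadata) are made up for illustration.

  /* Simplified sketch of jbd-style buffer-credit accounting.
   * NOT the real fs/jbd/transaction.c; types and names are invented. */
  #include <assert.h>
  #include <stdio.h>

  struct sketch_handle {
          int h_buffer_credits;   /* credits reserved when the handle was started */
  };

  /* journal_start() analogue: reserve nblocks credits for this transaction. */
  static struct sketch_handle start_handle(int nblocks)
  {
          struct sketch_handle h = { .h_buffer_credits = nblocks };
          return h;
  }

  /* journal_dirty_metadata() analogue: every dirtied metadata buffer
   * must be covered by a previously reserved credit. */
  static void dirty_metadata(struct sketch_handle *h, const char *what)
  {
          assert(h->h_buffer_credits > 0);   /* the assertion that fired on soap01 */
          h->h_buffer_credits--;
          printf("dirtied %s, %d credits left\n", what, h->h_buffer_credits);
  }

  int main(void)
  {
          /* Suppose the filesystem reserved 2 credits but the operation
           * actually touches 3 metadata blocks: the third call trips the
           * assert. That under-reservation is the failure mode here. */
          struct sketch_handle h = start_handle(2);
          dirty_metadata(&h, "inode block");
          dirty_metadata(&h, "directory block");
          dirty_metadata(&h, "allocator block");   /* BUG: under-reserved */
          return 0;
  }

Compiled and run, the third dirty_metadata() call hits the assert; that is the same shape of under-reservation the oops below reports.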
Sérgio Surkamp wrote:
> Hi list,
>
> One of our OCFS2 servers crashed with this message:
>
> Aug 26 11:33:11 soap01 kernel: Assertion failure in journal_dirty_metadata() at fs/jbd/transaction.c:1114: "handle->h_buffer_credits > 0"
> Aug 26 11:33:11 soap01 kernel: ----------- [cut here ] --------- [please bite here ] ---------
> Aug 26 11:33:11 soap01 kernel: Kernel BUG at fs/jbd/transaction.c:1114
> Aug 26 11:33:11 soap01 kernel: invalid opcode: 0000 [1] SMP
> Aug 26 11:33:11 soap01 kernel: last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
> Aug 26 11:33:11 soap01 kernel: CPU 0
> Aug 26 11:33:11 soap01 kernel: Modules linked in: af_packet joydev ocfs2 jbd ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs nfsd exportfs lockd nfs_acl sunrpc ipv6 button battery ac netconsole xt_comment xt_tcpudp xt_state iptable_filter iptable_mangle iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables apparmor loop st sr_mod usbhid usb_storage hw_random shpchp ide_cd aic7xxx uhci_hcd cdrom pci_hotplug ehci_hcd scsi_transport_spi usbcore bnx2 reiserfs ata_piix ahci libata dm_snapshot qla2xxx firmware_class qla2xxx_conf intermodule edd dm_mod fan thermal processor sg megaraid_sas piix sd_mod scsi_mod ide_disk ide_core
> Aug 26 11:33:11 soap01 kernel: Pid: 4874, comm: nfsd Tainted: G U 2.6.16.60-0.21-smp #1
> Aug 26 11:33:11 soap01 kernel: RIP: 0010:[<ffffffff885e21e0>] <ffffffff885e21e0>{:jbd:journal_dirty_metadata+200}
> Aug 26 11:33:11 soap01 kernel: RSP: 0018:ffff81021e9f1c18 EFLAGS: 00010292
> Aug 26 11:33:11 soap01 kernel: RAX: 000000000000006e RBX: ffff8101decf30c0 RCX: 0000000000000292
> Aug 26 11:33:11 soap01 kernel: RDX: ffffffff80359968 RSI: 0000000000000296 RDI: ffffffff80359960
> Aug 26 11:33:11 soap01 kernel: RBP: ffff81002f753870 R08: ffffffff80359968 R09: ffff810221d3ad80
> Aug 26 11:33:11 soap01 kernel: R10: ffff810001035680 R11: 0000000000000070 R12: ffff8101dda21588
> Aug 26 11:33:11 soap01 kernel: R13: ffff810207e2fa90 R14: ffff8102277ab400 R15: ffff8100a4dd394c
> Aug 26 11:33:11 soap01 kernel: FS: 00002b7055e986d0(0000) GS:ffffffff803d3000(0000) knlGS:0000000000000000
> Aug 26 11:33:11 soap01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Aug 26 11:33:11 soap01 kernel: CR2: 00002aaaaabdb000 CR3: 000000015e180000 CR4: 00000000000006e0
> Aug 26 11:33:11 soap01 kernel: Process nfsd (pid: 4874, threadinfo ffff81021e9f0000, task ffff81021f92e860)
> Aug 26 11:33:11 soap01 kernel: Stack: ffff81002f753870 ffff8101dda21588 0000000000000000 0000000000000003
> Aug 26 11:33:11 soap01 kernel: ffff81018ba52000 ffffffff8862187f 0000000000000000 ffff81018ba52040
> Aug 26 11:33:11 soap01 kernel: ffff81007f5163f8 ffffffff8860b38a
> Aug 26 11:33:11 soap01 kernel: Call Trace: <ffffffff8862187f>{:ocfs2:ocfs2_journal_dirty+106}
> Aug 26 11:33:11 soap01 kernel: <ffffffff8860b38a>{:ocfs2:__ocfs2_add_entry+745} <ffffffff88628766>{:ocfs2:ocfs2_mknod+1710}
> Aug 26 11:33:11 soap01 kernel: <ffffffff88628a45>{:ocfs2:ocfs2_mkdir+127} <ffffffff80192b48>{vfs_mkdir+346}
> Aug 26 11:33:11 soap01 kernel: <ffffffff88522f05>{:nfsd:nfsd_create+753} <ffffffff88529bb2>{:nfsd:nfsd3_proc_mkdir+217}
> Aug 26 11:33:11 soap01 kernel: <ffffffff8851e0ea>{:nfsd:nfsd_dispatch+216} <ffffffff884d549a>{:sunrpc:svc_process+982}
> Aug 26 11:33:11 soap01 kernel: <ffffffff802ea247>{__down_read+21} <ffffffff8851e46e>{:nfsd:nfsd+0}
> Aug 26 11:33:11 soap01 kernel: <ffffffff8851e63d>{:nfsd:nfsd+463} <ffffffff8010bed2>{child_rip+8}
> Aug 26 11:33:11 soap01 kernel: <ffffffff8851e46e>{:nfsd:nfsd+0} <ffffffff8851e46e>{:nfsd:nfsd+0}
> Aug 26 11:33:11 soap01 kernel: <ffffffff8010beca>{child_rip+0}
> Aug 26 11:33:11 soap01 kernel:
> Aug 26 11:33:11 soap01 kernel: Code: 0f 0b 68 b9 8a 5e 88 c2 5a 04 41 ff 4c 24 08 49 39 5d 28 75
> Aug 26 11:33:11 soap01 kernel: RIP <ffffffff885e21e0>{:jbd:journal_dirty_metadata+200} RSP <ffff81021e9f1c18>
>
> Operating system: SuSE SLES 10SP1
> Kernel: 2.6.16.60-0.21-smp
> OCFS2: 1.4.0-SLES
>
> Environment:
>
> * 2 FreeBSD 7.1-RELEASE-p2 NFS clients
> * 2 SLES 10SP1 servers exporting the filesystem
>
> The FreeBSD clients are our email servers, so the traffic is mainly many small email files.
>
> NFS is mounted with protocol version 3, readdirplus disabled, and 32k read and write buffers.
>
> Pre-crash symptoms:
> * The OCFS2 filesystem hung for a while or became very slow;
> * Low or no device traffic on either node (checked with `iostat`);
> * The server load rose 5 to 6 points;
> * Something in the kernel appeared to deadlock, as other processes (doing IO, but on other mount points with reiserfs) hogged a CPU at 100% usage;
>   e.g. there is a MySQL database on a reiserfs mount point and mysqld hogged the CPU when I called `rcmysql stop`;
> * Calling `reboot` or `shutdown -r now` blocked the console (I didn't try running it under strace to find where it locks, but will if it happens again);
> * imapd on the clients blocked in NFS requests;
>   One of the processes was blocked in (FreeBSD kernel) state bo_wwa. According to some discussion groups on the net, this state means blocked on a stale NFS server. Attaching to the process with `gdb`, it was always blocked in the close() libc call;
>
> imapd process backtrace:
> #0 0x282a5da3 in close () from /lib/libc.so.7
> #1 0x282a5711 in memcpy () from /lib/libc.so.7
> #2 0xbfbf9378 in ?? ()
> #3 0x2828d58d in fclose () from /lib/libc.so.7
>
> Could it be related to the o2cb configuration? Current configuration:
>
> O2CB_HEARTBEAT_THRESHOLD=61
> O2CB_IDLE_TIMEOUT_MS=60000
>
> The heartbeat network is a GBit Ethernet.
>
> Regards,