Please try http://portal-md.glk.gr/ocfs2/core.32578.bz2
Please let me know, in case you have any problem downloading it.

Thanks,

George

On Thu, 2011-09-15 at 09:45 -0700, Sunil Mushran wrote:
> I was hoping to get a readable stack. Please could you provide a link to
> the coredump.
>
> On 09/15/2011 02:51 AM, Betzos Giorgos wrote:
> > Hello,
> >
> > I am sorry for the delay in responding. Unfortunately, it faulted again.
> >
> > Here is the log, although my email client folds the Memory Map lines.
> > The core file is available.
> >
> > Thanks,
> >
> > George
> >
> > # ./o2image.ppc.dbg /dev/mapper/mpath0 /files_shared/u02.o2image
> > *** glibc detected *** ./o2image.ppc.dbg: corrupted double-linked list:
> > 0x10075000 ***
> > ======= Backtrace: =========
> > /lib/libc.so.6[0xfeb1ab4]
> > /lib/libc.so.6(cfree+0xc8)[0xfeb5b68]
> > ./o2image.ppc.dbg[0x1000d098]
> > ./o2image.ppc.dbg[0x1000297c]
> > ./o2image.ppc.dbg[0x10001eb8]
> > ./o2image.ppc.dbg[0x1000228c]
> > ./o2image.ppc.dbg[0x10002804]
> > ./o2image.ppc.dbg[0x10001eb8]
> > ./o2image.ppc.dbg[0x1000228c]
> > ./o2image.ppc.dbg[0x10002804]
> > ./o2image.ppc.dbg[0x10003bbc]
> > ./o2image.ppc.dbg[0x10004480]
> > /lib/libc.so.6[0xfe4dc60]
> > /lib/libc.so.6[0xfe4dea0]
> > ======= Memory map: ========
> > 00100000-00120000 r-xp 00100000 00:00 0
> > [vdso]
> > 0f430000-0f440000 r-xp 00000000 08:13
> > 180307 /lib/libcom_err.so.2.1
> > 0f440000-0f450000 rw-p 00000000 08:13
> > 180307 /lib/libcom_err.so.2.1
> > 0f900000-0f9c0000 r-xp 00000000 08:13
> > 180293 /lib/libglib-2.0.so.0.1200.3
> > 0f9c0000-0f9d0000 rw-p 000b0000 08:13
> > 180293 /lib/libglib-2.0.so.0.1200.3
> > 0fa40000-0fa50000 r-xp 00000000 08:13
> > 180292 /lib/librt-2.5.so
> > 0fa50000-0fa60000 r--p 00000000 08:13
> > 180292 /lib/librt-2.5.so
> > 0fa60000-0fa70000 rw-p 00010000 08:13
> > 180292 /lib/librt-2.5.so
> > 0fce0000-0fd00000 r-xp 00000000 08:13
> > 180291 /lib/libpthread-2.5.so
> > 0fd00000-0fd10000 r--p 00010000 08:13
> > 180291 /lib/libpthread-2.5.so
> > 0fd10000-0fd20000 rw-p 00020000 08:13
> > 180291 /lib/libpthread-2.5.so
> > 0fe30000-0ffa0000 r-xp 00000000 08:13
> > 180288 /lib/libc-2.5.so
> > 0ffa0000-0ffb0000 r--p 00160000 08:13
> > 180288 /lib/libc-2.5.so
> > 0ffb0000-0ffc0000 rw-p 00170000 08:13
> > 180288 /lib/libc-2.5.so
> > 0ffc0000-0ffe0000 r-xp 00000000 08:13
> > 180287 /lib/ld-2.5.so
> > 0ffe0000-0fff0000 r--p 00010000 08:13
> > 180287 /lib/ld-2.5.so
> > 0fff0000-10000000 rw-p 00020000 08:13
> > 180287 /lib/ld-2.5.so
> > 10000000-10050000 r-xp 00000000 08:13
> > 7487795 /root/o2image.ppc.dbg
> > 10050000-10060000 rw-p 00040000 08:13
> > 7487795 /root/o2image.ppc.dbg
> > 10060000-10090000 rwxp 10060000 00:00 0
> > [heap]
> > f7680000-f7ff0000 rw-p f7680000 00:00 0
> > ff9a0000-ffaf0000 rw-p ff9a0000 00:00 0
> > [stack]
> > Aborted (core dumped)
> >
> >
> > On Thu, 2011-09-08 at 12:10 -0700, Sunil Mushran wrote:
> >> http://oss.oracle.com/~smushran/o2image.ppc.dbg
> >>
> >> Use the above executable. Hoping it won't fault. But if it does
> >> email me the backtrace. That trace will be readable as the exec
> >> has debugging symbols enabled.
> >>
> >> On 09/07/2011 11:24 PM, Betzos Giorgos wrote:
> >>> # rpm -q ocfs2-tools
> >>> ocfs2-tools-1.4.4-1.el5.ppc
> >>>
> >>> On Wed, 2011-09-07 at 09:13 -0700, Sunil Mushran wrote:
> >>>> version of ocfs2-tools?
> >>>>
> >>>> On 09/07/2011 09:10 AM, Betzos Giorgos wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I tried what you suggested but here is what I got:
> >>>>>
> >>>>> # o2image /dev/mapper/mpath0 /files_shared/u02.o2image
> >>>>> *** glibc detected *** o2image: corrupted double-linked list:
> >>>>> 0x10045000 ***
> >>>>> ======= Backtrace: =========
> >>>>> /lib/libc.so.6[0xfeb1ab4]
> >>>>> /lib/libc.so.6(cfree+0xc8)[0xfeb5b68]
> >>>>> o2image[0x10007bb0]
> >>>>> o2image[0x10002748]
> >>>>> o2image[0x10001f50]
> >>>>> o2image[0x10002334]
> >>>>> o2image[0x100026a0]
> >>>>> o2image[0x10001f50]
> >>>>> o2image[0x10002334]
> >>>>> o2image[0x100026a0]
> >>>>> o2image[0x1000358c]
> >>>>> o2image[0x10003e28]
> >>>>> /lib/libc.so.6[0xfe4dc60]
> >>>>> /lib/libc.so.6[0xfe4dea0]
> >>>>> ======= Memory map: ========
> >>>>> 00100000-00120000 r-xp 00100000 00:00 0
> >>>>> [vdso]
> >>>>> 0f550000-0f560000 r-xp 00000000 08:13 2881590
> >>>>> /lib/libcom_err.so.2.1
> >>>>> 0f560000-0f570000 rw-p 00000000 08:13 2881590
> >>>>> /lib/libcom_err.so.2.1
> >>>>> 0f900000-0f9c0000 r-xp 00000000 08:13 2881576
> >>>>> /lib/libglib-2.0.so.0.1200.3
> >>>>> 0f9c0000-0f9d0000 rw-p 000b0000 08:13 2881576
> >>>>> /lib/libglib-2.0.so.0.1200.3
> >>>>> 0fa40000-0fa50000 r-xp 00000000 08:13 2881575
> >>>>> /lib/librt-2.5.so
> >>>>> 0fa50000-0fa60000 r--p 00000000 08:13 2881575
> >>>>> /lib/librt-2.5.so
> >>>>> 0fa60000-0fa70000 rw-p 00010000 08:13 2881575
> >>>>> /lib/librt-2.5.so
> >>>>> 0fce0000-0fd00000 r-xp 00000000 08:13 2881574
> >>>>> /lib/libpthread-2.5.so
> >>>>> 0fd00000-0fd10000 r--p 00010000 08:13 2881574
> >>>>> /lib/libpthread-2.5.so
> >>>>> 0fd10000-0fd20000 rw-p 00020000 08:13 2881574
> >>>>> /lib/libpthread-2.5.so
> >>>>> 0fe30000-0ffa0000 r-xp 00000000 08:13 2881571
> >>>>> /lib/libc-2.5.so
> >>>>> 0ffa0000-0ffb0000 r--p 00160000 08:13 2881571
> >>>>> /lib/libc-2.5.so
> >>>>> 0ffb0000-0ffc0000 rw-p 00170000 08:13 2881571
> >>>>> /lib/libc-2.5.so
> >>>>> 0ffc0000-0ffe0000 r-xp 00000000 08:13 2881570
> >>>>> /lib/ld-2.5.so
> >>>>> 0ffe0000-0fff0000 r--p 00010000 08:13 2881570
> >>>>> /lib/ld-2.5.so
> >>>>> 0fff0000-10000000 rw-p 00020000 08:13 2881570
> >>>>> /lib/ld-2.5.so
> >>>>> 10000000-10020000 r-xp 00000000 08:13 15058799
> >>>>> /sbin/o2image
> >>>>> 10020000-10030000 rw-p 00010000 08:13 15058799
> >>>>> /sbin/o2image
> >>>>> 10030000-10060000 rwxp 10030000 00:00 0
> >>>>> [heap]
> >>>>> f7680000-f7ff0000 rw-p f7680000 00:00 0
> >>>>> ffc60000-ffdb0000 rw-p ffc60000 00:00 0
> >>>>> [stack]
> >>>>> Aborted (core dumped)
> >>>>>
> >>>>> I have the core file, if you need it.
> >>>>>
> >>>>> Here is some information about the fs in question.
> >>>>> It is used to store Oracle Archive Logs and also to store the rman
> >>>>> backup of the DB.
> >>>>> In the last crash case the fs became full while rman was running.
> >>>>> Maybe we can estimate from this the size of the write in that
> >>>>> particular case. Oracle DB rman backup files are from 7 to 11Gb.
> >>>>> Maybe Oracle DataGuard was also in use on the same fs.
> >>>>> After the crash, when we rebooted the servers, they would crash again.
> >>>>> We then noticed that the fs was full and we removed some unneeded
> >>>>> files.
> >>>>>
> >>>>> The system has crashed a couple more times when the above conditions
> >>>>> may not have been the same.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> George
> >>>>>
> >>>>> ________________________________________
> >>>>> From: Sunil Mushran
> >>>>> Sent: Friday, September 02, 2011 8:24 PM
> >>>>> To: Betzos Giorgos
> >>>>> Cc: ocfs2-users@oss.oracle.com
> >>>>> Subject: Re: [Ocfs2-users] Linux kernel crash due to ocfs2
> >>>>>
> >>>>> Can you provide me with the o2image. It includes the entire fs metadata.
> >>>>> The size of the image file depends on the number of files/dirs.
> >>>>>
> >>>>> # o2image /dev/sdX /path/to/image/file
> >>>>>
> >>>>> So the error is clear.
> >>>>> We have underestimated the amount of credits
> >>>>> (num of blocks that need to be dirtied in that transaction). This is
> >>>>> the most common write path in the fs and thus hit heavily. So I am
> >>>>> surprised by this.
> >>>>>
> >>>>> One way to fix it is by reproducing it in-house. And having the image
> >>>>> will allow us to mount the fs and reproduce the issue. Do you know the
> >>>>> size of the write?
> >>>>>
> >>>>> On 09/02/2011 07:23 AM, Betzos Giorgos wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> we have a pair of IBM P570 servers running RHEL5.2
> >>>>>> kernel 2.6.18-92.el5.ppc64
> >>>>>> We have Oracle RAC on ocfs2 storage
> >>>>>> ocfs2 is 1.4.7-1 for the above kernel (downloaded from oracle oss site)
> >>>>>>
> >>>>>> Recently both servers have been crashing with the following error:
> >>>>>>
> >>>>>> Assertion failure in journal_dirty_metadata() at
> >>>>>> fs/jbd/transaction.c:1130: "handle->h_buffer_credits> 0"
> >>>>>> kernel BUG in journal_dirty_metadata at fs/jbd/transaction.c:1130!
> >>>>>>
> >>>>>> We get some kind of kernel debug prompt.
> >>>>>>
> >>>>>> the stack is as follows:
> >>>>>>
> >>>>>> .ocfs2_journal_dirty+0x78/0x13c [ocfs2]
> >>>>>> .ocfs2_search_chain+0x131c/0x165c [ocfs2]
> >>>>>> .ocfs2_claim_suballoc_bits+0xadc/0xd94 [ocfs2]
> >>>>>> .__ocfs2_claim_clusters+0x1b0/0x348 [ocfs2]
> >>>>>> .ocf2_do_extend_allocation+0x1f8/0x5b4 [ocfs2]
> >>>>>> .ocfs2_write_cluster_by_desc+0x128/0x850 [ocfs2]
> >>>>>> .ocfs2_write_begin_nolock+0xdc0/0xfbc [ocfs2]
> >>>>>> .ocfs2_write_begin+0x124/0x224 [ocfs2]
> >>>>>> .ocfs2_file_aio_write+0x6a4/0xb40 [ocfs2]
> >>>>>> .aio_pwrite+0x50/0xb4
> >>>>>> .aio_run_iocb+0x140/0x214
> >>>>>> .io_submit_one+0x2fc/0x3a8
> >>>>>> .sys_io_submit+0xd0/0x17c
> >>>>>> syscall_exit+0x0/0x40
> >>>>>>
> >>>>>> In the last crash case, the file system was full.
> >>>>>>
> >>>>>> Any clues?
> >>>>>>
> >>>>>> There seems to have been an ocfs2 kernel patch some time ago for the
> >>>>>> 2.6.20.2 kernel that fixed some journal credits updates.
> >>>>>>
> >>>>>> Is this another bug?
> >>>>>>
> >>>>>> Any help will be greatly appreciated, because this is a production
> >>>>>> system.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> George
>
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
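
For readers following the thread: the assertion quoted in the original report ("handle->h_buffer_credits> 0") is described by Sunil as the fs having underestimated the credits, i.e. the number of blocks that need to be dirtied in the transaction. The sketch below is only a toy Python model of that accounting, not the jbd or ocfs2 code. `Handle`, `OutOfCredits`, `extend_allocation` and the block names are hypothetical and exist purely for illustration.

```python
class OutOfCredits(Exception):
    """Raised where the kernel would hit the h_buffer_credits assertion."""


class Handle:
    """Toy stand-in for a journal handle (hypothetical, not the jbd struct)."""

    def __init__(self, credits):
        self.buffer_credits = credits  # blocks this transaction may dirty
        self.dirtied = set()           # blocks already charged to this handle

    def dirty_metadata(self, block):
        # Assumption of this model: a block already dirtied under this
        # handle is not charged a second time.
        if block in self.dirtied:
            return
        # Models the failed check: "handle->h_buffer_credits > 0".
        if self.buffer_credits <= 0:
            raise OutOfCredits(f"no credits left for block {block}")
        self.buffer_credits -= 1
        self.dirtied.add(block)


def extend_allocation(handle, blocks_touched):
    """Dirty every metadata block that an allocation ends up touching."""
    for block in blocks_touched:
        handle.dirty_metadata(block)


if __name__ == "__main__":
    # Credits reserved for two blocks, but the operation touches three.
    handle = Handle(credits=2)
    try:
        extend_allocation(handle, ["inode", "bitmap", "group_desc"])
    except OutOfCredits as err:
        print("assertion would fire:", err)
```

In this model the third distinct block exhausts the reservation. That is the situation the thread is about: the write path touched more metadata blocks than the credit estimate allowed for.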