Re: 2.6.10-ac10 oops in journal_commit_transaction
On Thu, 2005-04-21 at 23:29, Chris Wright wrote:
> I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable
> 2.6.11.7 tree. The following patch is what went into 2.6.11.7:

2.6.11.7 or 2.6.11ac7 (ie 2.6.11.7-ac 8)) both have this fixed.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.10-ac10 oops in journal_commit_transaction
* Zou, Nanhai ([EMAIL PROTECTED]) wrote:
> We have seen the same oops on the same point.
> Can you point to me the URL where the patch is?
> I am not sure which patch should I get.

I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable
2.6.11.7 tree. The following patch is what went into 2.6.11.7:

---

From: Stephen Tweedie
Subject: Prevent race condition in jbd

This patch from Stephen Tweedie which fixes a race in jbd code (it
demonstrated itself as more or less random NULL dereferences in the
journal code).

Acked-by: Jan Kara <[EMAIL PROTECTED]>
Acked-by: Chris Mason <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

--- linux-2.6-ext3/fs/jbd/transaction.c.=K=.orig
+++ linux-2.6-ext3/fs/jbd/transaction.c
@@ -1775,10 +1775,10 @@ static int journal_unmap_buffer(journal_
 			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
 			ret = __dispose_buffer(jh,
 					journal->j_running_transaction);
+			journal_put_journal_head(jh);
 			spin_unlock(&journal->j_list_lock);
 			jbd_unlock_bh_state(bh);
 			spin_unlock(&journal->j_state_lock);
-			journal_put_journal_head(jh);
 			return ret;
 		} else {
 			/* There is no currently-running transaction. So the
@@ -1789,10 +1789,10 @@ static int journal_unmap_buffer(journal_
 				JBUFFER_TRACE(jh, "give to committing trans");
 				ret = __dispose_buffer(jh,
 					journal->j_committing_transaction);
+				journal_put_journal_head(jh);
 				spin_unlock(&journal->j_list_lock);
 				jbd_unlock_bh_state(bh);
 				spin_unlock(&journal->j_state_lock);
-				journal_put_journal_head(jh);
 				return ret;
 			} else {
 				/* The orphan record's transaction has
@@ -1813,10 +1813,10 @@ static int journal_unmap_buffer(journal_
 					journal->j_running_transaction);
 			jh->b_next_transaction = NULL;
 		}
+		journal_put_journal_head(jh);
 		spin_unlock(&journal->j_list_lock);
 		jbd_unlock_bh_state(bh);
 		spin_unlock(&journal->j_state_lock);
-		journal_put_journal_head(jh);
 		return 0;
 	} else {
 		/* Good, the buffer belongs to the running transaction.
RE: 2.6.10-ac10 oops in journal_commit_transaction
Hi Alan,

We have seen the same oops on the same point.
Can you point to me the URL where the patch is?
I am not sure which patch should I get.

Thanks
Zou Nan hai

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Alan Cox
> Sent: Monday, March 07, 2005 6:59 AM
> To: Brice Figureau
> Cc: Andrew Morton; Linux Kernel Mailing List
> Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction
>
> FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix
> this one.
>
> Alan
Re: 2.6.10-ac10 oops in journal_commit_transaction
FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix
this one.

Alan
Re: 2.6.10-ac10 oops in journal_commit_transaction
Hi Andrew,

On Thu, 2005-03-03 at 15:37 -0800, Andrew Morton wrote:
> Brice Figureau <[EMAIL PROTECTED]> wrote:
[snip]
> > Unable to handle kernel NULL pointer dereference at virtual address 000c
> >  printing eip:
> > c01a858d
> > *pde =
> > Oops: 0002 [#1]
> > PREEMPT SMP
> > Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit
> > ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
> > CPU:    2
> > EIP:    0060:[journal_commit_transaction+877/5264]    Not tainted VLI
> > EFLAGS: 00010286   (2.6.10-ac10)
> > EIP is at journal_commit_transaction+0x36d/0x1490
>
> Please do:
>
> gdb vmlinux
> (gdb) l *0xc01a858d

Unfortunately this kernel is not compiled with CONFIG_DEBUG_INFO=y, so
the above command does not work.

But:
(gdb) disassemble 0xc01a858d
Dump of assembler code for function journal_commit_transaction:
[snipped]
...
0xc01a8568:  test   %eax,%eax
0xc01a856a:  jne    0xc01a93bf
0xc01a8570:  mov    0xfea8(%ebp),%edx
0xc01a8576:  mov    0x18(%edx),%eax
0xc01a8579:  test   %eax,%eax
0xc01a857b:  je     0xc01a8606
0xc01a8581:  mov    $0xe000,%esi
0xc01a8586:  and    %esp,%esi
0xc01a8588:  mov    0x20(%eax),%edi
0xc01a858b:  mov    (%edi),%ebx
0xc01a858d:  lock incl 0xc(%ebx)
0xc01a8591:  mov    (%ebx),%eax
0xc01a8593:  test   $0x4,%al
0xc01a8595:  jne    0xc01a9379
0xc01a859b:  mov    %ebx,0x4(%esp)
0xc01a859f:  mov    0x8(%ebp),%ecx
0xc01a85a2:  mov    %ecx,(%esp)
0xc01a85a5:  call   0xc01a81d0
0xc01a85aa:  test   %eax,%eax
0xc01a85ac:  je     0xc01a9373
0xc01a85b2:  mov    (%ebx),%eax
0xc01a85b4:  test   $0x20,%ah

So I recompiled my kernel with CONFIG_DEBUG_INFO, hoping that the code
would not move too far and that I could still find it.

On the kernel with *debug* enabled:
(gdb) l *0xc01a858d
0xc01a858d is in journal_commit_transaction (buffer_head.h:104).
99	 * Emit the buffer bitops functions. Note that there are also functions
100	 * of the form "mark_buffer_foo()". These are higher-level functions which
101	 * do something in addition to setting a b_state bit.
102	 */
103	BUFFER_FNS(Uptodate, uptodate)
104	BUFFER_FNS(Dirty, dirty)
105	TAS_BUFFER_FNS(Dirty, dirty)
106	BUFFER_FNS(Lock, locked)
107	TAS_BUFFER_FNS(Lock, locked)
108	BUFFER_FNS(Req, req)

Which does not seem to match the code included in the oops.

(gdb) disassemble 0xc01a858d
[snip]
0xc01a85c8:  test   %eax,%eax
0xc01a85ca:  jne    0xc01a941f
0xc01a85d0:  mov    0xfea8(%ebp),%edx
0xc01a85d6:  mov    0x18(%edx),%eax
0xc01a85d9:  test   %eax,%eax
0xc01a85db:  je     0xc01a8666
0xc01a85e1:  mov    $0xe000,%esi
0xc01a85e6:  and    %esp,%esi
0xc01a85e8:  mov    0x20(%eax),%edi
0xc01a85eb:  mov    (%edi),%ebx
0xc01a85ed:  lock incl 0xc(%ebx)
0xc01a85f1:  mov    (%ebx),%eax
0xc01a85f3:  test   $0x4,%al
0xc01a85f5:  jne    0xc01a93d9
0xc01a85fb:  mov    %ebx,0x4(%esp)
0xc01a85ff:  mov    0x8(%ebp),%ecx
0xc01a8602:  mov    %ecx,(%esp)
0xc01a8605:  call   0xc01a8230
0xc01a860a:  test   %eax,%eax
0xc01a860c:  je     0xc01a93d3
0xc01a8612:  mov    (%ebx),%eax
0xc01a8614:  test   $0x20,%ah

So the same code is now at 0xc01a85ed:
(gdb) l *0xc01a85ed
0xc01a85ed is in journal_commit_transaction (atomic.h:103).
98	 *
99	 * Atomically increments @v by 1.
100	 */
101	static __inline__ void atomic_inc(atomic_t *v)
102	{
103		__asm__ __volatile__(
104			LOCK "incl %0"
105			:"=m" (v->counter)
106			:"m" (v->counter));
107	}

It seems to me that get_bh is the culprit because of the following
definition from include/linux/buffer_head.h:

static inline void get_bh(struct buffer_head *bh)
{
	atomic_inc(&bh->b_count);
}

I hope this will help you.
Let me know if you need more information.

Thanks for taking care of that problem,
Regards,
--
Brice Figureau <[EMAIL PROTECTED]>
Re: 2.6.10-ac10 oops in journal_commit_transaction
Brice Figureau <[EMAIL PROTECTED]> wrote:
>
> I'm reporting an oops on a bi-Xeon database server under 2.6.10-ac10
> quite similar to:
> http://marc.theaimsgroup.com/?l=ext3-users&m=110848085314238&w=2
>
> I also got another server crashing (a mail server this time), but I
> couldn't get the oops/panic.
>
> This was after more than two weeks of uptime, I was running 2.6.10-ac1
> before and never got this problem.
>
> Here are the oops information:
>
> Unable to handle kernel NULL pointer dereference at virtual address 000c
>  printing eip:
> c01a858d
> *pde =
> Oops: 0002 [#1]
> PREEMPT SMP
> Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit
> ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
> CPU:    2
> EIP:    0060:[journal_commit_transaction+877/5264]    Not tainted VLI
> EFLAGS: 00010286   (2.6.10-ac10)
> EIP is at journal_commit_transaction+0x36d/0x1490

Please do:

gdb vmlinux
(gdb) l *0xc01a858d