Re: 2.6.10-ac10 oops in journal_commit_transaction

2005-04-22 Thread Alan Cox
On Thu, 2005-04-21 at 23:29, Chris Wright wrote:
> I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable
> 2.6.11.7 tree.  The following patch is what went into 2.6.11.7:

Both 2.6.11.7 and 2.6.11-ac7 (i.e. 2.6.11.7-ac 8)) have this fixed.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.10-ac10 oops in journal_commit_transaction

2005-04-21 Thread Chris Wright
* Zou, Nanhai ([EMAIL PROTECTED]) wrote:
>   We have seen the same oops at the same point.
> Can you point me to the URL where the patch is?
> I am not sure which patch I should get.

I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable
2.6.11.7 tree.  The following patch is what went into 2.6.11.7:
---

From: Stephen Tweedie
Subject: Prevent race condition in jbd

This patch from Stephen Tweedie fixes a race in the jbd code (it
manifested itself as more or less random NULL dereferences in the
journal code).

Acked-by: Jan Kara <[EMAIL PROTECTED]>
Acked-by: Chris Mason <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

--- linux-2.6-ext3/fs/jbd/transaction.c.=K=.orig
+++ linux-2.6-ext3/fs/jbd/transaction.c
@@ -1775,10 +1775,10 @@ static int journal_unmap_buffer(journal_
JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
ret = __dispose_buffer(jh,
journal->j_running_transaction);
+   journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
-   journal_put_journal_head(jh);
return ret;
} else {
/* There is no currently-running transaction. So the
@@ -1789,10 +1789,10 @@ static int journal_unmap_buffer(journal_
JBUFFER_TRACE(jh, "give to committing trans");
ret = __dispose_buffer(jh,
journal->j_committing_transaction);
+   journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
-   journal_put_journal_head(jh);
return ret;
} else {
/* The orphan record's transaction has
@@ -1813,10 +1813,10 @@ static int journal_unmap_buffer(journal_
journal->j_running_transaction);
jh->b_next_transaction = NULL;
}
+   journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
-   journal_put_journal_head(jh);
return 0;
} else {
/* Good, the buffer belongs to the running transaction.
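The patch only moves `journal_put_journal_head(jh)` so that the journal_head reference is dropped while `j_list_lock`, the bh state lock and `j_state_lock` are still held, instead of after all three have been released. The changelog does not spell out the exact interleaving this closes, so what follows is only a generic userspace sketch of that ordering, not jbd code: `struct obj`, `toy_lock()`, `toy_unlock()`, `obj_put_locked()` and `dispose_and_release()` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for the kernel's spinlocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a journal_head. */
struct obj {
	int refcount;		/* protected by list_lock */
};

/* Hypothetical stand-in for j_list_lock: a minimal C11 spinlock. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until we are the one who set the flag */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Drop one reference; the caller must hold list_lock.
 * Frees the object and returns 1 when the last reference goes away. */
static int obj_put_locked(struct obj *o)
{
	if (--o->refcount == 0) {
		free(o);
		return 1;
	}
	return 0;
}

/* The ordering the patch adopts: drop our reference *before*
 * unlocking, so no other lock holder can run between "done with
 * the object" and "reference dropped". */
static int dispose_and_release(struct obj *o)
{
	int freed;

	toy_lock(&list_lock);
	/* ... hand the object over to another list here ... */
	freed = obj_put_locked(o);
	toy_unlock(&list_lock);
	return freed;
}
```

Moving the put after `toy_unlock()` in this sketch would both break `obj_put_locked()`'s locking rule and open a window where another thread holding `list_lock` sees a count that is about to drop.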


RE: 2.6.10-ac10 oops in journal_commit_transaction

2005-04-21 Thread Zou, Nanhai
Hi Alan,
We have seen the same oops at the same point.
Can you point me to the URL where the patch is?
I am not sure which patch I should get.

Thanks
Zou Nan hai 

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Alan Cox
> Sent: Monday, March 07, 2005 6:59 AM
> To: Brice Figureau
> Cc: Andrew Morton; Linux Kernel Mailing List
> Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction
> 
> FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix
> this one.
> 
> Alan
> 


Re: 2.6.10-ac10 oops in journal_commit_transaction

2005-03-06 Thread Alan Cox
FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix
this one.

Alan


Re: 2.6.10-ac10 oops in journal_commit_transaction

2005-03-04 Thread Brice Figureau
Hi Andrew,

On Thu, 2005-03-03 at 15:37 -0800, Andrew Morton wrote:
> Brice Figureau <[EMAIL PROTECTED]> wrote:
[snip]
> > Unable to handle kernel NULL pointer dereference at virtual address 000c
> >  printing eip:
> > c01a858d
> > *pde = 
> > Oops: 0002 [#1]
> > PREEMPT SMP 
> > Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit 
> > ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
> > CPU:2
> > EIP:0060:[journal_commit_transaction+877/5264]Not tainted VLI
> > EFLAGS: 00010286   (2.6.10-ac10) 
> > EIP is at journal_commit_transaction+0x36d/0x1490
> 
> Please do:
> 
> gdb vmlinux
> (gdb) l *0xc01a858d

Unfortunately this kernel is not compiled with CONFIG_DEBUG_INFO=y, so
the above command does not work.

But:
(gdb) disassemble 0xc01a858d
Dump of assembler code for function journal_commit_transaction:
[snipped]
...
0xc01a8568 :test   %eax,%eax
0xc01a856a :jne    0xc01a93bf
0xc01a8570 :mov    0xfea8(%ebp),%edx
0xc01a8576 :mov    0x18(%edx),%eax
0xc01a8579 :test   %eax,%eax
0xc01a857b :je     0xc01a8606
0xc01a8581 :mov    $0xe000,%esi
0xc01a8586 :and    %esp,%esi
0xc01a8588 :mov    0x20(%eax),%edi
0xc01a858b :mov    (%edi),%ebx
0xc01a858d :lock incl 0xc(%ebx)
0xc01a8591 :mov    (%ebx),%eax
0xc01a8593 :test   $0x4,%al
0xc01a8595 :jne    0xc01a9379
0xc01a859b :mov    %ebx,0x4(%esp)
0xc01a859f :mov    0x8(%ebp),%ecx
0xc01a85a2 :mov    %ecx,(%esp)
0xc01a85a5 :call   0xc01a81d0
0xc01a85aa :test   %eax,%eax
0xc01a85ac :je     0xc01a9373
0xc01a85b2 :mov    (%ebx),%eax
0xc01a85b4 :test   $0x20,%ah

So I recompiled my kernel with CONFIG_DEBUG_INFO, hoping that the
code wouldn't move too far and I could still find it:

On the kernel with *debug* enabled:
(gdb) l *0xc01a858d
0xc01a858d is in journal_commit_transaction (buffer_head.h:104).
99   * Emit the buffer bitops functions.   Note that there are also functions
100  * of the form "mark_buffer_foo()".  These are higher-level functions which
101  * do something in addition to setting a b_state bit.
102  */
103 BUFFER_FNS(Uptodate, uptodate)
104 BUFFER_FNS(Dirty, dirty)
105 TAS_BUFFER_FNS(Dirty, dirty)
106 BUFFER_FNS(Lock, locked)
107 TAS_BUFFER_FNS(Lock, locked)
108 BUFFER_FNS(Req, req)

This does not seem to match the code in the oops.

(gdb) disassemble 0xc01a858d
[snip]
0xc01a85c8 :test   %eax,%eax
0xc01a85ca :jne    0xc01a941f
0xc01a85d0 :mov    0xfea8(%ebp),%edx
0xc01a85d6 :mov    0x18(%edx),%eax
0xc01a85d9 :test   %eax,%eax
0xc01a85db :je     0xc01a8666
0xc01a85e1 :mov    $0xe000,%esi
0xc01a85e6 :and    %esp,%esi
0xc01a85e8 :mov    0x20(%eax),%edi
0xc01a85eb :mov    (%edi),%ebx
0xc01a85ed :lock incl 0xc(%ebx)
0xc01a85f1 :mov    (%ebx),%eax
0xc01a85f3 :test   $0x4,%al
0xc01a85f5 :jne    0xc01a93d9
0xc01a85fb :mov    %ebx,0x4(%esp)
0xc01a85ff :mov    0x8(%ebp),%ecx
0xc01a8602 :mov    %ecx,(%esp)
0xc01a8605 :call   0xc01a8230
0xc01a860a :test   %eax,%eax
0xc01a860c :je     0xc01a93d3
0xc01a8612 :mov    (%ebx),%eax
0xc01a8614 :test   $0x20,%ah
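The two listings were matched by eye. As a cross-check, this sketch takes a subset of instruction addresses and jne/call targets from both dumps (non-debug kernel first, CONFIG_DEBUG_INFO rebuild second) and verifies that every entry moved by the same amount, 0x60 bytes, which supports reading 0xc01a858d in the oopsing kernel as 0xc01a85ed in the rebuild. The array and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses from the first (non-debug) dump: instructions, then
 * the jne/call targets. */
static const uint32_t nodebug[] = {
	0xc01a8568, 0xc01a8570, 0xc01a8581, 0xc01a858d, 0xc01a8595,
	0xc01a85a5, 0xc01a85b4, 0xc01a93bf, 0xc01a81d0, 0xc01a9379,
};

/* The same entries from the CONFIG_DEBUG_INFO rebuild, same order. */
static const uint32_t debug[] = {
	0xc01a85c8, 0xc01a85d0, 0xc01a85e1, 0xc01a85ed, 0xc01a85f5,
	0xc01a8605, 0xc01a8614, 0xc01a941f, 0xc01a8230, 0xc01a93d9,
};

_Static_assert(sizeof(nodebug) == sizeof(debug), "tables must pair up");

#define NPAIRS (sizeof(nodebug) / sizeof(nodebug[0]))

/* Returns 1 and stores the shift if every pair moved by the same
 * number of bytes, 0 otherwise. */
static int common_shift(const uint32_t *a, const uint32_t *b, size_t n,
			uint32_t *shift)
{
	size_t i;

	*shift = b[0] - a[0];
	for (i = 1; i < n; i++)
		if (b[i] - a[i] != *shift)
			return 0;
	return 1;
}
```

Because the call target 0xc01a81d0 moved by the same 0x60 as the instructions, the shift applies to the surrounding code as a whole, not just inside this function.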

So the same code is now at 0xc01a85ed:
(gdb) l *0xc01a85ed
0xc01a85ed is in journal_commit_transaction (atomic.h:103).
98   * 
99   * Atomically increments @v by 1.
100  */ 
101 static __inline__ void atomic_inc(atomic_t *v)
102 {
103 __asm__ __volatile__(
104 LOCK "incl %0"
105 :"=m" (v->counter)
106 :"m" (v->counter));
107 }

It seems to me that get_bh is the culprit because of the following
definition from include/linux/buffer_head.h:
static inline void get_bh(struct buffer_head *bh)
{
	atomic_inc(&bh->b_count);
}
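A NULL `bh` accounts for the reported address: `lock incl 0xc(%ebx)` increments the word 0xc bytes past `%ebx`, and the oops reports a fault at virtual address 000c, so `%ebx` was 0. The sketch below assumes the 2.6.10-era field order at the start of `struct buffer_head` (`b_state`, `b_this_page`, `b_page`, then `b_count`) and models i386 by narrowing every long and pointer to 32 bits; `struct buffer_head_i386` and `null_bh_fault_address()` are hypothetical, not the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical i386 mirror of the first fields of struct buffer_head
 * (field order assumed from 2.6.10-era include/linux/buffer_head.h). */
struct buffer_head_i386 {
	uint32_t b_state;	/* unsigned long on i386 */
	uint32_t b_this_page;	/* struct buffer_head * on i386 */
	uint32_t b_page;	/* struct page * on i386 */
	int32_t  b_count;	/* atomic_t wraps a single int counter */
	uint32_t b_size;	/* u32 block size */
};

/* get_bh(NULL) executes "lock incl 0xc(%ebx)" with %ebx == 0, so the
 * faulting address is simply the offset of b_count. */
static uint32_t null_bh_fault_address(void)
{
	const uint32_t bh = 0;	/* the NULL buffer_head pointer in %ebx */

	return bh + (uint32_t)offsetof(struct buffer_head_i386, b_count);
}
```

Under this assumed layout the offset comes out as 0xc, matching the oops.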

I hope this will help you. Let me know if you need more information.
Thanks for taking care of that problem,
Regards,
-- 
Brice Figureau <[EMAIL PROTECTED]>


Re: 2.6.10-ac10 oops in journal_commit_transaction

2005-03-03 Thread Andrew Morton
Brice Figureau <[EMAIL PROTECTED]> wrote:
>
> I'm reporting an oops on a dual-Xeon database server under 2.6.10-ac10
> quite similar to:
> http://marc.theaimsgroup.com/?l=ext3-users&m=110848085314238&w=2
> 
> I also had another server crash (a mail server this time), but I
> couldn't capture the oops/panic.
> 
> This happened after more than two weeks of uptime; I was running
> 2.6.10-ac1 before and never had this problem.
> 
> Here is the oops information:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 000c
>  printing eip:
> c01a858d
> *pde = 
> Oops: 0002 [#1]
> PREEMPT SMP 
> Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit 
> ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
> CPU:2
> EIP:0060:[journal_commit_transaction+877/5264]Not tainted VLI
> EFLAGS: 00010286   (2.6.10-ac10) 
> EIP is at journal_commit_transaction+0x36d/0x1490

Please do:

gdb vmlinux
(gdb) l *0xc01a858d
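The two EIP lines in the oops encode the same location twice, in decimal (`+877/5264`) and in hex (`+0x36d/0x1490`): the offset into `journal_commit_transaction` and the size of that symbol. A minimal sketch of the arithmetic, using only numbers from the report; `struct sym_loc`, `sym_start()`, `sym_contains()` and `jct` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* "symbol+offset/size" as printed in the oops. */
struct sym_loc {
	uint32_t eip;		/* raw faulting EIP */
	uint32_t offset;	/* bytes from the start of the symbol */
	uint32_t size;		/* total size of the symbol in bytes */
};

/* Start address of the symbol containing the EIP. */
static uint32_t sym_start(const struct sym_loc *l)
{
	return l->eip - l->offset;
}

/* Nonzero if addr lies inside the symbol described by l. */
static int sym_contains(const struct sym_loc *l, uint32_t addr)
{
	uint32_t start = sym_start(l);

	return addr >= start && addr - start < l->size;
}

/* journal_commit_transaction+0x36d/0x1490 at EIP c01a858d. */
static const struct sym_loc jct = { 0xc01a858d, 0x36d, 0x1490 };
```

By this arithmetic the function starts at 0xc01a8220, so the `call 0xc01a81d0` seen in the later disassembly targets code just before it, i.e. a different function.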