Re: xfslogd-spinlock bug?

2007-01-11 Thread Janos Haar

- Original Message - 
From: "David Chinner" <[EMAIL PROTECTED]>
To: "Janos Haar" <[EMAIL PROTECTED]>
Cc: "David Chinner" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;

Sent: Thursday, January 11, 2007 4:34 AM
Subject: Re: xfslogd-spinlock bug?


> On Wed, Jan 10, 2007 at 06:18:08PM +0100, Janos Haar wrote:
> > From: "David Chinner" <[EMAIL PROTECTED]>
> > > Different corruption in RBX here. Looks like semi-random garbage there.
> > > I wonder - what's the mac and ip address(es) of your machine and nbd
> > > servers?
> >
> > dy-base:
>
> no matches. Oh well, it was a long shot.
>
> > Some new stuff:
> > Jan  8 18:11:16 dy-base RAX: 80810800 RBX: 04f0e2850659 RCX:
>
> RBX trashed again with more random garbage.
>
> > Jan  8 18:11:16 dy-base  [80223f81] default_wake_function+0xd/0xf
> > Jan  8 18:11:16 dy-base  [8023c81d] autoremove_wake_function+0x11/0x38
> > Jan  8 18:11:16 dy-base  [802225b1] __wake_up_common+0x3e/0x68
> > Jan  8 18:11:16 dy-base  [80222ac9] __wake_up+0x38/0x50
> > Jan  8 18:11:16 dy-base  [80569d1c] sk_stream_write_space+0x5d/0x83
> > Jan  8 18:11:16 dy-base  [80591afd] tcp_check_space+0x8f/0xcd
> > Jan  8 18:11:16 dy-base  [80596de0] tcp_rcv_established+0x116/0x76e
> > Jan  8 18:11:16 dy-base  [8059ce41] tcp_v4_do_rcv+0x2d/0x322
> > Jan  8 18:11:16 dy-base  [8059f59e] tcp_v4_rcv+0x8bb/0x925
> > Jan  8 18:11:16 dy-base  [80583b24] ip_local_deliver_finish+0x0/0x1ce
> > Jan  8 18:11:16 dy-base  [805843e6] ip_local_deliver+0x172/0x238
> > Jan  8 18:11:16 dy-base  [8058422c] ip_rcv+0x44f/0x497
> > Jan  8 18:11:16 dy-base  [88004bbc] :e1000:e1000_alloc_rx_buffers+0x1e7/0x2cb
> > Jan  8 18:11:16 dy-base  [8056c0a6] netif_receive_skb+0x1ee/0x255
> > Jan  8 18:11:16 dy-base  [8056de0c] process_backlog+0x8a/0x10f
> > Jan  8 18:11:16 dy-base  [8056e060] net_rx_action+0xa9/0x16e
> > Jan  8 18:11:16 dy-base  [8022edf9] __do_softirq+0x57/0xc7
> > Jan  8 18:11:16 dy-base  [8020aa1c] call_softirq+0x1c/0x28
> > Jan  8 18:11:16 dy-base  [8020c68a] do_softirq+0x34/0x87
> > Jan  8 18:11:16 dy-base  [8022ed0f] irq_exit+0x3f/0x41
> > Jan  8 18:11:16 dy-base  [8020c786] do_IRQ+0xa9/0xc7
> > Jan  8 18:11:16 dy-base  [80209e11] ret_from_intr+0x0/0xa
> ..
> > > (i.e. I suspect this is a nbd problem, not an XFS problem)
>
> There's something seriously wrong in your kernel that has, AFAICT,
> nothing to do with XFS. I suggest talking to the NBD folk, as that is
> the only unusual thing that I can see that you are using.
>
> Cheers,

OK, I will try.

Thanks a lot,

Janos


>
> Dave.
> -- 
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2007-01-10 Thread David Chinner
On Wed, Jan 10, 2007 at 06:18:08PM +0100, Janos Haar wrote:
> From: "David Chinner" <[EMAIL PROTECTED]>
> > Different corruption in RBX here. Looks like semi-random garbage there.
> > I wonder - what's the mac and ip address(es) of your machine and nbd
> > servers?
> 
> dy-base:

no matches. Oh well, it was a long shot.

> Some new stuff:
> Jan  8 18:11:16 dy-base RAX: 80810800 RBX: 04f0e2850659 RCX:

RBX trashed again with more random garbage.

> Jan  8 18:11:16 dy-base  [80223f81] default_wake_function+0xd/0xf
> Jan  8 18:11:16 dy-base  [8023c81d] autoremove_wake_function+0x11/0x38
> Jan  8 18:11:16 dy-base  [802225b1] __wake_up_common+0x3e/0x68
> Jan  8 18:11:16 dy-base  [80222ac9] __wake_up+0x38/0x50
> Jan  8 18:11:16 dy-base  [80569d1c] sk_stream_write_space+0x5d/0x83
> Jan  8 18:11:16 dy-base  [80591afd] tcp_check_space+0x8f/0xcd
> Jan  8 18:11:16 dy-base  [80596de0] tcp_rcv_established+0x116/0x76e
> Jan  8 18:11:16 dy-base  [8059ce41] tcp_v4_do_rcv+0x2d/0x322
> Jan  8 18:11:16 dy-base  [8059f59e] tcp_v4_rcv+0x8bb/0x925
> Jan  8 18:11:16 dy-base  [80583b24] ip_local_deliver_finish+0x0/0x1ce
> Jan  8 18:11:16 dy-base  [805843e6] ip_local_deliver+0x172/0x238
> Jan  8 18:11:16 dy-base  [8058422c] ip_rcv+0x44f/0x497
> Jan  8 18:11:16 dy-base  [88004bbc] :e1000:e1000_alloc_rx_buffers+0x1e7/0x2cb
> Jan  8 18:11:16 dy-base  [8056c0a6] netif_receive_skb+0x1ee/0x255
> Jan  8 18:11:16 dy-base  [8056de0c] process_backlog+0x8a/0x10f
> Jan  8 18:11:16 dy-base  [8056e060] net_rx_action+0xa9/0x16e
> Jan  8 18:11:16 dy-base  [8022edf9] __do_softirq+0x57/0xc7
> Jan  8 18:11:16 dy-base  [8020aa1c] call_softirq+0x1c/0x28
> Jan  8 18:11:16 dy-base  [8020c68a] do_softirq+0x34/0x87
> Jan  8 18:11:16 dy-base  [8022ed0f] irq_exit+0x3f/0x41
> Jan  8 18:11:16 dy-base  [8020c786] do_IRQ+0xa9/0xc7
> Jan  8 18:11:16 dy-base  [80209e11] ret_from_intr+0x0/0xa
..
> > (i.e. I suspect this is a nbd problem, not an XFS problem)

There's something seriously wrong in your kernel that has, AFAICT,
nothing to do with XFS. I suggest talking to the NBD folk, as that is
the only unusual thing that I can see that you are using.
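
The frames in the traces above follow the kernel's `symbol+0xoffset/0xlength` convention: the hex offset is where in the function the sampled address falls, and the length is the function's total size. A throwaway parser for pulling those fields out of a log line (illustrative only, not part of the thread):

```python
import re

# Match '[addr] symbol+0xOFF/0xSIZE' frames as they appear in these logs.
# The address may be empty where the archive stripped it; the symbol can
# carry a module prefix like ':e1000:'.
FRAME = re.compile(r"\[(?P<addr>[0-9a-f]*)\]\s+(?P<sym>[.:\w]+)"
                   r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)")

def parse_frame(line):
    """Return (symbol, offset, size) for a trace line, or None."""
    m = FRAME.search(line)
    if not m:
        return None
    return (m.group("sym"),
            int(m.group("off"), 16),
            int(m.group("size"), 16))

line = "Jan  8 18:11:16 dy-base  [80222c74] resched_task+0x1a/0x64"
print(parse_frame(line))  # ('resched_task', 26, 100)
```

With the symbols extracted this way, the faulting offset can be compared against a disassembly of the function when chasing where the bad pointer is dereferenced.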

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2007-01-10 Thread Janos Haar

- Original Message - 
From: "David Chinner" <[EMAIL PROTECTED]>
To: "Haar János" <[EMAIL PROTECTED]>
Cc: "David Chinner" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;

Sent: Monday, January 08, 2007 12:14 AM
Subject: Re: xfslogd-spinlock bug?


> On Wed, Dec 27, 2006 at 01:58:06PM +0100, Haar János wrote:
> > Hello,
> >
> > - Original Message - 
> > From: "David Chinner" <[EMAIL PROTECTED]>
> > To: "David Chinner" <[EMAIL PROTECTED]>
> > Cc: "Haar János" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
> > 
> > Sent: Tuesday, December 19, 2006 5:47 AM
> > Subject: Re: xfslogd-spinlock bug?
> >
> >
> > > On Tue, Dec 19, 2006 at 01:52:29PM +1100, David Chinner wrote:
> > >
> > > The filesystem was being shutdown so xfs_inode_item_destroy() just
> > > frees the inode log item without removing it from the AIL. I'll fix that,
> > > and see if i have any luck
> > >
> > > So I'd still try that patch i sent in the previous email...
> >
> > I am still using the patch, but it hasn't shown any messages at this point.
> >
> > I've got 3 crashes/reboots, but 2 were caused by nbd disconnects, and this one:
> >
> > Dec 27 13:41:29 dy-base BUG: warning at
> > kernel/mutex.c:220/__mutex_unlock_common_slowpath()
> > Dec 27 13:41:29 dy-base Unable to handle kernel paging request at
> > 66604480 RIP:
> > Dec 27 13:41:29 dy-base  [80222c64] resched_task+0x12/0x64
> > Dec 27 13:41:29 dy-base PGD 115246067 PUD 0
> > Dec 27 13:41:29 dy-base Oops:  [1] SMP
> > Dec 27 13:41:29 dy-base CPU 1
> > Dec 27 13:41:29 dy-base Modules linked in: nbd rd netconsole e1000 video
> > Dec 27 13:41:29 dy-base Pid: 4069, comm: httpd Not tainted 2.6.19 #3
> > Dec 27 13:41:29 dy-base RIP: 0010:[80222c64]  [80222c64]
> > resched_task+0x12/0x64
> > Dec 27 13:41:29 dy-base RSP: 0018:810105c01b78  EFLAGS: 00010083
> > Dec 27 13:41:29 dy-base RAX: 807d5800 RBX: 1749fd97c214 RCX:
>
> Different corruption in RBX here. Looks like semi-random garbage there.
> I wonder - what's the mac and ip address(es) of your machine and nbd
> servers?

dy-base:
eth0  Link encap:Ethernet  HWaddr 00:90:27:A2:7B:8B
eth0:1Link encap:Ethernet  HWaddr 00:90:27:A2:7B:8B
eth0:2Link encap:Ethernet  HWaddr 00:90:27:A2:7B:8B
eth1  Link encap:Ethernet  HWaddr 00:07:E9:32:E6:D8
eth1:1Link encap:Ethernet  HWaddr 00:07:E9:32:E6:D8
eth1:2Link encap:Ethernet  HWaddr 00:07:E9:32:E6:D8
eth2  Link encap:Ethernet  HWaddr 00:07:E9:32:E6:D9

node1-4:

00:0E:0C:A0:E5:7E
00:0E:0C:A0:EF:5E
00:0E:0C:A0:E9:58
00:0E:0C:A0:EF:A3
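
The check Dave describes, looking for runs of the machines' MAC-address bytes inside the corrupted register value (which would point at network payload landing in kernel memory), can be scripted. The helper names below are mine, not from the thread; the register value and addresses come from the logs above:

```python
# Does a corrupted register value contain byte runs from a known MAC
# address? A hit would suggest nbd/network data overwriting memory.

def mac_to_bytes(mac):
    """'00:90:27:A2:7B:8B' -> bytes 00 90 27 a2 7b 8b."""
    return bytes(int(p, 16) for p in mac.split(":"))

def contains_run(value_hex, mac, min_run=3):
    """True if any min_run consecutive MAC bytes appear in the register
    value, checked in both byte orders (x86-64 is little-endian)."""
    val = bytes.fromhex(value_hex.rjust(16, "0"))
    m = mac_to_bytes(mac)
    for i in range(len(m) - min_run + 1):
        run = m[i:i + min_run]
        if run in val or run[::-1] in val:
            return True
    return False

# RBX from the Jan 8 oops, MACs from the interface list above
rbx = "04f0e2850659"
macs = ["00:90:27:A2:7B:8B", "00:07:E9:32:E6:D8", "00:0E:0C:A0:E5:7E"]
print([contains_run(rbx, m) for m in macs])  # [False, False, False]
```

No matches, consistent with Dave's conclusion that it was a long shot.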

Some new stuff:
Jan  8 18:11:16 dy-base BUG: warning at
kernel/mutex.c:220/__mutex_unlock_common_slowpath()
Jan  8 18:11:16 dy-base Unable to handle kernel NULL pointer dereference at
0008 RIP:
Jan  8 18:11:16 dy-base  [80222c74] resched_task+0x1a/0x64
Jan  8 18:11:16 dy-base PGD 9859d067 PUD 4e347067 PMD 0
Jan  8 18:11:16 dy-base Oops:  [1] SMP
Jan  8 18:11:16 dy-base CPU 3
Jan  8 18:11:16 dy-base Modules linked in: nbd rd netconsole e1000
Jan  8 18:11:16 dy-base Pid: 3471, comm: ls Not tainted 2.6.19 #4
Jan  8 18:11:16 dy-base RIP: 0010:[80222c74]  [80222c74]
resched_task+0x1a/0x64
Jan  8 18:11:16 dy-base RSP: :81011fd1fb10  EFLAGS: 00010097
Jan  8 18:11:16 dy-base RAX: 80810800 RBX: 04f0e2850659 RCX:
81010c0a2000
Jan  8 18:11:16 dy-base RDX:  RSI: 81000583c368 RDI:
81002b809830
Jan  8 18:11:16 dy-base RBP: 81011fd1fb10 R08:  R09:
0080
Jan  8 18:11:16 dy-base R10: 0080 R11: 81000584d280 R12:
8100d852a7b0
Jan  8 18:11:16 dy-base R13: 0003 R14: 0001 R15:
0001
Jan  8 18:11:16 dy-base FS:  2b69786c8a90()
GS:81011fcb98c0() knlGS:
Jan  8 18:11:16 dy-base CS:  0010 DS:  ES:  CR0: 8005003b
Jan  8 18:11:16 dy-base CR2: 0008 CR3: 56ec5000 CR4:
06e0
Jan  8 18:11:16 dy-base Process ls (pid: 3471, threadinfo 810080d8,
task 8100dcdea770)
Jan  8 18:11:16 dy-base Stack:  81011fd1fb90 80223f3f
81011fd1fb50 0282
Jan  8 18:11:16 dy-base  0001 0001 81000583ba00
0003032ca4b8
Jan  8 18:11:16 dy-base  000a 0082 810103539d00
810013867bf8
Jan  8 18:11:16 dy-base Call Trace:
Jan  8 18:11:16 dy-base  IRQ  [80223f3f] try_to_wake_up+0x3a7/0x3dc
Jan  8 18:11:16 dy-base  [80223f81] default_wake_function+0xd/0xf
Jan  8 18:11:16 dy-base  [8023c81d] autoremove_wake_function+0x11/0x38
Jan  8 18:11:16 dy-base  [802225b1] __wake_up_common+0x3e/0x68
Jan  8 18:11:16 dy-base  [80222ac9] __wake_up+0x38/0x50
Jan  8 18:11:16 dy-base  [80569d1c] sk_stream_write_space+0x5d/0x83
Jan  8 18:11:16 dy-base  [80591afd] tcp_check_space+0x8f/0xcd
Jan  8 18:11:16 dy-base  [80596de0] tcp_rcv_established+0x116/0x76e
Jan  8 18:11:16 dy-base  [8059ce41] tcp_v4_do_rcv+0x2d/0x322


Re: xfslogd-spinlock bug?

2007-01-07 Thread David Chinner
On Wed, Dec 27, 2006 at 01:58:06PM +0100, Haar János wrote:
> Hello,
> 
> - Original Message - 
> From: "David Chinner" <[EMAIL PROTECTED]>
> To: "David Chinner" <[EMAIL PROTECTED]>
> Cc: "Haar János" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
> 
> Sent: Tuesday, December 19, 2006 5:47 AM
> Subject: Re: xfslogd-spinlock bug?
> 
> 
> > On Tue, Dec 19, 2006 at 01:52:29PM +1100, David Chinner wrote:
> >
> > The filesystem was being shutdown so xfs_inode_item_destroy() just
> > frees the inode log item without removing it from the AIL. I'll fix that,
> > and see if i have any luck
> >
> > So I'd still try that patch i sent in the previous email...
> 
> I am still using the patch, but it hasn't shown any messages at this point.
> 
> I've got 3 crashes/reboots, but 2 were caused by nbd disconnects, and this one:
> 
> Dec 27 13:41:29 dy-base BUG: warning at
> kernel/mutex.c:220/__mutex_unlock_common_slowpath()
> Dec 27 13:41:29 dy-base Unable to handle kernel paging request at
> 66604480 RIP:
> Dec 27 13:41:29 dy-base  [80222c64] resched_task+0x12/0x64
> Dec 27 13:41:29 dy-base PGD 115246067 PUD 0
> Dec 27 13:41:29 dy-base Oops:  [1] SMP
> Dec 27 13:41:29 dy-base CPU 1
> Dec 27 13:41:29 dy-base Modules linked in: nbd rd netconsole e1000 video
> Dec 27 13:41:29 dy-base Pid: 4069, comm: httpd Not tainted 2.6.19 #3
> Dec 27 13:41:29 dy-base RIP: 0010:[80222c64]  [80222c64]
> resched_task+0x12/0x64
> Dec 27 13:41:29 dy-base RSP: 0018:810105c01b78  EFLAGS: 00010083
> Dec 27 13:41:29 dy-base RAX: 807d5800 RBX: 1749fd97c214 RCX:

Different corruption in RBX here. Looks like semi-random garbage there.
I wonder - what's the mac and ip address(es) of your machine and nbd
servers?

(i.e. I suspect this is a nbd problem, not an XFS problem)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2006-12-27 Thread Haar János
Hello,

- Original Message - 
From: "David Chinner" <[EMAIL PROTECTED]>
To: "David Chinner" <[EMAIL PROTECTED]>
Cc: "Haar János" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;

Sent: Tuesday, December 19, 2006 5:47 AM
Subject: Re: xfslogd-spinlock bug?


> On Tue, Dec 19, 2006 at 01:52:29PM +1100, David Chinner wrote:
> > On Tue, Dec 19, 2006 at 12:39:46AM +0100, Haar János wrote:
> > > From: "David Chinner" <[EMAIL PROTECTED]>
> > > > #define POISON_FREE 0x6b
> > > >
> > > > Can you confirm that you are running with CONFIG_DEBUG_SLAB=y?
> > >
> > > Yes, I built with this option enabled.
>
> ..
>
> > FWIW, I've run XFSQA twice now on a scsi disk with slab debugging turned
> > on and I haven't seen this problem. I'm not sure how to track down
> > the source of the problem without a test case, but as a quick test, can
> > you try the following patch?
>
> Third try and I got a crash on a poisoned object:
>
> [1]kdb> md8c40 e0300d7d5100
> 0xe0300d7d5100 5a2cf071    q.,Z
> 0xe0300d7d5110 5a2cf071 6b6b6b6b6b6b6b6b   q.,Z
> 0xe0300d7d5120 e039eb7b6320 6b6b6b6b6b6b6b6bc{.9...
> 0xe0300d7d5130 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d5140 6b6b6b6f6b6b6b6b 6b6b6b6b6b6b6b6b   okkk
> 0xe0300d7d5150 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d5160 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d5170 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d5180 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d5190 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d51a0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d51b0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d51c0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
> 0xe0300d7d51d0 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b   kkk.
> 0xe0300d7d51e0 5a2cf071 a00100468c30   q.,Z0.F.
> [1]kdb> mds 0xe0300d7d51e0
> 0xe0300d7d51e0 5a2cf071   q.,Z
> 0xe0300d7d51e8 a00100468c30 xfs_inode_item_destroy+0x30
>
> So the use-after-free here is on an inode item. You're tripping
> over a buffer item.
>
> Unfortunately, it is not the same problem - the problem I've just
> hit is to do with a QA test that does a forced shutdown on an active
> filesystem, and:
>
> [1]kdb> xmount 0xe0304393e238
> .
> flags 0x440010 FSSHUTDOWN IDELETE COMPAT_IOSIZE
>
> The filesystem was being shutdown so xfs_inode_item_destroy() just
> frees the inode log item without removing it from the AIL. I'll fix that,
> and see if i have any luck
>
> So I'd still try that patch i sent in the previous email...

I am still using the patch, but it hasn't shown any messages at this point.

I've got 3 crashes/reboots, but 2 were caused by nbd disconnects, and this one:

Dec 27 13:41:29 dy-base BUG: warning at
kernel/mutex.c:220/__mutex_unlock_common_slowpath()
Dec 27 13:41:29 dy-base Unable to handle kernel paging request at
66604480 RIP:
Dec 27 13:41:29 dy-base  [80222c64] resched_task+0x12/0x64
Dec 27 13:41:29 dy-base PGD 115246067 PUD 0
Dec 27 13:41:29 dy-base Oops:  [1] SMP
Dec 27 13:41:29 dy-base CPU 1
Dec 27 13:41:29 dy-base Modules linked in: nbd rd netconsole e1000 video
Dec 27 13:41:29 dy-base Pid: 4069, comm: httpd Not tainted 2.6.19 #3
Dec 27 13:41:29 dy-base RIP: 0010:[80222c64]  [80222c64]
resched_task+0x12/0x64
Dec 27 13:41:29 dy-base RSP: 0018:810105c01b78  EFLAGS: 00010083
Dec 27 13:41:29 dy-base RAX: 807d5800 RBX: 1749fd97c214 RCX:
81001cbd
Dec 27 13:41:29 dy-base RDX: 1cbd0048 RSI: 810005834068 RDI:
8101047bf040
Dec 27 13:41:29 dy-base RBP: 810105c01b78 R08: 0001 R09:

Dec 27 13:41:29 dy-base R10: 0057 R11: 81000583cd80 R12:
810116693140
Dec 27 13:41:29 dy-base R13: 0001 R14:  R15:

Dec 27 13:41:29 dy-base FS:  2ba3c1ad07d0()
GS:81011fc769c8() knlGS:
Dec 27 13:41:29 dy-base CS:  0010 DS:  ES:  CR0: 80050033
Dec 27 13:41:29 dy-base CR2: 66604480 CR3: 000118196000 CR4:
06e0
Dec 27 13:41:29 dy-base Process httpd (pid: 4069, threadinfo
810105c0, task 8101166e1040)
Dec 27 13:41:29 dy-base Stack:  810105c01bf8 80223f37
810002996a00 02cba600
Dec 27 13:41:29 dy-base  000f 0001 810005833700
0001
Dec 27 13:41:29 dy-base  0005 0296 810105c01bd8
810117fef690
Dec 27 13:41:29 dy-base Call Trace:
Dec 27 13:41:29 dy-base  [80223f37] try_to_wake_up+0x3a7/0x3dc
Dec 27 13:41:29 dy-base  [80223f98] wake_up_process+0x10/0x12
Dec 27 13:41:29 dy-base  [803c9186] xfsbufd_wakeup+0x34/0x61
Dec 27 13:41:29 dy-base  [8025cdf9] shrink_slab+0x64/0x163


Re: xfslogd-spinlock bug?

2006-12-18 Thread David Chinner
On Tue, Dec 19, 2006 at 01:52:29PM +1100, David Chinner wrote:
> On Tue, Dec 19, 2006 at 12:39:46AM +0100, Haar János wrote:
> > From: "David Chinner" <[EMAIL PROTECTED]>
> > > #define POISON_FREE 0x6b
> > >
> > > Can you confirm that you are running with CONFIG_DEBUG_SLAB=y?
> > 
> > Yes, i build with this option enabled.

..

> FWIW, I've run XFSQA twice now on a scsi disk with slab debugging turned
> on and I haven't seen this problem. I'm not sure how to track down
> the source of the problem without a test case, but as a quick test, can
> you try the following patch?

Third try and I got a crash on a poisoned object:

[1]kdb> md8c40 e0300d7d5100
0xe0300d7d5100 5a2cf071    q.,Z
0xe0300d7d5110 5a2cf071 6b6b6b6b6b6b6b6b   q.,Z
0xe0300d7d5120 e039eb7b6320 6b6b6b6b6b6b6b6bc{.9...
0xe0300d7d5130 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5140 6b6b6b6f6b6b6b6b 6b6b6b6b6b6b6b6b   okkk
0xe0300d7d5150 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5160 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5170 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5180 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d5190 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51a0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51b0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51c0 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   
0xe0300d7d51d0 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b   kkk.
0xe0300d7d51e0 5a2cf071 a00100468c30   q.,Z0.F.
[1]kdb> mds 0xe0300d7d51e0
0xe0300d7d51e0 5a2cf071   q.,Z
0xe0300d7d51e8 a00100468c30 xfs_inode_item_destroy+0x30

So the use-after-free here is on an inode item. You're tripping
over a buffer item.

Unfortunately, it is not the same problem - the problem I've just
hit is to do with a QA test that does a forced shutdown on an active
filesystem, and:

[1]kdb> xmount 0xe0304393e238
.
flags 0x440010 <FSSHUTDOWN IDELETE COMPAT_IOSIZE>

The filesystem was being shut down so xfs_inode_item_destroy() just
frees the inode log item without removing it from the AIL. I'll fix that,
and see if I have any luck.

So I'd still try that patch I sent in the previous email...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2006-12-18 Thread David Chinner
On Tue, Dec 19, 2006 at 12:39:46AM +0100, Haar János wrote:
> From: "David Chinner" <[EMAIL PROTECTED]>
> > On Mon, Dec 18, 2006 at 09:17:50AM +0100, Haar János wrote:
> > > From: "David Chinner" <[EMAIL PROTECTED]>
> > > > Ok, I've never heard of a problem like this before and you are doing
> > > > something that very few ppl are doing (i.e. XFS on NBD). Hence I'd
> > > > start by suspecting a bug in the NBD driver.
> > >
> > Ok, if you're right, this could also be connected with the following
> > issue:
> > >
> > > http://download.netcenter.hu/bughunt/20061217/messages.txt   (10KB)
> >
> > Which appears to be a crash in wake_up_process() when doing memory
> > reclaim (waking the xfsbufd).
> 
> Sorry, can you translate it to "poor mans language"? :-)
> This is a different bug?

Don't know - it's a different crash, but once again one that I've
never heard of occurring before.

> > Ok, I've found this pattern:
> >
> > #define POISON_FREE 0x6b
> >
> > Can you confirm that you are running with CONFIG_DEBUG_SLAB=y?
> 
> Yes, i build with this option enabled.
> Is this wrong?

No, but it does slow your machine down.

> > If so, we have a use after free occurring here and it would also
> > explain why no-one has reported it before.
> >
> > FWIW, can you turn on CONFIG_XFS_DEBUG=y and see if that triggers
> > a different bug check prior to the above dump?
> 
> [EMAIL PROTECTED] linux-2.6.19]# make bzImage
> scripts/kconfig/conf -s arch/x86_64/Kconfig
> .config:7:warning: trying to assign nonexistent symbol XFS_DEBUG
> 
> I have missed something?

No - I forgot that config option doesn't exist in mainline XFS - it's
only in the dev tree.

FWIW, I've run XFSQA twice now on a scsi disk with slab debugging turned
on and I haven't seen this problem. I'm not sure how to track down
the source of the problem without a test case, but as a quick test, can
you try the following patch?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/linux-2.6/xfs_buf.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.c   2006-12-19 
12:22:54.0 +1100
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c2006-12-19 13:48:36.937118569 
+1100
@@ -942,11 +942,14 @@ xfs_buf_unlock(
 /*
  * Pinning Buffer Storage in Memory
  * Ensure that no attempt to force a buffer to disk will succeed.
+ * Hold the buffer so we don't attempt to free it while it
+ * is pinned.
  */
 void
 xfs_buf_pin(
xfs_buf_t   *bp)
 {
+   xfs_buf_hold(bp);
	atomic_inc(&bp->b_pin_count);
XB_TRACE(bp, "pin", (long)bp->b_pin_count.counter);
 }
@@ -958,6 +961,7 @@ xfs_buf_unpin(
	if (atomic_dec_and_test(&bp->b_pin_count))
		wake_up_all(&bp->b_waiters);
XB_TRACE(bp, "unpin", (long)bp->b_pin_count.counter);
+   xfs_buf_rele(bp);
 }
 
 int
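
The patch works by pairing each pin with an extra lifetime reference, so the
buffer cannot be freed while an unpin (and its wake_up) is still in flight.
A minimal user-space sketch of that hold-while-pinned pattern (struct buf,
buf_hold() and friends are hypothetical stand-ins, not the real xfs_buf_t API):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical buffer: a lifetime refcount plus a pin count. */
struct buf {
    atomic_int hold_count;  /* lifetime references */
    atomic_int pin_count;   /* "pinned in memory" count */
};

static void buf_hold(struct buf *bp)
{
    atomic_fetch_add(&bp->hold_count, 1);
}

/* Drop a reference; returns 1 if this was the last one and bp was freed. */
static int buf_rele(struct buf *bp)
{
    if (atomic_fetch_sub(&bp->hold_count, 1) == 1) {
        free(bp);
        return 1;
    }
    return 0;
}

/* As in the patch: pinning takes a hold, so a concurrent release of the
 * last ordinary reference cannot free the buffer while it is pinned. */
static void buf_pin(struct buf *bp)
{
    buf_hold(bp);
    atomic_fetch_add(&bp->pin_count, 1);
}

static int buf_unpin(struct buf *bp)
{
    atomic_fetch_sub(&bp->pin_count, 1);
    return buf_rele(bp);    /* may free bp if the owner already let go */
}
```

The point of the extra hold: if the owner releases the buffer while it is
still pinned, the object survives until buf_unpin() drops the matching
reference, so the unpin-side wake_up never touches freed (poisoned) memory.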
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2006-12-18 Thread Haar János

- Original Message - 
From: "David Chinner" <[EMAIL PROTECTED]>
To: "Haar János" <[EMAIL PROTECTED]>
Cc: "David Chinner" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;

Sent: Monday, December 18, 2006 11:36 PM
Subject: Re: xfslogd-spinlock bug?


> On Mon, Dec 18, 2006 at 09:17:50AM +0100, Haar János wrote:
> > From: "David Chinner" <[EMAIL PROTECTED]>
> > > > The NBD serves through eth1, and it is on the CPU3, but the ide0 is on
> > > > the CPU0.
> > >
> > > I'd say your NBD based XFS filesystem is having trouble.
> > >
> > > > > Are you using XFS on a NBD?
> > > >
> > > > Yes, on the 3. source.
> > >
> > > Ok, I've never heard of a problem like this before and you are doing
> > > something that very few ppl are doing (i.e. XFS on NBD). Hence I'd
> > > start by suspecting a bug in the NBD driver.
> >
> > Ok, if you're right, this could also be connected with the following
> > issue:
> >
> > http://download.netcenter.hu/bughunt/20061217/messages.txt   (10KB)
>
> Which appears to be a crash in wake_up_process() when doing memory
> reclaim (waking the xfsbufd).

Sorry, can you translate it to "poor man's language"? :-)
Is this a different bug?


>
> > > > > > Dec 16 12:08:36 dy-base RSP: 0018:81011fdedbc0  EFLAGS:
00010002
> > > > > > Dec 16 12:08:36 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b
> > RCX:
> > > > >

> > > > > Anyone recognise that pattern?
>
> Ok, I've found this pattern:
>
> #define POISON_FREE 0x6b
>
> Can you confirm that you are running with CONFIG_DEBUG_SLAB=y?

Yes, I built with this option enabled.
Is this wrong?

>
> If so, we have a use after free occurring here and it would also
> explain why no-one has reported it before.
>
> FWIW, can you turn on CONFIG_XFS_DEBUG=y and see if that triggers
> a different bug check prior to the above dump?

[EMAIL PROTECTED] linux-2.6.19]# make bzImage
scripts/kconfig/conf -s arch/x86_64/Kconfig
.config:7:warning: trying to assign nonexistent symbol XFS_DEBUG

Have I missed something?


Thanks,

Janos

>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2006-12-18 Thread David Chinner
On Mon, Dec 18, 2006 at 09:17:50AM +0100, Haar János wrote:
> From: "David Chinner" <[EMAIL PROTECTED]>
> > > The NBD serves through eth1, and it is on the CPU3, but the ide0 is on
> > > the CPU0.
> >
> > I'd say your NBD based XFS filesystem is having trouble.
> >
> > > > Are you using XFS on a NBD?
> > >
> > > Yes, on the 3. source.
> >
> > Ok, I've never heard of a problem like this before and you are doing
> > something that very few ppl are doing (i.e. XFS on NBD). Hence I'd
> > start by suspecting a bug in the NBD driver.
> 
> Ok, if you have right, this also can be in context with the following issue:
> 
> http://download.netcenter.hu/bughunt/20061217/messages.txt   (10KB)

Which appears to be a crash in wake_up_process() when doing memory
reclaim (waking the xfsbufd).

> > > > > Dec 16 12:08:36 dy-base RSP: 0018:81011fdedbc0  EFLAGS: 00010002
> > > > > Dec 16 12:08:36 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b
> RCX:
> > > >  
> > > > Anyone recognise that pattern?

Ok, I've found this pattern:

#define POISON_FREE 0x6b

Can you confirm that you are running with CONFIG_DEBUG_SLAB=y?

If so, we have a use after free occurring here and it would also
explain why no-one has reported it before.
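
For reference, the poison byte is what makes the crash self-identifying:
CONFIG_DEBUG_SLAB fills freed objects with 0x6b, so a 64-bit register loaded
from freed memory reads exactly the 0x6b6b6b6b6b6b6b6b seen in RBX. A small
user-space sketch of the idea (poison_object/read_word are illustrative
helpers, not the kernel's slab code):

```c
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b  /* the byte the slab debugger writes over freed objects */

/* Fill an object on "free", the way CONFIG_DEBUG_SLAB poisons slab memory. */
static void poison_object(void *obj, size_t size)
{
    memset(obj, POISON_FREE, size);
}

/* A use-after-free load then yields the telltale repeating-0x6b pattern. */
static uint64_t read_word(const void *obj)
{
    uint64_t v;
    memcpy(&v, obj, sizeof v);
    return v;
}
```

Reading any 8 bytes of a poisoned object gives 0x6b6b6b6b6b6b6b6b, which is
how the garbage in RBX pinpoints a freed object rather than random corruption.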

FWIW, can you turn on CONFIG_XFS_DEBUG=y and see if that triggers
a different bug check prior to the above dump?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2006-12-18 Thread Haar János

- Original Message - 
From: "David Chinner" <[EMAIL PROTECTED]>
To: "Haar János" <[EMAIL PROTECTED]>
Cc: "David Chinner" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;

Sent: Monday, December 18, 2006 7:24 AM
Subject: Re: xfslogd-spinlock bug?


> On Mon, Dec 18, 2006 at 12:56:41AM +0100, Haar János wrote:
> > > On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> > > > I don't know if there is a connection between the 2 messages, but I can
> > > > see the spinlock bug always comes on cpu #3.
> > > >
> > > > Somebody have any idea?
> > >
> > > Your disk interrupts are directed to CPU 3, and so log I/O completion
> > > occurs on that CPU.
> >
>CPU0   CPU1   CPU2   CPU3
>   0:100  0  04583704   IO-APIC-edge  timer
>   1:  0  0  0  2   IO-APIC-edge  i8042
>   4:  0  0  03878668   IO-APIC-edge  serial
> .
>  14:3072118  0  0181   IO-APIC-edge  ide0
> .
>  52:  0  0  0  213052723   IO-APIC-fasteoi   eth1
>  53:  0  0  0   91913759   IO-APIC-fasteoi   eth2
> 100:  0  0  0   16776910   IO-APIC-fasteoi   eth0
> 
> >
> > Maybe
> > I have 3 XFS on this system, with 3 source.
> >
> > 1. 200G one ide hdd.
> > 2. 2x200G mirror on 1 ide + 1 sata hdd.
> > 3. 4x3.3TB strip on NBD.
> >
> The NBD serves through eth1, and it is on the CPU3, but the ide0 is on
> the CPU0.
>
> I'd say your NBD based XFS filesystem is having trouble.
>
> > > Are you using XFS on a NBD?
> >
> > Yes, on the 3. source.
>
> Ok, I've never heard of a problem like this before and you are doing
> something that very few ppl are doing (i.e. XFS on NBD). Hence I'd
> start by suspecting a bug in the NBD driver.

Ok, if you're right, this could also be connected with the following issue:

http://download.netcenter.hu/bughunt/20061217/messages.txt   (10KB)


>
> > > > Dec 16 12:08:36 dy-base RSP: 0018:81011fdedbc0  EFLAGS: 00010002
> > > > Dec 16 12:08:36 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b
RCX:
> > >  
> > > Anyone recognise that pattern?
> >
> > I think i have one idea.
> > This issue can sometimes stop the 5sec automatic restart on crash, and this
> > shows possible memory corruption, and if the bug occurs in the IRQ
> > handling :-)
> > I have a lot of logs about this issue, and the RAX, RBX always the same.
>
> And is this the only place where you see the problem? Or are there
> other stack traces that you see this in as well?

I used 2.6.16.18 for a long time, and it was stable except for this
issue (~20 dumps with xfslogd).
And I tried the new releases, and now I have more. :-)

What exactly do you mean?
I can search the logs, but for what?
The RAX/RBX thing, the xfslogd-spinlock problem, or the old nbd-deadlock
+ memory corruption?

[EMAIL PROTECTED] netlog]# grep "0033" messages*
messages.1:Dec 11 22:47:21 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.1:Dec 12 18:16:35 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.1:Dec 13 11:40:05 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.1:Dec 14 22:25:32 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.1:Dec 15 06:24:44 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.1:Dec 16 12:08:36 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.11:Oct  3 19:49:44 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.11:Oct  7 01:11:17 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.13:Sep 21 15:35:31 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.15:Sep  3 16:13:35 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.15:Sep  5 21:00:38 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.2:Dec  9 00:10:47 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.2:Dec  9 14:07:01 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.2:Dec 10 04:44:48 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.3:Nov 30 10:59:21 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.3:Dec  2 00:54:23 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.5:Nov 13 10:44:49 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.5:Nov 14 03:14:14 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.5:Nov 14 03:37:07 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.5:Nov 15 01:39:54 dy-base RAX: 0033 RBX:
6b6b6b6b6b6b6b6b RCX: 
messages.6:Nov  6


Re: xfslogd-spinlock bug?

2006-12-17 Thread David Chinner
On Mon, Dec 18, 2006 at 12:56:41AM +0100, Haar János wrote:
> > On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> > > I dont know there is a context between 2 messages, but i can see, the
> > > spinlock bug comes always on cpu #3.
> > >
> > > Somebody have any idea?
> >
> > Your disk interrupts are directed to CPU 3, and so log I/O completion
> > occurs on that CPU.
> 
>CPU0   CPU1   CPU2   CPU3
>   0:100  0  04583704   IO-APIC-edge  timer
>   1:  0  0  0  2   IO-APIC-edge  i8042
>   4:  0  0  03878668   IO-APIC-edge  serial
.
>  14:3072118  0  0181   IO-APIC-edge  ide0
.
>  52:  0  0  0  213052723   IO-APIC-fasteoi   eth1
>  53:  0  0  0   91913759   IO-APIC-fasteoi   eth2
> 100:  0  0  0   16776910   IO-APIC-fasteoi   eth0

> 
> Maybe
> I have 3 XFS on this system, with 3 source.
> 
> 1. 200G one ide hdd.
> 2. 2x200G mirror on 1 ide + 1 sata hdd.
> 3. 4x3.3TB strip on NBD.
> 
> The NBD serves through eth1, and it is on the CPU3, but the ide0 is on the
> CPU0.

I'd say your NBD based XFS filesystem is having trouble.

> > Are you using XFS on a NBD?
> 
> Yes, on the 3. source.

Ok, I've never heard of a problem like this before and you are doing
something that very few ppl are doing (i.e. XFS on NBD). Hence I'd
start by suspecting a bug in the NBD driver.

> > > Dec 16 12:08:36 dy-base RSP: 0018:81011fdedbc0  EFLAGS: 00010002
> > > Dec 16 12:08:36 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:
> >  
> > Anyone recognise that pattern?
> 
> I think i have one idea.
> This issue can stops sometimes the 5sec automatic restart on crash, and this
> shows possible memory corruption, and if the bug occurs in the IRQ
> handling :-)
> I have a lot of logs about this issue, and the RAX, RBX always the same.

And is this the only place where you see the problem? Or are there
other stack traces that you see this in as well?

> > This implies a spinlock inside a wait_queue_head_t is corrupt.
> >
> > What are you type of system do you have, and what sort of
> > workload are you running?
> 
> OS: Fedora 5, 64bit.
> HW: dual xeon, with HT, ram 4GB.
> (the min_free_kbytes limit is set to 128000, because sometimes the e1000
> driver run out the reserved memory during irq handling.)

That does not sound good. What happens when it does run out of memory?
Is that when you start to see the above corruptions?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2006-12-17 Thread Haar János

- Original Message - 
From: "David Chinner" <[EMAIL PROTECTED]>
To: "Haar János" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; 
Sent: Sunday, December 17, 2006 11:44 PM
Subject: Re: xfslogd-spinlock bug?


> On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> > Hi
> >
> > I have some news.
> >
> > I dont know there is a context between 2 messages, but i can see, the
> > spinlock bug comes always on cpu #3.
> >
> > Somebody have any idea?
>
> Your disk interrupts are directed to CPU 3, and so log I/O completion
> occurs on that CPU.

   CPU0   CPU1   CPU2   CPU3
  0:100  0  04583704   IO-APIC-edge  timer
  1:  0  0  0  2   IO-APIC-edge  i8042
  4:  0  0  03878668   IO-APIC-edge  serial
  8:  0  0  0  0   IO-APIC-edge  rtc
  9:  0  0  0  0   IO-APIC-fasteoi   acpi
 12:  0  0  0  3   IO-APIC-edge  i8042
 14:3072118  0  0181   IO-APIC-edge  ide0
 16:  0  0  0  0   IO-APIC-fasteoi
uhci_hcd:usb2
 18:  0  0  0  0   IO-APIC-fasteoi
uhci_hcd:usb4
 19:  0  0  0  0   IO-APIC-fasteoi
uhci_hcd:usb3
 23:  0  0  0  0   IO-APIC-fasteoi
ehci_hcd:usb1
 52:  0  0  0  213052723   IO-APIC-fasteoi   eth1
 53:  0  0  0   91913759   IO-APIC-fasteoi   eth2
100:  0  0  0   16776910   IO-APIC-fasteoi   eth0
NMI:  42271  43187  42234  43168
LOC:4584247458421945842154584198
ERR:  0

Maybe
I have 3 XFS on this system, with 3 source.

1. 200G one ide hdd.
2. 2x200G mirror on 1 ide + 1 sata hdd.
3. 4x3.3TB strip on NBD.

The NBD serves through eth1, and it is on the CPU3, but the ide0 is on the
CPU0.


>
> > Dec 16 12:08:36 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> > Dec 16 12:08:36 dy-base general protection fault:  [1]
> > Dec 16 12:08:36 dy-base SMP
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base CPU 3
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Modules linked in:
> > Dec 16 12:08:36 dy-base  nbd
>
> Are you using XFS on a NBD?

Yes, on the 3rd source.
I've used it for about 1.5 years.

(The nbd deadlock was fixed on my system, thanks to Herbert Xu, in 2.6.14.)

>
> > Dec 16 12:08:36 dy-base  rd
> > Dec 16 12:08:36 dy-base  netconsole
> > Dec 16 12:08:36 dy-base  e1000
> > Dec 16 12:08:36 dy-base  video
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> > Dec 16 12:08:36 dy-base RIP: 0010:[]
> > Dec 16 12:08:36 dy-base  [] spin_bug+0x69/0xdf
> > Dec 16 12:08:36 dy-base RSP: 0018:81011fdedbc0  EFLAGS: 00010002
> > Dec 16 12:08:36 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:
>  
> Anyone recognise that pattern?

I think I have one idea.
This issue sometimes stops the 5-second automatic restart on crash, which
suggests possible memory corruption, especially if the bug occurs in the IRQ
handling :-)
I have a lot of logs about this issue, and RAX and RBX are always the same.

>
> > Dec 16 12:08:36 dy-base Call Trace:
> > Dec 16 12:08:36 dy-base  [] _raw_spin_lock+0x23/0xf1
> > Dec 16 12:08:36 dy-base  []
_spin_lock_irqsave+0x11/0x18
> > Dec 16 12:08:36 dy-base  [] __wake_up+0x22/0x50
> > Dec 16 12:08:36 dy-base  [] xfs_buf_unpin+0x21/0x23
> > Dec 16 12:08:36 dy-base  []
xfs_buf_item_unpin+0x2e/0xa6
>
> This implies a spinlock inside a wait_queue_head_t is corrupt.
>
> What type of system do you have, and what sort of
> workload are you running?

OS: Fedora 5, 64-bit.
HW: dual Xeon with HT, 4GB RAM.
(The min_free_kbytes limit is set to 128000, because the e1000 driver
sometimes runs out of the reserved memory during IRQ handling.)

Workload:

I use this system for free web storage.
(2x Apache 2.0.xx, 12x pure-ftpd, 2x MySQL, but SQL only uses the source #2
fs.)

The normal system load is ~20-40, but currently I have a little problem with
Apache, because it sometimes starts to read a lot from the big XFS device
and eats all memory, and the load rises to 700-800.
At this point I restart httpd and everything goes back to normal, but if I
don't, it goes offline.

Thanks a lot!

Janos

>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xfslogd-spinlock bug?

2006-12-17 Thread David Chinner
On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> Hi
> 
> I have some news.
> 
> I don't know if there is a connection between the 2 messages, but I can see
> the spinlock bug always comes on CPU #3.
> 
> Does somebody have any idea?

Your disk interrupts are directed to CPU 3, and so log I/O completion
occurs on that CPU.

> Dec 16 12:08:36 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> Dec 16 12:08:36 dy-base general protection fault:  [1]
> Dec 16 12:08:36 dy-base SMP
> Dec 16 12:08:36 dy-base
> Dec 16 12:08:36 dy-base CPU 3
> Dec 16 12:08:36 dy-base
> Dec 16 12:08:36 dy-base Modules linked in:
> Dec 16 12:08:36 dy-base  nbd

Are you using XFS on a NBD?

> Dec 16 12:08:36 dy-base  rd
> Dec 16 12:08:36 dy-base  netconsole
> Dec 16 12:08:36 dy-base  e1000
> Dec 16 12:08:36 dy-base  video
> Dec 16 12:08:36 dy-base
> Dec 16 12:08:36 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> Dec 16 12:08:36 dy-base RIP: 0010:[]
> Dec 16 12:08:36 dy-base  [] spin_bug+0x69/0xdf
> Dec 16 12:08:36 dy-base RSP: 0018:81011fdedbc0  EFLAGS: 00010002
> Dec 16 12:08:36 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:
 
Anyone recognise that pattern?

> Dec 16 12:08:36 dy-base Call Trace:
> Dec 16 12:08:36 dy-base  [] _raw_spin_lock+0x23/0xf1
> Dec 16 12:08:36 dy-base  [] _spin_lock_irqsave+0x11/0x18
> Dec 16 12:08:36 dy-base  [] __wake_up+0x22/0x50
> Dec 16 12:08:36 dy-base  [] xfs_buf_unpin+0x21/0x23
> Dec 16 12:08:36 dy-base  [] xfs_buf_item_unpin+0x2e/0xa6

This implies a spinlock inside a wait_queue_head_t is corrupt.

What type of system do you have, and what sort of
workload are you running?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group






Re: xfslogd-spinlock bug?

2006-12-17 Thread David Chinner
On Mon, Dec 18, 2006 at 12:56:41AM +0100, Haar János wrote:
  On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
   I don't know if there is a connection between the 2 messages, but I can see
   the spinlock bug always comes on CPU #3.
  
   Does somebody have any idea?
 
  Your disk interrupts are directed to CPU 3, and so log I/O completion
  occurs on that CPU.
 
CPU0   CPU1   CPU2   CPU3
   0:100  0  04583704   IO-APIC-edge  timer
   1:  0  0  0  2   IO-APIC-edge  i8042
   4:  0  0  03878668   IO-APIC-edge  serial
.
  14:3072118  0  0181   IO-APIC-edge  ide0
.
  52:  0  0  0  213052723   IO-APIC-fasteoi   eth1
  53:  0  0  0   91913759   IO-APIC-fasteoi   eth2
 100:  0  0  0   16776910   IO-APIC-fasteoi   eth0

 
 Maybe.
 I have 3 XFS filesystems on this system, on 3 different sources:
 
 1. 200G on one IDE HDD.
 2. 2x200G mirror on 1 IDE + 1 SATA HDD.
 3. 4x3.3TB stripe on NBD.
 
 The NBD serves through eth1, whose interrupts are on CPU3, but ide0 is on
 the CPU0.

I'd say your NBD-based XFS filesystem is having trouble.

  Are you using XFS on a NBD?
 
 Yes, on the 3rd source.

Ok, I've never heard of a problem like this before, and you are doing
something that very few people are doing (i.e. XFS on NBD). Hence I'd start
by suspecting a bug in the NBD driver.

   Dec 16 12:08:36 dy-base RSP: 0018:81011fdedbc0  EFLAGS: 00010002
   Dec 16 12:08:36 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:
   
  Anyone recognise that pattern?
 
 I think I have one idea.
 This issue sometimes stops the 5-second automatic restart on crash, which
 suggests possible memory corruption, especially if the bug occurs in the IRQ
 handling :-)
 I have a lot of logs about this issue, and RAX and RBX are always the same.

And is this the only place where you see the problem? Or are there
other stack traces that you see this in as well?

  This implies a spinlock inside a wait_queue_head_t is corrupt.
 
  What type of system do you have, and what sort of
  workload are you running?
 
 OS: Fedora 5, 64-bit.
 HW: dual Xeon with HT, 4GB RAM.
 (The min_free_kbytes limit is set to 128000, because the e1000 driver
 sometimes runs out of the reserved memory during IRQ handling.)

That does not sound good. What happens when it does run out of memory?
Is that when you start to see the above corruptions?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: xfslogd-spinlock bug?

2006-12-16 Thread Haar János
:38 dy-base 00 00 41 8b 54 24 04 41 89
Dec 16 12:08:38 dy-base
Dec 16 12:08:38 dy-base RIP
Dec 16 12:08:38 dy-base  [] spin_bug+0x69/0xdf
Dec 16 12:08:38 dy-base  RSP 
Dec 16 12:08:38 dy-base
Dec 16 12:08:38 dy-base Kernel panic - not syncing: Fatal exception
Dec 16 12:08:38 dy-base
Dec 16 12:08:38 dy-base Rebooting in 5 seconds..




- Original Message - 
From: "Haar János" <[EMAIL PROTECTED]>
To: "Justin Piszcz" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; 
Sent: Wednesday, December 13, 2006 2:11 AM
Subject: Re: xfslogd-spinlock bug?


> Hello, Justin,
>
> This is a 64bit system.
>
> But I cannot understand, what is so curious about it? :-)
>
> I am not a kernel developer or a C programmer, but the long pointers
> tell me it is 64-bit.
> Or am I on the wrong track? :-)
>
> Anyway, this issue happens about daily, or at most every 2-3 days.
> But I have no idea what causes it exactly.
> 2.6.16.18 was stable for me for a long time, then one day it started
> playing tricks on me, and it happens more and more often.
> That's why I suspect some bad part of the (14TB) FS on the disks.
>
> (This fs has a lot of errors that xfs_repair cannot correct, but
> anyway this is a production system, and it works well, except for this issue.
> Some months ago I replaced the parity disk in one of the RAID4
> arrays, and the next day, during the resync process, another disk died.
> I almost lost everything, but thanks to RAID4 and mdadm, I
> successfully recovered a lot of data from the 1-day-older parity-only
> drive.
> This was really bad, and it left some scars on the fs.)
>
> This issue looks to me like a race condition between the CPUs (2x Xeon
> with HT).
>
> Am I right? :-)
>
> Thanks,
>
> Janos
>
>
>
>
> - Original Message - 
> From: "Justin Piszcz" <[EMAIL PROTECTED]>
> To: "Haar János" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>; 
> Sent: Tuesday, December 12, 2006 3:32 PM
> Subject: Re: xfslogd-spinlock bug?
>
>
> I'm not sure what is causing this problem but I was curious is this on a
> 32bit or 64bit platform?
>
> Justin.
>
> On Tue, 12 Dec 2006, Haar János wrote:
>
> > Hello, list,
> >
> > I am the "big red button man" with the one big 14TB xfs, if somebody can
> > remember me. :-)
> >
> > Now I found something in 2.6.16.18, and tried 2.6.18.4 and
> > 2.6.19, but the bug still exists:
> >
> > Dec 11 22:47:21 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> > Dec 11 22:47:21 dy-base general protection fault:  [1]
> > Dec 11 22:47:21 dy-base SMP
> > Dec 11 22:47:21 dy-base
> > Dec 11 22:47:21 dy-base CPU 3
> > Dec 11 22:47:21 dy-base
> > Dec 11 22:47:21 dy-base Modules linked in:
> > Dec 11 22:47:21 dy-base  nbd
> > Dec 11 22:47:21 dy-base  rd
> > Dec 11 22:47:21 dy-base  netconsole
> > Dec 11 22:47:21 dy-base  e1000
> > Dec 11 22:47:21 dy-base  video
> > Dec 11 22:47:21 dy-base
> > Dec 11 22:47:21 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> > Dec 11 22:47:21 dy-base RIP: 0010:[]
> > Dec 11 22:47:21 dy-base  [] spin_bug+0x69/0xdf
> > Dec 11 22:47:21 dy-base RSP: 0018:81011fb89bc0  EFLAGS: 00010002
> > Dec 11 22:47:21 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:
> > 
> > Dec 11 22:47:21 dy-base RDX: 808137a0 RSI: 0082 RDI:
> > 0001
> > Dec 11 22:47:21 dy-base RBP: 81011fb89be0 R08: 00026a70 R09:
> > 6b6b6b6b
> > Dec 11 22:47:21 dy-base R10: 0082 R11: 81000584d380 R12:
> > 8100db92ad80
> > Dec 11 22:47:21 dy-base R13: 80642dc6 R14:  R15:
> > 0003
> > Dec 11 22:47:21 dy-base FS:  ()
> > GS:81011fc76b90() knlGS:
> > Dec 11 22:47:21 dy-base CS:  0010 DS: 0018 ES: 0018 CR0:
8005003b
> > Dec 11 22:47:21 dy-base CR2: 2ba00770 CR3: 000108c05000 CR4:
> > 06e0
> > Dec 11 22:47:21 dy-base Process xfslogd/3 (pid: 317, threadinfo
> > 81011fb88000, task 81011fa7f830)
> > Dec 11 22:47:21 dy-base Stack:
> > Dec 11 22:47:21 dy-base  81011fb89be0
> > Dec 11 22:47:21 dy-base  8100db92ad80
> > Dec 11 22:47:21 dy-base  
> > Dec 11 22:47:21 dy-base  
> > Dec 11 22:47:21 dy-base
> > Dec 11 22:47:2


Re: xfslogd-spinlock bug?

2006-12-12 Thread Haar János
Hello, Justin,

This is a 64bit system.

But I cannot understand, what is so curious about it? :-)

I am not a kernel developer or a C programmer, but the long pointers
tell me it is 64-bit.
Or am I on the wrong track? :-)

Anyway, this issue happens about daily, or at most every 2-3 days.
But I have no idea what causes it exactly.
2.6.16.18 was stable for me for a long time, then one day it started playing
tricks on me, and it happens more and more often.
That's why I suspect some bad part of the (14TB) FS on the disks.

(This fs has a lot of errors that xfs_repair cannot correct, but anyway this
is a production system, and it works well, except for this issue.
Some months ago I replaced the parity disk in one of the RAID4
arrays, and the next day, during the resync process, another disk died.
I almost lost everything, but thanks to RAID4 and mdadm, I successfully
recovered a lot of data from the 1-day-older parity-only drive.
This was really bad, and it left some scars on the fs.)

This issue looks to me like a race condition between the CPUs (2x Xeon HT).

Am I right? :-)

Thanks,

Janos




- Original Message - 
From: "Justin Piszcz" <[EMAIL PROTECTED]>
To: "Haar János" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; 
Sent: Tuesday, December 12, 2006 3:32 PM
Subject: Re: xfslogd-spinlock bug?


I'm not sure what is causing this problem but I was curious is this on a
32bit or 64bit platform?

Justin.

On Tue, 12 Dec 2006, Haar János wrote:

> Hello, list,
>
> I am the "big red button man" with the one big 14TB xfs, if somebody can
> remember me. :-)
>
> Now I found something in 2.6.16.18, and tried 2.6.18.4 and
> 2.6.19, but the bug still exists:
>
> Dec 11 22:47:21 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> Dec 11 22:47:21 dy-base general protection fault:  [1]
> Dec 11 22:47:21 dy-base SMP
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base CPU 3
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base Modules linked in:
> Dec 11 22:47:21 dy-base  nbd
> Dec 11 22:47:21 dy-base  rd
> Dec 11 22:47:21 dy-base  netconsole
> Dec 11 22:47:21 dy-base  e1000
> Dec 11 22:47:21 dy-base  video
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> Dec 11 22:47:21 dy-base RIP: 0010:[]
> Dec 11 22:47:21 dy-base  [] spin_bug+0x69/0xdf
> Dec 11 22:47:21 dy-base RSP: 0018:81011fb89bc0  EFLAGS: 00010002
> Dec 11 22:47:21 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:
> 
> Dec 11 22:47:21 dy-base RDX: 808137a0 RSI: 0082 RDI:
> 0001
> Dec 11 22:47:21 dy-base RBP: 81011fb89be0 R08: 00026a70 R09:
> 6b6b6b6b
> Dec 11 22:47:21 dy-base R10: 0082 R11: 81000584d380 R12:
> 8100db92ad80
> Dec 11 22:47:21 dy-base R13: 80642dc6 R14:  R15:
> 0003
> Dec 11 22:47:21 dy-base FS:  ()
> GS:81011fc76b90() knlGS:
> Dec 11 22:47:21 dy-base CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> Dec 11 22:47:21 dy-base CR2: 2ba00770 CR3: 000108c05000 CR4:
> 06e0
> Dec 11 22:47:21 dy-base Process xfslogd/3 (pid: 317, threadinfo
> 81011fb88000, task 81011fa7f830)
> Dec 11 22:47:21 dy-base Stack:
> Dec 11 22:47:21 dy-base  81011fb89be0
> Dec 11 22:47:21 dy-base  8100db92ad80
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base  81011fb89c10
> Dec 11 22:47:21 dy-base  803f3bdc
> Dec 11 22:47:21 dy-base  0282
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base  81011fb89c30
> Dec 11 22:47:21 dy-base  805e7f2b
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base Call Trace:
> Dec 11 22:47:21 dy-base  [] _raw_spin_lock+0x23/0xf1
> Dec 11 22:47:21 dy-base  [] _spin_lock_irqsave+0x11/0x18
> Dec 11 22:47:21 dy-base  [] __wake_up+0x22/0x50
> Dec 11 22:47:21 dy-base  [] xfs_buf_unpin+0x21/0x23
> Dec 11 22:47:21 dy-base  [] xfs_buf_item_unpin+0x2e/0xa6
> Dec 11 22:47:21 dy-base  []
> xfs_trans_chunk_committed+0xc3/0xf7
> Dec 11 22:47:21 dy-base  []
xfs_trans_committed+0x49/0xde
> Dec 11 22:47:21 dy-base  []
> xlog_state_do_callback+0x185/0x33f
> Dec 11 22:47:21 dy-base  [] xlog_iodone+0x104/0x131
> Dec 11 22:47:22 dy-base  []
xfs_buf_iodone_work+0x1a/0x3e
> Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
> Dec 11 22:47:22 dy-base  [] run_workqueue+0xa8/0xf8
> Dec 11 22:47:22 dy-base  [] xfs_buf_iodone_work+0x0/0x3e
> Dec 11 22:47:2

Re: xfslogd-spinlock bug?

2006-12-12 Thread Justin Piszcz
I'm not sure what is causing this problem but I was curious is this on a 
32bit or 64bit platform?

Justin.

On Tue, 12 Dec 2006, Haar János wrote:

> Hello, list,
> 
> I am the "big red button man" with the one big 14TB xfs, if somebody can
> remember me. :-)
> 
> Now I found something in 2.6.16.18, and tried 2.6.18.4 and
> 2.6.19, but the bug still exists:
> 
> Dec 11 22:47:21 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> Dec 11 22:47:21 dy-base general protection fault:  [1]
> Dec 11 22:47:21 dy-base SMP
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base CPU 3
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base Modules linked in:
> Dec 11 22:47:21 dy-base  nbd
> Dec 11 22:47:21 dy-base  rd
> Dec 11 22:47:21 dy-base  netconsole
> Dec 11 22:47:21 dy-base  e1000
> Dec 11 22:47:21 dy-base  video
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> Dec 11 22:47:21 dy-base RIP: 0010:[]
> Dec 11 22:47:21 dy-base  [] spin_bug+0x69/0xdf
> Dec 11 22:47:21 dy-base RSP: 0018:81011fb89bc0  EFLAGS: 00010002
> Dec 11 22:47:21 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:
> 
> Dec 11 22:47:21 dy-base RDX: 808137a0 RSI: 0082 RDI:
> 0001
> Dec 11 22:47:21 dy-base RBP: 81011fb89be0 R08: 00026a70 R09:
> 6b6b6b6b
> Dec 11 22:47:21 dy-base R10: 0082 R11: 81000584d380 R12:
> 8100db92ad80
> Dec 11 22:47:21 dy-base R13: 80642dc6 R14:  R15:
> 0003
> Dec 11 22:47:21 dy-base FS:  ()
> GS:81011fc76b90() knlGS:
> Dec 11 22:47:21 dy-base CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> Dec 11 22:47:21 dy-base CR2: 2ba00770 CR3: 000108c05000 CR4:
> 06e0
> Dec 11 22:47:21 dy-base Process xfslogd/3 (pid: 317, threadinfo
> 81011fb88000, task 81011fa7f830)
> Dec 11 22:47:21 dy-base Stack:
> Dec 11 22:47:21 dy-base  81011fb89be0
> Dec 11 22:47:21 dy-base  8100db92ad80
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base  81011fb89c10
> Dec 11 22:47:21 dy-base  803f3bdc
> Dec 11 22:47:21 dy-base  0282
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base  
> Dec 11 22:47:21 dy-base  81011fb89c30
> Dec 11 22:47:21 dy-base  805e7f2b
> Dec 11 22:47:21 dy-base
> Dec 11 22:47:21 dy-base Call Trace:
> Dec 11 22:47:21 dy-base  [] _raw_spin_lock+0x23/0xf1
> Dec 11 22:47:21 dy-base  [] _spin_lock_irqsave+0x11/0x18
> Dec 11 22:47:21 dy-base  [] __wake_up+0x22/0x50
> Dec 11 22:47:21 dy-base  [] xfs_buf_unpin+0x21/0x23
> Dec 11 22:47:21 dy-base  [] xfs_buf_item_unpin+0x2e/0xa6
> Dec 11 22:47:21 dy-base  []
> xfs_trans_chunk_committed+0xc3/0xf7
> Dec 11 22:47:21 dy-base  [] xfs_trans_committed+0x49/0xde
> Dec 11 22:47:21 dy-base  []
> xlog_state_do_callback+0x185/0x33f
> Dec 11 22:47:21 dy-base  [] xlog_iodone+0x104/0x131
> Dec 11 22:47:22 dy-base  [] xfs_buf_iodone_work+0x1a/0x3e
> Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
> Dec 11 22:47:22 dy-base  [] run_workqueue+0xa8/0xf8
> Dec 11 22:47:22 dy-base  [] xfs_buf_iodone_work+0x0/0x3e
> Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
> Dec 11 22:47:22 dy-base  [] worker_thread+0xfb/0x134
> Dec 11 22:47:22 dy-base  [] default_wake_function+0x0/0xf
> Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
> Dec 11 22:47:22 dy-base  [] kthread+0xd8/0x10b
> Dec 11 22:47:22 dy-base  [] schedule_tail+0x45/0xa6
> Dec 11 22:47:22 dy-base  [] child_rip+0xa/0x12
> Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
> Dec 11 22:47:22 dy-base  [] kthread+0x0/0x10b
> Dec 11 22:47:22 dy-base  [] child_rip+0x0/0x12
> Dec 11 22:47:22 dy-base
> Dec 11 22:47:22 dy-base
> Dec 11 22:47:22 dy-base Code: 8b 83 0c 01 00 00 48 8d 8b 98 02 00 00 41 8b 54 24 04 41 89
> Dec 11 22:47:22 dy-base
> Dec 11 22:47:22 dy-base RIP
> Dec 11 22:47:22 dy-base  [] spin_bug+0x69/0xdf
> Dec 11 22:47:22 dy-base  RSP 
> Dec 11 22:47:22 dy-base
> Dec 11 22:47:22 dy-base Kernel panic - not syncing: Fatal exception
> Dec 11 22:47:22 dy-base
> Dec 11 22:47:22 dy-base Rebooting in 5 seconds..
> 
> After this, sometimes the server reboots normally, but sometimes hangs, no
> console, no sysreq, no nothing.
> 
> 

Re: xfslogd-spinlock bug?

2006-12-12 Thread Justin Piszcz
I'm not sure what is causing this problem but I was curious is this on a 
32bit or 64bit platform?

Justin.

On Tue, 12 Dec 2006, Haar János wrote:

 Hello, list,
 
 I am the big red button men with the one big 14TB xfs, if somebody can
 remember me. :-)
 
 Now i found something in the 2.6.16.18, and try the 2.6.18.4, and the
 2.6.19, but the bug still exists:
 

Re: xfslogd-spinlock bug?

2006-12-12 Thread Haar János
Hello, Justin,

This is a 64-bit system.

But I don't quite understand — what is curious about that? :-)

I am not a kernel developer or a C programmer, but the long pointer values
suggest 64-bit to me. Or am I on the wrong track? :-)

Anyway, this issue hits me roughly daily, or at most every 2-3 days,
but I have no idea what exactly causes it.
2.6.16.18 was stable for me for a long time, then one day it started playing
tricks on me, and it happens more and more often.
That is why I suspect some bad part of the (14TB) filesystem on the disks.

(This FS has a lot of errors that xfs_repair cannot correct, but it is a
production system, and it works well except for this issue.
Some months ago I replaced the parity disk in one of the RAID4 arrays, and
the next day, during the resync, another disk died.
I almost lost everything, but thanks to RAID4 and mdadm I successfully
recovered a lot of data with the one-day-old parity-only drive.
It was really bad, and it left some scars on the FS.)

This issue looks to me like a race condition between the CPUs. (2x Xeon HT)

Am I right? :-)

Thanks,

Janos





xfslogd-spinlock bug?

2006-12-11 Thread Haar János
Hello, list,

I am the "big red button man" with the one big 14TB XFS, if somebody
remembers me. :-)

I first hit this on 2.6.16.18, and tried 2.6.18.4 and 2.6.19, but the bug
still exists:

Dec 11 22:47:21 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
Dec 11 22:47:21 dy-base general protection fault:  [1]
Dec 11 22:47:21 dy-base SMP
Dec 11 22:47:21 dy-base
Dec 11 22:47:21 dy-base CPU 3
Dec 11 22:47:21 dy-base
Dec 11 22:47:21 dy-base Modules linked in: nbd rd netconsole e1000 video
Dec 11 22:47:21 dy-base
Dec 11 22:47:21 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
Dec 11 22:47:21 dy-base RIP: 0010:[]
Dec 11 22:47:21 dy-base  [] spin_bug+0x69/0xdf
Dec 11 22:47:21 dy-base RSP: 0018:81011fb89bc0  EFLAGS: 00010002
Dec 11 22:47:21 dy-base RAX: 0033 RBX: 6b6b6b6b6b6b6b6b RCX:

Dec 11 22:47:21 dy-base RDX: 808137a0 RSI: 0082 RDI:
0001
Dec 11 22:47:21 dy-base RBP: 81011fb89be0 R08: 00026a70 R09:
6b6b6b6b
Dec 11 22:47:21 dy-base R10: 0082 R11: 81000584d380 R12:
8100db92ad80
Dec 11 22:47:21 dy-base R13: 80642dc6 R14:  R15:
0003
Dec 11 22:47:21 dy-base FS:  ()
GS:81011fc76b90() knlGS:
Dec 11 22:47:21 dy-base CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
Dec 11 22:47:21 dy-base CR2: 2ba00770 CR3: 000108c05000 CR4:
06e0
Dec 11 22:47:21 dy-base Process xfslogd/3 (pid: 317, threadinfo
81011fb88000, task 81011fa7f830)
Dec 11 22:47:21 dy-base Stack:
Dec 11 22:47:21 dy-base  81011fb89be0
Dec 11 22:47:21 dy-base  8100db92ad80
Dec 11 22:47:21 dy-base  
Dec 11 22:47:21 dy-base  
Dec 11 22:47:21 dy-base
Dec 11 22:47:21 dy-base  81011fb89c10
Dec 11 22:47:21 dy-base  803f3bdc
Dec 11 22:47:21 dy-base  0282
Dec 11 22:47:21 dy-base  
Dec 11 22:47:21 dy-base
Dec 11 22:47:21 dy-base  
Dec 11 22:47:21 dy-base  
Dec 11 22:47:21 dy-base  81011fb89c30
Dec 11 22:47:21 dy-base  805e7f2b
Dec 11 22:47:21 dy-base
Dec 11 22:47:21 dy-base Call Trace:
Dec 11 22:47:21 dy-base  [] _raw_spin_lock+0x23/0xf1
Dec 11 22:47:21 dy-base  [] _spin_lock_irqsave+0x11/0x18
Dec 11 22:47:21 dy-base  [] __wake_up+0x22/0x50
Dec 11 22:47:21 dy-base  [] xfs_buf_unpin+0x21/0x23
Dec 11 22:47:21 dy-base  [] xfs_buf_item_unpin+0x2e/0xa6
Dec 11 22:47:21 dy-base  []
xfs_trans_chunk_committed+0xc3/0xf7
Dec 11 22:47:21 dy-base  [] xfs_trans_committed+0x49/0xde
Dec 11 22:47:21 dy-base  []
xlog_state_do_callback+0x185/0x33f
Dec 11 22:47:21 dy-base  [] xlog_iodone+0x104/0x131
Dec 11 22:47:22 dy-base  [] xfs_buf_iodone_work+0x1a/0x3e
Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
Dec 11 22:47:22 dy-base  [] run_workqueue+0xa8/0xf8
Dec 11 22:47:22 dy-base  [] xfs_buf_iodone_work+0x0/0x3e
Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
Dec 11 22:47:22 dy-base  [] worker_thread+0xfb/0x134
Dec 11 22:47:22 dy-base  [] default_wake_function+0x0/0xf
Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
Dec 11 22:47:22 dy-base  [] kthread+0xd8/0x10b
Dec 11 22:47:22 dy-base  [] schedule_tail+0x45/0xa6
Dec 11 22:47:22 dy-base  [] child_rip+0xa/0x12
Dec 11 22:47:22 dy-base  [] worker_thread+0x0/0x134
Dec 11 22:47:22 dy-base  [] kthread+0x0/0x10b
Dec 11 22:47:22 dy-base  [] child_rip+0x0/0x12
Dec 11 22:47:22 dy-base
Dec 11 22:47:22 dy-base
Dec 11 22:47:22 dy-base Code: 8b 83 0c 01 00 00 48 8d 8b 98 02 00 00 41 8b 54 24 04 41 89
Dec 11 22:47:22 dy-base RIP
Dec 11 22:47:22 dy-base  [] spin_bug+0x69/0xdf
Dec 11 22:47:22 dy-base  RSP 
Dec 11 22:47:22 dy-base
Dec 11 22:47:22 dy-base Kernel panic - not syncing: Fatal exception
Dec 11 22:47:22 dy-base
Dec 11 22:47:22 dy-base Rebooting in 5 seconds..

After this, sometimes the server reboots normally, but sometimes it hangs: no
console, no sysrq, nothing.

This is a "simple" crash, with no serious data loss or anything like that.

Can somebody help me track down the problem?

Thanks,
Janos Haar



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

