Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-09 Thread Andrew Morton
On Sun, 09 Dec 2007 00:45:17 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Sat, 8 Dec 2007 10:22:39 -0800
> 
> > That's
> > 
> > J_ASSERT_BH(bh, !buffer_jbddirty(bh));
> > 
> > at the end of journal_unmap_buffer().
> > 
> > I don't recall seeing that before and I can't think of anything we've
> > done recently which could cause it, sorry.
> 
> If the per-cpu data patches are in the -mm tree that is the first
> place I would start looking at for possible cause.

They aren't.  The dust hadn't settled enough on those when Christoph shot
through on vacation.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-09 Thread David Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Sat, 8 Dec 2007 10:22:39 -0800

> That's
> 
> J_ASSERT_BH(bh, !buffer_jbddirty(bh));
> 
> at the end of journal_unmap_buffer().
> 
> I don't recall seeing that before and I can't think of anything we've
> done recently which could cause it, sorry.

If the per-cpu data patches are in the -mm tree that is the first
place I would start looking at for possible cause.


Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-08 Thread Andrew Morton
On Sat, 8 Dec 2007 19:20:28 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

>   The box is sun ultra 60 (dual sparc64). This was caught when
> system (gentoo) was emerging some package. 
> 
> [27006.402237] kernel BUG at fs/jbd/transaction.c:1894!

That's

J_ASSERT_BH(bh, !buffer_jbddirty(bh));

at the end of journal_unmap_buffer().

I don't recall seeing that before and I can't think of anything we've
done recently which could cause it, sorry.

> [27006.402268]   \|/ ____ \|/
> [27006.402274]   "@'/ .. \`@"
> [27006.402279]   /_| \__/ |_\
> [27006.402285]      \__U_/

x86 needs that.

> [27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
> [27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
> [27006.402579] TPC: <journal_invalidatepage+0x3d4/0x460>
> [27006.402593] g0: 0002 g1:  g2: 0001 g3: f800a7d9
> [27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
> [27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
> [27006.402644] o4: 008a2940 o5:  sp: f800a7d92c91 ret_pc: 0053b1c4
> [27006.402665] RPC: <journal_invalidatepage+0x3cc/0x460>
> [27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3: 
> [27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
> [27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2:  i3: 00727000
> [27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
> [27006.402763] I7: <ext3_invalidatepage+0x3c/0x60>
> [27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
> [27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
> [27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
> [27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
> [27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
> [27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
> [27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
> [27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
> [27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
> [27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
> [27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
> [27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
> [27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
> [27006.403102] Instruction DUMP: 92102766  7ffbbeaf  90122260 <91d02005>  92102780  7ffbbeab  90122260  91d02005  7ffbbea8


Re: 2.6.24-rc4-mm1: some issues on sparc64

2007-12-08 Thread Mariusz Kozlowski
Hello,

The box is sun ultra 60 (dual sparc64). This was caught when
system (gentoo) was emerging some package. 

[27006.402237] kernel BUG at fs/jbd/transaction.c:1894!
[27006.402268]   \|/ ____ \|/
[27006.402274]   "@'/ .. \`@"
[27006.402279]   /_| \__/ |_\
[27006.402285]      \__U_/
[27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
[27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
[27006.402579] TPC: <journal_invalidatepage+0x3d4/0x460>
[27006.402593] g0: 0002 g1:  g2: 0001 g3: f800a7d9
[27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
[27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
[27006.402644] o4: 008a2940 o5:  sp: f800a7d92c91 ret_pc: 0053b1c4
[27006.402665] RPC: <journal_invalidatepage+0x3cc/0x460>
[27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3: 
[27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
[27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2:  i3: 00727000
[27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
[27006.402763] I7: <ext3_invalidatepage+0x3c/0x60>
[27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
[27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
[27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
[27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
[27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
[27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
[27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
[27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
[27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
[27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
[27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
[27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
[27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
[27006.403102] Instruction DUMP: 92102766  7ffbbeaf  90122260 <91d02005>  92102780  7ffbbeab  90122260  91d02005  7ffbbea8
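
Each frame in the dump above has the form symbol+offset/size, where the offset is the position inside the function and the size is the function's length. As a rough aid for comparing call chains between traces, the frames can be pulled out with a short script. This is only a sketch: the names FRAME_RE and parse_frames are made up for illustration, and the pattern assumes the bracketed-address layout seen in this report.

```python
import re

# Hypothetical helper, not part of any kernel tooling.
# Matches e.g. "Caller[00529254]: ext3_invalidatepage+0x3c/0x60"
# and          " [00692840] io_schedule+0x28/0x40".
# Timestamps such as "[27006.402776]" contain a dot and are skipped,
# as are frames that carry only a raw address and no symbol.
FRAME_RE = re.compile(
    r"\[(?P<addr>[0-9a-f]+)\]:?\s+"
    r"(?P<symbol>[A-Za-z_][A-Za-z0-9_.]*)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (symbol, offset, size) tuples in the order they appear."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append(
            (match["symbol"], int(match["offset"], 16), int(match["size"], 16))
        )
    return frames


if __name__ == "__main__":
    import sys

    # Usage sketch: dmesg | python3 frames.py
    for symbol, offset, size in parse_frames(sys.stdin.read()):
        print(f"{symbol} +{offset:#x} of {size:#x}")
```
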

After this happened, one (out of two) CPU got consumed (in kernel space)
trying to complete I/O. The process is stuck in D state; wchan says it was
in sync_buffer(), which you can also see in the 'SysRq : Show Blocked State'
output below.

[27422.874858] SysRq : Show Blocked State
[27422.877086]   task                PC stack   pid father
[27422.877143] rm            D 004f8f68     0  4966   4860
[27422.877160] Call Trace:
[27422.877167]  [00692840] io_schedule+0x28/0x40
[27422.877182]  [004f8f68] sync_buffer+0x50/0x60
[27422.877198]  [00692a58] __wait_on_bit_lock+0x60/0xa0
[27422.877213]  [00692ae4] out_of_line_wait_on_bit_lock+0x4c/0x60
[27422.877228]  [004f9328] __lock_buffer+0x30/0x40
[27422.877242]  [0053b024] journal_invalidatepage+0x22c/0x460
[27422.877268]  [00529254] ext3_invalidatepage+0x3c/0x60
[27422.877297]  [004b22fc] do_invalidatepage+0x24/0x60
[27422.877316]  [004b29c4] truncate_complete_page+0x6c/0x80
[27422.877332]  [004b2a6c] truncate_inode_pages_range+0x94/0x440
[27422.877349]  [004b2e2c] truncate_inode_pages+0x14/0x20
[27422.877364]  [00529888] ext3_delete_inode+0x10/0x160
[27422.877381]  [004e7ca0] generic_delete_inode+0x88/0x120
[27422.877405]  [004e7e60] generic_drop_inode+0x128/0x1c0
[27422.877421]  [004e75d4] iput+0x7c/0xa0
[27422.877435]  [004dd680] do_unlinkat+0x108/0x1a0
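
For anyone wanting to repeat the D-state and wchan check described above without SysRq, something along these lines should work from userspace. It is a minimal sketch, assuming a Linux /proc with per-process stat and wchan files (wchan is only meaningful when the kernel exports symbol names); parse_stat, read_wchan and blocked_tasks are illustrative names, not existing tools.

```python
import os


def parse_stat(stat_text):
    """Split one /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split around the first '(' and the last ')'.
    """
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def read_wchan(pid, proc_root="/proc"):
    """Return the wait channel name, '-' if empty, '?' if unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip() or "-"
    except OSError:
        return "?"


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for tasks in uninterruptible sleep (D)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            # The task exited between listdir() and open().
            continue
        if state == "D":
            yield pid, comm, read_wchan(pid, proc_root)


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Splitting on the last ')' matters because a command name can itself contain spaces or parentheses.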

The downside is that it is unclear to me how to reproduce this - it just
happens sometimes. Also, from time to time I get warnings about
tcp_fastretrans_alert(), but they seem to do no harm.

[30014.779310] WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
[30014.781630] Call Trace:
[30014.783976]  [006551c8] tcp_fastretrans_alert+0x70/0xe00
[30014.786312]  [00657c60] tcp_ack+0x988/0x10c0
[30014.788702]  [0065bd80] tcp_rcv_established+0x408/0x840
[30014.791074]  [006634dc] tcp_v4_do_rcv+0xe4/0x4a0
[30014.793440]  [0066632c] tcp_v4_rcv+0xa34/0xb20
[30014.795762]  [00643a10] ip_local_deliver+0xd8/0x2c0
[30014.798102]  [00643ed4] ip_rcv+0x2dc/0x640
[30014.800431]  [0062424c] netif_receive_skb+0x334/0x400
[30014.802762]  [00627228] process_backlog+0x90/0x140
[30014.805097]  [00626d28] net_rx_action+0x190/0x260
[30014.807462]  [00475ea8] __do_softirq+0x90/0x140
[30014.809794]  [00475fe0] do_softirq+0x88/0xa0
[30014.812134]  [0047608c] irq_exit+0x94/0xc0
[30014.814453]  [0042f53c] handler_irq+0xa4/0xc0
[30014.816800]  
