Re: 2.6.24-rc4-mm1: some issues on sparc64
On Sun, 09 Dec 2007 00:45:17 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Sat, 8 Dec 2007 10:22:39 -0800
>
> > That's
> >
> > 	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
> >
> > at the end of journal_unmap_buffer().
> >
> > I don't recall seeing that before and I can't think of anything we've
> > done recently which could cause it, sorry.
>
> If the per-cpu data patches are in the -mm tree that is the first
> place I would start looking at for possible cause.

They aren't. The dust hadn't settled enough on those when Christoph shot
through on vacation.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-mm1: some issues on sparc64
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Sat, 8 Dec 2007 10:22:39 -0800

> That's
>
> 	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
>
> at the end of journal_unmap_buffer().
>
> I don't recall seeing that before and I can't think of anything we've
> done recently which could cause it, sorry.

If the per-cpu data patches are in the -mm tree that is the first
place I would start looking at for possible cause.
Re: 2.6.24-rc4-mm1: some issues on sparc64
On Sat, 8 Dec 2007 19:20:28 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

> The box is sun ultra 60 (dual sparc64). This was caught when
> system (gentoo) was emerging some package.
>
> [27006.402237] kernel BUG at fs/jbd/transaction.c:1894!

That's

	J_ASSERT_BH(bh, !buffer_jbddirty(bh));

at the end of journal_unmap_buffer().

I don't recall seeing that before and I can't think of anything we've
done recently which could cause it, sorry.

> [27006.402268] \|/ \|/
> [27006.402274] "@'/ .. \`@"
> [27006.402279] /_| \__/ |_\
> [27006.402285] \__U_/

x86 needs that.

> [27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
> [27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
> [27006.402579] TPC: <journal_invalidatepage+0x3d4/0x460>
> [27006.402593] g0: 0002 g1: g2: 0001 g3: f800a7d9
> [27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
> [27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
> [27006.402644] o4: 008a2940 o5: sp: f800a7d92c91 ret_pc: 0053b1c4
> [27006.402665] RPC: <journal_invalidatepage+0x3cc/0x460>
> [27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3:
> [27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
> [27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2: i3: 00727000
> [27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
> [27006.402763] I7: <ext3_invalidatepage+0x3c/0x60>
> [27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
> [27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
> [27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
> [27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
> [27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
> [27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
> [27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
> [27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
> [27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
> [27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
> [27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
> [27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
> [27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
> [27006.403102] Instruction DUMP: 92102766 7ffbbeaf 90122260 <91d02005> 92102780 7ffbbeab 90122260 91d02005 7ffbbea8
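
For readers following the thread, here is a minimal userspace sketch of what an assertion of the form J_ASSERT_BH(bh, !buffer_jbddirty(bh)) is checking. It is a toy model, not the kernel code: the struct, the bit names, and both functions (all prefixed model_ or MODEL_) are hypothetical and exist only for illustration. The idea it shows is that by the time a buffer is unmapped it should no longer carry the journal-dirty flag. The real assertion stops the kernel with the BUG reported above, where this model just returns false.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state bits; these are NOT the kernel's definitions. */
enum model_bh_bit {
	MODEL_BH_DIRTY    = 1u << 0,	/* ordinary dirty data */
	MODEL_BH_JBDDIRTY = 1u << 1	/* dirty data still owned by the journal */
};

/* Hypothetical stand-in for a buffer head: only the state word is modelled. */
struct model_buffer {
	unsigned int state;
};

/* Models the predicate used in the assertion. */
static bool model_buffer_jbddirty(const struct model_buffer *bh)
{
	return (bh->state & MODEL_BH_JBDDIRTY) != 0;
}

/*
 * Models the check at the end of the unmap path: returns true when the
 * invariant holds, false (with a message) when the buffer is still
 * journal-dirty.
 */
static bool model_unmap_invariant_holds(const struct model_buffer *bh)
{
	if (model_buffer_jbddirty(bh)) {
		fprintf(stderr, "model: buffer still journal-dirty at unmap\n");
		return false;
	}
	return true;
}
```

In this model a buffer with state 0 passes the check and a buffer with MODEL_BH_JBDDIRTY set fails it. The report corresponds to the second case, though the sketch says nothing about how the real buffer got into that state.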
Re: 2.6.24-rc4-mm1: some issues on sparc64
Hello,

The box is sun ultra 60 (dual sparc64). This was caught when
system (gentoo) was emerging some package.

[27006.402237] kernel BUG at fs/jbd/transaction.c:1894!
[27006.402268] \|/ \|/
[27006.402274] "@'/ .. \`@"
[27006.402279] /_| \__/ |_\
[27006.402285] \__U_/
[27006.402298] rm(4713): Kernel bad sw trap 5 [#1]
[27006.402538] TSTATE: 009911009605 TPC: 0053b1cc TNPC: 0053b1d0 Y: Not tainted
[27006.402579] TPC: <journal_invalidatepage+0x3d4/0x460>
[27006.402593] g0: 0002 g1: g2: 0001 g3: f800a7d9
[27006.402610] g4: f800b54ea460 g5: f8007f832000 g6: f800a7d9 g7: 0076d868
[27006.402627] o0: 0072b660 o1: 0766 o2: 0002 o3: 0001
[27006.402644] o4: 008a2940 o5: sp: f800a7d92c91 ret_pc: 0053b1c4
[27006.402665] RPC: <journal_invalidatepage+0x3cc/0x460>
[27006.402679] l0: f800afbf4070 l1: 0069511c l2: 2000 l3:
[27006.402696] l4: 0001 l5: f800ba4cb730 l6: f800bf1cd338 l7: 0001
[27006.402713] i0: f800bf1cd000 i1: 000201db2708 i2: i3: 00727000
[27006.402730] i4: 0020 i5: f800bf1cd028 i6: f800a7d92d51 i7: 00529254
[27006.402763] I7: <ext3_invalidatepage+0x3c/0x60>
[27006.402776] Caller[00529254]: ext3_invalidatepage+0x3c/0x60
[27006.402800] Caller[004b22fc]: do_invalidatepage+0x24/0x60
[27006.402826] Caller[004b29c4]: truncate_complete_page+0x6c/0x80
[27006.402849] Caller[004b2a6c]: truncate_inode_pages_range+0x94/0x440
[27006.402872] Caller[004b2e2c]: truncate_inode_pages+0x14/0x20
[27006.402894] Caller[00529888]: ext3_delete_inode+0x10/0x160
[27006.402918] Caller[004e7ca0]: generic_delete_inode+0x88/0x120
[27006.402949] Caller[004e7e60]: generic_drop_inode+0x128/0x1c0
[27006.402971] Caller[004e75d4]: iput+0x7c/0xa0
[27006.402992] Caller[004dd680]: do_unlinkat+0x108/0x1a0
[27006.403024] Caller[004dd884]: sys_unlinkat+0x2c/0x60
[27006.403047] Caller[004062d4]: linux_sparc_syscall32+0x3c/0x40
[27006.403081] Caller[f7e7d0ec]: 0xf7e7d0f4
[27006.403102] Instruction DUMP: 92102766 7ffbbeaf 90122260 <91d02005> 92102780 7ffbbeab 90122260 91d02005 7ffbbea8

After this happened, one (out of two) cpu got consumed (in kernel space)
trying to complete io. Process stuck in D state, wchan says it was in
sync_buffer() which you can see also in 'SysRq : Show Blocked State' below.

[27422.874858] SysRq : Show Blocked State
[27422.877086] task PC stack pid father
[27422.877143] rm D 004f8f68 0 4966 4860
[27422.877160] Call Trace:
[27422.877167] [00692840] io_schedule+0x28/0x40
[27422.877182] [004f8f68] sync_buffer+0x50/0x60
[27422.877198] [00692a58] __wait_on_bit_lock+0x60/0xa0
[27422.877213] [00692ae4] out_of_line_wait_on_bit_lock+0x4c/0x60
[27422.877228] [004f9328] __lock_buffer+0x30/0x40
[27422.877242] [0053b024] journal_invalidatepage+0x22c/0x460
[27422.877268] [00529254] ext3_invalidatepage+0x3c/0x60
[27422.877297] [004b22fc] do_invalidatepage+0x24/0x60
[27422.877316] [004b29c4] truncate_complete_page+0x6c/0x80
[27422.877332] [004b2a6c] truncate_inode_pages_range+0x94/0x440
[27422.877349] [004b2e2c] truncate_inode_pages+0x14/0x20
[27422.877364] [00529888] ext3_delete_inode+0x10/0x160
[27422.877381] [004e7ca0] generic_delete_inode+0x88/0x120
[27422.877405] [004e7e60] generic_drop_inode+0x128/0x1c0
[27422.877421] [004e75d4] iput+0x7c/0xa0
[27422.877435] [004dd680] do_unlinkat+0x108/0x1a0

The downside is that it is unclear to me how to reproduce that - it just
happens sometimes.

Also from time to time I get warnings about tcp_fastretrans_alert(), but
it seems they do no harm.

[30014.779310] WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
[30014.781630] Call Trace:
[30014.783976] [006551c8] tcp_fastretrans_alert+0x70/0xe00
[30014.786312] [00657c60] tcp_ack+0x988/0x10c0
[30014.788702] [0065bd80] tcp_rcv_established+0x408/0x840
[30014.791074] [006634dc] tcp_v4_do_rcv+0xe4/0x4a0
[30014.793440] [0066632c] tcp_v4_rcv+0xa34/0xb20
[30014.795762] [00643a10] ip_local_deliver+0xd8/0x2c0
[30014.798102] [00643ed4] ip_rcv+0x2dc/0x640
[30014.800431] [0062424c] netif_receive_skb+0x334/0x400
[30014.802762] [00627228] process_backlog+0x90/0x140
[30014.805097] [00626d28] net_rx_action+0x190/0x260
[30014.807462] [00475ea8] __do_softirq+0x90/0x140
[30014.809794] [00475fe0] do_softirq+0x88/0xa0
[30014.812134] [0047608c] irq_exit+0x94/0xc0
[30014.814453] [0042f53c] handler_irq+0xa4/0xc0
[30014.816800]
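
As an aid to reading the traces above: each entry such as ext3_invalidatepage+0x3c/0x60 follows the kernel's symbol+offset/size convention, meaning the address lies 0x3c bytes into a function that is 0x60 bytes long. The following is a small sketch, assuming that textual format, of how such an entry could be split into its parts. The struct trace_entry and the function parse_trace_entry() are hypothetical names made up for this illustration, not part of any kernel or libc API.

```c
#include <stdio.h>

/* Hypothetical container for one decoded "symbol+0xoff/0xsize" entry. */
struct trace_entry {
	char symbol[64];	/* function name, e.g. "ext3_invalidatepage" */
	unsigned long offset;	/* byte offset of the address within the function */
	unsigned long size;	/* total size of the function in bytes */
};

/*
 * Parse text of the form "symbol+0xoff/0xsize".
 * Returns 1 on success, 0 if the text does not match that shape
 * (for example a bare address such as "0xf7e7d0f4").
 */
static int parse_trace_entry(const char *text, struct trace_entry *out)
{
	/*
	 * %63[^+] reads the symbol name up to the '+' (bounded to fit the
	 * buffer); %lx accepts hexadecimal with an optional 0x prefix.
	 */
	if (sscanf(text, "%63[^+]+%lx/%lx",
		   out->symbol, &out->offset, &out->size) != 3)
		return 0;
	return 1;
}
```

Calling parse_trace_entry("ext3_invalidatepage+0x3c/0x60", &e) should fill in the symbol name with offset 0x3c and size 0x60. The last Caller entry in the BUG trace, a bare user-space address with no symbol, does not match and returns 0.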