[PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform
From: Wang Dongsheng dongsheng.w...@freescale.com If softirq use hardirq stack, we will get kernel painc when a hard irq coming again during __do_softirq enable local irq to deal with softirq action. So we need to switch satck into softirq stack when invoke soft irq. Task--- | Task stack | Interrput-EXCEPTION-do_IRQ- ^| Hard irq stack || |irq_exit-__do_softirq-local_irq_enable-- --local_irq_disable | | Hard irq stack | | | Interrupt coming again | There will get a Interrupt nesting | Trace 1: Trap 900 Kernel stack overflow in process e8152f40, r1=e8e05ec0 CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432 task: e8152f40 ti: c080a000 task.ti: ef176000 NIP: c05bec04 LR: c0305590 CTR: 0010 REGS: e8e05e10 TRAP: 0901 Not tainted (3.13.0-rc3-03475-g2e3f85b) MSR: 00029000 CE,EE,ME CR: 22f22722 XER: 2000 GPR00: c0305590 e8e05ec0 e8152f40 c07e1e2c 00029000 00ec fffc 0010 GPR08: 007f b02539f3 a00ae278 NIP [c05bec04] _raw_spin_unlock_irqrestore+0x10/0x14 LR [c0305590] add_timer_randomness+0x60/0xfc Call Trace: [e8e05ec0] [c0305590] add_timer_randomness+0x60/0xfc (unreliable) [e8e05ee0] [c026c9a8] blk_update_bidi_request+0x64/0x94 [e8e05f00] [c026cd00] blk_end_bidi_request+0x20/0x7c [e8e05f20] [c032f21c] scsi_io_completion+0xe0/0x5e8 [e8e05f70] [c0272b84] blk_done_softirq+0x98/0xb8 [e8e05f90] [c004893c] __do_softirq+0xf8/0x1f8 [e8e05fe0] [c0048dbc] irq_exit+0xa4/0xc8 [e8e05ff0] [c000d5f4] call_do_irq+0x24/0x3c [ef177d50] [c00046ec] do_IRQ+0x8c/0xf8 [ef177d70] [c000f6dc] ret_from_except+0x0/0x18 --- Exception: 501 at lzo1x_1_do_compress+0x248/0x40c LR = lzo1x_1_compress+0x98/0x268 [ef177e30] [c07c7440] runqueues+0x0/0x540 (unreliable) [ef177e60] [] (null) [ef177ea0] [c0085a9c] lzo_compress_threadfn+0x6c/0x138 [ef177ef0] [c0062a00] kthread+0xc4/0xd8 [ef177f40] [c000f158] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 40a2fff0 4c00012c 2f89 419e000c 3860 4e800020 3861 4e800020 7c0004ac 3920 9123 7c800106 4e800020 7d201828 35290001 40810010 Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432 Call Trace: Rebooting in 180 seconds.. Trace 2: Trap 500 VFS: Mounted root (ext2 filesystem) on device 1:0. devtmpfs: mounted Freeing unused kernel memory: 268K (c079a000 - c07dd000) INIT: version 2.88 booting Starting udev udevd[1423]: starting version 182 random: nonblocking pool is initialized Kernel stack overflow in process e829ca80, r1=e8badf90 CPU: 0 PID: 1553 Comm: mount.sh Not tainted 3.13.0-rc1-148228-gea7ca7c #21 task: e829ca80 ti: c081c000 task.ti: e9d28000 NIP: c00434bc LR: c0043444 CTR: c0018cec REGS: e8badee0 TRAP: 0501 Not tainted (3.13.0-rc1-148228-gea7ca7c) MSR: 00029000 CE,EE,ME CR: 48222422 XER: 2000 GPR00: c00439a0 e8badf90 e829ca80 0001 e80cc780 0001 b92f44af GPR08: 0001 010ba000 010ba000 ddd3e6d1 48222422 NIP [c00434bc] __do_softirq+0x94/0x1f8 LR [c0043444] __do_softirq+0x1c/0x1f8 Call Trace: [e8badf90] [100f] 0x100f (unreliable) [e8badfe0] [c00439a0] irq_exit+0xa4/0xc8 [e8badff0] [c000ccd8] call_do_irq+0x24/0x3c [e9d29f20] [c000479c] do_IRQ+0x8c/0xf8 [e9d29f40] [c000eb54] ret_from_except+0x0/0x18 --- Exception: 501 at 0x1003d540 LR = 0x10041974 Instruction dump: 3e80c082 3f40c07e 3b6a 3a941040 3aa0 3b5a9388 7f16c378 812f0008 5529103a 7d3c482e 7eb8492e 7c008146 3ba0 7e9ea378 4814 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 1553 Comm: mount.sh Not tainted 3.13.0-rc1-148228-gea7ca7c #21 Call Trace: Rebooting in 180 seconds.. Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: Linus Torvalds torva...@linux-foundation.org Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@kernel.org Cc: Peter Zijlstra pet...@infradead.org Cc: H. Peter Anvin h...@zytor.com Cc: Paul Mackerras pau...@au1.ibm.com Cc: James Hogan james.ho...@imgtec.com Cc: James E.J. Bottomley j...@parisc-linux.org Cc: Helge Deller del...@gmx.de Cc: Martin Schwidefsky schwidef...@de.ibm.com Cc: Heiko Carstens heiko.carst...@de.ibm.com Cc: David S. Miller da...@davemloft.net Cc: Andrew Morton a...@linux-foundation.org diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 957bf34..ffde3fb 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@
Re: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform
On Fri, Mar 28, 2014 at 03:38:32PM +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com If softirq use hardirq stack, we will get kernel painc when a hard irq coming again during __do_softirq enable local irq to deal with softirq action. So we need to switch satck into softirq stack when invoke soft irq. Task--- | Task stack | Interrput-EXCEPTION-do_IRQ- ^| Hard irq stack || |irq_exit-__do_softirq-local_irq_enable-- --local_irq_disable | | Hard irq stack | | | Interrupt coming again | There will get a Interrupt nesting | Trace 1: Trap 900 Kernel stack overflow in process e8152f40, r1=e8e05ec0 CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432 task: e8152f40 ti: c080a000 task.ti: ef176000 NIP: c05bec04 LR: c0305590 CTR: 0010 REGS: e8e05e10 TRAP: 0901 Not tainted (3.13.0-rc3-03475-g2e3f85b) Could you double check if you got the following patch applied? commit 1a18a66446f3f289b05b634f18012424d82aa63a Author: Kevin Hao haoke...@gmail.com Date: Fri Jan 17 12:25:28 2014 +0800 powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack Guenter Roeck has got the following call trace on a p2020 board: Kernel stack overflow in process eb3e5a00, r1=eb79df90 CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 task: eb3e5a00 ti: c0616000 task.ti: ef44 NIP: c003a420 LR: c003a410 CTR: c0017518 REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00) MSR: 00029000 CE,EE,ME CR: 24008444 XER: GPR00: c003a410 eb79df90 eb3e5a00 eb05d900 0001 65d87646 GPR08: 020b8000 44008442 NIP [c003a420] __do_softirq+0x94/0x1ec LR [c003a410] __do_softirq+0x84/0x1ec Call Trace: [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable) [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8 [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8 [ef441f40] [c000e7f4] ret_from_except+0x0/0x18 --- Exception: 501 at 0xfcda524 LR = 0x10024900 Instruction dump: 7c781b78 3b4a 3a73b040 543c0024 3a80 3b3913a0 7ef5bb78 48201bf9 5463103a 7d3b182e 7e89b92e 7c008146 3ba0 7e7e9b78 4814 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 Call Trace: The reason is that we have used the wrong register to calculate the ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64). Just fix it. As suggested by Benjamin Herrenschmidt, also add the C prototype of the function in the comment in order to avoid such kind of errors in the future. Cc: sta...@vger.kernel.org # 3.12 Reported-by: Guenter Roeck li...@roeck-us.net Tested-by: Guenter Roeck li...@roeck-us.net Signed-off-by: Kevin Hao haoke...@gmail.com Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org Thanks, Kevin pgp4yk53md3vO.pgp Description: PGP signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform
Thanks Kevin. Your patch works normal. :) I still have some confused. I think when __do_softirq always get a interrupt, the hard stack will be run out, isn't it? Regards, -Dongsheng -Original Message- From: Kevin Hao [mailto:haoke...@gmail.com] Sent: Friday, March 28, 2014 4:18 PM To: Wang Dongsheng-B40534 Cc: fweis...@gmail.com; James Hogan; Andrew Morton; David S. Miller; Peter Zijlstra; Helge Deller; H. Peter Anvin; Heiko Carstens; linux- ker...@vger.kernel.org; Paul Mackerras; James E.J. Bottomley; Linus Torvalds; Jin Zhengxiong-R64188; Wood Scott-B07421; Thomas Gleixner; linuxppc- d...@lists.ozlabs.org; Ingo Molnar; Martin Schwidefsky Subject: Re: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform On Fri, Mar 28, 2014 at 03:38:32PM +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com If softirq use hardirq stack, we will get kernel painc when a hard irq coming again during __do_softirq enable local irq to deal with softirq action. So we need to switch satck into softirq stack when invoke soft irq. Task--- | Task stack | Interrput-EXCEPTION-do_IRQ- ^| Hard irq stack || |irq_exit-__do_softirq-local_irq_enable-- -- local_irq_disable | | Hard irq stack | | | Interrupt coming again | There will get a Interrupt nesting | -- -- Trace 1: Trap 900 Kernel stack overflow in process e8152f40, r1=e8e05ec0 CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432 task: e8152f40 ti: c080a000 task.ti: ef176000 NIP: c05bec04 LR: c0305590 CTR: 0010 REGS: e8e05e10 TRAP: 0901 Not tainted (3.13.0-rc3-03475-g2e3f85b) Could you double check if you got the following patch applied? commit 1a18a66446f3f289b05b634f18012424d82aa63a Author: Kevin Hao haoke...@gmail.com Date: Fri Jan 17 12:25:28 2014 +0800 powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack Guenter Roeck has got the following call trace on a p2020 board: Kernel stack overflow in process eb3e5a00, r1=eb79df90 CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 task: eb3e5a00 ti: c0616000 task.ti: ef44 NIP: c003a420 LR: c003a410 CTR: c0017518 REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00) MSR: 00029000 CE,EE,ME CR: 24008444 XER: GPR00: c003a410 eb79df90 eb3e5a00 eb05d900 0001 65d87646 GPR08: 020b8000 44008442 NIP [c003a420] __do_softirq+0x94/0x1ec LR [c003a410] __do_softirq+0x84/0x1ec Call Trace: [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable) [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8 [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8 [ef441f40] [c000e7f4] ret_from_except+0x0/0x18 --- Exception: 501 at 0xfcda524 LR = 0x10024900 Instruction dump: 7c781b78 3b4a 3a73b040 543c0024 3a80 3b3913a0 7ef5bb78 48201bf9 5463103a 7d3b182e 7e89b92e 7c008146 3ba0 7e7e9b78 4814 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 Call Trace: The reason is that we have used the wrong register to calculate the ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64). Just fix it. As suggested by Benjamin Herrenschmidt, also add the C prototype of the function in the comment in order to avoid such kind of errors in the future. Cc: sta...@vger.kernel.org # 3.12 Reported-by: Guenter Roeck li...@roeck-us.net Tested-by: Guenter Roeck li...@roeck-us.net Signed-off-by: Kevin Hao haoke...@gmail.com Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org Thanks, Kevin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform
On Fri, Mar 28, 2014 at 09:00:13AM +, dongsheng.w...@freescale.com wrote: Thanks Kevin. Your patch works normal. :) I still have some confused. I think when __do_softirq always get a interrupt, the hard stack will be run out, isn't it? No, it won't. Please see the explanation in the following commit log. commit cc1f027454929924471bea2f362431072e3c71be Author: Frederic Weisbecker fweis...@gmail.com Date: Tue Sep 24 17:17:47 2013 +0200 irq: Optimize softirq stack selection in irq exit If irq_exit() is called on the arch's specified irq stack, it should be safe to run softirqs inline under that same irq stack as it is near empty by the time we call irq_exit(). For example if we use the same stack for both hard and soft irqs here, the worst case scenario is: hardirq - softirq - hardirq. But then the softirq supersedes the first hardirq as the stack user since irq_exit() is called in a mostly empty stack. So the stack merge in this case looks acceptable. Stack overrun still have a chance to happen if hardirqs have more opportunities to nest, but then it's another problem to solve. So lets adapt the irq exit's softirq stack on top of a new Kconfig symbol that can be defined when irq_exit() runs on the irq stack. That way we can spare some stack switch on irq processing and all the cache issues that come along. Thanks, Kevin pgpYaRl6M_j5q.pgp Description: PGP signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform
On Fri, 2014-03-28 at 15:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com If softirq use hardirq stack, we will get kernel painc when a hard irq coming again during __do_softirq enable local irq to deal with softirq action. So we need to switch satck into softirq stack when invoke soft irq. Yes, an interrupt can potentially nest but we should be near the top of the stack at that point, as the comment says in softirq.c, it should be fine. And your backtrace doesn't seem to indicate a major overflow. The code in do_IRQ() will make sure we don't switch stack again if we were already on either hard or softirq stack. I need a better analysis of your problem. Is that really a stack overflow ? Or is it a false positive due to a bug in the overflow detection ? I moved around the code that updates KSP_LIMIT in 32-bit to asm in misc_32.S a while ago since we don't do that on 64-bit, maybe we are getting it wrong... Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform
On Fri, 2014-03-28 at 16:18 +0800, Kevin Hao wrote: powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack Kevin. It looks like it was applied to 3.14 and sent to 3.12 stable but not 3.13 ... can you fix that up ? Cheers, Ben. Guenter Roeck has got the following call trace on a p2020 board: Kernel stack overflow in process eb3e5a00, r1=eb79df90 CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 task: eb3e5a00 ti: c0616000 task.ti: ef44 NIP: c003a420 LR: c003a410 CTR: c0017518 REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00) MSR: 00029000 CE,EE,ME CR: 24008444 XER: GPR00: c003a410 eb79df90 eb3e5a00 eb05d900 0001 65d87646 GPR08: 020b8000 44008442 NIP [c003a420] __do_softirq+0x94/0x1ec LR [c003a410] __do_softirq+0x84/0x1ec Call Trace: [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable) [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8 [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8 [ef441f40] [c000e7f4] ret_from_except+0x0/0x18 --- Exception: 501 at 0xfcda524 LR = 0x10024900 Instruction dump: 7c781b78 3b4a 3a73b040 543c0024 3a80 3b3913a0 7ef5bb78 48201bf9 5463103a 7d3b182e 7e89b92e 7c008146 3ba0 7e7e9b78 4814 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 Call Trace: The reason is that we have used the wrong register to calculate the ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64). Just fix it. As suggested by Benjamin Herrenschmidt, also add the C prototype of the function in the comment in order to avoid such kind of errors in the future. Cc: sta...@vger.kernel.org # 3.12 Reported-by: Guenter Roeck li...@roeck-us.net Tested-by: Guenter Roeck li...@roeck-us.net Signed-off-by: Kevin Hao haoke...@gmail.com Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org Thanks, Kevin ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform
On Sat, Mar 29, 2014 at 08:27:07AM +1100, Benjamin Herrenschmidt wrote: On Fri, 2014-03-28 at 16:18 +0800, Kevin Hao wrote: powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack Kevin. It looks like it was applied to 3.14 and sent to 3.12 stable but not 3.13 ... can you fix that up ? It was already merged into 3.13 stable since 3.13.6: https://lkml.org/lkml/2014/3/4/787 I guess that Dongsheng didn't use the latest 3.13 stable tree. Thanks, Kevin pgpOYtJ8PSajM.pgp Description: PGP signature ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev