Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
On Wed, Jan 08, 2014 at 03:18:26PM +1100, Benjamin Herrenschmidt wrote: On Wed, 2014-01-08 at 15:09 +1100, Michael Ellerman wrote: Of course, main worry is that this is just hiding some latent NULL deref in the kernel now... :-/ Wow, that would have to come close to winning the grossest-hack-in-arch-powerpc award :) Have you tried changing the value at 8 to point to a reserved page? Some other possibilities: * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would mean you'd only be able to boot pasemi_defconfig. No thanks -- this went uncaught because that used to be all I booted (and for some random reason it didn't trigger in that case). * Move the hack into FIXUP_ENDIAN We actually found the root cause on irc the other day, I was waiting for Olof to send a fix :-) Yeah, I'm low on spare time these days, in particular spare time to spend on ppc stuff. :-( Olof: Can you try this totally untested patch ? With one fixup below: Tested-by: Olof Johansson o...@lixom.net --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -1986,8 +1986,6 @@ static void __init prom_init_stdout(void) /* Get the full OF pathname of the stdout device */ memset(path, 0, 256); call_prom(instance-to-path, 3, 1, prom.stdout, path, 255); - stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout); - val = cpu_to_be32(stdout_node); prom_setprop(prom.chosen, /chosen, linux,stdout-package, val, sizeof(val)); prom_printf(OF stdout device is: %s\n, of_stdout_device); @@ -1995,10 +1993,14 @@ static void __init prom_init_stdout(void) path, strlen(path) + 1); /* If it's a display, note it */ - memset(type, 0, sizeof(type)); - prom_getprop(stdout_node, device_type, type, sizeof(type)); - if (strcmp(type, display) == 0) - prom_setprop(stdout_node, path, linux,boot-display, NULL, 0); + stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout); + if (stdout_node != PROM_ERROR) { + val = cpu_to_be32(stdout_node); + memset(type, 0, sizeof(type)); + prom_getprop(stdout_node, device_type, type, sizeof(type)); + if (strcmp(type, display) == 0) + prom_setprop(stdout_node, path, linux,boot-display, NU Line is cut off, this needs NULL, 0); at the end. -Olof ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
On Wed, 2014-01-08 at 09:48 -0800, Olof Johansson wrote: /* If it's a display, note it */ - memset(type, 0, sizeof(type)); - prom_getprop(stdout_node, device_type, type, sizeof(type)); - if (strcmp(type, display) == 0) - prom_setprop(stdout_node, path, linux,boot-display, NULL, 0); + stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout); + if (stdout_node != PROM_ERROR) { + val = cpu_to_be32(stdout_node); + memset(type, 0, sizeof(type)); + prom_getprop(stdout_node, device_type, type, sizeof(type)); + if (strcmp(type, display) == 0) + prom_setprop(stdout_node, path, linux,boot-display, NU Line is cut off, this needs NULL, 0); at the end. Right, copy/paste failure :-) Thanks, I'll try to get that to Linus before he cuts .13, otherwise it will be -stable. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
On Fri, 2014-01-03 at 00:12 -0800, Olof Johansson wrote: On Thu, Jan 02, 2014 at 11:56:04PM -0800, Olof Johansson wrote: This makes things interesting though. The BE/LE trampoline code assumes at least 3 consecutive instructions. What was the reasoning behind entering the kernel LE instead of keeping the old boot protocol and just switching to LE once kernel is loaded? Is it actually used on some platforms or is this just a theoretical thing? Actually, adding a little hack that zeroes out the memory once we're done executing it will work just fine too. I know this is sort of icky, but maybe it'll be good enough for now? Of course, main worry is that this is just hiding some latent NULL deref in the kernel now... :-/ Wow, that would have to come close to winning the grossest-hack-in-arch-powerpc award :) Have you tried changing the value at 8 to point to a reserved page? Some other possibilities: * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would mean you'd only be able to boot pasemi_defconfig. * Move the hack into FIXUP_ENDIAN cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
On Wed, 2014-01-08 at 15:09 +1100, Michael Ellerman wrote: Of course, main worry is that this is just hiding some latent NULL deref in the kernel now... :-/ Wow, that would have to come close to winning the grossest-hack-in-arch-powerpc award :) Have you tried changing the value at 8 to point to a reserved page? Some other possibilities: * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would mean you'd only be able to boot pasemi_defconfig. * Move the hack into FIXUP_ENDIAN We actually found the root cause on irc the other day, I was waiting for Olof to send a fix :-) Olof: Can you try this totally untested patch ? --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -1986,8 +1986,6 @@ static void __init prom_init_stdout(void) /* Get the full OF pathname of the stdout device */ memset(path, 0, 256); call_prom(instance-to-path, 3, 1, prom.stdout, path, 255); - stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout); - val = cpu_to_be32(stdout_node); prom_setprop(prom.chosen, /chosen, linux,stdout-package, val, sizeof(val)); prom_printf(OF stdout device is: %s\n, of_stdout_device); @@ -1995,10 +1993,14 @@ static void __init prom_init_stdout(void) path, strlen(path) + 1); /* If it's a display, note it */ - memset(type, 0, sizeof(type)); - prom_getprop(stdout_node, device_type, type, sizeof(type)); - if (strcmp(type, display) == 0) - prom_setprop(stdout_node, path, linux,boot-display, NULL, 0); + stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout); + if (stdout_node != PROM_ERROR) { + val = cpu_to_be32(stdout_node); + memset(type, 0, sizeof(type)); + prom_getprop(stdout_node, device_type, type, sizeof(type)); + if (strcmp(type, display) == 0) + prom_setprop(stdout_node, path, linux,boot-display, NU + } } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
On Thu, Jan 02, 2014 at 11:56:04PM -0800, Olof Johansson wrote: This makes things interesting though. The BE/LE trampoline code assumes at least 3 consecutive instructions. What was the reasoning behind entering the kernel LE instead of keeping the old boot protocol and just switching to LE once kernel is loaded? Is it actually used on some platforms or is this just a theoretical thing? Actually, adding a little hack that zeroes out the memory once we're done executing it will work just fine too. I know this is sort of icky, but maybe it'll be good enough for now? Of course, main worry is that this is just hiding some latent NULL deref in the kernel now... :-/ -Olof --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- From 4d003186cae546900cefc9e51b0ed4e65f775be1 Mon Sep 17 00:00:00 2001 From: Olof Johansson o...@lixom.net Date: Fri, 3 Jan 2014 00:09:28 -0800 Subject: [PATCH] powerpc: set some low memory contents to 0 early The little-endian code adds some code path to __start, which essentially ends up adding memory contents in low memory that didn't use to be there. That seems to have triggered a latent bug, either in firmware or kernel, where the 64-bit word located at physical address 8 needs to be 0. The simple hack for this right now is to write it to 0 after we're done executing it, which is what this patch does. Unfortunately I no longer seem to have a working JTAG setup nor firmware sources, so debugging this down to root cause might be more trouble than it's worth given the relatively simple workaround. Signed-off-by: Olof Johansson o...@lixom.net --- arch/powerpc/kernel/head_64.S |7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 2ae41ab..437d8bd 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -69,6 +69,13 @@ _GLOBAL(__start) /* NOP this out unconditionally */ BEGIN_FTR_SECTION FIXUP_ENDIAN + /* Hack for PWRficient platforms: Due to CFE(?) bug, the 64-bit +* word at 0x8 needs to be set to 0. Patch it up here once we're +* done executing it (we can be lazy and avoid invalidating +* icache) +*/ + li r0,0 + std 0,8(0) b .__start_initialization_multiplatform END_FTR_SECTION(0, 1) -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
On Sat, Dec 28, 2013 at 1:05 PM, Olof Johansson o...@lixom.net wrote: Sigh, it's not this after all. I did a clean build with this applied and still see failures. Something else is (also?) going on here. Ok, so after some more digging I actually think that this isn't about the new code added as much as it is about having more code in low memory. Before, there were only two instuctions in __start: b .__start_initialization_multiplatform trap Now, there's a whole bunch: c000 .__start: c000: 08 00 00 48 tdi 0,r0,72 c004: 48 00 00 24 b c028 .__start+0x28 c008: 05 00 9f 42 .long 0x5009f42 c00c: a6 02 48 7d lhzur16,18557(r2) c010: 1c 00 4a 39 mulli r0,r0,19001 c014: a6 00 60 7d lhzur16,24701(0) c018: 01 00 6b 69 .long 0x1006b69 c01c: a6 03 5a 7d lhzur16,23165(r3) c020: a6 03 7b 7d lhzur16,31613(r3) c024: 24 00 00 4c dozir0,r0,76 c028: 48 00 95 84 b c00095ac .__start_initialization_multiplatform c02c: 7f e0 00 08 trap And indeed, by replacing some of the LE hand-converted code with 0x0, it seems that what's really making things blow up here is that 0x8-0xc contain something else than 0x0. Where/why this comes from I'm less certain of -- and since I seem to no longer have a usable JTAG setup, I can't break in and see where the code gets stuck and call paths, etc. So it's pure speculation, but I'm guessing it's a null pointer dereference somewhere with a chained pointer as the second member in a struct, i.e. with NULL the stray null ptr deref does no harm. Since it doesn't seem to impact pSeries, there's a chance that the bug is in firmware, not in the kernel, since this seems to happen during fairly early boot, i.e. possibly while grabbing the DT contents out. This makes things interesting though. The BE/LE trampoline code assumes at least 3 consecutive instructions. What was the reasoning behind entering the kernel LE instead of keeping the old boot protocol and just switching to LE once kernel is loaded? Is it actually used on some platforms or is this just a theoretical thing? -Olof ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: Fix alignment of secondary cpu spin vars
Commit 5c0484e25ec0 ('powerpc: Endian safe trampoline') resulted in losing proper alignment of the spinlock variables used when booting secondary CPUs, causing some quite odd issues with failing to boot on PA Semi-based systems. This showed itself on ppc64_defconfig, but not on pasemi_defconfig, so it had gone unnoticed when I initially tested the LE patch set. Fix is to add explicit alignment instead of relying on good luck. :) Fixes: 5c0484e25ec0 ('powerpc: Endian safe trampoline') Reported-by: Christian Zigotzky chzigot...@xenosoft.de Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811 Signed-off-by: Olof Johansson o...@lixom.net --- arch/powerpc/kernel/head_64.S |1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 2ae41ab..4d94477 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -80,6 +80,7 @@ END_FTR_SECTION(0, 1) * of the function that the cpu should jump to to continue * initialization. */ + .balign 8 .globl __secondary_hold_spinloop __secondary_hold_spinloop: .llong 0x0 -- 1.7.10.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars
On Sat, Dec 28, 2013 at 1:01 PM, Olof Johansson o...@lixom.net wrote: Commit 5c0484e25ec0 ('powerpc: Endian safe trampoline') resulted in losing proper alignment of the spinlock variables used when booting secondary CPUs, causing some quite odd issues with failing to boot on PA Semi-based systems. This showed itself on ppc64_defconfig, but not on pasemi_defconfig, so it had gone unnoticed when I initially tested the LE patch set. Fix is to add explicit alignment instead of relying on good luck. :) Fixes: 5c0484e25ec0 ('powerpc: Endian safe trampoline') Reported-by: Christian Zigotzky chzigot...@xenosoft.de Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811 Signed-off-by: Olof Johansson o...@lixom.net Sigh, it's not this after all. I did a clean build with this applied and still see failures. Something else is (also?) going on here. -Olof ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev