Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2014-01-08 Thread Olof Johansson
On Wed, Jan 08, 2014 at 03:18:26PM +1100, Benjamin Herrenschmidt wrote:
 On Wed, 2014-01-08 at 15:09 +1100, Michael Ellerman wrote:
   Of course, main worry is that this is just hiding some latent NULL
  deref in
   the kernel now... :-/
  
  Wow, that would have to come close to winning the
  grossest-hack-in-arch-powerpc
  award :)
  
  Have you tried changing the value at 8 to point to a reserved page?
  
  Some other possibilities:
  
   * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would
  mean
 you'd only be able to boot pasemi_defconfig.

No thanks -- this went uncaught because that used to be all I booted
(and for some random reason it didn't trigger in that case).

   * Move the hack into FIXUP_ENDIAN
 
 We actually found the root cause on irc the other day, I was waiting for
 Olof to send a fix :-)

Yeah, I'm low on spare time these days, in particular spare time to spend on
ppc stuff. :-(

 Olof: Can you try this totally untested patch ?

With one fixup below:

Tested-by: Olof Johansson o...@lixom.net

 --- a/arch/powerpc/kernel/prom_init.c
 +++ b/arch/powerpc/kernel/prom_init.c
 @@ -1986,8 +1986,6 @@ static void __init prom_init_stdout(void)
 /* Get the full OF pathname of the stdout device */
 memset(path, 0, 256);
 call_prom(instance-to-path, 3, 1, prom.stdout, path, 255);
 -   stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout);
 -   val = cpu_to_be32(stdout_node);
 prom_setprop(prom.chosen, /chosen, linux,stdout-package,
  val, sizeof(val));
 prom_printf(OF stdout device is: %s\n, of_stdout_device);
 @@ -1995,10 +1993,14 @@ static void __init prom_init_stdout(void)
  path, strlen(path) + 1);
  
 /* If it's a display, note it */
 -   memset(type, 0, sizeof(type));
 -   prom_getprop(stdout_node, device_type, type, sizeof(type));
 -   if (strcmp(type, display) == 0)
 -   prom_setprop(stdout_node, path, linux,boot-display, NULL, 
 0);
 +   stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout);
 +   if (stdout_node != PROM_ERROR) {
 +   val = cpu_to_be32(stdout_node);
 +   memset(type, 0, sizeof(type));
 +   prom_getprop(stdout_node, device_type, type, sizeof(type));
 +   if (strcmp(type, display) == 0)
 +   prom_setprop(stdout_node, path, linux,boot-display, 
 NU

Line is cut off, this needs NULL, 0); at the end.



-Olof
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2014-01-08 Thread Benjamin Herrenschmidt
On Wed, 2014-01-08 at 09:48 -0800, Olof Johansson wrote:

  /* If it's a display, note it */
  -   memset(type, 0, sizeof(type));
  -   prom_getprop(stdout_node, device_type, type, sizeof(type));
  -   if (strcmp(type, display) == 0)
  -   prom_setprop(stdout_node, path, linux,boot-display, NULL, 
  0);
  +   stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout);
  +   if (stdout_node != PROM_ERROR) {
  +   val = cpu_to_be32(stdout_node);
  +   memset(type, 0, sizeof(type));
  +   prom_getprop(stdout_node, device_type, type, 
  sizeof(type));
  +   if (strcmp(type, display) == 0)
  +   prom_setprop(stdout_node, path, 
  linux,boot-display, NU
 
 Line is cut off, this needs NULL, 0); at the end.

Right, copy/paste failure :-)

Thanks, I'll try to get that to Linus before he cuts .13, otherwise it
will be -stable.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2014-01-07 Thread Michael Ellerman
On Fri, 2014-01-03 at 00:12 -0800, Olof Johansson wrote:
 On Thu, Jan 02, 2014 at 11:56:04PM -0800, Olof Johansson wrote:
 
  This makes things interesting though. The BE/LE trampoline code
  assumes at least 3 consecutive instructions. What was the reasoning
  behind entering the kernel LE instead of keeping the old boot protocol
  and just switching to LE once kernel is loaded? Is it actually used on
  some platforms or is this just a theoretical thing?
 
 Actually, adding a little hack that zeroes out the memory once we're done
 executing it will work just fine too. I know this is sort of icky, but maybe
 it'll be good enough for now?
 
 Of course, main worry is that this is just hiding some latent NULL deref in
 the kernel now... :-/

Wow, that would have to come close to winning the grossest-hack-in-arch-powerpc
award :)

Have you tried changing the value at 8 to point to a reserved page?

Some other possibilities:

 * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would mean
   you'd only be able to boot pasemi_defconfig.
 * Move the hack into FIXUP_ENDIAN

cheers


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2014-01-07 Thread Benjamin Herrenschmidt
On Wed, 2014-01-08 at 15:09 +1100, Michael Ellerman wrote:
  Of course, main worry is that this is just hiding some latent NULL
 deref in
  the kernel now... :-/
 
 Wow, that would have to come close to winning the
 grossest-hack-in-arch-powerpc
 award :)
 
 Have you tried changing the value at 8 to point to a reserved page?
 
 Some other possibilities:
 
  * Change the #define so FIXUP_ENDIAN is empty for PASEMI, that would
 mean
you'd only be able to boot pasemi_defconfig.
  * Move the hack into FIXUP_ENDIAN

We actually found the root cause on irc the other day, I was waiting for
Olof to send a fix :-)

Olof: Can you try this totally untested patch ?

--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1986,8 +1986,6 @@ static void __init prom_init_stdout(void)
/* Get the full OF pathname of the stdout device */
memset(path, 0, 256);
call_prom(instance-to-path, 3, 1, prom.stdout, path, 255);
-   stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout);
-   val = cpu_to_be32(stdout_node);
prom_setprop(prom.chosen, /chosen, linux,stdout-package,
 val, sizeof(val));
prom_printf(OF stdout device is: %s\n, of_stdout_device);
@@ -1995,10 +1993,14 @@ static void __init prom_init_stdout(void)
 path, strlen(path) + 1);
 
/* If it's a display, note it */
-   memset(type, 0, sizeof(type));
-   prom_getprop(stdout_node, device_type, type, sizeof(type));
-   if (strcmp(type, display) == 0)
-   prom_setprop(stdout_node, path, linux,boot-display, NULL, 0);
+   stdout_node = call_prom(instance-to-package, 1, 1, prom.stdout);
+   if (stdout_node != PROM_ERROR) {
+   val = cpu_to_be32(stdout_node);
+   memset(type, 0, sizeof(type));
+   prom_getprop(stdout_node, device_type, type, sizeof(type));
+   if (strcmp(type, display) == 0)
+   prom_setprop(stdout_node, path, linux,boot-display, NU
+   }
 }


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2014-01-03 Thread Olof Johansson
On Thu, Jan 02, 2014 at 11:56:04PM -0800, Olof Johansson wrote:

 This makes things interesting though. The BE/LE trampoline code
 assumes at least 3 consecutive instructions. What was the reasoning
 behind entering the kernel LE instead of keeping the old boot protocol
 and just switching to LE once kernel is loaded? Is it actually used on
 some platforms or is this just a theoretical thing?

Actually, adding a little hack that zeroes out the memory once we're done
executing it will work just fine too. I know this is sort of icky, but maybe
it'll be good enough for now?

Of course, main worry is that this is just hiding some latent NULL deref in
the kernel now... :-/


-Olof

--- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 

From 4d003186cae546900cefc9e51b0ed4e65f775be1 Mon Sep 17 00:00:00 2001
From: Olof Johansson o...@lixom.net
Date: Fri, 3 Jan 2014 00:09:28 -0800
Subject: [PATCH] powerpc: set some low memory contents to 0 early

The little-endian code adds some code path to __start, which essentially ends
up adding memory contents in low memory that didn't use to be there.

That seems to have triggered a latent bug, either in firmware or kernel, where
the 64-bit word located at physical address 8 needs to be 0.

The simple hack for this right now is to write it to 0 after we're done
executing it, which is what this patch does. Unfortunately I no longer
seem to have a working JTAG setup nor firmware sources, so debugging
this down to root cause might be more trouble than it's worth given the
relatively simple workaround.

Signed-off-by: Olof Johansson o...@lixom.net
---
 arch/powerpc/kernel/head_64.S |7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 2ae41ab..437d8bd 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -69,6 +69,13 @@ _GLOBAL(__start)
/* NOP this out unconditionally */
 BEGIN_FTR_SECTION
FIXUP_ENDIAN
+   /* Hack for PWRficient platforms: Due to CFE(?) bug, the 64-bit
+* word at 0x8 needs to be set to 0. Patch it up here once we're
+* done executing it (we can be lazy and avoid invalidating
+* icache)
+*/
+   li  r0,0
+   std 0,8(0)
b   .__start_initialization_multiplatform
 END_FTR_SECTION(0, 1)
 
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2014-01-02 Thread Olof Johansson
On Sat, Dec 28, 2013 at 1:05 PM, Olof Johansson o...@lixom.net wrote:

 Sigh, it's not this after all. I did a clean build with this applied
 and still see failures. Something else is (also?) going on here.

Ok, so after some more digging I actually think that this isn't about
the new code added as much as it is about having more code in low
memory.

Before, there were only two instuctions in __start:

b   .__start_initialization_multiplatform
trap

Now, there's a whole bunch:

c000 .__start:
c000:   08 00 00 48 tdi 0,r0,72
c004:   48 00 00 24 b   c028 .__start+0x28
c008:   05 00 9f 42 .long 0x5009f42
c00c:   a6 02 48 7d lhzur16,18557(r2)
c010:   1c 00 4a 39 mulli   r0,r0,19001
c014:   a6 00 60 7d lhzur16,24701(0)
c018:   01 00 6b 69 .long 0x1006b69
c01c:   a6 03 5a 7d lhzur16,23165(r3)
c020:   a6 03 7b 7d lhzur16,31613(r3)
c024:   24 00 00 4c dozir0,r0,76
c028:   48 00 95 84 b   c00095ac
.__start_initialization_multiplatform
c02c:   7f e0 00 08 trap

And indeed, by replacing some of the LE hand-converted code with 0x0,
it seems that what's really making things blow up here is that 0x8-0xc
contain something else than 0x0.

Where/why this comes from I'm less certain of -- and since I seem to
no longer have a usable JTAG setup, I can't break in and see where the
code gets stuck and call paths, etc. So it's pure speculation, but I'm
guessing it's a null pointer dereference somewhere with a chained
pointer as the second member in a struct, i.e. with NULL the stray
null ptr deref does no harm.

Since it doesn't seem to impact pSeries, there's a chance that the bug
is in firmware, not in the kernel, since this seems to happen during
fairly early boot, i.e. possibly while grabbing the DT contents out.

This makes things interesting though. The BE/LE trampoline code
assumes at least 3 consecutive instructions. What was the reasoning
behind entering the kernel LE instead of keeping the old boot protocol
and just switching to LE once kernel is loaded? Is it actually used on
some platforms or is this just a theoretical thing?


-Olof
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] powerpc: Fix alignment of secondary cpu spin vars

2013-12-28 Thread Olof Johansson
Commit 5c0484e25ec0 ('powerpc: Endian safe trampoline') resulted in
losing proper alignment of the spinlock variables used when booting
secondary CPUs, causing some quite odd issues with failing to boot on
PA Semi-based systems.

This showed itself on ppc64_defconfig, but not on pasemi_defconfig,
so it had gone unnoticed when I initially tested the LE patch set.

Fix is to add explicit alignment instead of relying on good luck. :)

Fixes: 5c0484e25ec0 ('powerpc: Endian safe trampoline')
Reported-by: Christian Zigotzky chzigot...@xenosoft.de
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811
Signed-off-by: Olof Johansson o...@lixom.net
---
 arch/powerpc/kernel/head_64.S |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 2ae41ab..4d94477 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -80,6 +80,7 @@ END_FTR_SECTION(0, 1)
 * of the function that the cpu should jump to to continue
 * initialization.
 */
+   .balign 8
.globl  __secondary_hold_spinloop
 __secondary_hold_spinloop:
.llong  0x0
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2013-12-28 Thread Olof Johansson
On Sat, Dec 28, 2013 at 1:01 PM, Olof Johansson o...@lixom.net wrote:
 Commit 5c0484e25ec0 ('powerpc: Endian safe trampoline') resulted in
 losing proper alignment of the spinlock variables used when booting
 secondary CPUs, causing some quite odd issues with failing to boot on
 PA Semi-based systems.

 This showed itself on ppc64_defconfig, but not on pasemi_defconfig,
 so it had gone unnoticed when I initially tested the LE patch set.

 Fix is to add explicit alignment instead of relying on good luck. :)

 Fixes: 5c0484e25ec0 ('powerpc: Endian safe trampoline')
 Reported-by: Christian Zigotzky chzigot...@xenosoft.de
 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811
 Signed-off-by: Olof Johansson o...@lixom.net

Sigh, it's not this after all. I did a clean build with this applied
and still see failures. Something else is (also?) going on here.


-Olof
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev