Re: [PATCH V2 2/2] powerpc/kexec: Reset HILE before kexec_sequence

2015-07-09 Thread Stewart Smith
Michael Ellerman m...@ellerman.id.au writes:
  I think a better API would be that opal_return_cpu() deals with this under the
  covers. I think we talked about that, so maybe there was some reason that
  wasn't possible.
 
 opal_return_cpu() acts on the current CPU, so if we started flipping HILE
 there we'd hit PowerISA 2.07 Section 2.11:
 The contents of the HILE bit must be the same for all
 threads under the control of a given instance of the
 hypervisor; otherwise all results are undefined.
 
 so we'd have to do something kind of funny in opal_return_cpu() to work
 out what's going on. Keeping in mind that opal_return_cpu() is also used
 in the fsp code update path (which I haven't gone and really looked at
 in this context though).
 
 I'm not convinced that opal_return_cpu() doing the HILE switch is
 safe when we'd be relying on the kernel to pretty much do this all at
 the same time (when we really have opal_reinit_cpus to do that)

 Yeah I agree.

 What I meant is that after you return a cpu to OPAL, when you (or actually
 someone else) restart it, at that point it should be put into a well defined
 state, including HILE.

I'll go and try and investigate exactly what's going on here with the
hardware, it's possible some of the hardware documentation is Anton
Blanchard and that needs to make it into documents rather than just
skiboot source. Especially around what makes sense for all-at-a-time
(reinit_cpu with HILE switch) versus one at a time (start
cpu/return_cpu).

Of course, that's a skiboot problem rather than a Linux one.

I think Sam is looking at doing the opal call in real mode somewhere in
kexec_sequence just before we jump into the new kernel. This should make
it so that everything works on existing OPAL APIs, while at the same
time we can go and check if there's a sensible/valid way we can deal
with the hardware to get a broader well defined state after certain
calls.

I'm in favor of using existing calls rather than making a firmware
upgrade a requirement though.
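
The ordering discussed above (interrupts off, real mode, then the HILE
reset, then the jump) can be sketched as a small userspace model. This is
not kernel code: struct cpu_model, safe_to_reset_hile(), reset_hile_be()
and kexec_tail_model() are hypothetical names invented for illustration,
and the model ignores non-maskable events such as machine checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the CPU state that matters for the HILE reset. */
struct cpu_model {
	bool irqs_enabled;	/* external interrupts can be taken */
	bool virtual_mode;	/* MMU translation is on */
	bool hile_le;		/* true: interrupts are delivered little-endian */
};

/* The thread's safety condition: no interrupts and no translation. */
bool safe_to_reset_hile(const struct cpu_model *c)
{
	return !c->irqs_enabled && !c->virtual_mode;
}

/* Stand-in for the firmware call; refuses when the state is unsafe.
 * Returns 0 on success, -1 if the reset was refused. */
int reset_hile_be(struct cpu_model *c)
{
	if (!safe_to_reset_hile(c))
		return -1;
	c->hile_le = false;
	return 0;
}

/* Model of the tail of the kexec path in the order discussed:
 * disable interrupts, drop to real mode, then reset HILE. */
int kexec_tail_model(struct cpu_model *c)
{
	c->irqs_enabled = false;
	c->virtual_mode = false;
	assert(safe_to_reset_hile(c));
	return reset_hile_be(c);
}
```

In this model, calling reset_hile_be() from a state with interrupts and
translation still on is refused, while kexec_tail_model() succeeds because
it establishes the safe state first.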

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V2 2/2] powerpc/kexec: Reset HILE before kexec_sequence

2015-07-08 Thread Stewart Smith
Michael Ellerman m...@ellerman.id.au writes:
 On Wed, 2015-07-08 at 14:37 +1000, Samuel Mendoza-Jonas wrote:
 On powernv secondary cpus are returned to OPAL, and will then enter the
 target kernel in big-endian. However if it is set the HILE bit will persist,
 causing the first exception in the target kernel to be delivered in
 little-endian regardless of the kernel endianness.
 Make sure that the HILE bit is switched off before entering
 kexec_sequence.
 
 Signed-off-by: Samuel Mendoza-Jonas sam...@au1.ibm.com
 ---
  arch/powerpc/kernel/machine_kexec_64.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
 index 1a74446..2266135c 100644
 --- a/arch/powerpc/kernel/machine_kexec_64.c
 +++ b/arch/powerpc/kernel/machine_kexec_64.c
 @@ -22,8 +22,10 @@
  #include <asm/page.h>
  #include <asm/current.h>
  #include <asm/machdep.h>
 +#include <asm/opal.h>
  #include <asm/cacheflush.h>
  #include <asm/paca.h>
 +#include <asm/firmware.h>
  #include <asm/mmu.h>
  #include <asm/sections.h>	/* _end */
  #include <asm/prom.h>
 @@ -356,6 +358,10 @@ void default_machine_kexec(struct kimage *image)
   * switched to a static version!
   */
  
 +	/* Reset HILE in case we kexec into an older BE kernel */
 +	if (firmware_has_feature(FW_FEATURE_OPALv3))
 +		opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE);

 It's not safe to do this here.

 We are still in virtual mode and have external interrupts enabled, so you could
 easily take an exception of some kind and then you'd blow up. Mashing the
 keyboard during kexec might even be enough.

Hrm... interrupts are disabled in kexec_sequence, should we be doing
this there instead I wonder? At this point we're pretty much at the
point of no return, so maybe we just need to disable interrupts first?

 I think a better API would be that opal_return_cpu() deals with this under the
 covers. I think we talked about that, so maybe there was some reason that
 wasn't possible.

opal_return_cpu() acts on the current CPU, so if we started flipping HILE
there we'd hit PowerISA 2.07 Section 2.11:
The contents of the HILE bit must be the same for all
threads under the control of a given instance of the
hypervisor; otherwise all results are undefined.

so we'd have to do something kind of funny in opal_return_cpu() to work
out what's going on. Keeping in mind that opal_return_cpu() is also used
in the fsp code update path (which I haven't gone and really looked at
in this context though).

I'm not convinced that opal_return_cpu() doing the HILE switch is
safe when we'd be relying on the kernel to pretty much do this all at
the same time (when we really have opal_reinit_cpus to do that)

Although PowerISA also says:
The HILE bit is set, by an implementation-dependent
method, during system initialization,
and cannot be modified after system initialization.

Which... umm... we are clearly doing and have been since we started
supporting LE powernv, so there's something somewhere in some document
describing it all... I just have to find it (or poke Ben to find out
where he worked it out from).
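
The "same for all threads" rule quoted above can be illustrated with a
small userspace model. This is only a sketch: struct hv_model, NR_THREADS,
flip_one() and flip_all() are hypothetical names, with flip_one() standing
in for a per-CPU switch (as opal_return_cpu() would have to do) and
flip_all() for an all-at-once switch (as opal_reinit_cpus() is meant to
do). It does not reflect how skiboot implements either call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_THREADS 8	/* hypothetical thread count for the model */

/* One HILE value per hardware thread under a single hypervisor. */
struct hv_model {
	bool hile[NR_THREADS];
};

/* The ISA rule: every thread must hold the same HILE value. */
bool hile_consistent(const struct hv_model *hv)
{
	for (size_t i = 1; i < NR_THREADS; i++)
		if (hv->hile[i] != hv->hile[0])
			return false;
	return true;
}

/* Per-thread switch: leaves the system mixed until all threads follow. */
void flip_one(struct hv_model *hv, size_t thread, bool le)
{
	assert(thread < NR_THREADS);
	hv->hile[thread] = le;
}

/* All-at-once switch: the rule holds before and after the call. */
void flip_all(struct hv_model *hv, bool le)
{
	for (size_t i = 0; i < NR_THREADS; i++)
		hv->hile[i] = le;
}
```

In the model, a single flip_one() call is enough to make
hile_consistent() return false, which is the window the per-CPU approach
would have to close by some other means.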


Re: [PATCH V2 2/2] powerpc/kexec: Reset HILE before kexec_sequence

2015-07-08 Thread Michael Ellerman
On Wed, 2015-07-08 at 16:51 +1000, Stewart Smith wrote:
 Michael Ellerman m...@ellerman.id.au writes:
  On Wed, 2015-07-08 at 14:37 +1000, Samuel Mendoza-Jonas wrote:
  On powernv secondary cpus are returned to OPAL, and will then enter the
  target kernel in big-endian. However if it is set the HILE bit will persist,
  causing the first exception in the target kernel to be delivered in
  little-endian regardless of the kernel endianness.
  Make sure that the HILE bit is switched off before entering
  kexec_sequence.
  
  Signed-off-by: Samuel Mendoza-Jonas sam...@au1.ibm.com
  ---
   arch/powerpc/kernel/machine_kexec_64.c | 6 ++
   1 file changed, 6 insertions(+)
  
  diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
  index 1a74446..2266135c 100644
  --- a/arch/powerpc/kernel/machine_kexec_64.c
  +++ b/arch/powerpc/kernel/machine_kexec_64.c
  @@ -356,6 +358,10 @@ void default_machine_kexec(struct kimage *image)
  * switched to a static version!
  */
   
  +	/* Reset HILE in case we kexec into an older BE kernel */
  +	if (firmware_has_feature(FW_FEATURE_OPALv3))
  +		opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE);
 
  It's not safe to do this here.
 
  We are still in virtual mode and have external interrupts enabled, so you could
  easily take an exception of some kind and then you'd blow up. Mashing the
  keyboard during kexec might even be enough.
 
 Hrm... interrupts are disabled in kexec_sequence, should we be doing
 this there instead I wonder? At this point we're pretty much at the
 point of no return, so maybe we just need to disable interrupts first?
 
  I think a better API would be that opal_return_cpu() deals with this under the
  covers. I think we talked about that, so maybe there was some reason that
  wasn't possible.
 
 opal_return_cpu() acts on the current CPU, so if we started flipping HILE
 there we'd hit PowerISA 2.07 Section 2.11:
 The contents of the HILE bit must be the same for all
 threads under the control of a given instance of the
 hypervisor; otherwise all results are undefined.
 
 so we'd have to do something kind of funny in opal_return_cpu() to work
 out what's going on. Keeping in mind that opal_return_cpu() is also used
 in the fsp code update path (which I haven't gone and really looked at
 in this context though).
 
 I'm not convinced that opal_return_cpu() doing the HILE switch is
 safe when we'd be relying on the kernel to pretty much do this all at
 the same time (when we really have opal_reinit_cpus to do that)

Yeah I agree.

What I meant is that after you return a cpu to OPAL, when you (or actually
someone else) restart it, at that point it should be put into a well defined
state, including HILE.

cheers




Re: [PATCH V2 2/2] powerpc/kexec: Reset HILE before kexec_sequence

2015-07-08 Thread Samuel Mendoza-Jonas
On 08/07/15 16:51, Stewart Smith wrote:
 Michael Ellerman m...@ellerman.id.au writes:
 On Wed, 2015-07-08 at 14:37 +1000, Samuel Mendoza-Jonas wrote:
 On powernv secondary cpus are returned to OPAL, and will then enter the
 target kernel in big-endian. However if it is set the HILE bit will persist,
 causing the first exception in the target kernel to be delivered in
 little-endian regardless of the kernel endianness.
 Make sure that the HILE bit is switched off before entering
 kexec_sequence.

 Signed-off-by: Samuel Mendoza-Jonas sam...@au1.ibm.com
 ---
  arch/powerpc/kernel/machine_kexec_64.c | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
 index 1a74446..2266135c 100644
 --- a/arch/powerpc/kernel/machine_kexec_64.c
 +++ b/arch/powerpc/kernel/machine_kexec_64.c
 @@ -22,8 +22,10 @@
  #include <asm/page.h>
  #include <asm/current.h>
  #include <asm/machdep.h>
 +#include <asm/opal.h>
  #include <asm/cacheflush.h>
  #include <asm/paca.h>
 +#include <asm/firmware.h>
  #include <asm/mmu.h>
  #include <asm/sections.h>	/* _end */
  #include <asm/prom.h>
 @@ -356,6 +358,10 @@ void default_machine_kexec(struct kimage *image)
  * switched to a static version!
  */
  
 +	/* Reset HILE in case we kexec into an older BE kernel */
 +	if (firmware_has_feature(FW_FEATURE_OPALv3))
 +		opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE);

 It's not safe to do this here.

 We are still in virtual mode and have external interrupts enabled, so you could
 easily take an exception of some kind and then you'd blow up. Mashing the
 keyboard during kexec might even be enough.
 
 Hrm... interrupts are disabled in kexec_sequence, should we be doing
 this there instead I wonder? At this point we're pretty much at the
 point of no return, so maybe we just need to disable interrupts first?
 
 I think a better API would be that opal_return_cpu() deals with this under the
 covers. I think we talked about that, so maybe there was some reason that
 wasn't possible.
 
 opal_return_cpu() acts on the current CPU, so if we started flipping HILE
 there we'd hit PowerISA 2.07 Section 2.11:
 The contents of the HILE bit must be the same for all
 threads under the control of a given instance of the
 hypervisor; otherwise all results are undefined.
 
 so we'd have to do something kind of funny in opal_return_cpu() to work
 out what's going on. Keeping in mind that opal_return_cpu() is also used
 in the fsp code update path (which I haven't gone and really looked at
 in this context though).
 
 I'm not convinced that opal_return_cpu() doing the HILE switch is
 safe when we'd be relying on the kernel to pretty much do this all at
 the same time (when we really have opal_reinit_cpus to do that)

Having discovered opal_call_realmode, it looks like we can probably do this
safely in real mode just before moving the kernel around and releasing
the secondaries into the next kernel. I'll test this and send a V2 if it
looks good.

 
 Although PowerISA also says:
 The HILE bit is set, by an implementation-dependent
 method, during system initialization,
 and cannot be modified after system initialization.
 
 Which... umm... we are clearly doing and have been since we started
 supporting LE powernv, so there's something somewhere in some document
 describing it all... I just have to find it (or poke Ben to find out
 where he worked it out from).
 


-- 
---
LTC Ozlabs
IBM


[PATCH V2 2/2] powerpc/kexec: Reset HILE before kexec_sequence

2015-07-07 Thread Samuel Mendoza-Jonas
On powernv secondary cpus are returned to OPAL, and will then enter the
target kernel in big-endian. However if it is set the HILE bit will persist,
causing the first exception in the target kernel to be delivered in
little-endian regardless of the kernel endianness.
Make sure that the HILE bit is switched off before entering
kexec_sequence.

Signed-off-by: Samuel Mendoza-Jonas sam...@au1.ibm.com
---
 arch/powerpc/kernel/machine_kexec_64.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 1a74446..2266135c 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -22,8 +22,10 @@
 #include <asm/page.h>
 #include <asm/current.h>
 #include <asm/machdep.h>
+#include <asm/opal.h>
 #include <asm/cacheflush.h>
 #include <asm/paca.h>
+#include <asm/firmware.h>
 #include <asm/mmu.h>
 #include <asm/sections.h>	/* _end */
 #include <asm/prom.h>
@@ -356,6 +358,10 @@ void default_machine_kexec(struct kimage *image)
 	 * switched to a static version!
 	 */
 
+	/* Reset HILE in case we kexec into an older BE kernel */
+	if (firmware_has_feature(FW_FEATURE_OPALv3))
+		opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE);
+
 	/* Some things are best done in assembly.  Finding globals with
 	 * a toc is easier in C, so pass in what we can.
 	 */
-- 
2.4.5


Re: [PATCH V2 2/2] powerpc/kexec: Reset HILE before kexec_sequence

2015-07-07 Thread Michael Ellerman
On Wed, 2015-07-08 at 14:37 +1000, Samuel Mendoza-Jonas wrote:
 On powernv secondary cpus are returned to OPAL, and will then enter the
 target kernel in big-endian. However if it is set the HILE bit will persist,
 causing the first exception in the target kernel to be delivered in
 little-endian regardless of the kernel endianness.
 Make sure that the HILE bit is switched off before entering
 kexec_sequence.
 
 Signed-off-by: Samuel Mendoza-Jonas sam...@au1.ibm.com
 ---
  arch/powerpc/kernel/machine_kexec_64.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
 index 1a74446..2266135c 100644
 --- a/arch/powerpc/kernel/machine_kexec_64.c
 +++ b/arch/powerpc/kernel/machine_kexec_64.c
 @@ -22,8 +22,10 @@
  #include <asm/page.h>
  #include <asm/current.h>
  #include <asm/machdep.h>
 +#include <asm/opal.h>
  #include <asm/cacheflush.h>
  #include <asm/paca.h>
 +#include <asm/firmware.h>
  #include <asm/mmu.h>
  #include <asm/sections.h>	/* _end */
  #include <asm/prom.h>
 @@ -356,6 +358,10 @@ void default_machine_kexec(struct kimage *image)
  	 * switched to a static version!
  	 */
  
 +	/* Reset HILE in case we kexec into an older BE kernel */
 +	if (firmware_has_feature(FW_FEATURE_OPALv3))
 +		opal_reinit_cpus(OPAL_REINIT_CPUS_HILE_BE);

It's not safe to do this here.

We are still in virtual mode and have external interrupts enabled, so you could
easily take an exception of some kind and then you'd blow up. Mashing the
keyboard during kexec might even be enough.
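
To make the failure mode concrete, here is a minimal userspace sketch (not
kernel code) of what happens when an interrupt handler stored in one byte
order is fetched in the other, as would happen if HILE no longer matched
the running kernel. The helper names are hypothetical; the only
architectural fact assumed is that the PowerPC nop is ori 0,0,0, encoded
as 0x60000000.

```c
#include <stdint.h>

/* PowerPC "nop" is ori r0,r0,0, encoded as 0x60000000. */
#define PPC_NOP 0x60000000u

/* Byte-reverse a 32-bit word. */
uint32_t swab32_model(uint32_t w)
{
	return ((w & 0x000000ffu) << 24) |
	       ((w & 0x0000ff00u) << 8)  |
	       ((w & 0x00ff0000u) >> 8)  |
	       ((w & 0xff000000u) >> 24);
}

/* The primary opcode is the top 6 bits of the instruction word. */
unsigned primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* What gets decoded when the handler was stored in one endianness but
 * the interrupt is delivered in the other: each word is byte-reversed. */
uint32_t fetch_with_wrong_endian(uint32_t stored_insn)
{
	return swab32_model(stored_insn);
}
```

With these helpers, fetch_with_wrong_endian(PPC_NOP) yields 0x00000060,
whose primary opcode is 0 rather than 24 (ori), so the first instruction
of the handler is no longer the one that was written.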

I think a better API would be that opal_return_cpu() deals with this under the
covers. I think we talked about that, so maybe there was some reason that
wasn't possible.

cheers

