Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2023-04-17 Thread Nathan Lynch
Michal Suchánek  writes:
> On Fri, Sep 16, 2022 at 04:56:18PM -0500, Nathan Lynch wrote:
>> "Nicholas Piggin"  writes:
>> > On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>> >> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>> >> > Leonardo Brás  writes:
>> >> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>
>> >> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>> >> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>> >> > > > property described in the spec applies only to the individual RTAS
>> >> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
>> >> > > > CPUs
>> >> > > > simultaneously, but it must adhere to the more general requirement 
>> >> > > > to
>> >> > > > serialize with other RTAS functions.
>> >> > > > 
>> >> > > 
>> >> > > I see. Thanks for explaining that part!
>> >> > > I agree: reentrant calls that way don't look as useful on Linux than I
>> >> > > previously thought.
>> >> > > 
>> >> > > OTOH, I think that instead of reverting the change, we could make use 
>> >> > > of the
>> >> > > correct information and fix the current implementation. (This could 
>> >> > > help when we
>> >> > > do the same rtas call in multiple cpus)
>> >> > 
>> >> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>> >> > that. I'm not seeing the need.
>> >> > 
>> >> > > I have an idea of a patch to fix this. 
>> >> > > Do you think it would be ok if I sent that, to prospect being an 
>> >> > > alternative to
>> >> > > this reversion?
>> >> > 
>> >> > It is my preference, and I believe it is more common, to revert to the
>> >> > well-understood prior state, imperfect as it may be. The revert can be
>> >> > backported to -stable and distros while development and review of
>> >> > another approach proceeds.
>> >>
>> >> Ok then, as long as you are aware of the kdump bug, I'm good.
>> >>
>> >> FWIW:
>> >> Reviewed-by: Leonardo Bras 
>> >
>> > A shame. I guess a reader/writer lock would not be much help because
>> > the crash is probably more likely to hit longer running rtas calls?
>> >
>> > Alternative is just cheat and do this...?
>> >
>> > Thanks,
>> > Nick
>> >
>> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> > index 693133972294..89728714a06e 100644
>> > --- a/arch/powerpc/kernel/rtas.c
>> > +++ b/arch/powerpc/kernel/rtas.c
>> > @@ -26,6 +26,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> >  
>> >  #include 
>> >  #include 
>> > @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
>> >  {
>> > unsigned long flags;
>> >  
>> > +   if (atomic_read(_cpu) == raw_smp_processor_id()) {
>> > +   /*
>> > +* Crash in progress on this CPU. Other CPUs should be
>> > +* stopped by now, so skip the lock in case it was being
>> > +* held, and is now needed for crashing e.g., kexec
>> > +* (machine_kexec_mask_interrupts) requires rtas calls.
>> > +*
>> > +* It's possible this could have caused rtas state
>> > breakage
>> > +* but the alternative is deadlock.
>> > +*/
>> > +   return 0;
>> > +   }
>> > +
>> > local_irq_save(flags);
>> > preempt_disable();
>> > arch_spin_lock();
>> > @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
>> >  
>> >  static void unlock_rtas(unsigned long flags)
>> >  {
>> > +   if (atomic_read(_cpu) == raw_smp_processor_id())
>> > +   return;
>> > +
>> > arch_spin_unlock();
>> > local_irq_restore(flags);
>> > preempt_enable();
>> 
>> Looks correct.
>> 
>> I wonder - would it be worth making the panic path use a separate
>> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
>> RTAS at panic time, then leaving rtas.args untouched might make the
>> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
>> incur on the panic path more likely to succeed.
>
> Was some fix for the case of crashing in rtas merged?
>
> Looks like there is none unless I missed something.

I'm not aware of any crashes in RTAS, but we do have an issue open to
track the issue I think you're referring to:

https://github.com/linuxppc/issues/issues/435

No progress yet. AFAIK only XICS guests are exposed; XIVE doesn't use
RTAS calls.


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2023-04-14 Thread Michal Suchánek
Hello,

On Fri, Sep 16, 2022 at 04:56:18PM -0500, Nathan Lynch wrote:
> "Nicholas Piggin"  writes:
> > On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> >> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> >> > Leonardo Brás  writes:
> >> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:

> >> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> >> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> >> > > > property described in the spec applies only to the individual RTAS
> >> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
> >> > > > CPUs
> >> > > > simultaneously, but it must adhere to the more general requirement to
> >> > > > serialize with other RTAS functions.
> >> > > > 
> >> > > 
> >> > > I see. Thanks for explaining that part!
> >> > > I agree: reentrant calls that way don't look as useful on Linux than I
> >> > > previously thought.
> >> > > 
> >> > > OTOH, I think that instead of reverting the change, we could make use 
> >> > > of the
> >> > > correct information and fix the current implementation. (This could 
> >> > > help when we
> >> > > do the same rtas call in multiple cpus)
> >> > 
> >> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> >> > that. I'm not seeing the need.
> >> > 
> >> > > I have an idea of a patch to fix this. 
> >> > > Do you think it would be ok if I sent that, to prospect being an 
> >> > > alternative to
> >> > > this reversion?
> >> > 
> >> > It is my preference, and I believe it is more common, to revert to the
> >> > well-understood prior state, imperfect as it may be. The revert can be
> >> > backported to -stable and distros while development and review of
> >> > another approach proceeds.
> >>
> >> Ok then, as long as you are aware of the kdump bug, I'm good.
> >>
> >> FWIW:
> >> Reviewed-by: Leonardo Bras 
> >
> > A shame. I guess a reader/writer lock would not be much help because
> > the crash is probably more likely to hit longer running rtas calls?
> >
> > Alternative is just cheat and do this...?
> >
> > Thanks,
> > Nick
> >
> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> > index 693133972294..89728714a06e 100644
> > --- a/arch/powerpc/kernel/rtas.c
> > +++ b/arch/powerpc/kernel/rtas.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
> >  {
> > unsigned long flags;
> >  
> > +   if (atomic_read(_cpu) == raw_smp_processor_id()) {
> > +   /*
> > +* Crash in progress on this CPU. Other CPUs should be
> > +* stopped by now, so skip the lock in case it was being
> > +* held, and is now needed for crashing e.g., kexec
> > +* (machine_kexec_mask_interrupts) requires rtas calls.
> > +*
> > +* It's possible this could have caused rtas state
> > breakage
> > +* but the alternative is deadlock.
> > +*/
> > +   return 0;
> > +   }
> > +
> > local_irq_save(flags);
> > preempt_disable();
> > arch_spin_lock();
> > @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
> >  
> >  static void unlock_rtas(unsigned long flags)
> >  {
> > +   if (atomic_read(_cpu) == raw_smp_processor_id())
> > +   return;
> > +
> > arch_spin_unlock();
> > local_irq_restore(flags);
> > preempt_enable();
> 
> Looks correct.
> 
> I wonder - would it be worth making the panic path use a separate
> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> RTAS at panic time, then leaving rtas.args untouched might make the
> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> incur on the panic path more likely to succeed.

Was some fix for the case of crashing in rtas merged?

Looks like there is none unless I missed something.

The paramater area allocator might help with the latter
but the former does not seem addressed.

Thanks

Michal


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-23 Thread Michael Ellerman
On Wed, 7 Sep 2022 17:01:11 -0500, Nathan Lynch wrote:
> At the time this was submitted by Leonardo, I confirmed -- or thought
> I had confirmed -- with PowerVM partition firmware development that
> the following RTAS functions:
> 
> - ibm,get-xive
> - ibm,int-off
> - ibm,int-on
> - ibm,set-xive
> 
> [...]

Applied to powerpc/next.

[1/1] Revert "powerpc/rtas: Implement reentrant rtas call"
  https://git.kernel.org/powerpc/c/f88aabad33ea22be2ce1c60d8901942e4e2a9edb

cheers


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-21 Thread Nathan Lynch
"Nicholas Piggin"  writes:

> On Mon Sep 19, 2022 at 11:51 PM AEST, Nathan Lynch wrote:
>> > I wonder - would it be worth making the panic path use a separate
>> > "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
>> > RTAS at panic time, then leaving rtas.args untouched might make the
>> > ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
>> > incur on the panic path more likely to succeed.
>
> Yeah I think that's probably not a bad idea. Not sure if you've got the
> bandwidth to take on doing the patch but be my guest if you do :)
> Otherwise we can file it in github issues.

Not sure I'll be able to work it soon. I filed
https://github.com/linuxppc/issues/issues/435


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-19 Thread Nicholas Piggin
On Mon Sep 19, 2022 at 11:51 PM AEST, Nathan Lynch wrote:
> Nathan Lynch  writes:
> > "Nicholas Piggin"  writes:
> >> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> >>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> >>> > Leonardo Brás  writes:
> >>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> >>> > > > Leonardo Brás  writes:
> >>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> >>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or 
> >>> > > > > > thought
> >>> > > > > > I had confirmed -- with PowerVM partition firmware development 
> >>> > > > > > that
> >>> > > > > > the following RTAS functions:
> >>> > > > > > 
> >>> > > > > > - ibm,get-xive
> >>> > > > > > - ibm,int-off
> >>> > > > > > - ibm,int-on
> >>> > > > > > - ibm,set-xive
> >>> > > > > > 
> >>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
> >>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary 
> >>> > > > > > other
> >>> > > > > > RTAS calls:
> >>> > > > > > 
> >>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
> >>> > > > > > 
> >>> > > > > > Recent discussion with firmware development makes it clear that 
> >>> > > > > > this
> >>> > > > > > is not true, and that the code in commit b664db8e3f97 
> >>> > > > > > ("powerpc/rtas:
> >>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining 
> >>> > > > > > several
> >>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
> >>> > > > > > LPM. These scenarios use ibm,configure-connector, whose 
> >>> > > > > > internal state
> >>> > > > > > can be corrupted by the concurrent use of the "reentrant" 
> >>> > > > > > functions,
> >>> > > > > > leading to symptoms like endless busy statuses from RTAS.
> >>> > > > > 
> >>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> >>> > > > 
> >>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> >>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> >>> > > > property described in the spec applies only to the individual RTAS
> >>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
> >>> > > > CPUs
> >>> > > > simultaneously, but it must adhere to the more general requirement 
> >>> > > > to
> >>> > > > serialize with other RTAS functions.
> >>> > > > 
> >>> > > 
> >>> > > I see. Thanks for explaining that part!
> >>> > > I agree: reentrant calls that way don't look as useful on Linux than I
> >>> > > previously thought.
> >>> > > 
> >>> > > OTOH, I think that instead of reverting the change, we could make use 
> >>> > > of the
> >>> > > correct information and fix the current implementation. (This could 
> >>> > > help when we
> >>> > > do the same rtas call in multiple cpus)
> >>> > 
> >>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> >>> > that. I'm not seeing the need.
> >>> > 
> >>> > > I have an idea of a patch to fix this. 
> >>> > > Do you think it would be ok if I sent that, to prospect being an 
> >>> > > alternative to
> >>> > > this reversion?
> >>> > 
> >>> > It is my preference, and I believe it is more common, to revert to the
> >>> > well-understood prior state, imperfect as it may be. The revert can be
> >>> > backported to -stable and distros while development and review of
> >>> > another approach proceeds.
> >>>
> >>> Ok then, as long as you are aware of the kdump bug, I'm good.
> >>>
> >>> FWIW:
> >>> Reviewed-by: Leonardo Bras 
> >>
> >> A shame. I guess a reader/writer lock would not be much help because
> >> the crash is probably more likely to hit longer running rtas calls?
> >>
> >> Alternative is just cheat and do this...?
>
> [...]
>
> >
> > I wonder - would it be worth making the panic path use a separate
> > "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> > RTAS at panic time, then leaving rtas.args untouched might make the
> > ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> > incur on the panic path more likely to succeed.

Yeah I think that's probably not a bad idea. Not sure if you've got the
bandwidth to take on doing the patch but be my guest if you do :)
Otherwise we can file it in github issues.

> Regardless, I request that we proceed with the revert while the crash
> path hardening gets sorted out. If I understand the motivation behind
> commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
> then it looks like it was incomplete anyway? rtas_os_term() still takes
> the lock when calling ibm,os-term.

Yeah agree a simple revert should be done first.

Thanks,
Nick


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-19 Thread Michael Ellerman
Nathan Lynch  writes:
> Nathan Lynch  writes:
>> "Nicholas Piggin"  writes:
>>> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
 On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
 > Leonardo Brás  writes:
 > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
 > > > Leonardo Brás  writes:
 > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
 > > > > > At the time this was submitted by Leonardo, I confirmed -- or 
 > > > > > thought
 > > > > > I had confirmed -- with PowerVM partition firmware development 
 > > > > > that
 > > > > > the following RTAS functions:
 > > > > > 
 > > > > > - ibm,get-xive
 > > > > > - ibm,int-off
 > > > > > - ibm,int-on
 > > > > > - ibm,set-xive
 > > > > > 
 > > > > > were safe to call on multiple CPUs simultaneously, not only with
 > > > > > respect to themselves as indicated by PAPR, but with arbitrary 
 > > > > > other
 > > > > > RTAS calls:
 > > > > > 
 > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
 > > > > > 
 > > > > > Recent discussion with firmware development makes it clear that 
 > > > > > this
 > > > > > is not true, and that the code in commit b664db8e3f97 
 > > > > > ("powerpc/rtas:
 > > > > > Implement reentrant rtas call") is unsafe, likely explaining 
 > > > > > several
 > > > > > strange bugs we've seen in internal testing involving DLPAR and
 > > > > > LPM. These scenarios use ibm,configure-connector, whose internal 
 > > > > > state
 > > > > > can be corrupted by the concurrent use of the "reentrant" 
 > > > > > functions,
 > > > > > leading to symptoms like endless busy statuses from RTAS.
 > > > > 
 > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
 > > > 
 > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
 > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
 > > > property described in the spec applies only to the individual RTAS
 > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
 > > > CPUs
 > > > simultaneously, but it must adhere to the more general requirement to
 > > > serialize with other RTAS functions.
 > > > 
 > > 
 > > I see. Thanks for explaining that part!
 > > I agree: reentrant calls that way don't look as useful on Linux than I
 > > previously thought.
 > > 
 > > OTOH, I think that instead of reverting the change, we could make use 
 > > of the
 > > correct information and fix the current implementation. (This could 
 > > help when we
 > > do the same rtas call in multiple cpus)
 > 
 > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
 > that. I'm not seeing the need.
 > 
 > > I have an idea of a patch to fix this. 
 > > Do you think it would be ok if I sent that, to prospect being an 
 > > alternative to
 > > this reversion?
 > 
 > It is my preference, and I believe it is more common, to revert to the
 > well-understood prior state, imperfect as it may be. The revert can be
 > backported to -stable and distros while development and review of
 > another approach proceeds.

 Ok then, as long as you are aware of the kdump bug, I'm good.

 FWIW:
 Reviewed-by: Leonardo Bras 
>>>
>>> A shame. I guess a reader/writer lock would not be much help because
>>> the crash is probably more likely to hit longer running rtas calls?
>>>
>>> Alternative is just cheat and do this...?
>
> [...]
>
>>
>> I wonder - would it be worth making the panic path use a separate
>> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
>> RTAS at panic time, then leaving rtas.args untouched might make the
>> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
>> incur on the panic path more likely to succeed.
>
> Regardless, I request that we proceed with the revert while the crash
> path hardening gets sorted out.

It's in next, I just haven't sent out the mails yet because I'm
disorganised :)

> If I understand the motivation behind
> commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
> then it looks like it was incomplete anyway? rtas_os_term() still takes
> the lock when calling ibm,os-term.

The original report was for kdump, which doesn't use rtas_os_term() AFAIK.

cheers


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-19 Thread Nathan Lynch
Nathan Lynch  writes:
> "Nicholas Piggin"  writes:
>> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>>> > Leonardo Brás  writes:
>>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>>> > > > Leonardo Brás  writes:
>>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or 
>>> > > > > > thought
>>> > > > > > I had confirmed -- with PowerVM partition firmware development 
>>> > > > > > that
>>> > > > > > the following RTAS functions:
>>> > > > > > 
>>> > > > > > - ibm,get-xive
>>> > > > > > - ibm,int-off
>>> > > > > > - ibm,int-on
>>> > > > > > - ibm,set-xive
>>> > > > > > 
>>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
>>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary 
>>> > > > > > other
>>> > > > > > RTAS calls:
>>> > > > > > 
>>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
>>> > > > > > 
>>> > > > > > Recent discussion with firmware development makes it clear that 
>>> > > > > > this
>>> > > > > > is not true, and that the code in commit b664db8e3f97 
>>> > > > > > ("powerpc/rtas:
>>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining 
>>> > > > > > several
>>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
>>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal 
>>> > > > > > state
>>> > > > > > can be corrupted by the concurrent use of the "reentrant" 
>>> > > > > > functions,
>>> > > > > > leading to symptoms like endless busy statuses from RTAS.
>>> > > > > 
>>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>>> > > > 
>>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>>> > > > property described in the spec applies only to the individual RTAS
>>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
>>> > > > CPUs
>>> > > > simultaneously, but it must adhere to the more general requirement to
>>> > > > serialize with other RTAS functions.
>>> > > > 
>>> > > 
>>> > > I see. Thanks for explaining that part!
>>> > > I agree: reentrant calls that way don't look as useful on Linux than I
>>> > > previously thought.
>>> > > 
>>> > > OTOH, I think that instead of reverting the change, we could make use 
>>> > > of the
>>> > > correct information and fix the current implementation. (This could 
>>> > > help when we
>>> > > do the same rtas call in multiple cpus)
>>> > 
>>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>>> > that. I'm not seeing the need.
>>> > 
>>> > > I have an idea of a patch to fix this. 
>>> > > Do you think it would be ok if I sent that, to prospect being an 
>>> > > alternative to
>>> > > this reversion?
>>> > 
>>> > It is my preference, and I believe it is more common, to revert to the
>>> > well-understood prior state, imperfect as it may be. The revert can be
>>> > backported to -stable and distros while development and review of
>>> > another approach proceeds.
>>>
>>> Ok then, as long as you are aware of the kdump bug, I'm good.
>>>
>>> FWIW:
>>> Reviewed-by: Leonardo Bras 
>>
>> A shame. I guess a reader/writer lock would not be much help because
>> the crash is probably more likely to hit longer running rtas calls?
>>
>> Alternative is just cheat and do this...?

[...]

>
> I wonder - would it be worth making the panic path use a separate
> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> RTAS at panic time, then leaving rtas.args untouched might make the
> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> incur on the panic path more likely to succeed.

Regardless, I request that we proceed with the revert while the crash
path hardening gets sorted out. If I understand the motivation behind
commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
then it looks like it was incomplete anyway? rtas_os_term() still takes
the lock when calling ibm,os-term.


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-16 Thread Nathan Lynch
"Nicholas Piggin"  writes:
> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
>> > Leonardo Brás  writes:
>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>> > > > Leonardo Brás  writes:
>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or 
>> > > > > > thought
>> > > > > > I had confirmed -- with PowerVM partition firmware development that
>> > > > > > the following RTAS functions:
>> > > > > > 
>> > > > > > - ibm,get-xive
>> > > > > > - ibm,int-off
>> > > > > > - ibm,int-on
>> > > > > > - ibm,set-xive
>> > > > > > 
>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary 
>> > > > > > other
>> > > > > > RTAS calls:
>> > > > > > 
>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
>> > > > > > 
>> > > > > > Recent discussion with firmware development makes it clear that 
>> > > > > > this
>> > > > > > is not true, and that the code in commit b664db8e3f97 
>> > > > > > ("powerpc/rtas:
>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining 
>> > > > > > several
>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal 
>> > > > > > state
>> > > > > > can be corrupted by the concurrent use of the "reentrant" 
>> > > > > > functions,
>> > > > > > leading to symptoms like endless busy statuses from RTAS.
>> > > > > 
>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>> > > > 
>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
>> > > > property described in the spec applies only to the individual RTAS
>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
>> > > > CPUs
>> > > > simultaneously, but it must adhere to the more general requirement to
>> > > > serialize with other RTAS functions.
>> > > > 
>> > > 
>> > > I see. Thanks for explaining that part!
>> > > I agree: reentrant calls that way don't look as useful on Linux than I
>> > > previously thought.
>> > > 
>> > > OTOH, I think that instead of reverting the change, we could make use of 
>> > > the
>> > > correct information and fix the current implementation. (This could help 
>> > > when we
>> > > do the same rtas call in multiple cpus)
>> > 
>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
>> > that. I'm not seeing the need.
>> > 
>> > > I have an idea of a patch to fix this. 
>> > > Do you think it would be ok if I sent that, to prospect being an 
>> > > alternative to
>> > > this reversion?
>> > 
>> > It is my preference, and I believe it is more common, to revert to the
>> > well-understood prior state, imperfect as it may be. The revert can be
>> > backported to -stable and distros while development and review of
>> > another approach proceeds.
>>
>> Ok then, as long as you are aware of the kdump bug, I'm good.
>>
>> FWIW:
>> Reviewed-by: Leonardo Bras 
>
> A shame. I guess a reader/writer lock would not be much help because
> the crash is probably more likely to hit longer running rtas calls?
>
> Alternative is just cheat and do this...?
>
> Thanks,
> Nick
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 693133972294..89728714a06e 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
>  {
> unsigned long flags;
>  
> +   if (atomic_read(_cpu) == raw_smp_processor_id()) {
> +   /*
> +* Crash in progress on this CPU. Other CPUs should be
> +* stopped by now, so skip the lock in case it was being
> +* held, and is now needed for crashing e.g., kexec
> +* (machine_kexec_mask_interrupts) requires rtas calls.
> +*
> +* It's possible this could have caused rtas state
> breakage
> +* but the alternative is deadlock.
> +*/
> +   return 0;
> +   }
> +
> local_irq_save(flags);
> preempt_disable();
> arch_spin_lock();
> @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
>  
>  static void unlock_rtas(unsigned long flags)
>  {
> +   if (atomic_read(_cpu) == raw_smp_processor_id())
> +   return;
> +
> arch_spin_unlock();
> local_irq_restore(flags);
> preempt_enable();

Looks correct.

I wonder - would it be worth making the panic path use a separate
"emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
RTAS at panic time, 

Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-15 Thread Nicholas Piggin
On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> > Leonardo Brás  writes:
> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> > > > Leonardo Brás  writes:
> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > > > > At the time this was submitted by Leonardo, I confirmed -- or 
> > > > > > thought
> > > > > > I had confirmed -- with PowerVM partition firmware development that
> > > > > > the following RTAS functions:
> > > > > > 
> > > > > > - ibm,get-xive
> > > > > > - ibm,int-off
> > > > > > - ibm,int-on
> > > > > > - ibm,set-xive
> > > > > > 
> > > > > > were safe to call on multiple CPUs simultaneously, not only with
> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > > > > RTAS calls:
> > > > > > 
> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
> > > > > > 
> > > > > > Recent discussion with firmware development makes it clear that this
> > > > > > is not true, and that the code in commit b664db8e3f97 
> > > > > > ("powerpc/rtas:
> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > > > > strange bugs we've seen in internal testing involving DLPAR and
> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal 
> > > > > > state
> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > > > > leading to symptoms like endless busy statuses from RTAS.
> > > > > 
> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> > > > 
> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> > > > property described in the spec applies only to the individual RTAS
> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> > > > simultaneously, but it must adhere to the more general requirement to
> > > > serialize with other RTAS functions.
> > > > 
> > > 
> > > I see. Thanks for explaining that part!
> > > I agree: reentrant calls that way don't look as useful on Linux than I
> > > previously thought.
> > > 
> > > OTOH, I think that instead of reverting the change, we could make use of 
> > > the
> > > correct information and fix the current implementation. (This could help 
> > > when we
> > > do the same rtas call in multiple cpus)
> > 
> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> > that. I'm not seeing the need.
> > 
> > > I have an idea of a patch to fix this. 
> > > Do you think it would be ok if I sent that, to prospect being an 
> > > alternative to
> > > this reversion?
> > 
> > It is my preference, and I believe it is more common, to revert to the
> > well-understood prior state, imperfect as it may be. The revert can be
> > backported to -stable and distros while development and review of
> > another approach proceeds.
>
> Ok then, as long as you are aware of the kdump bug, I'm good.
>
> FWIW:
> Reviewed-by: Leonardo Bras 

A shame. I guess a reader/writer lock would not be much help because
the crash is probably more likely to hit longer running rtas calls?

Alternative is just cheat and do this...?

Thanks,
Nick

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..89728714a06e 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
 {
unsigned long flags;
 
+   if (atomic_read(_cpu) == raw_smp_processor_id()) {
+   /*
+* Crash in progress on this CPU. Other CPUs should be
+* stopped by now, so skip the lock in case it was being
+* held, and is now needed for crashing e.g., kexec
+* (machine_kexec_mask_interrupts) requires rtas calls.
+*
+* It's possible this could have caused rtas state
breakage
+* but the alternative is deadlock.
+*/
+   return 0;
+   }
+
local_irq_save(flags);
preempt_disable();
arch_spin_lock();
@@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
 
 static void unlock_rtas(unsigned long flags)
 {
+   if (atomic_read(_cpu) == raw_smp_processor_id())
+   return;
+
arch_spin_unlock();
local_irq_restore(flags);
preempt_enable();



Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-13 Thread Leonardo Brás
On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> Leonardo Brás  writes:
> > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> > > Leonardo Brás  writes:
> > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
> > > > > I had confirmed -- with PowerVM partition firmware development that
> > > > > the following RTAS functions:
> > > > > 
> > > > > - ibm,get-xive
> > > > > - ibm,int-off
> > > > > - ibm,int-on
> > > > > - ibm,set-xive
> > > > > 
> > > > > were safe to call on multiple CPUs simultaneously, not only with
> > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > > > RTAS calls:
> > > > > 
> > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
> > > > > 
> > > > > Recent discussion with firmware development makes it clear that this
> > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > > > strange bugs we've seen in internal testing involving DLPAR and
> > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
> > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > > > leading to symptoms like endless busy statuses from RTAS.
> > > > 
> > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> > > 
> > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> > > property described in the spec applies only to the individual RTAS
> > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> > > simultaneously, but it must adhere to the more general requirement to
> > > serialize with other RTAS functions.
> > > 
> > 
> > I see. Thanks for explaining that part!
> > I agree: reentrant calls that way don't look as useful on Linux than I
> > previously thought.
> > 
> > OTOH, I think that instead of reverting the change, we could make use of the
> > correct information and fix the current implementation. (This could help 
> > when we
> > do the same rtas call in multiple cpus)
> 
> Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> that. I'm not seeing the need.
> 
> > I have an idea of a patch to fix this. 
> > Do you think it would be ok if I sent that, to prospect being an 
> > alternative to
> > this reversion?
> 
> It is my preference, and I believe it is more common, to revert to the
> well-understood prior state, imperfect as it may be. The revert can be
> backported to -stable and distros while development and review of
> another approach proceeds.

Ok then, as long as you are aware of the kdump bug, I'm good.

FWIW:
Reviewed-by: Leonardo Bras 



Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-12 Thread Nathan Lynch
Leonardo Brás  writes:
> On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
>> Leonardo Brás  writes:
>> > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>> > > At the time this was submitted by Leonardo, I confirmed -- or thought
>> > > I had confirmed -- with PowerVM partition firmware development that
>> > > the following RTAS functions:
>> > > 
>> > > - ibm,get-xive
>> > > - ibm,int-off
>> > > - ibm,int-on
>> > > - ibm,set-xive
>> > > 
>> > > were safe to call on multiple CPUs simultaneously, not only with
>> > > respect to themselves as indicated by PAPR, but with arbitrary other
>> > > RTAS calls:
>> > > 
>> > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
>> > > 
>> > > Recent discussion with firmware development makes it clear that this
>> > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>> > > Implement reentrant rtas call") is unsafe, likely explaining several
>> > > strange bugs we've seen in internal testing involving DLPAR and
>> > > LPM. These scenarios use ibm,configure-connector, whose internal state
>> > > can be corrupted by the concurrent use of the "reentrant" functions,
>> > > leading to symptoms like endless busy statuses from RTAS.
>> > 
>> > Oh, does not it means PowerVM is not compliant to the PAPR specs?
>> 
>> No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
>> Implement reentrant rtas call") change is incorrect. The "reentrant"
>> property described in the spec applies only to the individual RTAS
>> functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
>> simultaneously, but it must adhere to the more general requirement to
>> serialize with other RTAS functions.
>> 
>
> I see. Thanks for explaining that part!
> I agree: reentrant calls that way don't look as useful on Linux than I
> previously thought.
>
> OTOH, I think that instead of reverting the change, we could make use of the
> correct information and fix the current implementation. (This could help when 
> we
> do the same rtas call in multiple cpus)

Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
that. I'm not seeing the need.

> I have an idea of a patch to fix this. 
> Do you think it would be ok if I sent that, to prospect being an alternative 
> to
> this reversion?

It is my preference, and I believe it is more common, to revert to the
well-understood prior state, imperfect as it may be. The revert can be
backported to -stable and distros while development and review of
another approach proceeds.


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-12 Thread Leonardo Brás
On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> Hi Leonardo,
> 
> (restoring the list to the cc, hope that's ok)
> 

Sure, thanks for doing that. 
I probably had some mail composer issue here.

> Leonardo Brás  writes:
> > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > At the time this was submitted by Leonardo, I confirmed -- or thought
> > > I had confirmed -- with PowerVM partition firmware development that
> > > the following RTAS functions:
> > > 
> > > - ibm,get-xive
> > > - ibm,int-off
> > > - ibm,int-on
> > > - ibm,set-xive
> > > 
> > > were safe to call on multiple CPUs simultaneously, not only with
> > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > RTAS calls:
> > > 
> > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
> > > 
> > > Recent discussion with firmware development makes it clear that this
> > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > strange bugs we've seen in internal testing involving DLPAR and
> > > LPM. These scenarios use ibm,configure-connector, whose internal state
> > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > leading to symptoms like endless busy statuses from RTAS.
> > 
> > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> 
> No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> Implement reentrant rtas call") change is incorrect. The "reentrant"
> property described in the spec applies only to the individual RTAS
> functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> simultaneously, but it must adhere to the more general requirement to
> serialize with other RTAS functions.
> 

I see. Thanks for explaining that part!
I agree: reentrant calls that way don't look as useful on Linux than I
previously thought.

OTOH, I think that instead of reverting the change, we could make use of the
correct information and fix the current implementation. (This could help when we
do the same rtas call in multiple cpus)

I have an idea of a patch to fix this. 
Do you think it would be ok if I sent that, to prospect being an alternative to
this reversion?


> I don't claim that this is a useful property, at least not for
> Linux. Maybe other OSes are better able to exploit it.
> 
> I apologize for my role in the confusion here. When reviewing the
> original change, I talked to firmware development about the reentrant
> property and how we wanted to use it. I've since reviewed that
> conversation and concluded that I didn't communicate clearly, and that I
> interpreted their responses too eagerly.

No problem. Thanks for even talking to firmware people during original change!


> 
> In the future, when we (pseries Linux developers at IBM) want to go
> beyond the language of the spec, we probably should initiate an
> architecture change to make the PAPR eventually align with our
> implementation choices.

Yeah, it makes sense.

> 
> > I mentioned this in the original Commit, and it's still true to the last 
> > LoPAR:
> > 
> > ###
> > For "ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".
> > 
> > On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
> > items 2 and 3 say:
> > 
> > 2 - For the PowerPC External Interrupt option: The * call must be
> > reentrant to the number of processors on the platform.
> > 3 - For the PowerPC External Interrupt option: The * argument call
> > buffer for each simultaneous call must be physically unique.
> > 
> > ### 
> > 
> > Other than that, IIRC, this change was used to avoid issues that were 
> > happening
> > on kdump/kexec: 
> > If another thread was holding the rtas lock, kdump/kexec would get stuck 
> > waiting
> > for the lock forever and never finish the process, often causing a bug
> > reproduction's kdump to not get collected. 
> > 
> > Is there any other fix for the above bug? 
> 
> Not that I'm aware of, but if this is a common failure mode for kdump,
> then perhaps rtas_call() or related code should be made to ignore the
> lock in the crash path, as a more general mitigation.

It was the first idea at the time (bust the rtas lock), but I thought it could
raise some issues. Reentrant rtas calls seemed like a better solution at the
time.

> 
> Then again, maybe it's not that urgent? Only XICS mode guests are
> potentially affected. Do we even have this problem with
> firmware-assisted dumps on PowerVM?

It's been a lot of time, I don't recall all the details on that.

> 
> > Or is that a bug which is acceptable to have back, compared to the new
> > one?
> 
> It was not acceptable to regress existing functions (DLPAR, LPM) in
> exchange for making the crash path more robust. Reverting the change is
> the correct tradeoff, unfortunately.

I was talking about the kdump bug being acceptable, which by the previous
comment it is, or can be solved in other ways 

Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-09 Thread Nathan Lynch
Hi Leonardo,

(restoring the list to the cc, hope that's ok)

Leonardo Brás  writes:
> On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>> At the time this was submitted by Leonardo, I confirmed -- or thought
>> I had confirmed -- with PowerVM partition firmware development that
>> the following RTAS functions:
>> 
>> - ibm,get-xive
>> - ibm,int-off
>> - ibm,int-on
>> - ibm,set-xive
>> 
>> were safe to call on multiple CPUs simultaneously, not only with
>> respect to themselves as indicated by PAPR, but with arbitrary other
>> RTAS calls:
>> 
>> https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
>> 
>> Recent discussion with firmware development makes it clear that this
>> is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>> Implement reentrant rtas call") is unsafe, likely explaining several
>> strange bugs we've seen in internal testing involving DLPAR and
>> LPM. These scenarios use ibm,configure-connector, whose internal state
>> can be corrupted by the concurrent use of the "reentrant" functions,
>> leading to symptoms like endless busy statuses from RTAS.
>
> Oh, does not it means PowerVM is not compliant to the PAPR specs?

No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
Implement reentrant rtas call") change is incorrect. The "reentrant"
property described in the spec applies only to the individual RTAS
functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
simultaneously, but it must adhere to the more general requirement to
serialize with other RTAS functions.

I don't claim that this is a useful property, at least not for
Linux. Maybe other OSes are better able to exploit it.

I apologize for my role in the confusion here. When reviewing the
original change, I talked to firmware development about the reentrant
property and how we wanted to use it. I've since reviewed that
conversation and concluded that I didn't communicate clearly, and that I
interpreted their responses too eagerly.

In the future, when we (pseries Linux developers at IBM) want to go
beyond the language of the spec, we probably should initiate an
architecture change to make the PAPR eventually align with our
implementation choices.

> I mentioned this in the original Commit, and it's still true to the last 
> LoPAR:
>
> ###
> For "ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".
>
> On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
> items 2 and 3 say:
>
> 2 - For the PowerPC External Interrupt option: The * call must be
> reentrant to the number of processors on the platform.
> 3 - For the PowerPC External Interrupt option: The * argument call
> buffer for each simultaneous call must be physically unique.
>
> ### 
>
> Other than that, IIRC, this change was used to avoid issues that were 
> happening
> on kdump/kexec: 
> If another thread was holding the rtas lock, kdump/kexec would get stuck 
> waiting
> for the lock forever and never finish the process, often causing a bug
> reproduction's kdump to not get collected. 
>
> Is there any other fix for the above bug? 

Not that I'm aware of, but if this is a common failure mode for kdump,
then perhaps rtas_call() or related code should be made to ignore the
lock in the crash path, as a more general mitigation.

Then again, maybe it's not that urgent? Only XICS mode guests are
potentially affected. Do we even have this problem with
firmware-assisted dumps on PowerVM?

> Or is that a bug which is acceptable to have back, compared to the new
> one?

It was not acceptable to regress existing functions (DLPAR, LPM) in
exchange for making the crash path more robust. Reverting the change is
the correct tradeoff, unfortunately.


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-08 Thread Laurent Dufour
Le 08/09/2022 à 00:01, Nathan Lynch a écrit :
> At the time this was submitted by Leonardo, I confirmed -- or thought
> I had confirmed -- with PowerVM partition firmware development that
> the following RTAS functions:
> 
> - ibm,get-xive
> - ibm,int-off
> - ibm,int-on
> - ibm,set-xive
> 
> were safe to call on multiple CPUs simultaneously, not only with
> respect to themselves as indicated by PAPR, but with arbitrary other
> RTAS calls:
> 
> https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
> 
> Recent discussion with firmware development makes it clear that this
> is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> Implement reentrant rtas call") is unsafe, likely explaining several
> strange bugs we've seen in internal testing involving DLPAR and
> LPM. These scenarios use ibm,configure-connector, whose internal state
> can be corrupted by the concurrent use of the "reentrant" functions,
> leading to symptoms like endless busy statuses from RTAS.

Thanks, Nathan,
T
his is fixing LPAR hangs I was facing when doing some migration tests.

Reviewed-by: Laurent Dufour 

> Signed-off-by: Nathan Lynch 
> Fixes: b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call")
> Cc: sta...@vger.kernel.org
> ---
>  arch/powerpc/include/asm/paca.h |  1 -
>  arch/powerpc/include/asm/rtas.h |  1 -
>  arch/powerpc/kernel/paca.c  | 32 -
>  arch/powerpc/kernel/rtas.c  | 54 -
>  arch/powerpc/sysdev/xics/ics-rtas.c | 22 ++--
>  5 files changed, 11 insertions(+), 99 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 4d7aaab82702..3537b0500f4d 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -263,7 +263,6 @@ struct paca_struct {
>   u64 l1d_flush_size;
>  #endif
>  #ifdef CONFIG_PPC_PSERIES
> - struct rtas_args *rtas_args_reentrant;
>   u8 *mce_data_buf;   /* buffer to hold per cpu rtas errlog */
>  #endif /* CONFIG_PPC_PSERIES */
>  
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 00531af17ce0..56319aea646e 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -240,7 +240,6 @@ extern struct rtas_t rtas;
>  extern int rtas_token(const char *service);
>  extern int rtas_service_present(const char *service);
>  extern int rtas_call(int token, int, int, int *, ...);
> -int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...);
>  void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
>   int nret, ...);
>  extern void __noreturn rtas_restart(char *cmd);
> diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> index ba593fd60124..dfd097b79160 100644
> --- a/arch/powerpc/kernel/paca.c
> +++ b/arch/powerpc/kernel/paca.c
> @@ -16,7 +16,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "setup.h"
>  
> @@ -170,30 +169,6 @@ static struct slb_shadow * __init new_slb_shadow(int 
> cpu, unsigned long limit)
>  }
>  #endif /* CONFIG_PPC_64S_HASH_MMU */
>  
> -#ifdef CONFIG_PPC_PSERIES
> -/**
> - * new_rtas_args() - Allocates rtas args
> - * @cpu: CPU number
> - * @limit:   Memory limit for this allocation
> - *
> - * Allocates a struct rtas_args and return it's pointer,
> - * if not in Hypervisor mode
> - *
> - * Return:   Pointer to allocated rtas_args
> - *   NULL if CPU in Hypervisor Mode
> - */
> -static struct rtas_args * __init new_rtas_args(int cpu, unsigned long limit)
> -{
> - limit = min_t(unsigned long, limit, RTAS_INSTANTIATE_MAX);
> -
> - if (early_cpu_has_feature(CPU_FTR_HVMODE))
> - return NULL;
> -
> - return alloc_paca_data(sizeof(struct rtas_args), L1_CACHE_BYTES,
> -limit, cpu);
> -}
> -#endif /* CONFIG_PPC_PSERIES */
> -
>  /* The Paca is an array with one entry per processor.  Each contains an
>   * lppaca, which contains the information shared between the
>   * hypervisor and Linux.
> @@ -232,10 +207,6 @@ void __init initialise_paca(struct paca_struct 
> *new_paca, int cpu)
>   /* For now -- if we have threads this will be adjusted later */
>   new_paca->tcd_ptr = _paca->tcd;
>  #endif
> -
> -#ifdef CONFIG_PPC_PSERIES
> - new_paca->rtas_args_reentrant = NULL;
> -#endif
>  }
>  
>  /* Put the paca pointer into r13 and SPRG_PACA */
> @@ -307,9 +278,6 @@ void __init allocate_paca(int cpu)
>  #endif
>  #ifdef CONFIG_PPC_64S_HASH_MMU
>   paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
> -#endif
> -#ifdef CONFIG_PPC_PSERIES
> - paca->rtas_args_reentrant = new_rtas_args(cpu, limit);
>  #endif
>   paca_struct_size += sizeof(struct paca_struct);
>  }
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 693133972294..0b8a858aa847 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -43,7 

[PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-07 Thread Nathan Lynch
At the time this was submitted by Leonardo, I confirmed -- or thought
I had confirmed -- with PowerVM partition firmware development that
the following RTAS functions:

- ibm,get-xive
- ibm,int-off
- ibm,int-on
- ibm,set-xive

were safe to call on multiple CPUs simultaneously, not only with
respect to themselves as indicated by PAPR, but with arbitrary other
RTAS calls:

https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/

Recent discussion with firmware development makes it clear that this
is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
Implement reentrant rtas call") is unsafe, likely explaining several
strange bugs we've seen in internal testing involving DLPAR and
LPM. These scenarios use ibm,configure-connector, whose internal state
can be corrupted by the concurrent use of the "reentrant" functions,
leading to symptoms like endless busy statuses from RTAS.

Signed-off-by: Nathan Lynch 
Fixes: b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call")
Cc: sta...@vger.kernel.org
---
 arch/powerpc/include/asm/paca.h |  1 -
 arch/powerpc/include/asm/rtas.h |  1 -
 arch/powerpc/kernel/paca.c  | 32 -
 arch/powerpc/kernel/rtas.c  | 54 -
 arch/powerpc/sysdev/xics/ics-rtas.c | 22 ++--
 5 files changed, 11 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4d7aaab82702..3537b0500f4d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -263,7 +263,6 @@ struct paca_struct {
u64 l1d_flush_size;
 #endif
 #ifdef CONFIG_PPC_PSERIES
-   struct rtas_args *rtas_args_reentrant;
u8 *mce_data_buf;   /* buffer to hold per cpu rtas errlog */
 #endif /* CONFIG_PPC_PSERIES */
 
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 00531af17ce0..56319aea646e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -240,7 +240,6 @@ extern struct rtas_t rtas;
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
-int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...);
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
int nret, ...);
 extern void __noreturn rtas_restart(char *cmd);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index ba593fd60124..dfd097b79160 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "setup.h"
 
@@ -170,30 +169,6 @@ static struct slb_shadow * __init new_slb_shadow(int cpu, 
unsigned long limit)
 }
 #endif /* CONFIG_PPC_64S_HASH_MMU */
 
-#ifdef CONFIG_PPC_PSERIES
-/**
- * new_rtas_args() - Allocates rtas args
- * @cpu:   CPU number
- * @limit: Memory limit for this allocation
- *
- * Allocates a struct rtas_args and return it's pointer,
- * if not in Hypervisor mode
- *
- * Return: Pointer to allocated rtas_args
- * NULL if CPU in Hypervisor Mode
- */
-static struct rtas_args * __init new_rtas_args(int cpu, unsigned long limit)
-{
-   limit = min_t(unsigned long, limit, RTAS_INSTANTIATE_MAX);
-
-   if (early_cpu_has_feature(CPU_FTR_HVMODE))
-   return NULL;
-
-   return alloc_paca_data(sizeof(struct rtas_args), L1_CACHE_BYTES,
-  limit, cpu);
-}
-#endif /* CONFIG_PPC_PSERIES */
-
 /* The Paca is an array with one entry per processor.  Each contains an
  * lppaca, which contains the information shared between the
  * hypervisor and Linux.
@@ -232,10 +207,6 @@ void __init initialise_paca(struct paca_struct *new_paca, 
int cpu)
/* For now -- if we have threads this will be adjusted later */
new_paca->tcd_ptr = _paca->tcd;
 #endif
-
-#ifdef CONFIG_PPC_PSERIES
-   new_paca->rtas_args_reentrant = NULL;
-#endif
 }
 
 /* Put the paca pointer into r13 and SPRG_PACA */
@@ -307,9 +278,6 @@ void __init allocate_paca(int cpu)
 #endif
 #ifdef CONFIG_PPC_64S_HASH_MMU
paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
-#endif
-#ifdef CONFIG_PPC_PSERIES
-   paca->rtas_args_reentrant = new_rtas_args(cpu, limit);
 #endif
paca_struct_size += sizeof(struct paca_struct);
 }
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..0b8a858aa847 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -43,7 +43,6 @@
 #include 
 #include 
 #include 
-#include 
 
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
@@ -932,59 +931,6 @@ void rtas_activate_firmware(void)
pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
 }
 
-#ifdef CONFIG_PPC_PSERIES
-/**
- * rtas_call_reentrant() - Used for reentrant rtas calls
- * @token: