Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-15 Thread Mirela Simonovic
Hi Stefano,

On Mon, May 14, 2018 at 6:59 PM, Stefano Stabellini
 wrote:
> On Mon, 14 May 2018, Julien Grall wrote:
>> On 11/05/18 22:47, Stefano Stabellini wrote:
>> > On Fri, 11 May 2018, Dario Faggioli wrote:
>> > > On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
>> > > > The whole idea here is we have only one place taking the decision and
>> > > > we
>> > > > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
>> > > > having
>> > > > only one place to fix over multiple one because very likely the
>> > > > decision
>> > > > is the same everywhere.
>> > > >
>> > > > I agree that today it will end up to crashing the system because of
>> > > > the
>> > > > BUG_ON. But that's a separate topic.
>> > > >
>> > > Yes!!! :-D
>> > >
>> > > I.e., as I've said countless times, I would think that a series which
>> > > introduces a CPU_STARTING notifier that fails, should also deal with
>> > > adjusting the CPU process accordingly.
>> > >
>> > > *BUT* if you ARM people are ok with arch/arm/ code that does that,
>> > > perhaps with a comment saying something like:
>> > >
>> > > "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
>> > > fix that, we need to properly change the CPU bringup code, which will
>> > > happen in a leter series."
>> > >
>> > > that would also work, I guess. :-)
>> >
>> > Yes, I think that returning error with an in-code comment on top is the
>> > best solution.
>>
>> It is the second best solution ;). If we consider the notifier can return an
>> error, then best solution is to fix notify_cpu_starting().
>>
>> I would be ok with the second best solution if we have someone to fix it for
>> Xen 4.12. Per my understanding, Mirela is not going to do it. So what's the
>> plan here?
>
> I can look at fixing notify_cpu_starting(). I am also OK with you
> reworking the vmap code as you suggested below. Regardless, I think
> Mirela should go ahead with the comment now. Then, either you or me are
> going to come in and remove the comment one way or another (either
> fixing notify_cpu_starting or imposing all the callbacks to never return
> an error).
>

Thanks, I submitted v4.

Regards,
Mirela

>
>> Another solution is to impose all the enable callbacks to never return an
>> error (AFAICT Linux is just ignoring the return of the callback)).
>>
>> Today, we happen to return an error only in the case vmap is failing (used to
>> remapped vector table read-write). It might be possible to avoid the 
>> potential
>> re-mapping failure by reworking the code.
>>
>> I could explore that solution if we prefer going towards imposing all the
>> enable callbacks to never return an error.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-14 Thread Stefano Stabellini
On Mon, 14 May 2018, Julien Grall wrote:
> On 11/05/18 22:47, Stefano Stabellini wrote:
> > On Fri, 11 May 2018, Dario Faggioli wrote:
> > > On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
> > > > The whole idea here is we have only one place taking the decision and
> > > > we
> > > > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
> > > > having
> > > > only one place to fix over multiple one because very likely the
> > > > decision
> > > > is the same everywhere.
> > > > 
> > > > I agree that today it will end up to crashing the system because of
> > > > the
> > > > BUG_ON. But that's a separate topic.
> > > > 
> > > Yes!!! :-D
> > > 
> > > I.e., as I've said countless times, I would think that a series which
> > > introduces a CPU_STARTING notifier that fails, should also deal with
> > > adjusting the CPU process accordingly.
> > > 
> > > *BUT* if you ARM people are ok with arch/arm/ code that does that,
> > > perhaps with a comment saying something like:
> > > 
> > > "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
> > > fix that, we need to properly change the CPU bringup code, which will
> > > happen in a leter series."
> > > 
> > > that would also work, I guess. :-)
> > 
> > Yes, I think that returning error with an in-code comment on top is the
> > best solution.
> 
> It is the second best solution ;). If we consider the notifier can return an
> error, then best solution is to fix notify_cpu_starting().
>
> I would be ok with the second best solution if we have someone to fix it for
> Xen 4.12. Per my understanding, Mirela is not going to do it. So what's the
> plan here?

I can look at fixing notify_cpu_starting(). I am also OK with you
reworking the vmap code as you suggested below. Regardless, I think
Mirela should go ahead with the comment now. Then, either you or me are
going to come in and remove the comment one way or another (either
fixing notify_cpu_starting or imposing all the callbacks to never return
an error).


> Another solution is to impose all the enable callbacks to never return an
> error (AFAICT Linux is just ignoring the return of the callback)).
> 
> Today, we happen to return an error only in the case vmap is failing (used to
> remapped vector table read-write). It might be possible to avoid the potential
> re-mapping failure by reworking the code.
> 
> I could explore that solution if we prefer going towards imposing all the
> enable callbacks to never return an error.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-14 Thread Julien Grall



On 11/05/18 22:47, Stefano Stabellini wrote:

On Fri, 11 May 2018, Dario Faggioli wrote:

On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:

The whole idea here is we have only one place taking the decision and
we
don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
having
only one place to fix over multiple one because very likely the
decision
is the same everywhere.

I agree that today it will end up to crashing the system because of
the
BUG_ON. But that's a separate topic.


Yes!!! :-D

I.e., as I've said countless times, I would think that a series which
introduces a CPU_STARTING notifier that fails, should also deal with
adjusting the CPU process accordingly.

*BUT* if you ARM people are ok with arch/arm/ code that does that,
perhaps with a comment saying something like:

"This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
fix that, we need to properly change the CPU bringup code, which will
happen in a leter series."

that would also work, I guess. :-)


Yes, I think that returning error with an in-code comment on top is the
best solution.


It is the second best solution ;). If we consider the notifier can 
return an error, then best solution is to fix notify_cpu_starting().


I would be ok with the second best solution if we have someone to fix it 
for Xen 4.12. Per my understanding, Mirela is not going to do it. So 
what's the plan here?


Another solution is to impose all the enable callbacks to never return 
an error (AFAICT Linux is just ignoring the return of the callback)).


Today, we happen to return an error only in the case vmap is failing 
(used to remapped vector table read-write). It might be possible to 
avoid the potential re-mapping failure by reworking the code.


I could explore that solution if we prefer going towards imposing all 
the enable callbacks to never return an error.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Stefano Stabellini
On Fri, 11 May 2018, Dario Faggioli wrote:
> On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
> > The whole idea here is we have only one place taking the decision and
> > we 
> > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
> > having 
> > only one place to fix over multiple one because very likely the
> > decision 
> > is the same everywhere.
> > 
> > I agree that today it will end up to crashing the system because of
> > the 
> > BUG_ON. But that's a separate topic.
> > 
> Yes!!! :-D
> 
> I.e., as I've said countless times, I would think that a series which
> introduces a CPU_STARTING notifier that fails, should also deal with
> adjusting the CPU process accordingly.
> 
> *BUT* if you ARM people are ok with arch/arm/ code that does that,
> perhaps with a comment saying something like:
> 
> "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
> fix that, we need to properly change the CPU bringup code, which will
> happen in a leter series."
> 
> that would also work, I guess. :-)

Yes, I think that returning error with an in-code comment on top is the
best solution.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Dario Faggioli
On Fri, 2018-05-11 at 15:44 +0200, Mirela Simonovic wrote:
> Hi Dario and Julien,
> 
Err... you've dropped the list and everyone else but me. Re-adding...

> Thanks for the feedback to both. 
>
You're welcome. :-) 

> I think we need to roll back here. I
> believe the root cause of this is an attempt to do errata workarounds
> using notifiers.
> But let me try to enumerate all the options I see as possible
> solutions:
> 
> 1) Don't use notifiers to do errata workaround. Do it before
> CPU_STARTING fires, essentially from start_secondary() before calling
> notify_cpu_starting(). But we need to stop the CPU within
> start_secondary() if enabling errata fails. In start_secondary()
> stop_cpu is already done so I don't see why would an additional call
> be a problem.
> 
I'm no expert of ARM's start_secondary(). Indeed it looks like it can
fail already, so what you say seems to make sense, but really, I better
don't comment on this and leave it to ARM people.

> 2) Still try to use notifiers. We have few options here:
> 2.a) Enabling capability must not fail because a notifier at this
> stage should not fail. This would mean that function to which
> 'enable'
> ptr points (defined in struct arm_cpu_capabilities) has to return
> void
> instead of int. This doesn't seem right to me.
>
It's not.

> 2.b) Change scheduler and whatever else is needed (right place to
> refer to escalation of the scope of series :) in order to make
> CPU_STARTING possible to fail. 
>
Nope, this is just as wrong as 2.a.

> I'm not the person to do that since it
> affects way beyond what I suppose to do. Please note here that I'm
> not
> running away from doing the job. I'm just concerned that this will
> compromise the actual work I suppose to do from the funding/time
> perspective.
> 2.c) Return an error and hit BUG_ON. Add comments as Dario proposed.
> I
> need to state here that I probably won't be the one to implement the
> following series that allows CPU_STARTING to fail.
> 
Yes, this is correct, from the point of view of ARM vs common code
interaction.

Whether or not it is ok to leave this pending, is again up to ARM
people, IMO, as it would be ARM that risks panicing, if the notifier
fails.

> Option 1) or 2.c) sounds like a good compromise to me. What do you
> think?
> 
Well, all I can say is that you should stay away from notifiers
priority --especially of the one of cpu_schedule_callback(). As long as
you do that, I won't be on your way. :-D

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Dario Faggioli
On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
> The whole idea here is we have only one place taking the decision and
> we 
> don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
> having 
> only one place to fix over multiple one because very likely the
> decision 
> is the same everywhere.
> 
> I agree that today it will end up to crashing the system because of
> the 
> BUG_ON. But that's a separate topic.
> 
Yes!!! :-D

I.e., as I've said countless times, I would think that a series which
introduces a CPU_STARTING notifier that fails, should also deal with
adjusting the CPU process accordingly.

*BUT* if you ARM people are ok with arch/arm/ code that does that,
perhaps with a comment saying something like:

"This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
fix that, we need to properly change the CPU bringup code, which will
happen in a leter series."

that would also work, I guess. :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Dario Faggioli
On Fri, 2018-05-11 at 11:54 +0100, Julien Grall wrote:
> On 11/05/18 11:41, Mirela Simonovic wrote:
> "We should really avoid to use panic(...) if this is something the 
> system can survive. In that specific case, it would only affect the 
> current CPU. So it would be better to return an error and let the
> caller 
> decide what to do."
> 
> To summarize:
>   1) Notifiers should only report an error and let the upper
> caller (here 
> notify_cpu_starting()) deciding what to do.
>   2) I am OK with the BUG_ON in notify_cpu_starting() for now.
> 
And, in general, I agree with all this.

However (and I think I'm repeating this concept for, what, the 10th
time now?!?! :-P), in this specific case, the reason why none of the
existing CPU_STARTING callbacks report an error, is that the CPU
bringup process is, AFAICT, built around the assumption that once we
reached CPU_STARTING, things can't fail.

For this very reason, whatever new CPU_STARTING callback is introduced,
in this series or everywhere else, _can't_ fail, and _can't_ return any
error.

If we need to have a CPU_STARTING callback that can fail, we need to
change the CPU bringup process accordingly.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Julien Grall



On 11/05/18 13:20, Mirela Simonovic wrote:

Hi,

On Fri, May 11, 2018 at 2:07 PM, Mirela Simonovic
 wrote:

On Fri, May 11, 2018 at 12:54 PM, Julien Grall  wrote:

On 11/05/18 11:41, Mirela Simonovic wrote:

On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli 
wrote:

On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:

I am going to repeat the content of my answer to your last e-mail:

I was aware about it since the beginning. The whole point of the
conversation was we should avoid to take the decision at the lower level and
let the upper layer decide what to do.

If the system is failing today then that's fine and still fit what I said in
my first e-mail of that thread. For reminder:

"We should really avoid to use panic(...) if this is something the system
can survive. In that specific case, it would only affect the current CPU. So
it would be better to return an error and let the caller decide what to do."

To summarize:
 1) Notifiers should only report an error and let the upper caller
(here notify_cpu_starting()) deciding what to do.
 2) I am OK with the BUG_ON in notify_cpu_starting() for now.


I agree with BUG_ON in notify_cpu_starting().





Let me just clarify consequence of your proposal according to my
understanding. If instead of stopping the CPU when enabling a
capability fails the notifier returns an error, the error will
propagate to notify_cpu_starting() and BUG_ON will crash the system.
The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead
of panic that is submitted in this patch would stop only the erroneous
CPU. The rest of the system will continue to work and I though that is
what's the goal since we don't want to panic/BUG_ON.
 From that perspective I believe stop_cpu() in
enable_nonboot_cpu_caps() is better compared to returning an error
from the notifier.

You said above "I would be ok having stop_cpu called here" and I did
so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that
submitted in this patch).


My thoughts have evolved after Dario's discussion. He expressed 
concerned over your fix to make stop_cpu() working.


As you said I will maintain that code and this solution looks very error 
prone. If Stefano is happy with it, then fine.




If you believe my understanding is not correct, if I missed something
or you have another proposal please let me know.



Also, if you just want to convert panic from this patch into print I
don't believe it's a good approach, but I can do that.


I would prefer to see the notifier reporting the error with a warning 
and returning it.


At the notifier level it does not make sense to take the decision to 
stop the CPU or kill the system. This is a decision that should be taken 
at higher level such as in notify_cpu_starting().


The whole idea here is we have only one place taking the decision and we 
don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is having 
only one place to fix over multiple one because very likely the decision 
is the same everywhere.


I agree that today it will end up to crashing the system because of the 
BUG_ON. But that's a separate topic.


Cheers,




Thanks,
Mirela


Cheers,

--
Julien Grall


--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Dario Faggioli
On Fri, 2018-05-11 at 12:41 +0200, Mirela Simonovic wrote:
> Hi Dario,
> 
Hi,

> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli 
> wrote:
> > I may very well be missing or misunderstanding something, but I
> > continue to think that the problem here is that CPU_STARTING can't,
> > right now, fail, while you need it to be able to.
> > 
> > If that is the case, giving different priorities to the notifier,
> > is
> > not a solution.
> > 
> Let me try to clarify. The assumption is that the starting CPU can
> fail. Additional requirement set by Julien is that panic or BUG_ON is
> not acceptable. 
>
So, currently, in Xen, CPU bringup can't fail at the CPU_STARTING
phase. This is the point of the BUG_ON().

It is you that need it to change this, and make it possible for CPU
bringup to fail during CPU_STARTING. This is fine, but require changes,
which, IMO, are not limited to removing the BUG_ON() or trading it with
something else.

> There are 2 ways to deal with this scenario:
> 
> 1) Ignore and report the error, and let the erroneous CPU become
> online.
> This cannot be done without changing the logic in either scheduler or
> notify_cpu_starting(), or I least I don't see how would that be done.
> In previous email when I said "escalating this to who knows where" I
> did not refer to error escalation but the escalation of the scope of
> this series.
> 
How can that be an option? If CPU bringup failed, how is it possible to
let it go online?

To me, it's not that "we can't let a CPU for which bringup failed
continue without changing the scheduler or notify_cpu_starting()". It's
rather "we must not a CPU for which bringup faile continue. Period.".

Which is to say, you need to change things (in common code!) in such a
way that CPU bringup can fail during the CPU_STARTING phase.

> 2) Stop the erroneous CPU.
> Taking this approach requires setting the priority for the notifier.
>
No! Stop the CPU if the bringup fails duting the CPU_STARTING phase
does not require setting the priorities of the CPU_STARTING notifier. 

It requires to changing things (in common code) in such a way that CPU
bringup can fail during the CPU_STARTING phase. (Did I say that
already? :-) )

I understand that, if you set the priority, your series work. But that
does not mean that it is a proper solution to the problem. It means
that it is an hack that...well... makes your series work. :-)

> The key point is that notify_cpu_starting() and scheduler do not have
> to be changed. If errata notifier has higher priority than
> scheduler's
> notifier in the case of an error the CPU will not return into
> notify_cpu_starting() and it will never execute BUG_ON because it
> will
> be stopped. The rest of the system will continue to function without
> that CPU.
> 
Right. But now we have and architecture (ARM) for which CPU bringup can
fail at the CPU_STARTING phase. And yet, looking at common code, we see
that the CPU_STARTING notifier has a BUG_ON() if it ever fails. How
would people looking at this in 2 years time from now make sense of
this?

No, I don't think there really is a sensible workaround here. If you
need the CPU_STARTING phase of CPU bringup to be able to fail, you need
to make all the changes required to make that happen properly.

Which does involve changing common code, and which should therefore be
discussed with the other arches and core hypervisor maintainers.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Mirela Simonovic
Hi,

On Fri, May 11, 2018 at 2:07 PM, Mirela Simonovic
 wrote:
> Hi Julien,
>
> On Fri, May 11, 2018 at 12:54 PM, Julien Grall  wrote:
>>
>>
>> On 11/05/18 11:41, Mirela Simonovic wrote:
>>>
>>> Hi Dario,
>>>
>>> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli 
>>> wrote:

 On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>
> Regardless of the fact that the notifier returns an error or not, I
> believe it would be good and safe to set priority and document that
> priority zero would cause racing issue in the scenario I debugged
> today. I'm pretty sure that this discussion would be forgotten soon
> and it really should be documented in code/comment.
>
 I may very well be missing or misunderstanding something, but I
 continue to think that the problem here is that CPU_STARTING can't,
 right now, fail, while you need it to be able to.

 If that is the case, giving different priorities to the notifier, is
 not a solution.

>>>
>>> Let me try to clarify. The assumption is that the starting CPU can
>>> fail. Additional requirement set by Julien is that panic or BUG_ON is
>>> not acceptable.
>>
>>
>> Please don't twist my word. I never said it was not acceptable to have the
>> BUG_ON in notify_cpu_starting().
>
> I didn't say that either. You referenced notify_cpu_starting() here, not me.
> BTW, if you get the impression that I'm twisting words then we have a
> misunderstanding and your approach is not the best way toward
> clarifying it. Please have that in mind next time, because I do not
> have such an intent and I never will.
>
> I referred to panic/BUG_ON in this scenario regardless of the place
> from which it would be invoked. My understanding from your previous
> answers is that you think the system should not panic/BUG_ON in this
> scenario, because this kind of error the system should be able to
> survive.
>
>>
>> I am going to repeat the content of my answer to your last e-mail:
>>
>> I was aware about it since the beginning. The whole point of the
>> conversation was we should avoid to take the decision at the lower level and
>> let the upper layer decide what to do.
>>
>> If the system is failing today then that's fine and still fit what I said in
>> my first e-mail of that thread. For reminder:
>>
>> "We should really avoid to use panic(...) if this is something the system
>> can survive. In that specific case, it would only affect the current CPU. So
>> it would be better to return an error and let the caller decide what to do."
>>
>> To summarize:
>> 1) Notifiers should only report an error and let the upper caller
>> (here notify_cpu_starting()) deciding what to do.
>> 2) I am OK with the BUG_ON in notify_cpu_starting() for now.
>
> I agree with BUG_ON in notify_cpu_starting().
>
>>
>
> Let me just clarify consequence of your proposal according to my
> understanding. If instead of stopping the CPU when enabling a
> capability fails the notifier returns an error, the error will
> propagate to notify_cpu_starting() and BUG_ON will crash the system.
> The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead
> of panic that is submitted in this patch would stop only the erroneous
> CPU. The rest of the system will continue to work and I though that is
> what's the goal since we don't want to panic/BUG_ON.
> From that perspective I believe stop_cpu() in
> enable_nonboot_cpu_caps() is better compared to returning an error
> from the notifier.
>
> You said above "I would be ok having stop_cpu called here" and I did
> so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that
> submitted in this patch).
>
> If you believe my understanding is not correct, if I missed something
> or you have another proposal please let me know.
>

Also, if you just want to convert panic from this patch into print I
don't believe it's a good approach, but I can do that.

> Thanks,
> Mirela
>
>> Cheers,
>>
>> --
>> Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Mirela Simonovic
Hi Julien,

On Fri, May 11, 2018 at 12:54 PM, Julien Grall  wrote:
>
>
> On 11/05/18 11:41, Mirela Simonovic wrote:
>>
>> Hi Dario,
>>
>> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli 
>> wrote:
>>>
>>> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:

 Regardless of the fact that the notifier returns an error or not, I
 believe it would be good and safe to set priority and document that
 priority zero would cause racing issue in the scenario I debugged
 today. I'm pretty sure that this discussion would be forgotten soon
 and it really should be documented in code/comment.

>>> I may very well be missing or misunderstanding something, but I
>>> continue to think that the problem here is that CPU_STARTING can't,
>>> right now, fail, while you need it to be able to.
>>>
>>> If that is the case, giving different priorities to the notifier, is
>>> not a solution.
>>>
>>
>> Let me try to clarify. The assumption is that the starting CPU can
>> fail. Additional requirement set by Julien is that panic or BUG_ON is
>> not acceptable.
>
>
> Please don't twist my word. I never said it was not acceptable to have the
> BUG_ON in notify_cpu_starting().

I didn't say that either. You referenced notify_cpu_starting() here, not me.
BTW, if you get the impression that I'm twisting words then we have a
misunderstanding and your approach is not the best way toward
clarifying it. Please have that in mind next time, because I do not
have such an intent and I never will.

I referred to panic/BUG_ON in this scenario regardless of the place
from which it would be invoked. My understanding from your previous
answers is that you think the system should not panic/BUG_ON in this
scenario, because this kind of error the system should be able to
survive.

>
> I am going to repeat the content of my answer to your last e-mail:
>
> I was aware about it since the beginning. The whole point of the
> conversation was we should avoid to take the decision at the lower level and
> let the upper layer decide what to do.
>
> If the system is failing today then that's fine and still fit what I said in
> my first e-mail of that thread. For reminder:
>
> "We should really avoid to use panic(...) if this is something the system
> can survive. In that specific case, it would only affect the current CPU. So
> it would be better to return an error and let the caller decide what to do."
>
> To summarize:
> 1) Notifiers should only report an error and let the upper caller
> (here notify_cpu_starting()) deciding what to do.
> 2) I am OK with the BUG_ON in notify_cpu_starting() for now.

I agree with BUG_ON in notify_cpu_starting().

>

Let me just clarify consequence of your proposal according to my
understanding. If instead of stopping the CPU when enabling a
capability fails the notifier returns an error, the error will
propagate to notify_cpu_starting() and BUG_ON will crash the system.
The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead
of panic that is submitted in this patch would stop only the erroneous
CPU. The rest of the system will continue to work and I though that is
what's the goal since we don't want to panic/BUG_ON.
From that perspective I believe stop_cpu() in
enable_nonboot_cpu_caps() is better compared to returning an error
from the notifier.

You said above "I would be ok having stop_cpu called here" and I did
so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that
submitted in this patch).

If you believe my understanding is not correct, if I missed something
or you have another proposal please let me know.

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Julien Grall



On 11/05/18 11:41, Mirela Simonovic wrote:

Hi Dario,

On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli  wrote:

On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:

Regardless of the fact that the notifier returns an error or not, I
believe it would be good and safe to set priority and document that
priority zero would cause racing issue in the scenario I debugged
today. I'm pretty sure that this discussion would be forgotten soon
and it really should be documented in code/comment.


I may very well be missing or misunderstanding something, but I
continue to think that the problem here is that CPU_STARTING can't,
right now, fail, while you need it to be able to.

If that is the case, giving different priorities to the notifier, is
not a solution.



Let me try to clarify. The assumption is that the starting CPU can
fail. Additional requirement set by Julien is that panic or BUG_ON is
not acceptable.


Please don't twist my word. I never said it was not acceptable to have 
the BUG_ON in notify_cpu_starting().


I am going to repeat the content of my answer to your last e-mail:

I was aware about it since the beginning. The whole point of the 
conversation was we should avoid to take the decision at the lower level 
and let the upper layer decide what to do.


If the system is failing today then that's fine and still fit what I 
said in my first e-mail of that thread. For reminder:


"We should really avoid to use panic(...) if this is something the 
system can survive. In that specific case, it would only affect the 
current CPU. So it would be better to return an error and let the caller 
decide what to do."


To summarize:
	1) Notifiers should only report an error and let the upper caller (here 
notify_cpu_starting()) deciding what to do.

2) I am OK with the BUG_ON in notify_cpu_starting() for now.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-11 Thread Mirela Simonovic
Hi Dario,

On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli  wrote:
> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>> Regardless of the fact that the notifier returns an error or not, I
>> believe it would be good and safe to set priority and document that
>> priority zero would cause racing issue in the scenario I debugged
>> today. I'm pretty sure that this discussion would be forgotten soon
>> and it really should be documented in code/comment.
>>
> I may very well be missing or misunderstanding something, but I
> continue to think that the problem here is that CPU_STARTING can't,
> right now, fail, while you need it to be able to.
>
> If that is the case, giving different priorities to the notifier, is
> not a solution.
>

Let me try to clarify. The assumption is that the starting CPU can
fail. Additional requirement set by Julien is that panic or BUG_ON is
not acceptable. There are 2 ways to deal with this scenario:

1) Ignore and report the error, and let the erroneous CPU become online.
This cannot be done without changing the logic in either scheduler or
notify_cpu_starting(), or I least I don't see how would that be done.
In previous email when I said "escalating this to who knows where" I
did not refer to error escalation but the escalation of the scope of
this series.

2) Stop the erroneous CPU.
Taking this approach requires setting the priority for the notifier.
The key point is that notify_cpu_starting() and scheduler do not have
to be changed. If errata notifier has higher priority than scheduler's
notifier in the case of an error the CPU will not return into
notify_cpu_starting() and it will never execute BUG_ON because it will
be stopped. The rest of the system will continue to function without
that CPU.

> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Software Engineer @ SUSE https://www.suse.com/

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Dario Faggioli
On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
> Regardless of the fact that the notifier returns an error or not, I
> believe it would be good and safe to set priority and document that
> priority zero would cause racing issue in the scenario I debugged
> today. I'm pretty sure that this discussion would be forgotten soon
> and it really should be documented in code/comment.
> 
I may very well be missing or misunderstanding something, but I
continue to think that the problem here is that CPU_STARTING can't,
right now, fail, while you need it to be able to.

If that is the case, giving different priorities to the notifier, is
not a solution.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Dario Faggioli
On Thu, 2018-05-10 at 17:02 +0100, Julien Grall wrote:
> On 10/05/18 16:49, Mirela Simonovic wrote:
> > Regardless of the fact that the notifier returns an error or not, I
> > believe it would be good and safe to set priority and document that
> > priority zero would cause racing issue in the scenario I debugged
> > today. I'm pretty sure that this discussion would be forgotten soon
> > and it really should be documented in code/comment.
> > 
> > In emails above I assumed we'll stop the erroneous CPU. I didn't
> > have
> > a chance to try returning an error until few minutes ago.
> > I tried returning an error from the notifier now and the whole
> > system
> > fails. You realized according to the answer below that this is
> > going
> > to happen.
> 
> I was aware about it since the beginning. The whole point of the 
> conversation was we should avoid to take the decision at the lower
> level 
> and let the upper layer decide what to do.
> 
This makes sense to me.

> > I would rather stop CPU because changing notify_cpu_starting
> > affects
> > x86 as well, I cannot dig into that and it would be really to much
> > for
> > this series. Since you're fine with stopping cpu as well, please
> > lets
> > do that instead of escalating this to who knows where :)
> 
> Also, while I suggest that it could be replaced by stop_cpu() in the 
> common code, I also suggested that notifier_cpu_starting() could
> return 
> an error then the architecture specific code can decide what to do.
> 
> On x86 it would still be a BUG_ON(notifier_cpu_starting()). On Arm
> we 
> can decide what to do. But it is not part of that discussion here.
> 
As just saind in the other email, I don't think this is all it's
necessary to enable CPU_STARTING to fail.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Dario Faggioli
On Thu, 2018-05-10 at 16:13 +0100, Julien Grall wrote:
> On 10/05/18 16:00, Mirela Simonovic wrote:
> > > If you add your callback to CPU_UP_PREPARE, instead than to
> > > CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without
> > > having
> > > to fiddle with priorities.
> 
> This function will enable capabilities on a given CPU, most of them
> are 
> touching system registers. So it is necessary to add the callback to 
> CPU_STARTING.
> 
Right, I guess that answers my question.

> I always like to understand what I maintain :). Why do you need to 
> change the priority if you just return an error in the notifier?
> 
> At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally
> I 
> would like to either replace that BUG_ON by cpu_stop or just
> returning 
> an error to give a chance of the architecture deciding what to do
> (this 
> does not have to be done today).
> 
The problem is that, currently, once we've reached CPU_STARTING, we
assume that the CPU bringup has gone ok, and things can't fail.

Therefore, the only place when we undo what CPU_STARTING does is during
CPU_DEAD, i.e., during hotunplug/suspend/teardown.

I understand the point of having to run stuff on the CPU that is coming
up, as well as your more general point.

However, I don't know whether allowing CPU_STARTING to fail is the best
way to achieve what you want to achieve, nor whether the BUG_ON in
notify_cpu_starting() is the only issue you'll face trying to do that.

I'd say that, if CPU_STARTING can fail, we need an appropriate rollback
step, i.e., the flow must become something like (but I'd appreciate the
opinion of x86 and core hypervisor maintainers about this):

CPU_UP_PREPARE --> CPU_STARTING xx> CPU_DIDNT_START --> CPU_UP_CANCEL

At which point, e.g., from the scheduler point of view, we can try to
put a call to SCHED_OP(deinit_pdata) in CPU_DIDNT_START, and that would
avoid the problem Mirela is facing, without having to play with
priorities.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Julien Grall



On 10/05/18 16:49, Mirela Simonovic wrote:

Hi Julien,


Hi,


On Thu, May 10, 2018 at 5:13 PM, Julien Grall  wrote:



On 10/05/18 16:00, Mirela Simonovic wrote:


Hi Dario,

On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli 
wrote:


On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:


On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic


Please take a look at function cpu_schedule_callback in schedule.c.
Within switch, case CPU_DEAD doesn't have a break, causing the
bellow
CPU_UP_CANCELED to execute as well when the CPU goes down. This
looks
wrong to me.
Dario, could you please confirm that this is a bug? Otherwise could
you please confirm the reasoning beyond?



Dario sorry, this looked wrong in my scenario but actually it is
correct. I understand the purpose of the missing break now.


No problem.


For the curious ones (if any) here is detailed description: errata
notifier added in this patch had the same priority as scheduler
notifier. I though priority doesn't matter, but I was wrong. In this
particular scenario where a CPU fails to enable capabilities
(triggered by errata notifier added in this patch), scheduler
callback
executed before the errata callback for CPU_STARTING event.


So, you're adding your errata callback to the CPU_STARTING notifier,
right? (Sorry for having to ask, I don't have the patch handy right
now.)


In other
words, scheduler already called init_pdata before the errata callback
fired (and stopped the CPU).
Later on when errata callback fired, enabling of capabilities has
failed, so the erroneous non-boot CPU stopped itself and never
declared to be online.
Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
to clean up for the job done on CPU_STARTING. However, this broke the
assumption (which is good) made in cpu_schedule_callback. The
assumption is that the sequence of steps should be
alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
particular case deinit_pdata was not done because this would be done
only upon CPU_DEAD event which makes no sense in this scenario.
In order to avoid running into the invalid scenario described above,
the errata callback should fire before the scheduler callback. If
enabling capabilities fails, the scheduler callback for CPU_STARTING
will never execute afterwards, so the following CPU_UP_CANCELED event
triggered by the CPU#0 will do free_pdata, which is ok because
init_pdata was not executed and alloc_pdata-->free_pdata flow is also
valid. Congratulations to the reader who reached this point :)


Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.

If you add your callback to CPU_UP_PREPARE, instead than to
CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
to fiddle with priorities.


This function will enable capabilities on a given CPU, most of them are
touching system registers. So it is necessary to add the callback to
CPU_STARTING.





Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the
sequential ordering) is about which CPU executes the callback.
In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU
will execute the callback. If I have 2 CPUs: CPU#0 executes callback
when trying to hotplug CPU#1. I need CPU#1 to execute this callback.
In CPU_STARTING case the CPU#1 will execute the callback, that is the
reason why it has to be CPU_STARTING event.

I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to
realize why the system died (CPU#0 stopped himself :)


Is there any reason why you can't do it that way? It would look more
natural to me, and it's definitely going to be easier debug and
maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
compared to CPU_STARTING ;-P).



Julien is going to maintain this :)))



I always like to understand what I maintain :). Why do you need to change
the priority if you just return an error in the notifier?



Regardless of the fact that the notifier returns an error or not, I
believe it would be good and safe to set priority and document that
priority zero would cause racing issue in the scenario I debugged
today. I'm pretty sure that this discussion would be forgotten soon
and it really should be documented in code/comment.

In emails above I assumed we'll stop the erroneous CPU. I didn't have
a chance to try returning an error until few minutes ago.
I tried returning an error from the notifier now and the whole system
fails. You realized according to the answer below that this is going
to happen.


I was aware about it since the beginning. The whole point of the 
conversation was we should avoid to take the decision at the lower level 
and let the upper layer decide what to do.


If the system is failing today then that's fine and still fit what I 
said in my first e-mail of that thread. For reminder:


"We should really avoid to use panic(...) if this is something the 
system can survive. In that specific case, it 

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Mirela Simonovic
Hi Julien,

On Thu, May 10, 2018 at 5:13 PM, Julien Grall  wrote:
>
>
> On 10/05/18 16:00, Mirela Simonovic wrote:
>>
>> Hi Dario,
>>
>> On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli 
>> wrote:
>>>
>>> On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:

 On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic

> Please take a look at function cpu_schedule_callback in schedule.c.
> Within switch, case CPU_DEAD doesn't have a break, causing the
> bellow
> CPU_UP_CANCELED to execute as well when the CPU goes down. This
> looks
> wrong to me.
> Dario, could you please confirm that this is a bug? Otherwise could
> you please confirm the reasoning beyond?
>

 Dario sorry, this looked wrong in my scenario but actually it is
 correct. I understand the purpose of the missing break now.

>>> No problem.
>>>
 For the curious ones (if any) here is detailed description: errata
 notifier added in this patch had the same priority as scheduler
 notifier. I though priority doesn't matter, but I was wrong. In this
 particular scenario where a CPU fails to enable capabilities
 (triggered by errata notifier added in this patch), scheduler
 callback
 executed before the errata callback for CPU_STARTING event.

>>> So, you're adding your errata callback to the CPU_STARTING notifier,
>>> right? (Sorry for having to ask, I don't have the patch handy right
>>> now.)
>>>
 In other
 words, scheduler already called init_pdata before the errata callback
 fired (and stopped the CPU).
 Later on when errata callback fired, enabling of capabilities has
 failed, so the erroneous non-boot CPU stopped itself and never
 declared to be online.
 Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
 to clean up for the job done on CPU_STARTING. However, this broke the
 assumption (which is good) made in cpu_schedule_callback. The
 assumption is that the sequence of steps should be
 alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
 particular case deinit_pdata was not done because this would be done
 only upon CPU_DEAD event which makes no sense in this scenario.
 In order to avoid running into the invalid scenario described above,
 the errata callback should fire before the scheduler callback. If
 enabling capabilities fails, the scheduler callback for CPU_STARTING
 will never execute afterwards, so the following CPU_UP_CANCELED event
 triggered by the CPU#0 will do free_pdata, which is ok because
 init_pdata was not executed and alloc_pdata-->free_pdata flow is also
 valid. Congratulations to the reader who reached this point :)

>>> Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.
>>>
>>> If you add your callback to CPU_UP_PREPARE, instead than to
>>> CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
>>> to fiddle with priorities.
>
> This function will enable capabilities on a given CPU, most of them are
> touching system registers. So it is necessary to add the callback to
> CPU_STARTING.
>
>>>
>>
>> Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the
>> sequential ordering) is about which CPU executes the callback.
>> In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU
>> will execute the callback. If I have 2 CPUs: CPU#0 executes callback
>> when trying to hotplug CPU#1. I need CPU#1 to execute this callback.
>> In CPU_STARTING case the CPU#1 will execute the callback, that is the
>> reason why it has to be CPU_STARTING event.
>>
>> I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to
>> realize why the system died (CPU#0 stopped himself :)
>>
>>> Is there any reason why you can't do it that way? It would look more
>>> natural to me, and it's definitely going to be easier debug and
>>> maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
>>> compared to CPU_STARTING ;-P).
>>>
>>
>> Julien is going to maintain this :)))
>
>
> I always like to understand what I maintain :). Why do you need to change
> the priority if you just return an error in the notifier?
>

Regardless of the fact that the notifier returns an error or not, I
believe it would be good and safe to set priority and document that
priority zero would cause racing issue in the scenario I debugged
today. I'm pretty sure that this discussion would be forgotten soon
and it really should be documented in code/comment.

In emails above I assumed we'll stop the erroneous CPU. I didn't have
a chance to try returning an error until few minutes ago.
I tried returning an error from the notifier now and the whole system
fails. You realized according to the answer below that this is going
to happen.

> At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally I would
> like to either replace that BUG_ON by cpu_stop or just 

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Mirela Simonovic
Hi Dario,

On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli  wrote:
> On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:
>> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
>>
>> > Please take a look at function cpu_schedule_callback in schedule.c.
>> > Within switch, case CPU_DEAD doesn't have a break, causing the
>> > bellow
>> > CPU_UP_CANCELED to execute as well when the CPU goes down. This
>> > looks
>> > wrong to me.
>> > Dario, could you please confirm that this is a bug? Otherwise could
>> > you please confirm the reasoning beyond?
>> >
>>
>> Dario sorry, this looked wrong in my scenario but actually it is
>> correct. I understand the purpose of the missing break now.
>>
> No problem.
>
>> For the curious ones (if any) here is detailed description: errata
>> notifier added in this patch had the same priority as scheduler
>> notifier. I though priority doesn't matter, but I was wrong. In this
>> particular scenario where a CPU fails to enable capabilities
>> (triggered by errata notifier added in this patch), scheduler
>> callback
>> executed before the errata callback for CPU_STARTING event.
>>
> So, you're adding your errata callback to the CPU_STARTING notifier,
> right? (Sorry for having to ask, I don't have the patch handy right
> now.)
>
>> In other
>> words, scheduler already called init_pdata before the errata callback
>> fired (and stopped the CPU).
>> Later on when errata callback fired, enabling of capabilities has
>> failed, so the erroneous non-boot CPU stopped itself and never
>> declared to be online.
>> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
>> to clean up for the job done on CPU_STARTING. However, this broke the
>> assumption (which is good) made in cpu_schedule_callback. The
>> assumption is that the sequence of steps should be
>> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
>> particular case deinit_pdata was not done because this would be done
>> only upon CPU_DEAD event which makes no sense in this scenario.
>> In order to avoid running into the invalid scenario described above,
>> the errata callback should fire before the scheduler callback. If
>> enabling capabilities fails, the scheduler callback for CPU_STARTING
>> will never execute afterwards, so the following CPU_UP_CANCELED event
>> triggered by the CPU#0 will do free_pdata, which is ok because
>> init_pdata was not executed and alloc_pdata-->free_pdata flow is also
>> valid. Congratulations to the reader who reached this point :)
>>
> Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.
>
> If you add your callback to CPU_UP_PREPARE, instead than to
> CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
> to fiddle with priorities.
>

Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the
sequential ordering) is about which CPU executes the callback.
In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU
will execute the callback. If I have 2 CPUs: CPU#0 executes callback
when trying to hotplug CPU#1. I need CPU#1 to execute this callback.
In CPU_STARTING case the CPU#1 will execute the callback, that is the
reason why it has to be CPU_STARTING event.

I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to
realize why the system died (CPU#0 stopped himself :)

> Is there any reason why you can't do it that way? It would look more
> natural to me, and it's definitely going to be easier debug and
> maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
> compared to CPU_STARTING ;-P).
>

Julien is going to maintain this :)))

> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Software Engineer @ SUSE https://www.suse.com/

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Julien Grall



On 10/05/18 15:12, Mirela Simonovic wrote:

Hi Julien,

On Thu, May 10, 2018 at 3:29 PM, Julien Grall  wrote:

Hi,

On 05/10/2018 02:24 PM, Mirela Simonovic wrote:


On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
 wrote:




I have tested the tuned scenario where enabling capabilities fails -
the erroneous CPU is stopped/powered down and the system continues to
work fine without it. Although I still don't understand why indeed I
needed to debug this I'll add the fix in v4.



I don't understand your last sentence. What did you mean?



I still don't understand how can this scenario happen practically at
runtime (a scenario without me tuning the failure for testing).


error path are by nature hard to reach without tweaking the code. 
Imagine something that can only happen once every thousand boot. It is 
not often, but thanks to murphy that will only happen at the worst 
moment. And you know very well how it is a pain to debug :).


So I always tend to tweak the code in order to exercise major error path.


Memory
allocation failure shouldn't be possible at this point and any
workaround related failure would have happened at boot.
I don't see
anything else or my understanding may not be correct...
However, if some new capability would be introduced one day and it
could fail, it would be good if we don't have to go back and fix this
patch.


The memory failure was an example among the other. The whole point here 
is you can't extrapolate how the errata are implemented today for future 
one. I also want to limit the number of stop_cpu/panic over the code, so 
we have only place to make the decision if something is wrong.



At the end, it doesn't matter since the system will be able to
properly deal with such a scenario with the fix.


I would be ok having stop_cpu called here. Although my preference is 
returning an error and let the caller decides what to do.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Dario Faggioli
On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:
> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
> 
> > Please take a look at function cpu_schedule_callback in schedule.c.
> > Within switch, case CPU_DEAD doesn't have a break, causing the
> > bellow
> > CPU_UP_CANCELED to execute as well when the CPU goes down. This
> > looks
> > wrong to me.
> > Dario, could you please confirm that this is a bug? Otherwise could
> > you please confirm the reasoning beyond?
> > 
> 
> Dario sorry, this looked wrong in my scenario but actually it is
> correct. I understand the purpose of the missing break now.
> 
No problem.

> For the curious ones (if any) here is detailed description: errata
> notifier added in this patch had the same priority as scheduler
> notifier. I though priority doesn't matter, but I was wrong. In this
> particular scenario where a CPU fails to enable capabilities
> (triggered by errata notifier added in this patch), scheduler
> callback
> executed before the errata callback for CPU_STARTING event. 
>
So, you're adding your errata callback to the CPU_STARTING notifier,
right? (Sorry for having to ask, I don't have the patch handy right
now.)

> In other
> words, scheduler already called init_pdata before the errata callback
> fired (and stopped the CPU).
> Later on when errata callback fired, enabling of capabilities has
> failed, so the erroneous non-boot CPU stopped itself and never
> declared to be online.
> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
> to clean up for the job done on CPU_STARTING. However, this broke the
> assumption (which is good) made in cpu_schedule_callback. The
> assumption is that the sequence of steps should be
> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
> particular case deinit_pdata was not done because this would be done
> only upon CPU_DEAD event which makes no sense in this scenario.
> In order to avoid running into the invalid scenario described above,
> the errata callback should fire before the scheduler callback. If
> enabling capabilities fails, the scheduler callback for CPU_STARTING
> will never execute afterwards, so the following CPU_UP_CANCELED event
> triggered by the CPU#0 will do free_pdata, which is ok because
> init_pdata was not executed and alloc_pdata-->free_pdata flow is also
> valid. Congratulations to the reader who reached this point :)
> 
Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.

If you add your callback to CPU_UP_PREPARE, instead than to
CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
to fiddle with priorities.

Is there any reason why you can't do it that way? It would look more
natural to me, and it's definitely going to be easier debug and
maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
compared to CPU_STARTING ;-P).

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Dario Faggioli
On Thu, 2018-05-10 at 13:57 +0200, Mirela Simonovic wrote:
> Hi,
> 
> +Dario
> 
Thanks.

> On Wed, May 9, 2018 at 6:32 PM, Julien Grall 
> wrote:
> > If there is a bug in the scheduler it should be fixed rather trying
> > to
> > workaround with a panic in the code. If you provide more details,
> > we might
> > be able to help here.
> > 
> 
> This flow seems to have several bugs. Lets start from here:
> 
> Please take a look at function cpu_schedule_callback in schedule.c.
> Within switch, case CPU_DEAD doesn't have a break, causing the bellow
> CPU_UP_CANCELED to execute as well when the CPU goes down. This looks
> wrong to me.
> Dario, could you please confirm that this is a bug? 
>
No, that is entirely on purpose (as you can tell from the
"/* Fallthrough */" comment).

> Otherwise could
> you please confirm the reasoning beyond?
> 
Well, the idea is that the CPU_UP_CANCELLED notifier should _only_
invoke cpu_schedule_down(). On the other hand, the CPU_DEAD notifier
should invoke both:
 SCHED_OP(deinit_pdata)
 cpu_schedule_down()

and the fallthrough is the way we achieve that.

On x86, the shutdown/suspend path is as follows:

XENPF_enter_acpi_sleep (platform_hypercall.c)
  acpi_enter_sleep()
continue_hypercall_on_cpu(0, enter_state_helper)
  enter_state()
system_state = SYS_STATE_suspend
disable_nonboot_cpus()
  for_each_online_cpu ( cpu ) { 
cpu_down(cpu) }
  notifier_call_chain(CPU_DOWN_PREPARE)
cpu_callback(CPU_DOWN_PREPARE)
  cpupool_cpu_remove(cpu)
  cpufreq_del_cpu(cpu)
  stop_machine_run(take_cpu_down, cpu)
__cpu_disable();
  notifier_call_chain(CPU_DYING)
...
  cpumask_clear_cpu(cpu, _online_map);
cpu_disable_scheduler(cpu);
  __cpu_die(cpu)
while ( (seen_state = cpu_state) != CPU_STATE_DEAD ) { ... }
notifier_call_chain(CPU_DEAD)
  cpu_schedule_callback(CPU_DEAD)
...
...
cpu_schedule_callback(CPU_DEAD)
  SCHED_OP(deinit_pdata)
  cpu_schedule_down(cpu)
SCHED_OP(free_pdata)
SCHED_OP(free_vdata)


If you see a BUG_ON triggering, please provide details about that, and
we'll try to figure out what's causing it.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Julien Grall

Hi,

On 05/10/2018 02:24 PM, Mirela Simonovic wrote:

On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
 wrote:



I have tested the tuned scenario where enabling capabilities fails -
the erroneous CPU is stopped/powered down and the system continues to
work fine without it. Although I still don't understand why indeed I
needed to debug this I'll add the fix in v4.


I don't understand your last sentence. What did you mean?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Mirela Simonovic
On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
 wrote:
> Hi,
>
> +Dario
>
> On Wed, May 9, 2018 at 6:32 PM, Julien Grall  wrote:
>>
>>
>> On 09/05/18 16:48, Mirela Simonovic wrote:
>>>
>>> Hi Julien,
>>
>>
>> Hi Mirela,
>>
>>
>>> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall 
>>> wrote:

 Hi Mirela,


 On 27/04/18 18:12, Mirela Simonovic wrote:
>
>
> On boot, enabling errata workarounds will be triggered by the boot CPU
> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
> done. This patch adds the code required to enable errata workarounds
> for a CPU being hotplugged after the system boots. This is triggered
> using a notifier. If the CPU fails to enable the errata Xen will panic.
> This is done because it is assumed that the CPU which is hotplugged
> after the system/Xen boots, was initially hotplugged during the
> system/Xen boot. Therefore, enabling errata workarounds should never
> fail.
>
> Signed-off-by: Mirela Simonovic 
>
> ---
> CC: Stefano Stabellini 
> CC: Julien Grall 
> ---
>xen/arch/arm/cpuerrata.c | 35
> +++
>xen/arch/arm/cpufeature.c| 23 +++
>xen/include/asm-arm/cpufeature.h |  1 +
>3 files changed, 59 insertions(+)
>
> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
> index 1baa20654b..4040f781ec 100644
> --- a/xen/arch/arm/cpuerrata.c
> +++ b/xen/arch/arm/cpuerrata.c
> @@ -5,6 +5,8 @@
>#include 
>#include 
>#include 
> +#include 
> +#include 
>#include 
>#include 
>#include 
> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>enable_cpu_capabilities(arm_errata);
>}
>+static int cpu_errata_callback(
> +struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +switch ( action )
> +{
> +case CPU_STARTING:
> +enable_nonboot_cpu_caps(arm_errata);
> +break;
> +default:
> +break;
> +}
> +
> +return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cpu_errata_nfb = {
> +.notifier_call = cpu_errata_callback,
> +};
> +
> +static int __init cpu_errata_notifier_init(void)
> +{
> +register_cpu_notifier(_errata_nfb);
> +return 0;
> +}
> +/*
> + * Initialization has to be done at init rather than presmp_init phase
> because
> + * the callback should execute only after the secondary CPUs are
> initially
> + * booted (in hotplug scenarios when the system state is not boot). On
> boot,
> + * the enabling of errata workarounds will be triggered by the boot CPU
> from
> + * start_xen().
> + */
> +__initcall(cpu_errata_notifier_init);
> +
>/*
> * Local variables:
> * mode: C
> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
> index 525b45e22f..dd30f0d29c 100644
> --- a/xen/arch/arm/cpufeature.c
> +++ b/xen/arch/arm/cpufeature.c
> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
> arm_cpu_capabilities *caps)
>}
>}
>+/* Run through the enabled capabilities and enable() them on the
> calling CPU */
> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
> +{
> +ASSERT(system_state != SYS_STATE_boot);
> +
> +for ( ; caps->matches; caps++ )
> +{
> +if ( !cpus_have_cap(caps->capability) )
> +continue;
> +
> +if ( caps->enable )
> +{
> +/*
> + * Since the CPU has enabled errata workarounds on boot, it
> should



 This function is not really about errata, it is about capabilities.
 Errata
 is just a sub-category of them.

>>>
>>> I've fixed the comment, thanks.
>>>
> + * never fail to enable them here.
> + */
> +if ( caps->enable((void *)caps) )
> +panic("CPU%u failed to enable capability %u\n",
> +  smp_processor_id(), caps->capability);



 We should really avoid to use panic(...) if this is something the system
 can
 survive. In that specific case, it would only affect the current CPU. So
 it
 would be better to return an error and let the caller decide what to do.

>>>
>>> I need to emphasize two points:
>>> 1) I don't see how is this different compared to PSCI CPU OFF where we
>>> do panic. Essentially, in both cases the system will not be able to
>>> use that CPU and we 

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-10 Thread Mirela Simonovic
Hi,

+Dario

On Wed, May 9, 2018 at 6:32 PM, Julien Grall  wrote:
>
>
> On 09/05/18 16:48, Mirela Simonovic wrote:
>>
>> Hi Julien,
>
>
> Hi Mirela,
>
>
>> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall 
>> wrote:
>>>
>>> Hi Mirela,
>>>
>>>
>>> On 27/04/18 18:12, Mirela Simonovic wrote:


 On boot, enabling errata workarounds will be triggered by the boot CPU
 from start_xen(). On CPU hotplug (non-boot scenario) this would not be
 done. This patch adds the code required to enable errata workarounds
 for a CPU being hotplugged after the system boots. This is triggered
 using a notifier. If the CPU fails to enable the errata Xen will panic.
 This is done because it is assumed that the CPU which is hotplugged
 after the system/Xen boots, was initially hotplugged during the
 system/Xen boot. Therefore, enabling errata workarounds should never
 fail.

 Signed-off-by: Mirela Simonovic 

 ---
 CC: Stefano Stabellini 
 CC: Julien Grall 
 ---
xen/arch/arm/cpuerrata.c | 35
 +++
xen/arch/arm/cpufeature.c| 23 +++
xen/include/asm-arm/cpufeature.h |  1 +
3 files changed, 59 insertions(+)

 diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
 index 1baa20654b..4040f781ec 100644
 --- a/xen/arch/arm/cpuerrata.c
 +++ b/xen/arch/arm/cpuerrata.c
 @@ -5,6 +5,8 @@
#include 
#include 
#include 
 +#include 
 +#include 
#include 
#include 
#include 
 @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
enable_cpu_capabilities(arm_errata);
}
+static int cpu_errata_callback(
 +struct notifier_block *nfb, unsigned long action, void *hcpu)
 +{
 +switch ( action )
 +{
 +case CPU_STARTING:
 +enable_nonboot_cpu_caps(arm_errata);
 +break;
 +default:
 +break;
 +}
 +
 +return NOTIFY_DONE;
 +}
 +
 +static struct notifier_block cpu_errata_nfb = {
 +.notifier_call = cpu_errata_callback,
 +};
 +
 +static int __init cpu_errata_notifier_init(void)
 +{
 +register_cpu_notifier(_errata_nfb);
 +return 0;
 +}
 +/*
 + * Initialization has to be done at init rather than presmp_init phase
 because
 + * the callback should execute only after the secondary CPUs are
 initially
 + * booted (in hotplug scenarios when the system state is not boot). On
 boot,
 + * the enabling of errata workarounds will be triggered by the boot CPU
 from
 + * start_xen().
 + */
 +__initcall(cpu_errata_notifier_init);
 +
/*
 * Local variables:
 * mode: C
 diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
 index 525b45e22f..dd30f0d29c 100644
 --- a/xen/arch/arm/cpufeature.c
 +++ b/xen/arch/arm/cpufeature.c
 @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
 arm_cpu_capabilities *caps)
}
}
+/* Run through the enabled capabilities and enable() them on the
 calling CPU */
 +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
 +{
 +ASSERT(system_state != SYS_STATE_boot);
 +
 +for ( ; caps->matches; caps++ )
 +{
 +if ( !cpus_have_cap(caps->capability) )
 +continue;
 +
 +if ( caps->enable )
 +{
 +/*
 + * Since the CPU has enabled errata workarounds on boot, it
 should
>>>
>>>
>>>
>>> This function is not really about errata, it is about capabilities.
>>> Errata
>>> is just a sub-category of them.
>>>
>>
>> I've fixed the comment, thanks.
>>
 + * never fail to enable them here.
 + */
 +if ( caps->enable((void *)caps) )
 +panic("CPU%u failed to enable capability %u\n",
 +  smp_processor_id(), caps->capability);
>>>
>>>
>>>
>>> We should really avoid to use panic(...) if this is something the system
>>> can
>>> survive. In that specific case, it would only affect the current CPU. So
>>> it
>>> would be better to return an error and let the caller decide what to do.
>>>
>>
>> I need to emphasize two points:
>> 1) I don't see how is this different compared to PSCI CPU OFF where we
>> do panic. Essentially, in both cases the system will not be able to
>> use that CPU and we already agreed that is a good reason to panic.
>
>
> You can't compare PSCI CPU off and the enable callback failing. The *only*
> reason PSCI CPU off can fail is because the Trusted OS is resident on that
> CPU. If that ever happen it is a 

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-05-09 Thread Mirela Simonovic
Hi Julien,

On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall  wrote:
> Hi Mirela,
>
>
> On 27/04/18 18:12, Mirela Simonovic wrote:
>>
>> On boot, enabling errata workarounds will be triggered by the boot CPU
>> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
>> done. This patch adds the code required to enable errata workarounds
>> for a CPU being hotplugged after the system boots. This is triggered
>> using a notifier. If the CPU fails to enable the errata Xen will panic.
>> This is done because it is assumed that the CPU which is hotplugged
>> after the system/Xen boots, was initially hotplugged during the
>> system/Xen boot. Therefore, enabling errata workarounds should never
>> fail.
>>
>> Signed-off-by: Mirela Simonovic 
>>
>> ---
>> CC: Stefano Stabellini 
>> CC: Julien Grall 
>> ---
>>   xen/arch/arm/cpuerrata.c | 35
>> +++
>>   xen/arch/arm/cpufeature.c| 23 +++
>>   xen/include/asm-arm/cpufeature.h |  1 +
>>   3 files changed, 59 insertions(+)
>>
>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>> index 1baa20654b..4040f781ec 100644
>> --- a/xen/arch/arm/cpuerrata.c
>> +++ b/xen/arch/arm/cpuerrata.c
>> @@ -5,6 +5,8 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>> +#include 
>>   #include 
>>   #include 
>>   #include 
>> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>>   enable_cpu_capabilities(arm_errata);
>>   }
>>   +static int cpu_errata_callback(
>> +struct notifier_block *nfb, unsigned long action, void *hcpu)
>> +{
>> +switch ( action )
>> +{
>> +case CPU_STARTING:
>> +enable_nonboot_cpu_caps(arm_errata);
>> +break;
>> +default:
>> +break;
>> +}
>> +
>> +return NOTIFY_DONE;
>> +}
>> +
>> +static struct notifier_block cpu_errata_nfb = {
>> +.notifier_call = cpu_errata_callback,
>> +};
>> +
>> +static int __init cpu_errata_notifier_init(void)
>> +{
>> +register_cpu_notifier(_errata_nfb);
>> +return 0;
>> +}
>> +/*
>> + * Initialization has to be done at init rather than presmp_init phase
>> because
>> + * the callback should execute only after the secondary CPUs are
>> initially
>> + * booted (in hotplug scenarios when the system state is not boot). On
>> boot,
>> + * the enabling of errata workarounds will be triggered by the boot CPU
>> from
>> + * start_xen().
>> + */
>> +__initcall(cpu_errata_notifier_init);
>> +
>>   /*
>>* Local variables:
>>* mode: C
>> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
>> index 525b45e22f..dd30f0d29c 100644
>> --- a/xen/arch/arm/cpufeature.c
>> +++ b/xen/arch/arm/cpufeature.c
>> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
>> arm_cpu_capabilities *caps)
>>   }
>>   }
>>   +/* Run through the enabled capabilities and enable() them on the
>> calling CPU */
>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
>> +{
>> +ASSERT(system_state != SYS_STATE_boot);
>> +
>> +for ( ; caps->matches; caps++ )
>> +{
>> +if ( !cpus_have_cap(caps->capability) )
>> +continue;
>> +
>> +if ( caps->enable )
>> +{
>> +/*
>> + * Since the CPU has enabled errata workarounds on boot, it
>> should
>
>
> This function is not really about errata, it is about capabilities. Errata
> is just a sub-category of them.
>

I've fixed the comment, thanks.

>> + * never fail to enable them here.
>> + */
>> +if ( caps->enable((void *)caps) )
>> +panic("CPU%u failed to enable capability %u\n",
>> +  smp_processor_id(), caps->capability);
>
>
> We should really avoid to use panic(...) if this is something the system can
> survive. In that specific case, it would only affect the current CPU. So it
> would be better to return an error and let the caller decide what to do.
>

I need to emphasize two points:
1) I don't see how is this different compared to PSCI CPU OFF where we
do panic. Essentially, in both cases the system will not be able to
use that CPU and we already agreed that is a good reason to panic.
As oppose to CPU_OFF which wasn't called on boot so we indeed have no
idea whether it will pass on suspend, no matter how unlikely it could
fail, in this scenario we are sure that enabling capability should
pass because it already passed on boot. So if it doesn't pass, which I
consider to be impossible, I believe we should panic.
On the other hand, I understand how would this make a difference on
big.LITTLE where you try to hotplug a CPU that was never booted.
However, that scenario is out of this scope.

2) I still wanted to give a chance to your proposal and just convert
panic into stop_cpu+printing error. The system cannot survive if
enabling a capability fails. In order to 

Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-04-30 Thread Julien Grall

Hi Mirela,

On 27/04/18 18:12, Mirela Simonovic wrote:

On boot, enabling errata workarounds will be triggered by the boot CPU
from start_xen(). On CPU hotplug (non-boot scenario) this would not be
done. This patch adds the code required to enable errata workarounds
for a CPU being hotplugged after the system boots. This is triggered
using a notifier. If the CPU fails to enable the errata Xen will panic.
This is done because it is assumed that the CPU which is hotplugged
after the system/Xen boots, was initially hotplugged during the
system/Xen boot. Therefore, enabling errata workarounds should never
fail.

Signed-off-by: Mirela Simonovic 

---
CC: Stefano Stabellini 
CC: Julien Grall 
---
  xen/arch/arm/cpuerrata.c | 35 +++
  xen/arch/arm/cpufeature.c| 23 +++
  xen/include/asm-arm/cpufeature.h |  1 +
  3 files changed, 59 insertions(+)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 1baa20654b..4040f781ec 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -5,6 +5,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 
  #include 
@@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
  enable_cpu_capabilities(arm_errata);
  }
  
+static int cpu_errata_callback(

+struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+switch ( action )
+{
+case CPU_STARTING:
+enable_nonboot_cpu_caps(arm_errata);
+break;
+default:
+break;
+}
+
+return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_errata_nfb = {
+.notifier_call = cpu_errata_callback,
+};
+
+static int __init cpu_errata_notifier_init(void)
+{
+register_cpu_notifier(_errata_nfb);
+return 0;
+}
+/*
+ * Initialization has to be done at init rather than presmp_init phase because
+ * the callback should execute only after the secondary CPUs are initially
+ * booted (in hotplug scenarios when the system state is not boot). On boot,
+ * the enabling of errata workarounds will be triggered by the boot CPU from
+ * start_xen().
+ */
+__initcall(cpu_errata_notifier_init);
+
  /*
   * Local variables:
   * mode: C
diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 525b45e22f..dd30f0d29c 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct 
arm_cpu_capabilities *caps)
  }
  }
  
+/* Run through the enabled capabilities and enable() them on the calling CPU */

+void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
+{
+ASSERT(system_state != SYS_STATE_boot);
+
+for ( ; caps->matches; caps++ )
+{
+if ( !cpus_have_cap(caps->capability) )
+continue;
+
+if ( caps->enable )
+{
+/*
+ * Since the CPU has enabled errata workarounds on boot, it should


This function is not really about errata, it is about capabilities. 
Errata is just a sub-category of them.



+ * never fail to enable them here.
+ */
+if ( caps->enable((void *)caps) )
+panic("CPU%u failed to enable capability %u\n",
+  smp_processor_id(), caps->capability);


We should really avoid to use panic(...) if this is something the system 
can survive. In that specific case, it would only affect the current 
CPU. So it would be better to return an error and let the caller decide 
what to do.


Cheers,


+}
+}
+}
+
  /*
   * Local variables:
   * mode: C
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index e557a095af..b14e226401 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -88,6 +88,7 @@ void update_cpu_capabilities(const struct 
arm_cpu_capabilities *caps,
   const char *info);
  
  void enable_cpu_capabilities(const struct arm_cpu_capabilities *caps);

+void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps);
  
  #endif /* __ASSEMBLY__ */
  



Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot

2018-04-27 Thread Mirela Simonovic
On boot, enabling errata workarounds will be triggered by the boot CPU
from start_xen(). On CPU hotplug (non-boot scenario) this would not be
done. This patch adds the code required to enable errata workarounds
for a CPU being hotplugged after the system boots. This is triggered
using a notifier. If the CPU fails to enable the errata Xen will panic.
This is done because it is assumed that the CPU which is hotplugged
after the system/Xen boots, was initially hotplugged during the
system/Xen boot. Therefore, enabling errata workarounds should never
fail.

Signed-off-by: Mirela Simonovic 

---
CC: Stefano Stabellini 
CC: Julien Grall 
---
 xen/arch/arm/cpuerrata.c | 35 +++
 xen/arch/arm/cpufeature.c| 23 +++
 xen/include/asm-arm/cpufeature.h |  1 +
 3 files changed, 59 insertions(+)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 1baa20654b..4040f781ec 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -5,6 +5,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
 enable_cpu_capabilities(arm_errata);
 }
 
+static int cpu_errata_callback(
+struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+switch ( action )
+{
+case CPU_STARTING:
+enable_nonboot_cpu_caps(arm_errata);
+break;
+default:
+break;
+}
+
+return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_errata_nfb = {
+.notifier_call = cpu_errata_callback,
+};
+
+static int __init cpu_errata_notifier_init(void)
+{
+register_cpu_notifier(_errata_nfb);
+return 0;
+}
+/*
+ * Initialization has to be done at init rather than presmp_init phase because
+ * the callback should execute only after the secondary CPUs are initially
+ * booted (in hotplug scenarios when the system state is not boot). On boot,
+ * the enabling of errata workarounds will be triggered by the boot CPU from
+ * start_xen().
+ */
+__initcall(cpu_errata_notifier_init);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 525b45e22f..dd30f0d29c 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct 
arm_cpu_capabilities *caps)
 }
 }
 
+/* Run through the enabled capabilities and enable() them on the calling CPU */
+void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
+{
+ASSERT(system_state != SYS_STATE_boot);
+
+for ( ; caps->matches; caps++ )
+{
+if ( !cpus_have_cap(caps->capability) )
+continue;
+
+if ( caps->enable )
+{
+/*
+ * Since the CPU has enabled errata workarounds on boot, it should
+ * never fail to enable them here.
+ */
+if ( caps->enable((void *)caps) )
+panic("CPU%u failed to enable capability %u\n",
+  smp_processor_id(), caps->capability);
+}
+}
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index e557a095af..b14e226401 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -88,6 +88,7 @@ void update_cpu_capabilities(const struct 
arm_cpu_capabilities *caps,
  const char *info);
 
 void enable_cpu_capabilities(const struct arm_cpu_capabilities *caps);
+void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps);
 
 #endif /* __ASSEMBLY__ */
 
-- 
2.13.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel