Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi Stefano, On Mon, May 14, 2018 at 6:59 PM, Stefano Stabelliniwrote: > On Mon, 14 May 2018, Julien Grall wrote: >> On 11/05/18 22:47, Stefano Stabellini wrote: >> > On Fri, 11 May 2018, Dario Faggioli wrote: >> > > On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote: >> > > > The whole idea here is we have only one place taking the decision and >> > > > we >> > > > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is >> > > > having >> > > > only one place to fix over multiple one because very likely the >> > > > decision >> > > > is the same everywhere. >> > > > >> > > > I agree that today it will end up to crashing the system because of >> > > > the >> > > > BUG_ON. But that's a separate topic. >> > > > >> > > Yes!!! :-D >> > > >> > > I.e., as I've said countless times, I would think that a series which >> > > introduces a CPU_STARTING notifier that fails, should also deal with >> > > adjusting the CPU process accordingly. >> > > >> > > *BUT* if you ARM people are ok with arch/arm/ code that does that, >> > > perhaps with a comment saying something like: >> > > >> > > "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To >> > > fix that, we need to properly change the CPU bringup code, which will >> > > happen in a leter series." >> > > >> > > that would also work, I guess. :-) >> > >> > Yes, I think that returning error with an in-code comment on top is the >> > best solution. >> >> It is the second best solution ;). If we consider the notifier can return an >> error, then best solution is to fix notify_cpu_starting(). >> >> I would be ok with the second best solution if we have someone to fix it for >> Xen 4.12. Per my understanding, Mirela is not going to do it. So what's the >> plan here? > > I can look at fixing notify_cpu_starting(). I am also OK with you > reworking the vmap code as you suggested below. Regardless, I think > Mirela should go ahead with the comment now. Then, either you or me are > going to come in and remove the comment one way or another (either > fixing notify_cpu_starting or imposing all the callbacks to never return > an error). > Thanks, I submitted v4. Regards, Mirela > >> Another solution is to impose all the enable callbacks to never return an >> error (AFAICT Linux is just ignoring the return of the callback)). >> >> Today, we happen to return an error only in the case vmap is failing (used to >> remapped vector table read-write). It might be possible to avoid the >> potential >> re-mapping failure by reworking the code. >> >> I could explore that solution if we prefer going towards imposing all the >> enable callbacks to never return an error. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Mon, 14 May 2018, Julien Grall wrote: > On 11/05/18 22:47, Stefano Stabellini wrote: > > On Fri, 11 May 2018, Dario Faggioli wrote: > > > On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote: > > > > The whole idea here is we have only one place taking the decision and > > > > we > > > > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is > > > > having > > > > only one place to fix over multiple one because very likely the > > > > decision > > > > is the same everywhere. > > > > > > > > I agree that today it will end up to crashing the system because of > > > > the > > > > BUG_ON. But that's a separate topic. > > > > > > > Yes!!! :-D > > > > > > I.e., as I've said countless times, I would think that a series which > > > introduces a CPU_STARTING notifier that fails, should also deal with > > > adjusting the CPU process accordingly. > > > > > > *BUT* if you ARM people are ok with arch/arm/ code that does that, > > > perhaps with a comment saying something like: > > > > > > "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To > > > fix that, we need to properly change the CPU bringup code, which will > > > happen in a leter series." > > > > > > that would also work, I guess. :-) > > > > Yes, I think that returning error with an in-code comment on top is the > > best solution. > > It is the second best solution ;). If we consider the notifier can return an > error, then best solution is to fix notify_cpu_starting(). > > I would be ok with the second best solution if we have someone to fix it for > Xen 4.12. Per my understanding, Mirela is not going to do it. So what's the > plan here? I can look at fixing notify_cpu_starting(). I am also OK with you reworking the vmap code as you suggested below. Regardless, I think Mirela should go ahead with the comment now. Then, either you or me are going to come in and remove the comment one way or another (either fixing notify_cpu_starting or imposing all the callbacks to never return an error). > Another solution is to impose all the enable callbacks to never return an > error (AFAICT Linux is just ignoring the return of the callback)). > > Today, we happen to return an error only in the case vmap is failing (used to > remapped vector table read-write). It might be possible to avoid the potential > re-mapping failure by reworking the code. > > I could explore that solution if we prefer going towards imposing all the > enable callbacks to never return an error. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On 11/05/18 22:47, Stefano Stabellini wrote: On Fri, 11 May 2018, Dario Faggioli wrote: On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote: The whole idea here is we have only one place taking the decision and we don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is having only one place to fix over multiple one because very likely the decision is the same everywhere. I agree that today it will end up to crashing the system because of the BUG_ON. But that's a separate topic. Yes!!! :-D I.e., as I've said countless times, I would think that a series which introduces a CPU_STARTING notifier that fails, should also deal with adjusting the CPU process accordingly. *BUT* if you ARM people are ok with arch/arm/ code that does that, perhaps with a comment saying something like: "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To fix that, we need to properly change the CPU bringup code, which will happen in a leter series." that would also work, I guess. :-) Yes, I think that returning error with an in-code comment on top is the best solution. It is the second best solution ;). If we consider the notifier can return an error, then best solution is to fix notify_cpu_starting(). I would be ok with the second best solution if we have someone to fix it for Xen 4.12. Per my understanding, Mirela is not going to do it. So what's the plan here? Another solution is to impose all the enable callbacks to never return an error (AFAICT Linux is just ignoring the return of the callback)). Today, we happen to return an error only in the case vmap is failing (used to remapped vector table read-write). It might be possible to avoid the potential re-mapping failure by reworking the code. I could explore that solution if we prefer going towards imposing all the enable callbacks to never return an error. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Fri, 11 May 2018, Dario Faggioli wrote: > On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote: > > The whole idea here is we have only one place taking the decision and > > we > > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is > > having > > only one place to fix over multiple one because very likely the > > decision > > is the same everywhere. > > > > I agree that today it will end up to crashing the system because of > > the > > BUG_ON. But that's a separate topic. > > > Yes!!! :-D > > I.e., as I've said countless times, I would think that a series which > introduces a CPU_STARTING notifier that fails, should also deal with > adjusting the CPU process accordingly. > > *BUT* if you ARM people are ok with arch/arm/ code that does that, > perhaps with a comment saying something like: > > "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To > fix that, we need to properly change the CPU bringup code, which will > happen in a leter series." > > that would also work, I guess. :-) Yes, I think that returning error with an in-code comment on top is the best solution. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Fri, 2018-05-11 at 15:44 +0200, Mirela Simonovic wrote: > Hi Dario and Julien, > Err... you've dropped the list and everyone else but me. Re-adding... > Thanks for the feedback to both. > You're welcome. :-) > I think we need to roll back here. I > believe the root cause of this is an attempt to do errata workarounds > using notifiers. > But let me try to enumerate all the options I see as possible > solutions: > > 1) Don't use notifiers to do errata workaround. Do it before > CPU_STARTING fires, essentially from start_secondary() before calling > notify_cpu_starting(). But we need to stop the CPU within > start_secondary() if enabling errata fails. In start_secondary() > stop_cpu is already done so I don't see why would an additional call > be a problem. > I'm no expert of ARM's start_secondary(). Indeed it looks like it can fail already, so what you say seems to make sense, but really, I better don't comment on this and leave it to ARM people. > 2) Still try to use notifiers. We have few options here: > 2.a) Enabling capability must not fail because a notifier at this > stage should not fail. This would mean that function to which > 'enable' > ptr points (defined in struct arm_cpu_capabilities) has to return > void > instead of int. This doesn't seem right to me. > It's not. > 2.b) Change scheduler and whatever else is needed (right place to > refer to escalation of the scope of series :) in order to make > CPU_STARTING possible to fail. > Nope, this is just as wrong as 2.a. > I'm not the person to do that since it > affects way beyond what I suppose to do. Please note here that I'm > not > running away from doing the job. I'm just concerned that this will > compromise the actual work I suppose to do from the funding/time > perspective. > 2.c) Return an error and hit BUG_ON. Add comments as Dario proposed. > I > need to state here that I probably won't be the one to implement the > following series that allows CPU_STARTING to fail. > Yes, this is correct, from the point of view of ARM vs common code interaction. Whether or not it is ok to leave this pending, is again up to ARM people, IMO, as it would be ARM that risks panicing, if the notifier fails. > Option 1) or 2.c) sounds like a good compromise to me. What do you > think? > Well, all I can say is that you should stay away from notifiers priority --especially of the one of cpu_schedule_callback(). As long as you do that, I won't be on your way. :-D Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote: > The whole idea here is we have only one place taking the decision and > we > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is > having > only one place to fix over multiple one because very likely the > decision > is the same everywhere. > > I agree that today it will end up to crashing the system because of > the > BUG_ON. But that's a separate topic. > Yes!!! :-D I.e., as I've said countless times, I would think that a series which introduces a CPU_STARTING notifier that fails, should also deal with adjusting the CPU process accordingly. *BUT* if you ARM people are ok with arch/arm/ code that does that, perhaps with a comment saying something like: "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To fix that, we need to properly change the CPU bringup code, which will happen in a leter series." that would also work, I guess. :-) Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Fri, 2018-05-11 at 11:54 +0100, Julien Grall wrote: > On 11/05/18 11:41, Mirela Simonovic wrote: > "We should really avoid to use panic(...) if this is something the > system can survive. In that specific case, it would only affect the > current CPU. So it would be better to return an error and let the > caller > decide what to do." > > To summarize: > 1) Notifiers should only report an error and let the upper > caller (here > notify_cpu_starting()) deciding what to do. > 2) I am OK with the BUG_ON in notify_cpu_starting() for now. > And, in general, I agree with all this. However (and I think I'm repeating this concept for, what, the 10th time now?!?! :-P), in this specific case, the reason why none of the existing CPU_STARTING callbacks report an error, is that the CPU bringup process is, AFAICT, built around the assumption that once we reached CPU_STARTING, things can't fail. For this very reason, whatever new CPU_STARTING callback is introduced, in this series or everywhere else, _can't_ fail, and _can't_ return any error. If we need to have a CPU_STARTING callback that can fail, we need to change the CPU bringup process accordingly. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On 11/05/18 13:20, Mirela Simonovic wrote: Hi, On Fri, May 11, 2018 at 2:07 PM, Mirela Simonovicwrote: On Fri, May 11, 2018 at 12:54 PM, Julien Grall wrote: On 11/05/18 11:41, Mirela Simonovic wrote: On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli wrote: On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote: I am going to repeat the content of my answer to your last e-mail: I was aware about it since the beginning. The whole point of the conversation was we should avoid to take the decision at the lower level and let the upper layer decide what to do. If the system is failing today then that's fine and still fit what I said in my first e-mail of that thread. For reminder: "We should really avoid to use panic(...) if this is something the system can survive. In that specific case, it would only affect the current CPU. So it would be better to return an error and let the caller decide what to do." To summarize: 1) Notifiers should only report an error and let the upper caller (here notify_cpu_starting()) deciding what to do. 2) I am OK with the BUG_ON in notify_cpu_starting() for now. I agree with BUG_ON in notify_cpu_starting(). Let me just clarify consequence of your proposal according to my understanding. If instead of stopping the CPU when enabling a capability fails the notifier returns an error, the error will propagate to notify_cpu_starting() and BUG_ON will crash the system. The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead of panic that is submitted in this patch would stop only the erroneous CPU. The rest of the system will continue to work and I though that is what's the goal since we don't want to panic/BUG_ON. From that perspective I believe stop_cpu() in enable_nonboot_cpu_caps() is better compared to returning an error from the notifier. You said above "I would be ok having stop_cpu called here" and I did so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that submitted in this patch). My thoughts have evolved after Dario's discussion. He expressed concerned over your fix to make stop_cpu() working. As you said I will maintain that code and this solution looks very error prone. If Stefano is happy with it, then fine. If you believe my understanding is not correct, if I missed something or you have another proposal please let me know. Also, if you just want to convert panic from this patch into print I don't believe it's a good approach, but I can do that. I would prefer to see the notifier reporting the error with a warning and returning it. At the notifier level it does not make sense to take the decision to stop the CPU or kill the system. This is a decision that should be taken at higher level such as in notify_cpu_starting(). The whole idea here is we have only one place taking the decision and we don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is having only one place to fix over multiple one because very likely the decision is the same everywhere. I agree that today it will end up to crashing the system because of the BUG_ON. But that's a separate topic. Cheers, Thanks, Mirela Cheers, -- Julien Grall -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Fri, 2018-05-11 at 12:41 +0200, Mirela Simonovic wrote: > Hi Dario, > Hi, > On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli> wrote: > > I may very well be missing or misunderstanding something, but I > > continue to think that the problem here is that CPU_STARTING can't, > > right now, fail, while you need it to be able to. > > > > If that is the case, giving different priorities to the notifier, > > is > > not a solution. > > > Let me try to clarify. The assumption is that the starting CPU can > fail. Additional requirement set by Julien is that panic or BUG_ON is > not acceptable. > So, currently, in Xen, CPU bringup can't fail at the CPU_STARTING phase. This is the point of the BUG_ON(). It is you that need it to change this, and make it possible for CPU bringup to fail during CPU_STARTING. This is fine, but require changes, which, IMO, are not limited to removing the BUG_ON() or trading it with something else. > There are 2 ways to deal with this scenario: > > 1) Ignore and report the error, and let the erroneous CPU become > online. > This cannot be done without changing the logic in either scheduler or > notify_cpu_starting(), or I least I don't see how would that be done. > In previous email when I said "escalating this to who knows where" I > did not refer to error escalation but the escalation of the scope of > this series. > How can that be an option? If CPU bringup failed, how is it possible to let it go online? To me, it's not that "we can't let a CPU for which bringup failed continue without changing the scheduler or notify_cpu_starting()". It's rather "we must not a CPU for which bringup faile continue. Period.". Which is to say, you need to change things (in common code!) in such a way that CPU bringup can fail during the CPU_STARTING phase. > 2) Stop the erroneous CPU. > Taking this approach requires setting the priority for the notifier. > No! Stop the CPU if the bringup fails duting the CPU_STARTING phase does not require setting the priorities of the CPU_STARTING notifier. It requires to changing things (in common code) in such a way that CPU bringup can fail during the CPU_STARTING phase. (Did I say that already? :-) ) I understand that, if you set the priority, your series work. But that does not mean that it is a proper solution to the problem. It means that it is an hack that...well... makes your series work. :-) > The key point is that notify_cpu_starting() and scheduler do not have > to be changed. If errata notifier has higher priority than > scheduler's > notifier in the case of an error the CPU will not return into > notify_cpu_starting() and it will never execute BUG_ON because it > will > be stopped. The rest of the system will continue to function without > that CPU. > Right. But now we have and architecture (ARM) for which CPU bringup can fail at the CPU_STARTING phase. And yet, looking at common code, we see that the CPU_STARTING notifier has a BUG_ON() if it ever fails. How would people looking at this in 2 years time from now make sense of this? No, I don't think there really is a sensible workaround here. If you need the CPU_STARTING phase of CPU bringup to be able to fail, you need to make all the changes required to make that happen properly. Which does involve changing common code, and which should therefore be discussed with the other arches and core hypervisor maintainers. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi, On Fri, May 11, 2018 at 2:07 PM, Mirela Simonovicwrote: > Hi Julien, > > On Fri, May 11, 2018 at 12:54 PM, Julien Grall wrote: >> >> >> On 11/05/18 11:41, Mirela Simonovic wrote: >>> >>> Hi Dario, >>> >>> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli >>> wrote: On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote: > > Regardless of the fact that the notifier returns an error or not, I > believe it would be good and safe to set priority and document that > priority zero would cause racing issue in the scenario I debugged > today. I'm pretty sure that this discussion would be forgotten soon > and it really should be documented in code/comment. > I may very well be missing or misunderstanding something, but I continue to think that the problem here is that CPU_STARTING can't, right now, fail, while you need it to be able to. If that is the case, giving different priorities to the notifier, is not a solution. >>> >>> Let me try to clarify. The assumption is that the starting CPU can >>> fail. Additional requirement set by Julien is that panic or BUG_ON is >>> not acceptable. >> >> >> Please don't twist my word. I never said it was not acceptable to have the >> BUG_ON in notify_cpu_starting(). > > I didn't say that either. You referenced notify_cpu_starting() here, not me. > BTW, if you get the impression that I'm twisting words then we have a > misunderstanding and your approach is not the best way toward > clarifying it. Please have that in mind next time, because I do not > have such an intent and I never will. > > I referred to panic/BUG_ON in this scenario regardless of the place > from which it would be invoked. My understanding from your previous > answers is that you think the system should not panic/BUG_ON in this > scenario, because this kind of error the system should be able to > survive. > >> >> I am going to repeat the content of my answer to your last e-mail: >> >> I was aware about it since the beginning. The whole point of the >> conversation was we should avoid to take the decision at the lower level and >> let the upper layer decide what to do. >> >> If the system is failing today then that's fine and still fit what I said in >> my first e-mail of that thread. For reminder: >> >> "We should really avoid to use panic(...) if this is something the system >> can survive. In that specific case, it would only affect the current CPU. So >> it would be better to return an error and let the caller decide what to do." >> >> To summarize: >> 1) Notifiers should only report an error and let the upper caller >> (here notify_cpu_starting()) deciding what to do. >> 2) I am OK with the BUG_ON in notify_cpu_starting() for now. > > I agree with BUG_ON in notify_cpu_starting(). > >> > > Let me just clarify consequence of your proposal according to my > understanding. If instead of stopping the CPU when enabling a > capability fails the notifier returns an error, the error will > propagate to notify_cpu_starting() and BUG_ON will crash the system. > The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead > of panic that is submitted in this patch would stop only the erroneous > CPU. The rest of the system will continue to work and I though that is > what's the goal since we don't want to panic/BUG_ON. > From that perspective I believe stop_cpu() in > enable_nonboot_cpu_caps() is better compared to returning an error > from the notifier. > > You said above "I would be ok having stop_cpu called here" and I did > so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that > submitted in this patch). > > If you believe my understanding is not correct, if I missed something > or you have another proposal please let me know. > Also, if you just want to convert panic from this patch into print I don't believe it's a good approach, but I can do that. > Thanks, > Mirela > >> Cheers, >> >> -- >> Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi Julien, On Fri, May 11, 2018 at 12:54 PM, Julien Grallwrote: > > > On 11/05/18 11:41, Mirela Simonovic wrote: >> >> Hi Dario, >> >> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli >> wrote: >>> >>> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote: Regardless of the fact that the notifier returns an error or not, I believe it would be good and safe to set priority and document that priority zero would cause racing issue in the scenario I debugged today. I'm pretty sure that this discussion would be forgotten soon and it really should be documented in code/comment. >>> I may very well be missing or misunderstanding something, but I >>> continue to think that the problem here is that CPU_STARTING can't, >>> right now, fail, while you need it to be able to. >>> >>> If that is the case, giving different priorities to the notifier, is >>> not a solution. >>> >> >> Let me try to clarify. The assumption is that the starting CPU can >> fail. Additional requirement set by Julien is that panic or BUG_ON is >> not acceptable. > > > Please don't twist my word. I never said it was not acceptable to have the > BUG_ON in notify_cpu_starting(). I didn't say that either. You referenced notify_cpu_starting() here, not me. BTW, if you get the impression that I'm twisting words then we have a misunderstanding and your approach is not the best way toward clarifying it. Please have that in mind next time, because I do not have such an intent and I never will. I referred to panic/BUG_ON in this scenario regardless of the place from which it would be invoked. My understanding from your previous answers is that you think the system should not panic/BUG_ON in this scenario, because this kind of error the system should be able to survive. > > I am going to repeat the content of my answer to your last e-mail: > > I was aware about it since the beginning. The whole point of the > conversation was we should avoid to take the decision at the lower level and > let the upper layer decide what to do. > > If the system is failing today then that's fine and still fit what I said in > my first e-mail of that thread. For reminder: > > "We should really avoid to use panic(...) if this is something the system > can survive. In that specific case, it would only affect the current CPU. So > it would be better to return an error and let the caller decide what to do." > > To summarize: > 1) Notifiers should only report an error and let the upper caller > (here notify_cpu_starting()) deciding what to do. > 2) I am OK with the BUG_ON in notify_cpu_starting() for now. I agree with BUG_ON in notify_cpu_starting(). > Let me just clarify consequence of your proposal according to my understanding. If instead of stopping the CPU when enabling a capability fails the notifier returns an error, the error will propagate to notify_cpu_starting() and BUG_ON will crash the system. The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead of panic that is submitted in this patch would stop only the erroneous CPU. The rest of the system will continue to work and I though that is what's the goal since we don't want to panic/BUG_ON. From that perspective I believe stop_cpu() in enable_nonboot_cpu_caps() is better compared to returning an error from the notifier. You said above "I would be ok having stop_cpu called here" and I did so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that submitted in this patch). If you believe my understanding is not correct, if I missed something or you have another proposal please let me know. Thanks, Mirela > Cheers, > > -- > Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On 11/05/18 11:41, Mirela Simonovic wrote: Hi Dario, On Thu, May 10, 2018 at 6:24 PM, Dario Faggioliwrote: On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote: Regardless of the fact that the notifier returns an error or not, I believe it would be good and safe to set priority and document that priority zero would cause racing issue in the scenario I debugged today. I'm pretty sure that this discussion would be forgotten soon and it really should be documented in code/comment. I may very well be missing or misunderstanding something, but I continue to think that the problem here is that CPU_STARTING can't, right now, fail, while you need it to be able to. If that is the case, giving different priorities to the notifier, is not a solution. Let me try to clarify. The assumption is that the starting CPU can fail. Additional requirement set by Julien is that panic or BUG_ON is not acceptable. Please don't twist my word. I never said it was not acceptable to have the BUG_ON in notify_cpu_starting(). I am going to repeat the content of my answer to your last e-mail: I was aware about it since the beginning. The whole point of the conversation was we should avoid to take the decision at the lower level and let the upper layer decide what to do. If the system is failing today then that's fine and still fit what I said in my first e-mail of that thread. For reminder: "We should really avoid to use panic(...) if this is something the system can survive. In that specific case, it would only affect the current CPU. So it would be better to return an error and let the caller decide what to do." To summarize: 1) Notifiers should only report an error and let the upper caller (here notify_cpu_starting()) deciding what to do. 2) I am OK with the BUG_ON in notify_cpu_starting() for now. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi Dario, On Thu, May 10, 2018 at 6:24 PM, Dario Faggioliwrote: > On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote: >> Regardless of the fact that the notifier returns an error or not, I >> believe it would be good and safe to set priority and document that >> priority zero would cause racing issue in the scenario I debugged >> today. I'm pretty sure that this discussion would be forgotten soon >> and it really should be documented in code/comment. >> > I may very well be missing or misunderstanding something, but I > continue to think that the problem here is that CPU_STARTING can't, > right now, fail, while you need it to be able to. > > If that is the case, giving different priorities to the notifier, is > not a solution. > Let me try to clarify. The assumption is that the starting CPU can fail. Additional requirement set by Julien is that panic or BUG_ON is not acceptable. There are 2 ways to deal with this scenario: 1) Ignore and report the error, and let the erroneous CPU become online. This cannot be done without changing the logic in either scheduler or notify_cpu_starting(), or I least I don't see how would that be done. In previous email when I said "escalating this to who knows where" I did not refer to error escalation but the escalation of the scope of this series. 2) Stop the erroneous CPU. Taking this approach requires setting the priority for the notifier. The key point is that notify_cpu_starting() and scheduler do not have to be changed. If errata notifier has higher priority than scheduler's notifier in the case of an error the CPU will not return into notify_cpu_starting() and it will never execute BUG_ON because it will be stopped. The rest of the system will continue to function without that CPU. > Regards, > Dario > -- > <> (Raistlin Majere) > - > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Software Engineer @ SUSE https://www.suse.com/ ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote: > Regardless of the fact that the notifier returns an error or not, I > believe it would be good and safe to set priority and document that > priority zero would cause racing issue in the scenario I debugged > today. I'm pretty sure that this discussion would be forgotten soon > and it really should be documented in code/comment. > I may very well be missing or misunderstanding something, but I continue to think that the problem here is that CPU_STARTING can't, right now, fail, while you need it to be able to. If that is the case, giving different priorities to the notifier, is not a solution. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Thu, 2018-05-10 at 17:02 +0100, Julien Grall wrote: > On 10/05/18 16:49, Mirela Simonovic wrote: > > Regardless of the fact that the notifier returns an error or not, I > > believe it would be good and safe to set priority and document that > > priority zero would cause racing issue in the scenario I debugged > > today. I'm pretty sure that this discussion would be forgotten soon > > and it really should be documented in code/comment. > > > > In emails above I assumed we'll stop the erroneous CPU. I didn't > > have > > a chance to try returning an error until few minutes ago. > > I tried returning an error from the notifier now and the whole > > system > > fails. You realized according to the answer below that this is > > going > > to happen. > > I was aware about it since the beginning. The whole point of the > conversation was we should avoid to take the decision at the lower > level > and let the upper layer decide what to do. > This makes sense to me. > > I would rather stop CPU because changing notify_cpu_starting > > affects > > x86 as well, I cannot dig into that and it would be really to much > > for > > this series. Since you're fine with stopping cpu as well, please > > lets > > do that instead of escalating this to who knows where :) > > Also, while I suggest that it could be replaced by stop_cpu() in the > common code, I also suggested that notifier_cpu_starting() could > return > an error then the architecture specific code can decide what to do. > > On x86 it would still be a BUG_ON(notifier_cpu_starting()). On Arm > we > can decide what to do. But it is not part of that discussion here. > As just saind in the other email, I don't think this is all it's necessary to enable CPU_STARTING to fail. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Thu, 2018-05-10 at 16:13 +0100, Julien Grall wrote: > On 10/05/18 16:00, Mirela Simonovic wrote: > > > If you add your callback to CPU_UP_PREPARE, instead than to > > > CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without > > > having > > > to fiddle with priorities. > > This function will enable capabilities on a given CPU, most of them > are > touching system registers. So it is necessary to add the callback to > CPU_STARTING. > Right, I guess that answers my question. > I always like to understand what I maintain :). Why do you need to > change the priority if you just return an error in the notifier? > > At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally > I > would like to either replace that BUG_ON by cpu_stop or just > returning > an error to give a chance of the architecture deciding what to do > (this > does not have to be done today). > The problem is that, currently, once we've reached CPU_STARTING, we assume that the CPU bringup has gone ok, and things can't fail. Therefore, the only place when we undo what CPU_STARTING does is during CPU_DEAD, i.e., during hotunplug/suspend/teardown. I understand the point of having to run stuff on the CPU that is coming up, as well as your more general point. However, I don't know whether allowing CPU_STARTING to fail is the best way to achieve what you want to achieve, nor whether the BUG_ON in notify_cpu_starting() is the only issue you'll face trying to do that. I'd say that, if CPU_STARTING can fail, we need an appropriate rollback step, i.e., the flow must become something like (but I'd appreciate the opinion of x86 and core hypervisor maintainers about this): CPU_UP_PREPARE --> CPU_STARTING xx> CPU_DIDNT_START --> CPU_UP_CANCEL At which point, e.g., from the scheduler point of view, we can try to put a call to SCHED_OP(deinit_pdata) in CPU_DIDNT_START, and that would avoid the problem Mirela is facing, without having to play with priorities. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On 10/05/18 16:49, Mirela Simonovic wrote: Hi Julien, Hi, On Thu, May 10, 2018 at 5:13 PM, Julien Grallwrote: On 10/05/18 16:00, Mirela Simonovic wrote: Hi Dario, On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli wrote: On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote: On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic Please take a look at function cpu_schedule_callback in schedule.c. Within switch, case CPU_DEAD doesn't have a break, causing the bellow CPU_UP_CANCELED to execute as well when the CPU goes down. This looks wrong to me. Dario, could you please confirm that this is a bug? Otherwise could you please confirm the reasoning beyond? Dario sorry, this looked wrong in my scenario but actually it is correct. I understand the purpose of the missing break now. No problem. For the curious ones (if any) here is detailed description: errata notifier added in this patch had the same priority as scheduler notifier. I though priority doesn't matter, but I was wrong. In this particular scenario where a CPU fails to enable capabilities (triggered by errata notifier added in this patch), scheduler callback executed before the errata callback for CPU_STARTING event. So, you're adding your errata callback to the CPU_STARTING notifier, right? (Sorry for having to ask, I don't have the patch handy right now.) In other words, scheduler already called init_pdata before the errata callback fired (and stopped the CPU). Later on when errata callback fired, enabling of capabilities has failed, so the erroneous non-boot CPU stopped itself and never declared to be online. Then CPU#0 fired new notification with CPU_UP_CANCELED event in order to clean up for the job done on CPU_STARTING. However, this broke the assumption (which is good) made in cpu_schedule_callback. The assumption is that the sequence of steps should be alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this particular case deinit_pdata was not done because this would be done only upon CPU_DEAD event which makes no sense in this scenario. In order to avoid running into the invalid scenario described above, the errata callback should fire before the scheduler callback. If enabling capabilities fails, the scheduler callback for CPU_STARTING will never execute afterwards, so the following CPU_UP_CANCELED event triggered by the CPU#0 will do free_pdata, which is ok because init_pdata was not executed and alloc_pdata-->free_pdata flow is also valid. Congratulations to the reader who reached this point :) Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE. If you add your callback to CPU_UP_PREPARE, instead than to CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having to fiddle with priorities. This function will enable capabilities on a given CPU, most of them are touching system registers. So it is necessary to add the callback to CPU_STARTING. Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the sequential ordering) is about which CPU executes the callback. In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU will execute the callback. If I have 2 CPUs: CPU#0 executes callback when trying to hotplug CPU#1. I need CPU#1 to execute this callback. In CPU_STARTING case the CPU#1 will execute the callback, that is the reason why it has to be CPU_STARTING event. I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to realize why the system died (CPU#0 stopped himself :) Is there any reason why you can't do it that way? It would look more natural to me, and it's definitely going to be easier debug and maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as compared to CPU_STARTING ;-P). Julien is going to maintain this :))) I always like to understand what I maintain :). Why do you need to change the priority if you just return an error in the notifier? Regardless of the fact that the notifier returns an error or not, I believe it would be good and safe to set priority and document that priority zero would cause racing issue in the scenario I debugged today. I'm pretty sure that this discussion would be forgotten soon and it really should be documented in code/comment. In emails above I assumed we'll stop the erroneous CPU. I didn't have a chance to try returning an error until few minutes ago. I tried returning an error from the notifier now and the whole system fails. You realized according to the answer below that this is going to happen. I was aware about it since the beginning. The whole point of the conversation was we should avoid to take the decision at the lower level and let the upper layer decide what to do. If the system is failing today then that's fine and still fit what I said in my first e-mail of that thread. For reminder: "We should really avoid to use panic(...) if this is something the system can survive. In that specific case, it
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi Julien, On Thu, May 10, 2018 at 5:13 PM, Julien Grallwrote: > > > On 10/05/18 16:00, Mirela Simonovic wrote: >> >> Hi Dario, >> >> On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli >> wrote: >>> >>> On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote: On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic > Please take a look at function cpu_schedule_callback in schedule.c. > Within switch, case CPU_DEAD doesn't have a break, causing the > bellow > CPU_UP_CANCELED to execute as well when the CPU goes down. This > looks > wrong to me. > Dario, could you please confirm that this is a bug? Otherwise could > you please confirm the reasoning beyond? > Dario sorry, this looked wrong in my scenario but actually it is correct. I understand the purpose of the missing break now. >>> No problem. >>> For the curious ones (if any) here is detailed description: errata notifier added in this patch had the same priority as scheduler notifier. I though priority doesn't matter, but I was wrong. In this particular scenario where a CPU fails to enable capabilities (triggered by errata notifier added in this patch), scheduler callback executed before the errata callback for CPU_STARTING event. >>> So, you're adding your errata callback to the CPU_STARTING notifier, >>> right? (Sorry for having to ask, I don't have the patch handy right >>> now.) >>> In other words, scheduler already called init_pdata before the errata callback fired (and stopped the CPU). Later on when errata callback fired, enabling of capabilities has failed, so the erroneous non-boot CPU stopped itself and never declared to be online. Then CPU#0 fired new notification with CPU_UP_CANCELED event in order to clean up for the job done on CPU_STARTING. However, this broke the assumption (which is good) made in cpu_schedule_callback. The assumption is that the sequence of steps should be alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this particular case deinit_pdata was not done because this would be done only upon CPU_DEAD event which makes no sense in this scenario. In order to avoid running into the invalid scenario described above, the errata callback should fire before the scheduler callback. If enabling capabilities fails, the scheduler callback for CPU_STARTING will never execute afterwards, so the following CPU_UP_CANCELED event triggered by the CPU#0 will do free_pdata, which is ok because init_pdata was not executed and alloc_pdata-->free_pdata flow is also valid. Congratulations to the reader who reached this point :) >>> Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE. >>> >>> If you add your callback to CPU_UP_PREPARE, instead than to >>> CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having >>> to fiddle with priorities. > > This function will enable capabilities on a given CPU, most of them are > touching system registers. So it is necessary to add the callback to > CPU_STARTING. > >>> >> >> Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the >> sequential ordering) is about which CPU executes the callback. >> In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU >> will execute the callback. If I have 2 CPUs: CPU#0 executes callback >> when trying to hotplug CPU#1. I need CPU#1 to execute this callback. >> In CPU_STARTING case the CPU#1 will execute the callback, that is the >> reason why it has to be CPU_STARTING event. >> >> I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to >> realize why the system died (CPU#0 stopped himself :) >> >>> Is there any reason why you can't do it that way? It would look more >>> natural to me, and it's definitely going to be easier debug and >>> maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as >>> compared to CPU_STARTING ;-P). >>> >> >> Julien is going to maintain this :))) > > > I always like to understand what I maintain :). Why do you need to change > the priority if you just return an error in the notifier? > Regardless of the fact that the notifier returns an error or not, I believe it would be good and safe to set priority and document that priority zero would cause racing issue in the scenario I debugged today. I'm pretty sure that this discussion would be forgotten soon and it really should be documented in code/comment. In emails above I assumed we'll stop the erroneous CPU. I didn't have a chance to try returning an error until few minutes ago. I tried returning an error from the notifier now and the whole system fails. You realized according to the answer below that this is going to happen. > At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally I would > like to either replace that BUG_ON by cpu_stop or just
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi Dario, On Thu, May 10, 2018 at 4:25 PM, Dario Faggioliwrote: > On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote: >> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic >> >> > Please take a look at function cpu_schedule_callback in schedule.c. >> > Within switch, case CPU_DEAD doesn't have a break, causing the >> > bellow >> > CPU_UP_CANCELED to execute as well when the CPU goes down. This >> > looks >> > wrong to me. >> > Dario, could you please confirm that this is a bug? Otherwise could >> > you please confirm the reasoning beyond? >> > >> >> Dario sorry, this looked wrong in my scenario but actually it is >> correct. I understand the purpose of the missing break now. >> > No problem. > >> For the curious ones (if any) here is detailed description: errata >> notifier added in this patch had the same priority as scheduler >> notifier. I though priority doesn't matter, but I was wrong. In this >> particular scenario where a CPU fails to enable capabilities >> (triggered by errata notifier added in this patch), scheduler >> callback >> executed before the errata callback for CPU_STARTING event. >> > So, you're adding your errata callback to the CPU_STARTING notifier, > right? (Sorry for having to ask, I don't have the patch handy right > now.) > >> In other >> words, scheduler already called init_pdata before the errata callback >> fired (and stopped the CPU). >> Later on when errata callback fired, enabling of capabilities has >> failed, so the erroneous non-boot CPU stopped itself and never >> declared to be online. >> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order >> to clean up for the job done on CPU_STARTING. However, this broke the >> assumption (which is good) made in cpu_schedule_callback. The >> assumption is that the sequence of steps should be >> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this >> particular case deinit_pdata was not done because this would be done >> only upon CPU_DEAD event which makes no sense in this scenario. >> In order to avoid running into the invalid scenario described above, >> the errata callback should fire before the scheduler callback. If >> enabling capabilities fails, the scheduler callback for CPU_STARTING >> will never execute afterwards, so the following CPU_UP_CANCELED event >> triggered by the CPU#0 will do free_pdata, which is ok because >> init_pdata was not executed and alloc_pdata-->free_pdata flow is also >> valid. Congratulations to the reader who reached this point :) >> > Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE. > > If you add your callback to CPU_UP_PREPARE, instead than to > CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having > to fiddle with priorities. > Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the sequential ordering) is about which CPU executes the callback. In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU will execute the callback. If I have 2 CPUs: CPU#0 executes callback when trying to hotplug CPU#1. I need CPU#1 to execute this callback. In CPU_STARTING case the CPU#1 will execute the callback, that is the reason why it has to be CPU_STARTING event. I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to realize why the system died (CPU#0 stopped himself :) > Is there any reason why you can't do it that way? It would look more > natural to me, and it's definitely going to be easier debug and > maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as > compared to CPU_STARTING ;-P). > Julien is going to maintain this :))) > Regards, > Dario > -- > <> (Raistlin Majere) > - > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Software Engineer @ SUSE https://www.suse.com/ ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On 10/05/18 15:12, Mirela Simonovic wrote: Hi Julien, On Thu, May 10, 2018 at 3:29 PM, Julien Grallwrote: Hi, On 05/10/2018 02:24 PM, Mirela Simonovic wrote: On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic wrote: I have tested the tuned scenario where enabling capabilities fails - the erroneous CPU is stopped/powered down and the system continues to work fine without it. Although I still don't understand why indeed I needed to debug this I'll add the fix in v4. I don't understand your last sentence. What did you mean? I still don't understand how can this scenario happen practically at runtime (a scenario without me tuning the failure for testing). error path are by nature hard to reach without tweaking the code. Imagine something that can only happen once every thousand boot. It is not often, but thanks to murphy that will only happen at the worst moment. And you know very well how it is a pain to debug :). So I always tend to tweak the code in order to exercise major error path. Memory allocation failure shouldn't be possible at this point and any workaround related failure would have happened at boot. I don't see anything else or my understanding may not be correct... However, if some new capability would be introduced one day and it could fail, it would be good if we don't have to go back and fix this patch. The memory failure was an example among the other. The whole point here is you can't extrapolate how the errata are implemented today for future one. I also want to limit the number of stop_cpu/panic over the code, so we have only place to make the decision if something is wrong. At the end, it doesn't matter since the system will be able to properly deal with such a scenario with the fix. I would be ok having stop_cpu called here. Although my preference is returning an error and let the caller decides what to do. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote: > On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic > > > Please take a look at function cpu_schedule_callback in schedule.c. > > Within switch, case CPU_DEAD doesn't have a break, causing the > > bellow > > CPU_UP_CANCELED to execute as well when the CPU goes down. This > > looks > > wrong to me. > > Dario, could you please confirm that this is a bug? Otherwise could > > you please confirm the reasoning beyond? > > > > Dario sorry, this looked wrong in my scenario but actually it is > correct. I understand the purpose of the missing break now. > No problem. > For the curious ones (if any) here is detailed description: errata > notifier added in this patch had the same priority as scheduler > notifier. I though priority doesn't matter, but I was wrong. In this > particular scenario where a CPU fails to enable capabilities > (triggered by errata notifier added in this patch), scheduler > callback > executed before the errata callback for CPU_STARTING event. > So, you're adding your errata callback to the CPU_STARTING notifier, right? (Sorry for having to ask, I don't have the patch handy right now.) > In other > words, scheduler already called init_pdata before the errata callback > fired (and stopped the CPU). > Later on when errata callback fired, enabling of capabilities has > failed, so the erroneous non-boot CPU stopped itself and never > declared to be online. > Then CPU#0 fired new notification with CPU_UP_CANCELED event in order > to clean up for the job done on CPU_STARTING. However, this broke the > assumption (which is good) made in cpu_schedule_callback. The > assumption is that the sequence of steps should be > alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this > particular case deinit_pdata was not done because this would be done > only upon CPU_DEAD event which makes no sense in this scenario. > In order to avoid running into the invalid scenario described above, > the errata callback should fire before the scheduler callback. If > enabling capabilities fails, the scheduler callback for CPU_STARTING > will never execute afterwards, so the following CPU_UP_CANCELED event > triggered by the CPU#0 will do free_pdata, which is ok because > init_pdata was not executed and alloc_pdata-->free_pdata flow is also > valid. Congratulations to the reader who reached this point :) > Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE. If you add your callback to CPU_UP_PREPARE, instead than to CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having to fiddle with priorities. Is there any reason why you can't do it that way? It would look more natural to me, and it's definitely going to be easier debug and maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as compared to CPU_STARTING ;-P). Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Thu, 2018-05-10 at 13:57 +0200, Mirela Simonovic wrote: > Hi, > > +Dario > Thanks. > On Wed, May 9, 2018 at 6:32 PM, Julien Grall> wrote: > > If there is a bug in the scheduler it should be fixed rather trying > > to > > workaround with a panic in the code. If you provide more details, > > we might > > be able to help here. > > > > This flow seems to have several bugs. Lets start from here: > > Please take a look at function cpu_schedule_callback in schedule.c. > Within switch, case CPU_DEAD doesn't have a break, causing the bellow > CPU_UP_CANCELED to execute as well when the CPU goes down. This looks > wrong to me. > Dario, could you please confirm that this is a bug? > No, that is entirely on purpose (as you can tell from the "/* Fallthrough */" comment). > Otherwise could > you please confirm the reasoning beyond? > Well, the idea is that the CPU_UP_CANCELLED notifier should _only_ invoke cpu_schedule_down(). On the other hand, the CPU_DEAD notifier should invoke both: SCHED_OP(deinit_pdata) cpu_schedule_down() and the fallthrough is the way we achieve that. On x86, the shutdown/suspend path is as follows: XENPF_enter_acpi_sleep (platform_hypercall.c) acpi_enter_sleep() continue_hypercall_on_cpu(0, enter_state_helper) enter_state() system_state = SYS_STATE_suspend disable_nonboot_cpus() for_each_online_cpu ( cpu ) { cpu_down(cpu) } notifier_call_chain(CPU_DOWN_PREPARE) cpu_callback(CPU_DOWN_PREPARE) cpupool_cpu_remove(cpu) cpufreq_del_cpu(cpu) stop_machine_run(take_cpu_down, cpu) __cpu_disable(); notifier_call_chain(CPU_DYING) ... cpumask_clear_cpu(cpu, _online_map); cpu_disable_scheduler(cpu); __cpu_die(cpu) while ( (seen_state = cpu_state) != CPU_STATE_DEAD ) { ... } notifier_call_chain(CPU_DEAD) cpu_schedule_callback(CPU_DEAD) ... ... cpu_schedule_callback(CPU_DEAD) SCHED_OP(deinit_pdata) cpu_schedule_down(cpu) SCHED_OP(free_pdata) SCHED_OP(free_vdata) If you see a BUG_ON triggering, please provide details about that, and we'll try to figure out what's causing it. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi, On 05/10/2018 02:24 PM, Mirela Simonovic wrote: On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovicwrote: I have tested the tuned scenario where enabling capabilities fails - the erroneous CPU is stopped/powered down and the system continues to work fine without it. Although I still don't understand why indeed I needed to debug this I'll add the fix in v4. I don't understand your last sentence. What did you mean? Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovicwrote: > Hi, > > +Dario > > On Wed, May 9, 2018 at 6:32 PM, Julien Grall wrote: >> >> >> On 09/05/18 16:48, Mirela Simonovic wrote: >>> >>> Hi Julien, >> >> >> Hi Mirela, >> >> >>> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall >>> wrote: Hi Mirela, On 27/04/18 18:12, Mirela Simonovic wrote: > > > On boot, enabling errata workarounds will be triggered by the boot CPU > from start_xen(). On CPU hotplug (non-boot scenario) this would not be > done. This patch adds the code required to enable errata workarounds > for a CPU being hotplugged after the system boots. This is triggered > using a notifier. If the CPU fails to enable the errata Xen will panic. > This is done because it is assumed that the CPU which is hotplugged > after the system/Xen boots, was initially hotplugged during the > system/Xen boot. Therefore, enabling errata workarounds should never > fail. > > Signed-off-by: Mirela Simonovic > > --- > CC: Stefano Stabellini > CC: Julien Grall > --- >xen/arch/arm/cpuerrata.c | 35 > +++ >xen/arch/arm/cpufeature.c| 23 +++ >xen/include/asm-arm/cpufeature.h | 1 + >3 files changed, 59 insertions(+) > > diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c > index 1baa20654b..4040f781ec 100644 > --- a/xen/arch/arm/cpuerrata.c > +++ b/xen/arch/arm/cpuerrata.c > @@ -5,6 +5,8 @@ >#include >#include >#include > +#include > +#include >#include >#include >#include > @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void) >enable_cpu_capabilities(arm_errata); >} >+static int cpu_errata_callback( > +struct notifier_block *nfb, unsigned long action, void *hcpu) > +{ > +switch ( action ) > +{ > +case CPU_STARTING: > +enable_nonboot_cpu_caps(arm_errata); > +break; > +default: > +break; > +} > + > +return NOTIFY_DONE; > +} > + > +static struct notifier_block cpu_errata_nfb = { > +.notifier_call = cpu_errata_callback, > +}; > + > +static int __init cpu_errata_notifier_init(void) > +{ > +register_cpu_notifier(_errata_nfb); > +return 0; > +} > +/* > + * Initialization has to be done at init rather than presmp_init phase > because > + * the callback should execute only after the secondary CPUs are > initially > + * booted (in hotplug scenarios when the system state is not boot). On > boot, > + * the enabling of errata workarounds will be triggered by the boot CPU > from > + * start_xen(). > + */ > +__initcall(cpu_errata_notifier_init); > + >/* > * Local variables: > * mode: C > diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c > index 525b45e22f..dd30f0d29c 100644 > --- a/xen/arch/arm/cpufeature.c > +++ b/xen/arch/arm/cpufeature.c > @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct > arm_cpu_capabilities *caps) >} >} >+/* Run through the enabled capabilities and enable() them on the > calling CPU */ > +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps) > +{ > +ASSERT(system_state != SYS_STATE_boot); > + > +for ( ; caps->matches; caps++ ) > +{ > +if ( !cpus_have_cap(caps->capability) ) > +continue; > + > +if ( caps->enable ) > +{ > +/* > + * Since the CPU has enabled errata workarounds on boot, it > should This function is not really about errata, it is about capabilities. Errata is just a sub-category of them. >>> >>> I've fixed the comment, thanks. >>> > + * never fail to enable them here. > + */ > +if ( caps->enable((void *)caps) ) > +panic("CPU%u failed to enable capability %u\n", > + smp_processor_id(), caps->capability); We should really avoid to use panic(...) if this is something the system can survive. In that specific case, it would only affect the current CPU. So it would be better to return an error and let the caller decide what to do. >>> >>> I need to emphasize two points: >>> 1) I don't see how is this different compared to PSCI CPU OFF where we >>> do panic. Essentially, in both cases the system will not be able to >>> use that CPU and we
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi, +Dario On Wed, May 9, 2018 at 6:32 PM, Julien Grallwrote: > > > On 09/05/18 16:48, Mirela Simonovic wrote: >> >> Hi Julien, > > > Hi Mirela, > > >> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall >> wrote: >>> >>> Hi Mirela, >>> >>> >>> On 27/04/18 18:12, Mirela Simonovic wrote: On boot, enabling errata workarounds will be triggered by the boot CPU from start_xen(). On CPU hotplug (non-boot scenario) this would not be done. This patch adds the code required to enable errata workarounds for a CPU being hotplugged after the system boots. This is triggered using a notifier. If the CPU fails to enable the errata Xen will panic. This is done because it is assumed that the CPU which is hotplugged after the system/Xen boots, was initially hotplugged during the system/Xen boot. Therefore, enabling errata workarounds should never fail. Signed-off-by: Mirela Simonovic --- CC: Stefano Stabellini CC: Julien Grall --- xen/arch/arm/cpuerrata.c | 35 +++ xen/arch/arm/cpufeature.c| 23 +++ xen/include/asm-arm/cpufeature.h | 1 + 3 files changed, 59 insertions(+) diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c index 1baa20654b..4040f781ec 100644 --- a/xen/arch/arm/cpuerrata.c +++ b/xen/arch/arm/cpuerrata.c @@ -5,6 +5,8 @@ #include #include #include +#include +#include #include #include #include @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void) enable_cpu_capabilities(arm_errata); } +static int cpu_errata_callback( +struct notifier_block *nfb, unsigned long action, void *hcpu) +{ +switch ( action ) +{ +case CPU_STARTING: +enable_nonboot_cpu_caps(arm_errata); +break; +default: +break; +} + +return NOTIFY_DONE; +} + +static struct notifier_block cpu_errata_nfb = { +.notifier_call = cpu_errata_callback, +}; + +static int __init cpu_errata_notifier_init(void) +{ +register_cpu_notifier(_errata_nfb); +return 0; +} +/* + * Initialization has to be done at init rather than presmp_init phase because + * the callback should execute only after the secondary CPUs are initially + * booted (in hotplug scenarios when the system state is not boot). On boot, + * the enabling of errata workarounds will be triggered by the boot CPU from + * start_xen(). + */ +__initcall(cpu_errata_notifier_init); + /* * Local variables: * mode: C diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c index 525b45e22f..dd30f0d29c 100644 --- a/xen/arch/arm/cpufeature.c +++ b/xen/arch/arm/cpufeature.c @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct arm_cpu_capabilities *caps) } } +/* Run through the enabled capabilities and enable() them on the calling CPU */ +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps) +{ +ASSERT(system_state != SYS_STATE_boot); + +for ( ; caps->matches; caps++ ) +{ +if ( !cpus_have_cap(caps->capability) ) +continue; + +if ( caps->enable ) +{ +/* + * Since the CPU has enabled errata workarounds on boot, it should >>> >>> >>> >>> This function is not really about errata, it is about capabilities. >>> Errata >>> is just a sub-category of them. >>> >> >> I've fixed the comment, thanks. >> + * never fail to enable them here. + */ +if ( caps->enable((void *)caps) ) +panic("CPU%u failed to enable capability %u\n", + smp_processor_id(), caps->capability); >>> >>> >>> >>> We should really avoid to use panic(...) if this is something the system >>> can >>> survive. In that specific case, it would only affect the current CPU. So >>> it >>> would be better to return an error and let the caller decide what to do. >>> >> >> I need to emphasize two points: >> 1) I don't see how is this different compared to PSCI CPU OFF where we >> do panic. Essentially, in both cases the system will not be able to >> use that CPU and we already agreed that is a good reason to panic. > > > You can't compare PSCI CPU off and the enable callback failing. The *only* > reason PSCI CPU off can fail is because the Trusted OS is resident on that > CPU. If that ever happen it is a
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi Julien, On Mon, Apr 30, 2018 at 6:09 PM, Julien Grallwrote: > Hi Mirela, > > > On 27/04/18 18:12, Mirela Simonovic wrote: >> >> On boot, enabling errata workarounds will be triggered by the boot CPU >> from start_xen(). On CPU hotplug (non-boot scenario) this would not be >> done. This patch adds the code required to enable errata workarounds >> for a CPU being hotplugged after the system boots. This is triggered >> using a notifier. If the CPU fails to enable the errata Xen will panic. >> This is done because it is assumed that the CPU which is hotplugged >> after the system/Xen boots, was initially hotplugged during the >> system/Xen boot. Therefore, enabling errata workarounds should never >> fail. >> >> Signed-off-by: Mirela Simonovic >> >> --- >> CC: Stefano Stabellini >> CC: Julien Grall >> --- >> xen/arch/arm/cpuerrata.c | 35 >> +++ >> xen/arch/arm/cpufeature.c| 23 +++ >> xen/include/asm-arm/cpufeature.h | 1 + >> 3 files changed, 59 insertions(+) >> >> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c >> index 1baa20654b..4040f781ec 100644 >> --- a/xen/arch/arm/cpuerrata.c >> +++ b/xen/arch/arm/cpuerrata.c >> @@ -5,6 +5,8 @@ >> #include >> #include >> #include >> +#include >> +#include >> #include >> #include >> #include >> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void) >> enable_cpu_capabilities(arm_errata); >> } >> +static int cpu_errata_callback( >> +struct notifier_block *nfb, unsigned long action, void *hcpu) >> +{ >> +switch ( action ) >> +{ >> +case CPU_STARTING: >> +enable_nonboot_cpu_caps(arm_errata); >> +break; >> +default: >> +break; >> +} >> + >> +return NOTIFY_DONE; >> +} >> + >> +static struct notifier_block cpu_errata_nfb = { >> +.notifier_call = cpu_errata_callback, >> +}; >> + >> +static int __init cpu_errata_notifier_init(void) >> +{ >> +register_cpu_notifier(_errata_nfb); >> +return 0; >> +} >> +/* >> + * Initialization has to be done at init rather than presmp_init phase >> because >> + * the callback should execute only after the secondary CPUs are >> initially >> + * booted (in hotplug scenarios when the system state is not boot). On >> boot, >> + * the enabling of errata workarounds will be triggered by the boot CPU >> from >> + * start_xen(). >> + */ >> +__initcall(cpu_errata_notifier_init); >> + >> /* >>* Local variables: >>* mode: C >> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c >> index 525b45e22f..dd30f0d29c 100644 >> --- a/xen/arch/arm/cpufeature.c >> +++ b/xen/arch/arm/cpufeature.c >> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct >> arm_cpu_capabilities *caps) >> } >> } >> +/* Run through the enabled capabilities and enable() them on the >> calling CPU */ >> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps) >> +{ >> +ASSERT(system_state != SYS_STATE_boot); >> + >> +for ( ; caps->matches; caps++ ) >> +{ >> +if ( !cpus_have_cap(caps->capability) ) >> +continue; >> + >> +if ( caps->enable ) >> +{ >> +/* >> + * Since the CPU has enabled errata workarounds on boot, it >> should > > > This function is not really about errata, it is about capabilities. Errata > is just a sub-category of them. > I've fixed the comment, thanks. >> + * never fail to enable them here. >> + */ >> +if ( caps->enable((void *)caps) ) >> +panic("CPU%u failed to enable capability %u\n", >> + smp_processor_id(), caps->capability); > > > We should really avoid to use panic(...) if this is something the system can > survive. In that specific case, it would only affect the current CPU. So it > would be better to return an error and let the caller decide what to do. > I need to emphasize two points: 1) I don't see how is this different compared to PSCI CPU OFF where we do panic. Essentially, in both cases the system will not be able to use that CPU and we already agreed that is a good reason to panic. As oppose to CPU_OFF which wasn't called on boot so we indeed have no idea whether it will pass on suspend, no matter how unlikely it could fail, in this scenario we are sure that enabling capability should pass because it already passed on boot. So if it doesn't pass, which I consider to be impossible, I believe we should panic. On the other hand, I understand how would this make a difference on big.LITTLE where you try to hotplug a CPU that was never booted. However, that scenario is out of this scope. 2) I still wanted to give a chance to your proposal and just convert panic into stop_cpu+printing error. The system cannot survive if enabling a capability fails. In order to
Re: [Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Hi Mirela, On 27/04/18 18:12, Mirela Simonovic wrote: On boot, enabling errata workarounds will be triggered by the boot CPU from start_xen(). On CPU hotplug (non-boot scenario) this would not be done. This patch adds the code required to enable errata workarounds for a CPU being hotplugged after the system boots. This is triggered using a notifier. If the CPU fails to enable the errata Xen will panic. This is done because it is assumed that the CPU which is hotplugged after the system/Xen boots, was initially hotplugged during the system/Xen boot. Therefore, enabling errata workarounds should never fail. Signed-off-by: Mirela Simonovic--- CC: Stefano Stabellini CC: Julien Grall --- xen/arch/arm/cpuerrata.c | 35 +++ xen/arch/arm/cpufeature.c| 23 +++ xen/include/asm-arm/cpufeature.h | 1 + 3 files changed, 59 insertions(+) diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c index 1baa20654b..4040f781ec 100644 --- a/xen/arch/arm/cpuerrata.c +++ b/xen/arch/arm/cpuerrata.c @@ -5,6 +5,8 @@ #include #include #include +#include +#include #include #include #include @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void) enable_cpu_capabilities(arm_errata); } +static int cpu_errata_callback( +struct notifier_block *nfb, unsigned long action, void *hcpu) +{ +switch ( action ) +{ +case CPU_STARTING: +enable_nonboot_cpu_caps(arm_errata); +break; +default: +break; +} + +return NOTIFY_DONE; +} + +static struct notifier_block cpu_errata_nfb = { +.notifier_call = cpu_errata_callback, +}; + +static int __init cpu_errata_notifier_init(void) +{ +register_cpu_notifier(_errata_nfb); +return 0; +} +/* + * Initialization has to be done at init rather than presmp_init phase because + * the callback should execute only after the secondary CPUs are initially + * booted (in hotplug scenarios when the system state is not boot). On boot, + * the enabling of errata workarounds will be triggered by the boot CPU from + * start_xen(). + */ +__initcall(cpu_errata_notifier_init); + /* * Local variables: * mode: C diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c index 525b45e22f..dd30f0d29c 100644 --- a/xen/arch/arm/cpufeature.c +++ b/xen/arch/arm/cpufeature.c @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct arm_cpu_capabilities *caps) } } +/* Run through the enabled capabilities and enable() them on the calling CPU */ +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps) +{ +ASSERT(system_state != SYS_STATE_boot); + +for ( ; caps->matches; caps++ ) +{ +if ( !cpus_have_cap(caps->capability) ) +continue; + +if ( caps->enable ) +{ +/* + * Since the CPU has enabled errata workarounds on boot, it should This function is not really about errata, it is about capabilities. Errata is just a sub-category of them. + * never fail to enable them here. + */ +if ( caps->enable((void *)caps) ) +panic("CPU%u failed to enable capability %u\n", + smp_processor_id(), caps->capability); We should really avoid to use panic(...) if this is something the system can survive. In that specific case, it would only affect the current CPU. So it would be better to return an error and let the caller decide what to do. Cheers, +} +} +} + /* * Local variables: * mode: C diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h index e557a095af..b14e226401 100644 --- a/xen/include/asm-arm/cpufeature.h +++ b/xen/include/asm-arm/cpufeature.h @@ -88,6 +88,7 @@ void update_cpu_capabilities(const struct arm_cpu_capabilities *caps, const char *info); void enable_cpu_capabilities(const struct arm_cpu_capabilities *caps); +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps); #endif /* __ASSEMBLY__ */ Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
On boot, enabling errata workarounds will be triggered by the boot CPU from start_xen(). On CPU hotplug (non-boot scenario) this would not be done. This patch adds the code required to enable errata workarounds for a CPU being hotplugged after the system boots. This is triggered using a notifier. If the CPU fails to enable the errata Xen will panic. This is done because it is assumed that the CPU which is hotplugged after the system/Xen boots, was initially hotplugged during the system/Xen boot. Therefore, enabling errata workarounds should never fail. Signed-off-by: Mirela Simonovic--- CC: Stefano Stabellini CC: Julien Grall --- xen/arch/arm/cpuerrata.c | 35 +++ xen/arch/arm/cpufeature.c| 23 +++ xen/include/asm-arm/cpufeature.h | 1 + 3 files changed, 59 insertions(+) diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c index 1baa20654b..4040f781ec 100644 --- a/xen/arch/arm/cpuerrata.c +++ b/xen/arch/arm/cpuerrata.c @@ -5,6 +5,8 @@ #include #include #include +#include +#include #include #include #include @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void) enable_cpu_capabilities(arm_errata); } +static int cpu_errata_callback( +struct notifier_block *nfb, unsigned long action, void *hcpu) +{ +switch ( action ) +{ +case CPU_STARTING: +enable_nonboot_cpu_caps(arm_errata); +break; +default: +break; +} + +return NOTIFY_DONE; +} + +static struct notifier_block cpu_errata_nfb = { +.notifier_call = cpu_errata_callback, +}; + +static int __init cpu_errata_notifier_init(void) +{ +register_cpu_notifier(_errata_nfb); +return 0; +} +/* + * Initialization has to be done at init rather than presmp_init phase because + * the callback should execute only after the secondary CPUs are initially + * booted (in hotplug scenarios when the system state is not boot). On boot, + * the enabling of errata workarounds will be triggered by the boot CPU from + * start_xen(). + */ +__initcall(cpu_errata_notifier_init); + /* * Local variables: * mode: C diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c index 525b45e22f..dd30f0d29c 100644 --- a/xen/arch/arm/cpufeature.c +++ b/xen/arch/arm/cpufeature.c @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct arm_cpu_capabilities *caps) } } +/* Run through the enabled capabilities and enable() them on the calling CPU */ +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps) +{ +ASSERT(system_state != SYS_STATE_boot); + +for ( ; caps->matches; caps++ ) +{ +if ( !cpus_have_cap(caps->capability) ) +continue; + +if ( caps->enable ) +{ +/* + * Since the CPU has enabled errata workarounds on boot, it should + * never fail to enable them here. + */ +if ( caps->enable((void *)caps) ) +panic("CPU%u failed to enable capability %u\n", + smp_processor_id(), caps->capability); +} +} +} + /* * Local variables: * mode: C diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h index e557a095af..b14e226401 100644 --- a/xen/include/asm-arm/cpufeature.h +++ b/xen/include/asm-arm/cpufeature.h @@ -88,6 +88,7 @@ void update_cpu_capabilities(const struct arm_cpu_capabilities *caps, const char *info); void enable_cpu_capabilities(const struct arm_cpu_capabilities *caps); +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps); #endif /* __ASSEMBLY__ */ -- 2.13.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel