> On Jan 14, 2022, at 9:01 PM, Stefano Stabellini <sstabell...@kernel.org> 
> wrote:
> 
> On Fri, 14 Jan 2022, Dario Faggioli wrote:
>> On Thu, 2022-01-06 at 17:52 -0800, Stefano Stabellini wrote:
>>> On Thu, 6 Jan 2022, Julien Grall wrote:
>>>> 
>>>> This issue and solution were discussed numerous time on the ML. In
>>>> short, we
>>>> want to tell the RCU that CPU running in guest context are always
>>>> quiesced.
>>>> For more details, you can read the previous thread (which also
>>>> contains a link
>>>> to the one before):
>>>> 
>>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fxen-devel%2Ffe3dd9f0-b035-01fe-3e01-ddf065f182ab%40codiax.se%2F&amp;data=04%7C01%7CGeorge.Dunlap%40citrix.com%7Cb6795e0be3af416841a408d9d7a12030%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637777909305566330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=9%2BoiFfdK3rGAeWFSNCRu5aSuYgql1XZcaGJgT3aRsOA%3D&amp;reserved=0
>>> 
>>> Thanks Julien for the pointer!
>>> 
>>> Dario, I forward-ported your three patches to staging:
>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.com%2Fxen-project%2Fpeople%2Fsstabellini%2Fxen%2F-%2Ftree%2Frcu-quiet&amp;data=04%7C01%7CGeorge.Dunlap%40citrix.com%7Cb6795e0be3af416841a408d9d7a12030%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637777909305566330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=vrNN5KgwXj93ZThreIDNB7UgKJdPNz%2BoL98b%2FoopN8w%3D&amp;reserved=0
>>> 
>> Hi Stefano!
>> 
>> I definitely would like to see the end of this issue, so thanks a lot
>> for your interest and your help with the patches.
>> 
>>> I can confirm that they fix the bug. Note that I had to add a small
>>> change on top to remove the ASSERT at the beginning of
>>> rcu_quiet_enter:
>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.com%2Fxen-project%2Fpeople%2Fsstabellini%2Fxen%2F-%2Fcommit%2F6fc02b90814d3fe630715e353d16f397a5b280f9&amp;data=04%7C01%7CGeorge.Dunlap%40citrix.com%7Cb6795e0be3af416841a408d9d7a12030%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637777909305566330%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=vjxT35b%2FglqzzA4DCLqTjbo0bAfOjtLcvN90OFs8U9Q%3D&amp;reserved=0
>>> 
>> Yeah, that should be fine.
>> 
>>> Would you be up for submitting them for upstreaming? I would prefer
>>> if
>>> you send out the patches because I cannot claim to understand them
>>> completely (except for the one doing renaming :-P )
>>> 
>> Haha! So, I am up for properly submitting, but there's one problem. As
>> you've probably got, the idea here is to use transitions toward the
>> guest and inside the hypervisor as RCU quiescence and "activation"
>> points.
>> 
>> Now, on ARM, that just meant calling rcu_quiet_exit() in
>> enter_hypervisor_from_guest() and calling rcu_quiet_enter() in
>> leave_hypervisor_to_guest(). Nice and easy, and even myself --and I'm
>> definitely not an ARM person-- cloud understand it (although with some
>> help from Julien) and put the patches together.
>> 
>> However, the problem is really arch independent and, despite not
>> surfacing equally frequently, it affects x86 as well. And for x86 the
>> situation is by far not equally nice, when it comes to identifying all
>> the places from where to call rcu_quiet_{enter,exit}().
>> 
>> And finding out where to put them, among the various functions that we
>> have in the various entry.S variants is where I stopped. The plan was
>> to get back to it, but as shamefully as it sounds, I could not do that
>> yet.
>> 
>> So, if anyone wants to help with this, handing over suggestions for
>> potential good spots, that would help a lot.
> 
> Unfortunately I cannot volunteer due to time and also because I wouldn't
> know where to look and I don't have a reproducer or a test environment
> on x86. I would be flying blind.
> 
> 
>> Alternatively, we can submit the series as ARM-only... But I fear that
>> the x86 side of things would then be easily forgotten. :-(
> 
> I agree with you on this, but at the same time we are having problems
> with customers in the field -- it is not like we can wait to solve the
> problem on ARM any longer. And the issue is certainly far less likely to
> happen on x86 (there is no vwfi=native, right?) In other words, I think
> it is better to have half of the solution now to solve the worst part of
> the problem than to wait more months for a full solution.

An x86 equivalent of vwfi=native could be implemented easily, but AFAIK nobody 
has asked for it yet.  I agree that we need to fix if for ARM, and so in the 
absence of someone with the time to fix up the x86 side, I think fixing 
ARM-only is the way to go.

It would be good if we could add appropriate comments warning anyone who 
implements `hlt=native` on x86 the problems they’ll face and how to fix them.  
Not sure the best place to do that; in the VMX / SVM code that sets the exit 
for HLT &c?

 -George

Reply via email to