On 17/05/2019 08:57, Jan Beulich wrote:
>>>> On 17.05.19 at 07:13, <jgr...@suse.com> wrote:
>> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>>> On 16.05.19 at 15:51, <jgr...@suse.com> wrote:
>>>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 08:56, <jgr...@suse.com> wrote:
>>>>>> --- a/xen/arch/x86/domain.c
>>>>>> +++ b/xen/arch/x86/domain.c
>>>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>>>      }
>>>>>>  }
>>>>>>  
>>>>>> +/*
>>>>>> + * Idle loop for siblings of active schedule items.
>>>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>>>> + * livepatching.
>>>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>>>
>>>>> I guess I'm missing a part of the description which explains all this:
>>>>> What's wrong with doing scrubbing work, for example? Why is
>>>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>>>> v->is_urgent, i.e. what justifies not entering a decent power
>>>>> saving mode here on Intel, but doing so on AMD?
>>>>
>>>> One of the reasons for using core scheduling is to avoid running vcpus
>>>> of different domains on the same core in order to minimize the chances
>>>> for side channel attacks on data of other domains. Not allowing
>>>> scrubbing or tasklets here is meant to avoid accessing data of other
>>>> domains.
>>>
>>> So how is running softirqs okay then? And how is scrubbing accessing
>>> other domains' data?
>>
>> Right now I'm not sure whether it is a good idea to block any softirqs.
>> We definitely need to process scheduling requests and I believe RCU and
>> tasklets, too. The tlbflush one should be uncritical, so the timer
>> softirq is the remaining one that might be questionable. This can be
>> fine-tuned later IMO, e.g. by defining a softirq mask of critical
>> softirqs to block and eventually splitting up e.g. timer and tasklet
>> softirqs into critical and uncritical ones.
> 
> Well, okay, but please add an abridged version of this to the patch
> description then.

Okay.

> 
>> Scrubbing will probably pull the cache lines of the dirty pages into
>> the L1 cache of the cpu. To me this sounds problematic. In case we
>> agree scrubbing is fine because there is no risk associated with it,
>> I'm happy to add it back in.
> 
> Well, of course there's going to be a brief period of time where
> a cache line will be present in CPU internal buffers (it's not just the
> cache after all, as we've learned with XSA-297). So I can certainly
> buy that when using core granularity you don't want to scrub on
> the other thread. But what about the socket granularity case?
> Scrubbing on fully idle cores should still be fine, I would think.

I think this would depend on the reason for selecting socket scheduling.
I'd at least want to have a way to select that, as I could think of
e.g. L3-cache side channel attacks, too.

So maybe I could add a patch on top adding a sub-option to the
sched-gran parameter which would allow (or disallow?) scrubbing on idle
cores or threads.
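
Roughly what I have in mind (completely untested sketch, all names and
the option syntax invented here just to illustrate the idea):

static bool __read_mostly sched_idle_scrub = true;

/* Hypothetical parser for e.g. "sched-gran=core,no-scrub". */
static int __init parse_sched_gran(const char *str)
{
    const char *opt = strchr(str, ',');

    if ( opt )
    {
        if ( !strcmp(opt + 1, "scrub") )
            sched_idle_scrub = true;
        else if ( !strcmp(opt + 1, "no-scrub") )
            sched_idle_scrub = false;
        else
            return -EINVAL;
    }

    /* ... existing cpu/core/socket granularity parsing ... */

    return 0;
}
custom_param("sched-gran", parse_sched_gran);

The idle loop for siblings of active schedule items would then only do
scrubbing work if sched_idle_scrub is set.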

> 
>>>> With core scheduling we can be sure the other thread is active
>>>> (otherwise we would schedule the idle item), so hoping to save power
>>>> by using mwait is moot.
>>>
>>> Saving power may be indirect, by the CPU re-arranging
>>> resource assignment between threads when one goes idle.
>>> I have no idea whether they do this when entering C1, or
>>> only when entering deeper C states.
>>
>> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>>
>> "Here shared resources that were being used by the halted logical
>> processor become available to active logical processors, allowing them
>> to execute at greater efficiency."
> 
> To be honest, this is too broad/generic a statement to fully
> trust it, judging from other areas of the SDM. And then, as
> per above, what about the socket granularity case? Putting
> entirely idle cores to sleep is surely worthwhile?

Yes, I assume it is. OTOH this might affect context switches badly,
as the reaction time for the coordinated switch will rise. Maybe a
good reason for another sub-option?
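
Just as an illustration (syntax completely made up, to be bikeshedded):
something like "sched-gran=socket,no-scrub,no-deep-sleep" would let the
admin trade idle power savings against context switch latency and side
channel exposure per use case.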

>>> And anyway - I'm still none the wiser as to the v->is_urgent
>>> relationship.
>>
>> With v->is_urgent set today's idle loop will drop into default_idle().
>> I can remove this sentence in case it is just confusing.
> 
> I'd prefer if the connection would become more obvious. One
> needs to go from ->is_urgent via ->urgent_count to
> sched_has_urgent_vcpu() to find where the described
> behavior really lives.
> 
> What's worse though: This won't work as intended on AMD
> at all. I don't think it's correct to fall back to default_idle() in
> this case. Instead sched_has_urgent_vcpu() returning true
> should amount to the same effect as max_cstate being set
> to 1. There's
> (a) no reason not to use MWAIT on Intel CPUs in this case,
> if MWAIT can enter C1, and
> (b) a strong need to use MWAIT on (at least) AMD Fam17,
> or else it won't be C1 that gets entered.
> I'll see about making a patch in due course.

Thanks. Would you mind doing it in a way that the caller can specify
max_cstate? This would remove the need to call sched_has_urgent_vcpu()
deep down in the idle handling, and I could re-use it for my purpose.
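
To be concrete, what I have in mind is roughly the following calling
convention (names invented here just to illustrate the interface I'd
like, not a concrete implementation):

/* Hypothetical helper provided by the cpuidle code: go idle, but never
   enter a C-state deeper than max_cstate. */
void pm_idle_with_max_cstate(unsigned int max_cstate);

/* Caller side in the idle loop for siblings of active schedule items: */
if ( cpu_is_haltable(smp_processor_id()) )
    pm_idle_with_max_cstate(ACPI_STATE_C1);   /* no deeper than C1 */

That way neither the sibling idle loop nor the urgent vcpu case would
need any knowledge of the cpuidle driver internals.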


Juergen
