Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-20 Thread


> On Aug 20, 2020, at 3:35 PM, Vincent Guittot  
> wrote:
> 
> On Thu, 20 Aug 2020 at 02:13, benbjiang(蒋彪)  wrote:
>> 
>> 
>> 
>>> On Aug 19, 2020, at 10:55 PM, Vincent Guittot  
>>> wrote:
>>> 
>>> On Wed, 19 Aug 2020 at 16:27, benbjiang(蒋彪)  wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 19, 2020, at 7:55 PM, Dietmar Eggemann  
>>>>> wrote:
>>>>> 
>>>>> On 19/08/2020 13:05, Vincent Guittot wrote:
>>>>>> On Wed, 19 Aug 2020 at 12:46, Dietmar Eggemann 
>>>>>>  wrote:
>>>>>>> 
>>>>>>> On 17/08/2020 14:05, benbjiang(蒋彪) wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Aug 17, 2020, at 4:57 PM, Dietmar Eggemann 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> On 14/08/2020 01:55, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On Aug 13, 2020, at 2:39 AM, Dietmar Eggemann 
>>>>>>>>>>>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 12/08/2020 05:19, benbjiang(蒋彪) wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann 
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann 
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann 
>>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann 
>>>>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>>>>>>>>>>>>>> From: Jiang Biao 
>>>>>>> 
>>>>>>> [...]
>>>>>>> 
>>>>>>>>> Are you sure about this?
>>>>>>>> Yes. :)
>>>>>>>>> 
>>>>>>>>> The math is telling me for the:
>>>>>>>>> 
>>>>>>>>> idle task:  (3 / (1024 + 1024 + 3))^(-1) * 4ms = 2735ms
>>>>>>>>> 
>>>>>>>>> normal task: (1024 / (1024 + 1024 + 3))^(-1) * 4ms =8ms
>>>>>>>>> 
>>>>>>>>> (4ms - 250 Hz)
>>>>>>>> My tick is 1ms - 1000HZ, which seems reasonable for 600ms? :)
>>>>>>> 
>>>>>>> OK, I see.
>>>>>>> 
>>>>>>> But here the different sched slices (check_preempt_tick()->
>>>>>>> sched_slice()) between normal tasks and the idle task play a role to.
>>>>>>> 
>>>>>>> Normal tasks get ~3ms whereas the idle task gets <0.01ms.
>>>>>> 
>>>>>> In fact that depends on the number of CPUs on the system
>>>>>> :sysctl_sched_latency = 6ms * (1 + ilog(ncpus)) . On a 8 cores system,
>>>>>> normal task will run around 12ms in one shoot and the idle task still
>>>>>> one tick period
>>>

Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-19 Thread


> On Aug 19, 2020, at 10:55 PM, Vincent Guittot  
> wrote:
> 
> On Wed, 19 Aug 2020 at 16:27, benbjiang(蒋彪)  wrote:
>> 
>> 
>> 
>>> On Aug 19, 2020, at 7:55 PM, Dietmar Eggemann  
>>> wrote:
>>> 
>>> On 19/08/2020 13:05, Vincent Guittot wrote:
>>>> On Wed, 19 Aug 2020 at 12:46, Dietmar Eggemann  
>>>> wrote:
>>>>> 
>>>>> On 17/08/2020 14:05, benbjiang(蒋彪) wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Aug 17, 2020, at 4:57 PM, Dietmar Eggemann 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> On 14/08/2020 01:55, benbjiang(蒋彪) wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>>> On Aug 13, 2020, at 2:39 AM, Dietmar Eggemann 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> On 12/08/2020 05:19, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann 
>>>>>>>>>>>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann 
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann 
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann 
>>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>>>>>>>>>>>> From: Jiang Biao 
>>>>> 
>>>>> [...]
>>>>> 
>>>>>>> Are you sure about this?
>>>>>> Yes. :)
>>>>>>> 
>>>>>>> The math is telling me for the:
>>>>>>> 
>>>>>>> idle task:  (3 / (1024 + 1024 + 3))^(-1) * 4ms = 2735ms
>>>>>>> 
>>>>>>> normal task: (1024 / (1024 + 1024 + 3))^(-1) * 4ms =8ms
>>>>>>> 
>>>>>>> (4ms - 250 Hz)
>>>>>> My tick is 1ms - 1000HZ, which seems reasonable for 600ms? :)
>>>>> 
>>>>> OK, I see.
>>>>> 
>>>>> But here the different sched slices (check_preempt_tick()->
>>>>> sched_slice()) between normal tasks and the idle task play a role to.
>>>>> 
>>>>> Normal tasks get ~3ms whereas the idle task gets <0.01ms.
>>>> 
>>>> In fact that depends on the number of CPUs on the system
>>>> :sysctl_sched_latency = 6ms * (1 + ilog(ncpus)) . On a 8 cores system,
>>>> normal task will run around 12ms in one shoot and the idle task still
>>>> one tick period
>>> 
>>> True. This is on a single CPU.
>> Agree. :)
>> 
>>> 
>>>> Also, you can increase even more the period between 2 runs of idle
>>>> task by using cgroups and min shares value : 2
>>> 
>>> Ah yes, maybe this is what Jiang wants to do then? If his runtime does
>>> not have other requirements preventing this.
>> That could work for increasing the period between 2 runs. But could not
>> reduce the single runtime of idle task I guess, which means normal task
>> could have 1-tick schedule latency because of idle task.
> 
> Yes. An idle task will preempt an always-running task for 1 tick
> every 680ms. But you should also keep in mind that a waking normal
> task will preempt the idle task immediately, which means that it will
> not add scheduling latency to a normal task but will "steal" at most
> 0.14% of normal-task throughput (1/680)
That’s true. But in the VM case, when VMs are busy (MWAIT passthrough
or running CPU-eating workloads), the 1-tick scheduling latency can be
detected by cyclictest running in the VM.
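
For reference, a back-of-the-envelope check of the figures above. The sketch
below is plain user-space C (not kernel code) and only assumes the scenario
discussed here: two always-running SCHED_NORMAL nice-0 tasks (weight 1024
each) plus one SCHED_IDLE task (weight 3) on a single CPU with a 1ms tick.

    #include <stdio.h>

    int main(void)
    {
            double tick_ms = 1.0;                    /* 1000 Hz tick */
            double w_normal = 1024.0, w_idle = 3.0;
            double total = 2 * w_normal + w_idle;    /* 2051 */

            /* fair share of CPU owed to the SCHED_IDLE entity */
            double idle_share = w_idle / total;      /* ~0.146% */
            /* it runs one whole tick per turn, so the expected period */
            double period_ms = tick_ms / idle_share; /* ~684 ms */

            printf("idle share %.3f%%, one 1ms tick roughly every %.0f ms\n",
                   idle_share * 100.0, period_ms);
            return 0;
    }

The same arithmetic with a 4ms tick (250 Hz) gives the ~2735ms figure quoted
earlier in the thread.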

OTOH, we compensate vruntime in place_entity() to boost waking tasks
without distinguishing SCHED_IDLE tasks. Do you think it’s necessary to
do that? Like:

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime += sched_vslice(cfs_rq, se);
 
 	/* sleeps up to a single latency don't count. */
-	if (!initial) {
+	if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
 		unsigned long thresh = sysctl_sched_latency;
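
For context, the credit being skipped here is the wakeup vruntime bonus;
roughly, the full conditional would read as below with the proposed change
(a paraphrased sketch of the place_entity() sleeper-credit logic of that era,
not a verbatim copy of the tree):

        /* sleeps up to a single latency don't count. */
        if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
                unsigned long thresh = sysctl_sched_latency;

                /* be gentler to sleepers: only credit half the latency */
                if (sched_feat(GENTLE_FAIR_SLEEPERS))
                        thresh >>= 1;

                /* waking entity is placed up to 'thresh' behind min_vruntime */
                vruntime -= thresh;
        }

i.e. a waking SCHED_IDLE task would no longer get the sleeper credit, so it
becomes less likely to preempt the current normal task on wakeup.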

> 
>> OTOH, cgroups(shares) could introduce extra complexity. :)
>> 
>> I wonder if there’s any possibility to make SCHED_IDLEs’ priorities 
>> absolutely
>> lower than SCHED_NORMAL(OTHER), which means no weights/shares
>> for them, and they run only when no other task’s runnable.
>> I guess there may be priority inversion issue if we do that. But maybe we
> 
> Exactly, that's why we must ensure a minimum running time for sched_idle task

Still for the VM case, different VMs are well isolated from each other,
so the priority-inversion issue should be very rare; we’re trying to make
offline tasks absolutely harmless to online tasks. :)

Thanks a lot for your time.
Regards,
Jiang

> 
>> could avoid it by load-balance more aggressively, or it(priority inversion)
>> could be ignored in some special case.
>> 
>>> 



Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-19 Thread


> On Aug 19, 2020, at 7:55 PM, Dietmar Eggemann  
> wrote:
> 
> On 19/08/2020 13:05, Vincent Guittot wrote:
>> On Wed, 19 Aug 2020 at 12:46, Dietmar Eggemann  
>> wrote:
>>> 
>>> On 17/08/2020 14:05, benbjiang(蒋彪) wrote:
>>>> 
>>>> 
>>>>> On Aug 17, 2020, at 4:57 PM, Dietmar Eggemann  
>>>>> wrote:
>>>>> 
>>>>> On 14/08/2020 01:55, benbjiang(蒋彪) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>>> On Aug 13, 2020, at 2:39 AM, Dietmar Eggemann 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> On 12/08/2020 05:19, benbjiang(蒋彪) wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>>> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann 
>>>>>>>>>>>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann 
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann 
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>>>>>>>>>> From: Jiang Biao 
>>> 
>>> [...]
>>> 
>>>>> Are you sure about this?
>>>> Yes. :)
>>>>> 
>>>>> The math is telling me for the:
>>>>> 
>>>>> idle task:  (3 / (1024 + 1024 + 3))^(-1) * 4ms = 2735ms
>>>>> 
>>>>> normal task: (1024 / (1024 + 1024 + 3))^(-1) * 4ms =8ms
>>>>> 
>>>>> (4ms - 250 Hz)
>>>> My tick is 1ms - 1000HZ, which seems reasonable for 600ms? :)
>>> 
>>> OK, I see.
>>> 
>>> But here the different sched slices (check_preempt_tick()->
>>> sched_slice()) between normal tasks and the idle task play a role to.
>>> 
>>> Normal tasks get ~3ms whereas the idle task gets <0.01ms.
>> 
>> In fact that depends on the number of CPUs on the system:
>> sysctl_sched_latency = 6ms * (1 + ilog2(ncpus)). On an 8-core system, a
>> normal task will run around 12ms in one shot while the idle task still
>> gets one tick period
> 
> True. This is on a single CPU.
Agree. :)

> 
>> Also, you can increase even more the period between 2 runs of idle
>> task by using cgroups and min shares value : 2
> 
> Ah yes, maybe this is what Jiang wants to do then? If his runtime does
> not have other requirements preventing this.
That could work for increasing the period between two runs, but it could
not reduce the single runtime of the idle task I guess, which means a
normal task could still see 1-tick scheduling latency because of the idle task.
OTOH, cgroups (shares) could introduce extra complexity. :)

I wonder if there’s any possibility to make SCHED_IDLE tasks’ priorities
absolutely lower than SCHED_NORMAL (OTHER), which means no weights/shares
for them, and they run only when no other task is runnable.
I guess there may be a priority-inversion issue if we do that, but maybe
we could avoid it by load-balancing more aggressively, or it (priority
inversion) could be ignored in some special cases.

Thx. 
Regards,
Jiang
 
> 
> [...]



Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-19 Thread


> On Aug 19, 2020, at 6:46 PM, Dietmar Eggemann  
> wrote:
> 
> On 17/08/2020 14:05, benbjiang(蒋彪) wrote:
>> 
>> 
>>> On Aug 17, 2020, at 4:57 PM, Dietmar Eggemann  
>>> wrote:
>>> 
>>> On 14/08/2020 01:55, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>> 
>>>>> On Aug 13, 2020, at 2:39 AM, Dietmar Eggemann  
>>>>> wrote:
>>>>> 
>>>>> On 12/08/2020 05:19, benbjiang(蒋彪) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>>> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi, 
>>>>>>>>>> 
>>>>>>>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann 
>>>>>>>>>>>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann 
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>>>>>>>> From: Jiang Biao 
> 
> [...]
> 
>>> Are you sure about this?
>> Yes. :)
>>> 
>>> The math is telling me for the:
>>> 
>>> idle task:  (3 / (1024 + 1024 + 3))^(-1) * 4ms = 2735ms
>>> 
>>> normal task: (1024 / (1024 + 1024 + 3))^(-1) * 4ms =8ms
>>> 
>>> (4ms - 250 Hz)
>> My tick is 1ms - 1000HZ, which seems reasonable for 600ms? :)
> 
> OK, I see.
> 
> But here the different sched slices (check_preempt_tick()->
> sched_slice()) between normal tasks and the idle task play a role too.
> 
> Normal tasks get ~3ms whereas the idle task gets <0.01ms.
> 
> So the idle task runs every ~680ms but only for 1 tick (1ms) (4 times
> less than the normal tasks). The condition 'if (delta_exec >
> ideal_runtime)' in check_preempt_tick() is only true at the 4th tick
> when a normal task runs even though the slice is 3ms.
> 
> In the 250 Hz example the sched slice diffs are hidden behind the 4ms tick.
Exactly. The tick size decides both the single runtime and the interval
between two runs of the idle task.
That makes sense for most of the test results above.
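
To make the slice arithmetic concrete, below is a small user-space sketch
(illustrative only, not the kernel algorithm): it assumes a single CPU (so
sched_latency = 6ms) and the 1024/1024/3 weights from the test above, and
reproduces the ~3ms and <0.01ms slices.

    #include <stdio.h>

    int main(void)
    {
            double latency_ms = 6.0;  /* sysctl_sched_latency on 1 CPU */
            double tick_ms = 1.0;     /* 1000 Hz */
            double w_normal = 1024.0, w_idle = 3.0;
            double total = 2 * w_normal + w_idle;

            double slice_normal = latency_ms * w_normal / total; /* ~3.0 ms   */
            double slice_idle   = latency_ms * w_idle   / total; /* ~0.009 ms */

            printf("sched_slice: normal ~%.2f ms, idle ~%.4f ms\n",
                   slice_normal, slice_idle);
            /*
             * delta_exec is only compared against the slice at the tick, so
             * the idle entity always overruns its tiny slice by a whole tick,
             * while a normal task needs a few ticks (the 4th in the example
             * above) before 'delta_exec > ideal_runtime' fires.
             */
            printf("idle overruns its slice by ~%.0fx per tick\n",
                   tick_ms / slice_idle);
            return 0;
    }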

Thx.
Regards,
Jiang




Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-17 Thread


> On Aug 17, 2020, at 4:57 PM, Dietmar Eggemann  
> wrote:
> 
> On 14/08/2020 01:55, benbjiang(蒋彪) wrote:
>> Hi,
>> 
>>> On Aug 13, 2020, at 2:39 AM, Dietmar Eggemann  
>>> wrote:
>>> 
>>> On 12/08/2020 05:19, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>> 
>>>>> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann  
>>>>> wrote:
>>>>> 
>>>>> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>>>>>> Hi, 
>>>>>>>> 
>>>>>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann 
>>>>>>>>>>>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>>>>>> From: Jiang Biao 
> 
> [...]
> 
>>>> ** 2normal+1idle: idle preempt normal every 600+ms **
>>> 
>>> During the 3.2s the 2 SCHED_OTHER tasks run, the SCHED_IDLE task is
>>> switched in only once, after ~2.5s.
>> Use your config with loop increased from 200 to 2000, to observe longer,
>> 
>>   <...>-37620 [002] d... 47950.446191: sched_switch: 
>> prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=S ==> 
>> next_comm=task_idle-2 next_pid=37621 next_prio=120
>>   <...>-37619 [002] d... 47955.687709: sched_switch: 
>> prev_comm=task_other-0 prev_pid=37619 prev_prio=120 prev_state=R ==> 
>> next_comm=task_idle-2 next_pid=37621 next_prio=120
>> // The first preemption interval is 5.2s.
>>   <...>-37620 [002] d... 47956.375716: sched_switch: 
>> prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
>> next_comm=task_idle-2 next_pid=37621 next_prio=120
>>   <...>-37619 [002] d... 47957.060722: sched_switch: 
>> prev_comm=task_other-0 prev_pid=37619 prev_prio=120 prev_state=R ==> 
>> next_comm=task_idle-2 next_pid=37621 next_prio=120
>>   <...>-37620 [002] d... 47957.747728: sched_switch: 
>> prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
>> next_comm=task_idle-2 next_pid=37621 next_prio=120
>>  <...>-37620 [002] d... 47958.423734: sched_switch: 
>> prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
>> next_comm=task_idle-2 next_pid=37621 next_prio=120
>>   <...>-37620 [002] d... 47959.119740: sched_switch: 
>> prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
>> next_comm=task_idle-2 next_pid=37621 next_prio=120
>> // After the first preemption, the rest preemption intervals are all about 
>> 600ms+. :)
> 
> Are you sure about this?
Yes. :)
> 
> The math is telling me for the:
> 
> idle task:  (3 / (1024 + 1024 + 3))^(-1) * 4ms = 2735ms
> 
> normal task: (1024 / (1024 + 1024 + 3))^(-1) * 4ms =8ms
> 
> (4ms - 250 Hz)
My tick is 1ms (1000 Hz), which seems reasonable for the 600ms? :)

Thx.
Regards,
Jiang
> 
>>>> ** 3normal+idle: idle preempt normal every 1000+ms **
>>> 
>>> Ah, this was meant to be 3 SCHED_OTHER tasks only! To see the difference
>>> in behavior.
>> With 3 SCHED_OTHER tasks only, the SCHED_OTHER task is switched in
>> every 27ms.
> 
> normal task: (1024 / (1024 + 1024 + 1024))^(-1) * 4ms = 12ms
> 
>>>> ** 2normal(nice 19)+1idle(nice 0): idle preempt normal every 30+ms **
>>> 
>>> During the 3.2s the 2 SCHED_OTHER tasks run, the SCHED_IDLE task is
>>> switched in every ~45ms.
>> That’s as what I expected. :)
> 
> idle task:(3 / (15 + 15 + 3))^(-1) * 4ms = 44ms
> 
> normal task: (15 / (15 + 15 + 3))^(-1) * 4ms =  9ms
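
For context on where the 1024, 15 and 3 in the calculations above come from:
CFS maps nice levels to load weights through a fixed table, and SCHED_IDLE
entities get a special minimal weight. A trimmed illustration (not a verbatim
copy of the kernel table):

    /* a few representative entries of sched_prio_to_weight[] (nice -20..19) */
    static const int prio_to_weight_excerpt[] = {
            /* nice -20 */ 88761,
            /* nice   0 */  1024,   /* NICE_0_LOAD */
            /* nice  19 */    15,
    };
    #define WEIGHT_IDLEPRIO 3       /* weight used for SCHED_IDLE entities */

These weights feed both the slice calculation (sched_slice()) and the expected
interval between runs used in the math above.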



Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-14 Thread
Hi,

> On Aug 14, 2020, at 1:18 PM, Li, Aubrey  wrote:
> 
> On 2020/8/14 12:04, benbjiang(蒋彪) wrote:
>> 
>> 
>>> On Aug 14, 2020, at 9:36 AM, Li, Aubrey  wrote:
>>> 
>>> On 2020/8/14 8:26, benbjiang(蒋彪) wrote:
>>>> 
>>>> 
>>>>> On Aug 13, 2020, at 12:28 PM, Li, Aubrey  
>>>>> wrote:
>>>>> 
>>>>> On 2020/8/13 7:08, Joel Fernandes wrote:
>>>>>> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
>>>>>>> Hi Joel,
>>>>>>> 
>>>>>>> On 2020/8/10 0:44, Joel Fernandes wrote:
>>>>>>>> Hi Aubrey,
>>>>>>>> 
>>>>>>>> Apologies for replying late as I was still looking into the details.
>>>>>>>> 
>>>>>>>> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
>>>>>>>> [...]
>>>>>>>>> +/*
>>>>>>>>> + * Core scheduling policy:
>>>>>>>>> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
>>>>>>>>> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
>>>>>>>>> + * on the same core concurrently.
>>>>>>>>> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
>>>>>>>>>   thread on the same core concurrently. 
>>>>>>>>> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
>>>>>>>>> + * with idle thread on the same core.
>>>>>>>>> + */
>>>>>>>>> +enum coresched_policy {
>>>>>>>>> +   CORE_SCHED_DISABLED,
>>>>>>>>> +   CORE_SCHED_COOKIE_MATCH,
>>>>>>>>> + CORE_SCHED_COOKIE_TRUST,
>>>>>>>>> +   CORE_SCHED_COOKIE_LONELY,
>>>>>>>>> +};
>>>>>>>>> 
>>>>>>>>> We can set policy to CORE_COOKIE_TRUST of uperf cgroup and fix this 
>>>>>>>>> kind
>>>>>>>>> of performance regression. Not sure if this sounds attractive?
>>>>>>>> 
>>>>>>>> Instead of this, I think it can be something simpler IMHO:
>>>>>>>> 
>>>>>>>> 1. Consider all cookie-0 task as trusted. (Even right now, if you 
>>>>>>>> apply the
>>>>>>>> core-scheduling patchset, such tasks will share a core and sniff on 
>>>>>>>> each
>>>>>>>> other. So let us not pretend that such tasks are not trusted).
>>>>>>>> 
>>>>>>>> 2. All kernel threads and idle task would have a cookie 0 (so that 
>>>>>>>> will cover
>>>>>>>> ksoftirqd reported in your original issue).
>>>>>>>> 
>>>>>>>> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED). 
>>>>>>>> Default
>>>>>>>> enable it. Setting this option would tag all tasks that are forked 
>>>>>>>> from a
>>>>>>>> cookie-0 task with their own cookie. Later on, such tasks can be added 
>>>>>>>> to
>>>>>>>> a group. This cover's PeterZ's ask about having 'default untrusted').
>>>>>>>> (Users like ChromeOS that don't want to userspace system processes to 
>>>>>>>> be
>>>>>>>> tagged can disable this option so such tasks will be cookie-0).
>>>>>>>> 
>>>>>>>> 4. Allow prctl/cgroup interfaces to create groups of tasks and 
>>>>>>>> override the
>>>>>>>> above behaviors.
>>>>>>> 
>>>>>>> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I 
>>>>>>> set uperf's
>>>>>>> cookie to be cookie-0 via prctl?
>>>>>> 
>>>>>> Yes, but let me try to understand better. There are 2 problems here I 
>>>>>> think:
>>>>>> 
>>>>>> 1. ksoftirqd getting idled when HT is turned on, because uperf is 
>>>>>> sharing a
>>>>>> core with it: This should not be any worse than SMT OFF, because even 
>>>>>> SM

Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-13 Thread


> On Aug 14, 2020, at 9:36 AM, Li, Aubrey  wrote:
> 
> On 2020/8/14 8:26, benbjiang(蒋彪) wrote:
>> 
>> 
>>> On Aug 13, 2020, at 12:28 PM, Li, Aubrey  wrote:
>>> 
>>> On 2020/8/13 7:08, Joel Fernandes wrote:
>>>> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
>>>>> Hi Joel,
>>>>> 
>>>>> On 2020/8/10 0:44, Joel Fernandes wrote:
>>>>>> Hi Aubrey,
>>>>>> 
>>>>>> Apologies for replying late as I was still looking into the details.
>>>>>> 
>>>>>> On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
>>>>>> [...]
>>>>>>> +/*
>>>>>>> + * Core scheduling policy:
>>>>>>> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
>>>>>>> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
>>>>>>> + * on the same core concurrently.
>>>>>>> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
>>>>>>> thread on the same core concurrently. 
>>>>>>> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
>>>>>>> + * with idle thread on the same core.
>>>>>>> + */
>>>>>>> +enum coresched_policy {
>>>>>>> +   CORE_SCHED_DISABLED,
>>>>>>> +   CORE_SCHED_COOKIE_MATCH,
>>>>>>> +   CORE_SCHED_COOKIE_TRUST,
>>>>>>> +   CORE_SCHED_COOKIE_LONELY,
>>>>>>> +};
>>>>>>> 
>>>>>>> We can set policy to CORE_COOKIE_TRUST of uperf cgroup and fix this kind
>>>>>>> of performance regression. Not sure if this sounds attractive?
>>>>>> 
>>>>>> Instead of this, I think it can be something simpler IMHO:
>>>>>> 
>>>>>> 1. Consider all cookie-0 task as trusted. (Even right now, if you apply 
>>>>>> the
>>>>>>  core-scheduling patchset, such tasks will share a core and sniff on each
>>>>>>  other. So let us not pretend that such tasks are not trusted).
>>>>>> 
>>>>>> 2. All kernel threads and idle task would have a cookie 0 (so that will 
>>>>>> cover
>>>>>>  ksoftirqd reported in your original issue).
>>>>>> 
>>>>>> 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED). 
>>>>>> Default
>>>>>>  enable it. Setting this option would tag all tasks that are forked from 
>>>>>> a
>>>>>>  cookie-0 task with their own cookie. Later on, such tasks can be added 
>>>>>> to
>>>>>>  a group. This cover's PeterZ's ask about having 'default untrusted').
>>>>>>  (Users like ChromeOS that don't want to userspace system processes to be
>>>>>>  tagged can disable this option so such tasks will be cookie-0).
>>>>>> 
>>>>>> 4. Allow prctl/cgroup interfaces to create groups of tasks and override 
>>>>>> the
>>>>>>  above behaviors.
>>>>> 
>>>>> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set 
>>>>> uperf's
>>>>> cookie to be cookie-0 via prctl?
>>>> 
>>>> Yes, but let me try to understand better. There are 2 problems here I 
>>>> think:
>>>> 
>>>> 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
>>>> core with it: This should not be any worse than SMT OFF, because even SMT 
>>>> OFF
>>>> would also reduce ksoftirqd's CPU time just core sched is doing. Sure
>>>> core-scheduling adds some overhead with IPIs but such a huge drop of perf 
>>>> is
>>>> strange. Peter any thoughts on that?
>>>> 
>>>> 2. Interface: To solve the performance problem, you are saying you want 
>>>> uperf
>>>> to share a core with ksoftirqd so that it is not forced into idle.  Why not
>>>> just keep uperf out of the cgroup?
>>> 
>>> I guess this is unacceptable for who runs their apps in container and vm.
>> IMHO,  just as Joel proposed, 
>> 1. Consider all cookie-0 task as trusted.
>> 2. All kernel threads and idle task would have a cookie 0 
>> In that way, all tasks with cookies(including uperf in a cgroup) could run
>> concurrently with kernel threads.
>> That could be a good solution for the issue. :)
> 
> From uperf's point of view, it can trust cookie-0 (I assume we still need
> some modifications to change cookie-match to cookie-compatible to allow
> ZERO and NONZERO to run together).
> 
> But from the kernel thread's point of view, it can NOT trust uperf, unless
> we set uperf's cookie to 0.
That’s right. :)
Could we set the cookie of the cgroup where uperf resides to 0?

Thx.
Regards,
Jiang

> 
> Thanks,
> -Aubrey
> 



Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-13 Thread


> On Aug 13, 2020, at 12:28 PM, Li, Aubrey  wrote:
> 
> On 2020/8/13 7:08, Joel Fernandes wrote:
>> On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
>>> Hi Joel,
>>> 
>>> On 2020/8/10 0:44, Joel Fernandes wrote:
 Hi Aubrey,
 
 Apologies for replying late as I was still looking into the details.
 
 On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
 [...]
> +/*
> + * Core scheduling policy:
> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
> + * on the same core concurrently.
> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
>   thread on the same core concurrently. 
> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
> + * with idle thread on the same core.
> + */
> +enum coresched_policy {
> +   CORE_SCHED_DISABLED,
> +   CORE_SCHED_COOKIE_MATCH,
> + CORE_SCHED_COOKIE_TRUST,
> +   CORE_SCHED_COOKIE_LONELY,
> +};
> 
> We can set policy to CORE_COOKIE_TRUST of uperf cgroup and fix this kind
> of performance regression. Not sure if this sounds attractive?
 
 Instead of this, I think it can be something simpler IMHO:
 
 1. Consider all cookie-0 task as trusted. (Even right now, if you apply the
   core-scheduling patchset, such tasks will share a core and sniff on each
   other. So let us not pretend that such tasks are not trusted).
 
 2. All kernel threads and idle task would have a cookie 0 (so that will 
 cover
   ksoftirqd reported in your original issue).
 
 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED). Default
   enable it. Setting this option would tag all tasks that are forked from a
   cookie-0 task with their own cookie. Later on, such tasks can be added to
   a group. This cover's PeterZ's ask about having 'default untrusted').
   (Users like ChromeOS that don't want to userspace system processes to be
   tagged can disable this option so such tasks will be cookie-0).
 
 4. Allow prctl/cgroup interfaces to create groups of tasks and override the
   above behaviors.
>>> 
>>> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set 
>>> uperf's
>>> cookie to be cookie-0 via prctl?
>> 
>> Yes, but let me try to understand better. There are 2 problems here I think:
>> 
>> 1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
>> core with it: This should not be any worse than SMT OFF, because even SMT OFF
>> would also reduce ksoftirqd's CPU time just core sched is doing. Sure
>> core-scheduling adds some overhead with IPIs but such a huge drop of perf is
>> strange. Peter any thoughts on that?
>> 
>> 2. Interface: To solve the performance problem, you are saying you want uperf
>> to share a core with ksoftirqd so that it is not forced into idle.  Why not
>> just keep uperf out of the cgroup?
> 
> I guess this is unacceptable for who runs their apps in container and vm.
IMHO, just as Joel proposed:
1. Consider all cookie-0 tasks as trusted.
2. All kernel threads and the idle task would have cookie 0.
In that way, all tasks with cookies (including uperf in a cgroup) could run
concurrently with kernel threads.
That could be a good solution for the issue. :)

If CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED is enabled, maybe we
should set ksoftirqd’s cookie to cookie-0 to solve the issue.

Thx.
Regards,
Jiang
> 
> Thanks,
> -Aubrey
> 
>> Then it will have cookie 0 and be able to
>> share core with kernel threads. About user-user isolation that you need, if
>> you tag any "untrusted" threads by adding it to CGroup, then there will
>> automatically isolated from uperf while allowing uperf to share CPU with
>> kernel threads.
>> 
>> Please let me know your thoughts and thanks,
>> 
>> - Joel
>> 
>>> 
>>> Thanks,
>>> -Aubrey
 
 5. Document everything clearly so the semantics are clear both to the
   developers of core scheduling and to system administrators.
 
 Note that, with the concept of "system trusted cookie", we can also do
 optimizations like:
 1. Disable STIBP when switching into trusted tasks.
 2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching 
 into
   trusted tasks.
 
 At least #1 seems to be biting enabling HT on ChromeOS right now, and one
 other engineer requested I do something like #2 already.
 
 Once we get full-syscall isolation working, threads belonging to a process
 can also share a core so those can just share a core with the task-group
 leader.
 
>> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?
> 
> This is a good question, from the data we measured by uperf,
> SMT+core-scheduling is 28.2% worse than no-SMT, :(
 
 This is worrying for 

Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-13 Thread
Hi,

> On Aug 13, 2020, at 2:39 AM, Dietmar Eggemann  
> wrote:
> 
> On 12/08/2020 05:19, benbjiang(蒋彪) wrote:
>> Hi,
>> 
>>> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann  
>>> wrote:
>>> 
>>> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>> 
>>>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann  
>>>>> wrote:
>>>>> 
>>>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>>>> Hi, 
>>>>>> 
>>>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>>>> From: Jiang Biao 
> 
> [...]
> 
>>> Trace a run of 2 SCHED_OTHER (nice 0) tasks and 1 SCHED_IDLE task on a
>>> single CPU and trace_printk the conditions  'if (delta < 0)' and ' if
>>> (delta > ideal_runtime)' in check_preempt_tick().
>>> 
>>> Then do the same with 3 SCHED_OTHER (nice 0) tasks. You can also change
>>> the niceness of the 2 SCHED_OTHER task to 19 to see some differences in
>>> the kernelshark's task layout.
>>> 
>>> rt-app (https://github.com/scheduler-tools/rt-app) is a nice tool to
>>> craft those artificial use cases.
>> With rt-app tool, sched_switch traced by ftrace, the result is as what I 
>> expected,
> 
> I use: 
> 
> {
>"tasks" : {
>"task_other" : {
>"instance" : 2,
>"loop" : 200,
>"policy" : "SCHED_OTHER",
>"run" : 8000,
>"timer" : { "ref" : "unique1" , "period" : 16000, 
> "mode" : "absolute" },
>"priority" : 0
>},
>"task_idle" : {
>"instance" : 1,
>"loop" : 200,
>"policy" : "SCHED_IDLE",
>"run" : 8000,
>"timer" : { "ref" : "unique2" , "period" : 16000, 
> "mode" : "absolute" }
>}
>},
>"global" : {
>"calibration" : 243, <-- Has to be calibrated against the CPU 
> you run on !!!
>"default_policy" : "SCHED_OTHER",
>"duration" : -1
>}
> }
> 
> to have 2 (periodic) SCHED_OTHER and 1 SCHED_IDLE task. 
> 
>> ** 2normal+1idle: idle preempt normal every 600+ms **
> 
> During the 3.2s the 2 SCHED_OTHER tasks run, the SCHED_IDLE task is
> switched in only once, after ~2.5s.
Using your config with loop increased from 200 to 2000 to observe longer:

   <...>-37620 [002] d... 47950.446191: sched_switch: 
prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=S ==> 
next_comm=task_idle-2 next_pid=37621 next_prio=120
   <...>-37619 [002] d... 47955.687709: sched_switch: 
prev_comm=task_other-0 prev_pid=37619 prev_prio=120 prev_state=R ==> 
next_comm=task_idle-2 next_pid=37621 next_prio=120
// The first preemption interval is 5.2s.
   <...>-37620 [002] d... 47956.375716: sched_switch: 
prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
next_comm=task_idle-2 next_pid=37621 next_prio=120
   <...>-37619 [002] d... 47957.060722: sched_switch: 
prev_comm=task_other-0 prev_pid=37619 prev_prio=120 prev_state=R ==> 
next_comm=task_idle-2 next_pid=37621 next_prio=120
   <...>-37620 [002] d... 47957.747728: sched_switch: 
prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
next_comm=task_idle-2 next_pid=37621 next_prio=120
  <...>-37620 [002] d... 47958.423734: sched_switch: 
prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
next_comm=task_idle-2 next_pid=37621 next_prio=120
   <...>-37620 [002] d... 47959.119740: sched_switch: 
prev_comm=task_other-1 prev_pid=37620 prev_prio=120 prev_state=R ==> 
next_comm=task_idle-2 next_pid=37621 next_prio=120
// After the first preemption, the remaining preemption intervals are all
about 600ms+. :)

> 
>> ** 3normal+idle: idle preempt normal every 1000+ms **
> 
> Ah, this was meant to be 3 SCHED_OTHER tasks only! To see the difference
> in behavior.
With 3 SCHED_OTHER tasks only, the SCHED_OTHER task is switched in
every 27ms.

> 
>> ** 2normal(nice 19)+1idle(nice 0): idle preempt normal every 30+ms **
> 
> During the 3.2s the 2 SCHED_OTHER tasks run, the SCHED_IDLE task is
> switched in every ~45ms.
That’s what I expected. :)

Thx.
Regards,
Jiang
> 
> [...]



Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-11 Thread
Hi,

> On Aug 11, 2020, at 11:54 PM, Dietmar Eggemann  
> wrote:
> 
> On 11/08/2020 02:41, benbjiang(蒋彪) wrote:
>> Hi,
>> 
>>> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann  
>>> wrote:
>>> 
>>> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>>>> Hi, 
>>>> 
>>>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann  
>>>>> wrote:
>>>>> 
>>>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>>>> From: Jiang Biao 
> 
> [...]
> 
>>> Because of this very small weight (weight=3), compared to a SCHED_NORMAL
>>> nice 0 task (weight=1024), a SCHED_IDLE task is penalized by a huge
>>> se->vruntime value (1024/3 higher than for a SCHED_NORMAL nice 0 task).
>>> This should make sure it doesn't tick preempt a SCHED_NORMAL nice 0 task.
>> Could you please explain how the huge penalization of vruntime(1024/3) could
>> make sure SCHED_IDLE not tick preempting SCHED_NORMAL nice 0 task?
>> 
>> Thanks a lot.
> 
> Trace a run of 2 SCHED_OTHER (nice 0) tasks and 1 SCHED_IDLE task on a
> single CPU and trace_printk the conditions  'if (delta < 0)' and ' if
> (delta > ideal_runtime)' in check_preempt_tick().
> 
> Then do the same with 3 SCHED_OTHER (nice 0) tasks. You can also change
> the niceness of the 2 SCHED_OTHER task to 19 to see some differences in
> the kernelshark's task layout.
> 
> rt-app (https://github.com/scheduler-tools/rt-app) is a nice tool to
> craft those artificial use cases.
With the rt-app tool and sched_switch traced by ftrace, the result is as I
expected:

** 1normal+1idle: idle preempt normal every 200ms **
   <...>-92016 [002] d... 2398066.902477: sched_switch: 
prev_comm=normal0-0 prev_pid=92016 prev_prio=120 prev_state=S ==> 
next_comm=idle0-0 next_pid=91814 next_prio=120
   <...>-91814 [002] d... 2398066.902527: sched_switch: 
prev_comm=idle0-0 prev_pid=91814 prev_prio=120 prev_state=R ==> 
next_comm=normal0-0 next_pid=92016 next_prio=120
   <...>-92016 [002] d... 2398066.922472: sched_switch: 
prev_comm=normal0-0 prev_pid=92016 prev_prio=120 prev_state=S ==> 
next_comm=idle0-0 next_pid=91814 next_prio=120
   <...>-91814 [002] d... 2398066.922522: sched_switch: 
prev_comm=idle0-0 prev_pid=91814 prev_prio=120 prev_state=R ==> 
next_comm=normal0-0 next_pid=92016 next_prio=120
   <...>-92016 [002] d... 2398066.942292: sched_switch: 
prev_comm=normal0-0 prev_pid=92016 prev_prio=120 prev_state=S ==> 
next_comm=idle0-0 next_pid=91814 next_prio=120
   <...>-91814 [002] d... 2398066.942343: sched_switch: 
prev_comm=idle0-0 prev_pid=91814 prev_prio=120 prev_state=R ==> 
next_comm=normal0-0 next_pid=92016 next_prio=120
   <...>-92016 [002] d... 2398066.962331: sched_switch: 
prev_comm=normal0-0 prev_pid=92016 prev_prio=120 prev_state=S ==> 
next_comm=idle0-0 next_pid=91814 next_prio=120

** 2normal+1idle: idle preempt normal every 600+ms **
<...>-49009 [002] d... 2400562.746640: sched_switch: 
prev_comm=normal0-0 prev_pid=49009 prev_prio=120 prev_state=R ==> 
next_comm=idle0-0 next_pid=187466 next_prio=120
   <...>-187466 [002] d... 2400562.747502: sched_switch: 
prev_comm=idle0-0 prev_pid=187466 prev_prio=120 prev_state=S ==> 
next_comm=normal1-0 next_pid=198658 next_prio=120
   <...>-198658 [002] d... 2400563.335262: sched_switch: 
prev_comm=normal1-0 prev_pid=198658 prev_prio=120 prev_state=R ==> 
next_comm=idle0-0 next_pid=187466 next_prio=120
   <...>-187466 [002] d... 2400563.336258: sched_switch: 
prev_comm=idle0-0 prev_pid=187466 prev_prio=120 prev_state=R ==> 
next_comm=normal0-0 next_pid=49009 next_prio=120
   <...>-198658 [002] d... 2400564.017663: sched_switch: 
prev_comm=normal1-0 prev_pid=198658 prev_prio=120 prev_state=R ==> 
next_comm=idle0-0 next_pid=187466 next_prio=120
   <...>-187466 [002] d... 2400564.018661: sched_switch: 
prev_comm=idle0-0 prev_pid=187466 prev_prio=120 prev_state=R ==> 
next_comm=normal0-0 next_pid=49009 next_prio=120
   <...>-198658 [002] d... 2400564.701063: sched_switch: 
prev_comm=normal1-0 prev_pid=198658 prev_prio=120 prev_state=R ==> 
next_comm=idle0-0 next_pid=187466 next_prio=120

** 3normal+idle: idle preempt normal every 1000+ms **
   <...>-198658 [002] d... 2400415.780701: sched_switch: 
prev_comm=normal1-0 prev_pid=198658 prev_prio=120 prev_state=R ==> 
next_comm=idle0-0 next_pid

Re: [PATCH] sched/fair: Optimize dequeue_task_fair()(Internet mail)

2020-08-11 Thread
Hi,

> On Aug 12, 2020, at 12:55 AM, Dietmar Eggemann  
> wrote:
> 
> On 11/08/2020 10:43, Jiang Biao wrote:
>> Similar optimization as what has been done in commit,
>> 7d148be69e3a(sched/fair: Optimize enqueue_task_fair())
>> 
>> dequeue_task_fair jumps to dequeue_throttle label when cfs_rq_of(se) is
>> throttled which means that se can't be NULL. We can move the label after
>> the if (!se) statement and remove the if(!se) statment as se is always
>> NULL when reaching this point.
>> 
>> Besides, trying to keep the same pattern with enqueue_task_fair can make
>> it more readable.
>> 
>> Signed-off-by: Jiang Biao 
>> ---
>> kernel/sched/fair.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..cbbeafdfa8b7 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5618,10 +5618,10 @@ static void dequeue_task_fair(struct rq *rq, struct 
>> task_struct *p, int flags)
>> 
>>  }
>> 
>> -dequeue_throttle:
>> -if (!se)
>> -sub_nr_running(rq, 1);
>> +/* At this point se is NULL and we are at root level*/
>> +sub_nr_running(rq, 1);
>> 
>> +dequeue_throttle:
>>  /* balance early to pull high priority tasks */
>>  if (unlikely(!was_sched_idle && sched_idle_rq(rq)))
>>  rq->next_balance = jiffies;
> 
> There is already a similar patch in master.
> 
> 423d02e1463b - sched/fair: Optimize dequeue_task_fair() (2020-06-25 Peng
> Wang)
Indeed, my local repo was outdated; sorry for the noise. :)

Thx.
Regards,
Jiang



Re: [PATCH RFC v2] sched/fair: simplify the work when reweighting entity(Internet mail)

2020-08-11 Thread
Hi,

> On Aug 11, 2020, at 5:50 PM, Dietmar Eggemann  
> wrote:
> 
> On 06/08/2020 18:14, Jiang Biao wrote:
>> From: Jiang Biao 
>> 
>> If a se is on_rq when reweighting entity, all we need should be
>> updating the load of cfs_rq, other dequeue/enqueue work could be
>> redundant, such as,
>> * nr_running--/nr_running++
>> 
>> Even though the following dequeue/enqueue path would never be reached
>> * account_numa_dequeue/account_numa_enqueue
>> * list_del/list_add from/into cfs_tasks
>> but it could be a little confusing.
>> 
>> Simplifying the logic could help reduce a little overhead on the hot
>> path, and make it cleaner and more readable.
> 
> LGTM. I guess you have to resend it w/o the RFC tag.
Will do soon.
> 
> Maybe you can rework the patch header a little?
> 
> Something like this:
> 
> The code in reweight_entity() can be simplified.
> 
> For a sched entity on the rq, the entity accounting can be replaced by
> cfs_rq instantaneous load updates currently called from within the
> entity accounting.
> 
> Even though an entity on the rq can't represent a task in
> reweight_entity() (a task is always dequeued before calling this
> function) and so the numa task accounting and the rq->cfs_tasks list
> management of the entity accounting are never called, the redundant
> cfs_rq->nr_running decrement/increment will be avoided.
That’s much better. I’ll pick up the header if you don’t mind. :)

Thanks a lot for your kind reply and detailed explanation.

Regards,
Jiang
> 
>> 
>> Signed-off-by: Jiang Biao 
>> ---
>> v2<-v1: 
>> - Amend the commit log.
>> 
>> kernel/sched/fair.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..18a8fc7bd0de 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3086,7 +3086,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, 
>> struct sched_entity *se,
>>  /* commit outstanding execution time */
>>  if (cfs_rq->curr == se)
>>  update_curr(cfs_rq);
>> -account_entity_dequeue(cfs_rq, se);
>> +update_load_sub(&cfs_rq->load, se->load.weight);
>>  }
>>  dequeue_load_avg(cfs_rq, se);
>> 
>> @@ -3102,7 +3102,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, 
>> struct sched_entity *se,
>> 
>>  enqueue_load_avg(cfs_rq, se);
>>  if (se->on_rq)
>> -account_entity_enqueue(cfs_rq, se);
>> +update_load_add(&cfs_rq->load, se->load.weight);
>> 
>> }
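
For reference, a paraphrased sketch of what account_entity_enqueue() did at
the time (trimmed; details may differ slightly from the exact tree), showing
why only the load update matters when the entity is already on the rq:

    static void
    account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
            /* the only work reweight_entity() actually needs */
            update_load_add(&cfs_rq->load, se->load.weight);
    #ifdef CONFIG_SMP
            /* never true here: a task is always dequeued before reweighting */
            if (entity_is_task(se)) {
                    struct rq *rq = rq_of(cfs_rq);

                    account_numa_enqueue(rq, task_of(se));
                    list_add(&se->group_node, &rq->cfs_tasks);
            }
    #endif
            cfs_rq->nr_running++;   /* the redundant ++/-- the patch avoids */
    }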



Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-10 Thread
Hi,

> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann  
> wrote:
> 
> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>> Hi, 
>> 
>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann  
>>> wrote:
>>> 
>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>> 
>>>> 
>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann  
>>>>> wrote:
>>>>> 
>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>> From: Jiang Biao 
> 
> [...]
> 
>>> How would you deal with se's representing taskgroups which contain
>>> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
>> I’m not sure I get the point. :) How about the following patch,
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..8715f03ed6d7 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct 
>> sched_entity *se)
>>        list_add(&se->group_node, &rq->cfs_tasks);
>>}
>> #endif
>> +   if (task_has_idle_policy(task_of(se)))
>> +   cfs_rq->idle_nr_running++;
>> +
>>cfs_rq->nr_running++;
>> }
>> 
>> @@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct 
>> sched_entity *se)
>>        list_del_init(&se->group_node);
>>}
>> #endif
>> +   if (task_has_idle_policy(task_of(se)))
>> +   cfs_rq->idle_nr_running--;
>> +
>>cfs_rq->nr_running--;
>> }
>> 
>> @@ -4527,7 +4533,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity 
>> *curr, int queued)
>>return;
>> #endif
>> 
>> -   if (cfs_rq->nr_running > 1)
>> +   if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
>> +       cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > cfs_rq->idle_nr_running + 1)
>>check_preempt_tick(cfs_rq, curr);
>> }
>> 
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 877fb08eb1b0..401090393e09 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -500,6 +500,7 @@ struct cfs_bandwidth { };
>> struct cfs_rq {
>>      struct load_weight      load;
>>      unsigned int            nr_running;
>> +    unsigned int            idle_nr_running;
>>      unsigned int            h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
>>      unsigned int            idle_h_nr_running; /* SCHED_IDLE */
> 
>        /
>      / | \
>     A  n0 i0
>    / \
>   n1  i1
> 
> I don't think this will work. E.g. the patch would prevent tick
> preemption between 'A' and 'n0' on '/' as well
> 
> (3 > 1 + 1) && (4 - 2 > 1 + 1)
> 
> You also have to make sure that a SCHED_IDLE task can tick preempt
> another SCHED_IDLE task.

That’s right. :)

> 
>>>> I’m not sure if it’s ok to do that, because the IDLE class seems not to be 
>>>> so
>>>> pure that could tolerate starving.
>>> 
>>> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
>>> (policy)?
>> The case is that we need tasks(low priority, called offline tasks) to 
>> utilize the
>> spare cpu left by CFS SCHED_NORMAL tasks(called online tasks) without
>> interfering the online tasks. 
>> Offline tasks only run when there’s no runnable online tasks, and offline 
>> tasks
>> never preempt online tasks.
>> The SCHED_IDLE policy seems not to be abled to be qualified for that 
>> requirement,
>> because it has a weight(3), even though it’s small, but it can still preempt 
>> online
>> tasks considering the fairness. In that way, offline tasks of SCHED_IDLE 
>> policy
>> could interfere the online tasks.
> 
> Because of this very small weight (weight=3), compared to a SCHED_NORMAL
> nice 0 task (weight=1024), a SCHED_IDLE task is penalized by a huge
> se->vruntime value (1024/3 higher than for a SCHED_NORMAL nice 0 task).
> This should make sure it doesn't tick preempt a SCHED_NORMAL nice 0 task.
Could you please explain how the huge vruntime penalization (1024/3) could
make sure a SCHED_IDLE task does not tick-preempt a SCHED_NORMAL nice 0 task?
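
For reference, the penalization comes from the vruntime scaling itself: CFS
advances vruntime by roughly delta_exec * NICE_0_LOAD / se->load.weight
(calc_delta_fair()), so a weight-3 entity ages ~341x faster in virtual time
than a weight-1024 one. A minimal user-space sketch of that arithmetic
(illustrative only):

    #include <stdio.h>

    int main(void)
    {
            double nice0_load = 1024.0;
            double w_idle = 3.0, w_normal = 1024.0;
            double delta_exec_ms = 1.0;   /* one 1 ms tick of real runtime */

            /* vruntime advance ~= delta_exec * NICE_0_LOAD / weight */
            double v_idle   = delta_exec_ms * nice0_load / w_idle;   /* ~341 ms */
            double v_normal = delta_exec_ms * nice0_load / w_normal; /*    1 ms */

            printf("1 ms of runtime advances vruntime by: idle ~%.0f ms, normal %.0f ms\n",
                   v_idle, v_normal);
            /*
             * After a single tick the SCHED_IDLE entity's vruntime is far
             * ahead, so it is neither picked nor able to tick-preempt again
             * until the normal task has caught up in virtual time.
             */
            return 0;
    }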

Thanks a lot.

Regards,
Jiang

> 
> It's different when the SCHED_NORMAL task has nice 19 (weight=15) but
> that's part of the CFS design.
> 
>> On the other hand, idle_sched_class seems not to be qualified either. It’s 
>> too
>> simple and only used for per-cpu idle task currently.
> 
> Yeah, leave this for the rq->idle task (swapper/X).
Got it.

> 
>>>> We need an absolutely low priority class that could tolerate starving, 
>>>> which
>>>> could be used to co-locate offline tasks. But IDLE class seems to be not
>>>> *low* enough, if considering the fairness of CFS, and IDLE class still has 
>>>> a
>>>> weight.
> 
> Understood. But this (tick) preemption should happen extremely rarely,
> especially if you have SCHED_NORMAL nice 0 tasks, right?



Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-06 Thread
Hi, 

> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann  wrote:
> 
> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>> 
>> 
>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann  
>>> wrote:
>>> 
>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>> From: Jiang Biao 
>>>> 
>>>> No need to preempt when there are only one runable CFS task with
>>>> other IDLE tasks on runqueue. The only one CFS task would always
>>>> be picked in that case.
>>>> 
>>>> Signed-off-by: Jiang Biao 
>>>> ---
>>>> kernel/sched/fair.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 04fa8dbcfa4d..8fb80636b010 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct 
>>>> sched_entity *curr, int queued)
>>>>return;
>>>> #endif
>>>> 
>>>> -  if (cfs_rq->nr_running > 1)
>>>> +  if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
>>> 
>>> cfs_rq is a pointer.
>> It is. Sorry about that. :)
>> 
>>> 
>>>>check_preempt_tick(cfs_rq, curr);
>>>> }
>>> 
>>> You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!
>>> 
>>> There is a difference between cfs_rq->h_nr_running and
>>> cfs_rq->nr_running. The '_h_' stands for hierarchical.
>>> 
>>> The former gives you hierarchical task accounting whereas the latter is
>>> the number of sched entities (representing tasks or taskgroups) enqueued
>>> in cfs_rq.
>>> 
>>> In entity_tick(), cfs_rq->nr_running has to be used for the condition to
>>> call check_preempt_tick(). We want to check if curr has to be preempted
>>> by __pick_first_entity(cfs_rq) on this cfs_rq.
>>> 
>>> entity_tick() is called for each sched entity (and so for each
>>> cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
>>> taskgroup /A/B : se(p) -> se(A/B) -> se(A)).
>> That’s true. I was thinking adding a new cfs_rq->idle_nr_running member to
>> track the per cfs_rq's IDLE task number, and reducing preemption here based
>> on that. 
> 
> How would you deal with se's representing taskgroups which contain
> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
I’m not sure I get the point. :) How about the following patch,

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04fa8dbcfa4d..8715f03ed6d7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
        list_add(&se->group_node, &rq->cfs_tasks);
}
 #endif
+   if (task_has_idle_policy(task_of(se)))
+   cfs_rq->idle_nr_running++;
+
cfs_rq->nr_running++;
 }

@@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
        list_del_init(&se->group_node);
}
 #endif
+   if (task_has_idle_policy(task_of(se)))
+   cfs_rq->idle_nr_running--;
+
cfs_rq->nr_running--;
 }

@@ -4527,7 +4533,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity 
*curr, int queued)
return;
 #endif

-   if (cfs_rq->nr_running > 1)
+   if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
+   cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > 
cfs_rq->idle_nr_running + 1)
check_preempt_tick(cfs_rq, curr);
 }

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 877fb08eb1b0..401090393e09 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -500,6 +500,7 @@ struct cfs_bandwidth { };
 struct cfs_rq {
        struct load_weight      load;
        unsigned int            nr_running;
+       unsigned int            idle_nr_running;
        unsigned int            h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
        unsigned int            idle_h_nr_running; /* SCHED_IDLE */
> 
>> I’m not sure if it’s ok to do that, because the IDLE class seems not to be so
>> pure that could tolerate starving.
> 
> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
> (policy)?
The case is that we need low-priority tasks (called offline tasks) to utilize
the spare CPU left by CFS SCHED_NORMAL tasks (called online tasks) without
interfering with the online tasks.
Offline tasks only run when there are no runnable online tasks, and offline
tasks never preempt online tasks.

Re: [PATCH RFC] sched/fair: simplfy the work when reweighting entity

2020-08-05 Thread


> On Aug 6, 2020, at 12:21 AM, Dietmar Eggemann  
> wrote:
> 
> On 04/08/2020 09:12, Jiang Biao wrote:
>> If a se is on_rq when reweighting entity, all we need should be
>> updating the load of cfs_rq, other dequeue/enqueue works could be
>> redundant, such as,
>> * account_numa_dequeue/account_numa_enqueue
>> * list_del/list_add from/into cfs_tasks
>> * nr_running--/nr_running++
> 
> I think this could make sense. Have you spotted a code path where this
> gives you a change?
> 
> I guess only for a task on the rq, so: entity_is_task(se) && se->on_rq
Yes, you're right. No other code path I spotted except what you list below.

> 
>> Just simplify the work. Could be helpful for the hot path.
> 
> IMHO hotpath is update_cfs_group() -> reweight_entity() but this is only
> called for '!entity_is_task(se)'.
> 
> See
> 
> 3290 if (!gcfs_rq)
> 3291 return;
> 
> in update_cfs_group().
Yes, it is.
But the *nr_running--/nr_running++* work is still redundant for the
'!entity_is_task(se)' hot path. :)
Besides, I guess we could simplify the logic and make it cleaner and
more readable with this patch.
If it makes sense to you, would you mind if I resend the patch
with the commit log amended?

> 
> The 'entity_is_task(se)' case is
> 
> set_load_weight(struct task_struct *p, ...) -> reweight_task(p, ...) ->
> reweight_entity(..., >se, ...)
> 
> but here !se->on_rq.
Yes, indeed.

Thanks a lot for your comments.
Regards,
Jiang

> 
>> Signed-off-by: Jiang Biao 
>> ---
>> kernel/sched/fair.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..18a8fc7bd0de 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3086,7 +3086,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, 
>> struct sched_entity *se,
>>  /* commit outstanding execution time */
>>  if (cfs_rq->curr == se)
>>  update_curr(cfs_rq);
>> -account_entity_dequeue(cfs_rq, se);
>> +update_load_sub(&cfs_rq->load, se->load.weight);
>>  }
>>  dequeue_load_avg(cfs_rq, se);
>> 
>> @@ -3102,7 +3102,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, 
>> struct sched_entity *se,
>> 
>>  enqueue_load_avg(cfs_rq, se);
>>  if (se->on_rq)
>> -account_entity_enqueue(cfs_rq, se);
>> +update_load_add(&cfs_rq->load, se->load.weight);
>> 
>> }
> 



Re: [RFC PATCH 00/16] Core scheduling v6(Internet mail)

2020-08-05 Thread
Hi,

> On Aug 5, 2020, at 11:57 AM, Li, Aubrey  wrote:
> 
> On 2020/8/4 0:53, Joel Fernandes wrote:
>> Hi Aubrey,
>> 
>> On Mon, Aug 3, 2020 at 4:23 AM Li, Aubrey  wrote:
>>> 
>>> On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
 Sixth iteration of the Core-Scheduling feature.
 
 Core scheduling is a feature that allows only trusted tasks to run
 concurrently on cpus sharing compute resources (eg: hyperthreads on a
 core). The goal is to mitigate the core-level side-channel attacks
 without requiring to disable SMT (which has a significant impact on
 performance in some situations). Core scheduling (as of v6) mitigates
 user-space to user-space attacks and user to kernel attack when one of
 the siblings enters the kernel via interrupts. It is still possible to
 have a task attack the sibling thread when it enters the kernel via
 syscalls.
 
 By default, the feature doesn't change any of the current scheduler
 behavior. The user decides which tasks can run simultaneously on the
 same core (for now by having them in the same tagged cgroup). When a
 tag is enabled in a cgroup and a task from that cgroup is running on a
 hardware thread, the scheduler ensures that only idle or trusted tasks
 run on the other sibling(s). Besides security concerns, this feature
 can also be beneficial for RT and performance applications where we
 want to control how tasks make use of SMT dynamically.
 
 This iteration is mostly a cleanup of v5 except for a major feature of
 pausing sibling when a cpu enters kernel via nmi/irq/softirq. Also
 introducing documentation and includes minor crash fixes.
 
 One major cleanup was removing the hotplug support and related code.
 The hotplug related crashes were not documented and the fixes piled up
 over time leading to complex code. We were not able to reproduce the
 crashes in the limited testing done. But if they are reroducable, we
 don't want to hide them. We should document them and design better
 fixes if any.
 
 In terms of performance, the results in this release are similar to
 v5. On a x86 system with N hardware threads:
 - if only N/2 hardware threads are busy, the performance is similar
  between baseline, corescheduling and nosmt
 - if N hardware threads are busy with N different corescheduling
  groups, the impact of corescheduling is similar to nosmt
 - if N hardware threads are busy and multiple active threads share the
  same corescheduling cookie, they gain a performance improvement over
  nosmt.
  The specific performance impact depends on the workload, but for a
  really busy database 12-vcpu VM (1 coresched tag) running on a 36
  hardware threads NUMA node with 96 mostly idle neighbor VMs (each in
  their own coresched tag), the performance drops by 54% with
  corescheduling and drops by 90% with nosmt.
 
>>> 
>>> We found uperf(in cgroup) throughput drops by ~50% with corescheduling.
>>> 
>>> The problem is, uperf triggered a lot of softirq and offloaded softirq
>>> service to *ksoftirqd* thread.
>>> 
>>> - default, ksoftirqd thread can run with uperf on the same core, we saw
>>>  100% CPU utilization.
>>> - coresched enabled, ksoftirqd's core cookie is different from uperf, so
>>>  they can't run concurrently on the same core, we saw ~15% forced idle.
>>> 
>>> I guess this kind of performance drop can be replicated by other similar
>>> (a lot of softirq activities) workloads.
>>> 
>>> Currently core scheduler picks cookie-match tasks for all SMT siblings, does
>>> it make sense we add a policy to allow cookie-compatible task running 
>>> together?
>>> For example, if a task is trusted(set by admin), it can work with kernel 
>>> thread.
>>> The difference from corescheduling disabled is that we still have user to 
>>> user
>>> isolation.
>> 
>> In ChromeOS we are considering all cookie-0 tasks as trusted.
>> Basically if you don't trust a task, then that is when you assign the
>> task a tag. We do this for the sandboxed processes.
> 
> I have a proposal of this, by changing cpu.tag to cpu.coresched_policy,
> something like the following:
> 
> +/*
> + * Core scheduling policy:
> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> + * - CORE_COOKIE_MATCH: tasks with same cookie can run
> + * on the same core concurrently.
> + * - CORE_COOKIE_TRUST: trusted task can run with kernel
> + *   thread on the same core concurrently.
How about other OS tasks(like systemd) except kernel thread? :)

Thx.
Regards,
Jiang
> + * - CORE_COOKIE_LONELY: tasks with cookie can run only
> + * with idle thread on the same core.
> + */
> +enum coresched_policy {
> +   CORE_SCHED_DISABLED,
> +   CORE_SCHED_COOKIE_MATCH,
> +   CORE_SCHED_COOKIE_TRUST,
> +   CORE_SCHED_COOKIE_LONELY,
> +};
> 
> We can set policy to CORE_COOKIE_TRUST 
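
For illustration only, the check could then end up looking roughly like the
sketch below. cpu_core_policy() is a hypothetical helper returning the
per-core policy derived from the proposed cpu.coresched_policy knob; none of
this exists in the posted series.

static inline bool sched_core_cookie_compatible(struct rq *rq,
						struct task_struct *p)
{
	if (!sched_core_enabled(rq))
		return true;

	switch (cpu_core_policy(rq)) {
	case CORE_SCHED_DISABLED:
		return true;
	case CORE_SCHED_COOKIE_TRUST:
		/* e.g. let kernel threads share the core with tagged tasks */
		if (p->flags & PF_KTHREAD)
			return true;
		fallthrough;
	case CORE_SCHED_COOKIE_MATCH:
	default:
		/* CORE_SCHED_COOKIE_LONELY handling omitted for brevity */
		return rq->core->core_cookie == p->core_cookie;
	}
}

Extending the trust test beyond PF_KTHREAD (e.g. to admin-tagged system
services such as systemd) would be one way to address the question above.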

Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

2020-08-03 Thread


> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann  wrote:
> 
> On 01/08/2020 04:32, Jiang Biao wrote:
>> From: Jiang Biao 
>> 
>> No need to preempt when there is only one runnable CFS task with
>> other IDLE tasks on the runqueue. That single CFS task would always
>> be picked in that case.
>> 
>> Signed-off-by: Jiang Biao 
>> ---
>> kernel/sched/fair.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..8fb80636b010 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity 
>> *curr, int queued)
>>  return;
>> #endif
>> 
>> -if (cfs_rq->nr_running > 1)
>> +if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
> 
> cfs_rq is a pointer.
It is. Sorry about that. :)

> 
>>  check_preempt_tick(cfs_rq, curr);
>> }
> 
> You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!
> 
> There is a difference between cfs_rq->h_nr_running and
> cfs_rq->nr_running. The '_h_' stands for hierarchical.
> 
> The former gives you hierarchical task accounting whereas the latter is
> the number of sched entities (representing tasks or taskgroups) enqueued
> in cfs_rq.
> 
> In entity_tick(), cfs_rq->nr_running has to be used for the condition to
> call check_preempt_tick(). We want to check if curr has to be preempted
> by __pick_first_entity(cfs_rq) on this cfs_rq.
> 
> entity_tick() is called for each sched entity (and so for each
> cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
> taskgroup /A/B : se(p) -> se(A/B) -> se(A)).
That’s true. I was thinking of adding a new cfs_rq->idle_nr_running member to
track the number of IDLE tasks per cfs_rq, and of reducing preemption here
based on that.
I’m not sure it’s OK to do that, because the IDLE class does not seem to be so
pure that it could tolerate starving.
We need an absolutely low-priority class that can tolerate starving, which
could be used to co-locate offline tasks. But the IDLE class seems not to be
*low* enough, considering the fairness of CFS, and IDLE entities still carry a
weight.
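
For what it's worth, a minimal sketch of that idea might look like the
following. idle_nr_running would be a hypothetical per-cfs_rq counter of
enqueued SCHED_IDLE entities, maintained in enqueue_entity()/dequeue_entity();
it does not exist in this tree.

static void
entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
{
	/* ... existing code ... */

	/*
	 * Skip the preemption check only when curr is the sole non-idle
	 * entity on this cfs_rq; everything else waiting is SCHED_IDLE
	 * and is assumed to tolerate the extra delay.
	 */
	if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1)
		check_preempt_tick(cfs_rq, curr);
}

Whether SCHED_IDLE entities can really tolerate that, given they still carry a
(small) weight under CFS fairness, is exactly the open question.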

Thanks for your reply.

Regards,
Jiang

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-23 Thread
Hi,

> On Jul 24, 2020, at 10:05 AM, Li, Aubrey  wrote:
> 
> On 2020/7/24 9:26, benbjiang(蒋彪) wrote:
>> Hi,
>> 
>>> On Jul 24, 2020, at 7:43 AM, Aubrey Li  wrote:
>>> 
>>> On Thu, Jul 23, 2020 at 4:28 PM benbjiang(蒋彪)  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>>> On Jul 23, 2020, at 4:06 PM, Li, Aubrey  wrote:
>>>>> 
>>>>> On 2020/7/23 15:47, benbjiang(蒋彪) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote:
>>>>>>>> Hi,
>>>>>>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey  
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey  
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey 
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>>>>>>>>>>>> Hi, Aubrey,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> From: Aubrey Li 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Don't migrate if there is a cookie mismatch
>>>>>>>>>>>>>>> Load balance tries to move task from busiest CPU to the
>>>>>>>>>>>>>>> destination CPU. When core scheduling is enabled, if the
>>>>>>>>>>>>>>> task's cookie does not match with the destination CPU's
>>>>>>>>>>>>>>> core cookie, this task will be skipped by this CPU. This
>>>>>>>>>>>>>>> mitigates the forced idle time on the destination CPU.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Select cookie matched idle CPU
>>>>>>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched
>>>>>>>>>>>>>>> idle CPU instead of the first idle CPU.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Find cookie matched idlest CPU
>>>>>>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core
>>>>>>>>>>>>>>> cookie matches with task's cookie
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Don't migrate task if cookie not match
>>>>>>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose
>>>>>>>>>>>>>>> core cookie does not match with task's cookie
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Signed-off-by: Aubrey Li 
>>>>>>>>>>>>>>> Signed-off-by: Tim Chen 
>>>>>>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai 
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> kernel/sched/fair.c  | 64 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> kernel/sched/sched.h | 29 
>>>>>>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>>>&g

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-23 Thread
Hi,

> On Jul 24, 2020, at 7:43 AM, Aubrey Li  wrote:
> 
> On Thu, Jul 23, 2020 at 4:28 PM benbjiang(蒋彪)  wrote:
>> 
>> Hi,
>> 
>>> On Jul 23, 2020, at 4:06 PM, Li, Aubrey  wrote:
>>> 
>>> On 2020/7/23 15:47, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>> 
>>>>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey  wrote:
>>>>> 
>>>>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote:
>>>>>> Hi,
>>>>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey  
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey  
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>>>>>>>>>> Hi, Aubrey,
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> From: Aubrey Li 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Don't migrate if there is a cookie mismatch
>>>>>>>>>>>>> Load balance tries to move task from busiest CPU to the
>>>>>>>>>>>>> destination CPU. When core scheduling is enabled, if the
>>>>>>>>>>>>> task's cookie does not match with the destination CPU's
>>>>>>>>>>>>> core cookie, this task will be skipped by this CPU. This
>>>>>>>>>>>>> mitigates the forced idle time on the destination CPU.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Select cookie matched idle CPU
>>>>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched
>>>>>>>>>>>>> idle CPU instead of the first idle CPU.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Find cookie matched idlest CPU
>>>>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core
>>>>>>>>>>>>> cookie matches with task's cookie
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Don't migrate task if cookie not match
>>>>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose
>>>>>>>>>>>>> core cookie does not match with task's cookie
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Signed-off-by: Aubrey Li 
>>>>>>>>>>>>> Signed-off-by: Tim Chen 
>>>>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai 
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> kernel/sched/fair.c  | 64 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> kernel/sched/sched.h | 29 
>>>>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>>>>>>>>> index d16939766361..33dc4bf01817 100644
>>>>>>>>>>>>> --- a/kernel/sched/fair.c
>>>>>>>>>>>>> +++ b/kernel/sched/fair.c
>>>>>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct 
>>>>>>>>>>>>> task_numa_env *env,
>>>>>>>>>>>>> 

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-23 Thread
Hi,

> On Jul 23, 2020, at 4:06 PM, Li, Aubrey  wrote:
> 
> On 2020/7/23 15:47, benbjiang(蒋彪) wrote:
>> Hi,
>> 
>>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey  wrote:
>>> 
>>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey  
>>>>> wrote:
>>>>> 
>>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey  
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>>>>>>>> Hi, Aubrey,
>>>>>>>>>> 
>>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>>>>>>>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> From: Aubrey Li 
>>>>>>>>>>> 
>>>>>>>>>>> - Don't migrate if there is a cookie mismatch
>>>>>>>>>>> Load balance tries to move task from busiest CPU to the
>>>>>>>>>>> destination CPU. When core scheduling is enabled, if the
>>>>>>>>>>> task's cookie does not match with the destination CPU's
>>>>>>>>>>> core cookie, this task will be skipped by this CPU. This
>>>>>>>>>>> mitigates the forced idle time on the destination CPU.
>>>>>>>>>>> 
>>>>>>>>>>> - Select cookie matched idle CPU
>>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched
>>>>>>>>>>> idle CPU instead of the first idle CPU.
>>>>>>>>>>> 
>>>>>>>>>>> - Find cookie matched idlest CPU
>>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core
>>>>>>>>>>> cookie matches with task's cookie
>>>>>>>>>>> 
>>>>>>>>>>> - Don't migrate task if cookie not match
>>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose
>>>>>>>>>>> core cookie does not match with task's cookie
>>>>>>>>>>> 
>>>>>>>>>>> Signed-off-by: Aubrey Li 
>>>>>>>>>>> Signed-off-by: Tim Chen 
>>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai 
>>>>>>>>>>> ---
>>>>>>>>>>> kernel/sched/fair.c  | 64 
>>>>>>>>>>> 
>>>>>>>>>>> kernel/sched/sched.h | 29 
>>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>>>>>>>> 
>>>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>>>>>>> index d16939766361..33dc4bf01817 100644
>>>>>>>>>>> --- a/kernel/sched/fair.c
>>>>>>>>>>> +++ b/kernel/sched/fair.c
>>>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct 
>>>>>>>>>>> task_numa_env *env,
>>>>>>>>>>> if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>>>>>>>>>> continue;
>>>>>>>>>>> 
>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>>>> +   /*
>>>>>>>>>>> +* Skip this cpu if source task's cookie does not match
>>>>>>>>>>> +* with CPU's core cookie.
>>>>>>>>>>> +*/
>>>>>>>>>>> +   if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>>>>>>>>>>> +

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-23 Thread
Hi,

> On Jul 23, 2020, at 1:39 PM, Li, Aubrey  wrote:
> 
> On 2020/7/23 12:23, benbjiang(蒋彪) wrote:
>> Hi,
>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey  wrote:
>>> 
>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>> 
>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey  wrote:
>>>>> 
>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>>>>>> Hi, Aubrey,
>>>>>>>> 
>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>>>>>  wrote:
>>>>>>>>> 
>>>>>>>>> From: Aubrey Li 
>>>>>>>>> 
>>>>>>>>> - Don't migrate if there is a cookie mismatch
>>>>>>>>> Load balance tries to move task from busiest CPU to the
>>>>>>>>> destination CPU. When core scheduling is enabled, if the
>>>>>>>>> task's cookie does not match with the destination CPU's
>>>>>>>>> core cookie, this task will be skipped by this CPU. This
>>>>>>>>> mitigates the forced idle time on the destination CPU.
>>>>>>>>> 
>>>>>>>>> - Select cookie matched idle CPU
>>>>>>>>> In the fast path of task wakeup, select the first cookie matched
>>>>>>>>> idle CPU instead of the first idle CPU.
>>>>>>>>> 
>>>>>>>>> - Find cookie matched idlest CPU
>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core
>>>>>>>>> cookie matches with task's cookie
>>>>>>>>> 
>>>>>>>>> - Don't migrate task if cookie not match
>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose
>>>>>>>>> core cookie does not match with task's cookie
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Aubrey Li 
>>>>>>>>> Signed-off-by: Tim Chen 
>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai 
>>>>>>>>> ---
>>>>>>>>> kernel/sched/fair.c  | 64 
>>>>>>>>> kernel/sched/sched.h | 29 
>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>>>>> index d16939766361..33dc4bf01817 100644
>>>>>>>>> --- a/kernel/sched/fair.c
>>>>>>>>> +++ b/kernel/sched/fair.c
>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct 
>>>>>>>>> task_numa_env *env,
>>>>>>>>>   if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>>>>>>>>   continue;
>>>>>>>>> 
>>>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>>>> + /*
>>>>>>>>> +  * Skip this cpu if source task's cookie does not match
>>>>>>>>> +  * with CPU's core cookie.
>>>>>>>>> +  */
>>>>>>>>> + if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>>>>>>>>> + continue;
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>>   env->dst_cpu = cpu;
>>>>>>>>>   if (task_numa_compare(env, taskimp, groupimp, maymove))
>>>>>>>>>   break;
>>>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group 
>>>>>>>>> *group, struct task_struct *p, int this
>>>>>>>>> 
>>>>>>>>>   /* Traverse only the allowed CPUs */
>>>>>>>>>   for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-22 Thread
Hi,
> On Jul 23, 2020, at 11:35 AM, Li, Aubrey  wrote:
> 
> On 2020/7/23 10:42, benbjiang(蒋彪) wrote:
>> Hi,
>> 
>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey  wrote:
>>> 
>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>> 
>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey  wrote:
>>>>> 
>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>>>> Hi, Aubrey,
>>>>>> 
>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> From: Aubrey Li 
>>>>>>> 
>>>>>>> - Don't migrate if there is a cookie mismatch
>>>>>>>  Load balance tries to move task from busiest CPU to the
>>>>>>>  destination CPU. When core scheduling is enabled, if the
>>>>>>>  task's cookie does not match with the destination CPU's
>>>>>>>  core cookie, this task will be skipped by this CPU. This
>>>>>>>  mitigates the forced idle time on the destination CPU.
>>>>>>> 
>>>>>>> - Select cookie matched idle CPU
>>>>>>>  In the fast path of task wakeup, select the first cookie matched
>>>>>>>  idle CPU instead of the first idle CPU.
>>>>>>> 
>>>>>>> - Find cookie matched idlest CPU
>>>>>>>  In the slow path of task wakeup, find the idlest CPU whose core
>>>>>>>  cookie matches with task's cookie
>>>>>>> 
>>>>>>> - Don't migrate task if cookie not match
>>>>>>>  For the NUMA load balance, don't migrate task to the CPU whose
>>>>>>>  core cookie does not match with task's cookie
>>>>>>> 
>>>>>>> Signed-off-by: Aubrey Li 
>>>>>>> Signed-off-by: Tim Chen 
>>>>>>> Signed-off-by: Vineeth Remanan Pillai 
>>>>>>> ---
>>>>>>> kernel/sched/fair.c  | 64 
>>>>>>> kernel/sched/sched.h | 29 
>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>>> index d16939766361..33dc4bf01817 100644
>>>>>>> --- a/kernel/sched/fair.c
>>>>>>> +++ b/kernel/sched/fair.c
>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct 
>>>>>>> task_numa_env *env,
>>>>>>> if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>>>>>> continue;
>>>>>>> 
>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>> +   /*
>>>>>>> +* Skip this cpu if source task's cookie does not match
>>>>>>> +* with CPU's core cookie.
>>>>>>> +*/
>>>>>>> +   if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>>>>>>> +   continue;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> env->dst_cpu = cpu;
>>>>>>> if (task_numa_compare(env, taskimp, groupimp, maymove))
>>>>>>> break;
>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group 
>>>>>>> *group, struct task_struct *p, int this
>>>>>>> 
>>>>>>> /* Traverse only the allowed CPUs */
>>>>>>> for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>>>>>>> +   struct rq *rq = cpu_rq(i);
>>>>>>> +
>>>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>>>> +   if (!sched_core_cookie_match(rq, p))
>>>>>>> +   continue;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> if (sched_idle_cpu(i))
>>>>>>> return i;
>>>>>>> 
>>>>>>> if (available_idle_cpu(i)) {
>>>>>>> -   struct rq *rq = cpu_rq(i);
>>&

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-22 Thread
Hi,

> On Jul 23, 2020, at 9:57 AM, Li, Aubrey  wrote:
> 
> On 2020/7/22 22:32, benbjiang(蒋彪) wrote:
>> Hi,
>> 
>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey  wrote:
>>> 
>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>>>> Hi, Aubrey,
>>>> 
>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>>>  wrote:
>>>>> 
>>>>> From: Aubrey Li 
>>>>> 
>>>>> - Don't migrate if there is a cookie mismatch
>>>>>   Load balance tries to move task from busiest CPU to the
>>>>>   destination CPU. When core scheduling is enabled, if the
>>>>>   task's cookie does not match with the destination CPU's
>>>>>   core cookie, this task will be skipped by this CPU. This
>>>>>   mitigates the forced idle time on the destination CPU.
>>>>> 
>>>>> - Select cookie matched idle CPU
>>>>>   In the fast path of task wakeup, select the first cookie matched
>>>>>   idle CPU instead of the first idle CPU.
>>>>> 
>>>>> - Find cookie matched idlest CPU
>>>>>   In the slow path of task wakeup, find the idlest CPU whose core
>>>>>   cookie matches with task's cookie
>>>>> 
>>>>> - Don't migrate task if cookie not match
>>>>>   For the NUMA load balance, don't migrate task to the CPU whose
>>>>>   core cookie does not match with task's cookie
>>>>> 
>>>>> Signed-off-by: Aubrey Li 
>>>>> Signed-off-by: Tim Chen 
>>>>> Signed-off-by: Vineeth Remanan Pillai 
>>>>> ---
>>>>> kernel/sched/fair.c  | 64 
>>>>> kernel/sched/sched.h | 29 
>>>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>>>> 
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index d16939766361..33dc4bf01817 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct 
>>>>> task_numa_env *env,
>>>>>   if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>>>>   continue;
>>>>> 
>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>> + /*
>>>>> +  * Skip this cpu if source task's cookie does not match
>>>>> +  * with CPU's core cookie.
>>>>> +  */
>>>>> + if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>>>>> + continue;
>>>>> +#endif
>>>>> +
>>>>>   env->dst_cpu = cpu;
>>>>>   if (task_numa_compare(env, taskimp, groupimp, maymove))
>>>>>   break;
>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group *group, 
>>>>> struct task_struct *p, int this
>>>>> 
>>>>>   /* Traverse only the allowed CPUs */
>>>>>   for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>>>>> + struct rq *rq = cpu_rq(i);
>>>>> +
>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>> + if (!sched_core_cookie_match(rq, p))
>>>>> + continue;
>>>>> +#endif
>>>>> +
>>>>>   if (sched_idle_cpu(i))
>>>>>   return i;
>>>>> 
>>>>>   if (available_idle_cpu(i)) {
>>>>> - struct rq *rq = cpu_rq(i);
>>>>>   struct cpuidle_state *idle = idle_get_state(rq);
>>>>>   if (idle && idle->exit_latency < min_exit_latency) {
>>>>>   /*
>>>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct task_struct *p, 
>>>>> struct sched_domain *sd, int t
>>>>>   for_each_cpu_wrap(cpu, cpus, target) {
>>>>>   if (!--nr)
>>>>>   return -1;
>>>>> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>>>>> - break;
>>>>> +
>>>>> + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
>>>>> +#ifdef CONFIG_SCHED_CORE
>>>>> + /*
>>>>&g

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-22 Thread
Hi,

> On Jul 22, 2020, at 8:13 PM, Li, Aubrey  wrote:
> 
> On 2020/7/22 16:54, benbjiang(蒋彪) wrote:
>> Hi, Aubrey,
>> 
>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>>  wrote:
>>> 
>>> From: Aubrey Li 
>>> 
>>> - Don't migrate if there is a cookie mismatch
>>>Load balance tries to move task from busiest CPU to the
>>>destination CPU. When core scheduling is enabled, if the
>>>task's cookie does not match with the destination CPU's
>>>core cookie, this task will be skipped by this CPU. This
>>>mitigates the forced idle time on the destination CPU.
>>> 
>>> - Select cookie matched idle CPU
>>>In the fast path of task wakeup, select the first cookie matched
>>>idle CPU instead of the first idle CPU.
>>> 
>>> - Find cookie matched idlest CPU
>>>In the slow path of task wakeup, find the idlest CPU whose core
>>>cookie matches with task's cookie
>>> 
>>> - Don't migrate task if cookie not match
>>>For the NUMA load balance, don't migrate task to the CPU whose
>>>core cookie does not match with task's cookie
>>> 
>>> Signed-off-by: Aubrey Li 
>>> Signed-off-by: Tim Chen 
>>> Signed-off-by: Vineeth Remanan Pillai 
>>> ---
>>> kernel/sched/fair.c  | 64 
>>> kernel/sched/sched.h | 29 
>>> 2 files changed, 88 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index d16939766361..33dc4bf01817 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct task_numa_env 
>>> *env,
>>> if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>> continue;
>>> 
>>> +#ifdef CONFIG_SCHED_CORE
>>> +   /*
>>> +* Skip this cpu if source task's cookie does not match
>>> +* with CPU's core cookie.
>>> +*/
>>> +   if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>>> +   continue;
>>> +#endif
>>> +
>>> env->dst_cpu = cpu;
>>> if (task_numa_compare(env, taskimp, groupimp, maymove))
>>> break;
>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group *group, 
>>> struct task_struct *p, int this
>>> 
>>> /* Traverse only the allowed CPUs */
>>> for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>>> +   struct rq *rq = cpu_rq(i);
>>> +
>>> +#ifdef CONFIG_SCHED_CORE
>>> +   if (!sched_core_cookie_match(rq, p))
>>> +   continue;
>>> +#endif
>>> +
>>> if (sched_idle_cpu(i))
>>> return i;
>>> 
>>> if (available_idle_cpu(i)) {
>>> -   struct rq *rq = cpu_rq(i);
>>> struct cpuidle_state *idle = idle_get_state(rq);
>>> if (idle && idle->exit_latency < min_exit_latency) {
>>> /*
>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct task_struct *p, 
>>> struct sched_domain *sd, int t
>>> for_each_cpu_wrap(cpu, cpus, target) {
>>> if (!--nr)
>>> return -1;
>>> -   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>>> -   break;
>>> +
>>> +   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
>>> +#ifdef CONFIG_SCHED_CORE
>>> +   /*
>>> +* If Core Scheduling is enabled, select this cpu
>>> +* only if the process cookie matches core cookie.
>>> +*/
>>> +   if (sched_core_enabled(cpu_rq(cpu)) &&
>>> +   p->core_cookie == cpu_rq(cpu)->core->core_cookie)
>> Why not also add similar logic in select_idle_smt to reduce forced-idle? :)
> We hit select_idle_smt after we have scanned the entire LLC domain for idle cores
> and idle cpus and failed, so IMHO an idle SMT sibling is probably a good choice in
> this scenario.

AFAIC, selecting an idle sibling with an unmatched cookie will cause unnecessary
forced idle, unfairness and latency, compared to choosing the *target* cpu.
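
For reference, the suggestion would amount to something like the sketch below,
mirroring the select_idle_cpu() hunk in this patch; it is not part of the
posted series.

static int select_idle_smt(struct task_struct *p, int target)
{
	int cpu;

	if (!static_branch_likely(&sched_smt_present))
		return -1;

	for_each_cpu(cpu, cpu_smt_mask(target)) {
		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
			continue;
		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
			continue;
#ifdef CONFIG_SCHED_CORE
		/*
		 * Only pick the sibling if the task's cookie matches the
		 * core's cookie, as select_idle_cpu() does above.
		 */
		if (sched_core_enabled(cpu_rq(cpu)) &&
		    p->core_cookie != cpu_rq(cpu)->core->core_cookie)
			continue;
#endif
		return cpu;
	}

	return -1;
}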

Re: [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail)

2020-07-22 Thread
Hi, Aubrey,

> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai  
> wrote:
> 
> From: Aubrey Li 
> 
> - Don't migrate if there is a cookie mismatch
> Load balance tries to move task from busiest CPU to the
> destination CPU. When core scheduling is enabled, if the
> task's cookie does not match with the destination CPU's
> core cookie, this task will be skipped by this CPU. This
> mitigates the forced idle time on the destination CPU.
> 
> - Select cookie matched idle CPU
> In the fast path of task wakeup, select the first cookie matched
> idle CPU instead of the first idle CPU.
> 
> - Find cookie matched idlest CPU
> In the slow path of task wakeup, find the idlest CPU whose core
> cookie matches with task's cookie
> 
> - Don't migrate task if cookie not match
> For the NUMA load balance, don't migrate task to the CPU whose
> core cookie does not match with task's cookie
> 
> Signed-off-by: Aubrey Li 
> Signed-off-by: Tim Chen 
> Signed-off-by: Vineeth Remanan Pillai 
> ---
> kernel/sched/fair.c  | 64 
> kernel/sched/sched.h | 29 
> 2 files changed, 88 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d16939766361..33dc4bf01817 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct task_numa_env 
> *env,
>   if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>   continue;
> 
> +#ifdef CONFIG_SCHED_CORE
> + /*
> +  * Skip this cpu if source task's cookie does not match
> +  * with CPU's core cookie.
> +  */
> + if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
> + continue;
> +#endif
> +
>   env->dst_cpu = cpu;
>   if (task_numa_compare(env, taskimp, groupimp, maymove))
>   break;
> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct sched_group *group, 
> struct task_struct *p, int this
> 
>   /* Traverse only the allowed CPUs */
>   for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
> + struct rq *rq = cpu_rq(i);
> +
> +#ifdef CONFIG_SCHED_CORE
> + if (!sched_core_cookie_match(rq, p))
> + continue;
> +#endif
> +
>   if (sched_idle_cpu(i))
>   return i;
> 
>   if (available_idle_cpu(i)) {
> - struct rq *rq = cpu_rq(i);
>   struct cpuidle_state *idle = idle_get_state(rq);
>   if (idle && idle->exit_latency < min_exit_latency) {
>   /*
> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct task_struct *p, 
> struct sched_domain *sd, int t
>   for_each_cpu_wrap(cpu, cpus, target) {
>   if (!--nr)
>   return -1;
> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> - break;
> +
> + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
> +#ifdef CONFIG_SCHED_CORE
> + /*
> +  * If Core Scheduling is enabled, select this cpu
> +  * only if the process cookie matches core cookie.
> +  */
> + if (sched_core_enabled(cpu_rq(cpu)) &&
> + p->core_cookie == cpu_rq(cpu)->core->core_cookie)
Why not also add similar logic in select_idle_smt to reduce forced-idle? :)


> +#endif
> + break;
> + }
>   }
> 
>   time = cpu_clock(this) - time;
> @@ -7609,8 +7634,9 @@ int can_migrate_task(struct task_struct *p, struct 
> lb_env *env)
>* We do not migrate tasks that are:
>* 1) throttled_lb_pair, or
>* 2) cannot be migrated to this CPU due to cpus_ptr, or
> -  * 3) running (obviously), or
> -  * 4) are cache-hot on their current CPU.
> +  * 3) task's cookie does not match with this CPU's core cookie
> +  * 4) running (obviously), or
> +  * 5) are cache-hot on their current CPU.
>*/
>   if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>   return 0;
> @@ -7645,6 +7671,15 @@ int can_migrate_task(struct task_struct *p, struct 
> lb_env *env)
>   return 0;
>   }
> 
> +#ifdef CONFIG_SCHED_CORE
> + /*
> +  * Don't migrate task if the task's cookie does not match
> +  * with the destination CPU's core cookie.
> +  */
> + if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
> + return 0;
> +#endif
> +
>   /* Record that we found atleast one task that could run on dst_cpu */
>   env->flags &= ~LBF_ALL_PINNED;
> 
> @@ -8857,6 +8892,25 @@ find_idlest_group(struct sched_domain *sd, struct 
> task_struct *p,
>   p->cpus_ptr))
>   

Re: [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case(Internet mail)

2020-07-22 Thread



> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai  
> wrote:
> 
> From: vpillai 
> 
> If there is only one long running local task and the sibling is
> forced idle, it  might not get a chance to run until a schedule
> event happens on any cpu in the core.
> 
> So we check for this condition during a tick to see if a sibling
> is starved and then give it a chance to schedule.
Hi,

There may be other similar starvation cases this patch cannot cover.
For example, if there is one long-running RT task and the sibling is forced idle,
then all tasks with different cookies on all siblings could starve forever.
The current load balancer does not seem able to pull the starved tasks away.
Could load balancing be made more aware of core scheduling to improve this? :)
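
Purely as an illustration of the idea (nothing below is in the posted series),
the cookie filter added to can_migrate_task() in patch 11/16 could be relaxed
for tasks that have been starved too long. p->core_starved_since and
CORE_STARVE_THRESH_NS are hypothetical.

#ifdef CONFIG_SCHED_CORE
	/*
	 * core_starved_since (hypothetical) records when the task last
	 * became runnable without getting CPU time. Once it has waited
	 * longer than a threshold, allow the pull even if its cookie does
	 * not match the destination core, so a long running task with
	 * another cookie cannot starve it forever.
	 */
	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p) &&
	    rq_clock(env->src_rq) - p->core_starved_since < CORE_STARVE_THRESH_NS)
		return 0;
#endif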

Thx.
Regards,
Jiang 

> 
> Signed-off-by: Vineeth Remanan Pillai 
> Signed-off-by: Julien Desfossez 
> ---
> kernel/sched/fair.c | 39 +++
> 1 file changed, 39 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ae17507533a0..49fb93296e35 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10613,6 +10613,40 @@ static void rq_offline_fair(struct rq *rq)
> 
> #endif /* CONFIG_SMP */
> 
> +#ifdef CONFIG_SCHED_CORE
> +static inline bool
> +__entity_slice_used(struct sched_entity *se)
> +{
> + return (se->sum_exec_runtime - se->prev_sum_exec_runtime) >
> + sched_slice(cfs_rq_of(se), se);
> +}
> +
> +/*
> + * If runqueue has only one task which used up its slice and if the sibling
> + * is forced idle, then trigger schedule to give forced idle task a chance.
> + */
> +static void resched_forceidle_sibling(struct rq *rq, struct sched_entity *se)
> +{
> + int cpu = cpu_of(rq), sibling_cpu;
> +
> + if (rq->cfs.nr_running > 1 || !__entity_slice_used(se))
> + return;
> +
> + for_each_cpu(sibling_cpu, cpu_smt_mask(cpu)) {
> + struct rq *sibling_rq;
> + if (sibling_cpu == cpu)
> + continue;
> + if (cpu_is_offline(sibling_cpu))
> + continue;
> +
> + sibling_rq = cpu_rq(sibling_cpu);
> + if (sibling_rq->core_forceidle) {
> + resched_curr(sibling_rq);
> + }
> + }
> +}
> +#endif
> +
> /*
>  * scheduler tick hitting a task of our scheduling class.
>  *
> @@ -10636,6 +10670,11 @@ static void task_tick_fair(struct rq *rq, struct 
> task_struct *curr, int queued)
> 
>   update_misfit_status(curr, rq);
>   update_overutilized_status(task_rq(curr));
> +
> +#ifdef CONFIG_SCHED_CORE
> + if (sched_core_enabled(rq))
> + resched_forceidle_sibling(rq, &curr->se);
> +#endif
> }
> 
> /*
> -- 
> 2.17.1
> 
> 



Re: [RFC PATCH 09/16] sched/fair: core wide cfs task priority comparison(Internet mail)

2020-07-21 Thread



> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai  
> wrote:
> 
> From: Aaron Lu 
> 
> This patch provides a vruntime based way to compare two cfs task's
> priority, be it on the same cpu or different threads of the same core.
> 
> When the two tasks are on the same CPU, we just need to find a common
> cfs_rq both sched_entities are on and then do the comparison.
> 
> When the two tasks are on different threads of the same core, each thread
> will choose the next task to run the usual way and then the root level
> sched entities which the two tasks belong to will be used to decide
> which task runs next core wide.
> 
> An illustration for the cross CPU case:
> 
>   cpu0 cpu1
> /   |  \ /   |  \
> se1 se2 se3  se4 se5 se6
>/  \/   \
>  se21 se22   se61  se62
>  (A)/
>   se621
>(B)
> 
> Assume CPU0 and CPU1 are smt siblings and cpu0 has decided task A to
> run next and cpu1 has decided task B to run next. To compare priority
> of task A and B, we compare priority of se2 and se6. Whose vruntime is
> smaller, who wins.
> 
> To make this work, the root level sched entities' vruntime of the two
> threads must be directly comparable. So one of the hyperthread's root
> cfs_rq's min_vruntime is chosen as the core wide one and all root level
> sched entities' vruntime is normalized against it.
> 
> All sub cfs_rqs and sched entities are not interesting in cross cpu
> priority comparison as they will only participate in the usual cpu local
> schedule decision so no need to normalize their vruntimes.
> 
> Signed-off-by: Aaron Lu 
> ---
> kernel/sched/core.c  |  23 +++
> kernel/sched/fair.c  | 142 ++-
> kernel/sched/sched.h |   3 +
> 3 files changed, 150 insertions(+), 18 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f51e5c4798c8..4d6d6a678013 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -114,19 +114,8 @@ static inline bool prio_less(struct task_struct *a, 
> struct task_struct *b)
>   if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
>   return !dl_time_before(a->dl.deadline, b->dl.deadline);
> 
> - if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
> - u64 vruntime = b->se.vruntime;
> -
> - /*
> -  * Normalize the vruntime if tasks are in different cpus.
> -  */
> - if (task_cpu(a) != task_cpu(b)) {
> - vruntime -= task_cfs_rq(b)->min_vruntime;
> - vruntime += task_cfs_rq(a)->min_vruntime;
> - }
> -
> - return !((s64)(a->se.vruntime - vruntime) <= 0);
> - }
> + if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
> + return cfs_prio_less(a, b);
> 
>   return false;
> }
> @@ -229,8 +218,12 @@ static int __sched_core_stopper(void *data)
>   bool enabled = !!(unsigned long)data;
>   int cpu;
> 
> - for_each_possible_cpu(cpu)
> - cpu_rq(cpu)->core_enabled = enabled;
> + for_each_possible_cpu(cpu) {
> + struct rq *rq = cpu_rq(cpu);
> + rq->core_enabled = enabled;
> + if (cpu_online(cpu) && rq->core != rq)
> + sched_core_adjust_sibling_vruntime(cpu, enabled);
> + }
> 
>   return 0;
> }
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 61d19e573443..d16939766361 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -462,11 +462,142 @@ find_matching_se(struct sched_entity **se, struct 
> sched_entity **pse)
> 
> #endif/* CONFIG_FAIR_GROUP_SCHED */
> 
> +#ifdef CONFIG_SCHED_CORE
> +static inline struct cfs_rq *root_cfs_rq(struct cfs_rq *cfs_rq)
> +{
> + return &rq_of(cfs_rq)->cfs;
> +}
> +
> +static inline bool is_root_cfs_rq(struct cfs_rq *cfs_rq)
> +{
> + return cfs_rq == root_cfs_rq(cfs_rq);
> +}
> +
> +static inline struct cfs_rq *core_cfs_rq(struct cfs_rq *cfs_rq)
> +{
> + return &rq_of(cfs_rq)->core->cfs;
> +}
> +
> +static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
> +{
> + if (!sched_core_enabled(rq_of(cfs_rq)) || !is_root_cfs_rq(cfs_rq))
> + return cfs_rq->min_vruntime;
> +
> + return core_cfs_rq(cfs_rq)->min_vruntime;
> +}
> +
> +#ifndef CONFIG_64BIT
> +static inline u64 cfs_rq_min_vruntime_copy(struct cfs_rq *cfs_rq)
> +{
> + if (!sched_core_enabled(rq_of(cfs_rq)) || !is_root_cfs_rq(cfs_rq))
> + return cfs_rq->min_vruntime_copy;
> +
> + return core_cfs_rq(cfs_rq)->min_vruntime_copy;
> +}
> +#endif /* CONFIG_64BIT */
> +
> +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> +{
> + struct sched_entity *sea = &a->se;
> + struct sched_entity *seb = &b->se;
> + bool samecpu = task_cpu(a) == task_cpu(b);
> + s64 delta;
> +
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> + if (samecpu) {
> + /* vruntime is per cfs_rq */
> + 

Re: [RFC PATCH 05/16] sched: Basic tracking of matching tasks(Internet mail)

2020-07-21 Thread
Hi, perter

> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai  
> wrote:
> 
> From: Peter Zijlstra 
> 
> Introduce task_struct::core_cookie as an opaque identifier for core
> scheduling. When enabled; core scheduling will only allow matching
> task to be on the core; where idle matches everything.
> 
> When task_struct::core_cookie is set (and core scheduling is enabled)
> these tasks are indexed in a second RB-tree, first on cookie value
> then on scheduling function, such that matching task selection always
> finds the most eligible match.
> 
> NOTE: *shudder* at the overhead...
> 
> NOTE: *sigh*, a 3rd copy of the scheduling function; the alternative
> is per class tracking of cookies and that just duplicates a lot of
> stuff for no raisin (the 2nd copy lives in the rt-mutex PI code).
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Vineeth Remanan Pillai 
> Signed-off-by: Julien Desfossez 
> ---
> include/linux/sched.h |   8 ++-
> kernel/sched/core.c   | 146 ++
> kernel/sched/fair.c   |  46 -
> kernel/sched/sched.h  |  55 
> 4 files changed, 208 insertions(+), 47 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4418f5cb8324..3c8dcc5ff039 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -683,10 +683,16 @@ struct task_struct {
>   const struct sched_class*sched_class;
>   struct sched_entity se;
>   struct sched_rt_entity  rt;
> + struct sched_dl_entity  dl;
> +
> +#ifdef CONFIG_SCHED_CORE
> + struct rb_node  core_node;
> + unsigned long   core_cookie;
> +#endif
> +
> #ifdef CONFIG_CGROUP_SCHED
>   struct task_group   *sched_task_group;
> #endif
> - struct sched_dl_entity  dl;
> 
> #ifdef CONFIG_UCLAMP_TASK
>   /* Clamp values requested for a scheduling entity */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4b81301e3f21..b21bcab20da6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -77,6 +77,141 @@ int sysctl_sched_rt_runtime = 95;
> 
> DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);
> 
> +/* kernel prio, less is more */
> +static inline int __task_prio(struct task_struct *p)
> +{
> + if (p->sched_class == &stop_sched_class) /* trumps deadline */
> + return -2;
> +
> + if (rt_prio(p->prio)) /* includes deadline */
> + return p->prio; /* [-1, 99] */
> +
> + if (p->sched_class == &idle_sched_class)
> + return MAX_RT_PRIO + NICE_WIDTH; /* 140 */

return MAX_PRIO;
Maybe that would be simpler?

> +
> + return MAX_RT_PRIO + MAX_NICE; /* 120, squash fair */

MAX_RT_PRIO (100) + MAX_NICE (19) should be 119, not 120? :)
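
For reference, with the usual definitions in include/linux/sched/prio.h the
arithmetic is:

	MAX_NICE    = 19
	NICE_WIDTH  = MAX_NICE - MIN_NICE + 1 = 40
	MAX_RT_PRIO = 100
	MAX_PRIO    = MAX_RT_PRIO + NICE_WIDTH = 140	/* what the idle case returns */
	MAX_RT_PRIO + MAX_NICE = 100 + 19 = 119		/* not the 120 in the comment */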

Thx.
Regards,
Jiang

> +}
> +
> +/*
> + * l(a,b)
> + * le(a,b) := !l(b,a)
> + * g(a,b)  := l(b,a)
> + * ge(a,b) := !l(a,b)
> + */
> +
> +/* real prio, less is less */
> +static inline bool prio_less(struct task_struct *a, struct task_struct *b)
> +{
> +
> + int pa = __task_prio(a), pb = __task_prio(b);
> +
> + if (-pa < -pb)
> + return true;
> +
> + if (-pb < -pa)
> + return false;
> +
> + if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
> + return !dl_time_before(a->dl.deadline, b->dl.deadline);
> +
> + if (pa == MAX_RT_PRIO + MAX_NICE)  { /* fair */
> + u64 vruntime = b->se.vruntime;
> +
> + /*
> +  * Normalize the vruntime if tasks are in different cpus.
> +  */
> + if (task_cpu(a) != task_cpu(b)) {
> + vruntime -= task_cfs_rq(b)->min_vruntime;
> + vruntime += task_cfs_rq(a)->min_vruntime;
> + }
> +
> + return !((s64)(a->se.vruntime - vruntime) <= 0);
> + }
> +
> + return false;
> +}
> +
> +static inline bool __sched_core_less(struct task_struct *a, struct 
> task_struct *b)
> +{
> + if (a->core_cookie < b->core_cookie)
> + return true;
> +
> + if (a->core_cookie > b->core_cookie)
> + return false;
> +
> + /* flip prio, so high prio is leftmost */
> + if (prio_less(b, a))
> + return true;
> +
> + return false;
> +}
> +
> +static void sched_core_enqueue(struct rq *rq, struct task_struct *p)
> +{
> + struct rb_node *parent, **node;
> + struct task_struct *node_task;
> +
> + rq->core->core_task_seq++;
> +
> + if (!p->core_cookie)
> + return;
> +
> + node = &rq->core_tree.rb_node;
> + parent = *node;
> +
> + while (*node) {
> + node_task = container_of(*node, struct task_struct, core_node);
> + parent = *node;
> +
> + if (__sched_core_less(p, node_task))
> + node = &parent->rb_left;
> + else
> + node = &parent->rb_right;
> + }
> +
> + rb_link_node(&p->core_node, parent, node);
> + 

Re: [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case(Internet mail)

2020-07-21 Thread
Hi,

> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai  
> wrote:
> 
> From: vpillai 
> 
> If there is only one long running local task and the sibling is
> forced idle, it  might not get a chance to run until a schedule
> event happens on any cpu in the core.
> 
> So we check for this condition during a tick to see if a sibling
> is starved and then give it a chance to schedule.
> 
> Signed-off-by: Vineeth Remanan Pillai 
> Signed-off-by: Julien Desfossez 
> ---
> kernel/sched/fair.c | 39 +++
> 1 file changed, 39 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ae17507533a0..49fb93296e35 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10613,6 +10613,40 @@ static void rq_offline_fair(struct rq *rq)
> 
> #endif /* CONFIG_SMP */
> 
> +#ifdef CONFIG_SCHED_CORE
> +static inline bool
> +__entity_slice_used(struct sched_entity *se)
> +{
> + return (se->sum_exec_runtime - se->prev_sum_exec_runtime) >
> + sched_slice(cfs_rq_of(se), se);
> +}
> +
> +/*
> + * If runqueue has only one task which used up its slice and if the sibling
> + * is forced idle, then trigger schedule to give forced idle task a chance.
> + */
> +static void resched_forceidle_sibling(struct rq *rq, struct sched_entity *se)
> +{
> + int cpu = cpu_of(rq), sibling_cpu;
> +
> + if (rq->cfs.nr_running > 1 || !__entity_slice_used(se))
> + return;
> +
> + for_each_cpu(sibling_cpu, cpu_smt_mask(cpu)) {
> + struct rq *sibling_rq;
> + if (sibling_cpu == cpu)
> + continue;
> + if (cpu_is_offline(sibling_cpu))
> + continue;
> +
> + sibling_rq = cpu_rq(sibling_cpu);
> + if (sibling_rq->core_forceidle) {
> + resched_curr(sibling_rq);
> + }
> + }
> +}
> +#endif
> +
> /*
>  * scheduler tick hitting a task of our scheduling class.
>  *
> @@ -10636,6 +10670,11 @@ static void task_tick_fair(struct rq *rq, struct 
> task_struct *curr, int queued)
> 
>   update_misfit_status(curr, rq);
>   update_overutilized_status(task_rq(curr));
> +
> +#ifdef CONFIG_SCHED_CORE
> + if (sched_core_enabled(rq))
> + resched_forceidle_sibling(rq, &curr->se);
> +#endif
Hi,

resched_forceidle_sibling() depends on the tick, but after entering nohz_full
mode there may be no tick for up to 1s (scheduler_tick_max_deferment).
And with nohz_full enabled, a CPU enters nohz_full mode frequently precisely
when *there is only one long running local task*.
That means the rescheduling of siblings could be delayed much longer than
sched_slice(), which could be unfair and result in big latency.

Should we prevent a CPU with a forced-idle sibling from entering nohz_full
mode, by adding a specific flag and checking it before stopping the tick?

Or we could do the rescheduling of siblings in task_tick_idle() by checking the
starvation time? :)
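
As a rough sketch of the first option (not part of the series;
core_has_forceidle_sibling() is a hypothetical helper), the tick-stop decision
could take forced-idle siblings into account:

bool sched_can_stop_tick(struct rq *rq)
{
	/* ... existing checks ... */

#ifdef CONFIG_SCHED_CORE
	/*
	 * Keep the tick if any SMT sibling of this CPU is forced idle, so
	 * that resched_forceidle_sibling() still gets a chance to run.
	 */
	if (sched_core_enabled(rq) && core_has_forceidle_sibling(rq))
		return false;
#endif

	return true;
}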

Thx
Regards,
Jiang 

> }
> 
> /*
> -- 
> 2.17.1
> 
> 



Re: [RFC PATCH 10/16] sched: Trivial forced-newidle balancer(Internet mail)

2020-07-20 Thread



> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai  
> wrote:
> 
> From: Peter Zijlstra 
> 
> When a sibling is forced-idle to match the core-cookie; search for
> matching tasks to fill the core.
> 
> rcu_read_unlock() can incur an infrequent deadlock in
> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Joel Fernandes (Google) 
> Acked-by: Paul E. McKenney 
> ---
> include/linux/sched.h |   1 +
> kernel/sched/core.c   | 131 +-
> kernel/sched/idle.c   |   1 +
> kernel/sched/sched.h  |   6 ++
> 4 files changed, 138 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 3c8dcc5ff039..4f9edf013df3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -688,6 +688,7 @@ struct task_struct {
> #ifdef CONFIG_SCHED_CORE
>   struct rb_node  core_node;
>   unsigned long   core_cookie;
> + unsigned intcore_occupation;
> #endif
> 
> #ifdef CONFIG_CGROUP_SCHED
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4d6d6a678013..fb9edb09ead7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -201,6 +201,21 @@ static struct task_struct *sched_core_find(struct rq 
> *rq, unsigned long cookie)
>   return match;
> }
> 
> +static struct task_struct *sched_core_next(struct task_struct *p, unsigned 
> long cookie)
> +{
> + struct rb_node *node = &p->core_node;
> +
> + node = rb_next(node);
> + if (!node)
> + return NULL;
> +
> + p = container_of(node, struct task_struct, core_node);
> + if (p->core_cookie != cookie)
> + return NULL;
> +
> + return p;
> +}
> +
> /*
>  * The static-key + stop-machine variable are needed such that:
>  *
> @@ -4233,7 +4248,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
> struct rq_flags *rf)
>   struct task_struct *next, *max = NULL;
>   const struct sched_class *class;
>   const struct cpumask *smt_mask;
> - int i, j, cpu;
> + int i, j, cpu, occ = 0;
>   bool need_sync;
> 
>   if (!sched_core_enabled(rq))
> @@ -4332,6 +4347,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
> struct rq_flags *rf)
>   goto done;
>   }
> 
> + if (!is_idle_task(p))
> + occ++;
> +
>   rq_i->core_pick = p;
> 
>   /*
> @@ -4357,6 +4375,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
> struct rq_flags *rf)
> 
>   cpu_rq(j)->core_pick = NULL;
>   }
> + occ = 1;
>   goto again;
>   } else {
>   /*
> @@ -4393,6 +4412,8 @@ next_class:;
>   if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
>   rq_i->core_forceidle = true;
> 
> + rq_i->core_pick->core_occupation = occ;
> +
>   if (i == cpu)
>   continue;
> 
> @@ -4408,6 +4429,114 @@ next_class:;
>   return next;
> }
> 
> +static bool try_steal_cookie(int this, int that)
> +{
> + struct rq *dst = cpu_rq(this), *src = cpu_rq(that);
> + struct task_struct *p;
> + unsigned long cookie;
> + bool success = false;
> +
> + local_irq_disable();
> + double_rq_lock(dst, src);
> +
> + cookie = dst->core->core_cookie;
> + if (!cookie)
> + goto unlock;
> +
> + if (dst->curr != dst->idle)
> + goto unlock;
> +
Could it be OK to add another fast return here?

	if (src->nr_running == 1)
		goto unlock;

When the src cpu has only one running task, there is no need to pull and no need
to do sched_core_find().

Thx.
Regards,
Jiang

> + p = sched_core_find(src, cookie);
> + if (p == src->idle)
> + goto unlock;
> +
> + do {
> + if (p == src->core_pick || p == src->curr)
> + goto next;
> +
> + if (!cpumask_test_cpu(this, &p->cpus_mask))
> + goto next;
> +
> + if (p->core_occupation > dst->idle->core_occupation)
> + goto next;
> +
> + p->on_rq = TASK_ON_RQ_MIGRATING;
> + deactivate_task(src, p, 0);
> + set_task_cpu(p, this);
> + activate_task(dst, p, 0);
> + p->on_rq = TASK_ON_RQ_QUEUED;
> +
> + resched_curr(dst);
> +
> + success = true;
> + break;
> +
> +next:
> + p = sched_core_next(p, cookie);
> + } while (p);
> +
> +unlock:
> + double_rq_unlock(dst, src);
> + local_irq_enable();
> +
> + return success;
> +}
> +
> +static bool steal_cookie_task(int cpu, struct sched_domain *sd)
> 

Re: [RFC PATCH 10/16] sched: Trivial forced-newidle balancer(Internet mail)

2020-07-20 Thread


> On Jul 20, 2020, at 4:03 PM, Li, Aubrey  wrote:
> 
> On 2020/7/20 15:23, benbjiang(蒋彪) wrote:
>> Hi, 
>> 
>>> On Jul 20, 2020, at 2:06 PM, Li, Aubrey wrote:
>>> 
>>> On 2020/7/20 12:06, benbjiang(蒋彪) wrote:
>>>> Hi,
>>>> 
>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai wrote:
>>>>> 
>>>>> From: Peter Zijlstra
>>>>> 
>>>>> When a sibling is forced-idle to match the core-cookie; search for
>>>>> matching tasks to fill the core.
>>>>> 
>>>>> rcu_read_unlock() can incur an infrequent deadlock in
>>>>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>>>>> 
>>>>> Signed-off-by: Peter Zijlstra (Intel)
>>>>> Signed-off-by: Joel Fernandes (Google)
>>>>> Acked-by: Paul E. McKenney
>>>>> ---
>>>>> include/linux/sched.h |   1 +
>>>>> kernel/sched/core.c   | 131 +-
>>>>> kernel/sched/idle.c   |   1 +
>>>>> kernel/sched/sched.h  |   6 ++
>>>>> 4 files changed, 138 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>>> index 3c8dcc5ff039..4f9edf013df3 100644
>>>>> --- a/include/linux/sched.h
>>>>> +++ b/include/linux/sched.h
>>>>> @@ -688,6 +688,7 @@ struct task_struct {
>>>>> #ifdef CONFIG_SCHED_CORE
>>>>> struct rb_nodecore_node;
>>>>> unsigned longcore_cookie;
>>>>> +unsigned intcore_occupation;
>>>>> #endif
>>>>> 
>>>>> #ifdef CONFIG_CGROUP_SCHED
>>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>>> index 4d6d6a678013..fb9edb09ead7 100644
>>>>> --- a/kernel/sched/core.c
>>>>> +++ b/kernel/sched/core.c
>>>>> @@ -201,6 +201,21 @@ static struct task_struct *sched_core_find(struct rq 
>>>>> *rq, unsigned long cookie)
>>>>> return match;
>>>>> }
>>>>> 
>>>>> +static struct task_struct *sched_core_next(struct task_struct *p, 
>>>>> unsigned long cookie)
>>>>> +{
>>>>> +struct rb_node *node = &p->core_node;
>>>>> +
>>>>> +node = rb_next(node);
>>>>> +if (!node)
>>>>> +return NULL;
>>>>> +
>>>>> +p = container_of(node, struct task_struct, core_node);
>>>>> +if (p->core_cookie != cookie)
>>>>> +return NULL;
>>>>> +
>>>>> +return p;
>>>>> +}
>>>>> +
>>>>> /*
>>>>> * The static-key + stop-machine variable are needed such that:
>>>>> *
>>>>> @@ -4233,7 +4248,7 @@ pick_next_task(struct rq *rq, struct task_struct 
>>>>> *prev, struct rq_flags *rf)
>>>>> struct task_struct *next, *max = NULL;
>>>>> const struct sched_class *class;
>>>>> const struct cpumask *smt_mask;
>>>>> -int i, j, cpu;
>>>>> +int i, j, cpu, occ = 0;
>>>>> bool need_sync;
>>>>> 
>>>>> if (!sched_core_enabled(rq))
>>>>> @@ -4332,6 +4347,9 @@ pick_next_task(struct rq *rq, struct task_struct 
>>>>> *prev, struct rq_flags *rf)
>>>>> goto done;
>>>>> }
>>>>> 
>>>>> +if (!is_idle_task(p))
>>>>> +occ++;
>>>>> +
>>>>> rq_i->core_pick = p;
>>>>> 
>>>>> /*
>>>>> @@ -4357,6 +4375,7 @@ pick_next_task(struct rq *rq, struct task_struct 
>>>>> *prev, struct rq_flags *rf)
>>>>> 
>>>>> cpu_rq(j)->core_pick = NULL;
>>>>> }
>>>>> +occ = 1;
>>>>> goto again;
>>>>> } else {
>>>>> /*
>>>>> @@ -4393,6 +4412,8 @@ next_class:;
>>>>> if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
>>>>> rq_i->core_forceidle = true;
>>>>> 
>>>>>

Re: [RFC PATCH 10/16] sched: Trivial forced-newidle balancer(Internet mail)

2020-07-19 Thread
Hi,

> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai  
> wrote:
> 
> From: Peter Zijlstra 
> 
> When a sibling is forced-idle to match the core-cookie; search for
> matching tasks to fill the core.
> 
> rcu_read_unlock() can incur an infrequent deadlock in
> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Joel Fernandes (Google) 
> Acked-by: Paul E. McKenney 
> ---
> include/linux/sched.h |   1 +
> kernel/sched/core.c   | 131 +-
> kernel/sched/idle.c   |   1 +
> kernel/sched/sched.h  |   6 ++
> 4 files changed, 138 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 3c8dcc5ff039..4f9edf013df3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -688,6 +688,7 @@ struct task_struct {
> #ifdef CONFIG_SCHED_CORE
>   struct rb_node  core_node;
>   unsigned long   core_cookie;
> + unsigned intcore_occupation;
> #endif
> 
> #ifdef CONFIG_CGROUP_SCHED
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4d6d6a678013..fb9edb09ead7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -201,6 +201,21 @@ static struct task_struct *sched_core_find(struct rq 
> *rq, unsigned long cookie)
>   return match;
> }
> 
> +static struct task_struct *sched_core_next(struct task_struct *p, unsigned 
> long cookie)
> +{
> + struct rb_node *node = &p->core_node;
> +
> + node = rb_next(node);
> + if (!node)
> + return NULL;
> +
> + p = container_of(node, struct task_struct, core_node);
> + if (p->core_cookie != cookie)
> + return NULL;
> +
> + return p;
> +}
> +
> /*
>  * The static-key + stop-machine variable are needed such that:
>  *
> @@ -4233,7 +4248,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
> struct rq_flags *rf)
>   struct task_struct *next, *max = NULL;
>   const struct sched_class *class;
>   const struct cpumask *smt_mask;
> - int i, j, cpu;
> + int i, j, cpu, occ = 0;
>   bool need_sync;
> 
>   if (!sched_core_enabled(rq))
> @@ -4332,6 +4347,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
> struct rq_flags *rf)
>   goto done;
>   }
> 
> + if (!is_idle_task(p))
> + occ++;
> +
>   rq_i->core_pick = p;
> 
>   /*
> @@ -4357,6 +4375,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
> struct rq_flags *rf)
> 
>   cpu_rq(j)->core_pick = NULL;
>   }
> + occ = 1;
>   goto again;
>   } else {
>   /*
> @@ -4393,6 +4412,8 @@ next_class:;
>   if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
>   rq_i->core_forceidle = true;
> 
> + rq_i->core_pick->core_occupation = occ;
> +
>   if (i == cpu)
>   continue;
> 
> @@ -4408,6 +4429,114 @@ next_class:;
>   return next;
> }
> 
> +static bool try_steal_cookie(int this, int that)
> +{
> + struct rq *dst = cpu_rq(this), *src = cpu_rq(that);
> + struct task_struct *p;
> + unsigned long cookie;
> + bool success = false;
> +
> + local_irq_disable();
> + double_rq_lock(dst, src);
> +
> + cookie = dst->core->core_cookie;
> + if (!cookie)
> + goto unlock;
> +
> + if (dst->curr != dst->idle)
> + goto unlock;
> +
> + p = sched_core_find(src, cookie);
> + if (p == src->idle)
> + goto unlock;
> +
> + do {
> + if (p == src->core_pick || p == src->curr)
> + goto next;
> +
> + if (!cpumask_test_cpu(this, &p->cpus_mask))
> + goto next;
> +
> + if (p->core_occupation > dst->idle->core_occupation)
> + goto next;
> +
> + p->on_rq = TASK_ON_RQ_MIGRATING;
> + deactivate_task(src, p, 0);
> + set_task_cpu(p, this);
> + activate_task(dst, p, 0);
> + p->on_rq = TASK_ON_RQ_QUEUED;
> +
> + resched_curr(dst);
> +
> + success = true;
> + break;
> +
> +next:
> + p = sched_core_next(p, cookie);
> + } while (p);
> +
> +unlock:
> + double_rq_unlock(dst, src);
> + local_irq_enable();
> +
> + return success;
> +}
> +
> +static bool steal_cookie_task(int cpu, struct sched_domain *sd)
> +{
> + int i;
> +
> + for_each_cpu_wrap(i, sched_domain_span(sd), cpu) {
Since (i == cpu) should be skipped, should we start the iteration at cpu + 1? Like:

	for_each_cpu_wrap(i, sched_domain_span(sd), cpu + 1) {
  

Re: [PATCH RFC] sched: Add a per-thread core scheduling interface(Internet mail)

2020-05-20 Thread



> On May 21, 2020, at 6:26 AM, Joel Fernandes (Google)  
> wrote:
> 
> Add a per-thread core scheduling interface which allows a thread to tag
> itself and enable core scheduling. Based on discussion at OSPM with
> maintainers, we propose a prctl(2) interface accepting values of 0 or 1.
> 1 - enable core scheduling for the task.
> 0 - disable core scheduling for the task.
> 
> Special cases:
> (1)
> The core-scheduling patchset contains a CGroup interface as well. In
> order for us to respect users of that interface, we avoid overriding the
> tag if a task was CGroup-tagged because the task becomes inconsistent
> with the CGroup tag. Instead return -EBUSY.
> 
> (2)
> If a task is prctl-tagged, allow the CGroup interface to override
> the task's tag.
> 
> ChromeOS will use core-scheduling to securely enable hyperthreading.
> This cuts down the keypress latency in Google docs from 150ms to 50ms
> while improving the camera streaming frame rate by ~3%.
Hi,
Are the performance improvements measured against the hyperthreading-disabled
scenario, or not?
Could you help explain how the keypress latency improvement comes from
core scheduling?

Thanks a lot.

Regards,
Jiang



Re: [PATCH] rcu/tree_exp: cleanup initialized but not used rdp(Internet mail)

2019-04-23 Thread


> On Apr 23, 2019, at 4:06 PM, Paul E. McKenney wrote:
> 
> On Tue, Apr 23, 2019 at 09:21:55AM +0800, Jiang Biao wrote:
>> rdp is initialized but never used in synchronize_rcu_expedited(),
>> just remove it.
>> 
>> Signed-off-by: Jiang Biao 
> 
> I queued and pushed both patches for testing and review, good eyes,
> thank you!  I reworded the subject and commit logs, so could you please
> check to make sure that I didn't mess anything up?
> 
>   Thanx, Paul
> 
Already checked, everything’s good to me. Thanks a lot.
Jiang,

>> ---
>> kernel/rcu/tree_exp.h | 2 --
>> 1 file changed, 2 deletions(-)
>> 
>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>> index 4c2a0189e748..5772612379e4 100644
>> --- a/kernel/rcu/tree_exp.h
>> +++ b/kernel/rcu/tree_exp.h
>> @@ -733,7 +733,6 @@ static void sync_sched_exp_online_cleanup(int cpu)
>>  */
>> void synchronize_rcu_expedited(void)
>> {
>> -struct rcu_data *rdp;
>>  struct rcu_exp_work rew;
>>  struct rcu_node *rnp;
>>  unsigned long s;
>> @@ -770,7 +769,6 @@ void synchronize_rcu_expedited(void)
>>  }
>> 
>>  /* Wait for expedited grace period to complete. */
>> -rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
>>  rnp = rcu_get_root();
>>  wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
>> sync_exp_work_done(s));
>> -- 
>> 2.17.2 (Apple Git-113)
>> 
> 
>