Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
On Fri, 2013-02-15 at 12:51 +0100, Peter Zijlstra wrote:
> On Wed, 2013-02-13 at 14:05 -0500, Steven Rostedt wrote:
>
> Ah, so that's push_rt_task() which wants to move extra rt tasks to other
> cpus. Doing that from where we have idle_balance() won't actually work I
> think since we might need to move current, which we cannot at that point
> -- I'm thinking a higher prio task (than current) waking to this cpu and
> then cascading current to another cpu, can that happen?

Yep, that's exactly why we do the push after the switch. It also catches
a bunch of races that can happen if, in the middle of the switch, another
CPU drops in priority. A drop in priority can cause that CPU to search
for pending rt tasks on other CPUs, but because current hasn't been taken
off the queue yet, it can miss trying to pull it, and current could get
stuck on its CPU even though there's another CPU it could run on. Having
the higher priority task do the check solves that race.

> If we never need to migrate current because we don't do the cascade by
> ensuring we wake the higher prio task to the appropriate cpu we might
> just get away with it.

I'm not exactly sure what you mean by this. Migrate the higher task? But
what if it's pinned? And we used to do that, until we found out that
higher priority tasks were ping-ponging around like crazy.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
On Wed, 2013-02-13 at 14:05 -0500, Steven Rostedt wrote:
> That is, the CPU is about to go idle, thus a load balance is done, and
> perhaps a task is pulled to the current queue. To do this, rq locks and
> such need to be grabbed across CPUs.

Right, grabbing the rq locks and all isn't my main worry, we do that in
either case, but my worry was the two extra switches we do for no good
reason at all.

Now it's not as if we'll actually run the idle thread, that would be very
expensive indeed, so it's just the two context_switch() calls, but still,
I somehow remember us spending quite a lot of effort to keep idle_balance
where it is now, if only I could remember the benchmark we had for it :/

Can't you do the opposite and fold post_schedule() into idle_balance()?

/me goes stare at code to help remember what the -rt requirements were
again..

Ah, so that's push_rt_task() which wants to move extra rt tasks to other
cpus. Doing that from where we have idle_balance() won't actually work I
think since we might need to move current, which we cannot at that point
-- I'm thinking a higher prio task (than current) waking to this cpu and
then cascading current to another cpu, can that happen?

If we never need to migrate current because we don't do the cascade by
ensuring we wake the higher prio task to the appropriate cpu we might
just get away with it.
Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
On Wed, 2013-02-13 at 19:43 +0100, Peter Zijlstra wrote:
> How does that follow? We can have to-idle switches _far_ more often than
> we balance.

I ran "perf stat -a -r 100 hackbench 500" on an i7 4 core hyperthreaded
box, and got the following results:

With no patches applied:

 Performance counter stats for '/work/c/hackbench 500' (100 runs):

     199820.045583 task-clock                #    8.016 CPUs utilized            ( +- 5.29% ) [100.00%]
         3,594,264 context-switches          #    0.018 M/sec                    ( +- 5.94% ) [100.00%]
           352,240 cpu-migrations            #    0.002 M/sec                    ( +- 3.31% ) [100.00%]
         1,006,732 page-faults               #    0.005 M/sec                    ( +- 0.56% )
   293,801,912,874 cycles                    #    1.470 GHz                      ( +- 4.20% ) [100.00%]
   261,808,125,109 stalled-cycles-frontend   #   89.11% frontend cycles idle     ( +- 4.38% ) [100.00%]
   <not supported> stalled-cycles-backend
   135,521,344,089 instructions              #    0.46  insns per cycle
                                             #    1.93  stalled cycles per insn  ( +- 4.37% ) [100.00%]
    26,198,116,586 branches                  #  131.109 M/sec                    ( +- 4.59% ) [100.00%]
       115,326,812 branch-misses             #    0.44% of all branches          ( +- 4.12% )

      24.929136087 seconds time elapsed                                          ( +- 5.31% )

With the two idle_balance patches applied:

 Performance counter stats for '/work/c/hackbench 500' (100 runs):

     178619.929457 task-clock                #    8.011 CPUs utilized            ( +- 4.16% ) [100.00%]
         3,171,229 context-switches          #    0.018 M/sec                    ( +- 3.30% ) [100.00%]
           323,115 cpu-migrations            #    0.002 M/sec                    ( +- 2.47% ) [100.00%]
           994,506 page-faults               #    0.006 M/sec                    ( +- 0.52% )
   269,744,391,573 cycles                    #    1.510 GHz                      ( +- 2.12% ) [100.00%]
   238,589,242,461 stalled-cycles-frontend   #   88.45% frontend cycles idle     ( +- 2.26% ) [100.00%]
   <not supported> stalled-cycles-backend
   124,298,251,712 instructions              #    0.46  insns per cycle
                                             #    1.92  stalled cycles per insn  ( +- 1.99% ) [100.00%]
    23,918,305,712 branches                  #  133.906 M/sec                    ( +- 2.13% ) [100.00%]
       105,863,415 branch-misses             #    0.44% of all branches          ( +- 2.14% )

      22.296151996 seconds time elapsed                                          ( +- 4.18% )

And finally with the patch at the bottom of this email applied:

 Performance counter stats for '/work/c/hackbench 500' (100 runs):

     170170.547682 task-clock                #    8.021 CPUs utilized            ( +- 5.59% ) [100.00%]
         3,118,923 context-switches          #    0.018 M/sec                    ( +- 4.82% ) [100.00%]
           318,479 cpu-migrations            #    0.002 M/sec                    ( +- 2.58% ) [100.00%]
           988,187 page-faults               #    0.006 M/sec                    ( +- 0.62% )
   271,343,352,987 cycles                    #    1.595 GHz                      ( +- 3.84% ) [100.00%]
   240,599,089,430 stalled-cycles-frontend   #   88.67% frontend cycles idle     ( +- 4.24% ) [100.00%]
   <not supported> stalled-cycles-backend
   125,888,645,748 instructions              #    0.46  insns per cycle
                                             #    1.91  stalled cycles per insn  ( +- 3.97% ) [100.00%]
    24,219,147,811 branches                  #  142.323 M/sec                    ( +- 4.22% ) [100.00%]
       105,077,636 branch-misses             #    0.43% of all branches          ( +- 3.70% )

      21.214840224 seconds time elapsed                                          ( +- 5.61% )

Yeah, it seems the extra check for rq empty helps. But even without that,
the current idle patches still seem pretty good.

-- Steve

diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 66b5220..025350b 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -16,7 +16,9 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 
 static void post_schedule_idle(struct rq *rq)
 {
-	idle_balance(smp_processor_id(), rq);
+	/* rq lock was released, make sure system is still idle */
+	if (likely(!rq->nr_running))
+		idle_balance(smp_processor_id(), rq);
 }
 #endif /* CONFIG_SMP */
 
 /*
Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
On Wed, 2013-02-13 at 19:43 +0100, Peter Zijlstra wrote:
> On Tue, 2013-02-12 at 17:54 -0500, Steven Rostedt wrote:
> > There's no real reason that the idle_balance() needs to be called in
> > the middle of schedule anyway. The only benefit is that if a task is
> > pulled to this CPU, it can be scheduled without the need to schedule
> > the idle task.
>
> Uhm, istr that extra schedule being an issue somewhere.. Make very sure
> you don't regress anything silly like sysbench or hackbench. Maybe ask
> Mike, he seems to have a better retention for benchmark weirdness than
> me.

I'm all for benchmarks :-)

> > But load balancing and migrating the task makes a switch to idle
> > and back negligible.
>
> How does that follow? We can have to-idle switches _far_ more often than
> we balance.

But we have the to-idle switch regardless in that case, don't we? That
is, the CPU is about to go idle, thus a load balance is done, and perhaps
a task is pulled to the current queue. To do this, rq locks and such need
to be grabbed across CPUs. If it didn't balance, the switch would happen
anyway.

The current method is:

	if (!rq->nr_running)
		idle_balance();

	/* if something pulled, then run that instead,
	   if not, continue to switch to idle. */

What this change did was:

	pick_next_task_idle()
		rq->post_schedule = 1;

	post_schedule()
		idle_balance();

Now if a balance occurred, we would have to switch back from idle to what
we just pulled. Hence the change is that a switch to and from idle is
done only in the case that load balancing occurred; otherwise it just
goes to idle like it always has.

Or perhaps I need to add another check to make sure rq->nr_running is
still 0 before doing the balance, because the rq lock was released. That
is:

	post_schedule()
		if (!rq->nr_running)
			idle_balance();

needs to be added.

-- Steve
Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
On Tue, 2013-02-12 at 17:54 -0500, Steven Rostedt wrote:
> There's no real reason that the idle_balance() needs to be called in the
> middle of schedule anyway. The only benefit is that if a task is pulled
> to this CPU, it can be scheduled without the need to schedule the idle
> task.

Uhm, istr that extra schedule being an issue somewhere.. Make very sure
you don't regress anything silly like sysbench or hackbench. Maybe ask
Mike, he seems to have a better retention for benchmark weirdness than
me.

> But load balancing and migrating the task makes a switch to idle
> and back negligible.

How does that follow? We can have to-idle switches _far_ more often than
we balance.
[PATCH 2/3] sched: Move idle_balance() to post_schedule
From: "Steven Rostedt (Red Hat)"

The idle_balance() code is called to do task load balancing just before
going to idle. This makes sense as the CPU is about to sleep anyway. But
currently it's called in the middle of the scheduler and in a place that
must have interrupts disabled. That means, while the load balancing is
going on, if a task wakes up on this CPU, it won't get to run while the
interrupts are disabled. The idle task doing the balancing will be
clueless about it.

There's no real reason that the idle_balance() needs to be called in the
middle of schedule anyway. The only benefit is that if a task is pulled
to this CPU, it can be scheduled without the need to schedule the idle
task. But load balancing and migrating the task makes a switch to idle
and back negligible.

By using the post_schedule function pointer of the sched class, the
unlikely branch in the hot path of the scheduler can be removed, and the
idle task itself can do the load balancing.

Another advantage of this is that by moving the idle_balance to the
post_schedule routine, interrupts can now be enabled in the load balance,
allowing for interrupts and wakeups to still occur on that CPU while a
balance is taking place. The enabling of interrupts will come as a
separate patch.
Cc: Peter Zijlstra
Signed-off-by: Steven Rostedt
---
 kernel/sched/core.c      |  3 ---
 kernel/sched/idle_task.c | 10 ++++++++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1dff78a..a9317b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2927,9 +2927,6 @@ need_resched:
 
 	pre_schedule(rq, prev);
 
-	if (unlikely(!rq->nr_running))
-		idle_balance(cpu, rq);
-
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
 	clear_tsk_need_resched(prev);

diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index b6baf37..66b5220 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -13,6 +13,11 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
+
+static void post_schedule_idle(struct rq *rq)
+{
+	idle_balance(smp_processor_id(), rq);
+}
 #endif /* CONFIG_SMP */
 
 /*
  * Idle tasks are unconditionally rescheduled:
@@ -25,6 +30,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *pick_next_task_idle(struct rq *rq)
 {
 	schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+	/* Trigger the post schedule to do an idle_balance */
+	rq->post_schedule = 1;
+#endif
 	return rq->idle;
 }
@@ -86,6 +95,7 @@ const struct sched_class idle_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_idle,
+	.post_schedule		= post_schedule_idle,
 #endif
 
 	.set_curr_task		= set_curr_task_idle,
-- 
1.7.10.4