Haven't finished (hardly started) the arch code for this yet (and probably
broke ppc64), so it is just an RFC at this stage.
The large CC list is because it is a reasonably big change, so I hope one
or two of you smart guys can see if it looks sound before I try to tackle
the rest of the arch code and start asking for testers.
This actually improves performance noticably (ie. a % or so) on schedule /
wakeup happy benchmarks (tbench, on a dual Xeon with HT using mwait idle).
--
SUSE Labs, Novell Inc.
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid. Also have preempt explicitly
disabled in idle routines. Improves efficiency of resched_task and some
cpu_idle routines.
* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and as we hold it during resched_task, then there is no need for an
atomic test and set there. (The only time this may prevent an IPI is
when the task's quantum expires in the timer interrupt - this is a
very rare race to bother with in comparison with the cost).
- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
won't get unset until the task get's schedule()d off.
- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.
- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.
Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of POLLING_NRFLAG.
* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), but haven't updated all architectures yet.
- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
to the resched_task rules, this isn't needed (and actually breaks the
assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
held). So remove that. Generally one less locked memory op when switching
to the idle thread.
- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
most polling idle loops. The above resched_task semantics allow it to be
set until before the last time need_resched() is checked before going into
a halt requiring interrupt wakeup.
Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
can be always left set, completely eliminating resched IPIs when rescheduling
the idle task.
POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c 2005-03-27 00:25:49.0 +1100
+++ linux-2.6/kernel/sched.c2005-03-27 00:27:30.0 +1100
@@ -796,21 +796,28 @@ static void deactivate_task(struct task_
#ifdef CONFIG_SMP
static void resched_task(task_t *p)
{
- int need_resched, nrpolling;
+ int cpu;
assert_spin_locked(&task_rq(p)->lock);
- /* minimise the chance of sending an interrupt to poll_idle() */
- nrpolling = test_tsk_thread_flag(p,TIF_POLLING_NRFLAG);
- need_resched = test_and_set_tsk_thread_flag(p,TIF_NEED_RESCHED);
- nrpolling |= test_tsk_thread_flag(p,TIF_POLLING_NRFLAG);
+ if (test_tsk_thread_flag(p, TIF_NEED_RESCHED))
+ return;
+
+ set_tsk_thread_flag(p, TIF_NEED_RESCHED);
- if (!need_resched && !nrpolling && (task_cpu(p) != smp_processor_id()))
- smp_send_reschedule(task_cpu(p));
+ cpu = task_cpu(p);
+ if (cpu == smp_processor_id())
+ return;
+
+ /* NEED_RESCHED must be visible before we test POLLING_NRFLAG */
+ smp_mb();
+ if (!test_tsk_thread_flag(p, TIF_POLLING_NRFLAG))
+ smp_send_reschedule(cpu);
}
#else
static inline void resched_task(task_t *p)
{
+ assert_spin_locked(&task_rq(p)->lock);
set_tsk_need_resched(p);
}
#endif
Index: linux-2.6/arch/i386/kernel/process.c
===
--- linux-2.6.orig/arch/i386/kernel/process.c 2005-03-27 00:25:49.0
+1100
+++ linux-2.6/arch/i386/kernel/process.c2005-03-27 00:27:30.0
+1100
@@ -95,14 +95,19 @@ EXPORT_SYMBOL(enable_hlt);
*/
void default_idle(void)
{
+ local_irq_enable();
+
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
- local_irq_disable();
- if (!need_resched())
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb__after_clear_bit();
+ while (!need_resched()) {
+ local_irq_disable();
safe_halt();
- else
- local_irq_enable();
+ }
+ set_th