Re: [RFD/RFC PATCH 5/8] sched: Add proxy execution

2018-10-12 Thread Juri Lelli
On 12/10/18 09:22, luca abeni wrote:
> On Thu, 11 Oct 2018 14:53:25 +0200
> Peter Zijlstra  wrote:
> 
> [...]
> > > > > + if (rq->curr != rq->idle) {
> > > > > + rq->proxy = rq->idle;
> > > > > + set_tsk_need_resched(rq->idle);
> > > > > + /*
> > > > > +  * XXX [juril] don't we still need to migrate @next to
> > > > > +  * @owner's CPU?
> > > > > +  */
> > > > > + return rq->idle;
> > > > > + }  
> > > > 
> > > > If I understand well, this code ends up migrating the task only
> > > > if the CPU was previously idle? (scheduling the idle task if the
> > > > CPU was not previously idle)
> > > > 
> > > > Out of curiosity (I admit this is my ignorance), why is this
> > > > needed? If I understand well, after scheduling the idle task the
> > > > scheduler will be invoked again (because of the
> > > > set_tsk_need_resched(rq->idle)) but I do not understand why it is
> > > > not possible to migrate task "p" immediately (I would just check
> > > > "rq->curr != p", to avoid migrating the currently scheduled
> > > > task).  
> [...]
> > I think it was the safe and simple choice; note that we're not
> > migrating just a single @p, but a whole chain of @p.
> 
> Ah, that's the point I was missing... Thanks for explaining, now
> everything looks clearer!
> 
> 
> But... Here is my next dumb question: once the tasks are migrated to
> the other runqueue, what prevents the scheduler from migrating them
> back? In particular, task p: if it is (for example) a fixed priority
task and is on this runqueue, it is probably because the FP invariant
> wants this... So, the push mechanism might end up migrating p back to
> this runqueue soon... No?

Not if p is going to be proxying for owner on owner's rq.
OTOH, I guess we might have counter-migrations generated by push/pull
decisions. Maybe we should remove potential proxies from the pushable
list? We'd still have the same problem for FAIR, though.
In general it makes sense to me that potential proxies shouldn't
participate in load balancing while waiting to be activated by the
mutex owner; they are basically sleeping, even if formally they are not.
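
Something along these lines is what I have in mind; a sketch only,
assuming a hypothetical task_is_blocked() helper (not part of the
posted series) that reports whether a task sits in a mutex owner's
blocked chain:

        /*
         * Sketch: skip potential proxies when picking a task to push,
         * so the RT push/pull machinery leaves them alone while they
         * wait to be activated by the mutex owner.
         */
        static struct task_struct *pick_next_pushable_task(struct rq *rq)
        {
                struct task_struct *p;

                plist_for_each_entry(p, &rq->rt.pushable_tasks, pushable_tasks) {
                        if (task_is_blocked(p)) /* hypothetical helper */
                                continue;
                        return p;
                }

                return NULL;
        }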

> Another doubt: if I understand well, when a task p "blocks" on a mutex
> the proxy mechanism migrates it (and the whole chain of blocked tasks)
> to the owner's core... Right?
> Now, I understand why this is simpler to implement, but from the
> schedulability point of view shouldn't we migrate the owner to p's core
> instead?

Guess the most important reason is that we need to respect owner's
affinity; p (and the rest of the chain) might have an affinity mask
that doesn't work well with owner's.
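
To make the affinity constraint concrete, a minimal sketch (the helper
itself is made up for illustration; cpus_allowed as in the posted
series):

        /*
         * Migrating owner towards p's CPU is only legal if owner's
         * affinity mask allows that CPU; each task in the blocked
         * chain can have a different mask, while migrating the chain
         * towards owner only needs owner to keep running where it
         * already is.
         */
        static bool owner_can_run_on(struct task_struct *owner, int cpu)
        {
                return cpumask_test_cpu(cpu, &owner->cpus_allowed);
        }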


Re: [RFD/RFC PATCH 5/8] sched: Add proxy execution

2018-10-12 Thread luca abeni
On Thu, 11 Oct 2018 14:53:25 +0200
Peter Zijlstra  wrote:

[...]
> > > > +   if (rq->curr != rq->idle) {
> > > > +   rq->proxy = rq->idle;
> > > > +   set_tsk_need_resched(rq->idle);
> > > > +   /*
> > > > +* XXX [juril] don't we still need to migrate @next to
> > > > +* @owner's CPU?
> > > > +*/
> > > > +   return rq->idle;
> > > > +   }  
> > > 
> > > If I understand well, this code ends up migrating the task only
> > > if the CPU was previously idle? (scheduling the idle task if the
> > > CPU was not previously idle)
> > > 
> > > Out of curiosity (I admit this is my ignorance), why is this
> > > needed? If I understand well, after scheduling the idle task the
> > > scheduler will be invoked again (because of the
> > > set_tsk_need_resched(rq->idle)) but I do not understand why it is
> > > not possible to migrate task "p" immediately (I would just check
> > > "rq->curr != p", to avoid migrating the currently scheduled
> > > task).  
[...]
> I think it was the safe and simple choice; note that we're not
> migrating just a single @p, but a whole chain of @p.

Ah, that's the point I was missing... Thanks for explaining, now
everything looks clearer!


But... Here is my next dumb question: once the tasks are migrated to
the other runqueue, what prevents the scheduler from migrating them
back? In particular, task p: if it is (for example) a fixed priority
task and is on this runqueue, it is probably because the FP invariant
wants this... So, the push mechanism might end up migrating p back to
this runqueue soon... No?

Another doubt: if I understand well, when a task p "blocks" on a mutex
the proxy mechanism migrates it (and the whole chain of blocked tasks)
to the owner's core... Right?
Now, I understand why this is simpler to implement, but from the
schedulability point of view shouldn't we migrate the owner to p's core
instead?


Thanks,
Luca

> rq->curr must
> not be any of the possible @p's. rq->idle is, by definition, not one
> of the @p's.
> 
> Does that make sense?



Re: [RFD/RFC PATCH 5/8] sched: Add proxy execution

2018-10-11 Thread Juri Lelli
On 11/10/18 14:53, Peter Zijlstra wrote:

[...]

> I think it was the safe and simple choice; note that we're not migrating
> just a single @p, but a whole chain of @p. rq->curr must not be any of the
> possible @p's. rq->idle is, by definition, not one of the @p's.
> 
> Does that make sense?

It does, and I guess it is most probably the safest choice indeed. But,
just to put together a proper comment for the next version...

The chain we are migrating is composed of blocked_task(s), so tasks that
blocked on a mutex owned by @p. Now, if such a chain has been built, it
means that proxy() has been called "for" @p previously, and @p might be
curr while one of its waiters might be proxy. So, none of the
blocked_task(s) should be rq->curr (even if one of their scheduling
contexts might be in use)?
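
IOW, the comment could simply back the check the patch already does
while walking the chain (a sketch, same blocked_task linkage as in the
posting):

        for (; p; p = p->blocked_task) {
                /*
                 * Chain members are blocked on a mutex, so none of
                 * them can be what is currently running on this rq,
                 * even if one of their scheduling contexts is in use
                 * by the proxy.
                 */
                WARN_ON(p == rq->curr);
        }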


Re: [RFD/RFC PATCH 5/8] sched: Add proxy execution

2018-10-11 Thread Peter Zijlstra
On Thu, Oct 11, 2018 at 02:34:48PM +0200, Juri Lelli wrote:
> Hi Luca,
> 
> On 10/10/18 13:10, luca abeni wrote:
> > Hi,
> > 
> > On Tue,  9 Oct 2018 11:24:31 +0200
> > Juri Lelli  wrote:
> > [...]
> > > +migrate_task:
> > [...]
> > > + put_prev_task(rq, next);
> > > + if (rq->curr != rq->idle) {
> > > + rq->proxy = rq->idle;
> > > + set_tsk_need_resched(rq->idle);
> > > + /*
> > > +  * XXX [juril] don't we still need to migrate @next to
> > > +  * @owner's CPU?
> > > +  */
> > > + return rq->idle;
> > > + }
> > 
> > If I understand well, this code ends up migrating the task only if the
> > CPU was previously idle? (scheduling the idle task if the CPU was not
> > previously idle)
> > 
> > Out of curiosity (I admit this is my ignorance), why is this needed?
> > If I understand well, after scheduling the idle task the scheduler will
> > be invoked again (because of the set_tsk_need_resched(rq->idle)) but I
> > do not understand why it is not possible to migrate task "p" immediately
> > (I would just check "rq->curr != p", to avoid migrating the currently
> > scheduled task).
> 
> As the comment suggests, I was also puzzled by this bit.
> 
> I'd be inclined to agree with you, it seems that the only case in which
> we want to "temporarily" schedule the idle task is if the proxy was
> executing (so it just blocked on the mutex and is being scheduled out).
> 
> If it wasn't, we should be able to let the current "curr" continue
> executing; in this case, returning it as next will mean that schedule()
> takes the else branch and there isn't an actual context switch.
> 
> > > + rq->proxy = &fake_task;
> 
> [...]
> 
> We can maybe also set rq->proxy = rq->curr and return rq->curr in such
> a case, instead of the below?
> 
> > > + return NULL; /* Retry task selection on _this_ CPU. */
> 
> Peter, what are we missing? :-)

I think it was the safe and simple choice; note that we're not migrating
just a single @p, but a whole chain of @p. rq->curr must not be any of the
possible @p's. rq->idle is, by definition, not one of the @p's.

Does that make sense?
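
That is, the quoted hunk implements a two-pass scheme (comments added
here for illustration only):

        if (rq->curr != rq->idle) {
                /*
                 * Pass 1: park the CPU on the idle task for one pick
                 * and ask for an immediate resched; idle can, by
                 * definition, never be part of a blocked chain.
                 */
                rq->proxy = rq->idle;
                set_tsk_need_resched(rq->idle);
                return rq->idle;
        }
        /*
         * Pass 2: rq->curr == rq->idle, so the whole chain can now be
         * migrated without racing with a running chain member.
         */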


Re: [RFD/RFC PATCH 5/8] sched: Add proxy execution

2018-10-11 Thread Juri Lelli
Hi Luca,

On 10/10/18 13:10, luca abeni wrote:
> Hi,
> 
> On Tue,  9 Oct 2018 11:24:31 +0200
> Juri Lelli  wrote:
> [...]
> > +migrate_task:
> [...]
> > +   put_prev_task(rq, next);
> > +   if (rq->curr != rq->idle) {
> > +   rq->proxy = rq->idle;
> > +   set_tsk_need_resched(rq->idle);
> > +   /*
> > +* XXX [juril] don't we still need to migrate @next to
> > +* @owner's CPU?
> > +*/
> > +   return rq->idle;
> > +   }
> 
> If I understand well, this code ends up migrating the task only if the
> CPU was previously idle? (scheduling the idle task if the CPU was not
> previously idle)
> 
> Out of curiosity (I admit this is my ignorance), why is this needed?
> If I understand well, after scheduling the idle task the scheduler will
> be invoked again (because of the set_tsk_need_resched(rq->idle)) but I
> do not understand why it is not possible to migrate task "p" immediately
> (I would just check "rq->curr != p", to avoid migrating the currently
> scheduled task).

As the comment suggests, I was also puzzled by this bit.

I'd be inclined to agree with you, it seems that the only case in which
we want to "temporarily" schedule the idle task is if the proxy was
executing (so it just blocked on the mutex and is being scheduled out).

If it wasn't, we should be able to let the current "curr" continue
executing; in this case, returning it as next will mean that schedule()
takes the else branch and there isn't an actual context switch.

> > +   rq->proxy = &fake_task;

[...]

We can maybe also set rq->proxy = rq->curr and return rq->curr in such a
case, instead of the below?

> > +   return NULL; /* Retry task selection on _this_ CPU. */

Peter, what are we missing? :-)
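
Something like this, I mean (untested sketch):

        if (rq->curr != rq->idle) {
                /*
                 * Assuming curr cannot be part of the blocked chain
                 * (the open question above), let it keep running:
                 * schedule() then sees next == curr and skips the
                 * actual context switch.
                 */
                rq->proxy = rq->curr;
                return rq->curr;
        }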


Re: [RFD/RFC PATCH 5/8] sched: Add proxy execution

2018-10-10 Thread luca abeni
Hi,

On Tue,  9 Oct 2018 11:24:31 +0200
Juri Lelli  wrote:
[...]
> +migrate_task:
[...]
> + put_prev_task(rq, next);
> + if (rq->curr != rq->idle) {
> + rq->proxy = rq->idle;
> + set_tsk_need_resched(rq->idle);
> + /*
> +  * XXX [juril] don't we still need to migrate @next to
> +  * @owner's CPU?
> +  */
> + return rq->idle;
> + }

If I understand well, this code ends up migrating the task only if the
CPU was previously idle? (scheduling the idle task if the CPU was not
previously idle)

Out of curiosity (I admit this is my ignorance), why is this needed?
If I understand well, after scheduling the idle task the scheduler will
be invoked again (because of the set_tsk_need_resched(rq->idle)) but I
do not understand why it is not possible to migrate task "p" immediately
(I would just check "rq->curr != p", to avoid migrating the currently
scheduled task).


Thanks,
Luca

> +     rq->proxy = &fake_task;
> +
> +     for (; p; p = p->blocked_task) {
> +             int wake_cpu = p->wake_cpu;
> +
> +             WARN_ON(p == rq->curr);
> +
> +             p->on_rq = TASK_ON_RQ_MIGRATING;
> +             dequeue_task(rq, p, 0);
> +             set_task_cpu(p, that_cpu);
> +             /*
> +              * We can abuse blocked_entry to migrate the thing, because
> +              * @p is still on the rq.
> +              */
> +             list_add(&p->blocked_entry, &migrate_list);
> +
> +             /*
> +              * Preserve p->wake_cpu, such that we can tell where it
> +              * used to run later.
> +              */
> +             p->wake_cpu = wake_cpu;
> +     }
> +
> +     rq_unpin_lock(rq, rf);
> +     raw_spin_unlock(&rq->lock);
> +     raw_spin_lock(&that_rq->lock);
> +
> +     while (!list_empty(&migrate_list)) {
> +             p = list_first_entry(&migrate_list, struct task_struct,
> +                                  blocked_entry);
> +             list_del_init(&p->blocked_entry);
> +
> +             enqueue_task(that_rq, p, 0);
> +             check_preempt_curr(that_rq, p, 0);
> +             p->on_rq = TASK_ON_RQ_QUEUED;
> +             resched_curr(that_rq);
> +     }
> +
> +     raw_spin_unlock(&that_rq->lock);
> +     raw_spin_lock(&rq->lock);
> +     rq_repin_lock(rq, rf);
> +
> +     return NULL; /* Retry task selection on _this_ CPU. */
> +
> +owned_task:
> +     /*
> +      * It's possible we interleave with mutex_unlock like:
> +      *
> +      *                          lock(&rq->lock);
> +      *                            proxy()
> +      * mutex_unlock()
> +      *   lock(&wait_lock);
> +      *   next(owner) = current->blocked_task;
> +      *   unlock(&wait_lock);
> +      *
> +      *   wake_up_q();
> +      *     ...
> +      *       ttwu_remote()
> +      *         __task_rq_lock()
> +      *                          lock(&wait_lock);
> +      *                          owner == p
> +      *
> +      * Which leaves us to finish the ttwu_remote() and make it go.
> +      *
> +      * XXX is this happening in case of a HANDOFF to p?
> +      * In any case, reading of the owner in __mutex_unlock_slowpath is
> +      * done atomically outside wait_lock (only adding waiters to wake_q
> +      * is done inside the critical section).
> +      * Does this mean we can get to proxy _w/o an owner_ if that was
> +      * cleared before grabbing wait_lock? Do we account for this case?
> +      * OK we actually do (see PROXY_EXEC ifdeffery in unlock function).
> +      */
> +
> +     /*
> +      * Finish wakeup, will make the contending ttwu do a
> +      * _spurious_ wakeup, but all code should be able to
> +      * deal with that.
> +      */
> +     owner->blocked_on = NULL;
> +     owner->state = TASK_RUNNING;
> +     // XXX task_woken
> +
> +     /*
> +      * If @owner/@p is allowed to run on this CPU, make it go.
> +      */
> +     if (cpumask_test_cpu(this_cpu, &owner->cpus_allowed)) {
> +             raw_spin_unlock(&mutex->wait_lock);
> +             return owner;
> +     }
> +
> +     /*
> +      * We have to let ttwu fix things up, because we
> +      * can't restore the affinity. So dequeue.
> +      */
> +     owner->on_rq = 0;
> +     deactivate_task(rq, p, DEQUEUE_SLEEP);
> +     goto blocked_task;
> +
> +blocked_task:
> +     /*
> +      * If !@owner->on_rq, holding @rq->lock will not pin the task,
> +      * so we cannot drop @mutex->wait_lock until we're sure it's a
> +      * blocked task on this rq.
> +      *
> +      * We use @owner->blocked_lock to serialize against ttwu_activate().
> +      * Either we see its new owner->on_rq or it will see our list_add().
> +      */
> +     raw_spin_lock(&owner->blocked_lock);
> +
> +     /*
> +      * If we became runnable while waiting for blocked_lock, retry.
> +      */
> +     if (owner->on_rq) {
> +             /*
> +              * If we see the new on->rq, we must also see the new
> +              * task_cpu().
> +              */
> + 
