Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-07 Thread Ingo Molnar

* Thomas Gleixner  wrote:

> On Thu, 6 Apr 2017, Ingo Molnar wrote:
> > CPU hotplug and changing the affinity mask are the more complex cases,
> > because there migrating or not migrating is a correctness issue:
> > 
> >  - CPU hotplug has to be aware of this anyway, regardless of whether it's
> >    solved via a counter or the affinity mask.
> 
> You have to prevent CPU hotplug simply as long as there are migration
> disabled tasks on the fly. Making that depend on whether they are on a CPU
> which is about to be unplugged or not would be complete overkill as you
> still have to solve the case that a task sets the migrate_disable() AFTER
> the cpu down machinery started.
>
> [...]
>
> The counter alone might be enough for the scheduler placement decisions,
> but it cannot solve the hotplug issue. You still need something like I
> sketched out in my previous reply.

Yes, so what you outlined:

void migrate_disable(void)
{
        if (in_atomic() || irqs_disabled())
                return;

        if (!current->migration_disabled) {
                percpu_down_read_preempt_disable(hotplug_rwsem);
                current->migration_disabled++;
                preempt_enable();
        } else {
                current->migration_disabled++;
        }
}

Would solve it?

I.e. my point is: whether migrate_disable()/enable() is implemented via a
counter or a pointer to a cpumask does not materially change what the
CPU-hotplug solution looks like, right?

I.e. we could just use the counter and avoid the whole cpumask-wrapping
complexity.
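
For reference, the enable side of the same sketch (quoted in full further
down the thread) drops the hotplug read-lock again once the outermost
migrate_enable() runs:

void migrate_enable(void)
{
        if (in_atomic() || irqs_disabled())
                return;

        if (current->migration_disabled == 1) {
                preempt_disable();
                current->migration_disabled--;
                percpu_up_read_preempt_enable(hotplug_rwsem);
        } else {
                current->migration_disabled--;
        }
}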

Thanks,

Ingo


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 01:56:12PM +0200, Thomas Gleixner wrote:

> A pointer to the effective mask is definitely enough. And that's what we
> need for migrate_disable() as well. That still leaves the storage
> requirement to MIPS.

So then you're back to Ingo's proposal, right?




Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Thomas Gleixner
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 01:03:16PM +0200, Thomas Gleixner wrote:
> 
> > Fair enough, but I prefer to have the ability for a temporary restriction
> > of the user space visible cpus allowed mask in the core code, which can be
> > used for both migrate_disable() and things like that MIPS FPU stuff rather
> > than all those home-brewed hackeries which are prone to bitrot, security
> > issues and subtle wreckage. cpus_allowed should be solely under sched core
> > control and not accessible from anything outside.
> 
> So the big difference between what MIPS does and what migrate_disable()
> needs here is the amount of storage.
> 
> For migrate_disable() we don't need a whole second cpumask bitmap. We
> can simply use one of the static cpumask_of() things.
> 
> Given that cpumask bitmaps can be quite large, adding another one to
> task_struct just for $very_rare_occasion, doesn't seem like a good
> thing.

A pointer to the effective mask is definitely enough. And that's what we
need for migrate_disable() as well. That still leaves the storage
requirement to MIPS.
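
In the RFC under discussion, that shape is a pointer next to the storage in
task_struct - a sketch, with the field names the patch uses (cpus_ptr and
cpus_mask):

struct task_struct {
        ...
        /* What the scheduler looks at; normally points at cpus_mask. */
        const cpumask_t         *cpus_ptr;
        /* The actual storage; only sched core should write it. */
        cpumask_t               cpus_mask;
        ...
};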

Thanks,

tglx




Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 01:03:16PM +0200, Thomas Gleixner wrote:

> Fair enough, but I prefer to have the ability for a temporary restriction
> of the user space visible cpus allowed mask in the core code, which can be
> used for both migrate_disable() and things like that MIPS FPU stuff rather
> than all those home brewn hackeries which are prone to bitrot, security
> issues and subtle wreckage. cpus_allowed should be solely under sched core
> control and not accessible from anything outside.

So the big difference between what MIPS does and what migrate_disable()
needs here is the amount of storage.

For migrate_disable() we don't need a whole second cpumask bitmap. We
can simply use one of the static cpumask_of() things.

Given that cpumask bitmaps can be quite large, adding another one to
task_struct just for $very_rare_occasion, doesn't seem like a good
thing.
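
Concretely, cpumask_of(cpu) yields a pointer to a static single-CPU mask, so
the pinned case costs no per-task storage at all. A sketch of that shape
(the helper name here is illustrative; it matches the tsk_cpus_allowed()
sketch that appears further down the thread):

const struct cpumask *effective_cpus(struct task_struct *p)
{
        /* Pinned tasks see a static mask; no second bitmap needed. */
        if (p->migration_disabled)
                return cpumask_of(task_cpu(p));
        return &p->cpus_allowed;
}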


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 12:58:27PM +0200, Thomas Gleixner wrote:
> On Thu, 6 Apr 2017, Peter Zijlstra wrote:

> > Because mainline will not get anything other than preempt_disable() for
> > migrate_disable().
> 
> I'm not yet convinced of that. There is enough stuff which pointlessly
> disables preemption and works around the restrictions caused by that or
> plays games with cpus_allowed just to stay on a particular cpu.
> 
> I know you don't like it because it imposes restrictions on schedulability,
> but disabling preemption for a long time or playing cpus_allowed games has
> the same and worse effects.

There really isn't much code that plays games with cpus_allowed. And code
that disables preemption for a long time can surely be fixed; we've done
so many a time.

A pure migrate_disable() (without wrapping lock) is a straight up
nightmare, you can end up with a gazillion runnable tasks stuck to one
CPU and all other CPUs with their thumbs up their arses waiting for
work.

The moment you stick migrate_disable() in a lock, you at least get PI to
bail you out of some of the pain (but by no means all of it).

Adding migrate_disable() to the kernel is really just moving pain
around, it doesn't solve anything much.


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Thomas Gleixner
On Thu, 6 Apr 2017, Ingo Molnar wrote:
> CPU hotplug and changing the affinity mask are the more complex cases,
> because there migrating or not migrating is a correctness issue:
> 
>  - CPU hotplug has to be aware of this anyway, regardless of whether it's
>    solved via a counter or the affinity mask.

You have to prevent CPU hotplug simply as long as there are migration
disabled tasks on the fly. Making that depend on whether they are on a CPU
which is about to be unplugged or not would be complete overkill as you
still have to solve the case that a task sets the migrate_disable() AFTER
the cpu down machinery started.

>  - Changing the affinity mask (set_cpus_allowed()) has two main cases:
>    the synchronous and asynchronous case:
> 
>  - synchronous is when the current task changes its own affinity mask,
>    this should work fine mostly out of box, as we don't call
>    set_cpus_allowed() from inside migration disabled regions. (We can
>    enforce this via a debugging check.)
> 
>  - The asynchronous case is when the affinity mask of some other task is
>    changed - this would not have an immediate effect with
>    migration-disabled logic, the migration would be delayed to when
>    migration is re-enabled again.
> 
> As for general fragility, is there any reason why a simple debugging check
> in set_task_cpu() would not catch most mishaps:
> 
>   WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);
> 
> ... or something like that?
>
> I.e. my point is that I think using a counter would be much simpler, yet
> still as robust and maintainable. We could in fact move
> migrate_disable()/enable() upstream straight away and eliminate this small
> fork of functionality between mainline and -rt.

The counter alone might be enough for the scheduler placement decisions,
but it cannot solve the hotplug issue. You still need something like I
sketched out in my previous reply.

Thanks,

tglx


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Thomas Gleixner
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 12:47:21PM +0200, Thomas Gleixner wrote:
> > On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> 
> > > IIRC MIPS has a case where only 1 in N cores has an FPU. And once a task
> > > uses FPU, it gets affined to the core that has one or something like
> > > that.
> > >
> > > Of course, nothing then stops someone else breaking that affinity. But I
> > > suspect it will simply fault on the next FPU instruction and 'reset' the
> > > mask or something. I've no clue and no real desire to know.
> > 
> > It does nasty games with its own storage of p->thread.user_cpus_allowed
> > and a fully separate implementation of sys_sched_set|getaffinity.
> > 
> > Plus a magic trap handler which forces the thread to a CPU with FPU when
> > the user_cpus_allowed mask intersects with the cpus_with_fpu_mask...
> > 
> > Magic crap, which could all be replaced by a simple function in the
> > scheduler which allows to push a task to a FPU CPU and then disable
> > migration.
> 
> If it's even halfway coherent, I'd much rather let it stay where it is.
> 
> I really want to limit migrate_disable() to PREEMPT_RT=y where it's used
> to preserve spinlock semantics and not allow random other
> migrate_disable() usage in the kernel.
> 
> Also note, that per the above, it can actually migrate to any core that
> has an FPU on, so it's not a good match for migrate_disable() in any
> case.

Fair enough, but I prefer to have the ability for a temporary restriction
of the user space visible cpus allowed mask in the core code, which can be
used for both migrate_disable() and things like that MIPS FPU stuff rather
than all those home-brewed hackeries which are prone to bitrot, security
issues and subtle wreckage. cpus_allowed should be solely under sched core
control and not accessible from anything outside.
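
As an interface, such a core-code facility might look roughly like this (a
sketch only; the helper names are invented here for illustration and are
not from the patch):

/*
 * Temporarily restrict the user space visible affinity mask; only sched
 * core touches cpus_allowed itself.  Hypothetical helpers.
 */
int sched_restrict_cpus_allowed(struct task_struct *p,
                                const struct cpumask *new_mask);
void sched_unrestrict_cpus_allowed(struct task_struct *p);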

Thanks,

tglx





Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Ingo Molnar

* Thomas Gleixner  wrote:

> On Thu, 6 Apr 2017, Ingo Molnar wrote:
>
> > Sorry if this is a back and forth - I was somehow convinced that we do
> > need to frob the cpus_allowed mask to get this functionality - but in
> > hindsight I think the counter should be enough.
> > 
> > I.e. just have a counter and these two APIs:
> > 
> > static inline void migrate_disable(void)
> > {
> >         current->migration_disabled++;
> > }
> > 
> > ...
> > 
> > static inline void migrate_enable(void)
> > {
> >         current->migration_disabled--;
> > }
> > 
> > ... and make sure the scheduler migration code plus the CPU hotplug code
> > considers the counter.
> 
> We tried that some time ago, but there are a lot of places in the scheduler 
> which just rely on the cpus_allowed_mask, so we have to chase all of them and 
> make sure that new users do not ignore that counter. That's why we chose the 
> cpus_allowed_mask approach. And I still think that's the proper thing to do.

But but ...

The number of places in the scheduler where we actually end up migrating a
task is pretty limited:

try_to_wake_up():
- main wakeup code

migrate_swap():
- active NUMA-balancing feature

move_queued_task():
- hotplug CPU-down migration
- changing the affinity mask

The wakeup and NUMA balancing case is trivial to solve: it's an
optimization and we can skip the migration if migration is disabled.
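
A sketch of what "skip the migration" could mean at the wakeup site
(simplified signature; the migration_disabled counter is the one proposed
earlier in the thread):

static int select_task_rq(struct task_struct *p, int cpu, int flags)
{
        /* Pinned task: keep it where it is, placement is only an optimization. */
        if (p->migration_disabled)
                return task_cpu(p);

        /* ... normal placement logic ... */
        return cpu;
}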

CPU hotplug and changing the affinity mask are the more complex cases,
because there migrating or not migrating is a correctness issue:

 - CPU hotplug has to be aware of this anyway, regardless of whether it's
   solved via a counter or the affinity mask.

 - Changing the affinity mask (set_cpus_allowed()) has two main cases:
   the synchronous and asynchronous case:

 - synchronous is when the current task changes its own affinity mask, this 
   should work fine mostly out of box, as we don't call set_cpus_allowed() 
   from inside migration disabled regions. (We can enforce this via a 
   debugging check.)

 - The asynchronous case is when the affinity mask of some other task is
   changed - this would not have an immediate effect with migration-disabled
   logic, the migration would be delayed to when migration is re-enabled
   again.

As for general fragility, is there any reason why a simple debugging check in 
set_task_cpu() would not catch most mishaps:

WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);

... or something like that?

I.e. my point is that I think using a counter would be much simpler, yet
still as robust and maintainable. We could in fact move
migrate_disable()/enable() upstream straight away and eliminate this small
fork of functionality between mainline and -rt.

Thanks,

Ingo


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Thomas Gleixner
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 11:25:42AM +0200, Sebastian Andrzej Siewior wrote:
> > > I.e. just have a counter and these two APIs:
> > > 
> > > static inline void migrate_disable(void)
> > > {
> > >         current->migration_disabled++;
> > plus
> >         if (current->migration_disabled == 1)
> >                 get_online_cpus()
> 
> You want that other hotplug fast hack crap you have in rt, no?

As I said in the other mail, we certainly want to make get_online_cpus()
fast independently of RT. That's on our todo list anyway.

> Because mainline will not get anything other than preempt_disable() for
> migrate_disable().

I'm not yet convinced of that. There is enough stuff which pointlessly
disables preemption and works around the restrictions caused by that or
plays games with cpus_allowed just to stay on a particular cpu.

I know you don't like it because it imposes restrictions on schedulability,
but disabling preemption for a long time or playing cpus_allowed games has
the same and worse effects.

Thanks,

tglx


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 12:47:21PM +0200, Thomas Gleixner wrote:
> On Thu, 6 Apr 2017, Peter Zijlstra wrote:

> > IIRC MIPS has a case where only 1 in N cores has an FPU. And once a task
> > uses FPU, it gets affined to the core that has one or something like
> > that.
> >
> > Of course, nothing then stops someone else breaking that affinity. But I
> > suspect it will simply fault on the next FPU instruction and 'reset' the
> > mask or something. I've no clue and no real desire to know.
> 
> > It does nasty games with its own storage of p->thread.user_cpus_allowed
> > and a fully separate implementation of sys_sched_set|getaffinity.
> 
> Plus a magic trap handler which forces the thread to a CPU with FPU when
> the user_cpus_allowed mask intersects with the cpus_with_fpu_mask...
> 
> Magic crap, which could all be replaced by a simple function in the
> scheduler which allows to push a task to a FPU CPU and then disable
> migration.

If it's even halfway coherent, I'd much rather let it stay where it is.

I really want to limit migrate_disable() to PREEMPT_RT=y where it's used
to preserve spinlock semantics and not allow random other
migrate_disable() usage in the kernel.

Also note, that per the above, it can actually migrate to any core that
has an FPU on, so it's not a good match for migrate_disable() in any
case.


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Thomas Gleixner
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 11:46:33AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-04-06 11:32:24 [+0200], Peter Zijlstra wrote:
> > > On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote:
> > > > While converting the existing users I tried to stick with the rules
> > > > above however… well mostly CPUFREQ tries to temporarily switch the CPU
> > > > mask to do something on a certain CPU and then switches the mask back
> > > > to its original value.
> > > 
> > > 
> > > There's a bunch of that through ancient and rotten parts of the kernel.
> > > All those sites are broken.
> > > 
> > > Nothing stops userspace from setting a different affinity right after
> > > the kernel does for those threads.
> > 
> > Good. So you are saying I should convert them to something like
> > queue_work_on()?
> 
> Not sure; iirc there were a few variants. Some can indeed simply do
> queue_work_on() and possibly wait for completion. Some should maybe be a
> per-cpu kthread, others will be more 'interesting'.
> 
> IIRC MIPS has a case where only 1 in N cores has an FPU. And once a task
> uses FPU, it gets affined to the core that has one or something like
> that.
>
> Of course, nothing then stops someone else breaking that affinity. But I
> suspect it will simply fault on the next FPU instruction and 'reset' the
> mask or something. I've no clue and no real desire to know.

It does nasty games with its own storage of p->thread.user_cpus_allowed
and a fully separate implementation of sys_sched_set|getaffinity.

Plus a magic trap handler which forces the thread to a CPU with FPU when
the user_cpus_allowed mask intersects with the cpus_with_fpu_mask...

Magic crap, which could all be replaced by a simple function in the
scheduler which allows to push a task to a FPU CPU and then disable
migration.

Thanks,

tglx




Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Thomas Gleixner
On Thu, 6 Apr 2017, Ingo Molnar wrote:
> Sorry if this is a back and forth - I was somehow convinced that we do
> need to frob the cpus_allowed mask to get this functionality - but in
> hindsight I think the counter should be enough.
> 
> I.e. just have a counter and these two APIs:
> 
> static inline void migrate_disable(void)
> {
>         current->migration_disabled++;
> }
> 
> ...
> 
> static inline void migrate_enable(void)
> {
>         current->migration_disabled--;
> }
> 
> ... and make sure the scheduler migration code plus the CPU hotplug code
> considers the counter.

We tried that some time ago, but there are a lot of places in the scheduler
which just rely on the cpus_allowed_mask, so we have to chase all of them
and make sure that new users do not ignore that counter. That's why we
chose the cpus_allowed_mask approach. And I still think that's the proper
thing to do.

Also this still requires magic in the context switch where we need to
transport that information to the CPU, so it can't go away.

CPU hotplug has other issues. If migration is disabled, then blocking out
the hotplug code is probably easy to do, but you need to undo the blocking
when the CPU is no longer pinned, i.e. if the last task enabled migration
again.

To make that all less horrible, we need to fix the stupid nested usage of
get_online_cpus() first. That'd allow us to convert the hotplug locking
into a percpu_rwsem, which is desired anyway as it makes get_online_cpus()
cheap.

So then the migration stuff becomes:

void migrate_disable(void)
{
        if (in_atomic() || irqs_disabled())
                return;

        if (!current->migration_disabled) {
                percpu_down_read_preempt_disable(hotplug_rwsem);
                current->migration_disabled++;
                preempt_enable();
        } else {
                current->migration_disabled++;
        }
}

void migrate_enable(void)
{
        if (in_atomic() || irqs_disabled())
                return;

        if (current->migration_disabled == 1) {
                preempt_disable();
                current->migration_disabled--;
                percpu_up_read_preempt_enable(hotplug_rwsem);
        } else {
                current->migration_disabled--;
        }
}

That wants to have some debug extras: see the current RT version of it.

Coming back to the cpus_allowed mask.

Most of the cpus_allowed usage outside of the scheduler is broken and we
really should clean that up now and then restrict access to cpus_allowed.

The easy ones (and the majority) are those:

        cpumask_copy(saved_mask, &current->cpus_allowed);
        set_cpus_allowed_ptr(current, cpumask_of(XXX));
        do_something();
        set_cpus_allowed_ptr(current, saved_mask);

That code is not at all protected against user space changing affinity or
cpus going down. We better do that now before doing anything about
cpus_allowed in the core code.
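
For that easy pattern, one conversion along the lines suggested elsewhere
in this thread is the workqueue API, which runs the function on the target
CPU without ever touching the caller's mask (a sketch; do_something_fn() is
a stand-in for whatever the call site does):

        static long do_something_fn(void *arg)
        {
                do_something();         /* now runs on the target CPU */
                return 0;
        }

        ...
        long ret = work_on_cpu(XXX, do_something_fn, NULL);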

Thanks,

tglx







Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 11:46:33AM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-04-06 11:32:24 [+0200], Peter Zijlstra wrote:
> > On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote:
> > > While converting the existing users I tried to stick with the rules
> > > above however… well mostly CPUFREQ tries to temporarily switch the CPU
> > > mask to do something on a certain CPU and then switches the mask back
> > > to its original value.
> > 
> > 
> > There's a bunch of that through ancient and rotten parts of the kernel.
> > All those sites are broken.
> > 
> > Nothing stops userspace from setting a different affinity right after
> > the kernel does for those threads.
> 
> Good. So you are saying I should convert them to something like
> queue_work_on()?

Not sure; iirc there were a few variants. Some can indeed simply do
queue_work_on() and possibly wait for completion. Some should maybe be a
per-cpu kthread, others will be more 'interesting'.
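
The queue_work_on() flavour of that conversion, as a sketch (my_fn and the
on-stack work item are illustrative):

        struct work_struct w;

        INIT_WORK_ONSTACK(&w, my_fn);
        queue_work_on(cpu, system_wq, &w);      /* run my_fn on 'cpu' */
        flush_work(&w);                         /* wait for completion */
        destroy_work_on_stack(&w);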

IIRC MIPS has a case where only 1 in N cores has an FPU. And once a task
uses FPU, it gets affined to the core that has one or something like
that.

Of course, nothing then stops someone else breaking that affinity. But I
suspect it will simply fault on the next FPU instruction and 'reset' the
mask or something. I've no clue and no real desire to know.


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Sebastian Andrzej Siewior
On 2017-04-06 11:32:24 [+0200], Peter Zijlstra wrote:
> On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote:
> > While converting the existing users I tried to stick with the rules
> > above however… well mostly CPUFREQ tries to temporarily switch the CPU
> > mask to do something on a certain CPU and then switches the mask back
> > to its original value.
> 
> 
> There's a bunch of that through ancient and rotten parts of the kernel.
> All those sites are broken.
> 
> Nothing stops userspace from setting a different affinity right after
> the kernel does for those threads.

Good. So you are saying I should convert them to something like
queue_work_on()?

Sebastian


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 11:25:42AM +0200, Sebastian Andrzej Siewior wrote:
> > I.e. just have a counter and these two APIs:
> > 
> > static inline void migrate_disable(void)
> > {
> >         current->migration_disabled++;
> plus
>         if (current->migration_disabled == 1)
>                 get_online_cpus()

You want that other hotplug fast hack crap you have in rt, no? Because
mainline will not get anything other than preempt_disable() for
migrate_disable().

> > }
> > 
> > ...
> > 
> > static inline void migrate_enable(void)
> > {
> >         current->migration_disabled--;
> > }
> > 
> > ... and make sure the scheduler migration code plus the CPU hotplug code
> > considers the counter.
> 
> So on the sched part I need to go through all places where it looks at
> the mask and make sure it ignores the CPU switch decision. This could be
> doable.

Don't sprinkle that around. Use a wrapper like say: tsk_cpus_allowed()
to encapsulate that logic :-)



Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 11:35:45AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 10:01:39AM +0200, Ingo Molnar wrote:
> > I.e. just have a counter and these two APIs:
> > 
> > static inline void migrate_disable(void)
> > {
> >         current->migration_disabled++;
> > }
> > 
> > ...
> > 
> > static inline void migrate_enable(void)
> > {
> >         current->migration_disabled--;
> > }
> > 
> > ... and make sure the scheduler migration code plus the CPU hotplug code
> > considers the counter.
> > 
> > Would this work, and would this be the simplest all around solution?
> 
> If you want to add expensive bits to hot paths like wakeups ... :/

Hmm, doesn't that get us back to exactly where we were?

const struct cpumask *tsk_cpus_allowed(struct task_struct *tsk)
{
        if (!tsk->migration_disabled)
                return &tsk->cpus_allowed;
        else
                return cpumask_of(task_cpu(tsk));
}
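
Call sites would then test against the wrapper instead of reading
->cpus_allowed directly, e.g. (illustrative):

        if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
                return -EINVAL;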




Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 10:01:39AM +0200, Ingo Molnar wrote:
> I.e. just have a counter and these two APIs:
> 
> static inline void migrate_disable(void)
> {
>         current->migration_disabled++;
> }
> 
> ...
> 
> static inline void migrate_enable(void)
> {
>         current->migration_disabled--;
> }
> 
> ... and make sure the scheduler migration code plus the CPU hotplug code
> considers the counter.
> 
> Would this work, and would this be the simplest all around solution?

If you want to add expensive bits to hot paths like wakeups ... :/


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Wed, Apr 05, 2017 at 09:39:43AM +0200, Ingo Molnar wrote:

> > While converting the existing users I tried to stick with the rules
> > above however… well mostly CPUFREQ tries to temporarily switch the CPU
> > mask to do something on a certain CPU and then switches the mask back
> > to its original value. So in theory `cpus_ptr' could or should be used.
> > However if this is invoked in a migration disabled region (which is not
> > the case because it would require something like preempt_disable() and
> > set_cpus_allowed_ptr() might sleep so it can't be) then the "restore"
> > part would restore the wrong mask. So it only looks strange and I go for
> > the pointer…
> 
> So maybe we could add the following facility:
> 
>   ptr = sched_migrate_to_cpu_save(cpu);
> 
>   ...
> 
>   sched_migrate_to_cpu_restore(ptr);
> 
> ... and use it in the cpufreq code. Then -rt could simply define
> migrate_disable() to be:

Argh, no

We have plenty of proper per-cpu kthreads / workqueue and other APIs to
do this.

Not to mention that implementing the above in a non-broken way has user
visible side-effects, which is ugly.


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Peter Zijlstra
On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote:
> While converting the existing users I tried to stick with the rules
> above however… well mostly CPUFREQ tries to temporarily switch the CPU
> mask to do something on a certain CPU and then switches the mask back
> to its original value.


There's a bunch of that through ancient and rotten parts of the kernel.
All those sites are broken.

Nothing stops userspace from setting a different affinity right after
the kernel does for those threads.
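
The race, spelled out as a sequence (illustrative):

        kthread:    set_cpus_allowed_ptr(current, cpumask_of(x));
        userspace:  sched_setaffinity(pid, ...);   /* overwrites it */
        kthread:    do_something();                /* may run on the wrong CPU */
        kthread:    set_cpus_allowed_ptr(current, saved_mask);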


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Sebastian Andrzej Siewior
On 2017-04-06 10:01:39 [+0200], Ingo Molnar wrote:
> 
> * Sebastian Andrzej Siewior  wrote:
> 
> > On 2017-04-06 08:16:22 [+0200], Ingo Molnar wrote:
> > > 
> > > * Sebastian Andrzej Siewior  wrote:
> > > 
> > > > On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
> > > > > 
> > > > > So maybe we could add the following facility:
> > > > > 
> > > > >   ptr = sched_migrate_to_cpu_save(cpu);
> > > > > 
> > > > >   ...
> > > > > 
> > > > >   sched_migrate_to_cpu_restore(ptr);
> > > 
> > > BTW., and I'm sure this has come up before, but why doesn't
> > > migrate_disable() use a simple per task flag that the scheduler
> > > migration code takes into account?
> > 
> > we could add that. But right now there are two spots which look at the
> > counter to decide whether or not migration is disabled.
> > 
> > > It should be functionally equivalent to the current solution, and it
> > > appears to have a heck of a smaller cross section with the rest of the
> > > scheduler.
> > > 
> > > I.e.:
> > > 
> > >   static inline void migrate_disable(void)
> > >   {
> > >           current->migration_disabled++;
> > >   }
> > > 
> > >   ...
> > > 
> > >   static inline void migrate_enable(void)
> > >   {
> > >           current->migration_disabled--;
> > >   }
> > > 
> > > or so? Then add this flag as a condition to can_migrate_task() et al.
> > > 
> > > While we generally dislike such flags as they wreak havoc with the
> > > scheduler if overused, the cpus_allowed based solution has the exact
> > > same effect so it's not like it's a step backwards - and it should
> > > also be much faster and less intrusive.
> > 
> > So you are saying that we drop the cpus_ptr + cpus_mask fields again and
> > instead add a task-flag to ensure that the scheduler does not migrate
> > the task to another CPU?
> 
> Yeah - but no need to add a per-task flag if we already have a counter.
> 
> > > Am I missing some complication?
> > 
> > We do have the counter. We have to ensure that the CPU is not going away
> > while we are in a migrate_disable() region since we can be scheduled
> > out. So the CPU can't go offline until we leave that region.
> 
> Yeah. But it should be relatively straightforward to extend the logic
> that makes sure that a CPU does not go away from under tasks pinned to
> that CPU alone, right?

I used get_online_cpus(); that is enough.

> > #define migrate_disable()   sched_migrate_to_cpu_save(-1)
> > 
> > int sched_migrate_to_cpu_save(int cpu)
> 
> So if we have a ->migration_disabled counter then we don't need the
> sched_migrate_to_cpu_save()/restore() complication, right?

Correct. Unless (as you suggested) we want a migrate-to-specific-CPU
variant, we just keep the function up to the BUG() statement and don't
need to check the CPU we are on.

> Sorry if this is a back and forth - I was somehow convinced that we do
> need to frob the cpus_allowed mask to get this functionality - but in
> hindsight I think the counter should be enough.
> 
> I.e. just have a counter and these two APIs:
> 
> static inline void migrate_disable(void)
> {
>         current->migration_disabled++;
plus
        if (current->migration_disabled == 1)
                get_online_cpus()
> }
> 
> ...
> 
> static inline void migrate_enable(void)
> {
>         current->migration_disabled--;
> }
> 
> ... and make sure the scheduler migration code plus the CPU hotplug code
> considers the counter.

So on the sched part I need to go through all places where it looks at the
mask and make sure it ignores the CPU switch decision. This could be
doable.
On the CPU hotplug I don't see a way around get_online_cpus(). It is
essentially there to ensure a CPU does not go away. The other way around
it would be to remove the CPU from the online_mask followed by going
through all task left on the CPU waiting for them leave the CPU. And I
remember peterz saying that he wanted to make get_online_cpus() fast and
stay with that.
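
A minimal sketch of that pairing, assuming the ->migration_disabled counter
from the quoted code and leaving out the atomic-context case; the
put_online_cpus() counterpart on the enable side is my addition:

	static inline void migrate_disable(void)
	{
		/* only the outermost level pins CPU hotplug */
		if (current->migration_disabled++ == 0)
			get_online_cpus();
	}

	static inline void migrate_enable(void)
	{
		/* drop the hotplug pin when leaving the outermost level */
		if (--current->migration_disabled == 0)
			put_online_cpus();
	}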

> Would this work, and would this be the simplest all around solution?

Let me try this with get_online_cpus() (as suggested) and without
cpus_ptr.

> Thanks,
> 
>   Ingo

Sebastian


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Ingo Molnar

* Sebastian Andrzej Siewior  wrote:

> On 2017-04-06 08:16:22 [+0200], Ingo Molnar wrote:
> > 
> > * Sebastian Andrzej Siewior  wrote:
> > 
> > > On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
> > > > 
> > > > So maybe we could add the following facility:
> > > > 
> > > > ptr = sched_migrate_to_cpu_save(cpu);
> > > > 
> > > > ...
> > > > 
> > > > sched_migrate_to_cpu_restore(ptr);
> > 
> > BTW., and I'm sure this has come up before, but why doesn't
> > migrate_disable() use a simple per task flag that the scheduler migration
> > code takes into account?
> 
> we could add that. But right now there are two spots which look at the
> counter to decide whether or not migration is disabled.
> 
> > It should be functionally equivalent to the current solution, and it
> > appears to have a heck of a smaller cross section with the rest of the
> > scheduler.
> > 
> > I.e.:
> > 
> > static inline void migrate_disable(void)
> > {
> > current->migration_disabled++;
> > }
> > 
> > ...
> > 
> > static inline void migrate_enable(void)
> > {
> > current->migration_disabled--;
> > }
> > 
> > or so? Then add this flag as a condition to can_migrate_task() et al.
> > 
> > While we generally dislike such flags as they wreak havoc with the
> > scheduler if overused, the cpus_allowed based solution has the exact same
> > effect so it's not like it's a step backwards - and it should also be much
> > faster and less intrusive.
> 
> So you are saying that we drop the cpus_ptr + cpus_mask fields again and
> instead add a task-flag to ensure that the scheduler does not migrate
> the task to another CPU?

Yeah - but no need to add a per-task flag if we already have a counter.

> > Am I missing some complication?
> 
> We do have the counter. We need to ensure that the CPU is not going away
> while we are in a migrate_disable() region since we can be scheduled out.
> So the CPU can't go offline until we leave that region.

Yeah. But it should be relatively straightforward to extend the logic that
makes sure that a CPU does not go away from under tasks pinned to that CPU
alone, right?

> #define migrate_disable() sched_migrate_to_cpu_save(-1)
> 
> int sched_migrate_to_cpu_save(int cpu)

So if we have a ->migration_disabled counter then we don't need the 
sched_migrate_to_cpu_save()/restore() complication, right?

Sorry if this is a back and forth - I was somehow convinced that we do need to 
frob the cpus_allowed mask to get this functionality - but in hindsight I think 
the counter should be enough.

I.e. just have a counter and these two APIs:

static inline void migrate_disable(void)
{
current->migration_disabled++;
}

...

static inline void migrate_enable(void)
{
current->migration_disabled--;
}

... and make sure the scheduler migration code plus the CPU hotplug code
considers the counter.
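
As a sketch of what "considers the counter" could mean on the scheduler side
(the helper below is hypothetical; can_migrate_task() is the load-balancing
hook named above):

	/* hypothetical helper keyed off the ->migration_disabled counter */
	static inline bool task_migration_allowed(struct task_struct *p)
	{
		return p->migration_disabled == 0;
	}

can_migrate_task() and the wakeup CPU selection would then bail out early
whenever this returns false, leaving the task where it is for the duration
of the region.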

Would this work, and would this be the simplest all around solution?

Thanks,

Ingo


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Sebastian Andrzej Siewior
On 2017-04-06 08:16:22 [+0200], Ingo Molnar wrote:
> 
> * Sebastian Andrzej Siewior  wrote:
> 
> > On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
> > > 
> > > So maybe we could add the following facility:
> > > 
> > >   ptr = sched_migrate_to_cpu_save(cpu);
> > > 
> > >   ...
> > > 
> > >   sched_migrate_to_cpu_restore(ptr);
> 
> BTW., and I'm sure this has come up before, but why doesn't
> migrate_disable() use a simple per task flag that the scheduler migration
> code takes into account?

we could add that. But right now there are two spots which look at the
counter to decide whether or not migration is disabled.

> It should be functionally equivalent to the current solution, and it appears
> to have a heck of a smaller cross section with the rest of the scheduler.
> 
> I.e.:
> 
>   static inline void migrate_disable(void)
>   {
>   current->migration_disabled++;
>   }
> 
>   ...
> 
>   static inline void migrate_enable(void)
>   {
>   current->migration_disabled--;
>   }
> 
> or so? Then add this flag as a condition to can_migrate_task() et al.
> 
> While we generally dislike such flags as they wreak havoc with the scheduler
> if overused, the cpus_allowed based solution has the exact same effect so
> it's not like it's a step backwards - and it should also be much faster and
> less intrusive.

So you are saying that we drop the cpus_ptr + cpus_mask fields again and
instead add a task-flag to ensure that the scheduler does not migrate
the task to another CPU?

> Am I missing some complication?

We do have the counter. We need to ensure that the CPU is not going
away while we are in a migrate_disable() region since we can be
scheduled out. So the CPU can't go offline until we leave that region.
This version uses get_online_cpus() while in -RT we have something
called "pin_current_cpu()". This is a lightweight version of
get_online_cpus() which should go away…
Right now I have this; I still need to test it and complete the CPU hop part:

#define migrate_disable()	sched_migrate_to_cpu_save(-1)

int sched_migrate_to_cpu_save(int cpu)
{
	struct task_struct *p = current;

	if (in_atomic()) {
#ifdef CONFIG_SCHED_DEBUG
		p->migrate_disable_atomic++;
		if (cpu >= 0)
			WARN_ON_ONCE(!cpumask_equal(p->cpus_ptr, cpumask_of(cpu)));
#endif
		return raw_smp_processor_id();
	}
#ifdef CONFIG_SCHED_DEBUG
	WARN_ON_ONCE(p->migrate_disable_atomic);
#endif

	if (p->migrate_disable) {
		/* nested call: just bump the counter, the pin is already set */
		p->migrate_disable++;
#ifdef CONFIG_SCHED_DEBUG
		if (cpu >= 0)
			WARN_ON_ONCE(!cpumask_equal(p->cpus_ptr, cpumask_of(cpu)));
#endif
		return raw_smp_processor_id();
	}

	/* keep the CPU from going away while we are pinned to it */
	get_online_cpus();

	preempt_disable();
	p->migrate_disable = 1;

	if (cpu < 0) {
		/* pin to whatever CPU we are currently running on */
		p->cpus_ptr = cpumask_of(raw_smp_processor_id());
	} else {
		if (!cpu_online(cpu)) {
			preempt_enable();
			put_online_cpus();
			WARN(1, "CPU is offline\n");
			return -ENODEV;
		}
		p->cpus_ptr = cpumask_of(cpu);
	}
	p->nr_cpus_allowed = 1;
	preempt_enable();

	/* already on the right CPU? */
	if (cpu < 0 || cpu == raw_smp_processor_id())
		return raw_smp_processor_id();

	/* move to the correct CPU - this part is not implemented yet */
	BUG();
	return raw_smp_processor_id();
}

The task flag / p->migrate_disable counter is used in two spots,
	do_set_cpus_allowed();
	__set_cpus_allowed_ptr();
so that a change to the affinity mask does not force a CPU hop.
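
A sketch of the check in those two spots, following this thread's field
names (the exact placement inside __set_cpus_allowed_ptr() is an
assumption):

	/*
	 * In __set_cpus_allowed_ptr(): record the new mask but do not
	 * force a hop off the temporarily pinned CPU; the new mask takes
	 * effect once migrate_enable() restores p->cpus_ptr.
	 */
	if (p->migrate_disable) {
		cpumask_copy(&p->cpus_mask, new_mask);
		return 0;
	}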

> Thanks,
> 
>   Ingo

Sebastian


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-06 Thread Ingo Molnar

* Sebastian Andrzej Siewior  wrote:

> On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
> > 
> > So maybe we could add the following facility:
> > 
> > ptr = sched_migrate_to_cpu_save(cpu);
> > 
> > ...
> > 
> > sched_migrate_to_cpu_restore(ptr);

BTW., and I'm sure this has come up before, but why doesn't migrate_disable()
use a simple per task flag that the scheduler migration code takes into
account?

It should be functionally equivalent to the current solution, and it appears to 
have a heck of a smaller cross section with the rest of the scheduler.

I.e.:

static inline void migrate_disable(void)
{
current->migration_disabled++;
}

...

static inline void migrate_enable(void)
{
current->migration_disabled--;
}

or so? Then add this flag as a condition to can_migrate_task() et al.

While we generally dislike such flags as they wreak havoc with the scheduler if
overused, the cpus_allowed based solution has the exact same effect so it's not 
like it's a step backwards - and it should also be much faster and less 
intrusive.

Am I missing some complication?

Thanks,

Ingo


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-05 Thread Sebastian Andrzej Siewior
On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
> 
> So maybe we could add the following facility:
> 
>   ptr = sched_migrate_to_cpu_save(cpu);
> 
>   ...
> 
>   sched_migrate_to_cpu_restore(ptr);
> 
> ... and use it in the cpufreq code. Then -rt could simply define
> migrate_disable() to be:
> 
>   ptr = sched_migrate_to_cpu_save(raw_smp_processor_id());
> 
> and define migrate_enable() as:
> 
>   sched_migrate_to_cpu_restore(ptr);
> 
> ... or such.
> 
> In the cpu == current_cpu case it would be super fast - otherwise it would
> migrate over to the target CPU first. Also note that this facility is
> strictly a special case for single-CPU masks and migrations - i.e. the
> constant pointer cpumask optimization would always apply.
> 
> Note that due to the use of the 'ptr' local variable the interface nests 
> naturally, so this would be a legitimate use:
>   
>   ptr = sched_migrate_to_cpu_save(cpu);
> 
>   ...
>   migrate_disable();
>   ...
>   migrate_enable();
>   ...
> 
>   sched_migrate_to_cpu_restore(ptr);
> 
> I.e. my proposal would be to essentially upstream the -rt migrate_disable() 
> facility in a slightly more generic form that would fit the cpufreq usecase.
> 
> I bet a number of the current drivers' mucking with cpumask would also fit
> this new API.
> 
> Does this make sense?

It kind of does. If you want to allow migration to a different CPU then it
might make things a little complicated, because we need to support nested
migrate_disable() and we can't (must not) change the mask while nesting.
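
To illustrate, a minimal sketch of how the pointer-based pair nests (field
names follow the RFC; preemption and hotplug handling are left out on
purpose):

	const cpumask_t *sched_migrate_to_cpu_save(int cpu)
	{
		const cpumask_t *old = current->cpus_ptr;

		current->migrate_disable++;
		current->cpus_ptr = cpumask_of(cpu);	/* pin to one CPU */
		return old;
	}

	void sched_migrate_to_cpu_restore(const cpumask_t *old)
	{
		current->cpus_ptr = old;
		current->migrate_disable--;
	}

An inner save returns the outer call's single-CPU pointer, so its restore
re-installs that pin; only the outermost restore brings back the full mask.
What must not happen is an inner call naming a *different* CPU - that is
exactly the complication above.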
Other than that, I am not sure the cpufreq usecase is valid.
schedule_work_on() looks better but maybe not as fast as the proposed
sched_migrate_to_cpu_save(). Also, some users look wrong:
[PATCH] CPUFREQ: Loongson2: drop set_cpus_allowed_ptr()
https://www.linux-mips.org/archives/linux-mips/2017-04/msg00042.html

and I received offlist mail pointing to the other cpufreq users. So here
I am waiting for some feedback from the cpufreq maintainer.
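
For comparison, the schedule_work_on() pattern for the "run this on CPU x"
cases would look roughly like this (the function and work item names are
made up for illustration):

	static void set_freq_on_cpu(struct work_struct *work)
	{
		/* runs on the CPU the work was queued on */
	}
	static DECLARE_WORK(set_freq_work, set_freq_on_cpu);

	/* caller side: */
	schedule_work_on(cpu, &set_freq_work);
	flush_work(&set_freq_work);	/* wait until it has run there */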

But I get your point. Other than super-fast switching to a specific CPU
for $reason we could replace a few preempt_disable() invocations which
are only there due to per-CPU variables. So let me hack this into -RT
and come back…

> Thanks,
> 
>   Ingo

Sebastian


Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask

2017-04-05 Thread Ingo Molnar

* Sebastian Andrzej Siewior  wrote:

> In commit 4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed()
> wrapper") the tsk_nr_cpus_allowed() wrapper was removed. There was not
> much difference in !RT but in RT we used this to implement
> migrate_disable(). Within a migrate_disable() section the CPU mask is
> restricted to a single CPU while the "normal" CPU mask remains untouched.
> 
> As an alternative implementation Ingo suggested to use
>   struct task_struct {
>   const cpumask_t *cpus_ptr;
>   cpumask_t   cpus_mask;
> };
> with
>   t->cpus_allowed_ptr = &t->cpus_allowed;
> 
> In -RT we then can switch the cpus_ptr to
>   t->cpus_allowed_ptr = cpumask_of(task_cpu(p));
> 
> in a migration disabled region. The rules are simple:
> - Code that 'uses' ->cpus_allowed would use the pointer.
> - Code that 'modifies' ->cpus_allowed would use the direct mask.
> 
> While converting the existing users I tried to stick with the rules
> above however… well mostly CPUFREQ tries to temporarily switch the CPU
> mask to do something on a certain CPU and then switches the mask back to
> its original value. So in theory `cpus_ptr' could or should be used.
> However if this is invoked in a migration disabled region (which is not
> the case because it would require something like preempt_disable() and
> set_cpus_allowed_ptr() might sleep so it can't be) then the "restore"
> part would restore the wrong mask. So it only looks strange and I go for
> the pointer…

So maybe we could add the following facility:

ptr = sched_migrate_to_cpu_save(cpu);

...

sched_migrate_to_cpu_restore(ptr);

... and use it in the cpufreq code. Then -rt could simply define
migrate_disable() to be:

ptr = sched_migrate_to_cpu_save(raw_smp_processor_id());

and define migrate_enable() as:

sched_migrate_to_cpu_restore(ptr);

... or such.

In the cpu == current_cpu case it would be super fast - otherwise it would
migrate over to the target CPU first. Also note that this facility is
strictly a special case for single-CPU masks and migrations - i.e. the
constant pointer cpumask optimization would always apply.

Note that due to the use of the 'ptr' local variable the interface nests 
naturally, so this would be a legitimate use:

ptr = sched_migrate_to_cpu_save(cpu);

...
migrate_disable();
...
migrate_enable();
...

sched_migrate_to_cpu_restore(ptr);

I.e. my proposal would be to essentially upstream the -rt migrate_disable() 
facility in a slightly more generic form that would fit the cpufreq usecase.

I bet a number of the current drivers' mucking with cpumask would also fit
this new API.

Does this make sense?

Thanks,

Ingo

