Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
* Thomas Gleixner wrote:

> On Thu, 6 Apr 2017, Ingo Molnar wrote:
> > CPU hotplug and changing the affinity mask are the more complex cases,
> > because there migrating or not migrating is a correctness issue:
> >
> >  - CPU hotplug has to be aware of this anyway, regardless of whether it's
> >    solved via a counter or the affinity mask.
>
> You have to prevent CPU hotplug simply as long as there are migration
> disabled tasks on the fly. Making that depend on whether they are on a CPU
> which is about to be unplugged or not would be complete overkill, as you
> still have to solve the case where a task calls migrate_disable() AFTER the
> cpu down machinery has started.
>
> [...]
>
> The counter alone might be enough for the scheduler placement decisions,
> but it cannot solve the hotplug issue. You still need something like I
> sketched out in my previous reply.

Yes, so what you outlined:

	void migrate_disable(void)
	{
		if (in_atomic() || irqs_disabled())
			return;

		if (!current->migration_disabled) {
			percpu_down_read_preempt_disable(hotplug_rwsem);
			current->migration_disabled++;
			preempt_enable();
		} else {
			current->migration_disabled++;
		}
	}

... would solve it?

I.e. my point is: whether migrate_disable()/enable() is implemented via a
counter or via a pointer to a cpumask does not materially change what the
CPU-hotplug solution looks like, right?

I.e. we could just use the counter and avoid the whole cpumask-wrapping
complexity.

Thanks,

	Ingo
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 01:56:12PM +0200, Thomas Gleixner wrote:
> A pointer to the effective mask is definitely enough. And that's what we
> need for migrate_disable() as well. That still leaves the storage
> requirement to MIPS.

So then you're back to Ingo's proposal, right?
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 01:03:16PM +0200, Thomas Gleixner wrote:
> > Fair enough, but I prefer to have the ability for a temporary restriction
> > of the user space visible cpus allowed mask in the core code, which can
> > be used for both migrate_disable() and things like that MIPS FPU stuff,
> > rather than all those home-brewn hackeries which are prone to bitrot,
> > security issues and subtle wreckage. cpus_allowed should be solely under
> > sched core control and not accessible from anything outside.
>
> So the big difference between what MIPS does and what migrate_disable()
> needs here is the amount of storage.
>
> For migrate_disable() we don't need a whole second cpumask bitmap. We
> can simply use one of the static cpumask_of() things.
>
> Given that cpumask bitmaps can be quite large, adding another one to
> task_struct just for $very_rare_occasion doesn't seem like a good
> thing.

A pointer to the effective mask is definitely enough. And that's what we
need for migrate_disable() as well. That still leaves the storage
requirement to MIPS.

Thanks,

	tglx
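The "pointer to the effective mask" idea can be made concrete with a tiny user-space model: the task normally points at its own user-set affinity mask, and pinning merely repoints it at a shared, static one-CPU mask (the analog of `cpumask_of()`), so no second per-task bitmap is allocated. All names below are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

typedef uint64_t cpumask_t;               /* one word covers 8 CPUs here */

/* Static per-CPU single-bit masks, the analog of cpumask_of(cpu). */
static const cpumask_t cpumask_of[NR_CPUS] = {
	1u << 0, 1u << 1, 1u << 2, 1u << 3,
	1u << 4, 1u << 5, 1u << 6, 1u << 7,
};

struct task {
	cpumask_t user_mask;              /* what user space configured */
	const cpumask_t *effective_mask;  /* what the scheduler consults */
};

void task_init(struct task *t, cpumask_t user)
{
	t->user_mask = user;
	t->effective_mask = &t->user_mask;
}

void pin_to_cpu(struct task *t, int cpu)
{
	t->effective_mask = &cpumask_of[cpu];  /* temporary restriction */
}

void unpin(struct task *t)
{
	t->effective_mask = &t->user_mask;     /* restore user-visible mask */
}
```

The user-visible mask is never modified, so a concurrent sched_getaffinity() analog would keep reporting what user space set, while placement decisions consult only the pointer.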
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 01:03:16PM +0200, Thomas Gleixner wrote:
> Fair enough, but I prefer to have the ability for a temporary restriction
> of the user space visible cpus allowed mask in the core code, which can be
> used for both migrate_disable() and things like that MIPS FPU stuff,
> rather than all those home-brewn hackeries which are prone to bitrot,
> security issues and subtle wreckage. cpus_allowed should be solely under
> sched core control and not accessible from anything outside.

So the big difference between what MIPS does and what migrate_disable()
needs here is the amount of storage.

For migrate_disable() we don't need a whole second cpumask bitmap. We can
simply use one of the static cpumask_of() things.

Given that cpumask bitmaps can be quite large, adding another one to
task_struct just for $very_rare_occasion doesn't seem like a good thing.
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 12:58:27PM +0200, Thomas Gleixner wrote:
> On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> > Because mainline will not get anything other than preempt_disable() for
> > migrate_disable().
>
> I'm not yet convinced of that. There is enough stuff which pointlessly
> disables preemption and works around the restrictions caused by that, or
> plays games with cpus_allowed just to stay on a particular cpu.
>
> I know you don't like it because it imposes restrictions on
> schedulability, but disabling preemption for a long time or playing
> cpus_allowed games has the same and worse effects.

There really isn't much code that plays games with cpus_allowed. And code
that disables preemption for a long time can surely be fixed; we've done so
many a time.

A pure migrate_disable() (without a wrapping lock) is a straight-up
nightmare: you can end up with a gazillion runnable tasks stuck to one CPU
and all other CPUs with their thumbs up their arses waiting for work.

The moment you stick migrate_disable() in a lock, you at least get PI to
bail you out of some of the pain (but by no means all of it).

Adding migrate_disable() to the kernel is really just moving pain around;
it doesn't solve anything much.
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, 6 Apr 2017, Ingo Molnar wrote:
> CPU hotplug and changing the affinity mask are the more complex cases,
> because there migrating or not migrating is a correctness issue:
>
>  - CPU hotplug has to be aware of this anyway, regardless of whether it's
>    solved via a counter or the affinity mask.

You have to prevent CPU hotplug simply as long as there are migration
disabled tasks on the fly. Making that depend on whether they are on a CPU
which is about to be unplugged or not would be complete overkill, as you
still have to solve the case where a task calls migrate_disable() AFTER the
cpu down machinery has started.

>  - Changing the affinity mask (set_cpus_allowed()) has two main cases:
>    the synchronous and the asynchronous case:
>
>     - Synchronous is when the current task changes its own affinity mask;
>       this should work fine mostly out of the box, as we don't call
>       set_cpus_allowed() from inside migration disabled regions. (We can
>       enforce this via a debugging check.)
>
>     - The asynchronous case is when the affinity mask of some other task
>       is changed - this would not have an immediate effect with
>       migration-disabled logic, the migration would be delayed to when
>       migration is re-enabled again.
>
> As for general fragility, is there any reason why a simple debugging check
> in set_task_cpu() would not catch most mishaps:
>
>	WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);
>
> ... or something like that?
>
> I.e. my point is that I think using a counter would be much simpler, yet
> still as robust and maintainable. We could in fact move
> migrate_disable()/enable() upstream straight away and eliminate this small
> fork of functionality between mainline and -rt.

The counter alone might be enough for the scheduler placement decisions,
but it cannot solve the hotplug issue. You still need something like I
sketched out in my previous reply.

Thanks,

	tglx
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 12:47:21PM +0200, Thomas Gleixner wrote:
> > On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> > > IIRC MIPS has a case where only 1 in N cores has an FPU. And once a
> > > task uses FPU, it gets affined to the core that has one or something
> > > like that.
> > >
> > > Of course, nothing then stops someone else breaking that affinity. But
> > > I suspect it will simply fault on the next FPU instruction and 'reset'
> > > the mask or something. I've no clue and no real desire to know.
> >
> > It does nasty games with its own storage of p->thread.user_cpus_allowed
> > and a fully separate implementation of sys_sched_set|getaffinity.
> >
> > Plus a magic trap handler which forces the thread to a CPU with an FPU
> > when the user_cpus_allowed mask intersects with the cpus_with_fpu_mask...
> >
> > Magic crap, which could all be replaced by a simple function in the
> > scheduler which allows pushing a task to an FPU CPU and then disabling
> > migration.
>
> If it's even halfway coherent, I'd much rather let it stay where it is.
>
> I really want to limit migrate_disable() to PREEMPT_RT=y, where it's used
> to preserve spinlock semantics, and not allow random other
> migrate_disable() usage in the kernel.
>
> Also note that, per the above, it can actually migrate to any core that
> has an FPU, so it's not a good match for migrate_disable() in any case.

Fair enough, but I prefer to have the ability for a temporary restriction
of the user space visible cpus allowed mask in the core code, which can be
used for both migrate_disable() and things like that MIPS FPU stuff, rather
than all those home-brewn hackeries which are prone to bitrot, security
issues and subtle wreckage. cpus_allowed should be solely under sched core
control and not accessible from anything outside.

Thanks,

	tglx
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
* Thomas Gleixner wrote:

> On Thu, 6 Apr 2017, Ingo Molnar wrote:
> > Sorry if this is a back and forth - I was somehow convinced that we do
> > need to frob the cpus_allowed mask to get this functionality - but in
> > hindsight I think the counter should be enough.
> >
> > I.e. just have a counter and these two APIs:
> >
> >	static inline void migrate_disable(void)
> >	{
> >		current->migration_disabled++;
> >	}
> >
> >	...
> >
> >	static inline void migrate_enable(void)
> >	{
> >		current->migration_disabled--;
> >	}
> >
> > ... and make sure the scheduler migration code plus the CPU hotplug code
> > considers the counter.
>
> We tried that some time ago, but there are a lot of places in the
> scheduler which just rely on the cpus_allowed_mask, so we have to chase
> all of them and make sure that new users do not ignore that counter.
> That's why we chose the cpus_allowed_mask approach. And I still think
> that's the proper thing to do.

But but ...

The number of places in the scheduler where we actually end up migrating a
task is pretty limited:

  try_to_wake_up():    - main wakeup code

  migrate_swap():      - active NUMA-balancing feature

  move_queued_task():  - hotplug CPU-down migration
                       - changing the affinity mask

The wakeup and NUMA balancing cases are trivial to solve: they are an
optimization, and we can skip the migration if migration is disabled.

CPU hotplug and changing the affinity mask are the more complex cases,
because there migrating or not migrating is a correctness issue:

 - CPU hotplug has to be aware of this anyway, regardless of whether it's
   solved via a counter or the affinity mask.

 - Changing the affinity mask (set_cpus_allowed()) has two main cases:
   the synchronous and the asynchronous case:

    - Synchronous is when the current task changes its own affinity mask;
      this should work fine mostly out of the box, as we don't call
      set_cpus_allowed() from inside migration disabled regions. (We can
      enforce this via a debugging check.)

    - The asynchronous case is when the affinity mask of some other task is
      changed - this would not have an immediate effect with
      migration-disabled logic, the migration would be delayed to when
      migration is re-enabled again.

As for general fragility, is there any reason why a simple debugging check
in set_task_cpu() would not catch most mishaps:

	WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);

... or something like that?

I.e. my point is that I think using a counter would be much simpler, yet
still as robust and maintainable. We could in fact move
migrate_disable()/enable() upstream straight away and eliminate this small
fork of functionality between mainline and -rt.

Thanks,

	Ingo
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 11:25:42AM +0200, Sebastian Andrzej Siewior wrote:
> > > I.e. just have a counter and these two APIs:
> > >
> > >	static inline void migrate_disable(void)
> > >	{
> > >		current->migration_disabled++;
> >
> > plus
> >		if (current->migration_disabled == 1)
> >			get_online_cpus();
>
> You want that other hotplug fast hack crap you have in rt, no?

As I said in the other mail, we certainly want to make get_online_cpus()
fast independently of RT. That's on our todo list anyway.

> Because mainline will not get anything other than preempt_disable() for
> migrate_disable().

I'm not yet convinced of that. There is enough stuff which pointlessly
disables preemption and works around the restrictions caused by that, or
plays games with cpus_allowed just to stay on a particular cpu.

I know you don't like it because it imposes restrictions on schedulability,
but disabling preemption for a long time or playing cpus_allowed games has
the same and worse effects.

Thanks,

	tglx
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 12:47:21PM +0200, Thomas Gleixner wrote:
> On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> > IIRC MIPS has a case where only 1 in N cores has an FPU. And once a task
> > uses FPU, it gets affined to the core that has one or something like
> > that.
> >
> > Of course, nothing then stops someone else breaking that affinity. But I
> > suspect it will simply fault on the next FPU instruction and 'reset' the
> > mask or something. I've no clue and no real desire to know.
>
> It does nasty games with its own storage of p->thread.user_cpus_allowed
> and a fully separate implementation of sys_sched_set|getaffinity.
>
> Plus a magic trap handler which forces the thread to a CPU with an FPU
> when the user_cpus_allowed mask intersects with the cpus_with_fpu_mask...
>
> Magic crap, which could all be replaced by a simple function in the
> scheduler which allows pushing a task to an FPU CPU and then disabling
> migration.

If it's even halfway coherent, I'd much rather let it stay where it is.

I really want to limit migrate_disable() to PREEMPT_RT=y, where it's used
to preserve spinlock semantics, and not allow random other
migrate_disable() usage in the kernel.

Also note that, per the above, it can actually migrate to any core that has
an FPU, so it's not a good match for migrate_disable() in any case.
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, 6 Apr 2017, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 11:46:33AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-04-06 11:32:24 [+0200], Peter Zijlstra wrote:
> > > On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote:
> > > > While converting the existing users I tried to stick with the rules
> > > > above, however… well, mostly CPUFREQ tries to temporarily switch the
> > > > CPU mask to do something on a certain CPU and then switches the mask
> > > > back to its original value.
> > >
> > > There's a bunch of that through ancient and rotten parts of the
> > > kernel. All those sites are broken.
> > >
> > > Nothing stops userspace from setting a different affinity right after
> > > the kernel does for those threads.
> >
> > Good. So you are saying I should convert them to something like
> > queue_work_on()?
>
> Not sure; IIRC there were a few variants. Some can indeed simply do
> queue_work_on() and possibly wait for completion, some should maybe be a
> per-cpu kthread, others will be more 'interesting'.
>
> IIRC MIPS has a case where only 1 in N cores has an FPU. And once a task
> uses FPU, it gets affined to the core that has one or something like
> that.
>
> Of course, nothing then stops someone else breaking that affinity. But I
> suspect it will simply fault on the next FPU instruction and 'reset' the
> mask or something. I've no clue and no real desire to know.

It does nasty games with its own storage of p->thread.user_cpus_allowed and
a fully separate implementation of sys_sched_set|getaffinity.

Plus a magic trap handler which forces the thread to a CPU with an FPU when
the user_cpus_allowed mask intersects with the cpus_with_fpu_mask...

Magic crap, which could all be replaced by a simple function in the
scheduler which allows pushing a task to an FPU CPU and then disabling
migration.

Thanks,

	tglx
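The queue_work_on() alternative mentioned above can be sketched in user space: instead of rewriting the caller's affinity mask, the function is handed to a worker that is (in the kernel case) permanently bound to the target CPU, and the caller waits for completion. Here an ordinary pthread stands in for the per-CPU worker; all names are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>

/* One-shot work item: the caller queues a function, the worker runs it
 * in its own context, the caller blocks until it completes ("flush"). */
struct work {
	void (*fn)(void *);
	void *arg;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int state;                     /* 0 = idle, 1 = queued, 2 = done */
};

/* Demo payload used in the example below. */
static void bump(void *arg)
{
	++*(int *)arg;
}

static void *worker(void *p)
{
	struct work *w = p;

	pthread_mutex_lock(&w->lock);
	while (w->state != 1)
		pthread_cond_wait(&w->cond, &w->lock);
	w->fn(w->arg);                 /* runs in worker context, not caller's */
	w->state = 2;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

void run_on_worker(struct work *w, void (*fn)(void *), void *arg)
{
	pthread_mutex_lock(&w->lock);
	w->fn = fn;
	w->arg = arg;
	w->state = 1;
	pthread_cond_broadcast(&w->cond);
	while (w->state != 2)          /* flush: wait for completion */
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}
```

The caller's own affinity is never touched, which is exactly why this pattern is immune to the "userspace changes the affinity right after the kernel does" race.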
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, 6 Apr 2017, Ingo Molnar wrote: > Sorry if this is a back and forth - I was somehow convinced that we do need > to > frob the cpus_allowed mask to get this functionality - but in hindsight I > think > the counter should be enough. > > I.e. just have a counter and these two APIs: > > static inline void migrate_disable(void) > { > current->migration_disabled++; > } > > ... > > static inline void migrate_enable(void) > { > current->migration_disabled--; > } > > ... and make sure the scheduler migration code plus the CPU hotplug code > considers > the counter. We tried that some time ago, but there are a lot of places in the scheduler which just rely on the cpus_allowed_mask, so we have to chase all of them and make sure that new users do not ignore that counter. That's why we chose the cpus_allowed_mask approach. And I still think that's the proper thing to do. Also this still requires magic in the context switch where we need to transport that information to the CPU, so it can't go away. CPU hotplug has other issues. If migration is disabled, then blocking out the hotplug code is probably easy to do, but you need to undo the blocking when the CPU is no longer pinned, i.e. if the last task enabled migration again. To make that all less horrible, we need to fix the stupid nested usage of get_online_cpus() first. That'd allow us to convert the hotplug locking into a percpu_rwsem, which is desired anyway as it makes get_online_cpus() cheap.
So then the migration stuff becomes:

	void migrate_disable(void)
	{
		if (in_atomic() || irqs_disabled())
			return;

		if (!current->migration_disabled) {
			percpu_down_read_preempt_disable(hotplug_rwsem);
			current->migration_disabled++;
			preempt_enable();
		} else {
			current->migration_disabled++;
		}
	}

	void migrate_enable(void)
	{
		if (in_atomic() || irqs_disabled())
			return;

		if (current->migration_disabled == 1) {
			preempt_disable();
			current->migration_disabled--;
			percpu_up_read_preempt_enable(hotplug_rwsem);
		} else {
			current->migration_disabled--;
		}
	}

That wants to have some debug extras; see the current RT version of it.

Coming back to the cpus_allowed mask. Most of the cpus_allowed usage outside of the scheduler is broken and we really should clean that up now and then restrict access to cpus_allowed. The easy ones (and the majority) are those:

	cpumask_copy(saved_mask, &current->cpus_allowed);
	set_cpus_allowed_ptr(current, cpumask_of(XXX));
	do_something();
	set_cpus_allowed_ptr(current, saved_mask);

That code is not at all protected against user space changing affinity or CPUs going down. We'd better do that now, before doing anything about cpus_allowed in the core code.

Thanks,

	tglx
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 11:46:33AM +0200, Sebastian Andrzej Siewior wrote: > On 2017-04-06 11:32:24 [+0200], Peter Zijlstra wrote: > > On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote: > > > While converting the existing users I tried to stick with the rules > > > above however… well, mostly CPUFREQ tries to temporarily switch the CPU > > > mask to do something on a certain CPU and then switches the mask back to > > > its original value. > > > > > There's a bunch of that through ancient and rotten parts of the kernel. > > All those sites are broken. > > > > Nothing stops userspace from setting a different affinity right after > > the kernel does for those threads. > > Good. So you are saying I should convert them to something like > queue_work_on()? Not sure; IIRC there were a few variants. Some can indeed simply do queue_work_on() and possibly wait for completion. Some should maybe be a per-cpu kthread, others will be more 'interesting'. IIRC MIPS has a case where only 1 in N cores has an FPU. And once a task uses the FPU, it gets affined to the core that has one or something like that. Of course, nothing then stops someone else breaking that affinity. But I suspect it will simply fault on the next FPU instruction and 'reset' the mask or something. I've no clue and no real desire to know.
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On 2017-04-06 11:32:24 [+0200], Peter Zijlstra wrote: > On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote: > > While converting the existing users I tried to stick with the rules > > above however… well, mostly CPUFREQ tries to temporarily switch the CPU > > mask to do something on a certain CPU and then switches the mask back to > > its original value. > > > There's a bunch of that through ancient and rotten parts of the kernel. > All those sites are broken. > > Nothing stops userspace from setting a different affinity right after > the kernel does for those threads. Good. So you are saying I should convert them to something like queue_work_on()? Sebastian
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 11:25:42AM +0200, Sebastian Andrzej Siewior wrote: > > I.e. just have a counter and these two APIs: > > > > static inline void migrate_disable(void) > > { > > current->migration_disabled++; > plus > if (current->migration_disabled == 1) > get_online_cpus() You want that other hotplug fast hack crap you have in rt, no? Because mainline will not get anything other than preempt_disable() for migrate_disable(). > > } > > > > ... > > > > static inline void migrate_enable(void) > > { > > current->migration_disabled--; > > } > > > > ... and make sure the scheduler migration code plus the CPU hotplug code > > considers > > the counter. > > So on the sched part I need to go through all places where it looks at the > mask and make sure it ignores the CPU switch decision. This could be > doable. Don't sprinkle that around. Use a wrapper, say tsk_cpus_allowed(), to encapsulate that logic :-)
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 11:35:45AM +0200, Peter Zijlstra wrote: > On Thu, Apr 06, 2017 at 10:01:39AM +0200, Ingo Molnar wrote: > > I.e. just have a counter and these two APIs: > > > > static inline void migrate_disable(void) > > { > > current->migration_disabled++; > > } > > > > ... > > > > static inline void migrate_enable(void) > > { > > current->migration_disabled--; > > } > > > > ... and make sure the scheduler migration code plus the CPU hotplug code > > considers > > the counter. > > > > Would this work, and would this be the simplest all around solution? > > If you want to add expensive bits to hot paths like wakeups ... :/ Hmm, doesn't that get us back to exactly where we were?

	tsk_cpus_allowed()
	{
		if (!tsk->migration_disabled)
			return tsk->cpus_allowed;
		else
			return cpumask_of(task_cpu(tsk));
	}
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Thu, Apr 06, 2017 at 10:01:39AM +0200, Ingo Molnar wrote: > I.e. just have a counter and these two APIs: > > static inline void migrate_disable(void) > { > current->migration_disabled++; > } > > ... > > static inline void migrate_enable(void) > { > current->migration_disabled--; > } > > ... and make sure the scheduler migration code plus the CPU hotplug code > considers > the counter. > > Would this work, and would this be the simplest all around solution? If you want to add expensive bits to hot paths like wakeups ... :/
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Wed, Apr 05, 2017 at 09:39:43AM +0200, Ingo Molnar wrote: > > While converting the existing users I tried to stick with the rules > > above however… well, mostly CPUFREQ tries to temporarily switch the CPU > > mask to do something on a certain CPU and then switches the mask back to > > its original value. So in theory `cpus_ptr' could or should be used. > > However if this is invoked in a migration disabled region (which is not > > the case because it would require something like preempt_disable() and > > set_cpus_allowed_ptr() might sleep so it can't be) then the "restore" > > part would restore the wrong mask. So it only looks strange and I go for > > the pointer… > > So maybe we could add the following facility: > > ptr = sched_migrate_to_cpu_save(cpu); > > ... > > sched_migrate_to_cpu_restore(ptr); > > ... and use it in the cpufreq code. Then -rt could simply define > migrate_disable() > to be: Argh, no. We have plenty of proper per-cpu kthreads / workqueue and other APIs to do this. Not to mention that implementing the above in a non-broken way has user-visible side-effects, which is ugly.
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On Tue, Apr 04, 2017 at 08:42:02PM +0200, Sebastian Andrzej Siewior wrote: > While converting the existing users I tried to stick with the rules > above however… well, mostly CPUFREQ tries to temporarily switch the CPU > mask to do something on a certain CPU and then switches the mask back to > its original value. There's a bunch of that through ancient and rotten parts of the kernel. All those sites are broken. Nothing stops userspace from setting a different affinity right after the kernel does for those threads.
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On 2017-04-06 10:01:39 [+0200], Ingo Molnar wrote: > > * Sebastian Andrzej Siewior wrote: > > > On 2017-04-06 08:16:22 [+0200], Ingo Molnar wrote: > > > > > > * Sebastian Andrzej Siewior wrote: > > > > > > > On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote: > > > > > > > > > > So maybe we could add the following facility: > > > > > > > > > > ptr = sched_migrate_to_cpu_save(cpu); > > > > > > > > > > ... > > > > > > > > > > sched_migrate_to_cpu_restore(ptr); > > > > > > BTW., and I'm sure this has come up before, but why doesn't > > > migrate_disable() use > > > a simple per task flag that the scheduler migration code takes into > > > account? > > > > we could add that. But right now there are two spots which look at the > > counter to decide whether or not migration is disabled. > > > > > It should be functionally equivalent to the current solution, and it > > > appears to > > > have a heck of a smaller cross section with the rest of the scheduler. > > > > > > I.e.: > > > > > > static inline void migrate_disable(void) > > > { > > > current->migration_disabled++; > > > } > > > > > > ... > > > > > > static inline void migrate_enable(void) > > > { > > > current->migration_disabled--; > > > } > > > > > > or so? Then add this flag as a condition to can_migrate_task() et al. > > > > > > While we generally dislike such flags as they wreak havoc with the > > > scheduler if > > > overused, the cpus_allowed based solution has the exact same effect so > > > it's not > > > like it's a step backwards - and it should also be much faster and less > > > intrusive. > > > > So you are saying that we drop the cpus_ptr + cpus_mask fields again and > > instead add a task-flag to ensure that the scheduler does not migrate > > the task to another CPU? > > Yeah - but no need to add a per-task flag if we already have a counter. > > > > Am I missing some complication? > > > > We do have the counter. We need to ensure that the CPU is not going > > away > > while we are in a migrate_disable() region since we can be scheduled out. > > So the > > CPU can't go offline until we leave that region. > > Yeah. But it should be relatively straightforward to extend the logic that > makes > sure that a CPU does not go away from under tasks pinned to that CPU alone, > right? I used get_online_cpus(); that is enough. > > #define migrate_disable() sched_migrate_to_cpu_save(-1) > > > > int sched_migrate_to_cpu_save(int cpu) > > So if we have a ->migration_disabled counter then we don't need the > sched_migrate_to_cpu_save()/restore() complication, right? Correct. Unless (as you suggested) we want a migrate-to-specific-CPU variant, we just need the function up to the BUG() statement and don't need to check the CPU we are on. > Sorry if this is a back and forth - I was somehow convinced that we do need > to > frob the cpus_allowed mask to get this functionality - but in hindsight I > think > the counter should be enough. > > I.e. just have a counter and these two APIs: > > static inline void migrate_disable(void) > { > current->migration_disabled++; plus if (current->migration_disabled == 1) get_online_cpus() > } > > ... > > static inline void migrate_enable(void) > { > current->migration_disabled--; > } > > ... and make sure the scheduler migration code plus the CPU hotplug code > considers > the counter. So on the sched part I need to go through all places where it looks at the mask and make sure it ignores the CPU switch decision. This could be doable. On the CPU hotplug side I don't see a way around get_online_cpus(). It is essentially there to ensure a CPU does not go away. The other way around it would be to remove the CPU from the online_mask, followed by going through all tasks left on the CPU and waiting for them to leave the CPU. And I remember peterz saying that he wanted to make get_online_cpus() fast and stay with that. > Would this work, and would this be the simplest all around solution? Let me try this with get_online_cpus() (as suggested) and without cpus_ptr. > Thanks, > > Ingo Sebastian
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
* Sebastian Andrzej Siewior wrote: > On 2017-04-06 08:16:22 [+0200], Ingo Molnar wrote: > > > > * Sebastian Andrzej Siewior wrote: > > > > > On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote: > > > > > > > > So maybe we could add the following facility: > > > > > > > > ptr = sched_migrate_to_cpu_save(cpu); > > > > > > > > ... > > > > > > > > sched_migrate_to_cpu_restore(ptr); > > > > BTW., and I'm sure this has come up before, but why doesn't > > migrate_disable() use > > a simple per task flag that the scheduler migration code takes into account? > > we could add that. But right now there are two spots which look at the > counter to decide whether or not migration is disabled. > > > It should be functionally equivalent to the current solution, and it > > appears to > > have a heck of a smaller cross section with the rest of the scheduler. > > > > I.e.: > > > > static inline void migrate_disable(void) > > { > > current->migration_disabled++; > > } > > > > ... > > > > static inline void migrate_enable(void) > > { > > current->migration_disabled--; > > } > > > > or so? Then add this flag as a condition to can_migrate_task() et al. > > > > While we generally dislike such flags as they wreak havoc with the > > scheduler if > > overused, the cpus_allowed based solution has the exact same effect so it's > > not > > like it's a step backwards - and it should also be much faster and less > > intrusive. > > So you are saying that we drop the cpus_ptr + cpus_mask fields again and > instead add a task-flag to ensure that the scheduler does not migrate > the task to another CPU? Yeah - but no need to add a per-task flag if we already have a counter. > > Am I missing some complication? > > We do have the counter. We need to ensure that the CPU is not going away > while we are in a migrate_disable() region since we can be scheduled out. So > the > CPU can't go offline until we leave that region. Yeah. But it should be relatively straightforward to extend the logic that makes sure that a CPU does not go away from under tasks pinned to that CPU alone, right? > #define migrate_disable() sched_migrate_to_cpu_save(-1) > > int sched_migrate_to_cpu_save(int cpu) So if we have a ->migration_disabled counter then we don't need the sched_migrate_to_cpu_save()/restore() complication, right? Sorry if this is a back and forth - I was somehow convinced that we do need to frob the cpus_allowed mask to get this functionality - but in hindsight I think the counter should be enough. I.e. just have a counter and these two APIs:

	static inline void migrate_disable(void)
	{
		current->migration_disabled++;
	}

	...

	static inline void migrate_enable(void)
	{
		current->migration_disabled--;
	}

... and make sure the scheduler migration code plus the CPU hotplug code considers the counter. Would this work, and would this be the simplest all-around solution?

Thanks,

	Ingo
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On 2017-04-06 08:16:22 [+0200], Ingo Molnar wrote:
>
> * Sebastian Andrzej Siewior wrote:
>
> > On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
> > >
> > > So maybe we could add the following facility:
> > >
> > > 	ptr = sched_migrate_to_cpu_save(cpu);
> > >
> > > 	...
> > >
> > > 	sched_migrate_to_cpu_restore(ptr);
>
> BTW., and I'm sure this has come up before, but why doesn't
> migrate_disable() use a simple per task flag that the scheduler
> migration code takes into account?

we could add that. But right now there are two spots which look at the
counter to decide whether or not migration is disabled.

> It should be functionally equivalent to the current solution, and it
> appears to have a heck of a smaller cross section with the rest of the
> scheduler.
>
> I.e.:
>
> 	static inline void migrate_disable(void)
> 	{
> 		current->migration_disabled++;
> 	}
>
> 	...
>
> 	static inline void migrate_enable(void)
> 	{
> 		current->migration_disabled--;
> 	}
>
> or so? Then add this flag as a condition to can_migrate_task() et al.
>
> While we generally dislike such flags as they wreak havoc with the
> scheduler if overused, the cpus_allowed based solution has the exact
> same effect so it's not like it's a step backwards - and it should
> also be much faster and less intrusive.

So you are saying that we drop the cpus_ptr + cpus_mask fields again
and instead add a task-flag to ensure that the scheduler does not
migrate the task to another CPU?

> Am I missing some complication?

We do have the counter. We need to ensure that the CPU is not going
away while we are in a migrate_disable() region since we can be
scheduled out. So the CPU can't go offline until we leave that region.
This version uses get_online_cpus() while in -RT we have something
called "pin_current_cpu()". This is a lightweight version of
get_online_cpus() which should go away…

Right now I have this and I need to test it and complete the CPU hop
part:

	#define migrate_disable()	sched_migrate_to_cpu_save(-1)

	int sched_migrate_to_cpu_save(int cpu)
	{
		struct task_struct *p = current;

		if (in_atomic()) {
	#ifdef CONFIG_SCHED_DEBUG
			p->migrate_disable_atomic++;
			if (cpu >= 0)
				WARN_ON_ONCE(!cpumask_equal(p->cpus_ptr,
							    cpumask_of(cpu)));
	#endif
			return raw_smp_processor_id();
		}
	#ifdef CONFIG_SCHED_DEBUG
		WARN_ON_ONCE(p->migrate_disable_atomic);
	#endif
		if (p->migrate_disable) {
			p->migrate_disable++;
	#ifdef CONFIG_SCHED_DEBUG
			if (cpu >= 0)
				WARN_ON_ONCE(!cpumask_equal(p->cpus_ptr,
							    cpumask_of(cpu)));
	#endif
			return raw_smp_processor_id();
		}

		get_online_cpus();
		preempt_disable();
		p->migrate_disable = 1;
		if (cpu < 0) {
			p->cpus_ptr = cpumask_of(task_cpu(p));
		} else {
			if (!cpu_online(cpu)) {
				preempt_enable();
				put_online_cpus();
				WARN(1, "CPU is offline\n");
				return -ENODEV;
			}
			p->cpus_ptr = cpumask_of(cpu);
		}
		p->nr_cpus_allowed = 1;
		preempt_enable();

		if (cpumask_equal(p->cpus_ptr, cpumask_of(cpu)))
			return cpu;

		/* move to the correct CPU */
		BUG();
		return raw_smp_processor_id();
	}

The task-flag / p->migrate_disable counter is used in two spots in

	do_set_cpus_allowed();
	__set_cpus_allowed_ptr();

so that a change to the affinity mask does not force a CPU hop.

> Thanks,
>
> 	Ingo

Sebastian
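[Editorial sketch] The last point - that a change to the affinity mask
during a migrate_disable() region must update the wide mask without
forcing a CPU hop - can also be modeled in userspace. Masks are plain
bitmasks here, the pinned mask is applied at disable time and dropped at
enable time, and everything beyond the `cpus_ptr`/`cpus_mask` naming is
made up for illustration (in particular, applying the deferred mask on
the final enable is one plausible reading, not necessarily what the
patch does):

```c
#include <assert.h>

typedef unsigned long cpumask_t;
#define CPU(n)	(1UL << (n))

struct task {
	cpumask_t cpus_mask;		/* the "normal", wide mask */
	const cpumask_t *cpus_ptr;	/* what placement decisions read */
	int migrate_disable;
	cpumask_t pinned;		/* single-CPU mask while pinned */
};

static void do_set_cpus_allowed(struct task *p, cpumask_t new_mask)
{
	p->cpus_mask = new_mask;	/* always update the wide mask */
	if (!p->migrate_disable)
		p->cpus_ptr = &p->cpus_mask;	/* repoint only if unpinned */
}

static void pin_to_cpu(struct task *p, int cpu)
{
	p->migrate_disable++;
	p->pinned = CPU(cpu);
	p->cpus_ptr = &p->pinned;
}

static void unpin(struct task *p)
{
	if (--p->migrate_disable == 0)
		p->cpus_ptr = &p->cpus_mask;	/* deferred mask visible now */
}
```

The key property is that `do_set_cpus_allowed()` never touches
`cpus_ptr` while the counter is non-zero, so the pinned task stays put.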
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
* Sebastian Andrzej Siewior wrote:

> On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
> >
> > So maybe we could add the following facility:
> >
> > 	ptr = sched_migrate_to_cpu_save(cpu);
> >
> > 	...
> >
> > 	sched_migrate_to_cpu_restore(ptr);

BTW., and I'm sure this has come up before, but why doesn't
migrate_disable() use a simple per task flag that the scheduler
migration code takes into account?

It should be functionally equivalent to the current solution, and it
appears to have a heck of a smaller cross section with the rest of the
scheduler.

I.e.:

	static inline void migrate_disable(void)
	{
		current->migration_disabled++;
	}

	...

	static inline void migrate_enable(void)
	{
		current->migration_disabled--;
	}

or so? Then add this flag as a condition to can_migrate_task() et al.

While we generally dislike such flags as they wreak havoc with the
scheduler if overused, the cpus_allowed based solution has the exact
same effect so it's not like it's a step backwards - and it should also
be much faster and less intrusive.

Am I missing some complication?

Thanks,

	Ingo
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
On 2017-04-05 09:39:43 [+0200], Ingo Molnar wrote:
>
> So maybe we could add the following facility:
>
> 	ptr = sched_migrate_to_cpu_save(cpu);
>
> 	...
>
> 	sched_migrate_to_cpu_restore(ptr);
>
> ... and use it in the cpufreq code. Then -rt could simply define
> migrate_disable() to be:
>
> 	ptr = sched_migrate_to_cpu_save(raw_smp_processor_id());
>
> and define migrate_enable() as:
>
> 	sched_migrate_to_cpu_restore(ptr);
>
> ... or such.
>
> In the cpu == current_cpu case it would be super fast - otherwise it
> would migrate over to the target CPU first. Also note that this
> facility is strictly a special case for single-CPU masks and
> migrations - i.e. the constant pointer cpumask optimization would
> always apply.
>
> Note that due to the use of the 'ptr' local variable the interface
> nests naturally, so this would be a legitimate use:
>
> 	ptr = sched_migrate_to_cpu_save(cpu);
>
> 	...
> 		migrate_disable();
> 		...
> 		migrate_enable();
> 	...
>
> 	sched_migrate_to_cpu_restore(ptr);
>
> I.e. my proposal would be to essentially upstream the -rt
> migrate_disable() facility in a slightly more generic form that would
> fit the cpufreq usecase.
>
> I bet a number of the current drivers mucking with cpumasks would also
> fit this new API.
>
> Does this make sense?

It kind of does. If you want to allow migration to a different CPU then
it might make things a little complicated because we need to support
nested migrate_disable() and we can't (must not) change the mask while
nesting.

Other than that, I am not sure the cpufreq usecase is valid.
schedule_work_on() looks better but maybe not as fast as the proposed
sched_migrate_to_cpu_save(). Also, some users look wrong:

	[PATCH] CPUFREQ: Loongson2: drop set_cpus_allowed_ptr()
	https://www.linux-mips.org/archives/linux-mips/2017-04/msg00042.html

and I received offlist mail pointing to the other cpufreq users. So
here I am waiting for some feedback from the cpufreq maintainer.

But I get your point. Other than super-fast switching to a specific CPU
for $reason we could replace a few preempt_disable() invocations which
are only there due to per-CPU variables.

So let me hack this into -RT and come back…

> Thanks,
>
> 	Ingo

Sebastian
Re: [RFC PATCH] kernel: sched: Provide a pointer to the valid CPU mask
* Sebastian Andrzej Siewior wrote:

> In commit 4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed()
> wrapper") the tsk_nr_cpus_allowed() wrapper was removed. There was not
> much difference in !RT but in RT we used this to implement
> migrate_disable(). Within a migrate_disable() section the CPU mask is
> restricted to a single CPU while the "normal" CPU mask remains
> untouched.
>
> As an alternative implementation Ingo suggested to use
>
> 	struct task_struct {
> 		const cpumask_t		*cpus_ptr;
> 		cpumask_t		cpus_mask;
> 	};
>
> with
>
> 	t->cpus_allowed_ptr = &t->cpus_allowed;
>
> In -RT we then can switch the cpus_ptr to
>
> 	t->cpus_allowed_ptr = cpumask_of(task_cpu(p));
>
> in a migration disabled region. The rules are simple:
> - Code that 'uses' ->cpus_allowed would use the pointer.
> - Code that 'modifies' ->cpus_allowed would use the direct mask.
>
> While converting the existing users I tried to stick with the rules
> above, however… well, mostly CPUFREQ tries to temporarily switch the
> CPU mask to do something on a certain CPU and then switches the mask
> back to its original value. So in theory `cpus_ptr' could or should be
> used. However if this is invoked in a migration disabled region (which
> is not the case because it would require something like
> preempt_disable(), and set_cpus_allowed_ptr() might sleep so it can't
> be) then the "restore" part would restore the wrong mask. So it only
> looks strange and I go for the pointer…

So maybe we could add the following facility:

	ptr = sched_migrate_to_cpu_save(cpu);

	...

	sched_migrate_to_cpu_restore(ptr);

... and use it in the cpufreq code. Then -rt could simply define
migrate_disable() to be:

	ptr = sched_migrate_to_cpu_save(raw_smp_processor_id());

and define migrate_enable() as:

	sched_migrate_to_cpu_restore(ptr);

... or such.

In the cpu == current_cpu case it would be super fast - otherwise it
would migrate over to the target CPU first. Also note that this
facility is strictly a special case for single-CPU masks and migrations
- i.e. the constant pointer cpumask optimization would always apply.

Note that due to the use of the 'ptr' local variable the interface
nests naturally, so this would be a legitimate use:

	ptr = sched_migrate_to_cpu_save(cpu);

	...
		migrate_disable();
		...
		migrate_enable();
	...

	sched_migrate_to_cpu_restore(ptr);

I.e. my proposal would be to essentially upstream the -rt
migrate_disable() facility in a slightly more generic form that would
fit the cpufreq usecase.

I bet a number of the current drivers mucking with cpumasks would also
fit this new API.

Does this make sense?

Thanks,

	Ingo
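[Editorial sketch] The "nests naturally through the 'ptr' local
variable" property above can be shown with a small userspace model:
"save" repoints the task's mask pointer at a constant single-CPU mask
(as `cpumask_of()` would) and hands back the previous pointer, and
"restore" simply reinstates it, so an inner save/restore pair cannot
clobber an outer one. The types and function names are illustrative
only:

```c
#include <assert.h>

typedef unsigned long cpumask_t;
#define CPU(n)	(1UL << (n))

/* one constant single-CPU mask per CPU, like the kernel's cpumask_of() */
static const cpumask_t cpu_masks[4] = { CPU(0), CPU(1), CPU(2), CPU(3) };

struct task {
	cpumask_t cpus_mask;		/* the wide, user-visible mask */
	const cpumask_t *cpus_ptr;	/* what placement decisions read */
};

static const cpumask_t *migrate_to_cpu_save(struct task *p, int cpu)
{
	const cpumask_t *old = p->cpus_ptr;

	p->cpus_ptr = &cpu_masks[cpu];	/* constant single-CPU mask */
	return old;			/* caller keeps this for restore */
}

static void migrate_to_cpu_restore(struct task *p, const cpumask_t *old)
{
	p->cpus_ptr = old;		/* reinstate the previous pointer */
}
```

Because restore reinstates the *previous* pointer rather than the
original wide mask, the inner restore leaves the task pinned by the
outer pair - which is exactly the legitimate nesting shown above.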