Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Mon, Oct 14, 2013 at 02:23:55AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 14, 2013 at 11:05:08AM +0200, Peter Zijlstra wrote:
> > On Sat, Oct 12, 2013 at 07:06:56PM +0200, Oleg Nesterov wrote:
> > > it even disables irqs, so this should always imply rcu_read_lock() with
> > > any implementation,
> >
> > Not so; I could make an RCU implementation that drives the state machine
> > from rcu_read_unlock(). Such an implementation doesn't need the
> > interrupt-driven poll-state driver we currently have and could thus
> > subvert that assumption :-)
> >
> > Then again, there's a good reason PaulMck didn't pick this
> > implementation.
>
> True enough, but there really are some out-of-tree RCU implementations
> that do take this approach and where disabling interrupts would not
> block preemptible RCU. So please do not rely on this implementation
> detail. You never know...

Actually, the current implementation of SRCU is not blocked by CPUs
disabling interrupts!

							Thanx, Paul

> > > In fact I do not even understand why getaffinity() doesn't simply
> > > return ->cpus_allowed, but this is off-topic.
> >
> > Yeah, me neither :-(, it always surprises me. But changing it is likely
> > to break stuff so there we are.
>
> I know that feeling...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Mon, Oct 14, 2013 at 11:05:08AM +0200, Peter Zijlstra wrote:
> On Sat, Oct 12, 2013 at 07:06:56PM +0200, Oleg Nesterov wrote:
> > it even disables irqs, so this should always imply rcu_read_lock() with
> > any implementation,
>
> Not so; I could make an RCU implementation that drives the state machine
> from rcu_read_unlock(). Such an implementation doesn't need the
> interrupt-driven poll-state driver we currently have and could thus
> subvert that assumption :-)
>
> Then again, there's a good reason PaulMck didn't pick this
> implementation.

True enough, but there really are some out-of-tree RCU implementations
that do take this approach and where disabling interrupts would not
block preemptible RCU. So please do not rely on this implementation
detail. You never know...

> > In fact I do not even understand why getaffinity() doesn't simply
> > return ->cpus_allowed, but this is off-topic.
>
> Yeah, me neither :-(, it always surprises me. But changing it is likely
> to break stuff so there we are.

I know that feeling...

							Thanx, Paul
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Sat, Oct 12, 2013 at 07:06:56PM +0200, Oleg Nesterov wrote:
> it even disables irqs, so this should always imply rcu_read_lock() with
> any implementation,

Not so; I could make an RCU implementation that drives the state machine
from rcu_read_unlock(). Such an implementation doesn't need the
interrupt-driven poll-state driver we currently have and could thus
subvert that assumption :-)

Then again, there's a good reason PaulMck didn't pick this
implementation.

> In fact I do not even understand why getaffinity() doesn't simply
> return ->cpus_allowed, but this is off-topic.

Yeah, me neither :-(, it always surprises me. But changing it is likely
to break stuff so there we are.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/11, Peter Zijlstra wrote:
>
> On Fri, Oct 11, 2013 at 08:25:07PM +0200, Oleg Nesterov wrote:
> > On 10/11, Peter Zijlstra wrote:
> > >
> > > As a penance I'll start by removing all get_online_cpus() usage from the
> > > scheduler.
> >
> > I only looked at the change in setaffinity,
> >
> > > @@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> > >  	struct task_struct *p;
> > >  	int retval;
> > >
> > > -	get_online_cpus();
> > >  	rcu_read_lock();
> >
> > Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
> > set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
> > is probably fine, CPU_DYING does __migrate_task().
>
> I'm fine with always doing sync_sched(); sync_rcu(); if that makes you
> feel better.

No, I was just curious. iow, I am asking, not arguing.

> But I thought that assuming that !PREEMPT sync_rcu() would
> imply sync_sched() was ok. I think the comment there even says as much.
>
> And task_rq_lock() will very much disable preemption; and thus we get
> what we want, right?

it even disables irqs, so this should always imply rcu_read_lock() with
any implementation, I guess. However, I was told we should not rely on
this, and, say, posix_timer_event() does rcu_read_lock() even though it
is always called under spin_lock_irq().

But what I actually tried to say: it seems that this particular change
looks fine even if cpu_down() doesn't do sync_sched/rcu at all, because
we can rely on __migrate_task(). IOW, if we race with DOWN_PREPARE and
miss set_cpu_active(false) we can pretend that this CPU goes down later.

> In any case; the goal was to make either RCU or preempt-disable
> sufficient.
>
> > However. This means that sched_setaffinity() can fail if it races with
> > the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
> > Probably we do not really care, just this looks a bit confusing.
>
> Couldn't be bothered; failing hotplug will have side-effects any which
> way.

OK.

> > > @@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
> > >  		goto out_unlock;
> > >
> > >  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > > -	cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
> > > +	cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);
> >
> > But I am just curious, is this change strictly needed?
>
> No; we could do without. It really doesn't matter much if anything. I
> only did it because sched_setaffinity()->set_cpus_allowed_ptr() checks
> against active, not online. And had a sudden urge to make get/set
> symmetric -- totally pointless otherwise.

OK, thanks, I was just curious. In fact I do not even understand why
getaffinity() doesn't simply return ->cpus_allowed, but this is off-topic.

Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Fri, Oct 11, 2013 at 08:25:07PM +0200, Oleg Nesterov wrote:
> On 10/11, Peter Zijlstra wrote:
> >
> > As a penance I'll start by removing all get_online_cpus() usage from the
> > scheduler.
>
> I only looked at the change in setaffinity,
>
> > @@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> >  	struct task_struct *p;
> >  	int retval;
> >
> > -	get_online_cpus();
> >  	rcu_read_lock();
>
> Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
> set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
> is probably fine, CPU_DYING does __migrate_task().

I'm fine with always doing sync_sched(); sync_rcu(); if that makes you
feel better. But I thought that assuming that !PREEMPT sync_rcu() would
imply sync_sched() was ok. I think the comment there even says as much.

And task_rq_lock() will very much disable preemption; and thus we get
what we want, right?

In any case; the goal was to make either RCU or preempt-disable
sufficient.

> However. This means that sched_setaffinity() can fail if it races with
> the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
> Probably we do not really care, just this looks a bit confusing.

Couldn't be bothered; failing hotplug will have side-effects any which
way.

> > @@ -3814,7 +3813,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
> >  	unsigned long flags;
> >  	int retval;
> >
> > -	get_online_cpus();
>
> This change is probably fine in any case?

Yes.

> >  	rcu_read_lock();
> >
> >  	retval = -ESRCH;
> > @@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
> >  		goto out_unlock;
> >
> >  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > -	cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
> > +	cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);
>
> But I am just curious, is this change strictly needed?

No; we could do without. It really doesn't matter much if anything.
I only did it because sched_setaffinity()->set_cpus_allowed_ptr() checks
against active, not online. And had a sudden urge to make get/set
symmetric -- totally pointless otherwise.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/11, Peter Zijlstra wrote:
>
> As a penance I'll start by removing all get_online_cpus() usage from the
> scheduler.

I only looked at the change in setaffinity,

> @@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
>  	struct task_struct *p;
>  	int retval;
>
> -	get_online_cpus();
>  	rcu_read_lock();

Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
is probably fine, CPU_DYING does __migrate_task().

However. This means that sched_setaffinity() can fail if it races with
the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
Probably we do not really care, just this looks a bit confusing.

> @@ -3814,7 +3813,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
>  	unsigned long flags;
>  	int retval;
>
> -	get_online_cpus();

This change is probably fine in any case?

>  	rcu_read_lock();
>
>  	retval = -ESRCH;
> @@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
>  		goto out_unlock;
>
>  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> -	cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
> +	cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);

But I am just curious, is this change strictly needed? Afaics we do not
care if we race with set_cpu_online(true/false).

Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
> The problem is that the scheduler doesn't see that the current task has
> preemption disabled. It only looks at the priorities of the current
> task, and if it can preempt it, it will. It sets the NEED_RESCHED to the
> current task and waits for the preemption to schedule it out.

Ah, got it. It's a priority inversion problem, and while there is a
standard solution (priority inheritance), if I were to suggest such a
heavyweight solution, Linus would start yelling again and Sarah would
get unhappy, and we all know how that ends up.

Thank you. I will now cease displaying my ignorance in public.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 11 Oct 2013 08:14:57 -0400 "George Spelvin" wrote:

> > There's places in the kernel that does for_each_cpu() that I'm sure you
> > don't want to disable preemption for. Especially when you start having
> > 4096 CPU machines!
>
> Er... why not?
>
> Seriously. If I have 4096 processors, and preemption is disabled on
> *one* of them for a long time, can't an urgent task just find a different
> processor to preempt?

The problem is that the scheduler doesn't see that the current task has
preemption disabled. It only looks at the priorities of the current
task, and if it can preempt it, it will. It sets NEED_RESCHED on the
current task and waits for preemption to schedule it out.

If an RT task is woken on a CPU that's running a SCHED_OTHER task that's
doing this loop, it will have to wait for the loop to finish before it
can schedule in. After doing the wake up, the scheduling algorithm is
happy that it got an RT task on a CPU quickly, when in reality it has to
wait a long time till preemption is enabled again.

-- Steve

> This seems like a non-problem. Or am I misunderstanding something about
> processor affinity?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 11:49:15AM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2013 at 11:34 AM, Peter Zijlstra wrote:
> >
> > But my point is that even though there aren't many of these today; with
> > the growing number of cpus in 'commodity' hardware you want to move away
> > from using preempt_disable() as hotplug lock.
>
> Umm.
>
> Wasn't this pretty much the argument for the COMPLETELY F*CKED UP
> change to make the vma locking use a semaphore?

Not entirely; but fair enough, I did screw up there.

> The whole "it's more convenient to use sleeping locks" argument is
> PURE AND UTTER SHIT when it comes to really core code. We are *much*
> better off saying "this is core, we care so deeply about performance
> that you had better use your brain before you do this".
>
> Seriously. Your argument is bad, but more importantly, it is
> *dangerously* bad. It's crap that results in bad code: and the bad
> code is almost impossible to fix up later, because once you encourage
> people to do the easy thing, they'll do so.

Right, I fell face first into this trap. The existence of
{get,put}_online_cpus() made me stop thinking and use it.

As a penance I'll start by removing all get_online_cpus() usage from the
scheduler.

---
Subject: sched: Remove get_online_cpus() usage

Remove get_online_cpus() usage from the scheduler; there's 4 sites that
use it:

 - sched_init_smp(); where its completely superfluous since we're in
   'early' boot and there simply cannot be any hotplugging.

 - sched_getaffinity(); we already take a raw spinlock to protect the
   task cpus_allowed mask, this disables preemption and therefore also
   stabilizes cpu_online_mask as that's modified using stop_machine.
   However switch to active mask for symmetry with
   sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active mask
   stability by inserting sync_rcu/sched() into _cpu_down.

 - sched_setaffinity(); we don't appear to need get_online_cpus()
   either, there's two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
      for the cpuset case we hold task_lock, which is a spinlock and
      thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); holds all scheduler locks and thus has
      preemption properly disabled; also it already deals with hotplug
      races explicitly where it releases them.

 - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
   us with a little trickery.

By adding a sync_sched/rcu() after the CPU_DOWN_PREPARE notifier we can
provide preempt/rcu guarantees for cpu_active_mask. Use these to
validate that both our cpus are active when queueing the stop work
before we queue the stop_machine works for take_cpu_down().

Signed-off-by: Peter Zijlstra
---
 kernel/cpu.c          | 17 +
 kernel/sched/core.c   | 20 ++--
 kernel/stop_machine.c | 26 +-
 3 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d7f07a2da5a6..63aa50d7ce1e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -308,6 +308,23 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);

+	/*
+	 * By now we've cleared cpu_active_mask, wait for all preempt-disabled
+	 * and RCU users of this state to go away such that all new such users
+	 * will observe it.
+	 *
+	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
+	 * not imply sync_sched(), so explicitly call both.
+	 */
+#ifdef CONFIG_PREEMPT
+	synchronize_sched();
+#endif
+	synchronize_rcu();
+
+	/*
+	 * So now all preempt/rcu users must observe !cpu_active().
+	 */
+
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c3feebcf112..498a5e5a53f5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1081,8 +1081,6 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	struct migration_swap_arg arg;
 	int ret = -EINVAL;

-	get_online_cpus();
-
 	arg = (struct migration_swap_arg){
 		.src_task = cur,
 		.src_cpu = task_cpu(cur),
@@ -1093,6 +1091,10 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	if (arg.src_cpu == arg.dst_cpu)
 		goto out;

+	/*
+	 * These three tests are all lockless; this is OK since all of them
+	 * will be re-checked with proper locks held further down the line.
+	 */
 	if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
 		goto out;

@@ -1105,7 +1107,6 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	ret = stop_two_cpus(arg.dst_cpu,
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
> There's places in the kernel that does for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

Er... why not?

Seriously. If I have 4096 processors, and preemption is disabled on
*one* of them for a long time, can't an urgent task just find a different
processor to preempt?

This seems like a non-problem. Or am I misunderstanding something about
processor affinity?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Linus Torvalds wrote:

> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt wrote:
> >
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
>
> I think we'd generally want to have it be something the loop asks for.
>
> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
>
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and not
> do it for common small values (although it looks like Fedora defaults to
> 128 CPU's for their distro kernels, which seems a bit excessive - too
> many by far for normal people, too few for the crazy big ones).

Ubuntu has it at 256, so I guess Fedora is even a bit conservative ;-)

Thanks,

	Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
> The problem is that the scheduler doesn't see that the current task has
> preemption disabled. It only looks at the priorities of the current
> task, and if it can preempt it, it will. It sets the NEED_RESCHED to
> the current task and waits for the preemption to schedule it out.

Ah, got it. It's a priority inversion problem, and while there is a
standard solution (priority inheritance), if I were to suggest such a
heavyweight solution, Linus would start yelling again and Sarah would get
unhappy, and we all know how that ends up.

Thank you. I will now cease displaying my ignorance in public.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/11, Peter Zijlstra wrote:
>
> As a penance I'll start by removing all get_online_cpus() usage from the
> scheduler.

I only looked at the change in setaffinity,

@@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	struct task_struct *p;
 	int retval;
 
-	get_online_cpus();
 	rcu_read_lock();

Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
is probably fine, CPU_DYING does __migrate_task().

However. This means that sched_setaffinity() can fail if it races with
the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
Probably we do not really care, just this looks a bit confusing.

@@ -3814,7 +3813,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 	unsigned long flags;
 	int retval;
 
-	get_online_cpus();

This change is probably fine in any case?

 	rcu_read_lock();
 
 	retval = -ESRCH;
@@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
 		goto out_unlock;
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	cpumask_and(mask, p->cpus_allowed, cpu_online_mask);
+	cpumask_and(mask, p->cpus_allowed, cpu_active_mask);

But I am just curious, is this change strictly needed? Afaics we do not
care if we race with set_cpu_online(true/false).

Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Fri, Oct 11, 2013 at 08:25:07PM +0200, Oleg Nesterov wrote:
> On 10/11, Peter Zijlstra wrote:
> >
> > As a penance I'll start by removing all get_online_cpus() usage from the
> > scheduler.
>
> I only looked at the change in setaffinity,
>
> @@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
>  	struct task_struct *p;
>  	int retval;
>  
> -	get_online_cpus();
>  	rcu_read_lock();
>
> Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
> set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
> is probably fine, CPU_DYING does __migrate_task().

I'm fine with always doing sync_sched(); sync_rcu(); if that makes you
feel better. But I thought that assuming that !PREEMPT sync_rcu() would
imply sync_sched() was ok. I think the comment there even says as much.

And task_rq_lock() will very much disable preemption; and thus we get
what we want, right? In any case; the goal was to make either RCU or
preempt-disable sufficient.

> However. This means that sched_setaffinity() can fail if it races with
> the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
> Probably we do not really care, just this looks a bit confusing.

Couldn't be bothered; failing hotplug will have side-effects any which
way.

> @@ -3814,7 +3813,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
>  	unsigned long flags;
>  	int retval;
>  
> -	get_online_cpus();
>
> This change is probably fine in any case?

Yes.

> 	rcu_read_lock();
>  
>  	retval = -ESRCH;
> @@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
>  		goto out_unlock;
>  
>  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> -	cpumask_and(mask, p->cpus_allowed, cpu_online_mask);
> +	cpumask_and(mask, p->cpus_allowed, cpu_active_mask);
>
> But I am just curious, is this change strictly needed?

No; we could do without. It really doesn't matter much if anything. I
only did it because sched_setaffinity()->set_cpus_allowed_ptr() checks
against active, not online. And had a sudden urge to make get/set
symmetric -- totally pointless otherwise.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Linus Torvalds torva...@linux-foundation.org wrote:

> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt rost...@goodmis.org wrote:
> >
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
>
> I think we'd generally want to have it be something the loop asks for.
>
> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
>
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
> not do it for common small values (although it looks like Fedora
> defaults to 128 CPU's for their distro kernels, which seems a bit
> excessive - too many by far for normal people, too few for the crazy
> big ones).

Ubuntu has it at 256, so I guess Fedora is even a bit conservative ;-)

Thanks,

	Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 12:16:16 -0700
Linus Torvalds wrote:

> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt wrote:
> >
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
>
> I think we'd generally want to have it be something the loop asks for.

Yeah, I was thinking of adding a new macro rather than updating
for_each_cpu() itself, so that only the locations that would require it
would use it.

> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
>
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
> not do it for common small values (although it looks like Fedora
> defaults to 128 CPU's for their distro kernels, which seems a bit
> excessive - too many by far for normal people, too few for the crazy
> big ones).

We could always make it a boot time option, and use alternates to change
the code as we do with spin_locks() and other code.

-- Steve (the lover of self modifying code)
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 12:16:16PM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt wrote:
> >
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
>
> I think we'd generally want to have it be something the loop asks for.
>
> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
>
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
> not do it for common small values (although it looks like Fedora
> defaults to 128 CPU's for their distro kernels, which seems a bit
> excessive - too many by far for normal people, too few for the crazy
> big ones).

Alternatively we could write it something like:

	rcu_read_lock();
	for_each_online_node(node) {
		for_each_cpu(cpu, cpumask_of_node(node)) {
			...
		}
		cond_resched_rcu();
	}
	rcu_read_unlock();

But yes, I think this pattern (and variations) should work for most
cases. The one I'm struggling with atm is kernel/cpuset.c:
rebuild_sched_domains_locked(). Although I'm thinking we should do that
seqlock style, just build it and verify if it's still valid, if not, try
again -- although I'm sure it'll be 'fun' to get correct.
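As a rough userspace sketch of that seqlock-style retry -- build a result
from the shared state, then re-check a sequence counter and retry if a
concurrent "hotplug" event invalidated the snapshot. All names here
(domain_seq, build_domains(), rebuild_sched_domains_sketch()) are invented
for illustration; the real rebuild_sched_domains_locked() works on actual
sched-domain data:

```c
#include <assert.h>
#include <stdatomic.h>

/* Bumped on every (hypothetical) hotplug event. */
static atomic_uint domain_seq;

/* Pretend CPUs 0-3 are online. */
static unsigned long online_mask = 0x0f;

/* Stand-in for the expensive rebuild: just copies the mask it saw. */
static unsigned long build_domains(unsigned long snapshot)
{
	return snapshot;
}

/* Seqlock style: rebuild until no hotplug event intervened. */
static unsigned long rebuild_sched_domains_sketch(void)
{
	unsigned int seq;
	unsigned long result;

	do {
		seq = atomic_load(&domain_seq);
		result = build_domains(online_mask);
	} while (atomic_load(&domain_seq) != seq);	/* raced? try again */

	return result;
}
```

The validity check is optimistic: no lock is held across the rebuild, so
a writer is never blocked; the reader simply pays with a retry.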
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt wrote:
>
> I'm wondering if we can have a for_each_cpu() that only disables
> preemption in the loop.

I think we'd generally want to have it be something the loop asks for.

If the loop is just some kind of "gather statistics" thing, I don't think
it's required. The cost per loop is so low (usually adding up a couple of
words) that the downside drowns the upside.

And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and not
do it for common small values (although it looks like Fedora defaults to
128 CPU's for their distro kernels, which seems a bit excessive - too many
by far for normal people, too few for the crazy big ones).

	Linus
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 08:50:32PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
> > On Thu, 10 Oct 2013 11:10:35 -0700
> > Linus Torvalds wrote:
> > > .. now we can free all the percpu data and kill the CPU ..
> > >
> > > without any locking anywhere - not stop-machine, not anything. If
> > > somebody is doing a "for_each_cpu()" (under just a regular
> > > rcu_read_lock()) and they see the bit set while it's going down, who
> > > cares? The CPU is still there, the data is accessible..
> >
> > The problem is that rcu_read_lock() requires preemption disabled unless
> > you are using the preemptable rcu tree version. There's always
> > srcu_read_lock() but that's not so free. It's basically the same as
> > what Peter is doing.
>
> No, srcu is actually more expensive in the fast path. Although possibly
> we could make SRCU more complex ;-)

Indeed. Especially if we wanted to get rid of the memory barriers in
srcu_read_lock() and srcu_read_unlock() and still allow SRCU to be used
from idle and offline CPUs! ;-)

							Thanx, Paul

> > There's places in the kernel that does for_each_cpu() that I'm sure you
> > don't want to disable preemption for. Especially when you start having
> > 4096 CPU machines!
>
> :-)

But shouldn't you only have to disable over each separate pass through
the loop rather than across the entire loop? "I guarantee that the CPU I
just handed you will remain online through the body of the loop" rather
than "I guarantee that no CPU is going anywhere through the entire loop."

							Thanx, Paul
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10/2013 10:24 PM, Oleg Nesterov wrote:
> On 10/10, Andrew Morton wrote:
>>
>> On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov wrote:
>>
>>> On 10/10, Ingo Molnar wrote:
>>>> So ... why not make it _really_ cheap, i.e. the read lock costing
>>>> nothing, and tie CPU hotplug to freezing all tasks in the system?
>>>>
>>>> Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a
>>>> system, I don't understand how we tolerate _any_ overhead from this
>>>> utter slowpath.
>>>
>>> Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
>>> quite often to save the power.
>>
>> cpu hotremove already uses stop_machine,
>
> And Srivatsa wants to remove it from cpu_down().
>

Yes, I have worked on several designs to remove stop_machine() from the
cpu_down path.

http://lwn.net/Articles/538819/
http://lwn.net/Articles/556727/

>> so such an approach shouldn't
>> actually worsen things (a lot) for them?
>
> this depends on what this "freezing all tasks" actually means.
> I understood it as try_to_freeze_tasks/etc, looks too heavy...
>
> But my only point was, I am not sure we can assume that cpu_down/up
> is extremly rare and its cost does not matter.
>
>> use stop_machine() on the add/remove
>> (ie, "writer") side and nothing at all on the "reader" side. Is there
>> anything which fundamentally prevents cpu hotplug from doing the same?
>

It's certainly possible to remove stop_machine() from CPU hotplug, as
I've demonstrated in the patchsets mentioned above. And there were pretty
good performance improvements too, arising out of that change, as
described here:

http://article.gmane.org/gmane.linux.ports.ppc.embedded/56122

Regards,
Srivatsa S. Bhat
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 11:49:15 -0700
Linus Torvalds wrote:

> Oh, and I'm sure there are several users that currently depend on
> being able to sleep over get_online_cpu's. But I'm pretty sure it's
> "several", not "hundreds", and I think we could fix them up.

I'm wondering if we can have a for_each_cpu() that only disables
preemption in the loop.

That is, each iteration will enable preemption, but the loop itself will
guarantee that the current cpu to process still exists.

	rcu_read_lock();
	for_each_cpu(cpu, mask) {
		rcu_read_unlock();
		rcu_read_lock();
		if (!cpu_online(cpu))
			continue;
		[...]
	}
	rcu_read_unlock();

That way expensive loops won't keep the current CPU unpreemptible while
they process all online CPUs.

Of course, it will miss a CPU that comes online. But it would still need
to handle the case of a CPU coming online after the final
put_online_cpus(), so I'm not sure that's a problem.

-- Steve
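A rough userspace model of that loop: a bitmap walk that gives a
"preemption point" before every iteration and then re-checks the online
bit, so a CPU that went away mid-walk is simply skipped. The names
(NR_CPUS_SKETCH, for_each_online_sketch(), offline_cpu3_once()) are made
up for illustration, and the function pointer stands in for dropping and
re-taking rcu_read_lock():

```c
#include <assert.h>

#define NR_CPUS_SKETCH 8

/* Called with the "lock" notionally dropped; may offline CPUs. */
typedef void (*preempt_point_t)(unsigned long *online);

/* Sum per-cpu counters, tolerating CPUs that go away mid-walk. */
static int for_each_online_sketch(unsigned long *online, const int *percpu,
				  preempt_point_t preempt_point)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		preempt_point(online);		/* state may change here */
		if (!(*online & (1UL << cpu)))	/* re-check, as in the loop above */
			continue;
		sum += percpu[cpu];
	}
	return sum;
}

/* Simulated hot-unplug of CPU 3, racing with the walk. */
static void offline_cpu3_once(unsigned long *online)
{
	*online &= ~(1UL << 3);
}
```

With CPUs 0-3 initially online and CPU 3 unplugged at the first preemption
point, the walk sums only CPUs 0-2, exactly the "skip what disappeared"
behaviour of the loop above.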
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 11:43 AM, Steven Rostedt wrote:
>
> There's places in the kernel that does for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

We could _easily_ add preemption points in for_each_cpu() for the MAX_SMP
case. We can even do it in the macro itself.

So I wouldn't worry about things like that. I'd expect the "Oh damn, we
have to allocate" case to be more painful, but I think that is fairly
easily handled by having something like a "redo_cpu()" macro that you can
call inside for_each_cpu().

So the pattern for the (hopefully-not-all-that-common) thing would be
something like

	sometype *prealloc = NULL;

	for_each_cpu() {
		if (!prealloc) {
			rcu_read_unlock();	// Or whatever we want to call it
			prealloc = kmalloc(..);
			rcu_read_lock();
			redo_cpu();
		}
	}
	kfree(prealloc);

which is certainly more work than a sleeping lock is, but this looks like
an initialization pattern.

Looking at the things that are actually performance-critical, they tend
to be things like "gather all the statistics for all CPUs".

Honesty in advertising: I only grepped a couple of users of
get_online_cpus(), and they all seemed like they'd be perfectly happy
with preemption-off. But maybe I was just lucky in my few samples: I have
anecdotes, not data.

	Linus
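A self-contained userspace sketch of that prealloc-plus-redo pattern,
with a goto standing in for the hypothetical redo_cpu() macro. The
populate_tables() function and cpu_table array are invented here for
illustration; the comments mark where the real pattern would drop and
re-take rcu_read_lock():

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4

/* One "hash table" pointer per CPU, as in the allocation example above. */
static int *cpu_table[NR_CPUS_SKETCH];

/* Allocate a table for each online CPU; returns number of allocations. */
static int populate_tables(unsigned long online)
{
	int *prealloc = NULL;
	int cpu, allocs = 0;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
redo_cpu:
		if (!(online & (1UL << cpu)))	/* re-checked after a redo */
			continue;
		if (cpu_table[cpu])
			continue;
		if (!prealloc) {
			/* "rcu_read_unlock()" here in the real pattern */
			prealloc = malloc(sizeof(*prealloc));
			if (!prealloc)
				return -1;
			allocs++;
			/* "rcu_read_lock()" again, then redo this cpu */
			goto redo_cpu;
		}
		cpu_table[cpu] = prealloc;
		prealloc = NULL;
	}
	free(prealloc);		/* unused leftover, as in the kfree() above */
	return allocs;
}
```

The redo re-runs the whole per-cpu body, including the online check, which
is exactly why the pattern is safe if the CPU vanished while we slept in
the allocator.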
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10/2013 08:56 PM, Oleg Nesterov wrote: > On 10/10, Ingo Molnar wrote: >> >> * Peter Zijlstra wrote: >> >>> But the thing is; our sense of NR_CPUS has shifted, where it used to be >>> ok to do something like: >>> >>> for_each_cpu() >>> >>> With preemption disabled; it gets to be less and less sane to do so, >>> simply because 'common' hardware has 256+ CPUs these days. If we cannot >>> rely on preempt disable to exclude hotplug, we must use >>> get_online_cpus(), but get_online_cpus() is global state and thus cannot >>> be used at any sort of frequency. >> >> So ... why not make it _really_ cheap, i.e. the read lock costing nothing, >> and tie CPU hotplug to freezing all tasks in the system? >> >> Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a >> system, I don't understand how we tolerate _any_ overhead from this utter >> slowpath. > > Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up > quite often to save the power. > Yes, I've heard of such systems and so I might have brought them up during discussions about CPU hotplug. But unfortunately, I have been misquoted quite often, leading to the wrong impression that I have such a usecase or that I recommend/support using CPU hotplug for power management. So let me clarify that part, while I have the chance. (And I don't blame anyone for that. I work on power-management related areas, and I've worked on improving/optimizing CPU hotplug; so its pretty natural to make a connection between the two and assume that I tried to optimize CPU hotplug keeping power management in mind. But that's not the case, as I explain below.) I started out trying to make suspend/resume more reliable, scalable and fast. And suspend/resume uses CPU hotplug underneath and that's a pretty valid usecase. So with that, I started looking at CPU hotplug and soon realized the mess it had become. 
So I started working on cleaning up that mess, like rethinking the whole
notifier scheme[1], and removing the ridiculous stop_machine() from the
cpu_down path[2] etc. But the intention behind all this work was just to
make CPU hotplug cleaner/saner/bug-free and possibly speed up
suspend/resume. IOW, I didn't have any explicit intention to make it
easier for people to use it for power management, although I understood
that some of this work might help those poor souls who don't have any
other choice, for whatever reason.

And fortunately, (IIUC) the number of systems/people relying on CPU
hotplug for power management has reduced quite a bit in recent times,
which is a very good thing. So, to reiterate, I totally agree that a
power-aware scheduler is the right way to do CPU power management; CPU
hotplug is simply not the tool to use for that. No question about that.

Also, system shutdown used to depend on CPU hotplug to disable the
non-boot CPUs, but we don't do that any more after commit cf7df378a,
which is a very welcome change. And in future if we can somehow do
suspend/resume without using CPU hotplug, that would be absolutely
wonderful as well. (There have been discussions in the past around this,
but nobody has a solution yet).

The other valid usecases that I can think of, for using CPU hotplug, are
for RAS reasons and for DLPAR (Dynamic Logical Partitioning) operations
on powerpc, both of which are not performance-sensitive, AFAIK.

[1]. Reverse invocation of CPU hotplug notifiers
http://lwn.net/Articles/508072/

[2]. Stop-machine()-free CPU hotplug
http://lwn.net/Articles/538819/
(v6) http://lwn.net/Articles/556727/

Regards,
Srivatsa S. Bhat
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
> On Thu, 10 Oct 2013 11:10:35 -0700
> Linus Torvalds wrote:
> > .. now we can free all the percpu data and kill the CPU ..
> >
> > without any locking anywhere - not stop-machine, not anything. If
> > somebody is doing a "for_each_cpu()" (under just a regular
> > rcu_read_lock()) and they see the bit set while it's going down, who
> > cares? The CPU is still there, the data is accessible..
>
> The problem is that rcu_read_lock() requires preemption disabled unless
> you are using the preemptable rcu tree version. There's always
> srcu_read_lock() but that's not so free. It's basically the same as
> what Peter is doing.

No, srcu is actually more expensive in the fast path. Although possibly
we could make SRCU more complex ;-)

> There's places in the kernel that does for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

:-)
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 11:34 AM, Peter Zijlstra wrote: > > But my point is that even though there aren't many of these today; with > the growing number of cpus in 'commodity' hardware you want to move away > from using preempt_disable() as hotplug lock. Umm. Wasn't this pretty much the argument for the COMPLETELY F*CKED UP change to make the vma locking use a semaphore? The whole "it's more convenient to use sleeping locks" argument is PURE AND UTTER SHIT when it comes to really core code. We are *much* better off saying "this is core, we care so deeply about performance that you had better use your brain before you do this". Seriously. Your argument is bad, but more importantly, it is *dangerously* bad. It's crap that results in bad code: and the bad code is almost impossible to fix up later, because once you encourage people to do the easy thing, they'll do so. Yes, preempt_disable() is harder to use than sleeping locks. You need to do pre-allocation etc. But it is much *much* more efficient. And in the kernel, we care. We have the resources. Plus, we can also say "if you can't handle it, don't do it". We don't need new features so badly that we are willing to screw up core code. As to your kernel/events/core.c example, things like that are generally pretty easy to fix. The good thing about preemption-disable is: - it's pretty easily debuggable, because you do get big warnings of "you screwed up exactly _here_" - it's so cheap that especially when it comes to things that can sleep, it's perfectly ok to drop the preemption count, do the sleeping thing, re-disable preemption and then check if we still needed the data. Oh, and I'm sure there are several users that currently depend on being able to sleep over get_online_cpu's. But I'm pretty sure it's "several", not "hundreds", and I think we could fix them up. 
	Linus
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 11:10:35AM -0700, Linus Torvalds wrote: > You can't do that right now - since you have to get the cpu list. So > it may not be with "preemption enabled", but it should always be under > the locking provided by get_online_cpus().. That one allows sleeping, > though. > > I personally would *love* to make CPU hotplug be a lockless thing > entirely. But I detest stop-machine too, because it has these really > annoying properties. > > So if we want to make it zero-cost to look at online CPU data, can we > avoid even the stop-machine synchronization, instead saying that the > cpu hotplug bitmap is updated completely locklessly, but if you see a > bit set, the data associated with that CPU is guaranteed to still be > available. > > IOW, just use "RCU semantics" on a per-bit level. When we offline a CPU, we do > > clear_bit(cpu, cpu_online_mask); > rcu_synchronize(); > .. now we can free all the percpu data and kill the CPU .. > > without any locking anywhere - not stop-machine, not anything. If > somebody is doing a "for_each_cpu()" (under just a regular > rcu_read_lock()) and they see the bit set while it's going down, who > cares? The CPU is still there, the data is accessible.. > > I'm sure there's some reason the above wouldn't work, but the above > would seem to be pretty optimal. Why do we really force this big > locking thing? The new patches make that locking _smarter_, but it's > still a damn big lock. Could we possibly go _beyond_ the lock? The only down-side to doing this is that you cannot actually allocate memory under rcu_read_lock() because it might not allow preemption. That said; I like the idea. I'll go try and audit the get_online_cpus() sites to see if there's any that really need full exclusion. 
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 11:10:35 -0700
Linus Torvalds wrote:

> .. now we can free all the percpu data and kill the CPU ..
>
> without any locking anywhere - not stop-machine, not anything. If
> somebody is doing a "for_each_cpu()" (under just a regular
> rcu_read_lock()) and they see the bit set while it's going down, who
> cares? The CPU is still there, the data is accessible..

The problem is that rcu_read_lock() requires preemption disabled unless
you are using the preemptable rcu tree version. There's always
srcu_read_lock() but that's not so free. It's basically the same as what
Peter is doing.

There's places in the kernel that does for_each_cpu() that I'm sure you
don't want to disable preemption for. Especially when you start having
4096 CPU machines!

-- Steve
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 10:13:53AM -0700, Paul E. McKenney wrote:
> That does add quite a bit of latency to the hotplug operations, which
> IIRC slows down things like boot, suspend, and resume.

Guess what suspend does before it unplugs all these cpus?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 10:48:56AM -0700, Andrew Morton wrote:
> > > Very much agreed; now stop_machine() wouldn't actually work for hotplug
> > > because it will instantly preempt everybody, including someone who might
> > > be in the middle of using per-cpu state of the cpu we're about to
> > > remove.
> >
> > Well, stop machine doesn't instantly preempt everybody. Only those that
> > don't have preemption disabled. Using per_cpu without preemption
> > disabled can be dangerous.

Can be, but there's more of that around than you want. Also that's not
the entire issue.

> Yes, I'd have thought that the cases where a CPU is fiddling with
> another CPU's percpu data with preemption enabled would be rather rare.

Look at kernel/events/core.c:swevent_hlist_get(); it does something like:

	get_online_cpus();
	for_each_online_cpu() {
		allocate_hash_table();
	}
	put_online_cpus();

Surely you don't want to do that with preemption disabled?

But my point is that even though there aren't many of these today; with
the growing number of cpus in 'commodity' hardware you want to move away
from using preempt_disable() as hotplug lock.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 10:48 AM, Andrew Morton wrote:
>
> Yes, I'd have thought that the cases where a CPU is fiddling with
> another CPU's percpu data with preemption enabled would be rather rare.
>
> I can't actually think of any off the top. Are there examples we can
> look at?

You can't do that right now - since you have to get the cpu list. So it
may not be with "preemption enabled", but it should always be under the
locking provided by get_online_cpus().. That one allows sleeping, though.

I personally would *love* to make CPU hotplug be a lockless thing
entirely. But I detest stop-machine too, because it has these really
annoying properties.

So if we want to make it zero-cost to look at online CPU data, can we
avoid even the stop-machine synchronization, instead saying that the cpu
hotplug bitmap is updated completely locklessly, but if you see a bit
set, the data associated with that CPU is guaranteed to still be
available?

IOW, just use "RCU semantics" on a per-bit level. When we offline a CPU,
we do

	clear_bit(cpu, cpu_online_mask);
	rcu_synchronize();
	.. now we can free all the percpu data and kill the CPU ..

without any locking anywhere - not stop-machine, not anything. If
somebody is doing a "for_each_cpu()" (under just a regular
rcu_read_lock()) and they see the bit set while it's going down, who
cares? The CPU is still there, the data is accessible..

I'm sure there's some reason the above wouldn't work, but the above
would seem to be pretty optimal. Why do we really force this big locking
thing? The new patches make that locking _smarter_, but it's still a
damn big lock. Could we possibly go _beyond_ the lock?

	Linus
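A userspace sketch of those per-bit "RCU semantics": initialize the data
before setting the bit, and on teardown clear the bit, wait out a
(simulated) grace period, and only then free. synchronize_rcu_sketch() is
an empty stand-in for a real grace period, and all names here are made up
for illustration:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Per-"cpu" data published via an online bitmap, RCU-style. */
static atomic_ulong online_mask;
static int *percpu_data[8];

static void bringup(int cpu)
{
	percpu_data[cpu] = calloc(1, sizeof(int));
	atomic_fetch_or(&online_mask, 1UL << cpu);	/* publish after init */
}

/* Reader: if the bit is set, the data is guaranteed present. */
static int read_counter(int cpu)
{
	if (!(atomic_load(&online_mask) & (1UL << cpu)))
		return -1;
	return *percpu_data[cpu];
}

/* Stand-in for a real grace period: after it, every reader that could
 * still have seen the old bit value has finished. */
static void synchronize_rcu_sketch(void)
{
}

static void teardown(int cpu)
{
	atomic_fetch_and(&online_mask, ~(1UL << cpu));	/* unpublish first */
	synchronize_rcu_sketch();			/* wait out readers */
	free(percpu_data[cpu]);				/* now safe to free */
	percpu_data[cpu] = NULL;
}
```

The point is purely the ordering: a reader that sees the bit set before
teardown may still touch the data, and the grace period is what makes the
subsequent free safe.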
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 13:13:05 -0400
Steven Rostedt wrote:

> > > > Why prevent all CPUs from running when we want to remove one?
> > >
> > > So get_online_cpus() goes away. Nothing is more scalable than nothing!
> >
> > Very much agreed; now stop_machine() wouldn't actually work for hotplug
> > because it will instantly preempt everybody, including someone who might
> > be in the middle of using per-cpu state of the cpu we're about to
> > remove.
>
> Well, stop machine doesn't instantly preempt everybody. Only those that
> don't have preemption disabled. Using per_cpu without preemption
> disabled can be dangerous.

Yes, I'd have thought that the cases where a CPU is fiddling with another
CPU's percpu data with preemption enabled would be rather rare.

I can't actually think of any off the top. Are there examples we can
look at?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Peter Zijlstra wrote: > > The freeze suggestion from Ingo would actually work because we freeze > tasks at known good points Not really known/good, we have more and more freezable_schedule's. But probably this is fine, nobody should do this under get_online_cpus(). > (userspace and kthread_freeze() points) where > we know they're not fiddling with per-cpu state. We should also take care of PF_NOFREEZE kthreads. Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 06:52:29PM +0200, Ingo Molnar wrote: > > * Andrew Morton wrote: > > > On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov wrote: > > > > > On 10/10, Ingo Molnar wrote: > > > > > > > > * Peter Zijlstra wrote: > > > > > > > > > But the thing is; our sense of NR_CPUS has shifted, where it used to > > > > > be > > > > > ok to do something like: > > > > > > > > > > for_each_cpu() > > > > > > > > > > With preemption disabled; it gets to be less and less sane to do > > > > > so, simply because 'common' hardware has 256+ CPUs these days. If > > > > > we cannot rely on preempt disable to exclude hotplug, we must use > > > > > get_online_cpus(), but get_online_cpus() is global state and thus > > > > > cannot be used at any sort of frequency. > > > > > > > > So ... why not make it _really_ cheap, i.e. the read lock costing > > > > nothing, and tie CPU hotplug to freezing all tasks in the system? > > > > > > > > Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a > > > > system, I don't understand how we tolerate _any_ overhead from this > > > > utter slowpath. > > > > > > Well, iirc Srivatsa (cc'ed) pointed out that some systems do > > > cpu_down/up quite often to save the power. > > > > cpu hotremove already uses stop_machine, so such an approach shouldn't > > actually worsen things (a lot) for them? > > Also, using CPU hotremove to save power, instead of implementing proper > power aware scheduling, is very broken to begin with. In many cases, agreed. Particularly on devices where there is an easy way to send out regular updates and bug fixes, and especially if they are connected to AC power. But if I was in charge of creating a battery-powered multicore embedded device that was to ship in million-unit quantities, especially in the not-uncommon case where more than one CPU is needed only for specific actions, I would offline the extra CPUs except when I was sure they were needed. 
This would of course still rely on power-aware scheduling when more than one CPU was in use. The power-aware scheduling conserves energy for short-term variations (in which case offlining really is useless), but longer term "keep this CPU -off-" is provided by offlining. Yes, the Linux kernel code quality is quite good, but bugs still happen, and bugs can easily cause CPUs to power up unnecessarily. This doesn't have to happen very often to kill your battery. If the extra CPUs are offline, a large class of those bugs are rendered harmless. If you are going to have a very large number of difficult-to-update battery-powered devices in the field, defending against such bugs is worthwhile. Thanx, Paul
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Paul E. McKenney wrote: > On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote: > > > > * Peter Zijlstra wrote: > > > > > On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: > > > > > > > So ... why not make it _really_ cheap, i.e. the read lock costing > > > > nothing, and tie CPU hotplug to freezing all tasks in the system? > > > > > > Such that we freeze regular tasks in userspace and kernel tasks in > > > their special freezer callback so as to guarantee minimal state? > > > > Yes, absolutely. > > That does add quite a bit of latency to the hotplug operations, which > IIRC slows down things like boot, suspend, and resume. I think it should _speed up_ suspend: instead of bringing down each CPU one by one, they could just be all stopped. Same for bootup - it's not really a hotplug operation in the traditional sense, as all CPUs are 'present' at once, and they could be added in one group operation instead of serializing the bootup. Also, most CPU bootup sequences are serialized. It would slow down individual CPU hotplug/hotunplug - and that really is a rare operation outside broken schemes to save power ... Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote: > > * Peter Zijlstra wrote: > > > On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: > > > > > So ... why not make it _really_ cheap, i.e. the read lock costing > > > nothing, and tie CPU hotplug to freezing all tasks in the system? > > > > Such that we freeze regular tasks in userspace and kernel tasks in their > > special freezer callback so as to guarantee minimal state? > > Yes, absolutely. That does add quite a bit of latency to the hotplug operations, which IIRC slows down things like boot, suspend, and resume. Thanx, Paul
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 18:53:37 +0200 Peter Zijlstra wrote: > On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote: > > > But we would like to remove stomp machine from > > > CPU hotplug. > > > > We do? That's news. It wasn't mentioned in the changelog and should > > have been. Why? > > It would be an unrelated change to this and unrelated to the reasons as > to why I want a faster get_online_cpus(). Yeah, sorry for the confusion. My comment wasn't really about this change set but about stop machine and hotplug in general. Needing stop machine for hotplug has been a complaint by many, but off topic for this particular change set. > > > > Why prevent all CPUs from running when we want to remove > > > one? > > > > So get_online_cpus() goes away. Nothing is more scalable than nothing! > > Very much agreed; now stop_machine() wouldn't actually work for hotplug > because it will instantly preempt everybody, including someone who might > be in the middle of using per-cpu state of the cpu we're about to > remove. Well, stop machine doesn't instantly preempt everybody. Only those that don't have preemption disabled. Using per_cpu without preemption disabled can be dangerous. Except for the migrate disable we want to add for -rt. Then we can't rely on migrate disable and stop machine making sure per_cpu data isn't being used. Oh, and for threads that use per_cpu that are bound to a CPU. But then they need to be taken care of too for their CPU going off line. But this would also require the "get_online_cpu()s" to disable preemption as well. Not quite a "nothing" we are looking for. -- Steve
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Andrew Morton wrote: > > On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov wrote: > > > On 10/10, Ingo Molnar wrote: > > > > > > So ... why not make it _really_ cheap, i.e. the read lock costing nothing, > > > and tie CPU hotplug to freezing all tasks in the system? > > > > > > Actual CPU hot unplugging and replugging is _ridiculously_ rare in a > > > system, I don't understand how we tolerate _any_ overhead from this utter > > > slowpath. > > > > Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up > > quite often to save the power. > > cpu hotremove already uses stop_machine, And Srivatsa wants to remove it from cpu_down(). > so such an approach shouldn't > actually worsen things (a lot) for them? this depends on what this "freezing all tasks" actually means. I understood it as try_to_freeze_tasks/etc, looks too heavy... But my only point was, I am not sure we can assume that cpu_down/up is extremely rare and its cost does not matter. > use stop_machine() on the add/remove > (ie, "writer") side and nothing at all on the "reader" side. Is there > anything which fundamentally prevents cpu hotplug from doing the same? Well, then we actually need to park all tasks in system, I guess. IOW, freezer. Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote: > > But we would like to remove stomp machine from > > CPU hotplug. > > We do? That's news. It wasn't mentioned in the changelog and should > have been. Why? It would be an unrelated change to this and unrelated to the reasons as to why I want a faster get_online_cpus(). > > Why prevent all CPUs from running when we want to remove > > one? > > So get_online_cpus() goes away. Nothing is more scalable than nothing! Very much agreed; now stop_machine() wouldn't actually work for hotplug because it will instantly preempt everybody, including someone who might be in the middle of using per-cpu state of the cpu we're about to remove. The freeze suggestion from Ingo would actually work because we freeze tasks at known good points (userspace and kthread_freeze() points) where we know they're not fiddling with per-cpu state.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton wrote: > On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov wrote: > > > On 10/10, Ingo Molnar wrote: > > > > > > * Peter Zijlstra wrote: > > > > > > > But the thing is; our sense of NR_CPUS has shifted, where it used to be > > > > ok to do something like: > > > > > > > > for_each_cpu() > > > > > > > > With preemption disabled; it gets to be less and less sane to do > > > > so, simply because 'common' hardware has 256+ CPUs these days. If > > > > we cannot rely on preempt disable to exclude hotplug, we must use > > > > get_online_cpus(), but get_online_cpus() is global state and thus > > > > cannot be used at any sort of frequency. > > > > > > So ... why not make it _really_ cheap, i.e. the read lock costing > > > nothing, and tie CPU hotplug to freezing all tasks in the system? > > > > > > Actual CPU hot unplugging and replugging is _ridiculously_ rare in a > > > system, I don't understand how we tolerate _any_ overhead from this > > > utter slowpath. > > > > Well, iirc Srivatsa (cc'ed) pointed out that some systems do > > cpu_down/up quite often to save the power. > > cpu hotremove already uses stop_machine, so such an approach shouldn't > actually worsen things (a lot) for them? Also, using CPU hotremove to save power, instead of implementing proper power aware scheduling, is very broken to begin with. Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Peter Zijlstra wrote: > On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: > > > So ... why not make it _really_ cheap, i.e. the read lock costing > > nothing, and tie CPU hotplug to freezing all tasks in the system? > > Such that we freeze regular tasks in userspace and kernel tasks in their > special freezer callback so as to guarantee minimal state? Yes, absolutely. Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 12:36:31 -0400 Steven Rostedt wrote: > On Thu, 10 Oct 2013 09:00:44 -0700 > Andrew Morton wrote: > > > It's been ages since I looked at this stuff :( Although it isn't used > > much, memory hotplug manages to use stop_machine() on the add/remove > > (ie, "writer") side and nothing at all on the "reader" side. Is there > > anything which fundamentally prevents cpu hotplug from doing the same? > > > I would think that memory hotplug may require stop machine as all CPUs > may touch that memory. Sure. > But we would like to remove stomp machine from > CPU hotplug. We do? That's news. It wasn't mentioned in the changelog and should have been. Why? > Why prevent all CPUs from running when we want to remove > one? So get_online_cpus() goes away. Nothing is more scalable than nothing!
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 09:00:44 -0700 Andrew Morton wrote: > It's been ages since I looked at this stuff :( Although it isn't used > much, memory hotplug manages to use stop_machine() on the add/remove > (ie, "writer") side and nothing at all on the "reader" side. Is there > anything which fundamentally prevents cpu hotplug from doing the same? I would think that memory hotplug may require stop machine as all CPUs may touch that memory. But we would like to remove stomp machine from CPU hotplug. Why prevent all CPUs from running when we want to remove one? -- Steve
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov wrote: > On 10/10, Ingo Molnar wrote: > > > > * Peter Zijlstra wrote: > > > > > But the thing is; our sense of NR_CPUS has shifted, where it used to be > > > ok to do something like: > > > > > > for_each_cpu() > > > > > > With preemption disabled; it gets to be less and less sane to do so, > > > simply because 'common' hardware has 256+ CPUs these days. If we cannot > > > rely on preempt disable to exclude hotplug, we must use > > > get_online_cpus(), but get_online_cpus() is global state and thus cannot > > > be used at any sort of frequency. > > > > So ... why not make it _really_ cheap, i.e. the read lock costing nothing, > > and tie CPU hotplug to freezing all tasks in the system? > > > > Actual CPU hot unplugging and replugging is _ridiculously_ rare in a > > system, I don't understand how we tolerate _any_ overhead from this utter > > slowpath. > > Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up > quite often to save the power. cpu hotremove already uses stop_machine, so such an approach shouldn't actually worsen things (a lot) for them? It's been ages since I looked at this stuff :( Although it isn't used much, memory hotplug manages to use stop_machine() on the add/remove (ie, "writer") side and nothing at all on the "reader" side. Is there anything which fundamentally prevents cpu hotplug from doing the same?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Peter Zijlstra wrote: > > That said, Oleg wants to use the same scheme for percpu_rwsem, Yes, and later then (I hope) get_online_cpus() will be just current->cpuhp_ref++ || percpu_down_read(). (just in case, we only need ->cpuhp_ref to ensure that the readers can't starve the writers as is currently possible. iow to block the new readers but ensure that the recursive get/down_read can't block) And please look at sb_wait_write/sb_start_write. It uses the same logic except: - it doesn't have the "true" fast-path for readers - its "slow" mode is slower than necessary - it is buggy (afaics) > and I'm > not sure his write-side (something in uprobes) wants to do the same. surely not ;) Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Ingo Molnar wrote: > > * Peter Zijlstra wrote: > > > But the thing is; our sense of NR_CPUS has shifted, where it used to be > > ok to do something like: > > > > for_each_cpu() > > > > With preemption disabled; it gets to be less and less sane to do so, > > simply because 'common' hardware has 256+ CPUs these days. If we cannot > > rely on preempt disable to exclude hotplug, we must use > > get_online_cpus(), but get_online_cpus() is global state and thus cannot > > be used at any sort of frequency. > > So ... why not make it _really_ cheap, i.e. the read lock costing nothing, > and tie CPU hotplug to freezing all tasks in the system? > > Actual CPU hot unplugging and replugging is _ridiculously_ rare in a > system, I don't understand how we tolerate _any_ overhead from this utter > slowpath. Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up quite often to save the power. Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: > So ... why not make it _really_ cheap, i.e. the read lock costing nothing, > and tie CPU hotplug to freezing all tasks in the system? Such that we freeze regular tasks in userspace and kernel tasks in their special freezer callback so as to guarantee minimal state? If you can get people to agree with that I'm all for it. That said, Oleg wants to use the same scheme for percpu_rwsem, and I'm not sure his write-side (something in uprobes) wants to do the same.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Peter Zijlstra wrote: > But the thing is; our sense of NR_CPUS has shifted, where it used to be > ok to do something like: > > for_each_cpu() > > With preemption disabled; it gets to be less and less sane to do so, > simply because 'common' hardware has 256+ CPUs these days. If we cannot > rely on preempt disable to exclude hotplug, we must use > get_online_cpus(), but get_online_cpus() is global state and thus cannot > be used at any sort of frequency. So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Actual CPU hot unplugging and replugging is _ridiculously_ rare in a system, I don't understand how we tolerate _any_ overhead from this utter slowpath. Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Wed, Oct 09, 2013 at 10:50:06PM -0700, Andrew Morton wrote: > On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra > wrote: > > > The current cpu hotplug lock is a single global lock; therefore excluding > > hotplug is a very expensive proposition even though it is rare occurrence > > under > > normal operation. > > > > There is a desire for a more light weight implementation of > > {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT > > side. > > > > The current hotplug lock is a full reader preference lock -- and thus > > supports > > reader recursion. However since we're making the read side lock much > > cheaper it > > is the expectation that it will also be used far more. Which in turn would > > lead > > to writer starvation. > > > > Therefore the new lock proposed is completely fair; albeit somewhat > > expensive > > on the write side. This in turn means that we need a per-task nesting count > > to > > support reader recursion. > > This is a lot of code and a lot of new complexity. It needs some pretty > convincing performance numbers to justify its inclusion, no? And here I thought it was generally understood to be unwise to bash global state on anything like a regular manner from every cpu. The NUMA bits really ought to use get_online_cpus()/put_online_cpus() on every balance pass; which is about once a second on every cpu. RT -- which has some quite horrible hotplug hacks due to this -- basically takes get_online_cpus() for every spin_lock/spin_unlock in the kernel. But the thing is; our sense of NR_CPUS has shifted, where it used to be ok to do something like: for_each_cpu() With preemption disabled; it gets to be less and less sane to do so, simply because 'common' hardware has 256+ CPUs these days. If we cannot rely on preempt disable to exclude hotplug, we must use get_online_cpus(), but get_online_cpus() is global state and thus cannot be used at any sort of frequency. 
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton wrote: > On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar wrote: > > > > Should be fairly straightforward to test: the sys_sched_getaffinity() > > > > and sys_sched_setaffinity() syscalls both make use of > > > > get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities > > > > on N CPUs in parallel ought to demonstrate scalability improvements > > > > pretty nicely. > > > > > > Well, an in-kernel microbenchmark which camps in a loop doing get/put > > > would measure this as well. > > > > > > But neither approach answers the question "how useful is this patchset". > > > > Even ignoring all the other reasons cited, sys_sched_getaffinity() / > > sys_sched_setaffinity() are prime time system calls, and as long as > > the patches are correct, speeding them up is worthwhile. > > That I would not have guessed. What's the use case for calling > get/set_affinity at high frequency? I don't think high-freq usage is common (at all). It could happen in AIM7-like workloads that start up a ton of binaries in parallel, which can easily create parallel sched_getaffinity() calls during process startup? Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar wrote: > > > Should be fairly straightforward to test: the sys_sched_getaffinity() > > > and sys_sched_setaffinity() syscalls both make use of > > > get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities > > > on N CPUs in parallel ought to demonstrate scalability improvements > > > pretty nicely. > > > > Well, an in-kernel microbenchmark which camps in a loop doing get/put > > would measure this as well. > > > > But neither approach answers the question "how useful is this patchset". > > Even ignoring all the other reasons cited, sys_sched_getaffinity() / > sys_sched_setaffinity() are prime time system calls, and as long as the > patches are correct, speeding them up is worthwhile. That I would not have guessed. What's the use case for calling get/set_affinity at high frequency?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton wrote: > On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar wrote: > > > * Andrew Morton wrote: > > > > > On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra > > > wrote: > > > > > > > The current cpu hotplug lock is a single global lock; therefore > > > > excluding hotplug is a very expensive proposition even though it is > > > > rare occurrence under normal operation. > > > > > > > > There is a desire for a more light weight implementation of > > > > {get,put}_online_cpus() from both the NUMA scheduling as well as the > > > > -RT side. > > > > > > > > The current hotplug lock is a full reader preference lock -- and thus > > > > supports reader recursion. However since we're making the read side > > > > lock much cheaper it is the expectation that it will also be used far > > > > more. Which in turn would lead to writer starvation. > > > > > > > > Therefore the new lock proposed is completely fair; albeit somewhat > > > > expensive on the write side. This in turn means that we need a > > > > per-task nesting count to support reader recursion. > > > > > > This is a lot of code and a lot of new complexity. It needs some pretty > > > convincing performance numbers to justify its inclusion, no? > > > > Should be fairly straightforward to test: the sys_sched_getaffinity() > > and sys_sched_setaffinity() syscalls both make use of > > get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities > > on N CPUs in parallel ought to demonstrate scalability improvements > > pretty nicely. > > Well, an in-kernel microbenchmark which camps in a loop doing get/put > would measure this as well. > > But neither approach answers the question "how useful is this patchset". Even ignoring all the other reasons cited, sys_sched_getaffinity() / sys_sched_setaffinity() are prime time system calls, and as long as the patches are correct, speeding them up is worthwhile. 
Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar wrote: > * Andrew Morton wrote: > > > On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra > > wrote: > > > > > The current cpu hotplug lock is a single global lock; therefore > > > excluding hotplug is a very expensive proposition even though it is > > > rare occurrence under normal operation. > > > > > > There is a desire for a more light weight implementation of > > > {get,put}_online_cpus() from both the NUMA scheduling as well as the > > > -RT side. > > > > > > The current hotplug lock is a full reader preference lock -- and thus > > > supports reader recursion. However since we're making the read side > > > lock much cheaper it is the expectation that it will also be used far > > > more. Which in turn would lead to writer starvation. > > > > > > Therefore the new lock proposed is completely fair; albeit somewhat > > > expensive on the write side. This in turn means that we need a > > > per-task nesting count to support reader recursion. > > > > This is a lot of code and a lot of new complexity. It needs some pretty > > convincing performance numbers to justify its inclusion, no? > > Should be fairly straightforward to test: the sys_sched_getaffinity() and > sys_sched_setaffinity() syscalls both make use of > get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on > N CPUs in parallel ought to demonstrate scalability improvements pretty > nicely. Well, an in-kernel microbenchmark which camps in a loop doing get/put would measure this as well. But neither approach answers the question "how useful is this patchset".
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton wrote: > On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra > wrote: > > > The current cpu hotplug lock is a single global lock; therefore > > excluding hotplug is a very expensive proposition even though it is > > rare occurrence under normal operation. > > > > There is a desire for a more light weight implementation of > > {get,put}_online_cpus() from both the NUMA scheduling as well as the > > -RT side. > > > > The current hotplug lock is a full reader preference lock -- and thus > > supports reader recursion. However since we're making the read side > > lock much cheaper it is the expectation that it will also be used far > > more. Which in turn would lead to writer starvation. > > > > Therefore the new lock proposed is completely fair; albeit somewhat > > expensive on the write side. This in turn means that we need a > > per-task nesting count to support reader recursion. > > This is a lot of code and a lot of new complexity. It needs some pretty > convincing performance numbers to justify its inclusion, no? Should be fairly straightforward to test: the sys_sched_getaffinity() and sys_sched_setaffinity() syscalls both make use of get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on N CPUs in parallel ought to demonstrate scalability improvements pretty nicely. [ It's not just about affinities: in particular sys_sched_getaffinity() also gets used as a NR_CPUS runtime detection method in apps, so it matters to regular non-affine loads as well. ] Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton a...@linux-foundation.org wrote: On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org wrote: The current cpu hotplug lock is a single global lock; therefore excluding hotplug is a very expensive proposition even though it is rare occurrence under normal operation. There is a desire for a more light weight implementation of {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT side. The current hotplug lock is a full reader preference lock -- and thus supports reader recursion. However since we're making the read side lock much cheaper it is the expectation that it will also be used far more. Which in turn would lead to writer starvation. Therefore the new lock proposed is completely fair; albeit somewhat expensive on the write side. This in turn means that we need a per-task nesting count to support reader recursion. This is a lot of code and a lot of new complexity. It needs some pretty convincing performance numbers to justify its inclusion, no? Should be fairly straightforward to test: the sys_sched_getaffinity() and sys_sched_setaffinity() syscalls both make use of get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on N CPUs in parallel ought to demonstrate scalability improvements pretty nicely. [ It's not just about affinities: in particular sys_sched_getaffinity() also gets used as a NR_CPUS runtime detection method in apps, so it matters to regular non-affine loads as well. ] Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar mi...@kernel.org wrote: * Andrew Morton a...@linux-foundation.org wrote: On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org wrote: The current cpu hotplug lock is a single global lock; therefore excluding hotplug is a very expensive proposition even though it is a rare occurrence under normal operation. There is a desire for a more lightweight implementation of {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT side. The current hotplug lock is a full reader preference lock -- and thus supports reader recursion. However since we're making the read side lock much cheaper it is the expectation that it will also be used far more. Which in turn would lead to writer starvation. Therefore the new lock proposed is completely fair; albeit somewhat expensive on the write side. This in turn means that we need a per-task nesting count to support reader recursion. This is a lot of code and a lot of new complexity. It needs some pretty convincing performance numbers to justify its inclusion, no? Should be fairly straightforward to test: the sys_sched_getaffinity() and sys_sched_setaffinity() syscalls both make use of get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on N CPUs in parallel ought to demonstrate scalability improvements pretty nicely. Well, an in-kernel microbenchmark which camps in a loop doing get/put would measure this as well. But neither approach answers the question of how useful this patchset is.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton a...@linux-foundation.org wrote: On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar mi...@kernel.org wrote: * Andrew Morton a...@linux-foundation.org wrote: On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org wrote: The current cpu hotplug lock is a single global lock; therefore excluding hotplug is a very expensive proposition even though it is rare occurrence under normal operation. There is a desire for a more light weight implementation of {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT side. The current hotplug lock is a full reader preference lock -- and thus supports reader recursion. However since we're making the read side lock much cheaper it is the expectation that it will also be used far more. Which in turn would lead to writer starvation. Therefore the new lock proposed is completely fair; albeit somewhat expensive on the write side. This in turn means that we need a per-task nesting count to support reader recursion. This is a lot of code and a lot of new complexity. It needs some pretty convincing performance numbers to justify its inclusion, no? Should be fairly straightforward to test: the sys_sched_getaffinity() and sys_sched_setaffinity() syscalls both make use of get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on N CPUs in parallel ought to demonstrate scalability improvements pretty nicely. Well, an in-kernel microbenchmark which camps in a loop doing get/put would measure this as well. But neither approach answers the question how useful is this patchset. Even ignoring all the other reasons cited, sys_sched_getaffinity() / sys_sched_setaffinity() are prime time system calls, and as long as the patches are correct, speeding them up is worthwhile. 
Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar mi...@kernel.org wrote: Should be fairly straightforward to test: the sys_sched_getaffinity() and sys_sched_setaffinity() syscalls both make use of get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on N CPUs in parallel ought to demonstrate scalability improvements pretty nicely. Well, an in-kernel microbenchmark which camps in a loop doing get/put would measure this as well. But neither approach answers the question of how useful this patchset is. Even ignoring all the other reasons cited, sys_sched_getaffinity() / sys_sched_setaffinity() are prime time system calls, and as long as the patches are correct, speeding them up is worthwhile. That I would not have guessed. What's the use case for calling get/set_affinity at high frequency?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton a...@linux-foundation.org wrote: On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar mi...@kernel.org wrote: Should be fairly straightforward to test: the sys_sched_getaffinity() and sys_sched_setaffinity() syscalls both make use of get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on N CPUs in parallel ought to demonstrate scalability improvements pretty nicely. Well, an in-kernel microbenchmark which camps in a loop doing get/put would measure this as well. But neither approach answers the question of how useful this patchset is. Even ignoring all the other reasons cited, sys_sched_getaffinity() / sys_sched_setaffinity() are prime time system calls, and as long as the patches are correct, speeding them up is worthwhile. That I would not have guessed. What's the use case for calling get/set_affinity at high frequency? I don't think high-freq usage is common (at all). It could happen in AIM7-like workloads that start up a ton of binaries in parallel, which can easily create parallel sched_getaffinity() calls during process startup? Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Wed, Oct 09, 2013 at 10:50:06PM -0700, Andrew Morton wrote: On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org wrote: The current cpu hotplug lock is a single global lock; therefore excluding hotplug is a very expensive proposition even though it is rare occurrence under normal operation. There is a desire for a more light weight implementation of {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT side. The current hotplug lock is a full reader preference lock -- and thus supports reader recursion. However since we're making the read side lock much cheaper it is the expectation that it will also be used far more. Which in turn would lead to writer starvation. Therefore the new lock proposed is completely fair; albeit somewhat expensive on the write side. This in turn means that we need a per-task nesting count to support reader recursion. This is a lot of code and a lot of new complexity. It needs some pretty convincing performance numbers to justify its inclusion, no? And here I thought it was generally understood to be unwise to bash global state on anything like a regular manner from every cpu. The NUMA bits really ought to use get_online_cpus()/put_online_cpus() on every balance pass; which is about once a second on every cpu. RT -- which has some quite horrible hotplug hacks due to this -- basically takes get_online_cpus() for every spin_lock/spin_unlock in the kernel. But the thing is; our sense of NR_CPUS has shifted, where it used to be ok to do something like: for_each_cpu() With preemption disabled; it gets to be less and less sane to do so, simply because 'common' hardware has 256+ CPUs these days. If we cannot rely on preempt disable to exclude hotplug, we must use get_online_cpus(), but get_online_cpus() is global state and thus cannot be used at any sort of frequency. 
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Peter Zijlstra pet...@infradead.org wrote: But the thing is; our sense of NR_CPUS has shifted, where it used to be ok to do something like: for_each_cpu() With preemption disabled; it gets to be less and less sane to do so, simply because 'common' hardware has 256+ CPUs these days. If we cannot rely on preempt disable to exclude hotplug, we must use get_online_cpus(), but get_online_cpus() is global state and thus cannot be used at any sort of frequency. So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Actual CPU hot unplugging and replugging is _ridiculously_ rare in a system, I don't understand how we tolerate _any_ overhead from this utter slowpath. Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Such that we freeze regular tasks in userspace and kernel tasks in their special freezer callback so as to guarantee minimal state? If you can get people to agree with that I'm all for it. That said, Oleg wants to use the same scheme for percpu_rwsem, and I'm not sure his write-side (something in uprobes) wants to do the same.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: But the thing is; our sense of NR_CPUS has shifted, where it used to be ok to do something like: for_each_cpu() With preemption disabled; it gets to be less and less sane to do so, simply because 'common' hardware has 256+ CPUs these days. If we cannot rely on preempt disable to exclude hotplug, we must use get_online_cpus(), but get_online_cpus() is global state and thus cannot be used at any sort of frequency. So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Actual CPU hot unplugging and replugging is _ridiculously_ rare in a system, I don't understand how we tolerate _any_ overhead from this utter slowpath. Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up quite often to save the power. Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Peter Zijlstra wrote: That said, Oleg wants to use the same scheme for percpu_rwsem, Yes, and later then (I hope) get_online_cpus() will be just current->cpuhp_ref++ || percpu_down_read(). (just in case, we only need ->cpuhp_ref to ensure that the readers can't starve the writers as is currently possible. iow to block the new readers but ensure that the recursive get/down_read can't block) And please look at sb_wait_write/sb_start_write. It uses the same logic except: it doesn't have the true fast-path for readers, its slow mode is slower than necessary, and it is buggy (afaics). and I'm not sure his write-side (something in uprobes) wants to do the same. surely not ;) Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote: On 10/10, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: But the thing is; our sense of NR_CPUS has shifted, where it used to be ok to do something like: for_each_cpu() With preemption disabled; it gets to be less and less sane to do so, simply because 'common' hardware has 256+ CPUs these days. If we cannot rely on preempt disable to exclude hotplug, we must use get_online_cpus(), but get_online_cpus() is global state and thus cannot be used at any sort of frequency. So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Actual CPU hot unplugging and replugging is _ridiculously_ rare in a system, I don't understand how we tolerate _any_ overhead from this utter slowpath. Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up quite often to save the power. cpu hotremove already uses stop_machine, so such an approach shouldn't actually worsen things (a lot) for them? It's been ages since I looked at this stuff :( Although it isn't used much, memory hotplug manages to use stop_machine() on the add/remove (ie, writer) side and nothing at all on the reader side. Is there anything which fundamentally prevents cpu hotplug from doing the same?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 09:00:44 -0700 Andrew Morton a...@linux-foundation.org wrote: It's been ages since I looked at this stuff :( Although it isn't used much, memory hotplug manages to use stop_machine() on the add/remove (ie, writer) side and nothing at all on the reader side. Is there anything which fundamentally prevents cpu hotplug from doing the same? I would think that memory hotplug may require stop machine as all CPUs may touch that memory. But we would like to remove stomp machine from CPU hotplug. Why prevent all CPUs from running when we want to remove one? -- Steve
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 12:36:31 -0400 Steven Rostedt rost...@goodmis.org wrote: On Thu, 10 Oct 2013 09:00:44 -0700 Andrew Morton a...@linux-foundation.org wrote: It's been ages since I looked at this stuff :( Although it isn't used much, memory hotplug manages to use stop_machine() on the add/remove (ie, writer) side and nothing at all on the reader side. Is there anything which fundamentally prevents cpu hotplug from doing the same? I would think that memory hotplug may require stop machine as all CPUs may touch that memory. Sure. But we would like to remove stomp machine from CPU hotplug. We do? That's news. It wasn't mentioned in the changelog and should have been. Why? Why prevent all CPUs from running when we want to remove one? So get_online_cpus() goes away. Nothing is more scalable than nothing!
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Peter Zijlstra pet...@infradead.org wrote: On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Such that we freeze regular tasks in userspace and kernel tasks in their special freezer callback so as to guarantee minimal state? Yes, absolutely. Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Andrew Morton a...@linux-foundation.org wrote: On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote: On 10/10, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: But the thing is; our sense of NR_CPUS has shifted, where it used to be ok to do something like: for_each_cpu() With preemption disabled; it gets to be less and less sane to do so, simply because 'common' hardware has 256+ CPUs these days. If we cannot rely on preempt disable to exclude hotplug, we must use get_online_cpus(), but get_online_cpus() is global state and thus cannot be used at any sort of frequency. So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Actual CPU hot unplugging and replugging is _ridiculously_ rare in a system, I don't understand how we tolerate _any_ overhead from this utter slowpath. Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up quite often to save the power. cpu hotremove already uses stop_machine, so such an approach shouldn't actually worsen things (a lot) for them? Also, using CPU hotremove to save power, instead of implementing proper power aware scheduling, is very broken to begin with. Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote: But we would like to remove stomp machine from CPU hotplug. We do? That's news. It wasn't mentioned in the changelog and should have been. Why? It would be an unrelated change to this and unrelated to the reasons as to why I want a faster get_online_cpus(). Why prevent all CPUs from running when we want to remove one? So get_online_cpus() goes away. Nothing is more scalable than nothing! Very much agreed; now stop_machine() wouldn't actually work for hotplug because it will instantly preempt everybody, including someone who might be in the middle of using per-cpu state of the cpu we're about to remove. The freeze suggestion from Ingo would actually work because we freeze tasks at known good points (userspace and kthread_freeze() points) where we know they're not fiddling with per-cpu state.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Andrew Morton wrote: On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote: On 10/10, Ingo Molnar wrote: So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Actual CPU hot unplugging and replugging is _ridiculously_ rare in a system, I don't understand how we tolerate _any_ overhead from this utter slowpath. Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up quite often to save the power. cpu hotremove already uses stop_machine, And Srivatsa wants to remove it from cpu_down(). so such an approach shouldn't actually worsen things (a lot) for them? this depends on what this freezing all tasks actually means. I understood it as try_to_freeze_tasks/etc, looks too heavy... But my only point was, I am not sure we can assume that cpu_down/up is extremely rare and its cost does not matter. use stop_machine() on the add/remove (ie, writer) side and nothing at all on the reader side. Is there anything which fundamentally prevents cpu hotplug from doing the same? Well, then we actually need to park all tasks in system, I guess. IOW, freezer. Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 18:53:37 +0200 Peter Zijlstra pet...@infradead.org wrote: On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote: But we would like to remove stomp machine from CPU hotplug. We do? That's news. It wasn't mentioned in the changelog and should have been. Why? It would be an unrelated change to this and unrelated to the reasons as to why I want a faster get_online_cpus(). Yeah, sorry for the confusion. My comment wasn't really about this change set but about stop machine and hotplug in general. Needing stop machine for hotplug has been a complaint by many, but off topic for this particular change set. Why prevent all CPUs from running when we want to remove one? So get_online_cpus() goes away. Nothing is more scalable than nothing! Very much agreed; now stop_machine() wouldn't actually work for hotplug because it will instantly preempt everybody, including someone who might be in the middle of using per-cpu state of the cpu we're about to remove. Well, stop machine doesn't instantly preempt everybody. Only those that don't have preemption disabled. Using per_cpu without preemption disabled can be dangerous. Except for the migrate disable we want to add for -rt. Then we can't rely on migrate disable and stop machine making sure per_cpu data isn't being used. Oh, and for threads that use per_cpu that are bound to a CPU. But then they need to be taken care of too for their CPU going off line. But this would also require the get_online_cpu()s to disable preemption as well. Not quite a nothing we are looking for. -- Steve
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Such that we freeze regular tasks in userspace and kernel tasks in their special freezer callback so as to guarantee minimal state? Yes, absolutely. That does add quite a bit of latency to the hotplug operations, which IIRC slows down things like boot, suspend, and resume. Thanx, Paul
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
* Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote: So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Such that we freeze regular tasks in userspace and kernel tasks in their special freezer callback so as to guarantee minimal state? Yes, absolutely. That does add quite a bit of latency to the hotplug operations, which IIRC slows down things like boot, suspend, and resume. I think it should _speed up_ suspend: instead of bringing down each CPU one by one, they could just be all stopped. Same for bootup - it's not really a hotplug operation in the traditional sense, as all CPUs are 'present' at once, and they could be added in one group operation instead of serializing the bootup. Also, most CPU bootup sequences are serialized. It would slow down individual CPU hotplug/hotunplug - and that is what is a really rare operation outside broken schemes to save power ... Thanks, Ingo
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 06:52:29PM +0200, Ingo Molnar wrote: * Andrew Morton a...@linux-foundation.org wrote: On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote: On 10/10, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: But the thing is; our sense of NR_CPUS has shifted, where it used to be ok to do something like: for_each_cpu() With preemption disabled; it gets to be less and less sane to do so, simply because 'common' hardware has 256+ CPUs these days. If we cannot rely on preempt disable to exclude hotplug, we must use get_online_cpus(), but get_online_cpus() is global state and thus cannot be used at any sort of frequency. So ... why not make it _really_ cheap, i.e. the read lock costing nothing, and tie CPU hotplug to freezing all tasks in the system? Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a system, I don't understand how we tolerate _any_ overhead from this utter slowpath. Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up quite often to save the power. cpu hotremove already uses stop_machine, so such an approach shouldn't actually worsen things (a lot) for them? Also, using CPU hotremove to save power, instead of implementing proper power aware scheduling, is very broken to begin with. In many cases, agreed. Particularly on devices where there is an easy way to send out regular updates and bug fixes, and especially if they are connected to AC power. But if I was in charge of creating a battery-powered multicore embedded device that was to ship in million-unit quantities, especially in the not-uncommon case where more than one CPU is needed only for specific actions, I would offline the extra CPUs except when I was sure they were needed. This would of course still rely on power-aware scheduling when more than one CPU was in use. 
The power-aware scheduling conserves energy for short-term variations (in which case offlining really is useless), but the longer-term "keep this CPU off" behavior is provided by offlining. Yes, the Linux kernel code quality is quite good, but bugs still happen, and bugs can easily cause CPUs to power up unnecessarily. This doesn't have to happen very often to kill your battery. If the extra CPUs are offline, a large class of those bugs are rendered harmless. If you are going to have a very large number of difficult-to-update battery-powered devices in the field, defending against such bugs is worthwhile. Thanx, Paul
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10, Peter Zijlstra wrote: The freeze suggestion from Ingo would actually work because we freeze tasks at known good points Not really known/good, we have more and more freezable_schedule's. But probably this is fine, nobody should do this under get_online_cpus(). (userspace and kthread_freeze() points) where we know they're not fiddling with per-cpu state. We should also take care of PF_NOFREEZE kthreads. Oleg.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 13:13:05 -0400 Steven Rostedt rost...@goodmis.org wrote: Why prevent all CPUs from running when we want to remove one? So get_online_cpus() goes away. Nothing is more scalable than nothing! Very much agreed; now stop_machine() wouldn't actually work for hotplug because it will instantly preempt everybody, including someone who might be in the middle of using per-cpu state of the cpu we're about to remove. Well, stop machine doesn't instantly preempt everybody. Only those that don't have preemption disabled. Using per_cpu without preemption disabled can be dangerous. Yes, I'd have thought that the cases where a CPU is fiddling with another CPU's percpu data with preemption enabled would be rather rare. I can't actually think of any off the top. Are there examples we can look at?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 10:48 AM, Andrew Morton a...@linux-foundation.org wrote: Yes, I'd have thought that the cases where a CPU is fiddling with another CPU's percpu data with preemption enabled would be rather rare. I can't actually think of any off the top. Are there examples we can look at? You can't do that right now - since you have to get the cpu list. So it may not be with preemption enabled, but it should always be under the locking provided by get_online_cpus().. That one allows sleeping, though. I personally would *love* to make CPU hotplug be a lockless thing entirely. But I detest stop-machine too, because it has these really annoying properties. So if we want to make it zero-cost to look at online CPU data, can we avoid even the stop-machine synchronization, instead saying that the cpu hotplug bitmap is updated completely locklessly, but if you see a bit set, the data associated with that CPU is guaranteed to still be available. IOW, just use RCU semantics on a per-bit level. When we offline a CPU, we do clear_bit(cpu, cpu_online_mask); rcu_synchronize(); .. now we can free all the percpu data and kill the CPU .. without any locking anywhere - not stop-machine, not anything. If somebody is doing a for_each_cpu() (under just a regular rcu_read_lock()) and they see the bit set while it's going down, who cares? The CPU is still there, the data is accessible.. I'm sure there's some reason the above wouldn't work, but the above would seem to be pretty optimal. Why do we really force this big locking thing? The new patches make that locking _smarter_, but it's still a damn big lock. Could we possibly go _beyond_ the lock? Linus
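The per-bit RCU scheme Linus sketches can be modeled in userspace roughly as below. This is a toy, single-writer sketch with hypothetical names: the busy-wait on a reader count stands in for synchronize_rcu() and is not a real grace period, but it shows the invariant that a reader who sees the bit set is guaranteed the per-cpu data behind it is still valid:

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NCPUS 4

/* Toy model: the online mask is updated locklessly; the writer clears
 * the bit, waits out current readers ("synchronize_rcu()"), and only
 * then frees the per-cpu data. */
static atomic_ulong cpu_online_mask_model = (1UL << NCPUS) - 1;
static atomic_int readers;              /* stand-in for RCU read sections */
static int *percpu_data[NCPUS];

static void init_cpus(void)
{
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        percpu_data[cpu] = malloc(sizeof(int));
        *percpu_data[cpu] = cpu;
    }
}

/* Reader side: if the bit is set inside the read section, the data
 * may be dereferenced safely; returns -1 for an offline cpu. */
static int read_cpu(int cpu)
{
    int val = -1;

    atomic_fetch_add(&readers, 1);                  /* "rcu_read_lock()" */
    if (atomic_load(&cpu_online_mask_model) & (1UL << cpu))
        val = *percpu_data[cpu];                    /* bit set => valid */
    atomic_fetch_sub(&readers, 1);                  /* "rcu_read_unlock()" */
    return val;
}

/* Writer side: clear_bit(); synchronize; free the per-cpu state. */
static void offline_cpu(int cpu)
{
    atomic_fetch_and(&cpu_online_mask_model, ~(1UL << cpu));
    while (atomic_load(&readers) > 0)               /* "synchronize_rcu()" */
        ;
    free(percpu_data[cpu]);
    percpu_data[cpu] = NULL;
}
```

In the kernel the writer would use clear_bit() on cpu_online_mask followed by synchronize_rcu(); the point of the model is only that no reader-side lock is needed beyond the RCU read section itself.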
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 10:48:56AM -0700, Andrew Morton wrote: Very much agreed; now stop_machine() wouldn't actually work for hotplug because it will instantly preempt everybody, including someone who might be in the middle of using per-cpu state of the cpu we're about to remove. Well, stop machine doesn't instantly preempt everybody. Only those that don't have preemption disabled. Using per_cpu without preemption disabled can be dangerous. Can be, but there's more of that around than you want. Also that's not the entire issue. Yes, I'd have thought that the cases where a CPU is fiddling with another CPU's percpu data with preemption enabled would be rather rare. Look at kernel/events/core.c:swevent_hlist_get() it does something like: get_online_cpus(); for_each_online_cpu() { allocate_hash_table(); } put_online_cpus(); Surely you don't want to do that with preemption disabled? But my point is that even though there aren't many of these today; with the growing number of cpus in 'commodity' hardware you want to move away from using preempt_disable() as hotplug lock.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 10:13:53AM -0700, Paul E. McKenney wrote:
> That does add quite a bit of latency to the hotplug operations, which
> IIRC slows down things like boot, suspend, and resume.

Guess what suspend does before it unplugs all these cpus?
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 11:10:35 -0700 Linus Torvalds <torva...@linux-foundation.org> wrote:
> .. now we can free all the percpu data and kill the CPU ..
>
> without any locking anywhere - not stop-machine, not anything. If
> somebody is doing a for_each_cpu() (under just a regular
> rcu_read_lock()) and they see the bit set while it's going down, who
> cares? The CPU is still there, the data is accessible..

The problem is that rcu_read_lock() requires preemption disabled unless you are using the preemptible rcu tree version. There's always srcu_read_lock() but that's not so free. It's basically the same as what Peter is doing.

There's places in the kernel that do for_each_cpu() that I'm sure you don't want to disable preemption for. Especially when you start having 4096 CPU machines!

-- Steve
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 11:10:35AM -0700, Linus Torvalds wrote:
> You can't do that right now - since you have to get the cpu list. So it
> may not be with preemption enabled, but it should always be under the
> locking provided by get_online_cpus().. That one allows sleeping, though.
>
> I personally would *love* to make CPU hotplug be a lockless thing
> entirely. But I detest stop-machine too, because it has these really
> annoying properties.
>
> So if we want to make it zero-cost to look at online CPU data, can we
> avoid even the stop-machine synchronization, instead saying that the
> cpu hotplug bitmap is updated completely locklessly, but if you see a
> bit set, the data associated with that CPU is guaranteed to still be
> available.
>
> IOW, just use RCU semantics on a per-bit level. When we offline a CPU, we do
>
>     clear_bit(cpu, cpu_online_mask);
>     rcu_synchronize();
>     .. now we can free all the percpu data and kill the CPU ..
>
> without any locking anywhere - not stop-machine, not anything. If
> somebody is doing a for_each_cpu() (under just a regular
> rcu_read_lock()) and they see the bit set while it's going down, who
> cares? The CPU is still there, the data is accessible..
>
> I'm sure there's some reason the above wouldn't work, but the above
> would seem to be pretty optimal. Why do we really force this big locking
> thing? The new patches make that locking _smarter_, but it's still a
> damn big lock. Could we possibly go _beyond_ the lock?

The only down-side to doing this is that you cannot actually allocate memory under rcu_read_lock() because it might not allow preemption.

That said; I like the idea. I'll go try and audit the get_online_cpus() sites to see if there's any that really need full exclusion.
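[Editor's note: the per-bit RCU scheme being discussed can be sketched as a self-contained userspace simulation. Everything below is invented for illustration: the `toy_*` functions stand in for real RCU primitives, and the single global reader counter is a far cruder grace-period mechanism than any kernel implementation.]

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NCPUS 8

static atomic_ulong online_mask = (1UL << NCPUS) - 1; /* toy cpu_online_mask */
static atomic_int active_readers;                     /* toy read-side count */
static int *percpu_data[NCPUS];                       /* state readers touch */

static void toy_rcu_read_lock(void)   { atomic_fetch_add(&active_readers, 1); }
static void toy_rcu_read_unlock(void) { atomic_fetch_sub(&active_readers, 1); }

/* Toy synchronize_rcu(): wait until no reader is inside a read-side
 * section.  (Unlike real RCU this also waits for readers that began
 * after us -- stronger than necessary, but safe for a sketch.) */
static void toy_synchronize_rcu(void)
{
    while (atomic_load(&active_readers) != 0)
        sched_yield();
}

/* Offline path, in the order under discussion: clear the bit, wait a
 * grace period, and only then free.  Any reader that observed the bit
 * set has left its read-side section before the data disappears. */
static void offline_cpu(int cpu)
{
    atomic_fetch_and(&online_mask, ~(1UL << cpu));
    toy_synchronize_rcu();
    free(percpu_data[cpu]);
    percpu_data[cpu] = NULL;
}

/* Reader: sum a per-cpu statistic over the bits we observe set. */
static long sum_online_stat(void)
{
    long sum = 0;

    toy_rcu_read_lock();
    unsigned long snap = atomic_load(&online_mask);
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (snap & (1UL << cpu))
            sum += *percpu_data[cpu];
    toy_rcu_read_unlock();
    return sum;
}
```

The ordering is the whole point: clear the bit first, let a grace period elapse, then free, so a reader that saw the bit set can still dereference the per-cpu data safely.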
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 11:34 AM, Peter Zijlstra <pet...@infradead.org> wrote:
>
> But my point is that even though there aren't many of these today; with
> the growing number of cpus in 'commodity' hardware you want to move away
> from using preempt_disable() as hotplug lock.

Umm. Wasn't this pretty much the argument for the COMPLETELY F*CKED UP change to make the vma locking use a semaphore?

The whole "it's more convenient to use sleeping locks" argument is PURE AND UTTER SHIT when it comes to really core code. We are *much* better off saying "this is core, we care so deeply about performance that you had better use your brain before you do this".

Seriously. Your argument is bad, but more importantly, it is *dangerously* bad. It's crap that results in bad code: and the bad code is almost impossible to fix up later, because once you encourage people to do the easy thing, they'll do so.

Yes, preempt_disable() is harder to use than sleeping locks. You need to do pre-allocation etc. But it is much *much* more efficient. And in the kernel, we care. We have the resources. Plus, we can also say "if you can't handle it, don't do it". We don't need new features so badly that we are willing to screw up core code.

As to your kernel/events/core.c example, things like that are generally pretty easy to fix. The good thing about preemption-disable is:

 - it's pretty easily debuggable, because you do get big warnings if you screwed up exactly _here_

 - it's so cheap that especially when it comes to things that can sleep, it's perfectly ok to drop the preemption count, do the sleeping thing, re-disable preemption and then check if we still needed the data.

Oh, and I'm sure there are several users that currently depend on being able to sleep over get_online_cpus(). But I'm pretty sure it's "several", not "hundreds", and I think we could fix them up.

                Linus
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
> On Thu, 10 Oct 2013 11:10:35 -0700 Linus Torvalds <torva...@linux-foundation.org> wrote:
> > .. now we can free all the percpu data and kill the CPU ..
> >
> > without any locking anywhere - not stop-machine, not anything. If
> > somebody is doing a for_each_cpu() (under just a regular
> > rcu_read_lock()) and they see the bit set while it's going down, who
> > cares? The CPU is still there, the data is accessible..
>
> The problem is that rcu_read_lock() requires preemption disabled unless
> you are using the preemptible rcu tree version.
>
> There's always srcu_read_lock() but that's not so free. It's basically
> the same as what Peter is doing.

No, srcu is actually more expensive in the fast path. Although possibly we could make SRCU more complex ;-)

> There's places in the kernel that do for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

:-)
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10/2013 08:56 PM, Oleg Nesterov wrote:
> On 10/10, Ingo Molnar wrote:
> > * Peter Zijlstra <pet...@infradead.org> wrote:
> > > But the thing is; our sense of NR_CPUS has shifted, where it used to
> > > be ok to do something like:
> > >
> > >     for_each_cpu()
> > >
> > > with preemption disabled; it gets to be less and less sane to do so,
> > > simply because 'common' hardware has 256+ CPUs these days. If we
> > > cannot rely on preempt disable to exclude hotplug, we must use
> > > get_online_cpus(), but get_online_cpus() is global state and thus
> > > cannot be used at any sort of frequency.
> >
> > So ... why not make it _really_ cheap, i.e. the read lock costing
> > nothing, and tie CPU hotplug to freezing all tasks in the system?
> > Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
> > system, I don't understand how we tolerate _any_ overhead from this
> > utter slowpath.
>
> Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
> quite often to save the power.

Yes, I've heard of such systems and so I might have brought them up during discussions about CPU hotplug. But unfortunately, I have been misquoted quite often, leading to the wrong impression that I have such a usecase or that I recommend/support using CPU hotplug for power management. So let me clarify that part, while I have the chance.

(And I don't blame anyone for that. I work on power-management related areas, and I've worked on improving/optimizing CPU hotplug; so it's pretty natural to make a connection between the two and assume that I tried to optimize CPU hotplug keeping power management in mind. But that's not the case, as I explain below.)

I started out trying to make suspend/resume more reliable, scalable and fast. And suspend/resume uses CPU hotplug underneath and that's a pretty valid usecase. So with that, I started looking at CPU hotplug and soon realized the mess it had become. So I started working on cleaning up that mess, like rethinking the whole notifier scheme[1], and removing the ridiculous stop_machine() from the cpu_down path[2] etc.

But the intention behind all this work was just to make CPU hotplug cleaner/saner/bug-free and possibly speed up suspend/resume. IOW, I didn't have any explicit intention to make it easier for people to use it for power management, although I understood that some of this work might help those poor souls who don't have any other choice, for whatever reason. And fortunately, (IIUC) the number of systems/people relying on CPU hotplug for power management has reduced quite a bit in recent times, which is a very good thing.

So, to reiterate, I totally agree that a power-aware scheduler is the right way to do CPU power management; CPU hotplug is simply not the tool to use for that. No question about that.

Also, system shutdown used to depend on CPU hotplug to disable the non-boot CPUs, but we don't do that any more after commit cf7df378a, which is a very welcome change. And in future if we can somehow do suspend/resume without using CPU hotplug, that would be absolutely wonderful as well. (There have been discussions in the past around this, but nobody has a solution yet.)

The other valid usecases that I can think of, for using CPU hotplug, are for RAS reasons and for DLPAR (Dynamic Logical Partitioning) operations on powerpc, both of which are not performance-sensitive, AFAIK.

[1]. Reverse invocation of CPU hotplug notifiers
     http://lwn.net/Articles/508072/

[2]. Stop-machine()-free CPU hotplug
     http://lwn.net/Articles/538819/
     http://lwn.net/Articles/556727/ (v6)

Regards,
Srivatsa S. Bhat
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 11:43 AM, Steven Rostedt <rost...@goodmis.org> wrote:
>
> There's places in the kernel that do for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

We could _easily_ add preemption points in for_each_cpu() for the MAX_SMP case. We can even do it in the macro itself. So I wouldn't worry about things like that.

I'd expect the "Oh damn, we have to allocate" case to be more painful, but I think that is fairly easily handled by having something like a redo_cpu() macro that you can call inside for_each_cpu(). So the pattern for the (hopefully-not-all-that-common) thing would be something like

    sometype *prealloc = NULL;

    for_each_cpu() {
        if (!prealloc) {
            rcu_read_unlock();  // Or whatever we want to call it
            prealloc = kmalloc(..);
            rcu_read_lock();
            redo_cpu();
        }
    }
    kfree(prealloc);

which is certainly more work than a sleeping lock is, but this looks like an initialization pattern.

Looking at the things that are actually performance-critical, they tend to be things like "gather all the statistics for all CPUs".

Honesty in advertising: I only grepped a couple of users of get_online_cpus(), and they all seemed like they'd be perfectly happy with preemption-off. But maybe I was just lucky in my few samples: I have anecdotes, not data.

                Linus
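[Editor's note: the preallocate-and-redo idiom sketched above can be mimicked in plain userspace C. The item list, the result array, and restarting by decrementing the loop index are all invented stand-ins for cpumask iteration and a hypothetical redo_cpu() macro.]

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for rcu_read_lock()/unlock(); here they only record
 * whether we are inside a "no allocation allowed" section. */
static int in_read_section;
static void read_lock(void)   { in_read_section = 1; }
static void read_unlock(void) { in_read_section = 0; }

#define NITEMS 4
static int online[NITEMS] = { 1, 1, 0, 1 };  /* item 2 is "offline" */
static int *result[NITEMS];

/* For each online item, attach a freshly allocated buffer.  The
 * allocation happens outside the read section; after re-entering we
 * redo the current index, re-checking `online` in case it changed
 * while the section was dropped -- the essence of redo_cpu(). */
static int process_all(void)
{
    int *prealloc = NULL;
    int attached = 0;

    read_lock();
    for (int i = 0; i < NITEMS; i++) {
        if (!online[i])
            continue;
        if (!prealloc) {
            read_unlock();
            prealloc = malloc(sizeof(*prealloc)); /* "may sleep" here */
            read_lock();
            i--;            /* redo this item from scratch */
            continue;
        }
        assert(in_read_section);
        *prealloc = i;
        result[i] = prealloc;
        prealloc = NULL;    /* consumed; next item reallocates */
        attached++;
    }
    read_unlock();
    free(prealloc);         /* leftover if the last pass didn't use it */
    return attached;
}
```

As the message says, this is more work than a sleeping lock, but the allocation never happens inside the read-side section and every item is revalidated after the section is retaken.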
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 11:49:15 -0700 Linus Torvalds <torva...@linux-foundation.org> wrote:
> Oh, and I'm sure there are several users that currently depend on being
> able to sleep over get_online_cpus(). But I'm pretty sure it's "several",
> not "hundreds", and I think we could fix them up.

I'm wondering if we can have a for_each_cpu() that only disables preemption in the loop. That is, each iteration will enable preemption, but the loop itself will guarantee that the current cpu to process still exists.

    rcu_read_lock();
    for_each_cpu(cpu, mask) {
        rcu_read_unlock();
        rcu_read_lock();
        if (!cpu_online(cpu))
            continue;
        [...]
    }
    rcu_read_unlock();

That way expensive loops won't stop the current CPU from processing all online CPUs.

Of course, it will miss a CPU that comes online. But it would still need to handle the case of a CPU coming online after the final put_online_cpus(), so I'm not sure that's a problem.

-- Steve
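[Editor's note: the re-check after retaking the lock can be demonstrated with a toy single-threaded loop. The `online` array and the `hook` callback are invented for this sketch; the hook models a CPU being hot-removed between iterations, at the point where the real pattern drops and retakes rcu_read_lock().]

```c
#include <assert.h>
#include <stddef.h>

#define NITEMS 6
static int online[NITEMS] = { 1, 1, 1, 1, 1, 1 };

/* Walk every item, re-checking liveness at the top of each pass, as
 * the loop above does right after it retakes rcu_read_lock().  `hook`
 * stands in for whatever happens while the lock is dropped. */
static int visit_online(void (*hook)(int cpu))
{
    int visited = 0;

    for (int cpu = 0; cpu < NITEMS; cpu++) {
        /* lock dropped and retaken here in the real pattern */
        if (!online[cpu])
            continue;       /* went away while we weren't looking */
        visited++;
        if (hook)
            hook(cpu);
    }
    return visited;
}

/* Simulated hot-unplug: after item 1 is visited, item 4 goes offline,
 * so the loop must notice and skip it. */
static void offline_4_after_1(int cpu)
{
    if (cpu == 1)
        online[4] = 0;
}
```

The point of the per-iteration re-check is exactly this: a bit observed set at snapshot time is no guarantee by the time the iteration reaches it, so each pass revalidates before touching per-cpu state.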
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On 10/10/2013 10:24 PM, Oleg Nesterov wrote:
> On 10/10, Andrew Morton wrote:
> > On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov <o...@redhat.com> wrote:
> > > On 10/10, Ingo Molnar wrote:
> > > > So ... why not make it _really_ cheap, i.e. the read lock costing
> > > > nothing, and tie CPU hotplug to freezing all tasks in the system?
> > > > Actual CPU hot unplugging and replugging is _ridiculously_ rare in
> > > > a system, I don't understand how we tolerate _any_ overhead from
> > > > this utter slowpath.
> > >
> > > Well, iirc Srivatsa (cc'ed) pointed out that some systems do
> > > cpu_down/up quite often to save the power.
> >
> > cpu hotremove already uses stop_machine,
>
> And Srivatsa wants to remove it from cpu_down().

Yes, I have worked on several designs to remove stop_machine() from the cpu_down path.

http://lwn.net/Articles/538819/
http://lwn.net/Articles/556727/

> > so such an approach shouldn't actually worsen things (a lot) for them?
>
> This depends on what this "freezing all tasks" actually means. I
> understood it as try_to_freeze_tasks/etc, looks too heavy... But my only
> point was, I am not sure we can assume that cpu_down/up is extremely
> rare and its cost does not matter.

> > use stop_machine() on the add/remove (ie, writer) side and nothing at
> > all on the reader side. Is there anything which fundamentally prevents
> > cpu hotplug from doing the same?

It's certainly possible to remove stop_machine() from CPU hotplug, as I've demonstrated in the patchsets mentioned above. And there were pretty good performance improvements too, arising out of that change, as described here:
http://article.gmane.org/gmane.linux.ports.ppc.embedded/56122

Regards,
Srivatsa S. Bhat
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 08:50:32PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
> > On Thu, 10 Oct 2013 11:10:35 -0700 Linus Torvalds <torva...@linux-foundation.org> wrote:
> > > .. now we can free all the percpu data and kill the CPU ..
> > >
> > > without any locking anywhere - not stop-machine, not anything. If
> > > somebody is doing a for_each_cpu() (under just a regular
> > > rcu_read_lock()) and they see the bit set while it's going down, who
> > > cares? The CPU is still there, the data is accessible..
> >
> > The problem is that rcu_read_lock() requires preemption disabled unless
> > you are using the preemptible rcu tree version.
> >
> > There's always srcu_read_lock() but that's not so free. It's basically
> > the same as what Peter is doing.
>
> No, srcu is actually more expensive in the fast path. Although possibly
> we could make SRCU more complex ;-)

Indeed. Especially if we wanted to get rid of the memory barriers in srcu_read_lock() and srcu_read_unlock() and still allow SRCU to be used from idle and offline CPUs! ;-)

							Thanx, Paul

> > There's places in the kernel that do for_each_cpu() that I'm sure you
> > don't want to disable preemption for. Especially when you start having
> > 4096 CPU machines!
>
> :-)

But shouldn't you only have to disable preemption over each separate pass through the loop rather than across the entire loop? "I guarantee that the CPU I just handed you will remain online through the body of the loop" rather than "I guarantee that no CPU is going anywhere through the entire loop."

							Thanx, Paul
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt <rost...@goodmis.org> wrote:
>
> I'm wondering if we can have a for_each_cpu() that only disables
> preemption in the loop.

I think we'd generally want to have it be something the loop asks for. If the loop is just some kind of "gather statistics" thing, I don't think it's required. The cost per loop is so low (usually adding up a couple of words) that the downside drowns the upside.

And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and not do it for common small values (although it looks like Fedora defaults to 128 CPUs for their distro kernels, which seems a bit excessive - too many by far for normal people, too few for the crazy big ones).

                Linus
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, 10 Oct 2013 12:16:16 -0700 Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt <rost...@goodmis.org> wrote:
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
>
> I think we'd generally want to have it be something the loop asks for.

Yeah, I was thinking of adding a new macro, not updating for_each_cpu() itself, so that only the locations that would require it would use it.

> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
>
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
> not do it for common small values (although it looks like Fedora
> defaults to 128 CPUs for their distro kernels, which seems a bit
> excessive - too many by far for normal people, too few for the crazy
> big ones).

We could always make it a boot time option, and use alternatives to change the code as we do with spin_locks() and other code.

-- Steve (the lover of self modifying code)
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Thu, Oct 10, 2013 at 12:16:16PM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt <rost...@goodmis.org> wrote:
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
>
> I think we'd generally want to have it be something the loop asks for.
> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
>
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
> not do it for common small values (although it looks like Fedora
> defaults to 128 CPUs for their distro kernels, which seems a bit
> excessive - too many by far for normal people, too few for the crazy
> big ones).

Alternatively we could write it something like:

    rcu_read_lock();
    for_each_online_node(node) {
        for_each_cpu(cpu, cpumask_of_node(node)) {
            ...
        }
        cond_resched_rcu();
    }
    rcu_read_unlock();

But yes, I think this pattern (and variations) should work for most cases. The one I'm struggling with atm is kernel/cpuset.c:rebuild_sched_domains_locked(). Although I'm thinking we should do that seqlock style: just build it and verify if it's still valid; if not, try again -- although I'm sure it'll be 'fun' to get correct.
Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2
On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra wrote:
> The current cpu hotplug lock is a single global lock; therefore excluding
> hotplug is a very expensive proposition even though it is rare occurrence
> under normal operation.
>
> There is a desire for a more light weight implementation of
> {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT side.
>
> The current hotplug lock is a full reader preference lock -- and thus supports
> reader recursion. However since we're making the read side lock much cheaper it
> is the expectation that it will also be used far more. Which in turn would lead
> to writer starvation.
>
> Therefore the new lock proposed is completely fair; albeit somewhat expensive
> on the write side. This in turn means that we need a per-task nesting count to
> support reader recursion.

This is a lot of code and a lot of new complexity. It needs some pretty convincing performance numbers to justify its inclusion, no?