Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-14 Thread Paul E. McKenney
On Mon, Oct 14, 2013 at 02:23:55AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 14, 2013 at 11:05:08AM +0200, Peter Zijlstra wrote:
> > On Sat, Oct 12, 2013 at 07:06:56PM +0200, Oleg Nesterov wrote:
> > > it even disables irqs, so this should always imply rcu_read_lock() with
> > > any implementation, 
> > 
> > Not so; I could make an RCU implementation that drives the state machine
> > from rcu_read_unlock(). Such an implementation doesn't need the
> > interrupt driven poll-state driver we currently have and could thus
> > subvert that assumption :-)
> > 
> > Then again, there's a good reason PaulMck didn't pick this
> > implementation.
> 
> True enough, but there really are some out-of-tree RCU implementations
> that do take this approach and where disabling interrupts would not
> block preemptible RCU.  So please do not rely on this implementation
> detail.  You never know...

Actually, the current implementation of SRCU is not blocked by CPUs
disabling interrupts!
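
A minimal sketch of what that means in practice, using the stock SRCU
primitives; the demo_srcu domain and the reader/updater pair are
illustrative only:

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(demo_srcu);	/* illustrative SRCU domain */

static void demo_reader(void)
{
	int idx = srcu_read_lock(&demo_srcu);
	/* ... access data protected by demo_srcu ... */
	srcu_read_unlock(&demo_srcu, idx);
}

static void demo_updater(void)
{
	/*
	 * Waits only for srcu_read_lock() sections on demo_srcu; a CPU
	 * spinning with interrupts disabled, but outside any such
	 * section, does not delay this grace period.
	 */
	synchronize_srcu(&demo_srcu);
}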

Thanx, Paul

> > > In fact I do not even understand why getaffinity() doesn't simply
> > > return ->cpus_allowed, but this is off-topic.
> > 
> > Yeah, me neither :-(, it always surprises me. But changing it is likely
> > to break stuff so there we are.
> 
> I know that feeling...
> 
>   Thanx, Paul



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-14 Thread Paul E. McKenney
On Mon, Oct 14, 2013 at 11:05:08AM +0200, Peter Zijlstra wrote:
> On Sat, Oct 12, 2013 at 07:06:56PM +0200, Oleg Nesterov wrote:
> > it even disables irqs, so this should always imply rcu_read_lock() with
> > any implementation, 
> 
> Not so; I could make an RCU implementation that drives the state machine
> from rcu_read_unlock(). Such an implementation doesn't need the
> interrupt driven poll-state driver we currently have and could thus
> subvert that assumption :-)
> 
> Then again, there's a good reason PaulMck didn't pick this
> implementation.

True enough, but there really are some out-of-tree RCU implementations
that do take this approach and where disabling interrupts would not
block preemptible RCU.  So please do not rely on this implementation
detail.  You never know...

> > In fact I do not even understand why getaffinity() doesn't simply
> > return ->cpus_allowed, but this is off-topic.
> 
> Yeah, me neither :-(, it always surprises me. But changing it is likely
> to break stuff so there we are.

I know that feeling...

Thanx, Paul



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-14 Thread Peter Zijlstra
On Sat, Oct 12, 2013 at 07:06:56PM +0200, Oleg Nesterov wrote:
> it even disables irqs, so this should always imply rcu_read_lock() with
> any implementation, 

Not so; I could make an RCU implementation that drives the state machine
from rcu_read_unlock(). Such an implementation doesn't need the
interrupt driven poll-state driver we currently have and could thus
subvert that assumption :-)

Then again, there's a good reason PaulMck didn't pick this
implementation.

> In fact I do not even understand why getaffinity() doesn't simply
> return ->cpus_allowed, but this is off-topic.

Yeah, me neither :-(, it always surprises me. But changing it is likely
to break stuff so there we are.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-12 Thread Oleg Nesterov
On 10/11, Peter Zijlstra wrote:
>
> On Fri, Oct 11, 2013 at 08:25:07PM +0200, Oleg Nesterov wrote:
> > On 10/11, Peter Zijlstra wrote:
> > >
> > > As a penance I'll start by removing all get_online_cpus() usage from the
> > > scheduler.
> >
> > I only looked at the change in setaffinity,
> >
> > > @@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> > >   struct task_struct *p;
> > >   int retval;
> > >
> > > - get_online_cpus();
> > >   rcu_read_lock();
> >
> > Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
> > set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
> > is probably fine, CPU_DYING does __migrate_task().
>
> I'm fine with always doing sync_sched(); sync_rcu(); if that makes you
> feel better.

No, I was just curious. iow, I am asking, not arguing.

> But I thought that assuming that !PREEMPT sync_rcu() would
> imply sync_sched() was ok. I think the comment there even says as much.
>
> And task_rq_lock() will very much disable preemption; and thus we get
> what we want, right?

it even disables irqs, so this should always imply rcu_read_lock() with
any implementation, I guess. However I was told we should not rely on
this, and say posix_timer_event() does rcu_read_lock() even though it is
always called under spin_lock_irq().
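
A minimal sketch of that convention; the lock, struct, and reader here
are hypothetical stand-ins, not the real posix_timer_event() internals:

#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/printk.h>

struct demo { int val; };

static DEFINE_SPINLOCK(demo_lock);
static struct demo __rcu *demo_ptr;

static void demo_reader(void)
{
	struct demo *p;

	spin_lock_irq(&demo_lock);
	/*
	 * irqs are already off, but mark the read side explicitly so
	 * the code stays correct under any RCU implementation.
	 */
	rcu_read_lock();
	p = rcu_dereference(demo_ptr);
	if (p)
		pr_info("val=%d\n", p->val);
	rcu_read_unlock();
	spin_unlock_irq(&demo_lock);
}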

But what I actually tried to say, it seems that this particular change
looks fine even if cpu_down() doesn't do sync_sched/rcu at all, because
we can rely on __migrate_task(). IOW, if we race with DOWN_PREPARE and
miss set_cpu_active(false) we can pretend that this CPU goes down later.

> In any case; the goal was to make either RCU or preempt-disable
> sufficient.
>
> > However. This means that sched_setaffinity() can fail if it races with
> > the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
> > Probably we do not really care, just this looks a bit confusing.
>
> Couldn't be bothered; failing hotplug will have side-effects any which
> way.

OK.

> > > @@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
> > >   goto out_unlock;
> > >
> > >   raw_spin_lock_irqsave(&p->pi_lock, flags);
> > > - cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
> > > + cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);
> >
> > But I am just curious, is this change strictly needed?
>
> No; we could do without. It really doesn't matter much if anything. I
> only did it because sched_setaffinity()->set_cpus_allowed_ptr() checks
> against active, not online. And had a sudden urge to make get/set
> symmetric -- totally pointless otherwise.

OK, thanks, I was just curious.

In fact I do not even understand why getaffinity() doesn't simply
return ->cpus_allowed, but this is off-topic.

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-11 Thread Peter Zijlstra
On Fri, Oct 11, 2013 at 08:25:07PM +0200, Oleg Nesterov wrote:
> On 10/11, Peter Zijlstra wrote:
> >
> > As a penance I'll start by removing all get_online_cpus() usage from the
> > scheduler.
> 
> I only looked at the change in setaffinity,
> 
> > @@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> > struct task_struct *p;
> > int retval;
> >
> > -   get_online_cpus();
> > rcu_read_lock();
> 
> Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
> set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
> is probably fine, CPU_DYING does __migrate_task().

I'm fine with always doing sync_sched(); sync_rcu(); if that makes you
feel better. But I thought that assuming that !PREEMPT sync_rcu() would
imply sync_sched() was ok. I think the comment there even says as much.

And task_rq_lock() will very much disable preemption; and thus we get
what we want, right?

In any case; the goal was to make either RCU or preempt-disable
sufficient.

> However. This means that sched_setaffinity() can fail if it races with
> the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
> Probably we do not really care, just this looks a bit confusing.

Couldn't be bothered; failing hotplug will have side-effects any which
way.

> > @@ -3814,7 +3813,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
> > unsigned long flags;
> > int retval;
> >
> > -   get_online_cpus();
> 
> This change is probably fine in any case?

Yes.

> > rcu_read_lock();
> >
> > retval = -ESRCH;
> > @@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
> > goto out_unlock;
> >
> > raw_spin_lock_irqsave(&p->pi_lock, flags);
> > -   cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
> > +   cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);
> 
> But I am just curious, is this change strictly needed?

No; we could do without. It really doesn't matter much if anything. I
only did it because sched_setaffinity()->set_cpus_allowed_ptr() checks
against active, not online. And had a sudden urge to make get/set
symmetric -- totally pointless otherwise.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-11 Thread Oleg Nesterov
On 10/11, Peter Zijlstra wrote:
>
> As a penance I'll start by removing all get_online_cpus() usage from the
> scheduler.

I only looked at the change in setaffinity,

> @@ -3706,7 +3707,6 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
>   struct task_struct *p;
>   int retval;
>
> - get_online_cpus();
>   rcu_read_lock();

Hmm. In theory task_rq_lock() doesn't imply rcu-lock, so
set_cpus_allowed_ptr() can miss the change in cpu_active_mask. But this
is probably fine, CPU_DYING does __migrate_task().

However. This means that sched_setaffinity() can fail if it races with
the failing cpu_down() (say, __cpu_notify(CPU_DOWN_PREPARE) fails).
Probably we do not really care, just this looks a bit confusing.

> @@ -3814,7 +3813,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
>   unsigned long flags;
>   int retval;
>
> - get_online_cpus();

This change is probably fine in any case?

>   rcu_read_lock();
>
>   retval = -ESRCH;
> @@ -3827,12 +3825,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
>   goto out_unlock;
>
>   raw_spin_lock_irqsave(&p->pi_lock, flags);
> - cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
> + cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);

But I am just curious, is this change strictly needed?

Afaics we do not care if we race with set_cpu_online(true/false).

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-11 Thread George Spelvin
> The problem is that the scheduler doesn't see that the current task has
> preemption disabled. It only looks at the priorities of the current
> task, and if it can preempt it, it will. It sets NEED_RESCHED on the
> current task and waits for preemption to schedule it out.

Ah, got it.  It's a priority inversion problem, and while there is a
standard solution (priority inheritance), if I were to suggest such a
heavyweight solution, Linus would start yelling again and Sarah would
get unhappy, and we all know how that ends up.

Thank you.  I will now cease displaying my ignorance in public.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-11 Thread Steven Rostedt
On 11 Oct 2013 08:14:57 -0400
"George Spelvin"  wrote:

> > There's places in the kernel that does for_each_cpu() that I'm sure you
> > don't want to disable preemption for. Especially when you start having
> > 4096 CPU machines!
> 
> Er... why not?
> 
> Seriously.  If I have 4096 processors, and preemption is disabled on
> *one* of them for a long time, can't an urgent task just find a different
> processor to preempt?

The problem is that the scheduler doesn't see that the current task has
preemption disabled. It only looks at the priorities of the current
task, and if it can preempt it, it will. It sets NEED_RESCHED on the
current task and waits for preemption to schedule it out.

If an RT task is woken on a CPU that's running a SCHED_OTHER task
that's doing this loop, it will have to wait for the loop to finish
before it can schedule in. After doing the wake-up, the scheduling
algorithm is happy that it got an RT task on a CPU quickly, when in
reality it has to wait a long time till preemption is enabled again.
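
A sketch of the pattern being described, assuming the thread's premise
that preempt_disable() excludes hotplug; process_cpu() is a hypothetical
per-cpu body:

#include <linux/cpumask.h>
#include <linux/preempt.h>

static void process_cpu(int cpu)
{
	/* hypothetical per-cpu work */
}

static void long_walk(void)
{
	int cpu;

	/*
	 * While we are pinned here the scheduler can only set
	 * NEED_RESCHED; a woken RT task placed on this CPU must
	 * wait out the entire walk.
	 */
	preempt_disable();
	for_each_online_cpu(cpu)
		process_cpu(cpu);
	preempt_enable();	/* earliest point the RT task can run */
}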

-- Steve



> 
> This seems like a non-problem.  Or am I misunderstanding something about
> processor affinity?



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-11 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 11:49:15AM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2013 at 11:34 AM, Peter Zijlstra  wrote:
> >
> > But my point is that even though there aren't many of these today; with
> > the growing number of cpus in 'commodity' hardware you want to move away
> > from using preempt_disable() as hotplug lock.
> 
> Umm.
> 
> Wasn't this pretty much the argument for the COMPLETELY F*CKED UP
> change to make the vma locking use a semaphore?

Not entirely; but fair enough, I did screw up there.

> The whole "it's more convenient to use sleeping locks" argument is
> PURE AND UTTER SHIT when it comes to really core code. We are *much*
> better off saying "this is core, we care so deeply about performance
> that you had better use your brain before you do this".
> 
> Seriously. Your argument is bad, but more importantly, it is
> *dangerously* bad. It's crap that results in bad code: and the bad
> code is almost impossible to fix up later, because once you encourage
> people to do the easy thing, they'll do so.

Right, I fell face first into this trap. The existence of
{get,put}_online_cpus() made me stop thinking and use it.

As a penance I'll start by removing all get_online_cpus() usage from the
scheduler.

---
Subject: sched: Remove get_online_cpus() usage

Remove get_online_cpus() usage from the scheduler; there's 4 sites that
use it:

 - sched_init_smp(); where it's completely superfluous since we're in
   'early' boot and there simply cannot be any hotplugging.

 - sched_getaffinity(); we already take a raw spinlock to protect the
   task cpus_allowed mask, this disables preemption and therefore
   also stabilizes cpu_online_mask as that's modified using
   stop_machine. However switch to active mask for symmetry with
   sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
   mask stability by inserting sync_rcu/sched() into _cpu_down.

 - sched_setaffinity(); we don't appear to need get_online_cpus()
   either, there's two sites where hotplug appears relevant:
* cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
  for the cpuset case we hold task_lock, which is a spinlock and
  thus for mainline disables preemption (might cause pain on RT).
* set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
  preemption properly disabled; also it already deals with hotplug
  races explicitly where it releases them.

 - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
   us with a little trickery. By adding a sync_sched/rcu() after the
   CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
   cpu_active_mask. Use these to validate that both our cpus are active
   when queueing the stop work before we queue the stop_machine works
   for take_cpu_down().

Signed-off-by: Peter Zijlstra 
---
 kernel/cpu.c  | 17 +
 kernel/sched/core.c   | 20 ++--
 kernel/stop_machine.c | 26 +-
 3 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d7f07a2da5a6..63aa50d7ce1e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -308,6 +308,23 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);
 
+	/*
+	 * By now we've cleared cpu_active_mask, wait for all preempt-disabled
+	 * and RCU users of this state to go away such that all new such users
+	 * will observe it.
+	 *
+	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
+	 * not imply sync_sched(), so explicitly call both.
+	 */
+#ifdef CONFIG_PREEMPT
+	synchronize_sched();
+#endif
+	synchronize_rcu();
+
+	/*
+	 * So now all preempt/rcu users must observe !cpu_active().
+	 */
+
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c3feebcf112..498a5e5a53f5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1081,8 +1081,6 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
struct migration_swap_arg arg;
int ret = -EINVAL;
 
-   get_online_cpus();
-
arg = (struct migration_swap_arg){
.src_task = cur,
.src_cpu = task_cpu(cur),
@@ -1093,6 +1091,10 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
if (arg.src_cpu == arg.dst_cpu)
goto out;
 
+	/*
+	 * These three tests are all lockless; this is OK since all of them
+	 * will be re-checked with proper locks held further down the line.
+	 */
if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
goto out;
 
@@ -1105,7 +1107,6 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
ret = stop_two_cpus(arg.dst_cpu, 

Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-11 Thread George Spelvin
> There's places in the kernel that does for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

Er... why not?

Seriously.  If I have 4096 processors, and preemption is disabled on
*one* of them for a long time, can't an urgent task just find a different
processor to preempt?

This seems like a non-problem.  Or am I misunderstanding something about
processor affinity?


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-11 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt  wrote:
> >
> > I'm wondering if we can have a for_each_cpu() that only disables 
> > preemption in the loop.
> 
> I think we'd generally want to have it be something the loop asks for.
> 
> If the loop is just some kind of "gather statistics" thing, I don't 
> think it's required. The cost per loop is so low (usually adding up a 
> couple of words) that the downside drowns the upside.
> 
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and not 
> do it for common small values (although it looks like Fedora defaults to 
> 128 CPU's for their distro kernels, which seems a bit excessive - too 
> many by far for normal people, too few for the crazy big ones).

Ubuntu has it at 256, so I guess Fedora is even a bit conservative ;-)

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 12:16:16 -0700
Linus Torvalds  wrote:

> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt  wrote:
> >
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
> 
> I think we'd generally want to have it be something the loop asks for.

Yeah, I was thinking of adding a new macro, not updating for_each_cpu()
itself, so that only the locations that would require it would use it.

> 
> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
> 
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
> not do it for common small values (although it looks like Fedora
> defaults to 128 CPU's for their distro kernels, which seems a bit
> excessive - too many by far for normal people, too few for the crazy
> big ones).

We could always make it a boot time option, and use alternates to
change the code as we do with spin_locks() and other code.

-- Steve (the lover of self modifying code)
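
A sketch in that spirit, using static keys (the kernel's generic
self-modifying-code mechanism) rather than raw alternatives; the key,
helper, and boot-time switch are hypothetical:

#include <linux/jump_label.h>
#include <linux/sched.h>

/* flipped once at boot, e.g. from an __init based on a boot param */
static struct static_key cpu_walk_relax = STATIC_KEY_INIT_FALSE;

static inline void cpu_iter_relax(void)
{
	/* compiles to a patched NOP/jump, so the off case costs ~nothing */
	if (static_key_false(&cpu_walk_relax))
		cond_resched();
}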



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 12:16:16PM -0700, Linus Torvalds wrote:
> On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt  wrote:
> >
> > I'm wondering if we can have a for_each_cpu() that only disables
> > preemption in the loop.
> 
> I think we'd generally want to have it be something the loop asks for.
> 
> If the loop is just some kind of "gather statistics" thing, I don't
> think it's required. The cost per loop is so low (usually adding up a
> couple of words) that the downside drowns the upside.
> 
> And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
> not do it for common small values (although it looks like Fedora
> defaults to 128 CPU's for their distro kernels, which seems a bit
> excessive - too many by far for normal people, too few for the crazy
> big ones).

Alternatively we could write it something like:

	rcu_read_lock();
	for_each_online_node(node) {
		for_each_cpu(cpu, cpumask_of_node(node)) {
			...
		}
		cond_resched_rcu();
	}
	rcu_read_unlock();

But yes, I think this pattern (and variations) should work for most
cases.

The one I'm struggling with atm is kernel/cpuset.c:
rebuild_sched_domains_locked(). Although I'm thinking we should do that
seqlock style, just build it and verify if it's still valid, if not, try
again -- although I'm sure it'll be 'fun' to get correct.
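
A rough sketch of that retry shape, assuming a hypothetical hotplug_seq
seqcount bumped by the hotplug path, with placeholder snapshot/build
helpers:

#include <linux/seqlock.h>

static seqcount_t hotplug_seq;	/* hypothetical: written by the hotplug path */

static void snapshot_masks(void)
{
	/* placeholder: copy cpu_active_mask etc. to a local snapshot */
}

static void build_domains(void)
{
	/* placeholder: expensive build from the snapshot */
}

static void rebuild_domains_retry(void)
{
	unsigned int seq;

	do {
		seq = read_seqcount_begin(&hotplug_seq);
		snapshot_masks();
		build_domains();
	} while (read_seqcount_retry(&hotplug_seq, seq));
}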


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt  wrote:
>
> I'm wondering if we can have a for_each_cpu() that only disables
> preemption in the loop.

I think we'd generally want to have it be something the loop asks for.

If the loop is just some kind of "gather statistics" thing, I don't
think it's required. The cost per loop is so low (usually adding up a
couple of words) that the downside drowns the upside.

And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
not do it for common small values (although it looks like Fedora
defaults to 128 CPU's for their distro kernels, which seems a bit
excessive - too many by far for normal people, too few for the crazy
big ones).

Linus


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Paul E. McKenney
On Thu, Oct 10, 2013 at 08:50:32PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
> > On Thu, 10 Oct 2013 11:10:35 -0700
> > Linus Torvalds  wrote:
> > > .. now we can free all the percpu data and kill the CPU ..
> > > 
> > > without any locking anywhere - not stop-machine, not anything. If
> > > somebody is doing a "for_each_cpu()" (under just a regular
> > > rcu_read_lock()) and they see the bit set while it's going down, who
> > > cares? The CPU is still there, the data is accessible..
> > 
> > The problem is that rcu_read_lock() requires preemption disabled unless
> > you are using the preemptable rcu tree version. There's always
> > srcu_read_lock() but that's not so free. It's basically the same as
> > what Peter is doing.
> 
> No, srcu is actually more expensive in the fast path. Although possibly
> we could make SRCU more complex ;-)

Indeed.  Especially if we wanted to get rid of the memory barriers in
srcu_read_lock() and srcu_read_unlock() and still allow SRCU to be used
from idle and offline CPUs!  ;-)

Thanx, Paul

> > There's places in the kernel that does for_each_cpu() that I'm sure you
> > don't want to disable preemption for. Especially when you start having
> > 4096 CPU machines!
> 
> :-)

But shouldn't you only have to disable over each separate pass through
the loop rather than across the entire loop?  "I guarantee that the CPU I
just handed you will remain online through the body of the loop" rather
than "I guarantee that no CPU is going anywhere through the entire loop."

Thanx, Paul
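
A sketch of that per-pass guarantee, assuming the sync_sched()-in-
_cpu_down() scheme from Peter's patch so that a preempt-disabled online
check pins the CPU; do_cpu() is a placeholder:

#include <linux/cpumask.h>
#include <linux/preempt.h>

static void do_cpu(int cpu)
{
	/* placeholder per-cpu body */
}

static void per_pass_walk(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		preempt_disable();
		if (cpu_online(cpu))
			do_cpu(cpu);	/* CPU pinned for this pass only */
		preempt_enable();	/* hotplug may progress between passes */
	}
}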



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Srivatsa S. Bhat
On 10/10/2013 10:24 PM, Oleg Nesterov wrote:
> On 10/10, Andrew Morton wrote:
>>
>> On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov  wrote:
>>
>>> On 10/10, Ingo Molnar wrote:

>>>> So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
>>>> and tie CPU hotplug to freezing all tasks in the system?
>>>>
>>>> Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
>>>> system, I don't understand how we tolerate _any_ overhead from this utter
>>>> slowpath.
>>>
>>> Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
>>> quite often to save the power.
>>
>> cpu hotremove already uses stop_machine,
> 
> And Srivatsa wants to remove it from cpu_down().
> 

Yes, I have worked on several designs to remove stop_machine() from the cpu_down
path.

http://lwn.net/Articles/538819/
http://lwn.net/Articles/556727/

>> so such an approach shouldn't
>> actually worsen things (a lot) for them?
> 
> this depends on what this "freezing all tasks" actually means.
> I understood it as try_to_freeze_tasks/etc, looks too heavy...
> 
> But my only point was, I am not sure we can assume that cpu_down/up
> is extremely rare and its cost does not matter.
> 
>> use stop_machine() on the add/remove
>> (ie, "writer") side and nothing at all on the "reader" side.  Is there
>> anything which fundamentally prevents cpu hotplug from doing the same?
> 

Its certainly possible to remove stop_machine() from CPU hotplug, as I've
demonstrated in the patchsets mentioned above. And there were pretty good
performance improvements too, arising out of that change, as described here:

http://article.gmane.org/gmane.linux.ports.ppc.embedded/56122

Regards,
Srivatsa S. Bhat





Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 11:49:15 -0700
Linus Torvalds  wrote:
 
> Oh, and I'm sure there are several users that currently depend on
> being able to sleep over get_online_cpu's.  But I'm pretty sure it's
> "several", not "hundreds", and I think we could fix them up.

I'm wondering if we can have a for_each_cpu() that only disables
preemption in the loop. That is, each iteration will enable preemption,
but the loop itself will guarantee that the current cpu to process
still exists.

	rcu_read_lock();
	for_each_cpu(cpu, mask) {
		rcu_read_unlock();
		rcu_read_lock();
		if (!cpu_online(cpu))
			continue;
		[...]
	}
	rcu_read_unlock();

That way expensive loops won't stop the current CPU to process all
online CPUs.

Of course, it will miss a CPU that comes online. But it would still
need to handle the case of a CPU coming online after the final
put_online_cpus(), so I'm not sure that's a problem.

-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 11:43 AM, Steven Rostedt  wrote:
>
> There's places in the kernel that does for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

We could _easily_ add preemption points in for_each_cpu() for the
MAX_SMP case. We can even do it in the macro itself.

So I wouldn't worry about things like that. I'd expect the "Oh damn,
we have to allocate" case to be more painful, but I think that is
fairly easily handled by having something like a "redo_cpu()" macro
that you can call inside for_each_cpu(). So the pattern for the
(hopefully-not-all-that-common) thing would be something like

   sometype *prealloc = NULL;

   for_each_cpu() {
      if (!prealloc) {
         rcu_read_unlock();  // Or whatever we want to call it
         prealloc = kmalloc(..);
         rcu_read_lock();
         redo_cpu();
      }
   }
   kfree(prealloc);

which is certainly more work than a sleeping lock is, but this looks
like an initialization pattern.

Looking at the things that are actually performance-critical, they
tend to be things like "gather all the statistics for all CPUs".

Honesty in advertising: I only grepped a couple of users of
get_online_cpus(), and they all seemed like they'd be perfectly happy
with preemption-off. But maybe I was just lucky in my few samples: I
have anecdotes, not data.

  Linus


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Srivatsa S. Bhat
On 10/10/2013 08:56 PM, Oleg Nesterov wrote:
> On 10/10, Ingo Molnar wrote:
>>
>> * Peter Zijlstra  wrote:
>>
>>> But the thing is; our sense of NR_CPUS has shifted, where it used to be
>>> ok to do something like:
>>>
>>>   for_each_cpu()
>>>
>>> With preemption disabled; it gets to be less and less sane to do so,
>>> simply because 'common' hardware has 256+ CPUs these days. If we cannot
>>> rely on preempt disable to exclude hotplug, we must use
>>> get_online_cpus(), but get_online_cpus() is global state and thus cannot
>>> be used at any sort of frequency.
>>
>> So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
>> and tie CPU hotplug to freezing all tasks in the system?
>>
>> Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
>> system, I don't understand how we tolerate _any_ overhead from this utter
>> slowpath.
> 
> Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
> quite often to save the power.
> 

Yes, I've heard of such systems and so I might have brought them up during
discussions about CPU hotplug. But unfortunately, I have been misquoted quite
often, leading to the wrong impression that I have such a usecase or that I
recommend/support using CPU hotplug for power management. So let me clarify
that part, while I have the chance. (And I don't blame anyone for that. I
work on power-management related areas, and I've worked on improving/optimizing
CPU hotplug; so its pretty natural to make a connection between the two
and assume that I tried to optimize CPU hotplug keeping power management
in mind. But that's not the case, as I explain below.)

I started out trying to make suspend/resume more reliable, scalable and fast.
And suspend/resume uses CPU hotplug underneath and that's a pretty valid 
usecase.
So with that, I started looking at CPU hotplug and soon realized the mess it
had become. So I started working on cleaning up that mess, like rethinking the
whole notifier scheme[1], and removing the ridiculous stop_machine() from the
cpu_down path[2] etc. But the intention behind all this work was just to make
CPU hotplug cleaner/saner/bug-free and possibly speed up suspend/resume. IOW,
I didn't have any explicit intention to make it easier for people to use it
for power management, although I understood that some of this work might
help those poor souls who don't have any other choice, for whatever reason.
And fortunately, (IIUC) the number of systems/people relying on CPU hotplug for
power management has reduced quite a bit in the recent times, which is a very
good thing.

So, to reiterate, I totally agree that power-aware scheduler is the right way
to do CPU power management; CPU hotplug is simply not the tool to use for that.
No question about that. Also, system shutdown used to depend on CPU hotplug to
disable the non-boot CPUs, but we don't do that any more after commit cf7df378a,
which is a very welcome change. And in future if we can somehow do 
suspend/resume
without using CPU hotplug, that would be absolutely wonderful as well. (There
have been discussions in the past around this, but nobody has a solution yet).
The other valid usecases that I can think of, for using CPU hotplug, is for RAS
reasons and for DLPAR (Dynamic Logical Partitioning) operations on powerpc, both
of which are not performance-sensitive, AFAIK.


[1]. Reverse invocation of CPU hotplug notifiers
 http://lwn.net/Articles/508072/

[2]. Stop-machine()-free CPU hotplug
 http://lwn.net/Articles/538819/ (v6)
 http://lwn.net/Articles/556727/

Regards,
Srivatsa S. Bhat



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
> On Thu, 10 Oct 2013 11:10:35 -0700
> Linus Torvalds  wrote:
> > .. now we can free all the percpu data and kill the CPU ..
> > 
> > without any locking anywhere - not stop-machine, not anything. If
> > somebody is doing a "for_each_cpu()" (under just a regular
> > rcu_read_lock()) and they see the bit set while it's going down, who
> > cares? The CPU is still there, the data is accessible..
> 
> The problem is that rcu_read_lock() requires preemption disabled unless
> you are using the preemptable rcu tree version. There's always
> srcu_read_lock() but that's not so free. It's basically the same as
> what Peter is doing.

No, srcu is actually more expensive in the fast path. Although possibly
we could make SRCU more complex ;-)

> There's places in the kernel that does for_each_cpu() that I'm sure you
> don't want to disable preemption for. Especially when you start having
> 4096 CPU machines!

:-)


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 11:34 AM, Peter Zijlstra  wrote:
>
> But my point is that even though there aren't many of these today; with
> the growing number of cpus in 'commodity' hardware you want to move away
> from using preempt_disable() as hotplug lock.

Umm.

Wasn't this pretty much the argument for the COMPLETELY F*CKED UP
change to make the vma locking use a semaphore?

The whole "it's more convenient to use sleeping locks" argument is
PURE AND UTTER SHIT when it comes to really core code. We are *much*
better off saying "this is core, we care so deeply about performance
that you had better use your brain before you do this".

Seriously. Your argument is bad, but more importantly, it is
*dangerously* bad. It's crap that results in bad code: and the bad
code is almost impossible to fix up later, because once you encourage
people to do the easy thing, they'll do so.

Yes, preempt_disable() is harder to use than sleeping locks. You need
to do pre-allocation etc. But it is much *much* more efficient.

And in the kernel, we care. We have the resources. Plus, we can also
say "if you can't handle it, don't do it". We don't need new features
so badly that we are willing to screw up core code.

As to your kernel/events/core.c example, things like that are
generally pretty easy to fix. The good thing about preemption-disable
is:

 - it's pretty easily debuggable, because you do get big warnings of
"you screwed up exactly _here_"

 - it's so cheap that especially when it comes to things that can
sleep, it's perfectly ok to drop the preemption count, do the sleeping
thing, re-disable preemption and then check if we still needed the
data.
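
As a sketch, that drop/realloc/recheck dance could look like this
(my_data, alloc_data, free_data and install_data are made-up names,
not actual kernel interfaces):

	preempt_disable();
	while (!this_cpu_read(my_data)) {
		struct foo *p;

		preempt_enable();
		p = alloc_data();		/* may sleep, GFP_KERNEL */
		preempt_disable();
		if (this_cpu_read(my_data))
			free_data(p);		/* lost the race */
		else
			install_data(p);
	}
	/* ... use this_cpu_read(my_data) with preemption off ... */
	preempt_enable();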

Oh, and I'm sure there are several users that currently depend on
being able to sleep over get_online_cpus().  But I'm pretty sure it's
"several", not "hundreds", and I think we could fix them up.

 Linus


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 11:10:35AM -0700, Linus Torvalds wrote:
> You can't do that right now - since you have to get the cpu list. So
> it may not be with "preemption enabled", but it should always be under
> the locking provided by get_online_cpus().. That one allows sleeping,
> though.
> 
> I personally would *love* to make CPU hotplug be a lockless thing
> entirely. But I detest stop-machine too, because it has these really
> annoying properties.
> 
> So if we want to make it zero-cost to look at online CPU data, can we
> avoid even the stop-machine synchronization, instead saying that the
> cpu hotplug bitmap is updated completely locklessly, but if you see a
> bit set, the data associated with that CPU is guaranteed to still be
> available.
> 
> IOW, just use "RCU semantics" on a per-bit level. When we offline a CPU, we do
> 
>  clear_bit(cpu, cpu_online_mask);
>  rcu_synchronize();
>  .. now we can free all the percpu data and kill the CPU ..
> 
> without any locking anywhere - not stop-machine, not anything. If
> somebody is doing a "for_each_cpu()" (under just a regular
> rcu_read_lock()) and they see the bit set while it's going down, who
> cares? The CPU is still there, the data is accessible..
> 
> I'm sure there's some reason the above wouldn't work, but the above
> would seem to be pretty optimal. Why do we really force this big
> locking thing? The new patches make that locking _smarter_, but it's
> still a damn big lock. Could we possibly go _beyond_ the lock?

The only down-side to doing this is that you cannot actually allocate
memory under rcu_read_lock() because it might not allow preemption.

That said; I like the idea. I'll go try and audit the get_online_cpus()
sites to see if there's any that really need full exclusion.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 11:10:35 -0700
Linus Torvalds  wrote:
>  .. now we can free all the percpu data and kill the CPU ..
> 
> without any locking anywhere - not stop-machine, not anything. If
> somebody is doing a "for_each_cpu()" (under just a regular
> rcu_read_lock()) and they see the bit set while it's going down, who
> cares? The CPU is still there, the data is accessible..

The problem is that rcu_read_lock() requires preemption disabled unless
you are using the preemptable rcu tree version. There's always
srcu_read_lock() but that's not so free. It's basically the same as
what Peter is doing.

There are places in the kernel that do for_each_cpu() that I'm sure you
don't want to disable preemption for. Especially when you start having
4096 CPU machines!
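
For comparison, the SRCU read side looks something like this (a sketch;
cpuhp_srcu is a made-up domain) -- it allows sleeping, but the
lock/unlock pair is heavier than a preempt count twiddle:

	DEFINE_STATIC_SRCU(cpuhp_srcu);

	int idx = srcu_read_lock(&cpuhp_srcu);
	/* may sleep here; the writer's synchronize_srcu() waits for us */
	srcu_read_unlock(&cpuhp_srcu, idx);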

-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 10:13:53AM -0700, Paul E. McKenney wrote:
> That does add quite a bit of latency to the hotplug operations, which
> IIRC slows down things like boot, suspend, and resume.

Guess what suspend does before it unplugs all these cpus?


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 10:48:56AM -0700, Andrew Morton wrote:
> > > Very much agreed; now stop_machine() wouldn't actually work for hotplug
> > > because it will instantly preempt everybody, including someone who might
> > > be in the middle of using per-cpu state of the cpu we're about to
> > > remove.
> > 
> > Well, stop machine doesn't instantly preempt everybody. Only those that
> > don't have preemption disabled. Using per_cpu without preemption
> > disabled can be dangerous.

Can be, but there's more of that around than you want. Also that's not
the entire issue.

> Yes, I'd have thought that the cases where a CPU is fiddling with
> another CPU's percpu data with preemption enabled would be rather rare.

Look at kernel/events/core.c:swevent_hlist_get(); it does something like:

  get_online_cpus();
  for_each_online_cpu(cpu) {
          allocate_hash_table();
  }
  put_online_cpus();

Surely you don't want to do that with preemption disabled?

But my point is that even though there aren't many of these today; with
the growing number of cpus in 'commodity' hardware you want to move away
from using preempt_disable() as hotplug lock.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 10:48 AM, Andrew Morton
 wrote:
>
> Yes, I'd have thought that the cases where a CPU is fiddling with
> another CPU's percpu data with preemption enabled would be rather rare.
>
> I can't actually think of any off the top.  Are there examples we can
> look at?

You can't do that right now - since you have to get the cpu list. So
it may not be with "preemption enabled", but it should always be under
the locking provided by get_online_cpus().. That one allows sleeping,
though.

I personally would *love* to make CPU hotplug be a lockless thing
entirely. But I detest stop-machine too, because it has these really
annoying properties.

So if we want to make it zero-cost to look at online CPU data, can we
avoid even the stop-machine synchronization, instead saying that the
cpu hotplug bitmap is updated completely locklessly, but if you see a
bit set, the data associated with that CPU is guaranteed to still be
available.

IOW, just use "RCU semantics" on a per-bit level. When we offline a CPU, we do

 clear_bit(cpu, cpu_online_mask);
 rcu_synchronize();
 .. now we can free all the percpu data and kill the CPU ..

without any locking anywhere - not stop-machine, not anything. If
somebody is doing a "for_each_cpu()" (under just a regular
rcu_read_lock()) and they see the bit set while it's going down, who
cares? The CPU is still there, the data is accessible..

I'm sure there's some reason the above wouldn't work, but the above
would seem to be pretty optimal. Why do we really force this big
locking thing? The new patches make that locking _smarter_, but it's
still a damn big lock. Could we possibly go _beyond_ the lock?

 Linus
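
[ For concreteness, a sketch of the scheme described above --
  synchronize_rcu() is the in-tree spelling of "rcu_synchronize()", and
  free_cpu_state()/use_cpu_state() are made-up placeholders: ]

	/* offline side */
	set_cpu_online(cpu, false);	/* clear the bit */
	synchronize_rcu();		/* wait out all current readers */
	free_cpu_state(cpu);		/* now safe to free per-cpu data */

	/* reader side: nothing but RCU */
	rcu_read_lock();
	for_each_online_cpu(cpu)
		use_cpu_state(cpu);	/* bit may be stale; data still valid */
	rcu_read_unlock();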


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 13:13:05 -0400 Steven Rostedt  wrote:

> > 
> > > > Why prevent all CPUs from running when we want to remove
> > > > one?
> > > 
> > > So get_online_cpus() goes away.  Nothing is more scalable than nothing!
> > 
> > Very much agreed; now stop_machine() wouldn't actually work for hotplug
> > because it will instantly preempt everybody, including someone who might
> > be in the middle of using per-cpu state of the cpu we're about to
> > remove.
> 
> Well, stop machine doesn't instantly preempt everybody. Only those that
> don't have preemption disabled. Using per_cpu without preemption
> disabled can be dangerous.

Yes, I'd have thought that the cases where a CPU is fiddling with
another CPU's percpu data with preemption enabled would be rather rare.

I can't actually think of any off the top.  Are there examples we can
look at?



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Peter Zijlstra wrote:
>
> The freeze suggestion from Ingo would actually work because we freeze
> tasks at known good points

Not really known/good; we have more and more freezable_schedule() calls.
But probably this is fine, nobody should do this under get_online_cpus().

> (userspace and kthread_freeze() points) where
> we know they're not fiddling with per-cpu state.

We should also take care of PF_NOFREEZE kthreads.

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Paul E. McKenney
On Thu, Oct 10, 2013 at 06:52:29PM +0200, Ingo Molnar wrote:
> 
> * Andrew Morton  wrote:
> 
> > On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov  wrote:
> > 
> > > On 10/10, Ingo Molnar wrote:
> > > >
> > > > * Peter Zijlstra  wrote:
> > > >
> > > > > But the thing is; our sense of NR_CPUS has shifted, where it
> > > > > used to be ok to do something like:
> > > > >
> > > > >   for_each_cpu()
> > > > >
> > > > > With preemption disabled; it gets to be less and less sane to do 
> > > > > so, simply because 'common' hardware has 256+ CPUs these days. If 
> > > > > we cannot rely on preempt disable to exclude hotplug, we must use 
> > > > > get_online_cpus(), but get_online_cpus() is global state and thus 
> > > > > cannot be used at any sort of frequency.
> > > >
> > > > So ... why not make it _really_ cheap, i.e. the read lock costing 
> > > > nothing, and tie CPU hotplug to freezing all tasks in the system?
> > > >
> > > > Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a 
> > > > system, I don't understand how we tolerate _any_ overhead from this 
> > > > utter slowpath.
> > > 
> > > Well, iirc Srivatsa (cc'ed) pointed out that some systems do 
> > > cpu_down/up quite often to save the power.
> > 
> > cpu hotremove already uses stop_machine, so such an approach shouldn't 
> > actually worsen things (a lot) for them?
> 
> Also, using CPU hotremove to save power, instead of implementing proper 
> power aware scheduling, is very broken to begin with.

In many cases, agreed.  Particularly on devices where there is an easy
way to send out regular updates and bug fixes, and especially if they
are connected to AC power.

But if I was in charge of creating a battery-powered multicore embedded
device that was to ship in million-unit quantities, especially in the
not-uncommon case where more than one CPU is needed only for specific
actions, I would offline the extra CPUs except when I was sure they were
needed.  This would of course still rely on power-aware scheduling when
more than one CPU was in use.  The power-aware scheduling conserves energy
for short-term variations (in which case offlining really is useless),
but longer term "keep this CPU -off-" is provided by offlining.

Yes, the Linux kernel code quality is quite good, but bugs still happen,
and bugs can easily cause CPUs to power up unnecessarily.  This doesn't
have to happen very often to kill your battery.  If the extra CPUs are
offline, a large class of those bugs is rendered harmless.  If you are
going to have a very large number of difficult-to-update battery-powered
devices in the field, defending against such bugs is worthwhile.

Thanx, Paul



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Paul E. McKenney  wrote:

> On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra  wrote:
> > 
> > > On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:
> > >
> > > > So ... why not make it _really_ cheap, i.e. the read lock costing 
> > > > nothing, and tie CPU hotplug to freezing all tasks in the system?
> > > 
> > > Such that we freeze regular tasks in userspace and kernel tasks in 
> > > their special freezer callback so as to guarantee minimal state?
> > 
> > Yes, absolutely.
> 
> That does add quite a bit of latency to the hotplug operations, which 
> IIRC slows down things like boot, suspend, and resume.

I think it should _speed up_ suspend: instead of bringing down each CPU 
one by one, they could just be all stopped.

Same for bootup - it's not really a hotplug operation in the traditional 
sense, as all CPUs are 'present' at once, and they could be added in one 
group operation instead of serializing the bootup.

Also, most CPU bootup sequences are serialized.

It would slow down individual CPU hotplug/hotunplug - but that really is
a rare operation outside of broken power-saving schemes ...

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Paul E. McKenney
On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra  wrote:
> 
> > On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:
> >
> > > So ... why not make it _really_ cheap, i.e. the read lock costing 
> > > nothing, and tie CPU hotplug to freezing all tasks in the system?
> > 
> > Such that we freeze regular tasks in userspace and kernel tasks in their 
> > special freezer callback so as to guarantee minimal state?
> 
> Yes, absolutely.

That does add quite a bit of latency to the hotplug operations, which
IIRC slows down things like boot, suspend, and resume.

Thanx, Paul



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 18:53:37 +0200
Peter Zijlstra  wrote:

> On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote:
> > > But we would like to remove stomp machine from
> > > CPU hotplug.
> > 
> > We do?  That's news.  It wasn't mentioned in the changelog and should
> > have been.  Why?  
> 
> It would be an unrelated change to this and unrelated to the reasons as
> to why I want a faster get_online_cpus().

Yeah, sorry for the confusion. My comment wasn't really about this
change set but about stop machine and hotplug in general. Needing stop
machine for hotplug has been a complaint by many, but off topic for
this particular change set.

> 
> > > Why prevent all CPUs from running when we want to remove
> > > one?
> > 
> > So get_online_cpus() goes away.  Nothing is more scalable than nothing!
> 
> Very much agreed; now stop_machine() wouldn't actually work for hotplug
> because it will instantly preempt everybody, including someone who might
> be in the middle of using per-cpu state of the cpu we're about to
> remove.

Well, stop machine doesn't instantly preempt everybody. Only those that
don't have preemption disabled. Using per_cpu without preemption
disabled can be dangerous. The exception is the migrate-disable we want
to add for -rt: with that, tasks may use per_cpu data with preemption
enabled, and then we can't rely on stop machine to guarantee per_cpu
data isn't being used. The same goes for threads that use per_cpu and
are bound to a CPU -- they too need to be taken care of when their CPU
goes offline.

But this would also require get_online_cpus() to disable preemption as
well. Not quite the "nothing" we are looking for.


-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Andrew Morton wrote:
>
> On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov  wrote:
>
> > On 10/10, Ingo Molnar wrote:
> > >
> > > So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
> > > and tie CPU hotplug to freezing all tasks in the system?
> > >
> > > Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a
> > > system, I don't understand how we tolerate _any_ overhead from this utter
> > > slowpath.
> >
> > Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
> > quite often to save the power.
>
> cpu hotremove already uses stop_machine,

And Srivatsa wants to remove it from cpu_down().

> so such an approach shouldn't
> actually worsen things (a lot) for them?

this depends on what this "freezing all tasks" actually means.
I understood it as try_to_freeze_tasks/etc, which looks too heavy...

But my only point was, I am not sure we can assume that cpu_down/up
is extremely rare and its cost does not matter.

> use stop_machine() on the add/remove
> (ie, "writer") side and nothing at all on the "reader" side.  Is there
> anything which fundamentally prevents cpu hotplug from doing the same?

Well, then we actually need to park all tasks in the system, I guess.
IOW, freezer.

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote:
> > But we would like to remove stomp machine from
> > CPU hotplug.
> 
> We do?  That's news.  It wasn't mentioned in the changelog and should
> have been.  Why?  

It would be an unrelated change to this and unrelated to the reasons as
to why I want a faster get_online_cpus().

> > Why prevent all CPUs from running when we want to remove
> > one?
> 
> So get_online_cpus() goes away.  Nothing is more scalable than nothing!

Very much agreed; now stop_machine() wouldn't actually work for hotplug
because it will instantly preempt everybody, including someone who might
be in the middle of using per-cpu state of the cpu we're about to
remove.

The freeze suggestion from Ingo would actually work because we freeze
tasks at known good points (userspace and kthread_freeze() points) where
we know they're not fiddling with per-cpu state.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton  wrote:

> On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov  wrote:
> 
> > On 10/10, Ingo Molnar wrote:
> > >
> > > * Peter Zijlstra  wrote:
> > >
> > > > But the thing is; our sense of NR_CPUS has shifted, where it used to be
> > > > ok to do something like:
> > > >
> > > >   for_each_cpu()
> > > >
> > > > With preemption disabled; it gets to be less and less sane to do 
> > > > so, simply because 'common' hardware has 256+ CPUs these days. If 
> > > > we cannot rely on preempt disable to exclude hotplug, we must use 
> > > > get_online_cpus(), but get_online_cpus() is global state and thus 
> > > > cannot be used at any sort of frequency.
> > >
> > > So ... why not make it _really_ cheap, i.e. the read lock costing 
> > > nothing, and tie CPU hotplug to freezing all tasks in the system?
> > >
> > > Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a 
> > > system, I don't understand how we tolerate _any_ overhead from this 
> > > utter slowpath.
> > 
> > Well, iirc Srivatsa (cc'ed) pointed out that some systems do 
> > cpu_down/up quite often to save the power.
> 
> cpu hotremove already uses stop_machine, so such an approach shouldn't 
> actually worsen things (a lot) for them?

Also, using CPU hotremove to save power, instead of implementing proper 
power aware scheduling, is very broken to begin with.

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:
>
> > So ... why not make it _really_ cheap, i.e. the read lock costing 
> > nothing, and tie CPU hotplug to freezing all tasks in the system?
> 
> Such that we freeze regular tasks in userspace and kernel tasks in their 
> special freezer callback so as to guarantee minimal state?

Yes, absolutely.

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 12:36:31 -0400 Steven Rostedt  wrote:

> On Thu, 10 Oct 2013 09:00:44 -0700
> Andrew Morton  wrote:
> 
> > It's been ages since I looked at this stuff :( Although it isn't used
> > much, memory hotplug manages to use stop_machine() on the add/remove
> > (ie, "writer") side and nothing at all on the "reader" side.  Is there
> > anything which fundamentally prevents cpu hotplug from doing the same?
> 
> 
> I would think that memory hotplug may require stop machine as all CPUs
> may touch that memory.

Sure.

> But we would like to remove stomp machine from
> CPU hotplug.

We do?  That's news.  It wasn't mentioned in the changelog and should
have been.  Why?  

> Why prevent all CPUs from running when we want to remove
> one?

So get_online_cpus() goes away.  Nothing is more scalable than nothing!


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 09:00:44 -0700
Andrew Morton  wrote:

> It's been ages since I looked at this stuff :( Although it isn't used
> much, memory hotplug manages to use stop_machine() on the add/remove
> (ie, "writer") side and nothing at all on the "reader" side.  Is there
> anything which fundamentally prevents cpu hotplug from doing the same?


I would think that memory hotplug may require stop machine as all CPUs
may touch that memory. But we would like to remove stomp machine from
CPU hotplug. Why prevent all CPUs from running when we want to remove
one?

-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov  wrote:

> On 10/10, Ingo Molnar wrote:
> >
> > * Peter Zijlstra  wrote:
> >
> > > But the thing is; our sense of NR_CPUS has shifted, where it used to be
> > > ok to do something like:
> > >
> > >   for_each_cpu()
> > >
> > > With preemption disabled; it gets to be less and less sane to do so,
> > > simply because 'common' hardware has 256+ CPUs these days. If we cannot
> > > rely on preempt disable to exclude hotplug, we must use
> > > get_online_cpus(), but get_online_cpus() is global state and thus cannot
> > > be used at any sort of frequency.
> >
> > So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
> > and tie CPU hotplug to freezing all tasks in the system?
> >
> > Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a
> > system, I don't understand how we tolerate _any_ overhead from this utter
> > slowpath.
> 
> Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
> quite often to save the power.

cpu hotremove already uses stop_machine, so such an approach shouldn't
actually worsen things (a lot) for them?

It's been ages since I looked at this stuff :( Although it isn't used
much, memory hotplug manages to use stop_machine() on the add/remove
(ie, "writer") side and nothing at all on the "reader" side.  Is there
anything which fundamentally prevents cpu hotplug from doing the same?



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Peter Zijlstra wrote:
>
> That said, Oleg wants to use the same scheme for percpu_rwsem,

Yes, and later (I hope) get_online_cpus() will be just
current->cpuhp_ref++ || percpu_down_read().

(just in case, we only need ->cpuhp_ref to ensure that the readers
 can't starve the writers as is currently possible. IOW, to block the
 new readers but ensure that the recursive get/down_read can't block)
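
Roughly, as a sketch (cpuhp_ref is the proposed per-task counter and
cpu_hotplug_sem a hypothetical percpu_rw_semaphore; neither exists yet):

	void get_online_cpus(void)
	{
		if (current->cpuhp_ref++)	/* recursive reader */
			return;
		percpu_down_read(&cpu_hotplug_sem);
	}

	void put_online_cpus(void)
	{
		if (--current->cpuhp_ref)
			return;
		percpu_up_read(&cpu_hotplug_sem);
	}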

And please look at sb_wait_write/sb_start_write. It uses the same
logic except:

- it doesn't have the "true" fast-path for readers

- its "slow" mode is slower than necessary

- it is buggy (afaics)

> and I'm
> not sure his write-side (something in uprobes) wants to do the same.

surely not ;)

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Ingo Molnar wrote:
>
> * Peter Zijlstra  wrote:
>
> > But the thing is; our sense of NR_CPUS has shifted, where it used to be
> > ok to do something like:
> >
> >   for_each_cpu()
> >
> > With preemption disabled; it gets to be less and less sane to do so,
> > simply because 'common' hardware has 256+ CPUs these days. If we cannot
> > rely on preempt disable to exclude hotplug, we must use
> > get_online_cpus(), but get_online_cpus() is global state and thus cannot
> > be used at any sort of frequency.
>
> So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
> and tie CPU hotplug to freezing all tasks in the system?
>
> Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a
> system, I don't understand how we tolerate _any_ overhead from this utter
> slowpath.

Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
quite often to save the power.

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:
> So ... why not make it _really_ cheap, i.e. the read lock costing nothing, 
> and tie CPU hotplug to freezing all tasks in the system?

Such that we freeze regular tasks in userspace and kernel tasks in their
special freezer callback so as to guarantee minimal state?

If you can get people to agree with that I'm all for it.

That said, Oleg wants to use the same scheme for percpu_rwsem, and I'm
not sure his write-side (something in uprobes) wants to do the same.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> But the thing is; our sense of NR_CPUS has shifted, where it used to be 
> ok to do something like:
> 
>   for_each_cpu()
> 
> With preemption disabled; it gets to be less and less sane to do so, 
> simply because 'common' hardware has 256+ CPUs these days. If we cannot 
> rely on preempt disable to exclude hotplug, we must use 
> get_online_cpus(), but get_online_cpus() is global state and thus cannot 
> be used at any sort of frequency.

So ... why not make it _really_ cheap, i.e. the read lock costing nothing, 
and tie CPU hotplug to freezing all tasks in the system?

Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a 
system, I don't understand how we tolerate _any_ overhead from this utter 
slowpath.

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 10:50:06PM -0700, Andrew Morton wrote:
> On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra  
> wrote:
> 
> > The current cpu hotplug lock is a single global lock; therefore excluding
> > hotplug is a very expensive proposition even though it is a rare
> > occurrence under normal operation.
> > 
> > There is a desire for a more light weight implementation of
> > {get,put}_online_cpus() from both the NUMA scheduling as well as the
> > -RT side.
> > 
> > The current hotplug lock is a full reader preference lock -- and thus
> > supports reader recursion. However since we're making the read side lock
> > much cheaper it is the expectation that it will also be used far more.
> > Which in turn would lead to writer starvation.
> > 
> > Therefore the new lock proposed is completely fair; albeit somewhat
> > expensive on the write side. This in turn means that we need a per-task
> > nesting count to support reader recursion.
> 
> This is a lot of code and a lot of new complexity.  It needs some pretty
> convincing performance numbers to justify its inclusion, no?

And here I thought it was generally understood to be unwise to bash
global state in anything like a regular manner from every cpu.

The NUMA bits really ought to use get_online_cpus()/put_online_cpus() on
every balance pass, which is about once a second on every cpu.

RT -- which has some quite horrible hotplug hacks due to this --
basically takes get_online_cpus() for every spin_lock/spin_unlock in the
kernel.

But the thing is; our sense of NR_CPUS has shifted, where it used to be
ok to do something like:

  for_each_cpu()

With preemption disabled; it gets to be less and less sane to do so,
simply because 'common' hardware has 256+ CPUs these days. If we cannot
rely on preempt disable to exclude hotplug, we must use
get_online_cpus(), but get_online_cpus() is global state and thus cannot
be used at any sort of frequency.
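
The two patterns in question, as a sketch (do_something() is a
stand-in):

	/* excludes hotplug today, but holds off preemption for the walk */
	preempt_disable();
	for_each_online_cpu(cpu)
		do_something(cpu);
	preempt_enable();

	/* preemptible, but bangs on global state from every cpu */
	get_online_cpus();
	for_each_online_cpu(cpu)
		do_something(cpu);
	put_online_cpus();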




Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton  wrote:

> On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar  wrote:
> 
> > > > Should be fairly straightforward to test: the sys_sched_getaffinity() 
> > > > and sys_sched_setaffinity() syscalls both make use of 
> > > > get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities 
> > > > on N CPUs in parallel ought to demonstrate scalability improvements 
> > > > pretty nicely.
> > > 
> > > Well, an in-kernel microbenchmark which camps in a loop doing get/put 
> > > would measure this as well.
> > > 
> > > But neither approach answers the question "how useful is this patchset".
> > 
> > Even ignoring all the other reasons cited, sys_sched_getaffinity() / 
> > sys_sched_setaffinity() are prime time system calls, and as long as 
> > the patches are correct, speeding them up is worthwhile.
> 
> That I would not have guessed.  What's the use case for calling 
> get/set_affinity at high frequency?

I don't think high-freq usage is common (at all).

It could happen in AIM7-like workloads that start up a ton of binaries in 
parallel, which can easily create parallel sched_getaffinity() calls 
during process startup?

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar  wrote:

> > > Should be fairly straightforward to test: the sys_sched_getaffinity() 
> > > and sys_sched_setaffinity() syscalls both make use of 
> > > get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities 
> > > on N CPUs in parallel ought to demonstrate scalability improvements 
> > > pretty nicely.
> > 
> > Well, an in-kernel microbenchmark which camps in a loop doing get/put 
> > would measure this as well.
> > 
> > But neither approach answers the question "how useful is this patchset".
> 
> Even ignoring all the other reasons cited, sys_sched_getaffinity() / 
> sys_sched_setaffinity() are prime time system calls, and as long as the 
> patches are correct, speeding them up is worthwhile.

That I would not have guessed.  What's the use case for calling
get/set_affinity at high frequency?



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton  wrote:

> On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar  wrote:
> 
> > * Andrew Morton  wrote:
> > 
> > > On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra  
> > > wrote:
> > > 
> > > > The current cpu hotplug lock is a single global lock; therefore 
> > > > excluding hotplug is a very expensive proposition even though it is 
> > > > a rare occurrence under normal operation.
> > > > 
> > > > There is a desire for a more light weight implementation of 
> > > > {get,put}_online_cpus() from both the NUMA scheduling as well as the 
> > > > -RT side.
> > > > 
> > > > The current hotplug lock is a full reader preference lock -- and thus 
> > > > supports reader recursion. However since we're making the read side 
> > > > lock much cheaper it is the expectation that it will also be used far 
> > > > more. Which in turn would lead to writer starvation.
> > > > 
> > > > Therefore the new lock proposed is completely fair; albeit somewhat 
> > > > expensive on the write side. This in turn means that we need a 
> > > > per-task nesting count to support reader recursion.
> > > 
> > > This is a lot of code and a lot of new complexity.  It needs some pretty 
> > > convincing performance numbers to justify its inclusion, no?
> > 
> > Should be fairly straightforward to test: the sys_sched_getaffinity() 
> > and sys_sched_setaffinity() syscalls both make use of 
> > get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities 
> > on N CPUs in parallel ought to demonstrate scalability improvements 
> > pretty nicely.
> 
> Well, an in-kernel microbenchmark which camps in a loop doing get/put 
> would measure this as well.
> 
> But neither approach answers the question "how useful is this patchset".

Even ignoring all the other reasons cited, sys_sched_getaffinity() / 
sys_sched_setaffinity() are prime time system calls, and as long as the 
patches are correct, speeding them up is worthwhile.

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar  wrote:

> * Andrew Morton  wrote:
> 
> > On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra  
> > wrote:
> > 
> > > The current cpu hotplug lock is a single global lock; therefore 
> > > excluding hotplug is a very expensive proposition even though it is 
> > > a rare occurrence under normal operation.
> > > 
> > > There is a desire for a more light weight implementation of 
> > > {get,put}_online_cpus() from both the NUMA scheduling as well as the 
> > > -RT side.
> > > 
> > > The current hotplug lock is a full reader preference lock -- and thus 
> > > supports reader recursion. However since we're making the read side 
> > > lock much cheaper it is the expectation that it will also be used far 
> > > more. Which in turn would lead to writer starvation.
> > > 
> > > Therefore the new lock proposed is completely fair; albeit somewhat 
> > > expensive on the write side. This in turn means that we need a 
> > > per-task nesting count to support reader recursion.
> > 
> > This is a lot of code and a lot of new complexity.  It needs some pretty 
> > convincing performance numbers to justify its inclusion, no?
> 
> Should be fairly straightforward to test: the sys_sched_getaffinity() and 
> sys_sched_setaffinity() syscalls both make use of 
> get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on 
> N CPUs in parallel ought to demonstrate scalability improvements pretty 
> nicely.

Well, an in-kernel microbenchmark which camps in a loop doing get/put
would measure this as well.

But neither approach answers the question "how useful is this patchset".
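
Such a microbenchmark could be as simple as the following sketch
(module boilerplate omitted):

	int i;
	ktime_t t0 = ktime_get();

	for (i = 0; i < 1000000; i++) {
		get_online_cpus();
		put_online_cpus();
	}
	pr_info("get/put: %lld ns/iter\n",
		ktime_to_ns(ktime_sub(ktime_get(), t0)) / 1000000);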


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton  wrote:

> On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra  
> wrote:
> 
> > The current cpu hotplug lock is a single global lock; therefore 
> > excluding hotplug is a very expensive proposition even though it is 
> > a rare occurrence under normal operation.
> > 
> > There is a desire for a more light weight implementation of 
> > {get,put}_online_cpus() from both the NUMA scheduling as well as the 
> > -RT side.
> > 
> > The current hotplug lock is a full reader preference lock -- and thus 
> > supports reader recursion. However since we're making the read side 
> > lock much cheaper it is the expectation that it will also be used far 
> > more. Which in turn would lead to writer starvation.
> > 
> > Therefore the new lock proposed is completely fair; albeit somewhat 
> > expensive on the write side. This in turn means that we need a 
> > per-task nesting count to support reader recursion.
> 
> This is a lot of code and a lot of new complexity.  It needs some pretty 
> convincing performance numbers to justify its inclusion, no?

Should be fairly straightforward to test: the sys_sched_getaffinity() and 
sys_sched_setaffinity() syscalls both make use of 
get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on 
N CPUs in parallel ought to demonstrate scalability improvements pretty 
nicely.

[ It's not just about affinities: in particular sys_sched_getaffinity() 
  also gets used as a NR_CPUS runtime detection method in apps, so it 
  matters to regular non-affine loads as well. ]
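
[ A testcase along those lines could be as simple as this sketch -- run
  one copy per CPU in parallel and time them: ]

	#define _GNU_SOURCE
	#include <sched.h>

	int main(void)
	{
		cpu_set_t set;
		int i;

		for (i = 0; i < 1000000; i++) {
			sched_getaffinity(0, sizeof(set), &set);
			sched_setaffinity(0, sizeof(set), &set);
		}
		return 0;
	}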

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton a...@linux-foundation.org wrote:

 On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org 
 wrote:
 
  The current cpu hotplug lock is a single global lock; therefore 
  excluding hotplug is a very expensive proposition even though it is 
  rare occurrence under normal operation.
  
  There is a desire for a more light weight implementation of 
  {get,put}_online_cpus() from both the NUMA scheduling as well as the 
  -RT side.
  
  The current hotplug lock is a full reader preference lock -- and thus 
  supports reader recursion. However since we're making the read side 
  lock much cheaper it is the expectation that it will also be used far 
  more. Which in turn would lead to writer starvation.
  
  Therefore the new lock proposed is completely fair; albeit somewhat 
  expensive on the write side. This in turn means that we need a 
  per-task nesting count to support reader recursion.
 
 This is a lot of code and a lot of new complexity.  It needs some pretty 
 convincing performance numbers to justify its inclusion, no?

Should be fairly straightforward to test: the sys_sched_getaffinity() and 
sys_sched_setaffinity() syscalls both make use of 
get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on 
N CPUs in parallel ought to demonstrate scalability improvements pretty 
nicely.

[ It's not just about affinities: in particular sys_sched_getaffinity() 
  also gets used as a NR_CPUS runtime detection method in apps, so it 
  matters to regular non-affine loads as well. ]

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar mi...@kernel.org wrote:

 * Andrew Morton a...@linux-foundation.org wrote:
 
  On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org 
  wrote:
  
   The current cpu hotplug lock is a single global lock; therefore 
   excluding hotplug is a very expensive proposition even though it is 
   rare occurrence under normal operation.
   
   There is a desire for a more light weight implementation of 
   {get,put}_online_cpus() from both the NUMA scheduling as well as the 
   -RT side.
   
   The current hotplug lock is a full reader preference lock -- and thus 
   supports reader recursion. However since we're making the read side 
   lock much cheaper it is the expectation that it will also be used far 
   more. Which in turn would lead to writer starvation.
   
   Therefore the new lock proposed is completely fair; albeit somewhat 
   expensive on the write side. This in turn means that we need a 
   per-task nesting count to support reader recursion.
  
  This is a lot of code and a lot of new complexity.  It needs some pretty 
  convincing performance numbers to justify its inclusion, no?
 
 Should be fairly straightforward to test: the sys_sched_getaffinity() and 
 sys_sched_setaffinity() syscalls both make use of 
 get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities on 
 N CPUs in parallel ought to demonstrate scalability improvements pretty 
 nicely.

Well, an in-kernel microbenchmark which camps in a loop doing get/put
would measure this as well.

But neither approach answers the question how useful is this patchset.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton a...@linux-foundation.org wrote:

 On Thu, 10 Oct 2013 08:27:41 +0200 Ingo Molnar mi...@kernel.org wrote:
 
  * Andrew Morton a...@linux-foundation.org wrote:
  
   On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org 
   wrote:
   
The current cpu hotplug lock is a single global lock; therefore 
excluding hotplug is a very expensive proposition even though it is 
rare occurrence under normal operation.

There is a desire for a more light weight implementation of 
{get,put}_online_cpus() from both the NUMA scheduling as well as the 
-RT side.

The current hotplug lock is a full reader preference lock -- and thus 
supports reader recursion. However since we're making the read side 
lock much cheaper it is the expectation that it will also be used far 
more. Which in turn would lead to writer starvation.

Therefore the new lock proposed is completely fair; albeit somewhat 
expensive on the write side. This in turn means that we need a 
per-task nesting count to support reader recursion.
   
   This is a lot of code and a lot of new complexity.  It needs some pretty 
   convincing performance numbers to justify its inclusion, no?
  
  Should be fairly straightforward to test: the sys_sched_getaffinity() 
  and sys_sched_setaffinity() syscalls both make use of 
  get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities 
  on N CPUs in parallel ought to demonstrate scalability improvements 
  pretty nicely.
 
 Well, an in-kernel microbenchmark which camps in a loop doing get/put 
 would measure this as well.
 
 But neither approach answers the question how useful is this patchset.

Even ignoring all the other reasons cited, sys_sched_getaffinity() / 
sys_sched_setaffinity() are prime time system calls, and as long as the 
patches are correct, speeding them up is worthwhile.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar mi...@kernel.org wrote:

   Should be fairly straightforward to test: the sys_sched_getaffinity() 
   and sys_sched_setaffinity() syscalls both make use of 
   get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities 
   on N CPUs in parallel ought to demonstrate scalability improvements 
   pretty nicely.
  
  Well, an in-kernel microbenchmark which camps in a loop doing get/put 
  would measure this as well.
  
  But neither approach answers the question how useful is this patchset.
 
 Even ignoring all the other reasons cited, sys_sched_getaffinity() / 
 sys_sched_setaffinity() are prime time system calls, and as long as the 
 patches are correct, speeding them up is worthwhile.

That I would not have guessed.  What's the use case for calling
get/set_affinity at high frequency?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton a...@linux-foundation.org wrote:

 On Thu, 10 Oct 2013 09:27:57 +0200 Ingo Molnar mi...@kernel.org wrote:
 
Should be fairly straightforward to test: the sys_sched_getaffinity() 
and sys_sched_setaffinity() syscalls both make use of 
get_online_cpus()/put_online_cpus(), so a testcase frobbing affinities 
on N CPUs in parallel ought to demonstrate scalability improvements 
pretty nicely.
   
   Well, an in-kernel microbenchmark which camps in a loop doing get/put 
   would measure this as well.
   
   But neither approach answers the question how useful is this patchset.
  
  Even ignoring all the other reasons cited, sys_sched_getaffinity() / 
  sys_sched_setaffinity() are prime time system calls, and as long as 
  the patches are correct, speeding them up is worthwhile.
 
 That I would not have guessed.  What's the use case for calling 
 get/set_affinity at high frequency?

I don't think high-freq usage is common (at all).

It could happen in AIM7-like workloads that start up a ton of binaries in 
parallel, which can easily create parallel sched_getaffinity() calls 
during process startup?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Wed, Oct 09, 2013 at 10:50:06PM -0700, Andrew Morton wrote:
 On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra pet...@infradead.org 
 wrote:
 
  The current cpu hotplug lock is a single global lock; therefore excluding
  hotplug is a very expensive proposition even though it is rare occurrence 
  under
  normal operation.
  
  There is a desire for a more light weight implementation of
  {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT 
  side.
  
  The current hotplug lock is a full reader preference lock -- and thus 
  supports
  reader recursion. However since we're making the read side lock much 
  cheaper it
  is the expectation that it will also be used far more. Which in turn would 
  lead
  to writer starvation.
  
  Therefore the new lock proposed is completely fair; albeit somewhat 
  expensive
  on the write side. This in turn means that we need a per-task nesting count 
  to
  support reader recursion.
 
 This is a lot of code and a lot of new complexity.  It needs some pretty
 convincing performance numbers to justify its inclusion, no?

And here I thought it was generally understood to be unwise to bash
global state on anything like a regular manner from every cpu.

The NUMA bits really ought to use get_online_cpus()/put_online_cpus() on
every balance pass; which is about once a second on every cpu.

RT -- which has some quite horrible hotplug hacks due to this --
basically takes get_online_cpus() for every spin_lock/spin_unlock in the
kernel.

But the thing is; our sense of NR_CPUS has shifted, where it used to be
ok to do something like:

  for_each_cpu()

With preemption disabled; it gets to be less and less sane to do so,
simply because 'common' hardware has 256+ CPUs these days. If we cannot
rely on preempt disable to exclude hotplug, we must use
get_online_cpus(), but get_online_cpus() is global state and thus cannot
be used at any sort of frequency.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Peter Zijlstra pet...@infradead.org wrote:

 But the thing is; our sense of NR_CPUS has shifted, where it used to be 
 ok to do something like:
 
   for_each_cpu()
 
 With preemption disabled; it gets to be less and less sane to do so, 
 simply because 'common' hardware has 256+ CPUs these days. If we cannot 
 rely on preempt disable to exclude hotplug, we must use 
 get_online_cpus(), but get_online_cpus() is global state and thus cannot 
 be used at any sort of frequency.

So ... why not make it _really_ cheap, i.e. the read lock costing nothing, 
and tie CPU hotplug to freezing all tasks in the system?

Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a 
system, I don't understand how we tolerate _any_ overhead from this utter 
slowpath.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:
 So ... why not make it _really_ cheap, i.e. the read lock costing nothing, 
 and tie CPU hotplug to freezing all tasks in the system?

Such that we freeze regular tasks in userspace and kernel tasks in their
special freezer callback so as to guarantee minimal state?

If you can get people to agree with that I'm all for it.

That said, Oleg wants to use the same scheme for percpu_rwsem, and I'm
not sure his write-side (something in uprobes) wants to do the same.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Ingo Molnar wrote:

 * Peter Zijlstra pet...@infradead.org wrote:

  But the thing is; our sense of NR_CPUS has shifted, where it used to be
  ok to do something like:
 
for_each_cpu()
 
  With preemption disabled; it gets to be less and less sane to do so,
  simply because 'common' hardware has 256+ CPUs these days. If we cannot
  rely on preempt disable to exclude hotplug, we must use
  get_online_cpus(), but get_online_cpus() is global state and thus cannot
  be used at any sort of frequency.

 So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
 and tie CPU hotplug to freezing all tasks in the system?

 Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
 system, I don't understand how we tolerate _any_ overhead from this utter
 slowpath.

Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
quite often to save the power.

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Peter Zijlstra wrote:

 That said, Oleg wants to use the same scheme for percpu_rwsem,

Yes, and later then (I hope) get_online_cpus() will be just
current->cpuhp_ref++ || percpu_down_read().

(just in case, we only need ->cpuhp_ref to ensure that the readers
 can't starve the writers as is currently possible. iow to block the
 new readers but ensure that the recursive get/down_read can't block)
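
In code, the intended fast path would look roughly like this (a sketch
only; 'hotplug_rwsem' is a made-up percpu_rw_semaphore name and
->cpuhp_ref a new task_struct counter):

  static inline void get_online_cpus(void)
  {
          if (current->cpuhp_ref++)       /* recursive read, already held */
                  return;
          percpu_down_read(&hotplug_rwsem);
  }

  static inline void put_online_cpus(void)
  {
          if (--current->cpuhp_ref)
                  return;
          percpu_up_read(&hotplug_rwsem);
  }

That way a pending writer can block new first-time readers, while a
recursive reader only touches a task-local counter and can never block.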

And please look at sb_wait_write/sb_start_write. It uses the same
logic except:

- it doesn't have the true fast-path for readers

- its slow mode is slower than necessary

- it is buggy (afaics)

 and I'm
 not sure his write-side (something in uprobes) wants to do the same.

surely not ;)

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote:

 On 10/10, Ingo Molnar wrote:
 
  * Peter Zijlstra pet...@infradead.org wrote:
 
   But the thing is; our sense of NR_CPUS has shifted, where it used to be
   ok to do something like:
  
 for_each_cpu()
  
   With preemption disabled; it gets to be less and less sane to do so,
   simply because 'common' hardware has 256+ CPUs these days. If we cannot
   rely on preempt disable to exclude hotplug, we must use
   get_online_cpus(), but get_online_cpus() is global state and thus cannot
   be used at any sort of frequency.
 
  So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
  and tie CPU hotplug to freezing all tasks in the system?
 
  Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
  system, I don't understand how we tolerate _any_ overhead from this utter
  slowpath.
 
 Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
 quite often to save the power.

cpu hotremove already uses stop_machine, so such an approach shouldn't
actually worsen things (a lot) for them?

It's been ages since I looked at this stuff :( Although it isn't used
much, memory hotplug manages to use stop_machine() on the add/remove
(ie, writer) side and nothing at all on the reader side.  Is there
anything which fundamentally prevents cpu hotplug from doing the same?
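
Schematically, that approach puts all the cost on the writer (a sketch
only; take_down() is a made-up callback, not the real hotplug or
memory-hotplug code):

  static int take_down(void *arg)
  {
          /* every other CPU is spinning in stop_machine(), so nobody
           * can observe a half-offline state */
          set_cpu_online((long)arg, false);
          return 0;
  }

  stop_machine(take_down, (void *)(long)cpu, NULL);

leaving the reader side a bare cpu_online() test with no locking at all.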



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 09:00:44 -0700
Andrew Morton a...@linux-foundation.org wrote:

 It's been ages since I looked at this stuff :( Although it isn't used
 much, memory hotplug manages to use stop_machine() on the add/remove
 (ie, writer) side and nothing at all on the reader side.  Is there
 anything which fundamentally prevents cpu hotplug from doing the same?


I would think that memory hotplug may require stop machine as all CPUs
may touch that memory. But we would like to remove stomp machine from
CPU hotplug. Why prevent all CPUs from running when we want to remove
one?

-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 12:36:31 -0400 Steven Rostedt rost...@goodmis.org wrote:

 On Thu, 10 Oct 2013 09:00:44 -0700
 Andrew Morton a...@linux-foundation.org wrote:
 
  It's been ages since I looked at this stuff :( Although it isn't used
  much, memory hotplug manages to use stop_machine() on the add/remove
  (ie, writer) side and nothing at all on the reader side.  Is there
  anything which fundamentally prevents cpu hotplug from doing the same?
 
 
 I would think that memory hotplug may require stop machine as all CPUs
 may touch that memory.

Sure.

 But we would like to remove stomp machine from
 CPU hotplug.

We do?  That's news.  It wasn't mentioned in the changelog and should
have been.  Why?  

 Why prevent all CPUs from running when we want to remove
 one?

So get_online_cpus() goes away.  Nothing is more scalable than nothing!


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Peter Zijlstra pet...@infradead.org wrote:

 On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:

  So ... why not make it _really_ cheap, i.e. the read lock costing 
  nothing, and tie CPU hotplug to freezing all tasks in the system?
 
 Such that we freeze regular tasks in userspace and kernel tasks in their 
 special freezer callback so as to guarantee minimal state?

Yes, absolutely.

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Andrew Morton a...@linux-foundation.org wrote:

 On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote:
 
  On 10/10, Ingo Molnar wrote:
  
   * Peter Zijlstra pet...@infradead.org wrote:
  
But the thing is; our sense of NR_CPUS has shifted, where it used to be
ok to do something like:
   
  for_each_cpu()
   
With preemption disabled; it gets to be less and less sane to do 
so, simply because 'common' hardware has 256+ CPUs these days. If 
we cannot rely on preempt disable to exclude hotplug, we must use 
get_online_cpus(), but get_online_cpus() is global state and thus 
cannot be used at any sort of frequency.
  
   So ... why not make it _really_ cheap, i.e. the read lock costing 
   nothing, and tie CPU hotplug to freezing all tasks in the system?
  
   Actual CPU hot unplugging and replugging is _ridiculously_ rare in a 
   system, I don't understand how we tolerate _any_ overhead from this 
   utter slowpath.
  
  Well, iirc Srivatsa (cc'ed) pointed out that some systems do 
  cpu_down/up quite often to save the power.
 
 cpu hotremove already uses stop_machine, so such an approach shouldn't 
 actually worsen things (a lot) for them?

Also, using CPU hotremove to save power, instead of implementing proper 
power aware scheduling, is very broken to begin with.

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote:
  But we would like to remove stomp machine from
  CPU hotplug.
 
 We do?  That's news.  It wasn't mentioned in the changelog and should
 have been.  Why?  

It would be an unrelated change to this and unrelated to the reasons as
to why I want a faster get_online_cpus().

  Why prevent all CPUs from running when we want to remove
  one?
 
 So get_online_cpus() goes away.  Nothing is more scalable than nothing!

Very much agreed; now stop_machine() wouldn't actually work for hotplug
because it will instantly preempt everybody, including someone who might
be in the middle of using per-cpu state of the cpu we're about to
remove.

The freeze suggestion from Ingo would actually work because we freeze
tasks at known good points (userspace and kthread_freeze() points) where
we know they're not fiddling with per-cpu state.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Andrew Morton wrote:

 On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote:

  On 10/10, Ingo Molnar wrote:
  
   So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
   and tie CPU hotplug to freezing all tasks in the system?
  
   Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
   system, I don't understand how we tolerate _any_ overhead from this utter
   slowpath.
 
  Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
  quite often to save the power.

 cpu hotremove already uses stop_machine,

And Srivatsa wants to remove it from cpu_down().

 so such an approach shouldn't
 actually worsen things (a lot) for them?

this depends on what this "freezing all tasks" actually means.
I understood it as try_to_freeze_tasks/etc, looks too heavy...

But my only point was, I am not sure we can assume that cpu_down/up
is extremly rare and its cost does not matter.

 use stop_machine() on the add/remove
 (ie, writer) side and nothing at all on the reader side.  Is there
 anything which fundamentally prevents cpu hotplug from doing the same?

Well, then we actually need to park all tasks in system, I guess.
IOW, freezer.

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 18:53:37 +0200
Peter Zijlstra pet...@infradead.org wrote:

 On Thu, Oct 10, 2013 at 09:43:55AM -0700, Andrew Morton wrote:
   But we would like to remove stomp machine from
   CPU hotplug.
  
  We do?  That's news.  It wasn't mentioned in the changelog and should
  have been.  Why?  
 
 It would be an unrelated change to this and unrelated to the reasons as
 to why I want a faster get_online_cpus().

Yeah, sorry for the confusion. My comment wasn't really about this
change set but about stop machine and hotplug in general. Needing stop
machine for hotplug has been a complaint by many, but off topic for
this particular change set.

 
   Why prevent all CPUs from running when we want to remove
   one?
  
  So get_online_cpus() goes away.  Nothing is more scalable than nothing!
 
 Very much agreed; now stop_machine() wouldn't actually work for hotplug
 because it will instantly preempt everybody, including someone who might
 be in the middle of using per-cpu state of the cpu we're about to
 remove.

Well, stop machine doesn't instantly preempt everybody. Only those that
don't have preemption disabled. Using per_cpu without preemption
disabled can be dangerous. The exception is the migrate disable we want
to add for -rt: once tasks can be migrate-disabled but still preemptible,
we can no longer rely on stop machine to make sure per_cpu data isn't
being used. The same goes for threads that use per_cpu because they are
bound to a CPU; they too need to be taken care of when their CPU goes
offline.

But this would also require get_online_cpus() to disable
preemption as well. Not quite the "nothing" we are looking for.


-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Paul E. McKenney
On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote:
 
 * Peter Zijlstra pet...@infradead.org wrote:
 
  On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:
 
   So ... why not make it _really_ cheap, i.e. the read lock costing 
   nothing, and tie CPU hotplug to freezing all tasks in the system?
  
  Such that we freeze regular tasks in userspace and kernel tasks in their 
  special freezer callback so as to guarantee minimal state?
 
 Yes, absolutely.

That does add quite a bit of latency to the hotplug operations, which
IIRC slows down things like boot, suspend, and resume.

Thanx, Paul



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Ingo Molnar

* Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

 On Thu, Oct 10, 2013 at 06:50:46PM +0200, Ingo Molnar wrote:
  
  * Peter Zijlstra pet...@infradead.org wrote:
  
   On Thu, Oct 10, 2013 at 04:57:38PM +0200, Ingo Molnar wrote:
  
So ... why not make it _really_ cheap, i.e. the read lock costing 
nothing, and tie CPU hotplug to freezing all tasks in the system?
   
   Such that we freeze regular tasks in userspace and kernel tasks in 
   their special freezer callback so as to guarantee minimal state?
  
  Yes, absolutely.
 
 That does add quite a bit of latency to the hotplug operations, which 
 IIRC slows down things like boot, suspend, and resume.

I think it should _speed up_ suspend: instead of bringing down each CPU 
one by one, they could just be all stopped.

Same for bootup - it's not really a hotplug operation in the traditional 
sense, as all CPUs are 'present' at once, and they could be added in one 
group operation instead of serializing the bootup.

Also, most CPU bootup sequences are serialized.

It would slow down individual CPU hotplug/hotunplug - and that is
a really rare operation outside broken schemes to save power ...

Thanks,

Ingo


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Paul E. McKenney
On Thu, Oct 10, 2013 at 06:52:29PM +0200, Ingo Molnar wrote:
 
 * Andrew Morton a...@linux-foundation.org wrote:
 
  On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote:
  
   On 10/10, Ingo Molnar wrote:
   
* Peter Zijlstra pet...@infradead.org wrote:
   
 But the thing is; our sense of NR_CPUS has shifted, where it used to be
 ok to do something like:

   for_each_cpu()

 With preemption disabled; it gets to be less and less sane to do 
 so, simply because 'common' hardware has 256+ CPUs these days. If 
 we cannot rely on preempt disable to exclude hotplug, we must use 
 get_online_cpus(), but get_online_cpus() is global state and thus 
 cannot be used at any sort of frequency.
   
So ... why not make it _really_ cheap, i.e. the read lock costing 
nothing, and tie CPU hotplug to freezing all tasks in the system?
   
 Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
system, I don't understand how we tolerate _any_ overhead from this 
utter slowpath.
   
   Well, iirc Srivatsa (cc'ed) pointed out that some systems do 
   cpu_down/up quite often to save the power.
  
  cpu hotremove already uses stop_machine, so such an approach shouldn't 
  actually worsen things (a lot) for them?
 
 Also, using CPU hotremove to save power, instead of implementing proper 
 power aware scheduling, is very broken to begin with.

In many cases, agreed.  Particularly on devices where there is an easy
way to send out regular updates and bug fixes, and especially if they
are connected to AC power.

But if I was in charge of creating a battery-powered multicore embedded
device that was to ship in million-unit quantities, especially in the
not-uncommon case where more than one CPU is needed only for specific
actions, I would offline the extra CPUs except when I was sure they were
needed.  This would of course still rely on power-aware scheduling when
more than one CPU was in use.  The power-aware scheduling conserves energy
for short-term variations (in which case offlining really is useless),
but the longer-term "keep this CPU _off_" guarantee is provided by offlining.

Yes, the Linux kernel code quality is quite good, but bugs still happen,
and bugs can easily cause CPUs to power up unnecessarily.  This doesn't
have to happen very often to kill your battery.  If the extra CPUs are
offline, a large class of those bugs are rendered harmless.  If you are
going to have a very large number of difficult-to-update battery-powered
devices in the field, defending against such bugs is worthwhile.

Thanx, Paul



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Oleg Nesterov
On 10/10, Peter Zijlstra wrote:

 The freeze suggestion from Ingo would actually work because we freeze
 tasks at known good points

Not really known/good, we have more and more freezable_schedule's.
But probably this is fine, nobody should do this under get_online_cpus().

 (userspace and kthread_freeze() points) where
 we know they're not fiddling with per-cpu state.

We should also take care of PF_NOFREEZE kthreads.

Oleg.



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Andrew Morton
On Thu, 10 Oct 2013 13:13:05 -0400 Steven Rostedt rost...@goodmis.org wrote:

  
Why prevent all CPUs from running when we want to remove
one?
   
   So get_online_cpus() goes away.  Nothing is more scalable than nothing!
  
  Very much agreed; now stop_machine() wouldn't actually work for hotplug
  because it will instantly preempt everybody, including someone who might
  be in the middle of using per-cpu state of the cpu we're about to
  remove.
 
 Well, stop machine doesn't instantly preempt everybody. Only those that
 don't have preemption disabled. Using per_cpu without preemption
 disabled can be dangerous.

Yes, I'd have thought that the cases where a CPU is fiddling with
another CPU's percpu data with preemption enabled would be rather rare.

I can't actually think of any off the top.  Are there examples we can
look at?



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 10:48 AM, Andrew Morton
a...@linux-foundation.org wrote:

 Yes, I'd have thought that the cases where a CPU is fiddling with
 another CPU's percpu data with preemption enabled would be rather rare.

 I can't actually think of any off the top.  Are there examples we can
 look at?

You can't do that right now - since you have to get the cpu list. So
it may not be with preemption enabled, but it should always be under
the locking provided by get_online_cpus().. That one allows sleeping,
though.

I personally would *love* to make CPU hotplug be a lockless thing
entirely. But I detest stop-machine too, because it has these really
annoying properties.

So if we want to make it zero-cost to look at online CPU data, can we
avoid even the stop-machine synchronization, instead saying that the
cpu hotplug bitmap is updated completely locklessly, but if you see a
bit set, the data associated with that CPU is guaranteed to still be
available.

IOW, just use RCU semantics on a per-bit level. When we offline a CPU, we do

 clear_bit(cpu, cpu_online_mask);
 synchronize_rcu();
 .. now we can free all the percpu data and kill the CPU ..

without any locking anywhere - not stop-machine, not anything. If
somebody is doing a for_each_cpu() (under just a regular
rcu_read_lock()) and they see the bit set while it's going down, who
cares? The CPU is still there, the data is accessible..
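
The reader side would then be as trivial as (a sketch; 'count' just
stands in for some per-cpu state):

  rcu_read_lock();
  for_each_online_cpu(cpu) {
          /* even if this cpu's bit clears right now, its data stays
           * valid until after our rcu_read_unlock() */
          total += per_cpu(count, cpu);
  }
  rcu_read_unlock();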

I'm sure there's some reason the above wouldn't work, but the above
would seem to be pretty optimal. Why do we really force this big
locking thing? The new patches make that locking _smarter_, but it's
still a damn big lock. Could we possibly go _beyond_ the lock?

 Linus


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 10:48:56AM -0700, Andrew Morton wrote:
   Very much agreed; now stop_machine() wouldn't actually work for hotplug
   because it will instantly preempt everybody, including someone who might
   be in the middle of using per-cpu state of the cpu we're about to
   remove.
  
  Well, stop machine doesn't instantly preempt everybody. Only those that
  don't have preemption disabled. Using per_cpu without preemption
  disabled can be dangerous.

Can be, but there's more of that around than you want. Also that's not
the entire issue.

 Yes, I'd have thought that the cases where a CPU is fiddling with
 another CPU's percpu data with preemption enabled would be rather rare.

Look at kernel/events/core.c:swevent_hlist_get(); it does something like:

  get_online_cpus();
  for_each_online_cpu(cpu) {
          allocate_hash_table(cpu);
  }
  put_online_cpus();

Surely you don't want to do that with preemption disabled?

But my point is that even though there aren't many of these today; with
the growing number of cpus in 'commodity' hardware you want to move away
from using preempt_disable() as hotplug lock.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 10:13:53AM -0700, Paul E. McKenney wrote:
 That does add quite a bit of latency to the hotplug operations, which
 IIRC slows down things like boot, suspend, and resume.

Guess what suspend does before it unplugs all these cpus?


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 11:10:35 -0700
Linus Torvalds torva...@linux-foundation.org wrote:
 .. now we can free all the percpu data and kill the CPU ..
 
 without any locking anywhere - not stop-machine, not anything. If
 somebody is doing a for_each_cpu() (under just a regular
 rcu_read_lock()) and they see the bit set while it's going down, who
 cares? The CPU is still there, the data is accessible..

The problem is that rcu_read_lock() requires preemption disabled unless
you are using the preemptable rcu tree version. There's always
srcu_read_lock() but that's not so free. It's basically the same as
what Peter is doing.
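
(For reference, the SRCU read side being compared looks like this, with
my_srcu a placeholder srcu_struct:

  int idx;

  idx = srcu_read_lock(&my_srcu);
  /* ... access the cpu data ... */
  srcu_read_unlock(&my_srcu, idx);

The per-domain index bookkeeping and the memory barriers in those two
calls are what make it costlier than a plain rcu_read_lock().)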

There's places in the kernel that does for_each_cpu() that I'm sure you
don't want to disable preemption for. Especially when you start having
4096 CPU machines!

-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 11:10:35AM -0700, Linus Torvalds wrote:
 You can't do that right now - since you have to get the cpu list. So
 it may not be with preemption enabled, but it should always be under
 the locking provided by get_online_cpus().. That one allows sleeping,
 though.
 
 I personally would *love* to make CPU hotplug be a lockless thing
 entirely. But I detest stop-machine too, because it has these really
 annoying properties.
 
 So if we want to make it zero-cost to look at online CPU data, can we
 avoid even the stop-machine synchronization, instead saying that the
 cpu hotplug bitmap is updated completely locklessly, but if you see a
 bit set, the data associated with that CPU is guaranteed to still be
 available.
 
 IOW, just use RCU semantics on a per-bit level. When we offline a CPU, we do
 
  clear_bit(cpu, cpu_online_mask);
  synchronize_rcu();
  .. now we can free all the percpu data and kill the CPU ..
 
 without any locking anywhere - not stop-machine, not anything. If
 somebody is doing a for_each_cpu() (under just a regular
 rcu_read_lock()) and they see the bit set while it's going down, who
 cares? The CPU is still there, the data is accessible..
 
 I'm sure there's some reason the above wouldn't work, but the above
 would seem to be pretty optimal. Why do we really force this big
 locking thing? The new patches make that locking _smarter_, but it's
 still a damn big lock. Could we possibly go _beyond_ the lock?

The only down-side to doing this is that you cannot actually allocate
memory under rcu_read_lock() because it might not allow preemption.

That said; I like the idea. I'll go try and audit the get_online_cpus()
sites to see if there's any that really need full exclusion.


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 11:34 AM, Peter Zijlstra pet...@infradead.org wrote:

 But my point is that even though there aren't many of these today; with
 the growing number of cpus in 'commodity' hardware you want to move away
 from using preempt_disable() as hotplug lock.

Umm.

Wasn't this pretty much the argument for the COMPLETELY F*CKED UP
change to make the vma locking use a semaphore?

The whole "it's more convenient to use sleeping locks" argument is
PURE AND UTTER SHIT when it comes to really core code. We are *much*
better off saying "this is core, we care so deeply about performance
that you had better use your brain before you do this".

Seriously. Your argument is bad, but more importantly, it is
*dangerously* bad. It's crap that results in bad code: and the bad
code is almost impossible to fix up later, because once you encourage
people to do the easy thing, they'll do so.

Yes, preempt_disable() is harder to use than sleeping locks. You need
to do pre-allocation etc. But it is much *much* more efficient.

And in the kernel, we care. We have the resources. Plus, we can also
say "if you can't handle it, don't do it". We don't need new features
so badly that we are willing to screw up core code.

As to your kernel/events/core.c example, things like that are
generally pretty easy to fix. The good thing about preemption-disable
is:

 - it's pretty easily debuggable, because you do get big warnings of
"you screwed up exactly _here_"

 - it's so cheap that especially when it comes to things that can
sleep, it's perfectly ok to drop the preemption count, do the sleeping
thing, re-disable preemption and then check if we still needed the
data.
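
Spelled out, that second pattern looks something like this (a sketch;
the lazily allocated per-cpu pointer 'hlist' and its 'struct hash' type
are made up):

  struct hash *new;

  preempt_disable();
  if (!per_cpu(hlist, cpu)) {
          preempt_enable();
          new = kzalloc(sizeof(*new), GFP_KERNEL);  /* can sleep */
          preempt_disable();
          if (per_cpu(hlist, cpu))                  /* lost the race */
                  kfree(new);
          else
                  per_cpu(hlist, cpu) = new;
  }
  /* ... use per_cpu(hlist, cpu) with preemption off ... */
  preempt_enable();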

Oh, and I'm sure there are several users that currently depend on
being able to sleep over get_online_cpus().  But I'm pretty sure it's
several, not hundreds, and I think we could fix them up.

 Linus


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
 On Thu, 10 Oct 2013 11:10:35 -0700
 Linus Torvalds torva...@linux-foundation.org wrote:
  .. now we can free all the percpu data and kill the CPU ..
  
  without any locking anywhere - not stop-machine, not anything. If
  somebody is doing a for_each_cpu() (under just a regular
  rcu_read_lock()) and they see the bit set while it's going down, who
  cares? The CPU is still there, the data is accessible..
 
 The problem is that rcu_read_lock() requires preemption disabled unless
 you are using the preemptable rcu tree version. There's always
 srcu_read_lock() but that's not so free. It's basically the same as
 what Peter is doing.

No, srcu is actually more expensive in the fast path. Although possibly
we could make SRCU more complex ;-)

 There's places in the kernel that does for_each_cpu() that I'm sure you
 don't want to disable preemption for. Especially when you start having
 4096 CPU machines!

:-)


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Srivatsa S. Bhat
On 10/10/2013 08:56 PM, Oleg Nesterov wrote:
 On 10/10, Ingo Molnar wrote:

 * Peter Zijlstra pet...@infradead.org wrote:

 But the thing is; our sense of NR_CPUS has shifted, where it used to be
 ok to do something like:

   for_each_cpu()

 With preemption disabled; it gets to be less and less sane to do so,
 simply because 'common' hardware has 256+ CPUs these days. If we cannot
 rely on preempt disable to exclude hotplug, we must use
 get_online_cpus(), but get_online_cpus() is global state and thus cannot
 be used at any sort of frequency.

 So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
 and tie CPU hotplug to freezing all tasks in the system?

 Actual CPU hot unplugging and replugging is _ridiculously_ rare in a
 system, I don't understand how we tolerate _any_ overhead from this utter
 slowpath.
 
 Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
 quite often to save the power.
 

Yes, I've heard of such systems and so I might have brought them up during
discussions about CPU hotplug. But unfortunately, I have been misquoted quite
often, leading to the wrong impression that I have such a usecase or that I
recommend/support using CPU hotplug for power management. So let me clarify
that part, while I have the chance. (And I don't blame anyone for that. I
work on power-management related areas, and I've worked on improving/optimizing
CPU hotplug; so its pretty natural to make a connection between the two
and assume that I tried to optimize CPU hotplug keeping power management
in mind. But that's not the case, as I explain below.)

I started out trying to make suspend/resume more reliable, scalable and fast.
And suspend/resume uses CPU hotplug underneath and that's a pretty valid 
usecase.
So with that, I started looking at CPU hotplug and soon realized the mess it
had become. So I started working on cleaning up that mess, like rethinking the
whole notifier scheme[1], and removing the ridiculous stop_machine() from the
cpu_down path[2] etc. But the intention behind all this work was just to make
CPU hotplug cleaner/saner/bug-free and possibly speed up suspend/resume. IOW,
I didn't have any explicit intention to make it easier for people to use it
for power management, although I understood that some of this work might
help those poor souls who don't have any other choice, for whatever reason.
And fortunately, (IIUC) the number of systems/people relying on CPU hotplug for
power management has reduced quite a bit in the recent times, which is a very
good thing.

So, to reiterate, I totally agree that power-aware scheduler is the right way
to do CPU power management; CPU hotplug is simply not the tool to use for that.
No question about that. Also, system shutdown used to depend on CPU hotplug to
disable the non-boot CPUs, but we don't do that any more after commit cf7df378a,
which is a very welcome change. And in future if we can somehow do
suspend/resume without using CPU hotplug, that would be absolutely
wonderful as well. (There
have been discussions in the past around this, but nobody has a solution yet).
The other valid usecases that I can think of, for using CPU hotplug, is for RAS
reasons and for DLPAR (Dynamic Logical Partitioning) operations on powerpc, both
of which are not performance-sensitive, AFAIK.


[1]. Reverse invocation of CPU hotplug notifiers
 http://lwn.net/Articles/508072/

[2]. Stop-machine()-free CPU hotplug
 http://lwn.net/Articles/538819/ (v6)
 http://lwn.net/Articles/556727/

Regards,
Srivatsa S. Bhat



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 11:43 AM, Steven Rostedt rost...@goodmis.org wrote:

 There's places in the kernel that does for_each_cpu() that I'm sure you
 don't want to disable preemption for. Especially when you start having
 4096 CPU machines!

We could _easily_ add preemption points in for_each_cpu() for the
MAX_SMP case. We can even do it in the macro itself.

So I wouldn't worry about things like that. I'd expect the "Oh damn,
we have to allocate" case to be more painful, but I think that is
fairly easily handled by having something like a redo_cpu() macro
that you can call inside for_each_cpu(). So the pattern for the
(hopefully-not-all-that-common) thing would be something like

   sometype *prealloc = NULL;

   for_each_cpu() {
           if (!prealloc) {
                   rcu_read_unlock();  // Or whatever we want to call it
                   prealloc = kmalloc(..);
                   rcu_read_lock();
                   redo_cpu();
           }
   }
   kfree(prealloc);

which is certainly more work than a sleeping lock is, but this looks
like an initialization pattern.

Looking at the things that are actually performance-critical, they
tend to be things like gather all the statistics for all CPUs.

Honesty in advertising: I only grepped a couple of users of
get_online_cpus(), and they all seemed like they'd be perfectly happy
with preemption-off. But maybe I was just lucky in my few samples: I
have anecdotes, not data.

  Linus


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 11:49:15 -0700
Linus Torvalds torva...@linux-foundation.org wrote:
 
 Oh, and I'm sure there are several users that currently depend on
 being able to sleep over get_online_cpus().  But I'm pretty sure it's
 several, not hundreds, and I think we could fix them up.

I'm wondering if we can have a for_each_cpu() that only disables
preemption in the loop. That is, each iteration will enable preemption,
but the loop itself will guarantee that the current cpu to process
still exists.

  rcu_read_lock();
  for_each_cpu(cpu, mask) {
          rcu_read_unlock();
          rcu_read_lock();
          if (!cpu_online(cpu))
                  continue;
          [...]
  }
  rcu_read_unlock();

That way expensive loops won't tie up the current CPU while it processes
all online CPUs.

Of course, it will miss a CPU that comes online. But callers would still
need to handle the case of a CPU coming online after the final
put_online_cpus() anyway, so I'm not sure that's a problem.

-- Steve


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Srivatsa S. Bhat
On 10/10/2013 10:24 PM, Oleg Nesterov wrote:
 On 10/10, Andrew Morton wrote:

 On Thu, 10 Oct 2013 17:26:12 +0200 Oleg Nesterov o...@redhat.com wrote:

 On 10/10, Ingo Molnar wrote:

 So ... why not make it _really_ cheap, i.e. the read lock costing nothing,
 and tie CPU hotplug to freezing all tasks in the system?

 Actual CPU hot unplugging and repluggin is _ridiculously_ rare in a
 system, I don't understand how we tolerate _any_ overhead from this utter
 slowpath.

 Well, iirc Srivatsa (cc'ed) pointed out that some systems do cpu_down/up
 quite often to save the power.

 cpu hotremove already uses stop_machine,
 
 And Srivatsa wants to remove it from cpu_down().
 

Yes, I have worked on several designs to remove stop_machine() from the cpu_down
path.

http://lwn.net/Articles/538819/
http://lwn.net/Articles/556727/

 so such an approach shouldn't
 actually worsen things (a lot) for them?
 
 this depends on what this "freezing all tasks" actually means.
 I understood it as try_to_freeze_tasks/etc, looks too heavy...
 
 But my only point was, I am not sure we can assume that cpu_down/up
 is extremly rare and its cost does not matter.
 
 use stop_machine() on the add/remove
 (ie, writer) side and nothing at all on the reader side.  Is there
 anything which fundamentally prevents cpu hotplug from doing the same?
 

Its certainly possible to remove stop_machine() from CPU hotplug, as I've
demonstrated in the patchsets mentioned above. And there were pretty good
performance improvements too, arising out of that change, as described here:

http://article.gmane.org/gmane.linux.ports.ppc.embedded/56122

Regards,
Srivatsa S. Bhat





Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Paul E. McKenney
On Thu, Oct 10, 2013 at 08:50:32PM +0200, Peter Zijlstra wrote:
 On Thu, Oct 10, 2013 at 02:43:27PM -0400, Steven Rostedt wrote:
  On Thu, 10 Oct 2013 11:10:35 -0700
  Linus Torvalds torva...@linux-foundation.org wrote:
   .. now we can free all the percpu data and kill the CPU ..
   
   without any locking anywhere - not stop-machine, not anything. If
   somebody is doing a for_each_cpu() (under just a regular
   rcu_read_lock()) and they see the bit set while it's going down, who
   cares? The CPU is still there, the data is accessible..
  
  The problem is that rcu_read_lock() requires preemption disabled unless
  you are using the preemptable rcu tree version. There's always
  srcu_read_lock() but that's not so free. It's basically the same as
  what Peter is doing.
 
 No, srcu is actually more expensive in the fast path. Although possibly
 we could make SRCU more complex ;-)

Indeed.  Especially if we wanted to get rid of the memory barriers in
srcu_read_lock() and srcu_read_unlock() and still allow SRCU to be used
from idle and offline CPUs!  ;-)

Thanx, Paul

  There's places in the kernel that does for_each_cpu() that I'm sure you
  don't want to disable preemption for. Especially when you start having
  4096 CPU machines!
 
 :-)

But shouldn't you only have to disable over each separate pass through
the loop rather than across the entire loop?  "I guarantee that the CPU I
just handed you will remain online through the body of the loop" rather
than "I guarantee that no CPU is going anywhere through the entire loop".

Thanx, Paul



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Linus Torvalds
On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt rost...@goodmis.org wrote:

 I'm wondering if we can have a for_each_cpu() that only disables
 preemption in the loop.

I think we'd generally want to have it be something the loop asks for.

If the loop is just some kind of gather statistics thing, I don't
think it's required. The cost per loop is so low (usually adding up a
couple of words) that the downside drowns the upside.

And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
not do it for common small values (although it looks like Fedora
defaults to 128 CPU's for their distro kernels, which seems a bit
excessive - too many by far for normal people, too few for the crazy
big ones).

Linus


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Steven Rostedt
On Thu, 10 Oct 2013 12:16:16 -0700
Linus Torvalds torva...@linux-foundation.org wrote:

 On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt rost...@goodmis.org wrote:
 
  I'm wondering if we can have a for_each_cpu() that only disables
  preemption in the loop.
 
 I think we'd generally want to have it be something the loop asks for.

Yeah, I was thinking of adding a new macro, not update for_each_cpu()
itself, so that only the locations that would require it would use it.

 
 If the loop is just some kind of gather statistics thing, I don't
 think it's required. The cost per loop is so low (usually adding up a
 couple of words) that the downside drowns the upside.
 
 And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
 not do it for common small values (although it looks like Fedora
 defaults to 128 CPU's for their distro kernels, which seems a bit
 excessive - too many by far for normal people, too few for the crazy
 big ones).

We could always make it a boot time option, and use alternatives to
change the code as we do with spin_locks() and other code.

-- Steve (the lover of self modifying code)



Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2013 at 12:16:16PM -0700, Linus Torvalds wrote:
 On Thu, Oct 10, 2013 at 12:04 PM, Steven Rostedt rost...@goodmis.org wrote:
 
  I'm wondering if we can have a for_each_cpu() that only disables
  preemption in the loop.
 
 I think we'd generally want to have it be something the loop asks for.
 
 If the loop is just some kind of gather statistics thing, I don't
 think it's required. The cost per loop is so low (usually adding up a
 couple of words) that the downside drowns the upside.
 
 And we could easily look at MAXSMP (or NR_CPUS) at compile-time, and
 not do it for common small values (although it looks like Fedora
 defaults to 128 CPU's for their distro kernels, which seems a bit
 excessive - too many by far for normal people, too few for the crazy
 big ones).

Alternatively we could write it something like:

  rcu_read_lock();
  for_each_online_node(node) {
          for_each_cpu(cpu, cpumask_of_node(node)) {
                  ...
          }
          cond_resched_rcu();
  }
  rcu_read_unlock();

But yes, I think this pattern (and variations) should work for most
cases.

The one I'm struggling with atm is kernel/cpuset.c:
rebuild_sched_domains_locked(). Although I'm thinking we should do that
seqlock style, just build it and verify if it's still valid, if not, try
again -- although I'm sure it'll be 'fun' to get correct.
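
Roughly (a sketch, assuming a hypothetical seqcount 'cpuhp_seq' that
every hotplug operation bumps):

  unsigned seq;

  do {
          seq = read_seqcount_begin(&cpuhp_seq);
          build_sched_domains(...);       /* may sleep and allocate */
  } while (read_seqcount_retry(&cpuhp_seq, seq));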


Re: [PATCH 0/6] Optimize the cpu hotplug locking -v2

2013-10-09 Thread Andrew Morton
On Tue, 08 Oct 2013 12:25:05 +0200 Peter Zijlstra  wrote:

> The current cpu hotplug lock is a single global lock; therefore excluding
> hotplug is a very expensive proposition even though it is rare occurrence
> under normal operation.
> 
> There is a desire for a more light weight implementation of
> {get,put}_online_cpus() from both the NUMA scheduling as well as the -RT side.
> 
> The current hotplug lock is a full reader preference lock -- and thus supports
> reader recursion. However since we're making the read side lock much cheaper
> it is the expectation that it will also be used far more. Which in turn would
> lead to writer starvation.
> 
> Therefore the new lock proposed is completely fair; albeit somewhat expensive
> on the write side. This in turn means that we need a per-task nesting count to
> support reader recursion.

This is a lot of code and a lot of new complexity.  It needs some pretty
convincing performance numbers to justify its inclusion, no?


