Re: [GIT PULL] workqueue fixes for v4.5-rc3

2016-02-11 Thread Linus Torvalds
On Thu, Feb 11, 2016 at 10:49 AM, Thomas Gleixner wrote:
>
> while playing around with it and wondering where to put the command line
> option, I wondered whether it makes sense to tie this to debugobjects.
>
> If stuff goes bad, then state corruptions of the involved objects (timers,
> work ..) are likely to happen, so having debugobjects enabled along with that
> force RR scheme makes a lot of sense.
>
> Thoughts?

I'm not violently opposed, but at the same time I personally
absolutely detest the debug options that are so expensive that they
are unusable in production environments. That limits the scope of
testing a lot.

debugobjects and PAGEALLOC_DEBUG are both good things, but they really
are so expensive as to be completely unusable in many situations. I
would never enable them personally unless I was actively trying to
chase something down, or for some very occasional "let's just do a
sanity check". A system admin that enabled those things on every
machine he runs would be insane (again, unless he is already in big
trouble and is actively trying to chase something down).

There's a *lot* to be said for cheap debug options that you might want
to enable "just in case". debugobjects is not that.

So I'd much prefer a standalone option. Then, *if* that shows
problems, and *if* those problems end up being hard to chase down, at
that point we might ask the people who see issues to "maybe enable
debugobjects that might give us more information".

Most oopses are not subtle.

 Linus


Re: [GIT PULL] workqueue fixes for v4.5-rc3

2016-02-11 Thread Thomas Gleixner
Linus,

On Thu, 11 Feb 2016, Linus Torvalds wrote:

> On Thu, Feb 11, 2016 at 12:18 AM, Thomas Gleixner wrote:
> >
> > That certainly makes sense. So we should use a common debug option to force
> > the 'schedule remote' machinery for all interesting subsystems instead of
> > creating an extra option for each.
> 
> Yeah, let's make it easy to use and enable.
> 
> If it was split, the options would likely interact anyway (i.e. any
> option that enables "add_timer()" to run on arbitrary CPUs would
> automatically also affect "queue_delayed_work()"), so it's not like it
> would really be truly independent issues anyway.
> 
> And if it does show an oops or other problem due to some broken
> expectations, I'd hope the oops should be enough to show what the
> problem is. It did for the vmstat case.

while playing around with it and wondering where to put the command line
option, I wondered whether it makes sense to tie this to debugobjects.

If stuff goes bad, then state corruptions of the involved objects (timers,
work ..) are likely to happen, so having debugobjects enabled along with that
force RR scheme makes a lot of sense.

Thoughts?

Thanks,

tglx


Re: [GIT PULL] workqueue fixes for v4.5-rc3

2016-02-11 Thread Linus Torvalds
On Thu, Feb 11, 2016 at 12:18 AM, Thomas Gleixner wrote:
>
> That certainly makes sense. So we should use a common debug option to force
> the 'schedule remote' machinery for all interesting subsystems instead of
> creating an extra option for each.

Yeah, let's make it easy to use and enable.

If it was split, the options would likely interact anyway (i.e. any
option that enables "add_timer()" to run on arbitrary CPUs would
automatically also affect "queue_delayed_work()"), so it's not like it
would really be truly independent issues anyway.

And if it does show an oops or other problem due to some broken
expectations, I'd hope the oops should be enough to show what the
problem is. It did for the vmstat case.

Linus


Re: [GIT PULL] workqueue fixes for v4.5-rc3

2016-02-11 Thread Thomas Gleixner
On Wed, 10 Feb 2016, Linus Torvalds wrote:
> On Wed, Feb 10, 2016 at 9:25 AM, Tejun Heo wrote:
> >
> > * Officially break local execution guarantee of unbound work items and
> >   add a debug feature to flush out usages which depend on it.
> 
> Btw, should we try to do this for timers too? A debug option to make a
> regular "add_timer()" explicitly schedule things on another CPU, to
> figure out the cases that really should use "add_timer_on()"?

That certainly makes sense. So we should use a common debug option to force
the 'schedule remote' machinery for all interesting subsystems instead of
creating an extra option for each.

Thanks,

tglx


Re: [GIT PULL] workqueue fixes for v4.5-rc3

2016-02-10 Thread Linus Torvalds
On Wed, Feb 10, 2016 at 9:25 AM, Tejun Heo wrote:
>
> * Officially break local execution guarantee of unbound work items and
>   add a debug feature to flush out usages which depend on it.

Btw, should we try to do this for timers too? A debug option to make a
regular "add_timer()" explicitly schedule things on another CPU, to
figure out the cases that really should use "add_timer_on()"?

Added Thomas to the cc.

   Linus
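
To make the concern concrete, here is a small userspace model in C of the kind of bug such a debug option is meant to expose: a callback that silently assumes it runs on the CPU that armed it. This is only a sketch under stated assumptions. NR_CPUS, pending[], current_cpu and run_drain_on() are hypothetical names invented for the illustration; nothing here is kernel API.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Per-CPU event counters, indexed by CPU number. */
static int pending[NR_CPUS];

/* CPU the callback is currently "running" on in this model. */
static int current_cpu;

/*
 * Model of a timer or work callback that drains "this CPU's" counter.
 * It only does the right thing when run_cpu is the CPU whose counter
 * needs draining, which is the implicit local-execution assumption.
 * Returns the number of events drained.
 */
static int run_drain_on(int run_cpu)
{
	int drained;

	assert(run_cpu >= 0 && run_cpu < NR_CPUS);

	current_cpu = run_cpu;
	drained = pending[current_cpu];
	pending[current_cpu] = 0;
	return drained;
}
```

If CPU 1 has pending events and the callback ends up on CPU 3, it drains nothing and CPU 1's counter stays stale. Running it on CPU 1, which is what an explicit "add_timer_on()" would correspond to in this model, drains the counter as intended.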


[GIT PULL] workqueue fixes for v4.5-rc3

2016-02-10 Thread Tejun Heo
Hello, Linus.

Workqueue fixes for v4.5-rc3.

* Remove a spurious triggering of flush dependency warning.

* Officially break local execution guarantee of unbound work items and
  add a debug feature to flush out usages which depend on it.

* Work around CPU -> NODE mapping becoming invalid on CPU offline.

The branch is young but pushing out early as stable kernels are being
affected.

Thanks.

The following changes since commit 92e963f50fc74041b5e9e744c330dca48e04f08d:

  Linux 4.5-rc1 (2016-01-24 13:06:47 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-4.5-fixes

for you to fetch changes up to d6e022f1d207a161cd88e08ef0371554680ffc46:

  workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup (2016-02-10 12:13:05 -0500)


Mike Galbraith (1):
  workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs

Tejun Heo (4):
  workqueue: skip flush dependency checks for legacy workqueues
  Revert "workqueue: make sure delayed work run in local cpu"
  workqueue: implement "workqueue.debug_force_rr_cpu" debug feature
  workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup

 Documentation/kernel-parameters.txt | 11 ++
 include/linux/workqueue.h           |  9 +++--
 kernel/workqueue.c                  | 74 +
 lib/Kconfig.debug                   | 15
 4 files changed, 98 insertions(+), 11 deletions(-)

-- 
tejun
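
As a rough illustration of the forced round-robin idea behind the "workqueue.debug_force_rr_cpu" feature listed above, here is a minimal userspace model in C. It is not the kernel code and the real implementation in kernel/workqueue.c differs in detail. NR_CPUS, force_rr_cpu, rr_last_cpu, allowed[] and select_unbound_cpu() are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Hypothetical stand-in for the debug_force_rr_cpu switch. */
static bool force_rr_cpu;

/* Last CPU handed out by the round-robin cursor; -1 means none yet. */
static int rr_last_cpu = -1;

/*
 * Pick the CPU for a work item queued from local_cpu without an
 * explicit CPU. allowed[] models the set of CPUs unbound work may use.
 */
static int select_unbound_cpu(int local_cpu, const bool allowed[NR_CPUS])
{
	assert(local_cpu >= 0 && local_cpu < NR_CPUS);

	/* Normal mode: prefer the local CPU when it is allowed. */
	if (!force_rr_cpu && allowed[local_cpu])
		return local_cpu;

	/* Otherwise walk the allowed CPUs round-robin from the cursor. */
	for (int step = 1; step <= NR_CPUS; step++) {
		int cpu = (rr_last_cpu + step) % NR_CPUS;

		if (allowed[cpu]) {
			rr_last_cpu = cpu;
			return cpu;
		}
	}

	/* No allowed CPU at all: fall back to the local one. */
	return local_cpu;
}
```

With the switch off, a caller on an allowed CPU always gets its own CPU back, so code that wrongly depends on local execution appears to work. With the switch on, successive calls rotate through the allowed CPUs, so such code is quickly run somewhere it did not expect.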

