Re: [PATCH 00/10] workqueue: break affinity initiatively

2020-12-16 Thread Tejun Heo
On Mon, Dec 14, 2020 at 11:54:47PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan 
> 
> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> said that the scheduler will no longer force-break affinity for us.
> 
> But workqueue depends heavily on the old behavior. Many parts of the code
> rely on it; 06249738a41a ("workqueue: Manually break affinity on hotplug")
> is not enough to change that, and the commit itself has flaws.
> 
> We need to thoroughly update the way workqueue handles affinity
> on CPU hot[un]plug, which is what this patchset intends to do; it also
> replaces Valentin Schneider's patch [1].
> 
> Patch 1 fixes a flaw reported by Hillf Danton .
> I have to include this fix because later patches depend on it.
> 
> The patchset is based on tip/master rather than the workqueue tree,
> because it complements 06249738a41a ("workqueue:
> Manually break affinity on hotplug"), which is only in tip/master for now.
> 
> [1]: 
> https://lore.kernel.org/r/ff62e3ee994efb3620177bf7b19fab16f4866845.ca...@redhat.com

Generally looks good to me. Please feel free to add

 Acked-by: Tejun Heo 

and route the series through tip.

Thanks.

-- 
tejun


Re: [PATCH 00/10] workqueue: break affinity initiatively

2020-12-15 Thread Lai Jiangshan
On Tue, Dec 15, 2020 at 4:49 PM Peter Zijlstra  wrote:
>
> On Tue, Dec 15, 2020 at 04:14:26PM +0800, Lai Jiangshan wrote:
> > On Tue, Dec 15, 2020 at 3:50 PM Peter Zijlstra  wrote:
> > >
> > > On Tue, Dec 15, 2020 at 01:44:53PM +0800, Lai Jiangshan wrote:
> > > > I don't know how the scheduler distinguishes all these
> > > > different cases under the "new assumption".
> > >
> > > The special case is:
> > >
> > >   (p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1
> > >
> > >
> >
> > So unbound per-node workers can possibly match this test, which means code
> > is needed to handle unbound workers/pools; this patchset provides that.
>
> Curious; how could a per-node worker match this? Only if the node is a
> single CPU, or otherwise too?

We have /sys/devices/virtual/workqueue/cpumask, which can be read/written
to access wq_unbound_cpumask.

A per-node worker's cpumask is wq_unbound_cpumask & cpumask_of_the_node.
Since wq_unbound_cpumask can be changed by the system admin, a per-node
worker's cpumask can end up being a single CPU.

wq_unbound_cpumask is used when a system admin wants to isolate some
CPUs from unbound workqueues.  But I think it is a rare case for the
admin to end up making a per-node worker's cpumask a single CPU.

Even though it is a rare case, we have to handle it.
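
To make that concrete, here is a minimal sketch (illustrative only; the
variable name pool_mask is a placeholder, not the actual code in
kernel/workqueue.c) of how such a per-node cpumask is derived:

  /*
   * Sketch: the effective cpumask of an unbound per-node pool is
   * (roughly) the intersection of wq_unbound_cpumask and the node's CPUs.
   */
  cpumask_and(pool_mask, wq_unbound_cpumask, cpumask_of_node(node));

  /*
   * If the admin shrinks wq_unbound_cpumask, this intersection can end up
   * with cpumask_weight(pool_mask) == 1, i.e. a single-CPU unbound worker.
   */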

>
> > Is this the check in is_per_cpu_kthread()? I think I should also use
> > this function in workqueue and not break affinity for unbound workers
> > that have more than one CPU.
>
> Yes, that function captures it. If you want to use it, feel free to move
> it to include/linux/sched.h.

I will.  "single CPU" for unbound workers/pools is the rare case
and enough to bring the code to break affinity for unbound workers.
If we optimize for the common cases (multiple CPUs for unbound workers),
the optimization seems like additional code works only in the slow
path (hotunplug).

I will try it and see if it is worth it.
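
As a rough illustration of that idea (an assumption about what such a check
might look like, not the actual patch; the helper name below is made up):

  /*
   * Sketch: only re-set the allowed mask when the worker matches the
   * scheduler's special case, i.e. a kthread pinned to a single CPU.
   */
  static void break_affinity_if_needed(struct worker *worker)
  {
          if (is_per_cpu_kthread(worker->task))
                  WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
                                                    cpu_possible_mask) < 0);
  }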

>
> This class of threads is 'special', since it needs to violate the
> regular hotplug rules, and migrate_disable() made it just this little
> bit more special. It basically comes down to how we need certain per-cpu
> kthreads to run on a CPU while it's brought up, before userspace is
> allowed on, and similarly they need to run on the CPU after userspace is
> no longer allowed on in order to bring it down.
>
> (IOW, they must be allowed to violate the active mask)
>
> Due to migrate_disable() we had to move the migration code from the very
> last cpu-down stage, to earlier. This in turn brought the expectation
> (which is normally met) that per-cpu kthreads will stop/park or
> otherwise make themselves scarce when the CPU goes down. We can no
> longer force migrate them.

Thanks for explaining the rationale.

>
> Workqueues are the sole exception to that; they've got some really
> 'dodgy' hotplug behaviour.
>

Indeed.  No one wants to wait for workqueue during hot-unplug, so we have
to do something after the fact.


Re: [PATCH 00/10] workqueue: break affinity initiatively

2020-12-15 Thread Peter Zijlstra
On Tue, Dec 15, 2020 at 04:14:26PM +0800, Lai Jiangshan wrote:
> On Tue, Dec 15, 2020 at 3:50 PM Peter Zijlstra  wrote:
> >
> > On Tue, Dec 15, 2020 at 01:44:53PM +0800, Lai Jiangshan wrote:
> > > I don't know how the scheduler distinguishes all these
> > > different cases under the "new assumption".
> >
> > The special case is:
> >
> >   (p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1
> >
> >
> 
> So unbound per-node workers can possibly match this test, which means code
> is needed to handle unbound workers/pools; this patchset provides that.

Curious; how could a per-node worker match this? Only if the node is a
single CPU, or otherwise too?

> Is this the check in is_per_cpu_kthread()? I think I should also use
> this function in workqueue and not break affinity for unbound workers
> that have more than one CPU.

Yes, that function captures it. If you want to use it, feel free to move
it to include/linux/sched.h.
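
For reference, that helper looked roughly like this at the time (it lived
in kernel/sched/sched.h; quoted here for context):

  static inline bool is_per_cpu_kthread(struct task_struct *p)
  {
          if (!(p->flags & PF_KTHREAD))
                  return false;

          if (p->nr_cpus_allowed != 1)
                  return false;

          return true;
  }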

This class of threads is 'special', since it needs to violate the
regular hotplug rules, and migrate_disable() made it just this little
bit more special. It basically comes down to how we need certain per-cpu
kthreads to run on a CPU while it's brought up, before userspace is
allowed on, and similarly they need to run on the CPU after userspace is
no longer allowed on in order to bring it down.

(IOW, they must be allowed to violate the active mask)

Due to migrate_disable() we had to move the migration code from the very
last cpu-down stage, to earlier. This in turn brought the expectation
(which is normally met) that per-cpu kthreads will stop/park or
otherwise make themselves scarce when the CPU goes down. We can no
longer force migrate them.

Workqueues are the sole exception to that; they've got some really
'dodgy' hotplug behaviour.



Re: [PATCH 00/10] workqueue: break affinity initiatively

2020-12-15 Thread Lai Jiangshan
On Tue, Dec 15, 2020 at 3:50 PM Peter Zijlstra  wrote:
>
> On Tue, Dec 15, 2020 at 01:44:53PM +0800, Lai Jiangshan wrote:
> > I don't know how the scheduler distinguishes all these
> > different cases under the "new assumption".
>
> The special case is:
>
>   (p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1
>
>

So unbound per-node workers can possibly match this test, which means code
is needed to handle unbound workers/pools; this patchset provides that.

Is this the check in is_per_cpu_kthread()? I think I should also use
this function in workqueue and not break affinity for unbound workers
that have more than one CPU.


Re: [PATCH 00/10] workqueue: break affinity initiatively

2020-12-14 Thread Peter Zijlstra
On Tue, Dec 15, 2020 at 01:44:53PM +0800, Lai Jiangshan wrote:
> I don't know how the scheduler distinguishes all these
> different cases under the "new assumption".

The special case is:

  (p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1




Re: [PATCH 00/10] workqueue: break affinity initiatively

2020-12-14 Thread Lai Jiangshan
On Tue, Dec 15, 2020 at 1:36 AM Peter Zijlstra  wrote:
>
> On Mon, Dec 14, 2020 at 11:54:47PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan 
> >
> > 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > said that the scheduler will no longer force-break affinity for us.
> >
> > But workqueue depends heavily on the old behavior. Many parts of the code
> > rely on it; 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > is not enough to change that, and the commit itself has flaws.
> >
> > We need to thoroughly update the way workqueue handles affinity
> > on CPU hot[un]plug, which is what this patchset intends to do; it also
> > replaces Valentin Schneider's patch [1].
>
> So the actual problem is with per-cpu kthreads, the new assumption is
> that hot-un-plug will make all per-cpu kthreads for the dying CPU go
> away.

Hello, Peter

"new assumption" is all needed to be aligned. I haven't read the code.
I thought I understood to some extent which is enough for me to know
that workqueue does violate that.

Workqueue does not break affinity for all per-cpu kthreads in several
cases, such as hot-un-plug and workers detaching from their pool (those
workers will no longer be reachable from the pools and should be handled
the same way as on hot-un-plug).

But workqueue has not only per-cpu kthreads but also per-node kthreads,
and per-node kthreads may be bound to multiple CPUs or to a single CPU.
I don't know how the scheduler distinguishes all these different cases
under the "new assumption", but at least workqueue handles these
different cases in the same few places.  Since workqueue has to "break
affinity" for per-cpu kthreads, it can also "break affinity" in the
other cases.  Making workqueue not rely on the scheduler's work to
"break affinity" at all is worth doing, since we have to do it for most
of the cases anyway.

I haven't read the code behind the "new assumption"; if possible, I'll
first try to find out how the scheduler handles these cases:

If a per-node kthread is bound only to CPU 4 and that CPU goes down,
does workqueue need to "break affinity" for it?

If a per-node kthread is bound only to CPUs 41 and 42 and both go down,
does workqueue need to "break affinity" for it?
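
For illustration only (this is just the PF_KTHREAD test quoted earlier in
the thread, evaluated for the two examples above; it is not an answer from
the scheduler side):

  /* scheduler's special case, as quoted above */
  (p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1

  /* per-node worker bound only to CPU 4     -> nr_cpus_allowed == 1 -> matches  */
  /* per-node worker bound to CPUs 41 and 42 -> nr_cpus_allowed == 2 -> no match */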

Thanks
Lai

>
> Workqueues violated that. I fixed the obvious site, and Valentin's patch
> prevents workqueues from quickly creating new ones while we're not
> looking.
>
> What other problems did you find?


Re: [PATCH 00/10] workqueue: break affinity initiatively

2020-12-14 Thread Peter Zijlstra
On Mon, Dec 14, 2020 at 11:54:47PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan 
> 
> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> said that the scheduler will no longer force-break affinity for us.
> 
> But workqueue depends heavily on the old behavior. Many parts of the code
> rely on it; 06249738a41a ("workqueue: Manually break affinity on hotplug")
> is not enough to change that, and the commit itself has flaws.
> 
> We need to thoroughly update the way workqueue handles affinity
> on CPU hot[un]plug, which is what this patchset intends to do; it also
> replaces Valentin Schneider's patch [1].

So the actual problem is with per-cpu kthreads, the new assumption is
that hot-un-plug will make all per-cpu kthreads for the dying CPU go
away.

Workqueues violated that. I fixed the obvious site, and Valentin's patch
prevents workqueues from quickly creating new ones while we're not
looking.

What other problems did you find?


[PATCH 00/10] workqueue: break affinity initiatively

2020-12-14 Thread Lai Jiangshan
From: Lai Jiangshan 

06249738a41a ("workqueue: Manually break affinity on hotplug")
said that the scheduler will no longer force-break affinity for us.

But workqueue depends heavily on the old behavior. Many parts of the code
rely on it; 06249738a41a ("workqueue: Manually break affinity on hotplug")
is not enough to change that, and the commit itself has flaws.

We need to thoroughly update the way workqueue handles affinity
on CPU hot[un]plug, which is what this patchset intends to do; it also
replaces Valentin Schneider's patch [1].

Patch 1 fixes a flaw reported by Hillf Danton .
I have to include this fix because later patches depend on it.

The patchset is based on tip/master rather than the workqueue tree,
because it complements 06249738a41a ("workqueue:
Manually break affinity on hotplug"), which is only in tip/master for now.

[1]: 
https://lore.kernel.org/r/ff62e3ee994efb3620177bf7b19fab16f4866845.ca...@redhat.com

Cc: Hillf Danton 
Cc: Valentin Schneider 
Cc: Qian Cai 
Cc: Peter Zijlstra 
Cc: Vincent Donnefort 
Cc: Tejun Heo 

Lai Jiangshan (10):
  workqueue: restore unbound_workers' cpumask correctly
  workqueue: use cpu_possible_mask instead of cpu_active_mask to break
affinity
  workqueue: Manually break affinity on pool detachment
  workqueue: don't set the worker's cpumask when kthread_bind_mask()
  workqueue: introduce wq_online_cpumask
  workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
  workqueue: Manually break affinity on hotplug for unbound pool
  workqueue: reorganize workqueue_online_cpu()
  workqueue: reorganize workqueue_offline_cpu() unbind_workers()
  workqueue: Fix affinity of kworkers when attaching into pool

 kernel/workqueue.c | 212 +++--
 1 file changed, 130 insertions(+), 82 deletions(-)

-- 
2.19.1.6.gb485710b