Re: [PATCH 02/10] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity

2020-12-16 Thread Tejun Heo
Hello,

On Mon, Dec 14, 2020 at 11:54:49PM +0800, Lai Jiangshan wrote:
> @@ -4909,8 +4909,9 @@ static void unbind_workers(int cpu)
>  
>   raw_spin_unlock_irq(>lock);
>  
> + /* don't rely on the scheduler to force break affinity for us. 
> */

I'm not sure this comment is helpful. The comment may make sense right now
while the scheduler behavior is changing but down the line it's not gonna
make whole lot of sense.

>   for_each_pool_worker(worker, pool)
> - WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> cpu_active_mask) < 0);
> + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> cpu_possible_mask) < 0);
>  
>   mutex_unlock(_pool_attach_mutex);

Thanks.

-- 
tejun


Re: [PATCH 02/10] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity

2020-12-15 Thread Peter Zijlstra
On Mon, Dec 14, 2020 at 06:25:34PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 14, 2020 at 11:54:49PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan 
> > 
> > There might be other CPU online. The workers losing binding on its CPU
> > should have chance to work on those later onlined CPUs.
> > 
> > Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > Signed-off-by: Lai Jiangshan 
> > ---
> >  kernel/workqueue.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index aba71ab359dd..1f5b8385c0cf 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -4909,8 +4909,9 @@ static void unbind_workers(int cpu)
> >  
> > raw_spin_unlock_irq(>lock);
> >  
> > +   /* don't rely on the scheduler to force break affinity for us. 
> > */
> > for_each_pool_worker(worker, pool)
> > -   WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> > cpu_active_mask) < 0);
> > +   WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> > cpu_possible_mask) < 0);
> 
> Please explain this one.. it's not making sense. Also the Changelog
> doesn't seem remotely related to the actual change.
> 
> Afaict this is actively wrong.

I think I was too tired, I see what you're doing now and it should work
fine, I still think the changelog could use help though.


Re: [PATCH 02/10] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity

2020-12-15 Thread Lai Jiangshan
On Tue, Dec 15, 2020 at 1:25 AM Peter Zijlstra  wrote:
>
> On Mon, Dec 14, 2020 at 11:54:49PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan 
> >
> > There might be other CPU online. The workers losing binding on its CPU
> > should have chance to work on those later onlined CPUs.
> >
> > Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > Signed-off-by: Lai Jiangshan 
> > ---
> >  kernel/workqueue.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index aba71ab359dd..1f5b8385c0cf 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -4909,8 +4909,9 @@ static void unbind_workers(int cpu)
> >
> >   raw_spin_unlock_irq(>lock);
> >
> > + /* don't rely on the scheduler to force break affinity for 
> > us. */
> >   for_each_pool_worker(worker, pool)
> > - WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> > cpu_active_mask) < 0);
> > + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> > cpu_possible_mask) < 0);
>
> Please explain this one.. it's not making sense. Also the Changelog
> doesn't seem remotely related to the actual change.

If the scheduler doesn't break affinity for us any more, I hope that
we can "emulate" previous behavior when the scheduler did breaks affinity
for us. The behavior is "changing the cpumask to cpu_possible_mask".

And there might be some other CPUs online later while the worker is
still running with the pending work items.  I hope the worker can also
use the later online CPUs as before.  If we use cpu_active_mask here,
we can't achieve this.   This is what the changelog said.  I don't know
which wording is better, I will combine both if this reason stands.


>
> Afaict this is actively wrong.
>
> Also, can you please not Cc me parts of a series? That's bloody
> annoying.


Sorry about it.  I was taught "once don't send the whole series to
someone" and very probably I missed the conditions about it.  I think
in this case, I should Cc you the whole series.  May I?

Thanks
Lai


Re: [PATCH 02/10] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity

2020-12-14 Thread Peter Zijlstra
On Mon, Dec 14, 2020 at 11:54:49PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan 
> 
> There might be other CPU online. The workers losing binding on its CPU
> should have chance to work on those later onlined CPUs.
> 
> Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
> Signed-off-by: Lai Jiangshan 
> ---
>  kernel/workqueue.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index aba71ab359dd..1f5b8385c0cf 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4909,8 +4909,9 @@ static void unbind_workers(int cpu)
>  
>   raw_spin_unlock_irq(>lock);
>  
> + /* don't rely on the scheduler to force break affinity for us. 
> */
>   for_each_pool_worker(worker, pool)
> - WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> cpu_active_mask) < 0);
> + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
> cpu_possible_mask) < 0);

Please explain this one.. it's not making sense. Also the Changelog
doesn't seem remotely related to the actual change.

Afaict this is actively wrong.

Also, can you please not Cc me parts of a series? That's bloody
annoying.


[PATCH 02/10] workqueue: use cpu_possible_mask instead of cpu_active_mask to break affinity

2020-12-14 Thread Lai Jiangshan
From: Lai Jiangshan 

There might be other CPU online. The workers losing binding on its CPU
should have chance to work on those later onlined CPUs.

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aba71ab359dd..1f5b8385c0cf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4909,8 +4909,9 @@ static void unbind_workers(int cpu)
 
raw_spin_unlock_irq(>lock);
 
+   /* don't rely on the scheduler to force break affinity for us. 
*/
for_each_pool_worker(worker, pool)
-   WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
cpu_active_mask) < 0);
+   WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, 
cpu_possible_mask) < 0);
 
mutex_unlock(_pool_attach_mutex);
 
-- 
2.19.1.6.gb485710b