Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)

2014-12-08 Thread Ingo Molnar

* Anton Blanchard an...@samba.org wrote:

 I have a busy ppc64le KVM box where guests sometimes hit the 
 infamous kernel BUG at kernel/smpboot.c:134! issue during 
 boot:
 
 BUG_ON(td-cpu != smp_processor_id());
 
 Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
 output confirms it:
 
 CPU: 0
 Comm: watchdog/130
 
 The issue is in kthread_bind where we set the cpus_allowed 
 mask, but do not touch task_thread_info(p)-cpu. The scheduler 
 assumes the previously scheduled CPU is in the cpus_allowed 
 mask, but in this case we are moving a thread to another CPU so 
 it is not.
 
 We used to call set_task_cpu which sets 
 task_thread_info(p)-cpu (in fact kthread_bind still has a 
 comment suggesting this). That was removed in e2912009fb7b 
 (sched: Ensure set_task_cpu() is never called on blocked 
 tasks).
 
 Since we cannot call set_task_cpu (the task is in a sleeping 
 state), just do an explicit set of task_thread_info(p)-cpu.

So we cannot call set_task_cpu() because in the normal life time 
of a task the -cpu value gets set on wakeup. So if a task is 
blocked right now, and its affinity changes, it ought to get a 
correct -cpu selected on wakeup. The affinity mask and the 
current value of -cpu getting out of sync is thus 'normal'.

(Check for example how set_cpus_allowed_ptr() works: we first set 
the new allowed mask, then do we migrate the task away if 
necessary.)

In the kthread_bind() case this is explicitly assumed: it only 
calls do_set_cpus_allowed().

But obviously the bug triggers in kernel/smpboot.c, and that 
assert shows a real bug - and your patch makes the assert go 
away, so the question is, how did the kthread get woken up and 
put on a runqueue without its -cpu getting set?

One possibility is a generic scheduler bug in ttwu(), resulting 
in -cpu not getting set properly. If this was the case then 
other places would be blowing up as well, and I don't think we 
are seeing this currently, especially not over such a long 
timespan.

Another possibility would be that kthread_bind()'s assumption 
that the task is inactive is false: if the task activates when we 
think it's blocked and we just hotplug-migrate it away while its 
running (setting its td-cpu?), the assert could trigger I think 
- and the patch would make the assert go away.

A third possibility would be, if this is a freshly created 
thread, some sort of initialization race - either in the kthread 
or in the scheduler code.

Weird.

Thanks,

Ingo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)

2014-12-08 Thread Anton Blanchard

Hi Ingo,

 So we cannot call set_task_cpu() because in the normal life time 
 of a task the -cpu value gets set on wakeup. So if a task is 
 blocked right now, and its affinity changes, it ought to get a 
 correct -cpu selected on wakeup. The affinity mask and the 
 current value of -cpu getting out of sync is thus 'normal'.
 
 (Check for example how set_cpus_allowed_ptr() works: we first set 
 the new allowed mask, then do we migrate the task away if 
 necessary.)
 
 In the kthread_bind() case this is explicitly assumed: it only 
 calls do_set_cpus_allowed().
 
 But obviously the bug triggers in kernel/smpboot.c, and that 
 assert shows a real bug - and your patch makes the assert go 
 away, so the question is, how did the kthread get woken up and 
 put on a runqueue without its -cpu getting set?

I started going down this line earlier today, and found things like:

select_task_rq_fair:

if (p-nr_cpus_allowed == 1)
return prev_cpu;

I tried returning cpumask_first(tsk_cpus_allowed()) instead, and while
I couldn't hit the BUG I did manage to get a scheduler lockup during
testing.

At that point I thought the previous task_cpu() was somewhat ingrained
in the scheduler and came up with the patch. If not, we could go on a
hunt to see what else needs fixing.

Anton
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)

2014-12-08 Thread Steven Rostedt
On Mon,  8 Dec 2014 14:27:01 +1100
Anton Blanchard an...@samba.org wrote:

 I have a busy ppc64le KVM box where guests sometimes hit the infamous
 kernel BUG at kernel/smpboot.c:134! issue during boot:
 
 BUG_ON(td-cpu != smp_processor_id());
 
 Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
 output confirms it:
 
 CPU: 0
 Comm: watchdog/130
 
 The issue is in kthread_bind where we set the cpus_allowed mask, but do
 not touch task_thread_info(p)-cpu. The scheduler assumes the previously
 scheduled CPU is in the cpus_allowed mask, but in this case we are
 moving a thread to another CPU so it is not.
 

Does this happen always on boot up, and always with the watchdog thread?

I followed the logic that starts the watchdog threads.

watchdog_enable_all_cpus()
  smpboot_register_percpu-thread() {

for_each_online_cpu(cpu) { ... }

Where watchdog_enable_all_cpus() can be called by
lockup_detector_init() before SMP is started, but also by
proc_dowatchdog() which is called by the sysctl commands (after SMP is
up and running).

I noticed there's no get_online_cpus() anywhere, although the
unregister_percpu_thread() has it. Is it possible that we created a
thread on a CPU that wasn't fully online yet?

Perhaps the following patch is needed? Even if this isn't the solution
to this bug, it is probably needed as watchdog_enable_all_cpus() can be
called after boot up too.

-- Steve

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index eb89e1807408..60d35ac5d3f1 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -279,6 +279,7 @@ int smpboot_register_percpu_thread(struct 
smp_hotplug_thread *plug_thread)
unsigned int cpu;
int ret = 0;
 
+   get_online_cpus();
mutex_lock(smpboot_threads_lock);
for_each_online_cpu(cpu) {
ret = __smpboot_create_thread(plug_thread, cpu);
@@ -291,6 +292,7 @@ int smpboot_register_percpu_thread(struct 
smp_hotplug_thread *plug_thread)
list_add(plug_thread-list, hotplug_threads);
 out:
mutex_unlock(smpboot_threads_lock);
+   put_online_cpus();
return ret;
 }
 EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)

2014-12-08 Thread Lai Jiangshan
On 12/08/2014 09:54 PM, Steven Rostedt wrote:
 On Mon,  8 Dec 2014 14:27:01 +1100
 Anton Blanchard an...@samba.org wrote:
 
 I have a busy ppc64le KVM box where guests sometimes hit the infamous
 kernel BUG at kernel/smpboot.c:134! issue during boot:

 BUG_ON(td-cpu != smp_processor_id());

 Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
 output confirms it:

 CPU: 0
 Comm: watchdog/130

 The issue is in kthread_bind where we set the cpus_allowed mask, but do
 not touch task_thread_info(p)-cpu. The scheduler assumes the previously
 scheduled CPU is in the cpus_allowed mask, but in this case we are
 moving a thread to another CPU so it is not.

 
 Does this happen always on boot up, and always with the watchdog thread?
 
 I followed the logic that starts the watchdog threads.
 
 watchdog_enable_all_cpus()
   smpboot_register_percpu-thread() {
 
 for_each_online_cpu(cpu) { ... }
 
 Where watchdog_enable_all_cpus() can be called by
 lockup_detector_init() before SMP is started, but also by
 proc_dowatchdog() which is called by the sysctl commands (after SMP is
 up and running).
 
 I noticed there's no get_online_cpus() anywhere, although the
 unregister_percpu_thread() has it. Is it possible that we created a
 thread on a CPU that wasn't fully online yet?
 
 Perhaps the following patch is needed? Even if this isn't the solution
 to this bug, it is probably needed as watchdog_enable_all_cpus() can be
 called after boot up too.
 
 -- Steve


Hi, Steven, tglx

See this https://lkml.org/lkml/2014/7/30/804
[PATCH] smpboot: add missing get_online_cpus() when register


Thanks,
Lai

 
 diff --git a/kernel/smpboot.c b/kernel/smpboot.c
 index eb89e1807408..60d35ac5d3f1 100644
 --- a/kernel/smpboot.c
 +++ b/kernel/smpboot.c
 @@ -279,6 +279,7 @@ int smpboot_register_percpu_thread(struct 
 smp_hotplug_thread *plug_thread)
   unsigned int cpu;
   int ret = 0;
  
 + get_online_cpus();
   mutex_lock(smpboot_threads_lock);
   for_each_online_cpu(cpu) {
   ret = __smpboot_create_thread(plug_thread, cpu);
 @@ -291,6 +292,7 @@ int smpboot_register_percpu_thread(struct 
 smp_hotplug_thread *plug_thread)
   list_add(plug_thread-list, hotplug_threads);
  out:
   mutex_unlock(smpboot_threads_lock);
 + put_online_cpus();
   return ret;
  }
  EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 .
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)

2014-12-07 Thread Anton Blanchard
I have a busy ppc64le KVM box where guests sometimes hit the infamous
kernel BUG at kernel/smpboot.c:134! issue during boot:

BUG_ON(td-cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

CPU: 0
Comm: watchdog/130

The issue is in kthread_bind where we set the cpus_allowed mask, but do
not touch task_thread_info(p)-cpu. The scheduler assumes the previously
scheduled CPU is in the cpus_allowed mask, but in this case we are
moving a thread to another CPU so it is not.

We used to call set_task_cpu which sets task_thread_info(p)-cpu (in fact
kthread_bind still has a comment suggesting this). That was removed in
e2912009fb7b (sched: Ensure set_task_cpu() is never called on blocked
tasks).

Since we cannot call set_task_cpu (the task is in a sleeping state),
just do an explicit set of task_thread_info(p)-cpu.

Fixes: e2912009fb7b (sched: Ensure set_task_cpu() is never called on blocked 
tasks)
Cc: sta...@vger.kernel.org
Signed-off-by: Anton Blanchard an...@samba.org
---
 kernel/kthread.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 10e489c..e40ab1d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -327,13 +327,14 @@ EXPORT_SYMBOL(kthread_create_on_node);
 
 static void __kthread_bind(struct task_struct *p, unsigned int cpu, long state)
 {
-   /* Must have done schedule() in kthread() before we set_task_cpu */
+   /* Must have done schedule() in kthread() before we change affinity */
if (!wait_task_inactive(p, state)) {
WARN_ON(1);
return;
}
/* It's safe because the task is inactive. */
do_set_cpus_allowed(p, cpumask_of(cpu));
+   task_thread_info(p)-cpu = cpu;
p-flags |= PF_NO_SETAFFINITY;
 }
 
-- 
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)

2014-12-07 Thread Linus Torvalds
On Sun, Dec 7, 2014 at 7:27 PM, Anton Blanchard an...@samba.org wrote:

 Since we cannot call set_task_cpu (the task is in a sleeping state),
 just do an explicit set of task_thread_info(p)-cpu.

Scheduler people: is this sufficient and ok?

The __set_task_cpu() function does various other things too:

set_task_rq(p, cpu);
  #ifdef CONFIG_SMP
/*
 * After -cpu is set up to a new value, task_rq_lock(p, ...) can be
 * successfuly executed on another CPU. We must ensure that updates of
 * per-task data have been completed by this moment.
 */
smp_wmb();
task_thread_info(p)-cpu = cpu;
p-wake_cpu = cpu;
  #endif

which makes me worry about just setting the thread_info-cpu value.
That set_task_rq() initializes various group scheduling things, an
dthat whole wake_cpu thing seems relevant too.

I'm not saying the patch is wrong, I just want confirmation/ack for
it. Although to be honest, I'd rather see this come in through the
scheduler tree anyway.

Hmm? This seems to go back to 2.6.33 if that Fixes line is accurate,
so it's been around forever (almost exactly five years). Comments?

Linus
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)

2014-12-07 Thread Anton Blanchard

Hi Linus,

 The __set_task_cpu() function does various other things too:
 
 set_task_rq(p, cpu);
   #ifdef CONFIG_SMP
 /*
  * After -cpu is set up to a new value, task_rq_lock(p, ...)
 can be
  * successfuly executed on another CPU. We must ensure that
 updates of
  * per-task data have been completed by this moment.
  */
 smp_wmb();
 task_thread_info(p)-cpu = cpu;
 p-wake_cpu = cpu;
   #endif
 
 which makes me worry about just setting the thread_info-cpu value.
 That set_task_rq() initializes various group scheduling things, an
 dthat whole wake_cpu thing seems relevant too.

Yeah, I would definitely like the scheduler guys to weigh in on this,
especially considering how difficult it can be to hit.

Anton
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev