Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-09-04 Thread Peter Zijlstra
On Thu, Sep 04, 2014 at 10:22:08AM +0800, Lai Jiangshan wrote: > > dest_cpu = cpumask_any_and(cpu_active_mask, new_mask); > > - if (task_on_rq_queued(p)) { > > + if (task_on_rq_queued(p) || p->state == TASK_WAKING) { > > unrelated question: why we have to stop the cpu even the task is >

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-09-03 Thread Lai Jiangshan
On 09/03/2014 11:15 PM, Peter Zijlstra wrote: > On Mon, Sep 01, 2014 at 11:04:23AM +0800, Lai Jiangshan wrote: >> Hi, Peter >> >> Could you make a patch for it, please? Jason J. Herne's test showed we >> addressed the bug. But the fix is not in kernel yet. Some new highly >> related reports are

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-09-03 Thread Peter Zijlstra
On Mon, Sep 01, 2014 at 11:04:23AM +0800, Lai Jiangshan wrote: > Hi, Peter > > Could you make a patch for it, please? Jason J. Herne's test showed we > addressed the bug. But the fix is not in kernel yet. Some new highly > related reports are come up again. > > I don't want to argue any more,

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-08-31 Thread Lai Jiangshan
Hi, Peter Could you make a patch for it, please? Jason J. Herne's test showed we addressed the bug. But the fix is not in kernel yet. Some new highly related reports are come up again. I don't want to argue any more, no matter how the patch will be, I will accept. And please add the following

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-15 Thread Lai Jiangshan
Hi, Peter Ping... thanks, Lai On 06/10/2014 09:21 AM, Lai Jiangshan wrote: > On 06/09/2014 10:01 PM, Jason J. Herne wrote: >> On 06/05/2014 06:54 AM, Lai Jiangshan wrote: >>> >>> >>> Subject: [PATCH] sched: migrate the waking tasks >>> >>> Current code skips to migrate the waking

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-09 Thread Lai Jiangshan
On 06/09/2014 10:01 PM, Jason J. Herne wrote: > On 06/05/2014 06:54 AM, Lai Jiangshan wrote: >> >> >> Subject: [PATCH] sched: migrate the waking tasks >> >> Current code skips to migrate the waking task silently when TTWU_QUEUE is >> enabled. >> >> When a task is waking, it is

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-09 Thread Jason J. Herne
On 06/05/2014 06:54 AM, Lai Jiangshan wrote: Subject: [PATCH] sched: migrate the waking tasks Current code skips to migrate the waking task silently when TTWU_QUEUE is enabled. When a task is waking, it is pending on the wake_list of the rq, but it is not on queue (task->on_rq

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-07 Thread Lai Jiangshan
On 06/06/2014 09:36 PM, Peter Zijlstra wrote: > On Thu, Jun 05, 2014 at 06:54:35PM +0800, Lai Jiangshan wrote: >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 268a45e..d05a5a1 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -1474,20 +1474,24 @@ static int

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-06 Thread Peter Zijlstra
On Thu, Jun 05, 2014 at 06:54:35PM +0800, Lai Jiangshan wrote: > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 268a45e..d05a5a1 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int > wake_flags)

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-06 Thread Jason J. Herne
On 06/05/2014 06:54 AM, Lai Jiangshan wrote: The patch is not tested by Jason, I don't know whether the patch fix the problem. The changlog including the "Reported-by:" and "Tested-by:" need to be updated after it is proved. With this patch, my workload ran overnight without hitting the

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-05 Thread Jason J. Herne
On 06/05/2014 06:54 AM, Lai Jiangshan wrote: The patch is not tested by Jason, I don't know whether the patch fix the problem. The changlog including the "Reported-by:" and "Tested-by:" need to be updated after it is proved. I will test this one and get back to you as soon as possible. -- --

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-05 Thread Lai Jiangshan
The patch is not tested by Jason, I don't know whether the patch fix the problem. The changlog including the "Reported-by:" and "Tested-by:" need to be updated after it is proved. Subject: [PATCH] sched: migrate the waking tasks Current code skips to migrate the waking task

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-04 Thread Peter Zijlstra
On Wed, Jun 04, 2014 at 04:25:15PM +0800, Lai Jiangshan wrote: > I think the following code works. (inspirited from the sched_ttwu_pending() > in migration_call().) > > if p->on_rq == 0 && p->state == TASK_WAKING in __migrate_task() after this > patch, > it means the cpuallowed is changed

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-04 Thread Lai Jiangshan
On 06/04/2014 02:49 PM, Peter Zijlstra wrote: > On Wed, Jun 04, 2014 at 10:27:25AM +0800, Lai Jiangshan wrote: >>> Hmm, yes I think you're right. A queued wakeup can miss an affinity >>> change like that. >>> >>> Something like the below ought to cure that I suppose.. >> >> As a non-scheduler

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-04 Thread Peter Zijlstra
On Wed, Jun 04, 2014 at 10:27:25AM +0800, Lai Jiangshan wrote: > > Hmm, yes I think you're right. A queued wakeup can miss an affinity > > change like that. > > > > Something like the below ought to cure that I suppose.. > > As a non-scheduler developer, I can't find anything wrong with the

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-04 Thread Peter Zijlstra
On Wed, Jun 04, 2014 at 10:28:50AM +0800, Lai Jiangshan wrote: > On 06/03/2014 07:24 PM, Lai Jiangshan wrote: > > Hi, Jason > > > > Could you test again after the following command has done. > > (if Peter hasn't asked you test with this command before nor he doesn't > > stop you now) > > > >

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-03 Thread Lai Jiangshan
On 06/03/2014 07:24 PM, Lai Jiangshan wrote: > Hi, Jason > > Could you test again after the following command has done. > (if Peter hasn't asked you test with this command before nor he doesn't stop > you now) > > echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features Off-topic! Why

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-03 Thread Lai Jiangshan
On 06/03/2014 10:16 PM, Peter Zijlstra wrote: > On Tue, Jun 03, 2014 at 07:24:38PM +0800, Lai Jiangshan wrote: >> Hi, Jason >> >> Could you test again after the following command has done. >> (if Peter hasn't asked you test with this command before nor he doesn't stop >> you now) >> >> echo

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-03 Thread Lai Jiangshan
On 06/03/2014 10:28 PM, Peter Zijlstra wrote: > On Tue, Jun 03, 2014 at 08:45:39PM +0800, Lai Jiangshan wrote: >> >> Hi, Peter, >> >> I rewrote the analyse. (scheduler_ipi() must be called before stopper-task, >> so the part for workqueue of the old analyse maybe be wrong.) > > But I don't think

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-03 Thread Peter Zijlstra
On Tue, Jun 03, 2014 at 08:45:39PM +0800, Lai Jiangshan wrote: > > Hi, Peter, > > I rewrote the analyse. (scheduler_ipi() must be called before stopper-task, > so the part for workqueue of the old analyse maybe be wrong.) But I don't think there is any guarantee we'll do the wakeup before

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-03 Thread Peter Zijlstra
On Tue, Jun 03, 2014 at 07:24:38PM +0800, Lai Jiangshan wrote: > Hi, Jason > > Could you test again after the following command has done. > (if Peter hasn't asked you test with this command before nor he doesn't stop > you now) > > echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features > >

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-03 Thread Lai Jiangshan
Hi, Peter, I rewrote the analyse. (scheduler_ipi() must be called before stopper-task, so the part for workqueue of the old analyse maybe be wrong.) I found something strange by review (just by review, no test yet) int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-06-03 Thread Lai Jiangshan
Hi, Jason Could you test again after the following command has done. (if Peter hasn't asked you test with this command before nor he doesn't stop you now) echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features Thanks a lot. Hi, Peter, I found something strange by review (just by review, no

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-29 Thread Jason J. Herne
On 05/27/2014 10:26 AM, Peter Zijlstra wrote: On Tue, May 27, 2014 at 10:18:31AM -0400, Jason J. Herne wrote: On 05/16/2014 12:29 PM, Peter Zijlstra wrote: On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote: so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-27 Thread Peter Zijlstra
On Tue, May 27, 2014 at 10:18:31AM -0400, Jason J. Herne wrote: > On 05/16/2014 12:29 PM, Peter Zijlstra wrote: > >On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote: > >>so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first > >>place to fix. > > > >I'm not arguing

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-27 Thread Jason J. Herne
On 05/16/2014 12:29 PM, Peter Zijlstra wrote: On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote: so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first place to fix. I'm not arguing about that, not to mention that this is userspace exposed and nobody protects

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Peter Zijlstra
On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote: > so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first > place to fix. I'm not arguing about that, not to mention that this is userspace exposed and nobody protects that. But I was expecting kernel stuff that

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Lai Jiangshan
On Fri, May 16, 2014 at 7:57 PM, Peter Zijlstra wrote: > On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote: >> Hi, Peter and other scheduler Gurus: >> >> When I was trying to test wq-VS-hotplug, I always hit a problem in scheduler >> with the following WARNING: >> >> [ 74.765519]

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Tejun Heo
On Fri, May 16, 2014 at 02:14:05PM +0200, Thomas Gleixner wrote: > On Fri, 16 May 2014, Tejun Heo wrote: > > On Fri, May 16, 2014 at 01:57:37PM +0200, Peter Zijlstra wrote: > > > This of course leaves the question how the workqueue code manages to > > > call set_cpu_allowed_ptr() on a cpu _before_

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Thomas Gleixner
On Fri, 16 May 2014, Tejun Heo wrote: > On Fri, May 16, 2014 at 01:57:37PM +0200, Peter Zijlstra wrote: > > This of course leaves the question how the workqueue code manages to > > call set_cpu_allowed_ptr() on a cpu _before_ its online. > > > > That too sounds fishy.. with the proposed patch the

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Tejun Heo
Hello, Peter. On Fri, May 16, 2014 at 01:57:37PM +0200, Peter Zijlstra wrote: > This of course leaves the question how the workqueue code manages to > call set_cpu_allowed_ptr() on a cpu _before_ its online. > > That too sounds fishy.. with the proposed patch the > set_cpus_allowed_ptr() will

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Peter Zijlstra
On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote: > Hi, Peter and other scheduler Gurus: > > When I was trying to test wq-VS-hotplug, I always hit a problem in scheduler > with the following WARNING: > > [ 74.765519] WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 >

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Peter Zijlstra
On Fri, May 16, 2014 at 12:16:43PM +0200, Peter Zijlstra wrote: > On Fri, May 16, 2014 at 12:15:05PM +0200, Peter Zijlstra wrote: > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -5126,7 +5126,6 @@ static int sched_cpu_active(struct notif > >

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Peter Zijlstra
On Fri, May 16, 2014 at 05:56:10PM +0800, Lai Jiangshan wrote: > On 05/16/2014 05:35 PM, Peter Zijlstra wrote: > > On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote: > >> After debugging, I found the hotlug-in cpu is atctive but !online in this > >> case. > >> the problem was

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Peter Zijlstra
On Fri, May 16, 2014 at 12:15:05PM +0200, Peter Zijlstra wrote: > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -5126,7 +5126,6 @@ static int sched_cpu_active(struct notif > unsigned long action, void *hcpu) > { > switch (action &

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Peter Zijlstra
On Fri, May 16, 2014 at 11:35:30AM +0200, Peter Zijlstra wrote: > On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote: > > After debugging, I found the hotlug-in cpu is atctive but !online in this > > case. > > the problem was introduced by 5fbd036b. > > Some code assumes that any cpu

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Lai Jiangshan
On 05/16/2014 05:35 PM, Peter Zijlstra wrote: > On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote: >> After debugging, I found the hotlug-in cpu is atctive but !online in this >> case. >> the problem was introduced by 5fbd036b. >> Some code assumes that any cpu in cpu_active_mask is

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-16 Thread Peter Zijlstra
On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote: > After debugging, I found the hotlug-in cpu is atctive but !online in this > case. > the problem was introduced by 5fbd036b. > Some code assumes that any cpu in cpu_active_mask is also online, but > 5fbd036b breaks > this

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-15 Thread Lai Jiangshan
On 05/15/2014 12:52 AM, Jason J. Herne wrote: > On 05/12/2014 10:17 PM, Sasha Levin wrote: >> I don't have an easy way to reproduce it as I only saw the bug once, but >> it happened when I started pressuring CPU hotplug paths by adding and >> removing >> CPUs often. Maybe it has anything to do

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-14 Thread Jason J. Herne
On 05/12/2014 10:17 PM, Sasha Levin wrote: I don't have an easy way to reproduce it as I only saw the bug once, but it happened when I started pressuring CPU hotplug paths by adding and removing CPUs often. Maybe it has anything to do with that? As per the original report

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-12 Thread Sasha Levin
On 05/12/2014 10:19 PM, Lai Jiangshan wrote: > On 05/13/2014 04:01 AM, Tejun Heo wrote: >> > On Mon, May 12, 2014 at 02:58:55PM -0400, Sasha Levin wrote: >>> >> Hi all, >>> >> >>> >> While fuzzing with trinity inside a KVM tools guest running the latest >>> >> -next >>> >> kernel I've stumbled on

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-12 Thread Lai Jiangshan
On 05/13/2014 04:01 AM, Tejun Heo wrote: > On Mon, May 12, 2014 at 02:58:55PM -0400, Sasha Levin wrote: >> Hi all, >> >> While fuzzing with trinity inside a KVM tools guest running the latest -next >> kernel I've stumbled on the following spew: >> >> [ 1297.886670] WARNING: CPU: 0 PID: 190 at

Re: workqueue: WARN at at kernel/workqueue.c:2176

2014-05-12 Thread Tejun Heo
On Mon, May 12, 2014 at 02:58:55PM -0400, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel I've stumbled on the following spew: > > [ 1297.886670] WARNING: CPU: 0 PID: 190 at kernel/workqueue.c:2176 >

workqueue: WARN at at kernel/workqueue.c:2176

2014-05-12 Thread Sasha Levin
Hi all, While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew: [ 1297.886670] WARNING: CPU: 0 PID: 190 at kernel/workqueue.c:2176 process_one_work+0xb5/0x6f0() [ 1297.889216] Modules linked in: [ 1297.890306] CPU: 0 PID: 190
