Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-19 Thread Peter Zijlstra
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote: > > Ok, thanks for the update. I agree that we should be clearing the mapping > at node hot-remove since any cpu that would subsequently get onlined and > assume one of the previous cpu's ids is not guaranteed to have the same >

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-19 Thread Peter Zijlstra
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote: Ok, thanks for the update. I agree that we should be clearing the mapping at node hot-remove since any cpu that would subsequently get onlined and assume one of the previous cpu's ids is not guaranteed to have the same affinity.

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes
On Thu, 18 Oct 2012, Tang Chen wrote: > We are working on this problem. Since it is complicated, it really > takes us some time. Sorry for the delay. :) > > Actually, we intend to clear cpu-to-node mappings when a whole node is > removed. But the node hot-plug code is still under development, so

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread Tang Chen
On 10/18/2012 08:52 AM, David Rientjes wrote: On Wed, 10 Oct 2012, David Rientjes wrote: Ok, so it's been a week and these patches are still in -mm. This is what I was afraid of: patches that both Peter and I nacked sitting in -mm and allow a NULL pointer dereference because no alternative

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes
On Wed, 10 Oct 2012, David Rientjes wrote: > It fixes a BUG() that only affects users who are doing node hot-remove, > which is still radically under development, and nobody cares about except > those on the cc list, but it also introduces the NULL pointer dereference > that is attempting to

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes
On Wed, 10 Oct 2012, David Rientjes wrote: It fixes a BUG() that only affects users who are doing node hot-remove, which is still radically under development, and nobody cares about except those on the cc list, but it also introduces the NULL pointer dereference that is attempting to be

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread Tang Chen
On 10/18/2012 08:52 AM, David Rientjes wrote: On Wed, 10 Oct 2012, David Rientjes wrote: Ok, so it's been a week and these patches are still in -mm. This is what I was afraid of: patches that both Peter and I nacked sitting in -mm and allow a NULL pointer dereference because no alternative

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes
On Thu, 18 Oct 2012, Tang Chen wrote: We are working on this problem. Since it is complicated, it really takes us some time. Sorry for the delay. :) Actually, we intend to clear cpu-to-node mappings when a whole node is removed. But the node hot-plug code is still under development, so I

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes
On Wed, 10 Oct 2012, Andrew Morton wrote: > > > So for now, let me NACK that patch. You cannot go change stuff like > > > that. > > > > > > > Agreed, that makes the nack-count up to 2 now. Andrew, please remove > > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch > >

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Andrew Morton
On Wed, 10 Oct 2012 13:30:29 -0700 (PDT) David Rientjes wrote: > > So for now, let me NACK that patch. You cannot go change stuff like > > that. > > > > Agreed, that makes the nack-count up to 2 now. Andrew, please remove > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch >

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes
On Wed, 10 Oct 2012, Peter Zijlstra wrote: > > If cpu_to_node() always returns a valid node id even if all cpus on the > > node are offline, then the cpumask_of_node() implementation, which the > > sched code is using, should either return an empty cpumask (if > > node_to_cpumask_map[nid]

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote: > I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc > you when I post that patch. That script doesn't look at all usage sites of the code you modify does it? You need to audit the entire tree for usage of the

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang
At 10/10/2012 05:51 PM, Peter Zijlstra Wrote: > On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: >> >> Hmm, if per-cpu memory is preserved, and we can't offline and remove >> this memory. So we can't offline the node. >> >> But, if the node is hot added, and per-cpu memory doesn't use the >>

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: > > Hmm, if per-cpu memory is preserved, and we can't offline and remove > this memory. So we can't offline the node. > > But, if the node is hot added, and per-cpu memory doesn't use the > memory on this node. We can hotremove cpu/memory on

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang
At 10/10/2012 05:10 PM, Peter Zijlstra Wrote: > On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: >> On Tue, 9 Oct 2012, Peter Zijlstra wrote: >> >>> Well the code they were patching is in the wakeup path. As I think Tang >>> said, we leave !runnable tasks on whatever cpu they ran on last,

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > Well the code they were patching is in the wakeup path. As I think Tang > > said, we leave !runnable tasks on whatever cpu they ran on last, even if > > that cpu is offlined, we try and fix

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang
At 10/10/2012 05:10 PM, Peter Zijlstra Wrote: On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: Hmm, if per-cpu memory is preserved, and we can't offline and remove this memory. So we can't offline the node. But, if the node is hot added, and per-cpu memory doesn't use the memory on this node. We can hotremove cpu/memory on this

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang
At 10/10/2012 05:51 PM, Peter Zijlstra Wrote: On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: Hmm, if per-cpu memory is preserved, and we can't offline and remove this memory. So we can't offline the node. But, if the node is hot added, and per-cpu memory doesn't use the memory on

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote: I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc you when I post that patch. That script doesn't look at all usage sites of the code you modify does it? You need to audit the entire tree for usage of the

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes
On Wed, 10 Oct 2012, Peter Zijlstra wrote: If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed)

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Andrew Morton
On Wed, 10 Oct 2012 13:30:29 -0700 (PDT) David Rientjes rient...@google.com wrote: So for now, let me NACK that patch. You cannot go change stuff like that. Agreed, that makes the nack-count up to 2 now. Andrew, please remove

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes
On Wed, 10 Oct 2012, Andrew Morton wrote: So for now, let me NACK that patch. You cannot go change stuff like that. Agreed, that makes the nack-count up to 2 now. Andrew, please remove cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/10/2012 10:06 AM, Wen Congyang Wrote: > At 10/10/2012 07:27 AM, David Rientjes Wrote: >> On Tue, 9 Oct 2012, Peter Zijlstra wrote: >> >>> Well the code they were patching is in the wakeup path. As I think Tang >>> said, we leave !runnable tasks on whatever cpu they ran on last, even if >>>

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/10/2012 07:27 AM, David Rientjes Wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > >> Well the code they were patching is in the wakeup path. As I think Tang >> said, we leave !runnable tasks on whatever cpu they ran on last, even if >> that cpu is offlined, we try and fix up state when

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Peter Zijlstra wrote: > Well the code they were patching is in the wakeup path. As I think Tang > said, we leave !runnable tasks on whatever cpu they ran on last, even if > that cpu is offlined, we try and fix up state when we get a wakeup. > > On wakeup, it tries to find a

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > > > return -1. As a result, cpumask_of_node(nid) will

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Peter Zijlstra wrote: > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, > > find_next_bit() in for_each_cpu will

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Wen Congyang wrote: > I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined, > it will be offlined before clearing cpu-to-node mapping. > > Here is the code in driver/acpi/processor_driver.c: > = > static int

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/09/2012 06:04 PM, David Rientjes Wrote: > On Tue, 9 Oct 2012, Tang Chen wrote: > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. >> >> As Wen said below, nid is now set to -1

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Tang Chen wrote: > > > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this > > > should be called at CPU_DYING level and migrate_tasks() still sees a valid > > > cpu. > > As Wen said below, nid is now set to -1 when cpu is hotremoved. > I reproduce this

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Tang Chen
Hi David, Thanks for reviewing this patch. :) On 10/09/2012 04:34 PM, Wen Congyang wrote: At 10/09/2012 02:21 PM, David Rientjes Wrote: On Mon, 8 Oct 2012, Tang Chen wrote: + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { NUMA_NO_NODE. Yes,

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/09/2012 02:21 PM, David Rientjes Wrote: > On Mon, 8 Oct 2012, Tang Chen wrote: > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 66b36ab..e76dce9 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process);

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Mon, 8 Oct 2012, Tang Chen wrote: > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 66b36ab..e76dce9 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); > */ > static int select_fallback_rq(int cpu, struct

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Mon, 8 Oct 2012, Tang Chen wrote: diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 66b36ab..e76dce9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); */ static int select_fallback_rq(int cpu, struct

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/09/2012 02:21 PM, David Rientjes Wrote: On Mon, 8 Oct 2012, Tang Chen wrote: diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 66b36ab..e76dce9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); */

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Tang Chen
Hi David, Thanks for reviewing this patch. :) On 10/09/2012 04:34 PM, Wen Congyang wrote: At 10/09/2012 02:21 PM, David Rientjes Wrote: On Mon, 8 Oct 2012, Tang Chen wrote: + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { NUMA_NO_NODE. Yes,

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Tang Chen wrote: Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. As Wen said below, nid is now set to -1 when cpu is hotremoved. I reproduce this problem in this

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/09/2012 06:04 PM, David Rientjes Wrote: On Tue, 9 Oct 2012, Tang Chen wrote: Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. As Wen said below, nid is now set to -1 when cpu is

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Wen Congyang wrote: I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined, it will be offlined before clearing cpu-to-node mapping. Here is the code in driver/acpi/processor_driver.c: = static int acpi_processor_handle_eject(struct

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Peter Zijlstra wrote: On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL.

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes
On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/10/2012 07:27 AM, David Rientjes Wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang
At 10/10/2012 10:06 AM, Wen Congyang Wrote: At 10/10/2012 07:27 AM, David Rientjes Wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is

[PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-07 Thread Tang Chen
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Here is a call trace: [ 609.824017] Call Trace: [ 609.824017] [

[PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-07 Thread Tang Chen
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Here is a call trace: [ 609.824017] Call Trace: [ 609.824017] IRQ [