subject:"\[PATCH\] Do not use cpu_to_node\(\) to find an offlined cpu's node."

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-19 Thread Peter Zijlstra

On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote:
> 
> Ok, thanks for the update.  I agree that we should be clearing the mapping 
> at node hot-remove since any cpu that would subsequently get onlined and 
> assume one of the previous cpu's ids is not guaranteed to have the same 
> affinity.

Would this mean we have to remap (and memcpy) per-cpu memory on
node-plug?

> I'm just really hoping that we don't touch the acpi code and that we can 
> remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch 
> and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from 
> -mm.

Yeah, none of this should be anywhere near ACPI, its got nothing to do
with ACPI. Furthermore it should be be same across all archs, not just
be weird and wonderful for x86.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-19 Thread Peter Zijlstra

On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote:
 
 Ok, thanks for the update.  I agree that we should be clearing the mapping 
 at node hot-remove since any cpu that would subsequently get onlined and 
 assume one of the previous cpu's ids is not guaranteed to have the same 
 affinity.

Would this mean we have to remap (and memcpy) per-cpu memory on
node-plug?

 I'm just really hoping that we don't touch the acpi code and that we can 
 remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch 
 and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from 
 -mm.

Yeah, none of this should be anywhere near ACPI, its got nothing to do
with ACPI. Furthermore it should be be same across all archs, not just
be weird and wonderful for x86.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes

On Thu, 18 Oct 2012, Tang Chen wrote:

> We are working on this problem. Since it is complicated, it really
> takes us some time. Sorry for the delay. :)
> 
> Actually, we intend to clear cpu-to-node mappings when a whole node is
> removed. But the node hot-plug code is still under development, so I
> think Wen will send a fix patch soon. :)
> 

Ok, thanks for the update.  I agree that we should be clearing the mapping 
at node hot-remove since any cpu that would subsequently get onlined and 
assume one of the previous cpu's ids is not guaranteed to have the same 
affinity.

I'm just really hoping that we don't touch the acpi code and that we can 
remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch 
and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from 
-mm.

Thanks again!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread Tang Chen


On 10/18/2012 08:52 AM, David Rientjes wrote:

On Wed, 10 Oct 2012, David Rientjes wrote:

Ok, so it's been a week and these patches are still in -mm.  This is what
I was afraid of: patches that both Peter and I nacked sitting in -mm and
allow a NULL pointer dereference because no alternative patch exists yet
to fix the issue correctly.

Tang and Wen, are you intending on addressing these problems (i.e. not
touching the acpi code at all and rather clearing cpu-to-node mappings at
node hot-remove) as we've discussed or do I need to do it myself?


Hi David,

We are working on this problem. Since it is complicated, it really
takes us some time. Sorry for the delay. :)

Actually, we intend to clear cpu-to-node mappings when a whole node is
removed. But the node hot-plug code is still under development, so I
think Wen will send a fix patch soon. :)

Thanks.



Thanks.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes

On Wed, 10 Oct 2012, David Rientjes wrote:

> It fixes a BUG() that only affects users who are doing node hot-remove, 
> which is still radically under development, and nobody cares about except 
> those on the cc list, but it also introduces the NULL pointer dereference 
> that is attempting to be addressed in this patch.  The "fix" that causes 
> this NULL pointer is clearly not the direction we want to go, I think we 
> have agreement at node hot-remove to iterate all possible cpus are map all 
> offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the 
> generic hotplug code.
> 
> Regardless, this shouldn't be touching the acpi code which
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and 
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since 
> it makes the behavior inconsistent across interfaces and architectures.
> 

Ok, so it's been a week and these patches are still in -mm.  This is what 
I was afraid of: patches that both Peter and I nacked sitting in -mm and 
allow a NULL pointer dereference because no alternative patch exists yet 
to fix the issue correctly.

Tang and Wen, are you intending on addressing these problems (i.e. not 
touching the acpi code at all and rather clearing cpu-to-node mappings at 
node hot-remove) as we've discussed or do I need to do it myself?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes

On Wed, 10 Oct 2012, David Rientjes wrote:

 It fixes a BUG() that only affects users who are doing node hot-remove, 
 which is still radically under development, and nobody cares about except 
 those on the cc list, but it also introduces the NULL pointer dereference 
 that is attempting to be addressed in this patch.  The fix that causes 
 this NULL pointer is clearly not the direction we want to go, I think we 
 have agreement at node hot-remove to iterate all possible cpus are map all 
 offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the 
 generic hotplug code.
 
 Regardless, this shouldn't be touching the acpi code which
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and 
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since 
 it makes the behavior inconsistent across interfaces and architectures.
 

Ok, so it's been a week and these patches are still in -mm.  This is what 
I was afraid of: patches that both Peter and I nacked sitting in -mm and 
allow a NULL pointer dereference because no alternative patch exists yet 
to fix the issue correctly.

Tang and Wen, are you intending on addressing these problems (i.e. not 
touching the acpi code at all and rather clearing cpu-to-node mappings at 
node hot-remove) as we've discussed or do I need to do it myself?

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread Tang Chen


On 10/18/2012 08:52 AM, David Rientjes wrote:

On Wed, 10 Oct 2012, David Rientjes wrote:

Ok, so it's been a week and these patches are still in -mm.  This is what
I was afraid of: patches that both Peter and I nacked sitting in -mm and
allow a NULL pointer dereference because no alternative patch exists yet
to fix the issue correctly.

Tang and Wen, are you intending on addressing these problems (i.e. not
touching the acpi code at all and rather clearing cpu-to-node mappings at
node hot-remove) as we've discussed or do I need to do it myself?


Hi David,

We are working on this problem. Since it is complicated, it really
takes us some time. Sorry for the delay. :)

Actually, we intend to clear cpu-to-node mappings when a whole node is
removed. But the node hot-plug code is still under development, so I
think Wen will send a fix patch soon. :)

Thanks.



Thanks.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-17 Thread David Rientjes

On Thu, 18 Oct 2012, Tang Chen wrote:

 We are working on this problem. Since it is complicated, it really
 takes us some time. Sorry for the delay. :)
 
 Actually, we intend to clear cpu-to-node mappings when a whole node is
 removed. But the node hot-plug code is still under development, so I
 think Wen will send a fix patch soon. :)
 

Ok, thanks for the update.  I agree that we should be clearing the mapping 
at node hot-remove since any cpu that would subsequently get onlined and 
assume one of the previous cpu's ids is not guaranteed to have the same 
affinity.

I'm just really hoping that we don't touch the acpi code and that we can 
remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch 
and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from 
-mm.

Thanks again!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes

On Wed, 10 Oct 2012, Andrew Morton wrote:

> > > So for now, let me NACK that patch. You cannot go change stuff like
> > > that.
> > > 
> > 
> > Agreed, that makes the nack-count up to 2 now.  Andrew, please remove 
> > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
> > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
> > from -mm.
> 
> Nope.  It fixes a BUG() and so I'll be keeping it around until I see a
> better fix.  It's one of the ways in which I prevent things from falling
> through cracks.
> 

It fixes a BUG() that only affects users who are doing node hot-remove, 
which is still radically under development, and nobody cares about except 
those on the cc list, but it also introduces the NULL pointer dereference 
that is attempting to be addressed in this patch.  The "fix" that causes 
this NULL pointer is clearly not the direction we want to go, I think we 
have agreement at node hot-remove to iterate all possible cpus are map all 
offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the 
generic hotplug code.

Regardless, this shouldn't be touching the acpi code which
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since 
it makes the behavior inconsistent across interfaces and architectures.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Andrew Morton

On Wed, 10 Oct 2012 13:30:29 -0700 (PDT)
David Rientjes  wrote:

> > So for now, let me NACK that patch. You cannot go change stuff like
> > that.
> > 
> 
> Agreed, that makes the nack-count up to 2 now.  Andrew, please remove 
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
> from -mm.

Nope.  It fixes a BUG() and so I'll be keeping it around until I see a
better fix.  It's one of the ways in which I prevent things from falling
through cracks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes

On Wed, 10 Oct 2012, Peter Zijlstra wrote:

> > If cpu_to_node() always returns a valid node id even if all cpus on the 
> > node are offline, then the cpumask_of_node() implementation, which the 
> > sched code is using, should either return an empty cpumask (if 
> > node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
> > behavior here occurred because 
> > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
> > return a valid node id and forces it to return -1 so a kzalloc_node(..., 
> > -1) fallsback to allocate anywhere.
> 
> I think that's broken semantics.. so far the entire cpu<->node mapping
> was invariant during hotplug. Changing that is going to be _very_
> interesting and cannot be done lightly.
> 
> Because as I said, per-cpu memory is preserved over hotplug, and that
> has numa affinity.
> 
> So for now, let me NACK that patch. You cannot go change stuff like
> that.
> 

Agreed, that makes the nack-count up to 2 now.  Andrew, please remove 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
from -mm.

> > But if you only need cpu_to_node() when waking up to find a runnable cpu 
> > for this NUMA information, then I think you can just change the 
> > kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
> > kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
> 
> That's a confusing statement, the wakeup stuff and the
> alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
> might need fixing if we're going to go ahead with this.
> 

The alternative is for node hot-remove to do an iteration of all possible 
cpus and set cpu-to-node to be NUMA_NO_NODE for all offlined cpus that map 
to that node.  If cpu_online() is true for any of those cpus, then 
obviously it can't be offlined.  We want to do this so that 
kzalloc_node(..., cpu_to_node()) fallsback to allocating from any node, 
which it should, and because a subsequent node hot-add event that reuses 
the same node id may not be the same node.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra

On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote:
> I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc
> you when I post that patch. 

That script doesn't look at all usage sites of the code you modify does
it?

You need to audit the entire tree for usage of the interfaces you change
and email all relevant people for whoem you're planning to break stuff
-- preferably with a proposed fix.

That's manual work, no script will ever do this for you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang

At 10/10/2012 05:51 PM, Peter Zijlstra Wrote:
> On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote:
>>
>> Hmm, if per-cpu memory is preserved, and we can't offline and remove
>> this memory. So we can't offline the node.
>>
>> But, if the node is hot added, and per-cpu memory doesn't use the
>> memory on this node. We can hotremove cpu/memory on this node, and then
>> offline this node.
>>
>> Before the cpu is hotadded, cpu's node is -1. We set cpu<->node mapping
>> when it is hotadded. So the entire cpu<->node mapping was not invariant
>> during hotplug.
>>
>> So it is why I try to clear it when the cpu is hot-removed.
>>
>> As we need the mapping to migrate a task to the cpu on the same node first,
>> I think we can clear the mapping when the node is offlined.
> 
> Hmm maybe, but hardware that can hot-add is rare and nobody has it so
> nobody cares ;-)

Yes, nobody cares it now. But we have a such hardware, so I care it now.

> 
> But by clearing cpu_to_node on every hotplug you change semantics for
> all hardware and everybody gets to feel the pain.
> 
> I'm not saying you cannot change things, I'm only saying you should be
> far more careful about it, not change it and wait for things to break.
> Put in some effort to find things that might break and warn people --
> sure, you'll always miss some, and that's ok.

I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc
you when I post that patch.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra

On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote:
> 
> Hmm, if per-cpu memory is preserved, and we can't offline and remove
> this memory. So we can't offline the node.
> 
> But, if the node is hot added, and per-cpu memory doesn't use the
> memory on this node. We can hotremove cpu/memory on this node, and then
> offline this node.
> 
> Before the cpu is hotadded, cpu's node is -1. We set cpu<->node mapping
> when it is hotadded. So the entire cpu<->node mapping was not invariant
> during hotplug.
> 
> So it is why I try to clear it when the cpu is hot-removed.
> 
> As we need the mapping to migrate a task to the cpu on the same node first,
> I think we can clear the mapping when the node is offlined.

Hmm maybe, but hardware that can hot-add is rare and nobody has it so
nobody cares ;-)

But by clearing cpu_to_node on every hotplug you change semantics for
all hardware and everybody gets to feel the pain.

I'm not saying you cannot change things, I'm only saying you should be
far more careful about it, not change it and wait for things to break.
Put in some effort to find things that might break and warn people --
sure, you'll always miss some, and that's ok.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang

At 10/10/2012 05:10 PM, Peter Zijlstra Wrote:
> On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote:
>> On Tue, 9 Oct 2012, Peter Zijlstra wrote:
>>
>>> Well the code they were patching is in the wakeup path. As I think Tang
>>> said, we leave !runnable tasks on whatever cpu they ran on last, even if
>>> that cpu is offlined, we try and fix up state when we get a wakeup.
>>>
>>> On wakeup, it tries to find a cpu to run on and will try a cpu of the
>>> same node first.
>>>
>>> Now if that node's entirely gone away, it appears the cpu_to_node() map
>>> will not return a valid node number.
>>>
>>> I think that's a change in behaviour, it didn't used to do that afaik.
>>> Certainly this code hasn't change in a while.
>>>
>>
>> If cpu_to_node() always returns a valid node id even if all cpus on the 
>> node are offline, then the cpumask_of_node() implementation, which the 
>> sched code is using, should either return an empty cpumask (if 
>> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
>> behavior here occurred because 
>> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
>> return a valid node id and forces it to return -1 so a kzalloc_node(..., 
>> -1) fallsback to allocate anywhere.
> 
> I think that's broken semantics.. so far the entire cpu<->node mapping
> was invariant during hotplug. Changing that is going to be _very_
> interesting and cannot be done lightly.
> 
> Because as I said, per-cpu memory is preserved over hotplug, and that
> has numa affinity.

Hmm, if per-cpu memory is preserved, and we can't offline and remove
this memory. So we can't offline the node.

But, if the node is hot added, and per-cpu memory doesn't use the
memory on this node. We can hotremove cpu/memory on this node, and then
offline this node.

Before the cpu is hotadded, cpu's node is -1. We set cpu<->node mapping
when it is hotadded. So the entire cpu<->node mapping was not invariant
during hotplug.

So it is why I try to clear it when the cpu is hot-removed.

As we need the mapping to migrate a task to the cpu on the same node first,
I think we can clear the mapping when the node is offlined.

Thanks
Wen Congyang

> 
> So for now, let me NACK that patch. You cannot go change stuff like
> that.
> 
>>
>> But if you only need cpu_to_node() when waking up to find a runnable cpu 
>> for this NUMA information, then I think you can just change the 
>> kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
>> kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
> 
> That's a confusing statement, the wakeup stuff and the
> alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
> might need fixing if we're going to go ahead with this.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra

On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote:
> On Tue, 9 Oct 2012, Peter Zijlstra wrote:
> 
> > Well the code they were patching is in the wakeup path. As I think Tang
> > said, we leave !runnable tasks on whatever cpu they ran on last, even if
> > that cpu is offlined, we try and fix up state when we get a wakeup.
> > 
> > On wakeup, it tries to find a cpu to run on and will try a cpu of the
> > same node first.
> > 
> > Now if that node's entirely gone away, it appears the cpu_to_node() map
> > will not return a valid node number.
> > 
> > I think that's a change in behaviour, it didn't used to do that afaik.
> > Certainly this code hasn't change in a while.
> > 
> 
> If cpu_to_node() always returns a valid node id even if all cpus on the 
> node are offline, then the cpumask_of_node() implementation, which the 
> sched code is using, should either return an empty cpumask (if 
> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
> behavior here occurred because 
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
> return a valid node id and forces it to return -1 so a kzalloc_node(..., 
> -1) fallsback to allocate anywhere.

I think that's broken semantics.. so far the entire cpu<->node mapping
was invariant during hotplug. Changing that is going to be _very_
interesting and cannot be done lightly.

Because as I said, per-cpu memory is preserved over hotplug, and that
has numa affinity.

So for now, let me NACK that patch. You cannot go change stuff like
that.

> 
> But if you only need cpu_to_node() when waking up to find a runnable cpu 
> for this NUMA information, then I think you can just change the 
> kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
> kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).

That's a confusing statement, the wakeup stuff and the
alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
might need fixing if we're going to go ahead with this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra

On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote:
 On Tue, 9 Oct 2012, Peter Zijlstra wrote:
 
  Well the code they were patching is in the wakeup path. As I think Tang
  said, we leave !runnable tasks on whatever cpu they ran on last, even if
  that cpu is offlined, we try and fix up state when we get a wakeup.
  
  On wakeup, it tries to find a cpu to run on and will try a cpu of the
  same node first.
  
  Now if that node's entirely gone away, it appears the cpu_to_node() map
  will not return a valid node number.
  
  I think that's a change in behaviour, it didn't used to do that afaik.
  Certainly this code hasn't change in a while.
  
 
 If cpu_to_node() always returns a valid node id even if all cpus on the 
 node are offline, then the cpumask_of_node() implementation, which the 
 sched code is using, should either return an empty cpumask (if 
 node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
 behavior here occurred because 
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
 return a valid node id and forces it to return -1 so a kzalloc_node(..., 
 -1) fallsback to allocate anywhere.

I think that's broken semantics.. so far the entire cpu-node mapping
was invariant during hotplug. Changing that is going to be _very_
interesting and cannot be done lightly.

Because as I said, per-cpu memory is preserved over hotplug, and that
has numa affinity.

So for now, let me NACK that patch. You cannot go change stuff like
that.

 
 But if you only need cpu_to_node() when waking up to find a runnable cpu 
 for this NUMA information, then I think you can just change the 
 kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
 kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).

That's a confusing statement, the wakeup stuff and the
alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
might need fixing if we're going to go ahead with this.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang

At 10/10/2012 05:10 PM, Peter Zijlstra Wrote:
 On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote:
 On Tue, 9 Oct 2012, Peter Zijlstra wrote:

 Well the code they were patching is in the wakeup path. As I think Tang
 said, we leave !runnable tasks on whatever cpu they ran on last, even if
 that cpu is offlined, we try and fix up state when we get a wakeup.

 On wakeup, it tries to find a cpu to run on and will try a cpu of the
 same node first.

 Now if that node's entirely gone away, it appears the cpu_to_node() map
 will not return a valid node number.

 I think that's a change in behaviour, it didn't used to do that afaik.
 Certainly this code hasn't change in a while.


 If cpu_to_node() always returns a valid node id even if all cpus on the 
 node are offline, then the cpumask_of_node() implementation, which the 
 sched code is using, should either return an empty cpumask (if 
 node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
 behavior here occurred because 
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
 return a valid node id and forces it to return -1 so a kzalloc_node(..., 
 -1) fallsback to allocate anywhere.
 
 I think that's broken semantics.. so far the entire cpu-node mapping
 was invariant during hotplug. Changing that is going to be _very_
 interesting and cannot be done lightly.
 
 Because as I said, per-cpu memory is preserved over hotplug, and that
 has numa affinity.

Hmm, if per-cpu memory is preserved, and we can't offline and remove
this memory. So we can't offline the node.

But, if the node is hot added, and per-cpu memory doesn't use the
memory on this node. We can hotremove cpu/memory on this node, and then
offline this node.

Before the cpu is hotadded, cpu's node is -1. We set cpu-node mapping
when it is hotadded. So the entire cpu-node mapping was not invariant
during hotplug.

So it is why I try to clear it when the cpu is hot-removed.

As we need the mapping to migrate a task to the cpu on the same node first,
I think we can clear the mapping when the node is offlined.

Thanks
Wen Congyang

 
 So for now, let me NACK that patch. You cannot go change stuff like
 that.
 

 But if you only need cpu_to_node() when waking up to find a runnable cpu 
 for this NUMA information, then I think you can just change the 
 kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
 kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
 
 That's a confusing statement, the wakeup stuff and the
 alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
 might need fixing if we're going to go ahead with this.
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra

On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote:
 
 Hmm, if per-cpu memory is preserved, and we can't offline and remove
 this memory. So we can't offline the node.
 
 But, if the node is hot added, and per-cpu memory doesn't use the
 memory on this node. We can hotremove cpu/memory on this node, and then
 offline this node.
 
 Before the cpu is hotadded, cpu's node is -1. We set cpu-node mapping
 when it is hotadded. So the entire cpu-node mapping was not invariant
 during hotplug.
 
 So it is why I try to clear it when the cpu is hot-removed.
 
 As we need the mapping to migrate a task to the cpu on the same node first,
 I think we can clear the mapping when the node is offlined.

Hmm maybe, but hardware that can hot-add is rare and nobody has it so
nobody cares ;-)

But by clearing cpu_to_node on every hotplug you change semantics for
all hardware and everybody gets to feel the pain.

I'm not saying you cannot change things, I'm only saying you should be
far more careful about it, not change it and wait for things to break.
Put in some effort to find things that might break and warn people --
sure, you'll always miss some, and that's ok.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Wen Congyang

At 10/10/2012 05:51 PM, Peter Zijlstra Wrote:
 On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote:

 Hmm, if per-cpu memory is preserved, and we can't offline and remove
 this memory. So we can't offline the node.

 But, if the node is hot added, and per-cpu memory doesn't use the
 memory on this node. We can hotremove cpu/memory on this node, and then
 offline this node.

 Before the cpu is hotadded, cpu's node is -1. We set cpu-node mapping
 when it is hotadded. So the entire cpu-node mapping was not invariant
 during hotplug.

 So it is why I try to clear it when the cpu is hot-removed.

 As we need the mapping to migrate a task to the cpu on the same node first,
 I think we can clear the mapping when the node is offlined.
 
 Hmm maybe, but hardware that can hot-add is rare and nobody has it so
 nobody cares ;-)

Yes, nobody cares it now. But we have a such hardware, so I care it now.

 
 But by clearing cpu_to_node on every hotplug you change semantics for
 all hardware and everybody gets to feel the pain.
 
 I'm not saying you cannot change things, I'm only saying you should be
 far more careful about it, not change it and wait for things to break.
 Put in some effort to find things that might break and warn people --
 sure, you'll always miss some, and that's ok.

I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc
you when I post that patch.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra

On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote:
 I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc
 you when I post that patch. 

That script doesn't look at all usage sites of the code you modify does
it?

You need to audit the entire tree for usage of the interfaces you change
and email all relevant people for whoem you're planning to break stuff
-- preferably with a proposed fix.

That's manual work, no script will ever do this for you.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes

On Wed, 10 Oct 2012, Peter Zijlstra wrote:

  If cpu_to_node() always returns a valid node id even if all cpus on the 
  node are offline, then the cpumask_of_node() implementation, which the 
  sched code is using, should either return an empty cpumask (if 
  node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
  behavior here occurred because 
  cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
  return a valid node id and forces it to return -1 so a kzalloc_node(..., 
  -1) fallsback to allocate anywhere.
 
 I think that's broken semantics.. so far the entire cpu-node mapping
 was invariant during hotplug. Changing that is going to be _very_
 interesting and cannot be done lightly.
 
 Because as I said, per-cpu memory is preserved over hotplug, and that
 has numa affinity.
 
 So for now, let me NACK that patch. You cannot go change stuff like
 that.
 

Agreed, that makes the nack-count up to 2 now.  Andrew, please remove 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
from -mm.

  But if you only need cpu_to_node() when waking up to find a runnable cpu 
  for this NUMA information, then I think you can just change the 
  kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
  kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
 
 That's a confusing statement, the wakeup stuff and the
 alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
 might need fixing if we're going to go ahead with this.
 

The alternative is for node hot-remove to do an iteration of all possible 
cpus and set cpu-to-node to be NUMA_NO_NODE for all offlined cpus that map 
to that node.  If cpu_online() is true for any of those cpus, then 
obviously it can't be offlined.  We want to do this so that 
kzalloc_node(..., cpu_to_node()) fallsback to allocating from any node, 
which it should, and because a subsequent node hot-add event that reuses 
the same node id may not be the same node.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Andrew Morton

On Wed, 10 Oct 2012 13:30:29 -0700 (PDT)
David Rientjes rient...@google.com wrote:

  So for now, let me NACK that patch. You cannot go change stuff like
  that.
  
 
 Agreed, that makes the nack-count up to 2 now.  Andrew, please remove 
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
 from -mm.

Nope.  It fixes a BUG() and so I'll be keeping it around until I see a
better fix.  It's one of the ways in which I prevent things from falling
through cracks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread David Rientjes

On Wed, 10 Oct 2012, Andrew Morton wrote:

   So for now, let me NACK that patch. You cannot go change stuff like
   that.
   
  
  Agreed, that makes the nack-count up to 2 now.  Andrew, please remove 
  cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
  cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
  from -mm.
 
 Nope.  It fixes a BUG() and so I'll be keeping it around until I see a
 better fix.  It's one of the ways in which I prevent things from falling
 through cracks.
 

It fixes a BUG() that only affects users who are doing node hot-remove, 
which is still radically under development, and nobody cares about except 
those on the cc list, but it also introduces the NULL pointer dereference 
that is attempting to be addressed in this patch.  The fix that causes 
this NULL pointer is clearly not the direction we want to go, I think we 
have agreement at node hot-remove to iterate all possible cpus are map all 
offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the 
generic hotplug code.

Regardless, this shouldn't be touching the acpi code which
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since 
it makes the behavior inconsistent across interfaces and architectures.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/10/2012 10:06 AM, Wen Congyang Wrote:
> At 10/10/2012 07:27 AM, David Rientjes Wrote:
>> On Tue, 9 Oct 2012, Peter Zijlstra wrote:
>>
>>> Well the code they were patching is in the wakeup path. As I think Tang
>>> said, we leave !runnable tasks on whatever cpu they ran on last, even if
>>> that cpu is offlined, we try and fix up state when we get a wakeup.
>>>
>>> On wakeup, it tries to find a cpu to run on and will try a cpu of the
>>> same node first.
>>>
>>> Now if that node's entirely gone away, it appears the cpu_to_node() map
>>> will not return a valid node number.
>>>
>>> I think that's a change in behaviour, it didn't used to do that afaik.
>>> Certainly this code hasn't change in a while.
>>>
>>
>> If cpu_to_node() always returns a valid node id even if all cpus on the 
>> node are offline, then the cpumask_of_node() implementation, which the 
>> sched code is using, should either return an empty cpumask (if 
>> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
>> behavior here occurred because 
>> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
>> return a valid node id and forces it to return -1 so a kzalloc_node(..., 
>> -1) fallsback to allocate anywhere.
>>
>> But if you only need cpu_to_node() when waking up to find a runnable cpu 
>> for this NUMA information, then I think you can just change the 
>> kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
>> kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
>>
>>  [ The changelog here is confusing because it's fixing a problem in 
>>linux-next without saying so. ]
>>
> 
> I don't agree with this way. Because it only fix the code which causes a
> problem, and we can't say there is no any similar problem. So it is
> why I clear the cpu-to-node mapping.
> 
> What about the following solution:
> 1. clear the cpu-to-node mapping when the node is offlined

There is no interface to online/offline a node. We online a node only
when the cpu/memory is node, and offline it when all cpu/memory in
this node is offlined(TODO).

So we may need to map cpu-to-node when the cpu is onlined if clear
it when the node is offlined. But we don't know the cpu's node.

Thanks
Wen Congyang

> 2. tang's patch is still necessary because we leave !runnable tasks on
>whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means
>the entire node is offlined, and we must migrate the task to the other
>node.
> 
> Thanks
> Wen Congyang
> --
> To unsubscribe from this list: send the line "unsubscribe linux-numa" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/10/2012 07:27 AM, David Rientjes Wrote:
> On Tue, 9 Oct 2012, Peter Zijlstra wrote:
> 
>> Well the code they were patching is in the wakeup path. As I think Tang
>> said, we leave !runnable tasks on whatever cpu they ran on last, even if
>> that cpu is offlined, we try and fix up state when we get a wakeup.
>>
>> On wakeup, it tries to find a cpu to run on and will try a cpu of the
>> same node first.
>>
>> Now if that node's entirely gone away, it appears the cpu_to_node() map
>> will not return a valid node number.
>>
>> I think that's a change in behaviour, it didn't used to do that afaik.
>> Certainly this code hasn't change in a while.
>>
> 
> If cpu_to_node() always returns a valid node id even if all cpus on the 
> node are offline, then the cpumask_of_node() implementation, which the 
> sched code is using, should either return an empty cpumask (if 
> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
> behavior here occurred because 
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
> return a valid node id and forces it to return -1 so a kzalloc_node(..., 
> -1) fallsback to allocate anywhere.
> 
> But if you only need cpu_to_node() when waking up to find a runnable cpu 
> for this NUMA information, then I think you can just change the 
> kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
> kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
> 
>  [ The changelog here is confusing because it's fixing a problem in 
>linux-next without saying so. ]
> 

I don't agree with this way. Because it only fix the code which causes a
problem, and we can't say there is no any similar problem. So it is
why I clear the cpu-to-node mapping.

What about the following solution:
1. clear the cpu-to-node mapping when the node is offlined
2. tang's patch is still necessary because we leave !runnable tasks on
   whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means
   the entire node is offlined, and we must migrate the task to the other
   node.

Thanks
Wen Congyang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Peter Zijlstra wrote:

> Well the code they were patching is in the wakeup path. As I think Tang
> said, we leave !runnable tasks on whatever cpu they ran on last, even if
> that cpu is offlined, we try and fix up state when we get a wakeup.
> 
> On wakeup, it tries to find a cpu to run on and will try a cpu of the
> same node first.
> 
> Now if that node's entirely gone away, it appears the cpu_to_node() map
> will not return a valid node number.
> 
> I think that's a change in behaviour, it didn't used to do that afaik.
> Certainly this code hasn't change in a while.
> 

If cpu_to_node() always returns a valid node id even if all cpus on the 
node are offline, then the cpumask_of_node() implementation, which the 
sched code is using, should either return an empty cpumask (if 
node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
behavior here occurred because 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
return a valid node id and forces it to return -1 so a kzalloc_node(..., 
-1) fallsback to allocate anywhere.

But if you only need cpu_to_node() when waking up to find a runnable cpu 
for this NUMA information, then I think you can just change the 
kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).

 [ The changelog here is confusing because it's fixing a problem in 
   linux-next without saying so. ]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra

On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote:
> On Tue, 9 Oct 2012, Peter Zijlstra wrote:
> 
> > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
> > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
> > > return -1. As a result, cpumask_of_node(nid) will return NULL. In this 
> > > case,
> > > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.
> > 
> > Hurm,. this is new, right? Who is changing all these semantics without
> > auditing the tree and informing all affected people?
> > 
> 
> I've nacked the patch that did it because I think it should be done from 
> the generic cpu hotplug code only at the CPU_DEAD level with a per-arch 
> callback to fixup whatever cpu-to-node mappings they maintain since 
> processes can reenter the scheduler at CPU_DYING.

Well the code they were patching is in the wakeup path. As I think Tang
said, we leave !runnable tasks on whatever cpu they ran on last, even if
that cpu is offlined, we try and fix up state when we get a wakeup.

On wakeup, it tries to find a cpu to run on and will try a cpu of the
same node first.

Now if that node's entirely gone away, it appears the cpu_to_node() map
will not return a valid node number.

I think that's a change in behaviour, it didn't used to do that afaik.
Certainly this code hasn't change in a while.

> The whole issue seems to be because alloc_{fair,rt}_sched_group() does an 
> iteration over all possible cpus (not all online cpus) and does 
> kzalloc_node() which references a now-offlined node.  Changing it to -1 
> makes the slab code fallback to any online node.

Right, that's because the rq structures are assumed always present. What
I cannot remember is why I'm not using per-cpu allocations there,
because that's exactly what it looks like it wants to be.

> What I think we need to do instead of hacking only the acpi code and not 
> standardizing this across the kernel is:

Right, what I don't understand is wtf ACPI has to do with anything. We
have plenty cpu hotplug code, ACPI isn't involved in any of that last
time I checked.

>  - reset cpu-to-node with a per-arch callback in generic cpu hotplug code 
>at CPU_DEAD, and
> 
>  - do an iteration over all possible cpus for node hot-remove ensuring 
>there are no stale references.

Why do we need to clear cpu-to-node maps? are we going to change the
topology at runtime? What are you going to do with per-cpu stuff,
per-cpu memory isn't freed on hotplug, so its node relation is static.

/me confused..

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Peter Zijlstra wrote:

> On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
> > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
> > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case,
> > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.
> 
> Hurm,. this is new, right? Who is changing all these semantics without
> auditing the tree and informing all affected people?
> 

I've nacked the patch that did it because I think it should be done from 
the generic cpu hotplug code only at the CPU_DEAD level with a per-arch 
callback to fixup whatever cpu-to-node mappings they maintain since 
processes can reenter the scheduler at CPU_DYING.

The whole issue seems to be because alloc_{fair,rt}_sched_group() does an 
iteration over all possible cpus (not all online cpus) and does 
kzalloc_node() which references a now-offlined node.  Changing it to -1 
makes the slab code fallback to any online node.

What I think we need to do instead of hacking only the acpi code and not 
standardizing this across the kernel is:

 - reset cpu-to-node with a per-arch callback in generic cpu hotplug code 
   at CPU_DEAD, and

 - do an iteration over all possible cpus for node hot-remove ensuring 
   there are no stale references.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Wen Congyang wrote:

> I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined,
> it will be offlined before clearing cpu-to-node mapping.
> 
> Here is the code in driver/acpi/processor_driver.c:
> =
> static int acpi_processor_handle_eject(struct acpi_processor *pr)
> {
>   if (cpu_online(pr->id))
>   cpu_down(pr->id);  <== cpu is offlined here.
> 
>   arch_unregister_cpu(pr->id);
>   acpi_unmap_lsapic(pr->id); <== I clear the mapping here.
>   return (0);
> }
> =
> 
> The real problem is that: we don't migrate this task to another cpu when
> the cpu is offlined. I guess this task is not in running state, and it
> is not in the cpu's runqueue.
> 

ACPI is not the only way to offline a cpu, this all needs to be 
standardized throughout the kernel for each arch so that generic code like 
sched can use cpu_to_node() in a CPU_DYING callback.  The only way to do 
that is by having arch callbacks to set cpu-to-node to NUMA_NO_NODE only 
at CPU_DEAD.

> This patch is try to fix a bug: the kernel will be panicked after removing a
> node.
> 

That's much more subtle than having a NULL pointer dereference like Tang 
reports during migration.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra

On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
> If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
> return -1. As a result, cpumask_of_node(nid) will return NULL. In this case,
> find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.

Hurm,. this is new, right? Who is changing all these semantics without
auditing the tree and informing all affected people?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/09/2012 06:04 PM, David Rientjes Wrote:
> On Tue, 9 Oct 2012, Tang Chen wrote:
> 
 Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
 should be called at CPU_DYING level and migrate_tasks() still sees a valid
 cpu.
>>
>> As Wen said below, nid is now set to -1 when cpu is hotremoved.
>> I reproduce this problem in this situation:
>>
>> all cpus are online, and hot remove a system board directorily, without
>> offlining any cpu.
>>
>> As a result, the removed cpu's nid is set to -1, and this causes
>> problems.
>>
> 
> Let's add Andrew to the cc list then, because I'm nacking 
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm 
> tree for this reason.
> 
> We can only clear a cpu-to-node mapping when the cpu is completely 
> offline, not before or during the CPU_DYING stage.  Kernel code, such as 

I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined,
it will be offlined before clearing cpu-to-node mapping.

Here is the code in driver/acpi/processor_driver.c:
=
static int acpi_processor_handle_eject(struct acpi_processor *pr)
{
if (cpu_online(pr->id))
cpu_down(pr->id);  <== cpu is offlined here.

arch_unregister_cpu(pr->id);
acpi_unmap_lsapic(pr->id); <== I clear the mapping here.
return (0);
}
=

The real problem is that: we don't migrate this task to another cpu when
the cpu is offlined. I guess this task is not in running state, and it
is not in the cpu's runqueue.

Thanks
Wen Congyang

> the sched code that you are now trying to "fix", depends on this mapping 
> to work correctly; obviously no audit was done of cpu hotplug code 
> depending on it before the patch was proposed.
> 
> I say "fix" because even this workaround isn't a good solution since it 
> would be much better to pick another cpu on the same node as the offlining 
> cpu for the runqueue before falling back to the set of all allowed nodes.  
> We lose all NUMA affinity information with that patch.  There's no reason 
> why we shouldn't know the node of a cpu that is being offlined.
> 
> So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch.  
> After it's removed because it's buggy, this "fix" will no longer be 
> necessary.
> 

This patch is try to fix a bug: the kernel will be panicked after removing a
node.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Tang Chen wrote:

> > > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
> > > should be called at CPU_DYING level and migrate_tasks() still sees a valid
> > > cpu.
> 
> As Wen said below, nid is now set to -1 when cpu is hotremoved.
> I reproduce this problem in this situation:
> 
> all cpus are online, and hot remove a system board directorily, without
> offlining any cpu.
> 
> As a result, the removed cpu's nid is set to -1, and this causes
> problems.
> 

Let's add Andrew to the cc list then, because I'm nacking 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm 
tree for this reason.

We can only clear a cpu-to-node mapping when the cpu is completely 
offline, not before or during the CPU_DYING stage.  Kernel code, such as 
the sched code that you are now trying to "fix", depends on this mapping 
to work correctly; obviously no audit was done of cpu hotplug code 
depending on it before the patch was proposed.

I say "fix" because even this workaround isn't a good solution since it 
would be much better to pick another cpu on the same node as the offlining 
cpu for the runqueue before falling back to the set of all allowed nodes.  
We lose all NUMA affinity information with that patch.  There's no reason 
why we shouldn't know the node of a cpu that is being offlined.

So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch.  
After it's removed because it's buggy, this "fix" will no longer be 
necessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Tang Chen


Hi David,

Thanks for reviewing this patch. :)

On 10/09/2012 04:34 PM, Wen Congyang wrote:

At 10/09/2012 02:21 PM, David Rientjes Wrote:

On Mon, 8 Oct 2012, Tang Chen wrote:

+   /* If the cpu has been offlined, its nid was set to -1. */
+   if (nid != -1) {


NUMA_NO_NODE.


Yes, NUMA_NO_NODE is better. I'll improve it, thanks. :)



Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
should be called at CPU_DYING level and migrate_tasks() still sees a valid
cpu.


As Wen said below, nid is now set to -1 when cpu is hotremoved.
I reproduce this problem in this situation:

all cpus are online, and hot remove a system board directorily, without
offlining any cpu.

As a result, the removed cpu's nid is set to -1, and this causes
problems.



the cpu's node is set when the cpu is hotpluged(not online), and it will
be cleared when the cpu is hotremoved(This patch is in akpm tree):
https://lkml.org/lkml/2012/9/3/39

I guess the task is in sleep state when the cpu is offlined, and it doesn't
be migrated to another cpu.

Thanks
Wen Congyang



On x86, cpumask_of_node() is always guaranteed to return a valid cpumask
after boot so presumably this is a problem in some non-x86 arch code and
isn't actually a sched problem.


BTW, my box is x86. I think for the reason I said above,
cpumask_of_node() will no longer guaranteed anything even if on x86.

Thanks. :)




+   nodemask = cpumask_of_node(nid);
+
+   /* Look for allowed, online CPU in same node. */
+   for_each_cpu(dest_cpu, nodemask) {
+   if (!cpu_online(dest_cpu))
+   continue;
+   if (!cpu_active(dest_cpu))
+   continue;
+   if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
+   return dest_cpu;
+   }
}

for (;;) {




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/09/2012 02:21 PM, David Rientjes Wrote:
> On Mon, 8 Oct 2012, Tang Chen wrote:
> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 66b36ab..e76dce9 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process);
>>   */
>>  static int select_fallback_rq(int cpu, struct task_struct *p)
>>  {
>> -const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
>> +int nid = cpu_to_node(cpu);
>> +const struct cpumask *nodemask = NULL;
>>  enum { cpuset, possible, fail } state = cpuset;
>>  int dest_cpu;
>>  
>> -/* Look for allowed, online CPU in same node. */
>> -for_each_cpu(dest_cpu, nodemask) {
>> -if (!cpu_online(dest_cpu))
>> -continue;
>> -if (!cpu_active(dest_cpu))
>> -continue;
>> -if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>> -return dest_cpu;
>> +/* If the cpu has been offlined, its nid was set to -1. */
>> +if (nid != -1) {
> 
> NUMA_NO_NODE.
> 
> Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this 
> should be called at CPU_DYING level and migrate_tasks() still sees a valid 
> cpu.

the cpu's node is set when the cpu is hotpluged(not online), and it will
be cleared when the cpu is hotremoved(This patch is in akpm tree):
https://lkml.org/lkml/2012/9/3/39

I guess the task is in sleep state when the cpu is offlined, and it doesn't
be migrated to another cpu.

Thanks
Wen Congyang

> 
> On x86, cpumask_of_node() is always guaranteed to return a valid cpumask 
> after boot so presumably this is a problem in some non-x86 arch code and 
> isn't actually a sched problem.
> 
>> +nodemask = cpumask_of_node(nid);
>> +
>> +/* Look for allowed, online CPU in same node. */
>> +for_each_cpu(dest_cpu, nodemask) {
>> +if (!cpu_online(dest_cpu))
>> +continue;
>> +if (!cpu_active(dest_cpu))
>> +continue;
>> +if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>> +return dest_cpu;
>> +}
>>  }
>>  
>>  for (;;) {
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Mon, 8 Oct 2012, Tang Chen wrote:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 66b36ab..e76dce9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process);
>   */
>  static int select_fallback_rq(int cpu, struct task_struct *p)
>  {
> - const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
> + int nid = cpu_to_node(cpu);
> + const struct cpumask *nodemask = NULL;
>   enum { cpuset, possible, fail } state = cpuset;
>   int dest_cpu;
>  
> - /* Look for allowed, online CPU in same node. */
> - for_each_cpu(dest_cpu, nodemask) {
> - if (!cpu_online(dest_cpu))
> - continue;
> - if (!cpu_active(dest_cpu))
> - continue;
> - if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
> - return dest_cpu;
> + /* If the cpu has been offlined, its nid was set to -1. */
> + if (nid != -1) {

NUMA_NO_NODE.

Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this 
should be called at CPU_DYING level and migrate_tasks() still sees a valid 
cpu.

On x86, cpumask_of_node() is always guaranteed to return a valid cpumask 
after boot so presumably this is a problem in some non-x86 arch code and 
isn't actually a sched problem.

> + nodemask = cpumask_of_node(nid);
> +
> + /* Look for allowed, online CPU in same node. */
> + for_each_cpu(dest_cpu, nodemask) {
> + if (!cpu_online(dest_cpu))
> + continue;
> + if (!cpu_active(dest_cpu))
> + continue;
> + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
> + return dest_cpu;
> + }
>   }
>  
>   for (;;) {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Mon, 8 Oct 2012, Tang Chen wrote:

 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 66b36ab..e76dce9 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process);
   */
  static int select_fallback_rq(int cpu, struct task_struct *p)
  {
 - const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
 + int nid = cpu_to_node(cpu);
 + const struct cpumask *nodemask = NULL;
   enum { cpuset, possible, fail } state = cpuset;
   int dest_cpu;
  
 - /* Look for allowed, online CPU in same node. */
 - for_each_cpu(dest_cpu, nodemask) {
 - if (!cpu_online(dest_cpu))
 - continue;
 - if (!cpu_active(dest_cpu))
 - continue;
 - if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 - return dest_cpu;
 + /* If the cpu has been offlined, its nid was set to -1. */
 + if (nid != -1) {

NUMA_NO_NODE.

Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this 
should be called at CPU_DYING level and migrate_tasks() still sees a valid 
cpu.

On x86, cpumask_of_node() is always guaranteed to return a valid cpumask 
after boot so presumably this is a problem in some non-x86 arch code and 
isn't actually a sched problem.

 + nodemask = cpumask_of_node(nid);
 +
 + /* Look for allowed, online CPU in same node. */
 + for_each_cpu(dest_cpu, nodemask) {
 + if (!cpu_online(dest_cpu))
 + continue;
 + if (!cpu_active(dest_cpu))
 + continue;
 + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 + return dest_cpu;
 + }
   }
  
   for (;;) {
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/09/2012 02:21 PM, David Rientjes Wrote:
 On Mon, 8 Oct 2012, Tang Chen wrote:
 
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 66b36ab..e76dce9 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process);
   */
  static int select_fallback_rq(int cpu, struct task_struct *p)
  {
 -const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
 +int nid = cpu_to_node(cpu);
 +const struct cpumask *nodemask = NULL;
  enum { cpuset, possible, fail } state = cpuset;
  int dest_cpu;
  
 -/* Look for allowed, online CPU in same node. */
 -for_each_cpu(dest_cpu, nodemask) {
 -if (!cpu_online(dest_cpu))
 -continue;
 -if (!cpu_active(dest_cpu))
 -continue;
 -if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 -return dest_cpu;
 +/* If the cpu has been offlined, its nid was set to -1. */
 +if (nid != -1) {
 
 NUMA_NO_NODE.
 
 Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this 
 should be called at CPU_DYING level and migrate_tasks() still sees a valid 
 cpu.

the cpu's node is set when the cpu is hotpluged(not online), and it will
be cleared when the cpu is hotremoved(This patch is in akpm tree):
https://lkml.org/lkml/2012/9/3/39

I guess the task is in sleep state when the cpu is offlined, and it doesn't
be migrated to another cpu.

Thanks
Wen Congyang

 
 On x86, cpumask_of_node() is always guaranteed to return a valid cpumask 
 after boot so presumably this is a problem in some non-x86 arch code and 
 isn't actually a sched problem.
 
 +nodemask = cpumask_of_node(nid);
 +
 +/* Look for allowed, online CPU in same node. */
 +for_each_cpu(dest_cpu, nodemask) {
 +if (!cpu_online(dest_cpu))
 +continue;
 +if (!cpu_active(dest_cpu))
 +continue;
 +if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 +return dest_cpu;
 +}
  }
  
  for (;;) {
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Tang Chen


Hi David,

Thanks for reviewing this patch. :)

On 10/09/2012 04:34 PM, Wen Congyang wrote:

At 10/09/2012 02:21 PM, David Rientjes Wrote:

On Mon, 8 Oct 2012, Tang Chen wrote:

+   /* If the cpu has been offlined, its nid was set to -1. */
+   if (nid != -1) {


NUMA_NO_NODE.


Yes, NUMA_NO_NODE is better. I'll improve it, thanks. :)



Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
should be called at CPU_DYING level and migrate_tasks() still sees a valid
cpu.


As Wen said below, nid is now set to -1 when cpu is hotremoved.
I reproduce this problem in this situation:

all cpus are online, and hot remove a system board directorily, without
offlining any cpu.

As a result, the removed cpu's nid is set to -1, and this causes
problems.



the cpu's node is set when the cpu is hotpluged(not online), and it will
be cleared when the cpu is hotremoved(This patch is in akpm tree):
https://lkml.org/lkml/2012/9/3/39

I guess the task is in sleep state when the cpu is offlined, and it doesn't
be migrated to another cpu.

Thanks
Wen Congyang



On x86, cpumask_of_node() is always guaranteed to return a valid cpumask
after boot so presumably this is a problem in some non-x86 arch code and
isn't actually a sched problem.


BTW, my box is x86. I think for the reason I said above,
cpumask_of_node() will no longer guaranteed anything even if on x86.

Thanks. :)




+   nodemask = cpumask_of_node(nid);
+
+   /* Look for allowed, online CPU in same node. */
+   for_each_cpu(dest_cpu, nodemask) {
+   if (!cpu_online(dest_cpu))
+   continue;
+   if (!cpu_active(dest_cpu))
+   continue;
+   if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
+   return dest_cpu;
+   }
}

for (;;) {




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Tang Chen wrote:

   Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
   should be called at CPU_DYING level and migrate_tasks() still sees a valid
   cpu.
 
 As Wen said below, nid is now set to -1 when cpu is hotremoved.
 I reproduce this problem in this situation:
 
 all cpus are online, and hot remove a system board directorily, without
 offlining any cpu.
 
 As a result, the removed cpu's nid is set to -1, and this causes
 problems.
 

Let's add Andrew to the cc list then, because I'm nacking 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm 
tree for this reason.

We can only clear a cpu-to-node mapping when the cpu is completely 
offline, not before or during the CPU_DYING stage.  Kernel code, such as 
the sched code that you are now trying to fix, depends on this mapping 
to work correctly; obviously no audit was done of cpu hotplug code 
depending on it before the patch was proposed.

I say fix because even this workaround isn't a good solution since it 
would be much better to pick another cpu on the same node as the offlining 
cpu for the runqueue before falling back to the set of all allowed nodes.  
We lose all NUMA affinity information with that patch.  There's no reason 
why we shouldn't know the node of a cpu that is being offlined.

So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch.  
After it's removed because it's buggy, this fix will no longer be 
necessary.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/09/2012 06:04 PM, David Rientjes Wrote:
 On Tue, 9 Oct 2012, Tang Chen wrote:
 
 Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
 should be called at CPU_DYING level and migrate_tasks() still sees a valid
 cpu.

 As Wen said below, nid is now set to -1 when cpu is hotremoved.
 I reproduce this problem in this situation:

 all cpus are online, and hot remove a system board directorily, without
 offlining any cpu.

 As a result, the removed cpu's nid is set to -1, and this causes
 problems.

 
 Let's add Andrew to the cc list then, because I'm nacking 
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm 
 tree for this reason.
 
 We can only clear a cpu-to-node mapping when the cpu is completely 
 offline, not before or during the CPU_DYING stage.  Kernel code, such as 

I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined,
it will be offlined before clearing cpu-to-node mapping.

Here is the code in driver/acpi/processor_driver.c:
=
static int acpi_processor_handle_eject(struct acpi_processor *pr)
{
if (cpu_online(pr-id))
cpu_down(pr-id);  == cpu is offlined here.

arch_unregister_cpu(pr-id);
acpi_unmap_lsapic(pr-id); == I clear the mapping here.
return (0);
}
=

The real problem is that: we don't migrate this task to another cpu when
the cpu is offlined. I guess this task is not in running state, and it
is not in the cpu's runqueue.

Thanks
Wen Congyang

 the sched code that you are now trying to fix, depends on this mapping 
 to work correctly; obviously no audit was done of cpu hotplug code 
 depending on it before the patch was proposed.
 
 I say fix because even this workaround isn't a good solution since it 
 would be much better to pick another cpu on the same node as the offlining 
 cpu for the runqueue before falling back to the set of all allowed nodes.  
 We lose all NUMA affinity information with that patch.  There's no reason 
 why we shouldn't know the node of a cpu that is being offlined.
 
 So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch.  
 After it's removed because it's buggy, this fix will no longer be 
 necessary.
 

This patch is try to fix a bug: the kernel will be panicked after removing a
node.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra

On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
 If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
 return -1. As a result, cpumask_of_node(nid) will return NULL. In this case,
 find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.

Hurm,. this is new, right? Who is changing all these semantics without
auditing the tree and informing all affected people?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Wen Congyang wrote:

 I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined,
 it will be offlined before clearing cpu-to-node mapping.
 
 Here is the code in driver/acpi/processor_driver.c:
 =
 static int acpi_processor_handle_eject(struct acpi_processor *pr)
 {
   if (cpu_online(pr-id))
   cpu_down(pr-id);  == cpu is offlined here.
 
   arch_unregister_cpu(pr-id);
   acpi_unmap_lsapic(pr-id); == I clear the mapping here.
   return (0);
 }
 =
 
 The real problem is that: we don't migrate this task to another cpu when
 the cpu is offlined. I guess this task is not in running state, and it
 is not in the cpu's runqueue.
 

ACPI is not the only way to offline a cpu, this all needs to be 
standardized throughout the kernel for each arch so that generic code like 
sched can use cpu_to_node() in a CPU_DYING callback.  The only way to do 
that is by having arch callbacks to set cpu-to-node to NUMA_NO_NODE only 
at CPU_DEAD.

 This patch is try to fix a bug: the kernel will be panicked after removing a
 node.
 

That's much more subtle than having a NULL pointer dereference like Tang 
reports during migration.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Peter Zijlstra wrote:

 On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
  If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
  return -1. As a result, cpumask_of_node(nid) will return NULL. In this case,
  find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.
 
 Hurm,. this is new, right? Who is changing all these semantics without
 auditing the tree and informing all affected people?
 

I've nacked the patch that did it because I think it should be done from 
the generic cpu hotplug code only at the CPU_DEAD level with a per-arch 
callback to fixup whatever cpu-to-node mappings they maintain since 
processes can reenter the scheduler at CPU_DYING.

The whole issue seems to be because alloc_{fair,rt}_sched_group() does an 
iteration over all possible cpus (not all online cpus) and does 
kzalloc_node() which references a now-offlined node.  Changing it to -1 
makes the slab code fallback to any online node.

What I think we need to do instead of hacking only the acpi code and not 
standardizing this across the kernel is:

 - reset cpu-to-node with a per-arch callback in generic cpu hotplug code 
   at CPU_DEAD, and

 - do an iteration over all possible cpus for node hot-remove ensuring 
   there are no stale references.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra

On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote:
 On Tue, 9 Oct 2012, Peter Zijlstra wrote:
 
  On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
   If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
   return -1. As a result, cpumask_of_node(nid) will return NULL. In this 
   case,
   find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.
  
  Hurm,. this is new, right? Who is changing all these semantics without
  auditing the tree and informing all affected people?
  
 
 I've nacked the patch that did it because I think it should be done from 
 the generic cpu hotplug code only at the CPU_DEAD level with a per-arch 
 callback to fixup whatever cpu-to-node mappings they maintain since 
 processes can reenter the scheduler at CPU_DYING.

Well the code they were patching is in the wakeup path. As I think Tang
said, we leave !runnable tasks on whatever cpu they ran on last, even if
that cpu is offlined, we try and fix up state when we get a wakeup.

On wakeup, it tries to find a cpu to run on and will try a cpu of the
same node first.

Now if that node's entirely gone away, it appears the cpu_to_node() map
will not return a valid node number.

I think that's a change in behaviour, it didn't used to do that afaik.
Certainly this code hasn't change in a while.


 The whole issue seems to be because alloc_{fair,rt}_sched_group() does an 
 iteration over all possible cpus (not all online cpus) and does 
 kzalloc_node() which references a now-offlined node.  Changing it to -1 
 makes the slab code fallback to any online node.

Right, that's because the rq structures are assumed always present. What
I cannot remember is why I'm not using per-cpu allocations there,
because that's exactly what it looks like it wants to be.

 What I think we need to do instead of hacking only the acpi code and not 
 standardizing this across the kernel is:

Right, what I don't understand is wtf ACPI has to do with anything. We
have plenty cpu hotplug code, ACPI isn't involved in any of that last
time I checked.

  - reset cpu-to-node with a per-arch callback in generic cpu hotplug code 
at CPU_DEAD, and
 
  - do an iteration over all possible cpus for node hot-remove ensuring 
there are no stale references.

Why do we need to clear cpu-to-node maps? are we going to change the
topology at runtime? What are you going to do with per-cpu stuff,
per-cpu memory isn't freed on hotplug, so its node relation is static.

/me confused..

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread David Rientjes

On Tue, 9 Oct 2012, Peter Zijlstra wrote:

 Well the code they were patching is in the wakeup path. As I think Tang
 said, we leave !runnable tasks on whatever cpu they ran on last, even if
 that cpu is offlined, we try and fix up state when we get a wakeup.
 
 On wakeup, it tries to find a cpu to run on and will try a cpu of the
 same node first.
 
 Now if that node's entirely gone away, it appears the cpu_to_node() map
 will not return a valid node number.
 
 I think that's a change in behaviour, it didn't used to do that afaik.
 Certainly this code hasn't change in a while.
 

If cpu_to_node() always returns a valid node id even if all cpus on the 
node are offline, then the cpumask_of_node() implementation, which the 
sched code is using, should either return an empty cpumask (if 
node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
behavior here occurred because 
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
return a valid node id and forces it to return -1 so a kzalloc_node(..., 
-1) fallsback to allocate anywhere.

But if you only need cpu_to_node() when waking up to find a runnable cpu 
for this NUMA information, then I think you can just change the 
kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).

 [ The changelog here is confusing because it's fixing a problem in 
   linux-next without saying so. ]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/10/2012 07:27 AM, David Rientjes Wrote:
 On Tue, 9 Oct 2012, Peter Zijlstra wrote:
 
 Well the code they were patching is in the wakeup path. As I think Tang
 said, we leave !runnable tasks on whatever cpu they ran on last, even if
 that cpu is offlined, we try and fix up state when we get a wakeup.

 On wakeup, it tries to find a cpu to run on and will try a cpu of the
 same node first.

 Now if that node's entirely gone away, it appears the cpu_to_node() map
 will not return a valid node number.

 I think that's a change in behaviour, it didn't used to do that afaik.
 Certainly this code hasn't change in a while.

 
 If cpu_to_node() always returns a valid node id even if all cpus on the 
 node are offline, then the cpumask_of_node() implementation, which the 
 sched code is using, should either return an empty cpumask (if 
 node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
 behavior here occurred because 
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
 return a valid node id and forces it to return -1 so a kzalloc_node(..., 
 -1) fallsback to allocate anywhere.
 
 But if you only need cpu_to_node() when waking up to find a runnable cpu 
 for this NUMA information, then I think you can just change the 
 kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
 kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
 
  [ The changelog here is confusing because it's fixing a problem in 
linux-next without saying so. ]
 

I don't agree with this way. Because it only fix the code which causes a
problem, and we can't say there is no any similar problem. So it is
why I clear the cpu-to-node mapping.

What about the following solution:
1. clear the cpu-to-node mapping when the node is offlined
2. tang's patch is still necessary because we leave !runnable tasks on
   whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means
   the entire node is offlined, and we must migrate the task to the other
   node.

Thanks
Wen Congyang
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/10/2012 10:06 AM, Wen Congyang Wrote:
 At 10/10/2012 07:27 AM, David Rientjes Wrote:
 On Tue, 9 Oct 2012, Peter Zijlstra wrote:

 Well the code they were patching is in the wakeup path. As I think Tang
 said, we leave !runnable tasks on whatever cpu they ran on last, even if
 that cpu is offlined, we try and fix up state when we get a wakeup.

 On wakeup, it tries to find a cpu to run on and will try a cpu of the
 same node first.

 Now if that node's entirely gone away, it appears the cpu_to_node() map
 will not return a valid node number.

 I think that's a change in behaviour, it didn't used to do that afaik.
 Certainly this code hasn't change in a while.


 If cpu_to_node() always returns a valid node id even if all cpus on the 
 node are offline, then the cpumask_of_node() implementation, which the 
 sched code is using, should either return an empty cpumask (if 
 node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
 behavior here occurred because 
 cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
 return a valid node id and forces it to return -1 so a kzalloc_node(..., 
 -1) fallsback to allocate anywhere.

 But if you only need cpu_to_node() when waking up to find a runnable cpu 
 for this NUMA information, then I think you can just change the 
 kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
 kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).

  [ The changelog here is confusing because it's fixing a problem in 
linux-next without saying so. ]

 
 I don't agree with this way. Because it only fix the code which causes a
 problem, and we can't say there is no any similar problem. So it is
 why I clear the cpu-to-node mapping.
 
 What about the following solution:
 1. clear the cpu-to-node mapping when the node is offlined

There is no interface to online/offline a node. We online a node only
when the cpu/memory is node, and offline it when all cpu/memory in
this node is offlined(TODO).

So we may need to map cpu-to-node when the cpu is onlined if clear
it when the node is offlined. But we don't know the cpu's node.

Thanks
Wen Congyang

 2. tang's patch is still necessary because we leave !runnable tasks on
whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means
the entire node is offlined, and we must migrate the task to the other
node.
 
 Thanks
 Wen Congyang
 --
 To unsubscribe from this list: send the line unsubscribe linux-numa in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-07 Thread Tang Chen

If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
return -1. As a result, cpumask_of_node(nid) will return NULL. In this case,
find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.

Here is a call trace:
[  609.824017] Call Trace:
[  609.824017]   
[  609.824017]  [] select_fallback_rq+0x71/0x190
[  609.824017]  [] ? try_to_wake_up+0x2e/0x2f0
[  609.824017]  [] try_to_wake_up+0x2cb/0x2f0
[  609.824017]  [] ? __run_hrtimer+0x78/0x320
[  609.824017]  [] wake_up_process+0x15/0x20
[  609.824017]  [] hrtimer_wakeup+0x22/0x30
[  609.824017]  [] __run_hrtimer+0x83/0x320
[  609.824017]  [] ? update_rmtp+0x80/0x80
[  609.824017]  [] hrtimer_interrupt+0x106/0x280
[  609.824017]  [] ? sd_free_ctl_entry+0x68/0x70
[  609.824017]  [] smp_apic_timer_interrupt+0x69/0x99
[  609.824017]  [] apic_timer_interrupt+0x6f/0x80

There is a hrtimer process sleeping, whose cpu has already been offlined.
When it is waken up, it tries to find another cpu to run, and get a -1 nid.
As a result, cpumask_of_node(-1) returns NULL, and causes ernel panic.

This patch fixes this problem by judging if the nid is -1.
If nid is not -1, a cpu on the same node will be picked.
Else, a online cpu on another node will be picked.

Signed-off-by: Tang Chen 
---
 kernel/sched/core.c |   24 +++-
 1 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 66b36ab..e76dce9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process);
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
-   const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
+   int nid = cpu_to_node(cpu);
+   const struct cpumask *nodemask = NULL;
enum { cpuset, possible, fail } state = cpuset;
int dest_cpu;
 
-   /* Look for allowed, online CPU in same node. */
-   for_each_cpu(dest_cpu, nodemask) {
-   if (!cpu_online(dest_cpu))
-   continue;
-   if (!cpu_active(dest_cpu))
-   continue;
-   if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
-   return dest_cpu;
+   /* If the cpu has been offlined, its nid was set to -1. */
+   if (nid != -1) {
+   nodemask = cpumask_of_node(nid);
+
+   /* Look for allowed, online CPU in same node. */
+   for_each_cpu(dest_cpu, nodemask) {
+   if (!cpu_online(dest_cpu))
+   continue;
+   if (!cpu_active(dest_cpu))
+   continue;
+   if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
+   return dest_cpu;
+   }
}
 
for (;;) {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-07 Thread Tang Chen

If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
return -1. As a result, cpumask_of_node(nid) will return NULL. In this case,
find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.

Here is a call trace:
[  609.824017] Call Trace:
[  609.824017]  IRQ 
[  609.824017]  [810b0721] select_fallback_rq+0x71/0x190
[  609.824017]  [810b086e] ? try_to_wake_up+0x2e/0x2f0
[  609.824017]  [810b0b0b] try_to_wake_up+0x2cb/0x2f0
[  609.824017]  [8109da08] ? __run_hrtimer+0x78/0x320
[  609.824017]  [810b0b85] wake_up_process+0x15/0x20
[  609.824017]  [8109ce62] hrtimer_wakeup+0x22/0x30
[  609.824017]  [8109da13] __run_hrtimer+0x83/0x320
[  609.824017]  [8109ce40] ? update_rmtp+0x80/0x80
[  609.824017]  [8109df56] hrtimer_interrupt+0x106/0x280
[  609.824017]  [810a72c8] ? sd_free_ctl_entry+0x68/0x70
[  609.824017]  [8167cf39] smp_apic_timer_interrupt+0x69/0x99
[  609.824017]  [8167be2f] apic_timer_interrupt+0x6f/0x80

There is a hrtimer process sleeping, whose cpu has already been offlined.
When it is waken up, it tries to find another cpu to run, and get a -1 nid.
As a result, cpumask_of_node(-1) returns NULL, and causes ernel panic.

This patch fixes this problem by judging if the nid is -1.
If nid is not -1, a cpu on the same node will be picked.
Else, a online cpu on another node will be picked.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 kernel/sched/core.c |   24 +++-
 1 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 66b36ab..e76dce9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process);
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
-   const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
+   int nid = cpu_to_node(cpu);
+   const struct cpumask *nodemask = NULL;
enum { cpuset, possible, fail } state = cpuset;
int dest_cpu;
 
-   /* Look for allowed, online CPU in same node. */
-   for_each_cpu(dest_cpu, nodemask) {
-   if (!cpu_online(dest_cpu))
-   continue;
-   if (!cpu_active(dest_cpu))
-   continue;
-   if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
-   return dest_cpu;
+   /* If the cpu has been offlined, its nid was set to -1. */
+   if (nid != -1) {
+   nodemask = cpumask_of_node(nid);
+
+   /* Look for allowed, online CPU in same node. */
+   for_each_cpu(dest_cpu, nodemask) {
+   if (!cpu_online(dest_cpu))
+   continue;
+   if (!cpu_active(dest_cpu))
+   continue;
+   if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
+   return dest_cpu;
+   }
}
 
for (;;) {
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

50 matches

Mail list logo