Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote: > > Ok, thanks for the update. I agree that we should be clearing the mapping > at node hot-remove since any cpu that would subsequently get onlined and > assume one of the previous cpu's ids is not guaranteed to have the same > affinity. Would this mean we have to remap (and memcpy) per-cpu memory on node-plug? > I'm just really hoping that we don't touch the acpi code and that we can > remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch > and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from > -mm. Yeah, none of this should be anywhere near ACPI, its got nothing to do with ACPI. Furthermore it should be be same across all archs, not just be weird and wonderful for x86. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote: Ok, thanks for the update. I agree that we should be clearing the mapping at node hot-remove since any cpu that would subsequently get onlined and assume one of the previous cpu's ids is not guaranteed to have the same affinity. Would this mean we have to remap (and memcpy) per-cpu memory on node-plug? I'm just really hoping that we don't touch the acpi code and that we can remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from -mm. Yeah, none of this should be anywhere near ACPI, its got nothing to do with ACPI. Furthermore it should be be same across all archs, not just be weird and wonderful for x86. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Thu, 18 Oct 2012, Tang Chen wrote: > We are working on this problem. Since it is complicated, it really > takes us some time. Sorry for the delay. :) > > Actually, we intend to clear cpu-to-node mappings when a whole node is > removed. But the node hot-plug code is still under development, so I > think Wen will send a fix patch soon. :) > Ok, thanks for the update. I agree that we should be clearing the mapping at node hot-remove since any cpu that would subsequently get onlined and assume one of the previous cpu's ids is not guaranteed to have the same affinity. I'm just really hoping that we don't touch the acpi code and that we can remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from -mm. Thanks again! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On 10/18/2012 08:52 AM, David Rientjes wrote: On Wed, 10 Oct 2012, David Rientjes wrote: Ok, so it's been a week and these patches are still in -mm. This is what I was afraid of: patches that both Peter and I nacked sitting in -mm and allow a NULL pointer dereference because no alternative patch exists yet to fix the issue correctly. Tang and Wen, are you intending on addressing these problems (i.e. not touching the acpi code at all and rather clearing cpu-to-node mappings at node hot-remove) as we've discussed or do I need to do it myself? Hi David, We are working on this problem. Since it is complicated, it really takes us some time. Sorry for the delay. :) Actually, we intend to clear cpu-to-node mappings when a whole node is removed. But the node hot-plug code is still under development, so I think Wen will send a fix patch soon. :) Thanks. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012, David Rientjes wrote: > It fixes a BUG() that only affects users who are doing node hot-remove, > which is still radically under development, and nobody cares about except > those on the cc list, but it also introduces the NULL pointer dereference > that is attempting to be addressed in this patch. The "fix" that causes > this NULL pointer is clearly not the direction we want to go, I think we > have agreement at node hot-remove to iterate all possible cpus are map all > offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the > generic hotplug code. > > Regardless, this shouldn't be touching the acpi code which > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since > it makes the behavior inconsistent across interfaces and architectures. > Ok, so it's been a week and these patches are still in -mm. This is what I was afraid of: patches that both Peter and I nacked sitting in -mm and allow a NULL pointer dereference because no alternative patch exists yet to fix the issue correctly. Tang and Wen, are you intending on addressing these problems (i.e. not touching the acpi code at all and rather clearing cpu-to-node mappings at node hot-remove) as we've discussed or do I need to do it myself? Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012, David Rientjes wrote: It fixes a BUG() that only affects users who are doing node hot-remove, which is still radically under development, and nobody cares about except those on the cc list, but it also introduces the NULL pointer dereference that is attempting to be addressed in this patch. The fix that causes this NULL pointer is clearly not the direction we want to go, I think we have agreement at node hot-remove to iterate all possible cpus are map all offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the generic hotplug code. Regardless, this shouldn't be touching the acpi code which cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since it makes the behavior inconsistent across interfaces and architectures. Ok, so it's been a week and these patches are still in -mm. This is what I was afraid of: patches that both Peter and I nacked sitting in -mm and allow a NULL pointer dereference because no alternative patch exists yet to fix the issue correctly. Tang and Wen, are you intending on addressing these problems (i.e. not touching the acpi code at all and rather clearing cpu-to-node mappings at node hot-remove) as we've discussed or do I need to do it myself? Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On 10/18/2012 08:52 AM, David Rientjes wrote: On Wed, 10 Oct 2012, David Rientjes wrote: Ok, so it's been a week and these patches are still in -mm. This is what I was afraid of: patches that both Peter and I nacked sitting in -mm and allow a NULL pointer dereference because no alternative patch exists yet to fix the issue correctly. Tang and Wen, are you intending on addressing these problems (i.e. not touching the acpi code at all and rather clearing cpu-to-node mappings at node hot-remove) as we've discussed or do I need to do it myself? Hi David, We are working on this problem. Since it is complicated, it really takes us some time. Sorry for the delay. :) Actually, we intend to clear cpu-to-node mappings when a whole node is removed. But the node hot-plug code is still under development, so I think Wen will send a fix patch soon. :) Thanks. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Thu, 18 Oct 2012, Tang Chen wrote: We are working on this problem. Since it is complicated, it really takes us some time. Sorry for the delay. :) Actually, we intend to clear cpu-to-node mappings when a whole node is removed. But the node hot-plug code is still under development, so I think Wen will send a fix patch soon. :) Ok, thanks for the update. I agree that we should be clearing the mapping at node hot-remove since any cpu that would subsequently get onlined and assume one of the previous cpu's ids is not guaranteed to have the same affinity. I'm just really hoping that we don't touch the acpi code and that we can remove both cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from -mm. Thanks again! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012, Andrew Morton wrote: > > > So for now, let me NACK that patch. You cannot go change stuff like > > > that. > > > > > > > Agreed, that makes the nack-count up to 2 now. Andrew, please remove > > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch > > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch > > from -mm. > > Nope. It fixes a BUG() and so I'll be keeping it around until I see a > better fix. It's one of the ways in which I prevent things from falling > through cracks. > It fixes a BUG() that only affects users who are doing node hot-remove, which is still radically under development, and nobody cares about except those on the cc list, but it also introduces the NULL pointer dereference that is attempting to be addressed in this patch. The "fix" that causes this NULL pointer is clearly not the direction we want to go, I think we have agreement at node hot-remove to iterate all possible cpus are map all offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the generic hotplug code. Regardless, this shouldn't be touching the acpi code which cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since it makes the behavior inconsistent across interfaces and architectures. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012 13:30:29 -0700 (PDT) David Rientjes wrote: > > So for now, let me NACK that patch. You cannot go change stuff like > > that. > > > > Agreed, that makes the nack-count up to 2 now. Andrew, please remove > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch > from -mm. Nope. It fixes a BUG() and so I'll be keeping it around until I see a better fix. It's one of the ways in which I prevent things from falling through cracks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012, Peter Zijlstra wrote: > > If cpu_to_node() always returns a valid node id even if all cpus on the > > node are offline, then the cpumask_of_node() implementation, which the > > sched code is using, should either return an empty cpumask (if > > node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in > > behavior here occurred because > > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't > > return a valid node id and forces it to return -1 so a kzalloc_node(..., > > -1) fallsback to allocate anywhere. > > I think that's broken semantics.. so far the entire cpu<->node mapping > was invariant during hotplug. Changing that is going to be _very_ > interesting and cannot be done lightly. > > Because as I said, per-cpu memory is preserved over hotplug, and that > has numa affinity. > > So for now, let me NACK that patch. You cannot go change stuff like > that. > Agreed, that makes the nack-count up to 2 now. Andrew, please remove cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from -mm. > > But if you only need cpu_to_node() when waking up to find a runnable cpu > > for this NUMA information, then I think you can just change the > > kzalloc_node() in alloc_{fair,rt}_sched_group() to do > > kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). > > That's a confusing statement, the wakeup stuff and the > alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites > might need fixing if we're going to go ahead with this. > The alternative is for node hot-remove to do an iteration of all possible cpus and set cpu-to-node to be NUMA_NO_NODE for all offlined cpus that map to that node. If cpu_online() is true for any of those cpus, then obviously it can't be offlined. We want to do this so that kzalloc_node(..., cpu_to_node()) fallsback to allocating from any node, which it should, and because a subsequent node hot-add event that reuses the same node id may not be the same node. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote: > I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc > you when I post that patch. That script doesn't look at all usage sites of the code you modify does it? You need to audit the entire tree for usage of the interfaces you change and email all relevant people for whoem you're planning to break stuff -- preferably with a proposed fix. That's manual work, no script will ever do this for you. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 05:51 PM, Peter Zijlstra Wrote: > On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: >> >> Hmm, if per-cpu memory is preserved, and we can't offline and remove >> this memory. So we can't offline the node. >> >> But, if the node is hot added, and per-cpu memory doesn't use the >> memory on this node. We can hotremove cpu/memory on this node, and then >> offline this node. >> >> Before the cpu is hotadded, cpu's node is -1. We set cpu<->node mapping >> when it is hotadded. So the entire cpu<->node mapping was not invariant >> during hotplug. >> >> So it is why I try to clear it when the cpu is hot-removed. >> >> As we need the mapping to migrate a task to the cpu on the same node first, >> I think we can clear the mapping when the node is offlined. > > Hmm maybe, but hardware that can hot-add is rare and nobody has it so > nobody cares ;-) Yes, nobody cares it now. But we have a such hardware, so I care it now. > > But by clearing cpu_to_node on every hotplug you change semantics for > all hardware and everybody gets to feel the pain. > > I'm not saying you cannot change things, I'm only saying you should be > far more careful about it, not change it and wait for things to break. > Put in some effort to find things that might break and warn people -- > sure, you'll always miss some, and that's ok. I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc you when I post that patch. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: > > Hmm, if per-cpu memory is preserved, and we can't offline and remove > this memory. So we can't offline the node. > > But, if the node is hot added, and per-cpu memory doesn't use the > memory on this node. We can hotremove cpu/memory on this node, and then > offline this node. > > Before the cpu is hotadded, cpu's node is -1. We set cpu<->node mapping > when it is hotadded. So the entire cpu<->node mapping was not invariant > during hotplug. > > So it is why I try to clear it when the cpu is hot-removed. > > As we need the mapping to migrate a task to the cpu on the same node first, > I think we can clear the mapping when the node is offlined. Hmm maybe, but hardware that can hot-add is rare and nobody has it so nobody cares ;-) But by clearing cpu_to_node on every hotplug you change semantics for all hardware and everybody gets to feel the pain. I'm not saying you cannot change things, I'm only saying you should be far more careful about it, not change it and wait for things to break. Put in some effort to find things that might break and warn people -- sure, you'll always miss some, and that's ok. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 05:10 PM, Peter Zijlstra Wrote: > On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: >> On Tue, 9 Oct 2012, Peter Zijlstra wrote: >> >>> Well the code they were patching is in the wakeup path. As I think Tang >>> said, we leave !runnable tasks on whatever cpu they ran on last, even if >>> that cpu is offlined, we try and fix up state when we get a wakeup. >>> >>> On wakeup, it tries to find a cpu to run on and will try a cpu of the >>> same node first. >>> >>> Now if that node's entirely gone away, it appears the cpu_to_node() map >>> will not return a valid node number. >>> >>> I think that's a change in behaviour, it didn't used to do that afaik. >>> Certainly this code hasn't change in a while. >>> >> >> If cpu_to_node() always returns a valid node id even if all cpus on the >> node are offline, then the cpumask_of_node() implementation, which the >> sched code is using, should either return an empty cpumask (if >> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in >> behavior here occurred because >> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't >> return a valid node id and forces it to return -1 so a kzalloc_node(..., >> -1) fallsback to allocate anywhere. > > I think that's broken semantics.. so far the entire cpu<->node mapping > was invariant during hotplug. Changing that is going to be _very_ > interesting and cannot be done lightly. > > Because as I said, per-cpu memory is preserved over hotplug, and that > has numa affinity. Hmm, if per-cpu memory is preserved, and we can't offline and remove this memory. So we can't offline the node. But, if the node is hot added, and per-cpu memory doesn't use the memory on this node. We can hotremove cpu/memory on this node, and then offline this node. Before the cpu is hotadded, cpu's node is -1. We set cpu<->node mapping when it is hotadded. So the entire cpu<->node mapping was not invariant during hotplug. So it is why I try to clear it when the cpu is hot-removed. As we need the mapping to migrate a task to the cpu on the same node first, I think we can clear the mapping when the node is offlined. Thanks Wen Congyang > > So for now, let me NACK that patch. You cannot go change stuff like > that. > >> >> But if you only need cpu_to_node() when waking up to find a runnable cpu >> for this NUMA information, then I think you can just change the >> kzalloc_node() in alloc_{fair,rt}_sched_group() to do >> kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). > > That's a confusing statement, the wakeup stuff and the > alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites > might need fixing if we're going to go ahead with this. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > Well the code they were patching is in the wakeup path. As I think Tang > > said, we leave !runnable tasks on whatever cpu they ran on last, even if > > that cpu is offlined, we try and fix up state when we get a wakeup. > > > > On wakeup, it tries to find a cpu to run on and will try a cpu of the > > same node first. > > > > Now if that node's entirely gone away, it appears the cpu_to_node() map > > will not return a valid node number. > > > > I think that's a change in behaviour, it didn't used to do that afaik. > > Certainly this code hasn't change in a while. > > > > If cpu_to_node() always returns a valid node id even if all cpus on the > node are offline, then the cpumask_of_node() implementation, which the > sched code is using, should either return an empty cpumask (if > node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in > behavior here occurred because > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't > return a valid node id and forces it to return -1 so a kzalloc_node(..., > -1) fallsback to allocate anywhere. I think that's broken semantics.. so far the entire cpu<->node mapping was invariant during hotplug. Changing that is going to be _very_ interesting and cannot be done lightly. Because as I said, per-cpu memory is preserved over hotplug, and that has numa affinity. So for now, let me NACK that patch. You cannot go change stuff like that. > > But if you only need cpu_to_node() when waking up to find a runnable cpu > for this NUMA information, then I think you can just change the > kzalloc_node() in alloc_{fair,rt}_sched_group() to do > kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). That's a confusing statement, the wakeup stuff and the alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites might need fixing if we're going to go ahead with this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to run on and will try a cpu of the same node first. Now if that node's entirely gone away, it appears the cpu_to_node() map will not return a valid node number. I think that's a change in behaviour, it didn't used to do that afaik. Certainly this code hasn't change in a while. If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in behavior here occurred because cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't return a valid node id and forces it to return -1 so a kzalloc_node(..., -1) fallsback to allocate anywhere. I think that's broken semantics.. so far the entire cpu-node mapping was invariant during hotplug. Changing that is going to be _very_ interesting and cannot be done lightly. Because as I said, per-cpu memory is preserved over hotplug, and that has numa affinity. So for now, let me NACK that patch. You cannot go change stuff like that. But if you only need cpu_to_node() when waking up to find a runnable cpu for this NUMA information, then I think you can just change the kzalloc_node() in alloc_{fair,rt}_sched_group() to do kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). That's a confusing statement, the wakeup stuff and the alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites might need fixing if we're going to go ahead with this. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 05:10 PM, Peter Zijlstra Wrote: On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to run on and will try a cpu of the same node first. Now if that node's entirely gone away, it appears the cpu_to_node() map will not return a valid node number. I think that's a change in behaviour, it didn't used to do that afaik. Certainly this code hasn't change in a while. If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in behavior here occurred because cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't return a valid node id and forces it to return -1 so a kzalloc_node(..., -1) fallsback to allocate anywhere. I think that's broken semantics.. so far the entire cpu-node mapping was invariant during hotplug. Changing that is going to be _very_ interesting and cannot be done lightly. Because as I said, per-cpu memory is preserved over hotplug, and that has numa affinity. Hmm, if per-cpu memory is preserved, and we can't offline and remove this memory. So we can't offline the node. But, if the node is hot added, and per-cpu memory doesn't use the memory on this node. We can hotremove cpu/memory on this node, and then offline this node. Before the cpu is hotadded, cpu's node is -1. We set cpu-node mapping when it is hotadded. So the entire cpu-node mapping was not invariant during hotplug. So it is why I try to clear it when the cpu is hot-removed. As we need the mapping to migrate a task to the cpu on the same node first, I think we can clear the mapping when the node is offlined. Thanks Wen Congyang So for now, let me NACK that patch. You cannot go change stuff like that. But if you only need cpu_to_node() when waking up to find a runnable cpu for this NUMA information, then I think you can just change the kzalloc_node() in alloc_{fair,rt}_sched_group() to do kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). That's a confusing statement, the wakeup stuff and the alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites might need fixing if we're going to go ahead with this. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: Hmm, if per-cpu memory is preserved, and we can't offline and remove this memory. So we can't offline the node. But, if the node is hot added, and per-cpu memory doesn't use the memory on this node. We can hotremove cpu/memory on this node, and then offline this node. Before the cpu is hotadded, cpu's node is -1. We set cpu-node mapping when it is hotadded. So the entire cpu-node mapping was not invariant during hotplug. So it is why I try to clear it when the cpu is hot-removed. As we need the mapping to migrate a task to the cpu on the same node first, I think we can clear the mapping when the node is offlined. Hmm maybe, but hardware that can hot-add is rare and nobody has it so nobody cares ;-) But by clearing cpu_to_node on every hotplug you change semantics for all hardware and everybody gets to feel the pain. I'm not saying you cannot change things, I'm only saying you should be far more careful about it, not change it and wait for things to break. Put in some effort to find things that might break and warn people -- sure, you'll always miss some, and that's ok. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 05:51 PM, Peter Zijlstra Wrote: On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: Hmm, if per-cpu memory is preserved, and we can't offline and remove this memory. So we can't offline the node. But, if the node is hot added, and per-cpu memory doesn't use the memory on this node. We can hotremove cpu/memory on this node, and then offline this node. Before the cpu is hotadded, cpu's node is -1. We set cpu-node mapping when it is hotadded. So the entire cpu-node mapping was not invariant during hotplug. So it is why I try to clear it when the cpu is hot-removed. As we need the mapping to migrate a task to the cpu on the same node first, I think we can clear the mapping when the node is offlined. Hmm maybe, but hardware that can hot-add is rare and nobody has it so nobody cares ;-) Yes, nobody cares it now. But we have a such hardware, so I care it now. But by clearing cpu_to_node on every hotplug you change semantics for all hardware and everybody gets to feel the pain. I'm not saying you cannot change things, I'm only saying you should be far more careful about it, not change it and wait for things to break. Put in some effort to find things that might break and warn people -- sure, you'll always miss some, and that's ok. I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc you when I post that patch. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote: I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc you when I post that patch. That script doesn't look at all usage sites of the code you modify does it? You need to audit the entire tree for usage of the interfaces you change and email all relevant people for whoem you're planning to break stuff -- preferably with a proposed fix. That's manual work, no script will ever do this for you. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012, Peter Zijlstra wrote: If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in behavior here occurred because cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't return a valid node id and forces it to return -1 so a kzalloc_node(..., -1) fallsback to allocate anywhere. I think that's broken semantics.. so far the entire cpu-node mapping was invariant during hotplug. Changing that is going to be _very_ interesting and cannot be done lightly. Because as I said, per-cpu memory is preserved over hotplug, and that has numa affinity. So for now, let me NACK that patch. You cannot go change stuff like that. Agreed, that makes the nack-count up to 2 now. Andrew, please remove cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from -mm. But if you only need cpu_to_node() when waking up to find a runnable cpu for this NUMA information, then I think you can just change the kzalloc_node() in alloc_{fair,rt}_sched_group() to do kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). That's a confusing statement, the wakeup stuff and the alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites might need fixing if we're going to go ahead with this. The alternative is for node hot-remove to do an iteration of all possible cpus and set cpu-to-node to be NUMA_NO_NODE for all offlined cpus that map to that node. If cpu_online() is true for any of those cpus, then obviously it can't be offlined. We want to do this so that kzalloc_node(..., cpu_to_node()) fallsback to allocating from any node, which it should, and because a subsequent node hot-add event that reuses the same node id may not be the same node. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012 13:30:29 -0700 (PDT) David Rientjes rient...@google.com wrote: So for now, let me NACK that patch. You cannot go change stuff like that. Agreed, that makes the nack-count up to 2 now. Andrew, please remove cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from -mm. Nope. It fixes a BUG() and so I'll be keeping it around until I see a better fix. It's one of the ways in which I prevent things from falling through cracks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Wed, 10 Oct 2012, Andrew Morton wrote: So for now, let me NACK that patch. You cannot go change stuff like that. Agreed, that makes the nack-count up to 2 now. Andrew, please remove cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch from -mm. Nope. It fixes a BUG() and so I'll be keeping it around until I see a better fix. It's one of the ways in which I prevent things from falling through cracks. It fixes a BUG() that only affects users who are doing node hot-remove, which is still radically under development, and nobody cares about except those on the cc list, but it also introduces the NULL pointer dereference that is attempting to be addressed in this patch. The fix that causes this NULL pointer is clearly not the direction we want to go, I think we have agreement at node hot-remove to iterate all possible cpus are map all offline cpus with cpu_to_node(cpu) == node to NUMA_NO_NODE instead in the generic hotplug code. Regardless, this shouldn't be touching the acpi code which cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch and cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch do since it makes the behavior inconsistent across interfaces and architectures. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 10:06 AM, Wen Congyang Wrote: > At 10/10/2012 07:27 AM, David Rientjes Wrote: >> On Tue, 9 Oct 2012, Peter Zijlstra wrote: >> >>> Well the code they were patching is in the wakeup path. As I think Tang >>> said, we leave !runnable tasks on whatever cpu they ran on last, even if >>> that cpu is offlined, we try and fix up state when we get a wakeup. >>> >>> On wakeup, it tries to find a cpu to run on and will try a cpu of the >>> same node first. >>> >>> Now if that node's entirely gone away, it appears the cpu_to_node() map >>> will not return a valid node number. >>> >>> I think that's a change in behaviour, it didn't used to do that afaik. >>> Certainly this code hasn't change in a while. >>> >> >> If cpu_to_node() always returns a valid node id even if all cpus on the >> node are offline, then the cpumask_of_node() implementation, which the >> sched code is using, should either return an empty cpumask (if >> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in >> behavior here occurred because >> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't >> return a valid node id and forces it to return -1 so a kzalloc_node(..., >> -1) fallsback to allocate anywhere. >> >> But if you only need cpu_to_node() when waking up to find a runnable cpu >> for this NUMA information, then I think you can just change the >> kzalloc_node() in alloc_{fair,rt}_sched_group() to do >> kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). >> >> [ The changelog here is confusing because it's fixing a problem in >>linux-next without saying so. ] >> > > I don't agree with this way. Because it only fix the code which causes a > problem, and we can't say there is no any similar problem. So it is > why I clear the cpu-to-node mapping. > > What about the following solution: > 1. clear the cpu-to-node mapping when the node is offlined There is no interface to online/offline a node. We online a node only when the cpu/memory is node, and offline it when all cpu/memory in this node is offlined(TODO). So we may need to map cpu-to-node when the cpu is onlined if clear it when the node is offlined. But we don't know the cpu's node. Thanks Wen Congyang > 2. tang's patch is still necessary because we leave !runnable tasks on >whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means >the entire node is offlined, and we must migrate the task to the other >node. > > Thanks > Wen Congyang > -- > To unsubscribe from this list: send the line "unsubscribe linux-numa" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 07:27 AM, David Rientjes Wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > >> Well the code they were patching is in the wakeup path. As I think Tang >> said, we leave !runnable tasks on whatever cpu they ran on last, even if >> that cpu is offlined, we try and fix up state when we get a wakeup. >> >> On wakeup, it tries to find a cpu to run on and will try a cpu of the >> same node first. >> >> Now if that node's entirely gone away, it appears the cpu_to_node() map >> will not return a valid node number. >> >> I think that's a change in behaviour, it didn't used to do that afaik. >> Certainly this code hasn't change in a while. >> > > If cpu_to_node() always returns a valid node id even if all cpus on the > node are offline, then the cpumask_of_node() implementation, which the > sched code is using, should either return an empty cpumask (if > node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in > behavior here occurred because > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't > return a valid node id and forces it to return -1 so a kzalloc_node(..., > -1) fallsback to allocate anywhere. > > But if you only need cpu_to_node() when waking up to find a runnable cpu > for this NUMA information, then I think you can just change the > kzalloc_node() in alloc_{fair,rt}_sched_group() to do > kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). > > [ The changelog here is confusing because it's fixing a problem in >linux-next without saying so. ] > I don't agree with this way. Because it only fix the code which causes a problem, and we can't say there is no any similar problem. So it is why I clear the cpu-to-node mapping. What about the following solution: 1. clear the cpu-to-node mapping when the node is offlined 2. tang's patch is still necessary because we leave !runnable tasks on whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means the entire node is offlined, and we must migrate the task to the other node. Thanks Wen Congyang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Peter Zijlstra wrote: > Well the code they were patching is in the wakeup path. As I think Tang > said, we leave !runnable tasks on whatever cpu they ran on last, even if > that cpu is offlined, we try and fix up state when we get a wakeup. > > On wakeup, it tries to find a cpu to run on and will try a cpu of the > same node first. > > Now if that node's entirely gone away, it appears the cpu_to_node() map > will not return a valid node number. > > I think that's a change in behaviour, it didn't used to do that afaik. > Certainly this code hasn't change in a while. > If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in behavior here occurred because cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't return a valid node id and forces it to return -1 so a kzalloc_node(..., -1) fallsback to allocate anywhere. But if you only need cpu_to_node() when waking up to find a runnable cpu for this NUMA information, then I think you can just change the kzalloc_node() in alloc_{fair,rt}_sched_group() to do kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). [ The changelog here is confusing because it's fixing a problem in linux-next without saying so. ] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote: > On Tue, 9 Oct 2012, Peter Zijlstra wrote: > > > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > > > return -1. As a result, cpumask_of_node(nid) will return NULL. In this > > > case, > > > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. > > > > Hurm,. this is new, right? Who is changing all these semantics without > > auditing the tree and informing all affected people? > > > > I've nacked the patch that did it because I think it should be done from > the generic cpu hotplug code only at the CPU_DEAD level with a per-arch > callback to fixup whatever cpu-to-node mappings they maintain since > processes can reenter the scheduler at CPU_DYING. Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to run on and will try a cpu of the same node first. Now if that node's entirely gone away, it appears the cpu_to_node() map will not return a valid node number. I think that's a change in behaviour, it didn't used to do that afaik. Certainly this code hasn't change in a while. > The whole issue seems to be because alloc_{fair,rt}_sched_group() does an > iteration over all possible cpus (not all online cpus) and does > kzalloc_node() which references a now-offlined node. Changing it to -1 > makes the slab code fallback to any online node. Right, that's because the rq structures are assumed always present. What I cannot remember is why I'm not using per-cpu allocations there, because that's exactly what it looks like it wants to be. > What I think we need to do instead of hacking only the acpi code and not > standardizing this across the kernel is: Right, what I don't understand is wtf ACPI has to do with anything. We have plenty cpu hotplug code, ACPI isn't involved in any of that last time I checked. > - reset cpu-to-node with a per-arch callback in generic cpu hotplug code >at CPU_DEAD, and > > - do an iteration over all possible cpus for node hot-remove ensuring >there are no stale references. Why do we need to clear cpu-to-node maps? are we going to change the topology at runtime? What are you going to do with per-cpu stuff, per-cpu memory isn't freed on hotplug, so its node relation is static. /me confused.. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Peter Zijlstra wrote: > On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, > > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. > > Hurm,. this is new, right? Who is changing all these semantics without > auditing the tree and informing all affected people? > I've nacked the patch that did it because I think it should be done from the generic cpu hotplug code only at the CPU_DEAD level with a per-arch callback to fixup whatever cpu-to-node mappings they maintain since processes can reenter the scheduler at CPU_DYING. The whole issue seems to be because alloc_{fair,rt}_sched_group() does an iteration over all possible cpus (not all online cpus) and does kzalloc_node() which references a now-offlined node. Changing it to -1 makes the slab code fallback to any online node. What I think we need to do instead of hacking only the acpi code and not standardizing this across the kernel is: - reset cpu-to-node with a per-arch callback in generic cpu hotplug code at CPU_DEAD, and - do an iteration over all possible cpus for node hot-remove ensuring there are no stale references. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Wen Congyang wrote: > I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined, > it will be offlined before clearing cpu-to-node mapping. > > Here is the code in driver/acpi/processor_driver.c: > = > static int acpi_processor_handle_eject(struct acpi_processor *pr) > { > if (cpu_online(pr->id)) > cpu_down(pr->id); <== cpu is offlined here. > > arch_unregister_cpu(pr->id); > acpi_unmap_lsapic(pr->id); <== I clear the mapping here. > return (0); > } > = > > The real problem is that: we don't migrate this task to another cpu when > the cpu is offlined. I guess this task is not in running state, and it > is not in the cpu's runqueue. > ACPI is not the only way to offline a cpu, this all needs to be standardized throughout the kernel for each arch so that generic code like sched can use cpu_to_node() in a CPU_DYING callback. The only way to do that is by having arch callbacks to set cpu-to-node to NUMA_NO_NODE only at CPU_DEAD. > This patch is try to fix a bug: the kernel will be panicked after removing a > node. > That's much more subtle than having a NULL pointer dereference like Tang reports during migration. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: > If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will > return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, > find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is new, right? Who is changing all these semantics without auditing the tree and informing all affected people? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/09/2012 06:04 PM, David Rientjes Wrote: > On Tue, 9 Oct 2012, Tang Chen wrote: > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. >> >> As Wen said below, nid is now set to -1 when cpu is hotremoved. >> I reproduce this problem in this situation: >> >> all cpus are online, and hot remove a system board directorily, without >> offlining any cpu. >> >> As a result, the removed cpu's nid is set to -1, and this causes >> problems. >> > > Let's add Andrew to the cc list then, because I'm nacking > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm > tree for this reason. > > We can only clear a cpu-to-node mapping when the cpu is completely > offline, not before or during the CPU_DYING stage. Kernel code, such as I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined, it will be offlined before clearing cpu-to-node mapping. Here is the code in driver/acpi/processor_driver.c: = static int acpi_processor_handle_eject(struct acpi_processor *pr) { if (cpu_online(pr->id)) cpu_down(pr->id); <== cpu is offlined here. arch_unregister_cpu(pr->id); acpi_unmap_lsapic(pr->id); <== I clear the mapping here. return (0); } = The real problem is that: we don't migrate this task to another cpu when the cpu is offlined. I guess this task is not in running state, and it is not in the cpu's runqueue. Thanks Wen Congyang > the sched code that you are now trying to "fix", depends on this mapping > to work correctly; obviously no audit was done of cpu hotplug code > depending on it before the patch was proposed. > > I say "fix" because even this workaround isn't a good solution since it > would be much better to pick another cpu on the same node as the offlining > cpu for the runqueue before falling back to the set of all allowed nodes. > We lose all NUMA affinity information with that patch. There's no reason > why we shouldn't know the node of a cpu that is being offlined. > > So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch. > After it's removed because it's buggy, this "fix" will no longer be > necessary. > This patch is try to fix a bug: the kernel will be panicked after removing a node. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Tang Chen wrote: > > > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this > > > should be called at CPU_DYING level and migrate_tasks() still sees a valid > > > cpu. > > As Wen said below, nid is now set to -1 when cpu is hotremoved. > I reproduce this problem in this situation: > > all cpus are online, and hot remove a system board directorily, without > offlining any cpu. > > As a result, the removed cpu's nid is set to -1, and this causes > problems. > Let's add Andrew to the cc list then, because I'm nacking cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm tree for this reason. We can only clear a cpu-to-node mapping when the cpu is completely offline, not before or during the CPU_DYING stage. Kernel code, such as the sched code that you are now trying to "fix", depends on this mapping to work correctly; obviously no audit was done of cpu hotplug code depending on it before the patch was proposed. I say "fix" because even this workaround isn't a good solution since it would be much better to pick another cpu on the same node as the offlining cpu for the runqueue before falling back to the set of all allowed nodes. We lose all NUMA affinity information with that patch. There's no reason why we shouldn't know the node of a cpu that is being offlined. So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch. After it's removed because it's buggy, this "fix" will no longer be necessary. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
Hi David, Thanks for reviewing this patch. :) On 10/09/2012 04:34 PM, Wen Congyang wrote: At 10/09/2012 02:21 PM, David Rientjes Wrote: On Mon, 8 Oct 2012, Tang Chen wrote: + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { NUMA_NO_NODE. Yes, NUMA_NO_NODE is better. I'll improve it, thanks. :) Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. As Wen said below, nid is now set to -1 when cpu is hotremoved. I reproduce this problem in this situation: all cpus are online, and hot remove a system board directorily, without offlining any cpu. As a result, the removed cpu's nid is set to -1, and this causes problems. the cpu's node is set when the cpu is hotpluged(not online), and it will be cleared when the cpu is hotremoved(This patch is in akpm tree): https://lkml.org/lkml/2012/9/3/39 I guess the task is in sleep state when the cpu is offlined, and it doesn't be migrated to another cpu. Thanks Wen Congyang On x86, cpumask_of_node() is always guaranteed to return a valid cpumask after boot so presumably this is a problem in some non-x86 arch code and isn't actually a sched problem. BTW, my box is x86. I think for the reason I said above, cpumask_of_node() will no longer guaranteed anything even if on x86. Thanks. :) + nodemask = cpumask_of_node(nid); + + /* Look for allowed, online CPU in same node. */ + for_each_cpu(dest_cpu, nodemask) { + if (!cpu_online(dest_cpu)) + continue; + if (!cpu_active(dest_cpu)) + continue; + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) + return dest_cpu; + } } for (;;) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/09/2012 02:21 PM, David Rientjes Wrote: > On Mon, 8 Oct 2012, Tang Chen wrote: > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 66b36ab..e76dce9 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); >> */ >> static int select_fallback_rq(int cpu, struct task_struct *p) >> { >> -const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu)); >> +int nid = cpu_to_node(cpu); >> +const struct cpumask *nodemask = NULL; >> enum { cpuset, possible, fail } state = cpuset; >> int dest_cpu; >> >> -/* Look for allowed, online CPU in same node. */ >> -for_each_cpu(dest_cpu, nodemask) { >> -if (!cpu_online(dest_cpu)) >> -continue; >> -if (!cpu_active(dest_cpu)) >> -continue; >> -if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) >> -return dest_cpu; >> +/* If the cpu has been offlined, its nid was set to -1. */ >> +if (nid != -1) { > > NUMA_NO_NODE. > > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this > should be called at CPU_DYING level and migrate_tasks() still sees a valid > cpu. the cpu's node is set when the cpu is hotpluged(not online), and it will be cleared when the cpu is hotremoved(This patch is in akpm tree): https://lkml.org/lkml/2012/9/3/39 I guess the task is in sleep state when the cpu is offlined, and it doesn't be migrated to another cpu. Thanks Wen Congyang > > On x86, cpumask_of_node() is always guaranteed to return a valid cpumask > after boot so presumably this is a problem in some non-x86 arch code and > isn't actually a sched problem. > >> +nodemask = cpumask_of_node(nid); >> + >> +/* Look for allowed, online CPU in same node. */ >> +for_each_cpu(dest_cpu, nodemask) { >> +if (!cpu_online(dest_cpu)) >> +continue; >> +if (!cpu_active(dest_cpu)) >> +continue; >> +if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) >> +return dest_cpu; >> +} >> } >> >> for (;;) { > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Mon, 8 Oct 2012, Tang Chen wrote: > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 66b36ab..e76dce9 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); > */ > static int select_fallback_rq(int cpu, struct task_struct *p) > { > - const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu)); > + int nid = cpu_to_node(cpu); > + const struct cpumask *nodemask = NULL; > enum { cpuset, possible, fail } state = cpuset; > int dest_cpu; > > - /* Look for allowed, online CPU in same node. */ > - for_each_cpu(dest_cpu, nodemask) { > - if (!cpu_online(dest_cpu)) > - continue; > - if (!cpu_active(dest_cpu)) > - continue; > - if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) > - return dest_cpu; > + /* If the cpu has been offlined, its nid was set to -1. */ > + if (nid != -1) { NUMA_NO_NODE. Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. On x86, cpumask_of_node() is always guaranteed to return a valid cpumask after boot so presumably this is a problem in some non-x86 arch code and isn't actually a sched problem. > + nodemask = cpumask_of_node(nid); > + > + /* Look for allowed, online CPU in same node. */ > + for_each_cpu(dest_cpu, nodemask) { > + if (!cpu_online(dest_cpu)) > + continue; > + if (!cpu_active(dest_cpu)) > + continue; > + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) > + return dest_cpu; > + } > } > > for (;;) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Mon, 8 Oct 2012, Tang Chen wrote: diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 66b36ab..e76dce9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); */ static int select_fallback_rq(int cpu, struct task_struct *p) { - const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu)); + int nid = cpu_to_node(cpu); + const struct cpumask *nodemask = NULL; enum { cpuset, possible, fail } state = cpuset; int dest_cpu; - /* Look for allowed, online CPU in same node. */ - for_each_cpu(dest_cpu, nodemask) { - if (!cpu_online(dest_cpu)) - continue; - if (!cpu_active(dest_cpu)) - continue; - if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) - return dest_cpu; + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { NUMA_NO_NODE. Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. On x86, cpumask_of_node() is always guaranteed to return a valid cpumask after boot so presumably this is a problem in some non-x86 arch code and isn't actually a sched problem. + nodemask = cpumask_of_node(nid); + + /* Look for allowed, online CPU in same node. */ + for_each_cpu(dest_cpu, nodemask) { + if (!cpu_online(dest_cpu)) + continue; + if (!cpu_active(dest_cpu)) + continue; + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) + return dest_cpu; + } } for (;;) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/09/2012 02:21 PM, David Rientjes Wrote: On Mon, 8 Oct 2012, Tang Chen wrote: diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 66b36ab..e76dce9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); */ static int select_fallback_rq(int cpu, struct task_struct *p) { -const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu)); +int nid = cpu_to_node(cpu); +const struct cpumask *nodemask = NULL; enum { cpuset, possible, fail } state = cpuset; int dest_cpu; -/* Look for allowed, online CPU in same node. */ -for_each_cpu(dest_cpu, nodemask) { -if (!cpu_online(dest_cpu)) -continue; -if (!cpu_active(dest_cpu)) -continue; -if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) -return dest_cpu; +/* If the cpu has been offlined, its nid was set to -1. */ +if (nid != -1) { NUMA_NO_NODE. Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. the cpu's node is set when the cpu is hotpluged(not online), and it will be cleared when the cpu is hotremoved(This patch is in akpm tree): https://lkml.org/lkml/2012/9/3/39 I guess the task is in sleep state when the cpu is offlined, and it doesn't be migrated to another cpu. Thanks Wen Congyang On x86, cpumask_of_node() is always guaranteed to return a valid cpumask after boot so presumably this is a problem in some non-x86 arch code and isn't actually a sched problem. +nodemask = cpumask_of_node(nid); + +/* Look for allowed, online CPU in same node. */ +for_each_cpu(dest_cpu, nodemask) { +if (!cpu_online(dest_cpu)) +continue; +if (!cpu_active(dest_cpu)) +continue; +if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) +return dest_cpu; +} } for (;;) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
Hi David, Thanks for reviewing this patch. :) On 10/09/2012 04:34 PM, Wen Congyang wrote: At 10/09/2012 02:21 PM, David Rientjes Wrote: On Mon, 8 Oct 2012, Tang Chen wrote: + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { NUMA_NO_NODE. Yes, NUMA_NO_NODE is better. I'll improve it, thanks. :) Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. As Wen said below, nid is now set to -1 when cpu is hotremoved. I reproduce this problem in this situation: all cpus are online, and hot remove a system board directorily, without offlining any cpu. As a result, the removed cpu's nid is set to -1, and this causes problems. the cpu's node is set when the cpu is hotpluged(not online), and it will be cleared when the cpu is hotremoved(This patch is in akpm tree): https://lkml.org/lkml/2012/9/3/39 I guess the task is in sleep state when the cpu is offlined, and it doesn't be migrated to another cpu. Thanks Wen Congyang On x86, cpumask_of_node() is always guaranteed to return a valid cpumask after boot so presumably this is a problem in some non-x86 arch code and isn't actually a sched problem. BTW, my box is x86. I think for the reason I said above, cpumask_of_node() will no longer guaranteed anything even if on x86. Thanks. :) + nodemask = cpumask_of_node(nid); + + /* Look for allowed, online CPU in same node. */ + for_each_cpu(dest_cpu, nodemask) { + if (!cpu_online(dest_cpu)) + continue; + if (!cpu_active(dest_cpu)) + continue; + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) + return dest_cpu; + } } for (;;) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Tang Chen wrote: Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. As Wen said below, nid is now set to -1 when cpu is hotremoved. I reproduce this problem in this situation: all cpus are online, and hot remove a system board directorily, without offlining any cpu. As a result, the removed cpu's nid is set to -1, and this causes problems. Let's add Andrew to the cc list then, because I'm nacking cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm tree for this reason. We can only clear a cpu-to-node mapping when the cpu is completely offline, not before or during the CPU_DYING stage. Kernel code, such as the sched code that you are now trying to fix, depends on this mapping to work correctly; obviously no audit was done of cpu hotplug code depending on it before the patch was proposed. I say fix because even this workaround isn't a good solution since it would be much better to pick another cpu on the same node as the offlining cpu for the runqueue before falling back to the set of all allowed nodes. We lose all NUMA affinity information with that patch. There's no reason why we shouldn't know the node of a cpu that is being offlined. So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch. After it's removed because it's buggy, this fix will no longer be necessary. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/09/2012 06:04 PM, David Rientjes Wrote: On Tue, 9 Oct 2012, Tang Chen wrote: Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this should be called at CPU_DYING level and migrate_tasks() still sees a valid cpu. As Wen said below, nid is now set to -1 when cpu is hotremoved. I reproduce this problem in this situation: all cpus are online, and hot remove a system board directorily, without offlining any cpu. As a result, the removed cpu's nid is set to -1, and this causes problems. Let's add Andrew to the cc list then, because I'm nacking cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm tree for this reason. We can only clear a cpu-to-node mapping when the cpu is completely offline, not before or during the CPU_DYING stage. Kernel code, such as I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined, it will be offlined before clearing cpu-to-node mapping. Here is the code in driver/acpi/processor_driver.c: = static int acpi_processor_handle_eject(struct acpi_processor *pr) { if (cpu_online(pr-id)) cpu_down(pr-id); == cpu is offlined here. arch_unregister_cpu(pr-id); acpi_unmap_lsapic(pr-id); == I clear the mapping here. return (0); } = The real problem is that: we don't migrate this task to another cpu when the cpu is offlined. I guess this task is not in running state, and it is not in the cpu's runqueue. Thanks Wen Congyang the sched code that you are now trying to fix, depends on this mapping to work correctly; obviously no audit was done of cpu hotplug code depending on it before the patch was proposed. I say fix because even this workaround isn't a good solution since it would be much better to pick another cpu on the same node as the offlining cpu for the runqueue before falling back to the set of all allowed nodes. We lose all NUMA affinity information with that patch. There's no reason why we shouldn't know the node of a cpu that is being offlined. So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch. After it's removed because it's buggy, this fix will no longer be necessary. This patch is try to fix a bug: the kernel will be panicked after removing a node. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is new, right? Who is changing all these semantics without auditing the tree and informing all affected people? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Wen Congyang wrote: I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined, it will be offlined before clearing cpu-to-node mapping. Here is the code in driver/acpi/processor_driver.c: = static int acpi_processor_handle_eject(struct acpi_processor *pr) { if (cpu_online(pr-id)) cpu_down(pr-id); == cpu is offlined here. arch_unregister_cpu(pr-id); acpi_unmap_lsapic(pr-id); == I clear the mapping here. return (0); } = The real problem is that: we don't migrate this task to another cpu when the cpu is offlined. I guess this task is not in running state, and it is not in the cpu's runqueue. ACPI is not the only way to offline a cpu, this all needs to be standardized throughout the kernel for each arch so that generic code like sched can use cpu_to_node() in a CPU_DYING callback. The only way to do that is by having arch callbacks to set cpu-to-node to NUMA_NO_NODE only at CPU_DEAD. This patch is try to fix a bug: the kernel will be panicked after removing a node. That's much more subtle than having a NULL pointer dereference like Tang reports during migration. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Peter Zijlstra wrote: On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is new, right? Who is changing all these semantics without auditing the tree and informing all affected people? I've nacked the patch that did it because I think it should be done from the generic cpu hotplug code only at the CPU_DEAD level with a per-arch callback to fixup whatever cpu-to-node mappings they maintain since processes can reenter the scheduler at CPU_DYING. The whole issue seems to be because alloc_{fair,rt}_sched_group() does an iteration over all possible cpus (not all online cpus) and does kzalloc_node() which references a now-offlined node. Changing it to -1 makes the slab code fallback to any online node. What I think we need to do instead of hacking only the acpi code and not standardizing this across the kernel is: - reset cpu-to-node with a per-arch callback in generic cpu hotplug code at CPU_DEAD, and - do an iteration over all possible cpus for node hot-remove ensuring there are no stale references. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is new, right? Who is changing all these semantics without auditing the tree and informing all affected people? I've nacked the patch that did it because I think it should be done from the generic cpu hotplug code only at the CPU_DEAD level with a per-arch callback to fixup whatever cpu-to-node mappings they maintain since processes can reenter the scheduler at CPU_DYING. Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to run on and will try a cpu of the same node first. Now if that node's entirely gone away, it appears the cpu_to_node() map will not return a valid node number. I think that's a change in behaviour, it didn't used to do that afaik. Certainly this code hasn't change in a while. The whole issue seems to be because alloc_{fair,rt}_sched_group() does an iteration over all possible cpus (not all online cpus) and does kzalloc_node() which references a now-offlined node. Changing it to -1 makes the slab code fallback to any online node. Right, that's because the rq structures are assumed always present. What I cannot remember is why I'm not using per-cpu allocations there, because that's exactly what it looks like it wants to be. What I think we need to do instead of hacking only the acpi code and not standardizing this across the kernel is: Right, what I don't understand is wtf ACPI has to do with anything. We have plenty cpu hotplug code, ACPI isn't involved in any of that last time I checked. - reset cpu-to-node with a per-arch callback in generic cpu hotplug code at CPU_DEAD, and - do an iteration over all possible cpus for node hot-remove ensuring there are no stale references. Why do we need to clear cpu-to-node maps? are we going to change the topology at runtime? What are you going to do with per-cpu stuff, per-cpu memory isn't freed on hotplug, so its node relation is static. /me confused.. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to run on and will try a cpu of the same node first. Now if that node's entirely gone away, it appears the cpu_to_node() map will not return a valid node number. I think that's a change in behaviour, it didn't used to do that afaik. Certainly this code hasn't change in a while. If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in behavior here occurred because cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't return a valid node id and forces it to return -1 so a kzalloc_node(..., -1) fallsback to allocate anywhere. But if you only need cpu_to_node() when waking up to find a runnable cpu for this NUMA information, then I think you can just change the kzalloc_node() in alloc_{fair,rt}_sched_group() to do kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). [ The changelog here is confusing because it's fixing a problem in linux-next without saying so. ] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 07:27 AM, David Rientjes Wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to run on and will try a cpu of the same node first. Now if that node's entirely gone away, it appears the cpu_to_node() map will not return a valid node number. I think that's a change in behaviour, it didn't used to do that afaik. Certainly this code hasn't change in a while. If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in behavior here occurred because cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't return a valid node id and forces it to return -1 so a kzalloc_node(..., -1) fallsback to allocate anywhere. But if you only need cpu_to_node() when waking up to find a runnable cpu for this NUMA information, then I think you can just change the kzalloc_node() in alloc_{fair,rt}_sched_group() to do kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). [ The changelog here is confusing because it's fixing a problem in linux-next without saying so. ] I don't agree with this way. Because it only fix the code which causes a problem, and we can't say there is no any similar problem. So it is why I clear the cpu-to-node mapping. What about the following solution: 1. clear the cpu-to-node mapping when the node is offlined 2. tang's patch is still necessary because we leave !runnable tasks on whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means the entire node is offlined, and we must migrate the task to the other node. Thanks Wen Congyang -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
At 10/10/2012 10:06 AM, Wen Congyang Wrote: At 10/10/2012 07:27 AM, David Rientjes Wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state when we get a wakeup. On wakeup, it tries to find a cpu to run on and will try a cpu of the same node first. Now if that node's entirely gone away, it appears the cpu_to_node() map will not return a valid node number. I think that's a change in behaviour, it didn't used to do that afaik. Certainly this code hasn't change in a while. If cpu_to_node() always returns a valid node id even if all cpus on the node are offline, then the cpumask_of_node() implementation, which the sched code is using, should either return an empty cpumask (if node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in behavior here occurred because cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't return a valid node id and forces it to return -1 so a kzalloc_node(..., -1) fallsback to allocate anywhere. But if you only need cpu_to_node() when waking up to find a runnable cpu for this NUMA information, then I think you can just change the kzalloc_node() in alloc_{fair,rt}_sched_group() to do kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE). [ The changelog here is confusing because it's fixing a problem in linux-next without saying so. ] I don't agree with this way. Because it only fix the code which causes a problem, and we can't say there is no any similar problem. So it is why I clear the cpu-to-node mapping. What about the following solution: 1. clear the cpu-to-node mapping when the node is offlined There is no interface to online/offline a node. We online a node only when the cpu/memory is node, and offline it when all cpu/memory in this node is offlined(TODO). So we may need to map cpu-to-node when the cpu is onlined if clear it when the node is offlined. But we don't know the cpu's node. Thanks Wen Congyang 2. tang's patch is still necessary because we leave !runnable tasks on whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means the entire node is offlined, and we must migrate the task to the other node. Thanks Wen Congyang -- To unsubscribe from this list: send the line unsubscribe linux-numa in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Here is a call trace: [ 609.824017] Call Trace: [ 609.824017] [ 609.824017] [] select_fallback_rq+0x71/0x190 [ 609.824017] [] ? try_to_wake_up+0x2e/0x2f0 [ 609.824017] [] try_to_wake_up+0x2cb/0x2f0 [ 609.824017] [] ? __run_hrtimer+0x78/0x320 [ 609.824017] [] wake_up_process+0x15/0x20 [ 609.824017] [] hrtimer_wakeup+0x22/0x30 [ 609.824017] [] __run_hrtimer+0x83/0x320 [ 609.824017] [] ? update_rmtp+0x80/0x80 [ 609.824017] [] hrtimer_interrupt+0x106/0x280 [ 609.824017] [] ? sd_free_ctl_entry+0x68/0x70 [ 609.824017] [] smp_apic_timer_interrupt+0x69/0x99 [ 609.824017] [] apic_timer_interrupt+0x6f/0x80 There is a hrtimer process sleeping, whose cpu has already been offlined. When it is waken up, it tries to find another cpu to run, and get a -1 nid. As a result, cpumask_of_node(-1) returns NULL, and causes ernel panic. This patch fixes this problem by judging if the nid is -1. If nid is not -1, a cpu on the same node will be picked. Else, a online cpu on another node will be picked. Signed-off-by: Tang Chen --- kernel/sched/core.c | 24 +++- 1 files changed, 15 insertions(+), 9 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 66b36ab..e76dce9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); */ static int select_fallback_rq(int cpu, struct task_struct *p) { - const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu)); + int nid = cpu_to_node(cpu); + const struct cpumask *nodemask = NULL; enum { cpuset, possible, fail } state = cpuset; int dest_cpu; - /* Look for allowed, online CPU in same node. */ - for_each_cpu(dest_cpu, nodemask) { - if (!cpu_online(dest_cpu)) - continue; - if (!cpu_active(dest_cpu)) - continue; - if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) - return dest_cpu; + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { + nodemask = cpumask_of_node(nid); + + /* Look for allowed, online CPU in same node. */ + for_each_cpu(dest_cpu, nodemask) { + if (!cpu_online(dest_cpu)) + continue; + if (!cpu_active(dest_cpu)) + continue; + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) + return dest_cpu; + } } for (;;) { -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Here is a call trace: [ 609.824017] Call Trace: [ 609.824017] IRQ [ 609.824017] [810b0721] select_fallback_rq+0x71/0x190 [ 609.824017] [810b086e] ? try_to_wake_up+0x2e/0x2f0 [ 609.824017] [810b0b0b] try_to_wake_up+0x2cb/0x2f0 [ 609.824017] [8109da08] ? __run_hrtimer+0x78/0x320 [ 609.824017] [810b0b85] wake_up_process+0x15/0x20 [ 609.824017] [8109ce62] hrtimer_wakeup+0x22/0x30 [ 609.824017] [8109da13] __run_hrtimer+0x83/0x320 [ 609.824017] [8109ce40] ? update_rmtp+0x80/0x80 [ 609.824017] [8109df56] hrtimer_interrupt+0x106/0x280 [ 609.824017] [810a72c8] ? sd_free_ctl_entry+0x68/0x70 [ 609.824017] [8167cf39] smp_apic_timer_interrupt+0x69/0x99 [ 609.824017] [8167be2f] apic_timer_interrupt+0x6f/0x80 There is a hrtimer process sleeping, whose cpu has already been offlined. When it is waken up, it tries to find another cpu to run, and get a -1 nid. As a result, cpumask_of_node(-1) returns NULL, and causes ernel panic. This patch fixes this problem by judging if the nid is -1. If nid is not -1, a cpu on the same node will be picked. Else, a online cpu on another node will be picked. Signed-off-by: Tang Chen tangc...@cn.fujitsu.com --- kernel/sched/core.c | 24 +++- 1 files changed, 15 insertions(+), 9 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 66b36ab..e76dce9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1263,18 +1263,24 @@ EXPORT_SYMBOL_GPL(kick_process); */ static int select_fallback_rq(int cpu, struct task_struct *p) { - const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu)); + int nid = cpu_to_node(cpu); + const struct cpumask *nodemask = NULL; enum { cpuset, possible, fail } state = cpuset; int dest_cpu; - /* Look for allowed, online CPU in same node. */ - for_each_cpu(dest_cpu, nodemask) { - if (!cpu_online(dest_cpu)) - continue; - if (!cpu_active(dest_cpu)) - continue; - if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) - return dest_cpu; + /* If the cpu has been offlined, its nid was set to -1. */ + if (nid != -1) { + nodemask = cpumask_of_node(nid); + + /* Look for allowed, online CPU in same node. */ + for_each_cpu(dest_cpu, nodemask) { + if (!cpu_online(dest_cpu)) + continue; + if (!cpu_active(dest_cpu)) + continue; + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) + return dest_cpu; + } } for (;;) { -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/