Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Fri, Mar 1, 2013 at 7:03 PM, chen tang wrote:
>
> Thank you for your suggestion and fix work. :)
> I would prefer your Plan b. But one last thing I want to confirm:
>
> Will "allocating pgdat and zone on local node" prevent node hot-removing ?
> Or is it safe to free all node data when removing a node ?
> AFAIK, no way to ensure node data is not on thread stack.

Not sure. I need to go over the code. That is slub's limitation.
If it is not, it should be fixed.

>
> If it is OK, I think Plan B is OK, and we can improve movablemem_map more in
> the future.
>
> BTW, I didn't mean to deny your idea and work. NUMA performance is always
> understand our consideration.
> It's just we plan it as a long way development in the future.
> movablemem_map is very important to us. And we do hope to keep it in kernel
> now, and improve it later.

That does not look like the right way to do development with the mainline
tree to add new features.

You don't need to put development/testing support patches in the mainline.
Just put those support patches in your local tree.
Everyone has a bunch of development/debug/test-stub patches on their own
hard disk for their working area, but they don't need to put them into the
mainline tree.

Good practice should be:
Have the feature completely done in your local tree, then send out several
patchsets and get them reviewed and merged one by one.
Sometimes it turns out that your whole patchset has a problem that cannot
be fixed during review, and it has to be redesigned.

The mainline tree is NOT a testbed.

For pci-root-bus hotplug, I already had the code completely done.
Then I sent out the patchsets one by one to get them fully reviewed.
One patchset about acpi-scan was totally rewritten by Rafael, after he
understood our needs, with a better and cleaner design.
Now ioapic and iommu are still left, and those patchsets have been in my
local tree for more than 6 months and I keep optimizing them.

BTW, please do not top-post later.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Thu, Feb 28, 2013 at 11:43 PM, Yinghai Lu wrote:
> [trim down CC list a bit]
>
> On Thu, Feb 28, 2013 at 9:00 PM, Yinghai Lu wrote:
>>
>> On Thursday, February 28, 2013, H. Peter Anvin wrote:
>>>
>>> On 02/28/2013 08:32 PM, Linus Torvalds wrote:
>>> > Yingai, Andrew,
>>> > is this ok with you two?
>>> >
>>> > Linus
>>>
>>> FWIW, it makes sense to me iff it resolves the problems
>>
>> I prefer to reverting all 8 patches.
>>
>> Actually I have worked out one patch that could solve all problems, but it
>> is too intrusive that I do not want to split it to small pieces to post
>> it.
>>
>> Leaving the movablemem_map related changes in the upstream tree, will
>> prevent me from continuing to make memblock to be used to allocate page
>> table on local node ram for hot add.
>>
>> Will send reverting patch and putting page table on local node patch around
>> 10pm after I get home.
>
> Please check attached patches.
>
> Plan A. revert all 8 patches:
> revert_movablemem_map.patch
>
> Plan B. fix movablemem_map:
> kill_max_low_pfn_mapped.patch and fix_movablemem_map.patch
>
> fix_movablemem_map.patch is too risky, and need more test.
>
> Konrad, Stefano:
> Can you check kill_max_low_pfn_mapped.patch and fix_movablemem_map.patch
> on top of today's Linus tree to check if it breaks Xen?
>

Sorry, I missed a change in setup.c while splitting the patch.

Thanks

Yinghai

fix_movablemem_map_v2.patch
Description: Binary data
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/28/2013 11:55 PM, Yinghai Lu wrote:
>
> Let me try again:
>
> movablemem_map is broken idea or poor design.
>

Very much so. I have said this before: this is potentially useful
during development/testing, but anyone who expects to actually tell
their customers to use it is abusive.

	-hpa
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
If NUMAQ is breaking real stuff we can kill it by marking it BROKEN.
Rip-out is 3.10 at this stage.

Ingo Molnar wrote:

>* Borislav Petkov wrote:
>
>> On Thu, Feb 28, 2013 at 10:37:10PM -0800, H. Peter Anvin wrote:
>> > I'd be very happy to get the NUMAQ code ripped out. I am wondering if
>> > there are any reasons to keep any 32-bit x86 NUMA code at all.
>>
>> How much would it hurt us if we said 3.8 is the last kernel that
>> supported NUMAQ?
>> If anyone wants the functionality, they should use 3.8 or older.
>
>v3.9 - any non-trivial patch in the stage of being contemplated near
>the end of the v3.9 merge window is most likely v3.10 material.
>
>Thanks,
>
>	Ingo

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 03/01/2013 03:43 PM, Yinghai Lu wrote:
> Please check attached patches.
>
> Plan A. revert all 8 patches:
> revert_movablemem_map.patch
>
> Plan B. fix movablemem_map:
> kill_max_low_pfn_mapped.patch and fix_movablemem_map.patch
>
> fix_movablemem_map.patch is too risky, and need more test.

Hi Yinghai,

In your Plan B, you allocated pgdat on local node, right ?

-	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd_pa = memblock_find_in_range_node(start, end, nd_size,
+					    SMP_CACHE_BYTES, nid);

Here, right ?

Without movablemem_map, pgdat will be allocated successfully on local
node, right ?

If so, this will prevent node hot-plug, because as mentioned by Kamezawa,
there is no way to ensure pgdat is not used by others on stack.

I do hope you can stop putting pgdat and zone on local node for now.
And improve it in the future.

And I also hope you can apply my revert SRAT patch first, and then do
your work. It will seem more clean to me.

Thanks. :)
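The practical difference between the two calls in the diff above (search one node only, versus try that node and then fall back elsewhere) can be modelled in userspace. What follows is only a toy sketch under stated assumptions, not kernel code: `struct node_range`, `find_on_node()` and `find_try_node()` are hypothetical names invented for illustration, and the model assumes a single free range per node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical, not a kernel structure): one free range per node. */
struct node_range {
	int nid;
	uint64_t start;	/* first free byte */
	uint64_t end;	/* one past the last free byte */
};

/* Round addr up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Search node nid only. Returns the base address, or 0 if it does not fit. */
uint64_t find_on_node(const struct node_range *r, size_t n, int nid,
		      uint64_t size, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t base;

		if (r[i].nid != nid)
			continue;
		base = align_up(r[i].start, align);
		if (base < r[i].end && r[i].end - base >= size)
			return base;
	}
	return 0;
}

/*
 * Prefer node nid, but fall back to any node that has room.
 * *got_nid reports the node actually used, or -1 on failure.
 */
uint64_t find_try_node(const struct node_range *r, size_t n, int nid,
		       uint64_t size, uint64_t align, int *got_nid)
{
	uint64_t base = find_on_node(r, n, nid, size, align);

	*got_nid = nid;
	for (size_t i = 0; !base && i < n; i++) {
		base = find_on_node(r, n, r[i].nid, size, align);
		*got_nid = r[i].nid;
	}
	if (!base)
		*got_nid = -1;
	return base;
}
```

In this model, a node-only search either places the data on the requested node or fails, while the fallback variant may silently place it on another node. That is the trade-off the hot-remove question turns on: data placed on the local node has to be freed safely before the node can go away, and data placed elsewhere pins the other node instead.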
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
* Borislav Petkov wrote:

> On Thu, Feb 28, 2013 at 10:37:10PM -0800, H. Peter Anvin wrote:
> > I'd be very happy to get the NUMAQ code ripped out. I am wondering if
> > there are any reasons to keep any 32-bit x86 NUMA code at all.
>
> How much would it hurt us if we said 3.8 is the last kernel that supported
> NUMAQ?
> If anyone wants the functionality, they should use 3.8 or older.

v3.9 - any non-trivial patch in the stage of being contemplated near the
end of the v3.9 merge window is most likely v3.10 material.

Thanks,

	Ingo
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Thu, Feb 28, 2013 at 10:37:10PM -0800, H. Peter Anvin wrote:
> I'd be very happy to get the NUMAQ code ripped out. I am wondering if
> there are any reasons to keep any 32-bit x86 NUMA code at all.

How much would it hurt us if we said 3.8 is the last kernel that
supported NUMAQ?
If anyone wants the functionality, they should use 3.8 or older.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
* H. Peter Anvin wrote:

> On 02/25/2013 08:51 PM, Martin Bligh wrote:
> >> Do you mean we can remove numaq x86 32bit code now?
> >
> > Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
> > Was useful in the early days of getting NUMA up and running on Linux,
> > but is now too old to be a museum piece, really.
>
> I'd be very happy to get the NUMAQ code ripped out. I am wondering if
> there are any reasons to keep any 32-bit x86 NUMA code at all.

Not much I suspect.

Thanks,

	Ingo
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/03/01 17:02, Yinghai Lu wrote:
> On Thu, Feb 28, 2013 at 10:18 PM, Tang Chen wrote:
>> On 03/01/2013 01:00 PM, Yinghai Lu wrote:
>>> On Thursday, February 28, 2013, H. Peter Anvin wrote:
>>>> On 02/28/2013 08:32 PM, Linus Torvalds wrote:
>>>>> Yingai, Andrew,
>>>>> is this ok with you two?
>>>>>
>>>>> Linus
>>>>
>>>> FWIW, it makes sense to me iff it resolves the problems
>>>
>>> I prefer to reverting all 8 patches.
>>>
>>> Actually I have worked out one patch that could solve all problems, but
>>> it is too intrusive that I do not want to split it to small pieces to
>>> post it.
>>>
>>> Leaving the movablemem_map related changes in the upstream tree,
>>> will prevent me from continuing to make memblock to be used to allocate
>>> page table on local node ram for hot add.
>>
>> Hi Yinghai,
>>
>> Would you please give me a url to your code ?
>>
>> I don't think movablemem_map will block your work a lot. According to your
>> description, you are modifying memblock to reserve some memory for local
>> node pagetables, right ?
>
> My idea: current for hotadd mem, page table will from other nodes from slub.
> that is not right. that will prevent others nodes to be hot removed.

If we use your idea, pglist_data and zone are also allocated from local
node. In my understanding, pglist_data and zone cannot be deleted safely
since there is no way to guarantee that nobody use them. So it means that
all nodes cannot be hot removed. If you develop your idea, you should
consider memory hot remove.

Thanks,
Yasuaki Ishimatsu

> To fix the problem
> a. make memblock still alive after booting.
> b. or have separated dynamical memblock.
>
> second way looks more clean.
>
> so alloc_low_pages will get initial page for page table from low range
> with slub.
> and later will get page table from its own just mapped range.
>
> Now need to make memblock more clean and remove hardcoded reference in
> those functions.
>
> Thanks
>
> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Thu, Feb 28, 2013 at 10:37 PM, H. Peter Anvin wrote:
> On 02/25/2013 08:51 PM, Martin Bligh wrote:
>>> Do you mean we can remove numaq x86 32bit code now?
>>
>> Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
>> Was useful in the early days of getting NUMA up and running on Linux,
>> but is now too old to be a museum piece, really.
>>
>
> I'd be very happy to get the NUMAQ code ripped out. I am wondering if
> there are any reasons to keep any 32-bit x86 NUMA code at all.

Agreed!
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Thu, Feb 28, 2013 at 10:18 PM, Tang Chen wrote:
> On 03/01/2013 01:00 PM, Yinghai Lu wrote:
>>
>> On Thursday, February 28, 2013, H. Peter Anvin wrote:
>>
>>> On 02/28/2013 08:32 PM, Linus Torvalds wrote:
>>>> Yingai, Andrew,
>>>> is this ok with you two?
>>>>
>>>> Linus
>>>
>>> FWIW, it makes sense to me iff it resolves the problems
>>
>> I prefer to reverting all 8 patches.
>>
>> Actually I have worked out one patch that could solve all problems, but it
>> is too intrusive that I do not want to split it to small pieces to
>> post it.
>>
>> Leaving the movablemem_map related changes in the upstream tree,
>> will prevent me from continuing to make memblock to be used to allocate
>> page table on local node ram for hot add.
>
> Hi Yinghai,
>
> Would you please give me a url to your code ?
>
> I don't think movablemem_map will block your work a lot. According to your
> description, you are modifying memblock to reserve some memory for local
> node pagetables, right ?

My idea: currently, for hot-added memory, the page table comes from other
nodes via slub. That is not right; it prevents those other nodes from
being hot-removed.

To fix the problem:
a. keep memblock alive after booting.
b. or have a separate dynamic memblock.

The second way looks cleaner.

So alloc_low_pages will get the initial page for the page table from the
low range with slub, and later will get page tables from its own
just-mapped range.

Now memblock needs to be made cleaner, and the hardcoded references in
those functions need to be removed.

Thanks

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Thu, Feb 28, 2013 at 10:02 PM, Yasuaki Ishimatsu wrote:
> 2013/03/01 14:00, Yinghai Lu wrote:
>
> Original issue occurs by two patches. And it is fixed by Tang's reverting
> patch. So other patches are obviously unrelated to original problem. Thus
> there is no reason to revert all patches related with movablemem_map.
>
> If there is a reason, movablemem_map patches prevent only your work.
>
> If you keep on developing your work, you should develop it in consideration
> of those patches.

Let me try again:

movablemem_map is a broken idea or poor design.

It just pushes kernel memory down from the local node to some other place.

It is ridiculous to make the user specify a memory range on the command
line to get memory hotplug working.
Think about different memory layout configurations; that will drive
customers crazy.

Not to mention the performance cost of putting NUMA data low.

The right way, or good practice, is:
Find the kernel memory that cannot be moved, and either put it low or
put it in local node RAM.

Thanks

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/25/2013 08:51 PM, Martin Bligh wrote:
>> Do you mean we can remove numaq x86 32bit code now?
>
> Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
> Was useful in the early days of getting NUMA up and running on Linux,
> but is now too old to be a museum piece, really.
>

I'd be very happy to get the NUMAQ code ripped out. I am wondering if
there are any reasons to keep any 32-bit x86 NUMA code at all.

	-hpa
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 03/01/2013 01:00 PM, Yinghai Lu wrote:
> On Thursday, February 28, 2013, H. Peter Anvin wrote:
>
>> On 02/28/2013 08:32 PM, Linus Torvalds wrote:
>>> Yingai, Andrew,
>>> is this ok with you two?
>>>
>>> Linus
>>
>> FWIW, it makes sense to me iff it resolves the problems
>
> I prefer to reverting all 8 patches.
>
> Actually I have worked out one patch that could solve all problems, but it
> is too intrusive that I do not want to split it to small pieces to
> post it.
>
> Leaving the movablemem_map related changes in the upstream tree,
> will prevent me from continuing to make memblock to be used to allocate
> page table on local node ram for hot add.

Hi Yinghai,

Would you please give me a url to your code ?

I don't think movablemem_map will block your work a lot. According to your
description, you are modifying memblock to reserve some memory for local
node pagetables, right ?

If so, I think it won't be too difficult to make the code OK with your work.

Thanks. :)

> Will send reverting patch and putting page table on local node patch around
> 10pm after I get home.
>
> Thanks
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/03/01 14:00, Yinghai Lu wrote:
> On Thursday, February 28, 2013, H. Peter Anvin wrote:
>
>> On 02/28/2013 08:32 PM, Linus Torvalds wrote:
>>> Yingai, Andrew,
>>> is this ok with you two?
>>>
>>> Linus
>>
>> FWIW, it makes sense to me iff it resolves the problems
>
> I prefer to reverting all 8 patches.
>
> Actually I have worked out one patch that could solve all problems, but it
> is too intrusive that I do not want to split it to small pieces to
> post it.
>
> Leaving the movablemem_map related changes in the upstream tree,
> will prevent me from continuing to make memblock to be used to allocate
> page table on local node ram for hot add.

Original issue occurs by two patches. And it is fixed by Tang's reverting
patch. So other patches are obviously unrelated to original problem. Thus
there is no reason to revert all patches related with movablemem_map.

If there is a reason, movablemem_map patches prevent only your work.

If you keep on developing your work, you should develop it in consideration
of those patches.

Thanks,
Yasuaki Ishimatsu

> Will send reverting patch and putting page table on local node patch around
> 10pm after I get home.
>
> Thanks
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/28/2013 08:32 PM, Linus Torvalds wrote:
> Yinghai, Andrew,
> is this ok with you two?
>
> Linus

FWIW, it makes sense to me iff it resolves the problems.

-hpa
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Thu, 28 Feb 2013 20:32:15 -0800 Linus Torvalds wrote:

> Yinghai, Andrew,
> is this ok with you two?

If it works. I haven't tested it yet!

Ordinarily I'd give it a few days for -next testing and to let
Fengguang's testbot chew on it.
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Yinghai, Andrew,
is this ok with you two?

Linus

On Thu, Feb 28, 2013 at 7:46 PM, Tang Chen wrote:
> Hi Linus,
>
> Please refer to the attached patch.
>
> This patch reverts only the following two patches.
>
> commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
>     acpi, memory-hotplug: support getting hotplug info from SRAT
> commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
>     acpi, memory-hotplug: parse SRAT before memblock is ready
>
> Without these two patches, users can use "movablemem_map=nn[KMG]@ss[KMG]"
> correctly, and it causes no problem.
>
> And of course, the kernel will work as before if users don't use
> "movablemem_map=nn[KMG]@ss[KMG]".
>
> I do hope we can keep "movablemem_map=nn[KMG]@ss[KMG]" in 3.9.
>
> We are working on fixing the SRAT problems, and we aim to push the SRAT
> related patches in 3.10. And we will also keep improving the
> "movablemem_map=nn[KMG]@ss[KMG]" functionality in the future.
>
> Thanks. :)
>
> On 03/01/2013 11:13 AM, Linus Torvalds wrote:
>> On Wed, Feb 27, 2013 at 1:26 PM, Andrew Morton wrote:
>>>
>>> So I'm thinking that the best approach here is to revert everything and
>>> then try again for 3.10-rc1. This gives people time to test the code
>>> while it's only in linux-next. (Hint!)
>>
>> I'd prefer to revert too by now - the bug seems to be known, and
>> apparently it's not a trivial fix. We're getting close to the end of
>> the merge window, and it's still being discussed, it clearly wasn't
>> really fully cooked.
>>
>> Can we agree on some minimal set of reverts? Can somebody send me a
>> patch with the revert and the commit explanation for the revert?
>> Yinghai? Or I can do the reverts too if just the exact set of commits
>> is clear, but I'd rather get it from somebody who sees and understands
>> the problem, and can test the state afterwards..
>>
>> Linus
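For readers unfamiliar with the option syntax discussed above, here is a minimal userspace sketch of how a "nn[KMG]@ss[KMG]" argument could be turned into an address range. It assumes, as with the kernel's memmap= option, that nn is the size and ss is the start address. The names parse_size, parse_movablemem_map and struct mem_range are hypothetical and are not taken from the patchset; the in-kernel code would use the kernel's own memparse() helper rather than strtoull().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical half-open physical address range [start, end). */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Parse a number with an optional K/M/G suffix (illustrative stand-in
 * for the kernel's memparse()). *endp is set to the first character
 * that was not consumed; if *endp == s, nothing was parsed.
 */
uint64_t parse_size(const char *s, const char **endp)
{
    char *end;
    uint64_t val = strtoull(s, &end, 0);

    if (end != s) {
        switch (*end) {
        case 'G': case 'g': val <<= 30; end++; break;
        case 'M': case 'm': val <<= 20; end++; break;
        case 'K': case 'k': val <<= 10; end++; break;
        default: break;
        }
    }
    *endp = end;
    return val;
}

/*
 * Parse "nn[KMG]@ss[KMG]" (assumed meaning: size @ start).
 * Returns 0 on success and fills *out, or -1 on malformed input.
 */
int parse_movablemem_map(const char *arg, struct mem_range *out)
{
    const char *p;
    uint64_t size, start;

    size = parse_size(arg, &p);
    if (p == arg || *p != '@' || size == 0)
        return -1;

    arg = p + 1;
    start = parse_size(arg, &p);
    if (p == arg || *p != '\0')
        return -1;

    if (start + size < start)   /* reject wrap-around */
        return -1;

    out->start = start;
    out->end = start + size;
    return 0;
}
```

With this sketch, "1G@15G" would describe the range [15G, 16G), which is the example range used in the revert patch's commit message.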
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Hi Linus,

Please refer to the attached patch.

This patch reverts only the following two patches.

commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
    acpi, memory-hotplug: support getting hotplug info from SRAT
commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
    acpi, memory-hotplug: parse SRAT before memblock is ready

Without these two patches, users can use "movablemem_map=nn[KMG]@ss[KMG]"
correctly, and it causes no problem.

And of course, the kernel will work as before if users don't use
"movablemem_map=nn[KMG]@ss[KMG]".

I do hope we can keep "movablemem_map=nn[KMG]@ss[KMG]" in 3.9.

We are working on fixing the SRAT problems, and we aim to push the SRAT
related patches in 3.10. And we will also keep improving the
"movablemem_map=nn[KMG]@ss[KMG]" functionality in the future.

Thanks. :)

On 03/01/2013 11:13 AM, Linus Torvalds wrote:
> On Wed, Feb 27, 2013 at 1:26 PM, Andrew Morton wrote:
>>
>> So I'm thinking that the best approach here is to revert everything and
>> then try again for 3.10-rc1. This gives people time to test the code
>> while it's only in linux-next. (Hint!)
>
> I'd prefer to revert too by now - the bug seems to be known, and
> apparently it's not a trivial fix. We're getting close to the end of
> the merge window, and it's still being discussed, it clearly wasn't
> really fully cooked.
>
> Can we agree on some minimal set of reverts? Can somebody send me a
> patch with the revert and the commit explanation for the revert?
> Yinghai? Or I can do the reverts too if just the exact set of commits
> is clear, but I'd rather get it from somebody who sees and understands
> the problem, and can test the state afterwards..
>
> Linus

From 2e859dc212ce13fb812da6f971409a0518914574 Mon Sep 17 00:00:00 2001
From: Tang Chen
Date: Thu, 28 Feb 2013 10:43:51 +0900
Subject: [PATCH] x86, ACPI, mm: Revert SRAT support from movablemem_map boot option.

The following two commits support getting info from SRAT to determine
which memory is hot-pluggable, also known as the "movablemem_map=srat"
boot option.

commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
    acpi, memory-hotplug: support getting hotplug info from SRAT
commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
    acpi, memory-hotplug: parse SRAT before memblock is ready

We need to know the SRAT info before memblock is ready, so that we can
prevent memblock from allocating movable memory. To achieve this goal,
we moved the SRAT parsing code earlier in these patches. But that broke
the ACPI_INITRD_TABLE_OVERRIDE functionality and the fallback path of
numa_init().

So we revert these two commits for now. After that, users can only
use "movablemem_map=nn[KMG]@ss[KMG]".

NOTE:
1) It is OK to revert only these two patches.

   The core problems mentioned by Lu Yinghai:

   1. numa_init is called several times, NOT just for srat. so those
          nodes_clear(numa_nodes_parsed)
          memset(&numa_meminfo, 0, sizeof(numa_meminfo))
      can not be just removed. Need to consider sequence is: numaq, srat,
      amd, dummy. and make fall back path working.

   2. simply split acpi_numa_init to early_parse_srat.
      a. that early_parse_srat is NOT called for ia64, so you break ia64.
      b. for (i = 0; i < MAX_LOCAL_APIC; i++)
              set_apicid_to_node(i, NUMA_NO_NODE)
         still left in numa_init. So it will just clear result from
         early_parse_srat. it should be moved before that
      c. it breaks ACPI_TABLE_OVERRIDE...as the acpi table scan is moved
         early before override from INITRD is settled.

   They are all caused by moving SRAT parsing earlier. And
   "movablemem_map=nn[KMG]@ss[KMG]" causes no harm to the kernel.

2) With these two patches reverted, memblock will start to work before
   we parse SRAT, which means we won't know the end address of each node
   early enough.

   For example: if one node has memory [10G, 20G), and the user
   specifies [15G, 16G), we cannot extend it to [15G, 20G). So memblock
   could still have a chance to allocate memory from [16G, 20G) for the
   kernel, which is non-movable.

   As a result, users can only use this option in a very limited way:
   they should specify the memory range up to the end of each node.

Reported-by: Tim Gardner
Reported-by: Don Morris
Bisected-by: Don Morris
Reported-by: Yinghai Lu
Signed-off-by: Tang Chen
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Andrew Morton
Cc: Tony Luck
Cc: Thomas Renninger
Cc: Tejun Heo
Cc: Tang Chen
Cc: Yasuaki Ishimatsu
---
 Documentation/kernel-parameters.txt | 29
 arch/x86/kernel/setup.c             | 13
 arch/x86/mm/numa.c                  |  6
 arch/x86/mm/srat.c                  | 71
 drivers/acpi/numa.c                 | 23
 include/linux/acpi.h                |  8
 include/linux/mm.h                  |  2
 mm/page_alloc.c                     | 22
 8 files changed, 27 insertions(+), 147 deletions(-)

diff --git
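The [10G, 20G) / [15G, 16G) example in note 2) of the commit message can be illustrated with a small userspace sketch. It shows the extension that would be possible if node boundaries were already known when the option is processed, which, according to the commit message, is exactly what is lost once SRAT parsing is no longer done early. The function extend_to_node_end, struct mem_range and the GiB() macro are hypothetical names used only for illustration, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define GiB(x) ((uint64_t)(x) << 30)

/* Hypothetical half-open physical address range [start, end). */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * If the user-specified range starts inside one of the known node
 * ranges, grow its end to that node's end, so that no tail of the node
 * is left available for non-movable kernel allocations.
 *
 * Returns 1 if a containing node was found, 0 otherwise (in which case
 * the user range is left untouched).
 */
int extend_to_node_end(struct mem_range *user,
                       const struct mem_range *nodes, size_t nr_nodes)
{
    for (size_t i = 0; i < nr_nodes; i++) {
        if (user->start >= nodes[i].start && user->start < nodes[i].end) {
            if (user->end < nodes[i].end)
                user->end = nodes[i].end;
            return 1;
        }
    }
    return 0;
}
```

For a node covering [10G, 20G) and a user range of [15G, 16G), the sketch yields [15G, 20G). Without node information the range stays [15G, 16G), which is why the commit message asks users to specify ranges that already run to the end of a node.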
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Wed, Feb 27, 2013 at 1:26 PM, Andrew Morton wrote:
>
> So I'm thinking that the best approach here is to revert everything and
> then try again for 3.10-rc1. This gives people time to test the code
> while it's only in linux-next. (Hint!)

I'd prefer to revert too by now - the bug seems to be known, and
apparently it's not a trivial fix. We're getting close to the end of
the merge window, and it's still being discussed, it clearly wasn't
really fully cooked.

Can we agree on some minimal set of reverts? Can somebody send me a
patch with the revert and the commit explanation for the revert?
Yinghai? Or I can do the reverts too if just the exact set of commits
is clear, but I'd rather get it from somebody who sees and understands
the problem, and can test the state afterwards..

Linus
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 03/01/2013 12:07 AM, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 11:44 PM, Tang Chen wrote:
>>
>> Sorry, if you want to revert, you just need to revert:
>>
>> commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
>>     acpi, memory-hotplug: parse SRAT before memblock is ready
>> commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
>>     acpi, memory-hotplug: support getting hotplug info from SRAT
>>
>> The other two have nothing to do with SRAT. And they are necessary.
>>
>> Seeing from the code, I think it is clean. But we'd better test it.
>
> We should revert them all, as
>
> commit fb06bc8e5f42f38c011de0e59481f464a82380f6
> Author: Tang Chen
> Date:   Fri Feb 22 16:33:42 2013 -0800
>
>     page_alloc: bootmem limit with movablecore_map
>
> It is totally misleading in the TITLE. Come on, what is movablecore_map?
>
> It actually uses movablemem_map to exclude some range during
> memblock_find_in_range. That makes memblock less generic.
>
> That patch is the base of the whole patchset.
>
> Also you and Yasuaki keep saying: movablemem_map=srat. But where is the
> doc and code for it? Looks like there is only movablemem_map=acpi.

Hi Yinghai,

I think I forgot to change the title when merging the related bugfix
patches into one. And yes, movablecore_map has been changed to
movablemem_map.

How about this:

For now, let's revert the SRAT related patches, and keep
movablemem_map=nn[KMG]@ss[KMG].

About the SRAT thing, we have the following solution:

1) keep the original init sequence, parse the acpi tables and modify
   global variables as before
2) introduce a new function to obtain the SRAT info earlier, store the
   info somewhere, and touch nothing numa related
3) use the info to do the movablemem_map thing, and free it when it is
   done

In this way, we keep our code isolated from the numa code, and numa will
be initialized as before.

This can be done in one week or faster. And I'll cc the x86 guys, and
they can choose when to merge the new code.

And about the movablemem_map=nn[KMG]@ss[KMG] code, there is no harm to
the kernel. We have documented that using this option will bring numa
performance down. Users who don't want to lose numa performance can
boot the kernel without this option, and the kernel will work as before.

I do hope we can keep the code in 3.9, and do more improvement in the
future. So please just revert the two SRAT related patches.

Thanks. :)

> I'm upset by this patchset. Next time, please get an Ack from TJ or Ben
> when you touch memblock code. And at least make sure the TITLE is right.
>
> Thanks
>
> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 11:44 PM, Tang Chen wrote:
>
> Sorry, if you want to revert, you just need to revert:
>
> commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
>     acpi, memory-hotplug: parse SRAT before memblock is ready
> commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
>     acpi, memory-hotplug: support getting hotplug info from SRAT
>
> The other two have nothing to do with SRAT. And they are necessary.
>
> Seeing from the code, I think it is clean. But we'd better test it.

We should revert them all, as

commit fb06bc8e5f42f38c011de0e59481f464a82380f6
Author: Tang Chen
Date:   Fri Feb 22 16:33:42 2013 -0800

    page_alloc: bootmem limit with movablecore_map

It is totally misleading in the TITLE. Come on, what is movablecore_map?

It actually uses movablemem_map to exclude some range during
memblock_find_in_range. That makes memblock less generic.

That patch is the base of the whole patchset.

Also you and Yasuaki keep saying: movablemem_map=srat. But where is the
doc and code for it? Looks like there is only movablemem_map=acpi.

I'm upset by this patchset. Next time, please get an Ack from TJ or Ben
when you touch memblock code. And at least make sure the TITLE is right.

Thanks

Yinghai
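To make the objection about excluding ranges during the range search concrete, the following is a rough userspace sketch of a search that skips excluded (for example, movable) ranges. It is not the real memblock algorithm, which walks memblock's own free regions and normally searches top-down; this sketch is a simple bottom-up first fit, and find_range_avoiding, struct mem_range and GiB() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define GiB(x) ((uint64_t)(x) << 30)

/* Hypothetical half-open physical address range [start, end). */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Find the lowest address in [lo, hi) at which `size` bytes fit without
 * overlapping any of the excluded ranges. On success, stores the
 * address in *out and returns 0; returns -1 if nothing fits.
 */
int find_range_avoiding(uint64_t lo, uint64_t hi, uint64_t size,
                        const struct mem_range *excl, size_t nr_excl,
                        uint64_t *out)
{
    uint64_t cand = lo;
    int moved;

    if (size == 0)
        return -1;

    do {
        moved = 0;

        /* Not enough room left between the candidate and the limit. */
        if (cand > hi || hi - cand < size)
            return -1;

        for (size_t i = 0; i < nr_excl; i++) {
            /* Half-open overlap test against [cand, cand + size). */
            if (cand < excl[i].end && excl[i].start < cand + size) {
                cand = excl[i].end;  /* skip past the excluded range */
                moved = 1;
                break;
            }
        }
    } while (moved);

    *out = cand;
    return 0;
}
```

The extra exclusion list is the part that every caller of a generic allocator would have to carry around, which is one way to read the complaint that the patch makes memblock less generic.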
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Hi Andrew,

On 02/28/2013 05:26 AM, Andrew Morton wrote:
>> Thank you all for addressing the bug. we are on the way to fix it.
>
> How long do you think this will take?

I think we need one week to solve these problems.

I do hope we can catch the merge window for 3.9.

Thanks. :)
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/25/2013 08:51 PM, Martin Bligh wrote:
>> Do you mean we can remove numaq x86 32bit code now?
>
> Wouldn't bother me at all. The machine is from 1995, end of life c.
> 2000? Was useful in the early days of getting NUMA up and running on
> Linux, but is now too old to be a museum piece, really.

I'd be very happy to get the NUMAQ code ripped out.

I am wondering if there are any reasons to keep any 32-bit x86 NUMA code
at all.

-hpa
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Thu, Feb 28, 2013 at 10:02 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote:
> 2013/03/01 14:00, Yinghai Lu wrote:
>
> The original issue was caused by two patches, and it is fixed by Tang's
> reverting patch. So the other patches are obviously unrelated to the
> original problem. Thus there is no reason to revert all patches related
> to movablemem_map.
>
> If there is a reason, it is only that the movablemem_map patches get in
> the way of your work. If you keep on developing your work, you should
> develop it with those patches in mind.

Let me try again:

movablemem_map is a broken idea or a poor design.

It just pushes kernel memory down from the local node to some other
place.

It is ridiculous to make users specify memory ranges on the command line
to get memory hotplug working. Think about different memory layout
configurations; that will drive customers crazy.

Not to mention the performance issue with putting numa data low.

The right way, or good practice, is: find the kernel memory that can not
be moved, and either put it low or put it in local node ram.

Thanks

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Wed, 27 Feb 2013 16:00:36 +0800 Lai Jiangshan wrote: > In the mails and the changlog of the revert-patch, I think Yinghai > mainly worries about 3 problems. > > 1) the current implement has bug and bad code. > > Yes. Any bug should be fixed. we should fix it directly, or > we can revert the related patches and then send the fixed patches. > > But the related patch is only one or two, it is not good idea > to revert the whole patchset or the whole feature. Right? Reverting a new patchset isn't really a big deal. The patchset gets fixed up, retested then reapplied. We like to do things this way because it minimises the amount of trouble which the regression is causing other people. Reverting one or two patches from a fairly large and complex patchset sounds risky - we're putting an untested patch combination straight into mainline with minimal testing. It would be safer to revert everything. So I'm thinking that the best approach here is to revert everything and then try again for 3.10-rc1. This gives people time to test the code while it's only in linux-next. (Hint!) > Thank you all for addressing the bug. we are on the way to fix it. How long do you think this will take? > 2) many memory can be put into hotplugable memory, but we have not yet moved > them >into hotplugable memory yet. like: vmemmap, some page table ...etc, a lot. > > This is a restriction in the currently kernel, we can't convert them > quickly. > we must convert them step by step. example, we are converting the > memory of > page_cgroup to hotplugable memory. > > > 3) if the user(or firmware) specify the un-hotplugable memory too small, the > system can't >work, even can't boot. > > Any feature/system has its own minimum requirements, the user should > meet the requirements and specify more un-hotplugable memory. > so I don't think it is a problem in kernel land. > > But the problem 2)(above) make this feature's "minimum requirements" > much higher. 
> It is the real thing that Yinghai worries about.
>
> But all systems which use this feature can offer this higher requirement
> very easily. The users should specify enough un-hotplugable memory
> before and after we decrease the "minimum requirements".
>
> The whole feature works very well if the user specify enough
> un-hotplugable memory. So the problem 2) and 3) are not urgent
> problems.

Yes, let's not mingle concepts. From a feature perspective we've always understood that 3.9 memory hotplug would be "has limitations, needs work, but better than it was before".

Let's consider that separately from "your patchset broke my kernel".
RE: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
> b. it will be freed to slub before run time.
> like init code and initrd disk.

If this is a problem - I'd be inclined to disable the code that frees it. It's only a few hundred KB of code, and possibly a few MB of initrd. Too small to worry about on a hot pluggable server.

> In that case, so they should just boot system with numa=off.

But we will still care about NUMA locality.

-Tony
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Wed, Feb 27, 2013 at 8:28 AM, Luck, Tony wrote:
>> assume first cpu only have 1G ram, and other 31 socket will have bunch of ram
>
> That doesn't seem to be a very realistic assumption. Can you even still buy 1G
> DIMMs for servers? I'd think that a minimum would be to have each of four
> channels populated with a 4G DIMM - so 16GB on first cpu. But even that feels
> rather low.

We could use memmap= to exclude mem, right?

> I think that making sure that the system can boot is good (and maybe it should
> ignore/override[*] parameters that would prevent booting). But let's be
> realistic about the cases we actually have to deal with (before somebody comes
> and talks about systems with just 16MB).

About make memory hotplug working:
1. find out ram that is used by kernel in early time.
2. check if
   a. it is with kernel code that will not be moved. like real_mode.
   b. it will be freed to slub before run time. like init code and initrd disk.
   c. if it is on local node ram that will not prevent mem hot-remove like page table and vmemmap. current we already have vmemmap and node_data on local node. May need to put page table on local node too. or just put page table with local node that kernel is on.
   d. something could be anywhere, and could be moved down after slub is ready.

movablemem_map patchset prevents kernel using kernel from local node.
In that case, so they should just boot system with numa=off.

Thanks

Yinghai
RE: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
> assume first cpu only have 1G ram, and other 31 socket will have bunch of ram

That doesn't seem to be a very realistic assumption. Can you even still buy 1G DIMMs for servers? I'd think that a minimum would be to have each of four channels populated with a 4G DIMM - so 16GB on first cpu. But even that feels rather low.

I think that making sure that the system can boot is good (and maybe it should ignore/override[*] parameters that would prevent booting). But let's be realistic about the cases we actually have to deal with (before somebody comes and talks about systems with just 16MB).

-Tony

[*] with some noisy warnings in the console log
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 12:11 AM, Yinghai Lu wrote: > On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu > wrote: >> 2013/02/27 13:04, Yinghai Lu wrote: >>> >>> On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu >>> wrote: 2013/02/27 11:30, Yinghai Lu wrote: > > Do you mean you can not boot one socket system with 1G ram ? > Assume socket 0 does not support hotplug, other 31 sockets support hot > plug. > > So we could boot system only with socket0, and later one by one hot > add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. >>> >>> yes, it may fail, if the one node memory need page table and vmemmap >>> is more than 1g ... >>> >> >>> for hot add memory we need to >>> 1. add another wrapper for init_memory_mapping, just like >>> init_mem_mapping() for booting path. >>> 2. we need make memblock more generic, so we can use it with hot add >>> memory during runtime. >>> 3. with that we can initialize page table for hot added node with ram. >>> a. initial page table for 2M near node top is from node0 ( that does >>> not support hot plug). >>> b. then will use 2M for memory below node top... >>> c. with that we will make sure page table stay on local node. >>> alloc_low_pages need to be updated to support that. >>> 4. need to make sure vmemmap on local node too. >> >> >> I think so too. By this, memory hot plug becomes more useful. >> >>> >>> so hot-remove node will work too later. >>> >>> In the long run, we should make booting path and hot adding more >>> similar and share at most code. >>> That will make code get more test coverage. > > Tang, Yasuaki, Andrew, > > Please check if you are ok with attached reverting patch. > > Tim, Don, > Can you try if attached reverting patch fix all the problems for you ? 
I'm sure from the discussion on how to leave in memory hotplug it likely won't be just a clean reversion, but as a data point -- yes, this patch does remove the problem as expected (and I don't see any new ones at first glance... though I'm not trying hotplug yet obviously).

Thanks,
Don Morris
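For a sense of scale on the "1G ram" case quoted above (page table and vmemmap for hot-added memory have to come from memory that is already online), the sketch below estimates how much of each a hot-added range would need. The 4 KiB page size, the 64-byte struct page and the 8-byte page table entry are assumptions typical of x86_64, not figures from this thread, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not from the thread): 4 KiB base pages, a 64-byte
 * struct page and 8-byte page table entries, as is typical on x86_64. */
#define PAGE_SIZE_BYTES    4096ULL
#define STRUCT_PAGE_BYTES  64ULL
#define PTE_BYTES          8ULL
#define LARGE_PAGE_BYTES   (2ULL << 20)   /* 2 MiB mappings */

/* Bytes of vmemmap needed to describe mem_bytes of RAM. */
uint64_t vmemmap_bytes(uint64_t mem_bytes)
{
    return (mem_bytes / PAGE_SIZE_BYTES) * STRUCT_PAGE_BYTES;
}

/* Bytes of lowest-level page table entries needed to direct-map
 * mem_bytes with mappings of map_size bytes each. The upper levels
 * are ignored here because they are much smaller. */
uint64_t direct_map_pt_bytes(uint64_t mem_bytes, uint64_t map_size)
{
    assert(map_size != 0);
    return (mem_bytes / map_size) * PTE_BYTES;
}

/* Largest hot-added size whose vmemmap alone still fits in budget_bytes. */
uint64_t max_hot_add_bytes(uint64_t budget_bytes)
{
    return (budget_bytes / STRUCT_PAGE_BYTES) * PAGE_SIZE_BYTES;
}
```

Under those assumptions vmemmap costs 1/64 of the added memory, so vmemmap alone would use up a 1G budget at about 64G of hot-added RAM, while 2M mappings keep the page table cost for the same 64G to roughly 256 KB. Real numbers depend on the kernel configuration.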
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 01:11 PM, Yinghai Lu wrote: > On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu > wrote: >> 2013/02/27 13:04, Yinghai Lu wrote: >>> >>> On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu >>> wrote: 2013/02/27 11:30, Yinghai Lu wrote: > > Do you mean you can not boot one socket system with 1G ram ? > Assume socket 0 does not support hotplug, other 31 sockets support hot > plug. > > So we could boot system only with socket0, and later one by one hot > add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. >>> >>> yes, it may fail, if the one node memory need page table and vmemmap >>> is more than 1g ... >>> >> >>> for hot add memory we need to >>> 1. add another wrapper for init_memory_mapping, just like >>> init_mem_mapping() for booting path. >>> 2. we need make memblock more generic, so we can use it with hot add >>> memory during runtime. >>> 3. with that we can initialize page table for hot added node with ram. >>> a. initial page table for 2M near node top is from node0 ( that does >>> not support hot plug). >>> b. then will use 2M for memory below node top... >>> c. with that we will make sure page table stay on local node. >>> alloc_low_pages need to be updated to support that. >>> 4. need to make sure vmemmap on local node too. >> >> >> I think so too. By this, memory hot plug becomes more useful. >> >>> >>> so hot-remove node will work too later. >>> >>> In the long run, we should make booting path and hot adding more >>> similar and share at most code. >>> That will make code get more test coverage. > > Tang, Yasuaki, Andrew, > > Please check if you are ok with attached reverting patch. > > Tim, Don, > Can you try if attached reverting patch fix all the problems for you ? 
>

Hi, Yinghai, Andrew

In the mails and the changelog of the revert-patch, I think Yinghai mainly worries about 3 problems.

1) the current implementation has bugs and bad code.

   Yes. Any bug should be fixed. We should fix it directly, or we can revert the related patches and then send the fixed patches.

   But the related patches are only one or two, so it is not a good idea to revert the whole patchset or the whole feature. Right?

   Thank you all for addressing the bug. We are on the way to fix it.

2) a lot of memory could be put into hot-pluggable memory, but we have not moved it there yet. like: vmemmap, some page table ...etc, a lot.

   This is a restriction in the current kernel; we can't convert them quickly. We must convert them step by step. For example, we are converting the memory of page_cgroup to hot-pluggable memory.

3) if the user (or firmware) specifies too little un-hotpluggable memory, the system can't work, or can't even boot.

   Any feature/system has its own minimum requirements; the user should meet the requirements and specify more un-hotpluggable memory. So I don't think it is a problem in kernel land.

   But problem 2) (above) makes this feature's "minimum requirements" much higher. It is the real thing that Yinghai worries about.

   But all systems which use this feature can meet this higher requirement very easily. The users should specify enough un-hotpluggable memory before and after we decrease the "minimum requirements".

The whole feature works very well if the user specifies enough un-hotpluggable memory. So problems 2) and 3) are not urgent problems.

And our team has another problem: we are still not good at community work (for example, the patch TITLE is totally misleading), but we are growing up. We are sorry, and thank you for pointing out the mistakes.

The feature/patchset does have problems. But it is not good to tangle all the problems together and revert the whole feature.
Thanks,
Lai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 03:25 PM, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 11:11 PM, Tang Chen wrote:
>> On 02/27/2013 02:54 PM, Yinghai Lu wrote:
>>>
>>> Those patches are tangled together.
>>
>> No, they are not.
>>
>> The following commits supports "movablemem_map=nn[KMG]@ss[KMG]".
>>
>> commit fb06bc8e5f42f38c011de0e59481f464a82380f6
>>     page_alloc: bootmem limit with movablecore_map
>> commit 42f47e27e761fee07da69e04612ec7dd0d490edd
>>     page_alloc: make movablemem_map have higher priority
>> commit 6981ec31146cf19454c55c130625f6cee89aab95
>>     page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes
>> commit 34b71f1e04fcba578e719e675b4882eeeb2a1f6f
>>     page_alloc: add movable_memmap kernel parameter
>> commit 4d59a75125d5a4717e57e9fc62c64b3d346e603e
>>     x86: get pg_data_t's memory from other node
>>
>> And the following supports "movablemem_map=srat".
>>
>> commit f7210e6c4ac795694106c1c5307134d3fc233e88
>>     mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region().
>> commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
>>     acpi, memory-hotplug: support getting hotplug info from SRAT
>> commit 27168d38fa209073219abedbe6a9de7ba9acbfad
>>     acpi, memory-hotplug: extend movablemem_map ranges to the end of node
>> commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
>>     acpi, memory-hotplug: parse SRAT before memblock is ready
>
> those four can be reverted cleanly?

Sorry, if you want to revert, you just need to revert:

commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
    acpi, memory-hotplug: parse SRAT before memblock is ready
commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
    acpi, memory-hotplug: support getting hotplug info from SRAT

The other two have nothing to do with SRAT. And they are necessary.

Seeing from the code, I think it is clean. But we'd better test it.

>>> Also it looks funny to ask user to specify mem range in boot command
>>> line to enable mem hotplug.
>>
>> Well, I think sometimes users don't like the SRAT memory style, and want to
>> increase or reduce hot-pluggable memory by themselves. And also, it is
>> useful for debuging firmware bugs.
>>
>> I agree that "movablemem_map=srat" functionality need more work to improve.
>> Can we not revert it, and improve it during 3.9rc ? I think during rc time,
>> at least we can fix the problems brought by early_parse_srat().
>
> looks like acpi_override can not be fixed.

About this problem, I need to do some investigation, and I think we can have a try.

I do hope we can keep these patches. And put the improve work in the future. :)

Thanks. :)
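To make the "movablemem_map=nn[KMG]@ss[KMG]" form concrete, here is a minimal userspace sketch of parsing such a value. It assumes the same convention as memmap= (nn is the size, ss is the start address). struct mem_range and both function names are hypothetical; this is not the kernel's parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type for illustration; not a kernel structure. */
struct mem_range {
    uint64_t start;   /* ss: start address in bytes */
    uint64_t size;    /* nn: length in bytes */
};

/* Parse a number with an optional K, M or G suffix and advance *s
 * past it. Returns 0 on success, -1 if no number was found. */
int parse_size(const char **s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(*s, &end, 0);      /* base 0: accepts decimal and 0x... */
    if (end == *s || errno == ERANGE)
        return -1;

    switch (*end) {
    case 'G': case 'g':
        v <<= 10;                   /* fall through */
    case 'M': case 'm':
        v <<= 10;                   /* fall through */
    case 'K': case 'k':
        v <<= 10;
        end++;
        break;
    default:
        break;
    }

    *s = end;
    *out = v;
    return 0;
}

/* Parse "nn[KMG]@ss[KMG]". Returns 0 on success, -1 on malformed input
 * (including the "srat" keyword, which this sketch does not handle). */
int parse_movablemem_range(const char *arg, struct mem_range *r)
{
    assert(arg != NULL && r != NULL);

    if (parse_size(&arg, &r->size) != 0 || *arg != '@')
        return -1;
    arg++;
    if (parse_size(&arg, &r->start) != 0 || *arg != '\0')
        return -1;
    return 0;
}
```

With this sketch, "8G@4G" would describe an 8G range starting at the 4G boundary. A string such as "srat" is rejected here, because that form takes its ranges from the firmware table rather than from the command line.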
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 11:11 PM, Tang Chen wrote:
> On 02/27/2013 02:54 PM, Yinghai Lu wrote:
>>
>> Those patches are tangled together.
>
> No, they are not.
>
> The following commits supports "movablemem_map=nn[KMG]@ss[KMG]".
>
> commit fb06bc8e5f42f38c011de0e59481f464a82380f6
>     page_alloc: bootmem limit with movablecore_map
> commit 42f47e27e761fee07da69e04612ec7dd0d490edd
>     page_alloc: make movablemem_map have higher priority
> commit 6981ec31146cf19454c55c130625f6cee89aab95
>     page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes
> commit 34b71f1e04fcba578e719e675b4882eeeb2a1f6f
>     page_alloc: add movable_memmap kernel parameter
> commit 4d59a75125d5a4717e57e9fc62c64b3d346e603e
>     x86: get pg_data_t's memory from other node
>
> And the following supports "movablemem_map=srat".
>
> commit f7210e6c4ac795694106c1c5307134d3fc233e88
>     mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect
>     movablecore_map in memblock_overlaps_region().
> commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
>     acpi, memory-hotplug: support getting hotplug info from SRAT
> commit 27168d38fa209073219abedbe6a9de7ba9acbfad
>     acpi, memory-hotplug: extend movablemem_map ranges to the end of node
> commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
>     acpi, memory-hotplug: parse SRAT before memblock is ready

those four can be reverted cleanly?

>> Also it looks funny to ask user to specify mem range in boot command
>> line to enable mem hotplug.
>
> Well, I think sometimes users don't like the SRAT memory style, and want to
> increase or reduce hot-pluggable memory by themselves. And also, it is
> useful for debuging firmware bugs.
>
> I agree that "movablemem_map=srat" functionality need more work to improve.
> Can we not revert it, and improve it during 3.9rc ? I think during rc time,
> at least we can fix the problems brought by early_parse_srat().

looks like acpi_override can not be fixed.

Thanks

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 02:54 PM, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 9:49 PM, Yasuaki Ishimatsu wrote: 2013/02/27 14:11, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu wrote: 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. I agree with your idea. But I think above ideas is future work. So at first we should use movable memory for memory hot plug. After that, we will implement above ideas. so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. 
Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. We will fix this problem with no objection. So please wait a while. And the problem occurs by "movablemem_map=srat" not "movablemem_map=nn[KMG]@ss[KMG]" At least, if you want to revert it, you should revert only "movablemem_map=srat" part. Those patches are tangled together. No, they are not. The following commits supports "movablemem_map=nn[KMG]@ss[KMG]". commit fb06bc8e5f42f38c011de0e59481f464a82380f6 page_alloc: bootmem limit with movablecore_map commit 42f47e27e761fee07da69e04612ec7dd0d490edd page_alloc: make movablemem_map have higher priority commit 6981ec31146cf19454c55c130625f6cee89aab95 page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes commit 34b71f1e04fcba578e719e675b4882eeeb2a1f6f page_alloc: add movable_memmap kernel parameter commit 4d59a75125d5a4717e57e9fc62c64b3d346e603e x86: get pg_data_t's memory from other node And the following supports "movablemem_map=srat". commit f7210e6c4ac795694106c1c5307134d3fc233e88 mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region(). commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb acpi, memory-hotplug: support getting hotplug info from SRAT commit 27168d38fa209073219abedbe6a9de7ba9acbfad acpi, memory-hotplug: extend movablemem_map ranges to the end of node commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f acpi, memory-hotplug: parse SRAT before memblock is ready Also it looks funny to ask user to specify mem range in boot command line to enable mem hotplug. Well, I think sometimes users don't like the SRAT memory style, and want to increase or reduce hot-pluggable memory by themselves. And also, it is useful for debuging firmware bugs. I agree that "movablemem_map=srat" functionality need more work to improve. Can we not revert it, and improve it during 3.9rc ? I think during rc time, at least we can fix the problems brought by early_parse_srat(). Thanks. 
:)
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 9:49 PM, Yasuaki Ishimatsu wrote: > 2013/02/27 14:11, Yinghai Lu wrote: >> >> On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu >> wrote: >>> >>> 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu wrote: > > > 2013/02/27 11:30, Yinghai Lu wrote: >> >> >> Do you mean you can not boot one socket system with 1G ram ? >> Assume socket 0 does not support hotplug, other 31 sockets support hot >> plug. >> >> So we could boot system only with socket0, and later one by one hot >> add other cpus. > > > > > In this case, system can boot. But other cpus with bunch of ram hot > plug may fails, since system does not have enough memory for cover > hot added memory. When hot adding memory device, kernel object for the > memory is allocated from 1G ram since hot added memory has not been > enabled. > yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... >>> > for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. >>> >>> >>> >>> I think so too. By this, memory hot plug becomes more useful. > > > I agree with your idea. But I think above ideas is future work. > So at first we should use movable memory for memory hot plug. > After that, we will implement above ideas. > > >>> so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. 
>> Tang, Yasuaki, Andrew,
>>
>> Please check if you are ok with attached reverting patch.
>
> We will fix this problem with no objection. So please wait a while.
>
> And the problem occurs by "movablemem_map=srat" not
> "movablemem_map=nn[KMG]@ss[KMG]"
> At least, if you want to revert it, you should revert only
> "movablemem_map=srat" part.

Those patches are tangled together.

Also it looks funny to ask user to specify mem range in boot command
line to enable mem hotplug.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 14:11, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu wrote: 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. I agree with your idea. But I think above ideas is future work. So at first we should use movable memory for memory hot plug. After that, we will implement above ideas. so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. We will fix this problem with no objection. So please wait a while. 
And the problem occurs by "movablemem_map=srat" not
"movablemem_map=nn[KMG]@ss[KMG]"
At least, if you want to revert it, you should revert only
"movablemem_map=srat" part.

Thanks,
Yasuaki Ishimatsu

> Tim, Don,
>
> Can you try if attached reverting patch fix all the problems for you ?
>
> Thanks
>
> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu wrote: > 2013/02/27 13:04, Yinghai Lu wrote: >> >> On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu >> wrote: >>> >>> 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. >>> >>> >>> >>> In this case, system can boot. But other cpus with bunch of ram hot >>> plug may fails, since system does not have enough memory for cover >>> hot added memory. When hot adding memory device, kernel object for the >>> memory is allocated from 1G ram since hot added memory has not been >>> enabled. >>> >> >> yes, it may fail, if the one node memory need page table and vmemmap >> is more than 1g ... >> > >> for hot add memory we need to >> 1. add another wrapper for init_memory_mapping, just like >> init_mem_mapping() for booting path. >> 2. we need make memblock more generic, so we can use it with hot add >> memory during runtime. >> 3. with that we can initialize page table for hot added node with ram. >> a. initial page table for 2M near node top is from node0 ( that does >> not support hot plug). >> b. then will use 2M for memory below node top... >> c. with that we will make sure page table stay on local node. >> alloc_low_pages need to be updated to support that. >> 4. need to make sure vmemmap on local node too. > > > I think so too. By this, memory hot plug becomes more useful. > >> >> so hot-remove node will work too later. >> >> In the long run, we should make booting path and hot adding more >> similar and share at most code. >> That will make code get more test coverage. Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. Tim, Don, Can you try if attached reverting patch fix all the problems for you ? Thanks Yinghai revert_movable_map.patch Description: Binary data
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. Thanks, Yasuaki Ishimatsu so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. Thanks Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
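A rough sketch of steps 3a to 3c of the plan quoted above, written as a toy Python model rather than kernel code. It assumes the node range is mapped top-down in 2M steps, that the first step has to take its page-table page from node0, and that every later step can take it from the 2M area mapped just before. The name `map_hot_added_node` and the one-page-table-page-per-step simplification are made up for illustration; nothing here is taken from init_memory_mapping() or alloc_low_pages().

```python
MIB = 1 << 20
GIB = 1 << 30
STEP = 2 * MIB  # the "2M" step size named in the plan


def map_hot_added_node(start, end):
    """Walk [start, end) top-down in 2M steps.

    Returns a list of (step_start, source) pairs, where source says which
    node the page-table page for that step would be taken from.
    """
    sources = []
    addr = end
    local_mapped = False  # nothing on the new node is usable yet
    while addr > start:
        addr = max(start, addr - STEP)
        # 3a: the very first step borrows from node0.
        # 3b/3c: later steps use the local area mapped by the previous step.
        sources.append((addr, "local" if local_mapped else "node0"))
        local_mapped = True
    return sources


if __name__ == "__main__":
    for step_start, source in map_hot_added_node(64 * GIB, 64 * GIB + 8 * MIB):
        print(hex(step_start), source)
```

In this model only one page-table page per hot-added node comes from node0, which is the property that would let the node be removed again later.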
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 10:24 AM, Yinghai Lu wrote: After looked at the code more, thought that theory that does not let kernel use ram on hotplug area is not right. after that commit, following range can not use movable ram: 1. real_mode code well..funny, legacy cpu0 [0,1M) could be hot-removed? 2. dma_continguous ? 3. log buff ring. 4. initrd... why it will be freed after booting, so it could be on movable... 5. crashkernel for kdump...: : looks like we can not put kdump kernel above 4G anymore 6. initmem_init: it will allocate page table to setup kernel mapping for memory..., it should be with BRK and near end of max_pfn AFAIK, Linux kernel now cannot migrate memory used by the kernel because. So any memory used by the kernel should not be on movable area. that depends. initrd will be freed later, so it should be put anywhere that is under max_pfn during boot. OK,but initrd is not that big. Actually, before my code start to work, memblock has reserved some memory. But it is not that big. On the other hand, it is not that easy to find out which memory should be kept in unmovable area, and which should not. If node is hotplugable, the mem related stuff like page table and vmemmap could be on the that node without problem and should be on that node. page tables and vmemmap are kernel memory. They should not be movable, I think. why do you need to migrate pagetable and vmemmap for the memory range that will be offline ? Hum, you are right. :) True, we can store pagetable and vmemmap on the node that is hot-pluggable. But just like the page_cgroup structs, we need additional work to handle it. But based on the existing code, we didn't do any special handling. I think we can improve it if needed. :) assume first cpu only have 1G ram, and other 31 socket will have bunch of ram and those cpu with ram could be hotadd and hotremoved. Now you want to put page table and vmemmap on first node. The system would not boot as not enough memory for cover whole system RAM. 
>> Yes, you are right. And a more extreme situation has been talked about by
>> HPA.
>>
>> "If all the memory is hot-pluggable, then the kernel won't be able to boot."
>>
>> So, please refer to commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb:
>> acpi, memory-hotplug: support getting hotplug info from SRAT
>>
>> I have excluded all the memory reserved by memblock, and any node that has
>> memory reserved by memblock will be set to un-hot-pluggable, which means we
>> will have enough memory (all the memory on the node) to boot the kernel.
>> So I think the problem you are talking about has been solved.
>
> I don't think that you understand the problem.
>
> for the system that will put all pagetable and vmemmap on the 1G ram of
> first cpu.
>
> as all other ram are MOVABLE, so memblock_find_in_range will not use any
> local ram on those nodes.

Yes, I know that. :)

In this case, the kernel will not be able to use local ram on those nodes.
It will cause some performance degradation.

I mean if the 1G ram is not enough for the kernel to boot, the current code
will set all the ram on the same node as un-hot-pluggable. If all the ram on
the node is not enough for kernel to boot, it is a really extreme situation,
IIUC.

I think users can solve this problem in two ways:
1) add more ram to the node.
2) use movablemem_map=nn[KMG]@ss[KMG] to configure more ram as unmovable.

Thanks. :)
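For readers unfamiliar with the nn[KMG]@ss[KMG] notation used above, here is a small Python sketch of how such a value could be decoded. It assumes, by analogy with memmap=nn@ss, that nn is a size and ss a start address; it only handles decimal numbers with an optional K/M/G suffix, and `parse_movablemem_map` is a hypothetical helper, not the kernel's parser.

```python
import re

# Multipliers for the optional size suffix.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
_RANGE = re.compile(r"^(\d+)([KMG]?)@(\d+)([KMG]?)$", re.IGNORECASE)


def parse_movablemem_map(value):
    """Decode 'srat' or 'nn[KMG]@ss[KMG]'.

    Returns the string "srat", or a (start, end) tuple in bytes,
    assuming nn is the size and ss the start address.
    """
    if value == "srat":
        return "srat"
    match = _RANGE.match(value)
    if match is None:
        raise ValueError("expected srat or nn[KMG]@ss[KMG], got %r" % value)
    size = int(match.group(1)) * _UNITS[match.group(2).upper()]
    start = int(match.group(3)) * _UNITS[match.group(4).upper()]
    return (start, start + size)


if __name__ == "__main__":
    print(parse_movablemem_map("4G@8G"))
```

Under those assumptions, "4G@8G" would describe the byte range from 8G up to 12G.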
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu wrote:
> 2013/02/27 11:30, Yinghai Lu wrote:
>> Do you mean you can not boot one socket system with 1G ram ?
>> Assume socket 0 does not support hotplug, other 31 sockets support hot
>> plug.
>>
>> So we could boot system only with socket0, and later one by one hot
>> add other cpus.
>
> In this case, system can boot. But other cpus with bunch of ram hot
> plug may fails, since system does not have enough memory for cover
> hot added memory. When hot adding memory device, kernel object for the
> memory is allocated from 1G ram since hot added memory has not been
> enabled.

yes, it may fail, if the one node memory need page table and vmemmap
is more than 1g ...

for hot add memory we need to
1. add another wrapper for init_memory_mapping, just like
   init_mem_mapping() for booting path.
2. we need make memblock more generic, so we can use it with hot add
   memory during runtime.
3. with that we can initialize page table for hot added node with ram.
   a. initial page table for 2M near node top is from node0 ( that does
      not support hot plug).
   b. then will use 2M for memory below node top...
   c. with that we will make sure page table stay on local node.
      alloc_low_pages need to be updated to support that.
4. need to make sure vmemmap on local node too.

so hot-remove node will work too later.

In the long run, we should make booting path and hot adding more
similar and share at most code.
That will make code get more test coverage.

Thanks

Yinghai
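To put a rough number on the "more than 1g" remark above, the following Python sketch estimates how much vmemmap and direct-mapping page table a given amount of hot-added RAM would need. The figures are assumptions, not taken from this thread: 4 KiB base pages, a 64-byte struct page, 2 MiB mappings, and only the PMD level counted for page tables.

```python
GIB = 1 << 30
PAGE_SIZE = 4096           # assumption: 4 KiB base pages
STRUCT_PAGE_SIZE = 64      # assumption: sizeof(struct page)
HUGE_PAGE_SIZE = 2 << 20   # assumption: direct mapping uses 2 MiB pages
ENTRIES_PER_TABLE = 512    # 8-byte entries in a 4 KiB page-table page


def vmemmap_bytes(ram_bytes):
    """Bytes of struct page needed to describe ram_bytes of memory."""
    return (ram_bytes // PAGE_SIZE) * STRUCT_PAGE_SIZE


def pmd_table_bytes(ram_bytes):
    """Bytes of PMD-level page tables needed to map ram_bytes with 2 MiB pages."""
    entries = -(-ram_bytes // HUGE_PAGE_SIZE)     # ceiling division
    tables = -(-entries // ENTRIES_PER_TABLE)
    return tables * PAGE_SIZE


if __name__ == "__main__":
    for gib in (16, 64, 256):
        ram = gib * GIB
        print(gib, "GiB RAM:",
              vmemmap_bytes(ram) // (1 << 20), "MiB vmemmap,",
              pmd_table_bytes(ram) // 1024, "KiB PMD tables")
```

With these assumptions the vmemmap dominates: it is 1/64 of the described memory, so 64 GiB of hot-added RAM would already need 1 GiB of struct page, while the page tables stay in the hundreds of KiB.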
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 11:30, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 4:52 PM, Yasuaki Ishimatsu wrote: 2013/02/27 7:44, Yinghai Lu wrote: that commit is totally broken, and it should be reverted. 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. please consider sequence is: numaq, srat, amd, dummy. You need to make fall back path working! 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i < MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved early before override from INITRD is settled. 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes to x86 code. 4, it does not CC to TJ and other numa guys... After looked at the code more, thought that theory that does not let kernel use ram on hotplug area is not right. after that commit, following range can not use movable ram: 1. real_mode code well..funny, legacy cpu0 [0,1M) could be hot-removed? 2. dma_continguous ? 3. log buff ring. 4. initrd... why it will be freed after booting, so it could be on movable... 5. crashkernel for kdump...: : looks like we can not put kdump kernel above 4G anymore 6. initmem_init: it will allocate page table to setup kernel mapping for memory..., it should be with BRK and near end of max_pfn If you use "movablemem_map=srat", abobe memory can not use movable memory. But in my understanding, current Linux cannot move above memory. So above memory should not use movable memory. that depends, like relocating initrd to different position. If node is hotplugable, the mem related stuff like page table and vmemmap could be on the that node without problem and should be on that node. 
>>> assume first cpu only have 1G ram, and other 31 socket will have bunch of
>>> ram and those cpu with ram could be hotadd and hotremoved.
>>> Now you want to put page table and vmemmap on first node.
>>> The system would not boot as not enough memory for cover whole system RAM.
>>
>> Even if we solve your above mentions, the system cannot boot.
>> In this case, user should:
>> o add ram to first cpu
>> o decreases hotpluggable ram by :
>>   - changing hotpluggable information of SRAT
>>   - using movablemem_map=nn[KMG]@ss[KMG]
>
> Do you mean you can not boot one socket system with 1G ram ?
> Assume socket 0 does not support hotplug, other 31 sockets support hot
> plug.
>
> So we could boot system only with socket0, and later one by one hot
> add other cpus.

In this case, system can boot. But other cpus with bunch of ram hot
plug may fail, since system does not have enough memory to cover
hot added memory. When hot adding memory device, kernel object for the
memory is allocated from 1G ram since hot added memory has not been
enabled.

Thanks,
Yasuaki Ishimatsu

> We should simulate that way, just like boot system with PXM0 at first and
> later during acpi scan, add other cpus/ram.
>
> Thanks
>
> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 4:52 PM, Yasuaki Ishimatsu wrote: > 2013/02/27 7:44, Yinghai Lu wrote: that commit is totally broken, and it should be reverted. 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. please consider sequence is: numaq, srat, amd, dummy. You need to make fall back path working! 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i < MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that >>> >>> >>> c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved >>> early before override from INITRD is settled. >>> 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes to x86 code. 4, it does not CC to TJ and other numa guys... >> >> >> After looked at the code more, thought that theory that does not let >> kernel use ram >> on hotplug area is not right. >> > >> after that commit, following range can not use movable ram: >> 1. real_mode code well..funny, legacy cpu0 [0,1M) could be >> hot-removed? >> 2. dma_continguous ? >> 3. log buff ring. >> 4. initrd... why it will be freed after booting, so it could be on >> movable... >> 5. crashkernel for kdump...: : looks like we can not put kdump kernel >> above 4G anymore >> 6. initmem_init: it will allocate page table to setup kernel mapping >> for memory..., it should >> be with BRK and near end of max_pfn > > > If you use "movablemem_map=srat", abobe memory can not use movable memory. > But in my understanding, current Linux cannot move above memory. So above > memory should not use movable memory. > that depends, like relocating initrd to different position. 
>> If node is hotplugable, the mem related stuff like page table and
>> vmemmap could be on the that node without problem and should be on
>> that node.
>>
>> assume first cpu only have 1G ram, and other 31 socket will have bunch of
>> ram and those cpu with ram could be hotadd and hotremoved.
>> Now you want to put page table and vmemmap on first node.
>> The system would not boot as not enough memory for cover whole system RAM.
>
> Even if we solve your above mentions, the system cannot boot.
> In this case, user should:
> o add ram to first cpu
> o decreases hotpluggable ram by :
>   - changing hotpluggable information of SRAT
>   - using movablemem_map=nn[KMG]@ss[KMG]

Do you mean you can not boot one socket system with 1G ram ?
Assume socket 0 does not support hotplug, other 31 sockets support hot plug.

So we could boot system only with socket0, and later one by one hot
add other cpus.

We should simulate that way, just like boot system with PXM0 at first and
later during acpi scan, add other cpus/ram.

Thanks

Yinghai
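Point 2b quoted in this message (the set_apicid_to_node(i, NUMA_NO_NODE) loop left in numa_init wiping what early_parse_srat found) is an ordering problem, which the Python sketch below tries to show in isolation. It is a toy model: the table size, the sample SRAT contents and the helper names `clear_apicid_map`, `fill_from_srat`, `parse_then_clear` and `clear_then_parse` are all invented for illustration.

```python
NUMA_NO_NODE = -1
MAX_LOCAL_APIC = 8  # tiny table, for illustration only


def clear_apicid_map(apicid_to_node):
    """Model of the reset loop: forget every apicid -> node binding."""
    for apicid in range(MAX_LOCAL_APIC):
        apicid_to_node[apicid] = NUMA_NO_NODE


def fill_from_srat(apicid_to_node, srat):
    """Model of SRAT parsing: record the apicid -> node bindings it lists."""
    for apicid, node in srat.items():
        apicid_to_node[apicid] = node


def parse_then_clear(srat):
    """Order described as broken: parse early, then the reset runs later."""
    table = [NUMA_NO_NODE] * MAX_LOCAL_APIC
    fill_from_srat(table, srat)
    clear_apicid_map(table)
    return table


def clear_then_parse(srat):
    """Order asked for in the review: reset first, then parse."""
    table = [NUMA_NO_NODE] * MAX_LOCAL_APIC
    clear_apicid_map(table)
    fill_from_srat(table, srat)
    return table


if __name__ == "__main__":
    sample_srat = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical two-node layout
    print("parse then clear:", parse_then_clear(sample_srat))
    print("clear then parse:", clear_then_parse(sample_srat))
```

In the first ordering every CPU ends up without a node binding, so whatever fallback assigns nodes afterwards no longer has to agree with the SRAT, which would be consistent with the cpu/node mismatch reported at the start of the thread.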
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 6:14 PM, Tang Chen wrote: > Hi Yinghai, > > Please see below. :) > > > On 02/27/2013 06:44 AM, Yinghai Lu wrote: that commit is totally broken, and it should be reverted. 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. please consider sequence is: numaq, srat, amd, dummy. You need to make fall back path working! 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i< MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that >>> >>> >>> c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved >>> early before override from INITRD is settled. >>> 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes to x86 code. 4, it does not CC to TJ and other numa guys... >> >> >> After looked at the code more, thought that theory that does not let >> kernel use ram >> on hotplug area is not right. >> >> after that commit, following range can not use movable ram: >> 1. real_mode code well..funny, legacy cpu0 [0,1M) could be >> hot-removed? >> 2. dma_continguous ? >> 3. log buff ring. >> 4. initrd... why it will be freed after booting, so it could be on >> movable... >> 5. crashkernel for kdump...: : looks like we can not put kdump kernel >> above 4G anymore >> 6. initmem_init: it will allocate page table to setup kernel mapping >> for memory..., it should >> be with BRK and near end of max_pfn > > > AFAIK, Linux kernel now cannot migrate memory used by the kernel because. So > any memory > used by the kernel should not be on movable area. that depends. initrd will be freed later, so it should be put anywhere that is under max_pfn during boot. 
>> If node is hotplugable, the mem related stuff like page table and
>> vmemmap could be on the that node without problem and should be on
>> that node.
>
> page tables and vmemmap are kernel memory. They should not be movable, I
> think.

why do you need to migrate pagetable and vmemmap for the memory range
that will be offline ?

>> assume first cpu only have 1G ram, and other 31 socket will have bunch of
>> ram and those cpu with ram could be hotadd and hotremoved.
>> Now you want to put page table and vmemmap on first node.
>> The system would not boot as not enough memory for cover whole system RAM.
>
> Yes, you are right. And a more extreme situation has been talked about by
> HPA.
>
> "If all the memory is hot-pluggable, then the kernel won't be able to boot."
>
> So, please refer to commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb:
> acpi, memory-hotplug: support getting hotplug info from SRAT
>
> I have excluded all the memory reserved by memblock, and any node that has
> memory reserved by memblock will be set to un-hot-pluggable, which means we
> will have enough memory (all the memory on the node) to boot the kernel.
> So I think the problem you are talking about has been solved.

I don't think that you understand the problem.

for the system that will put all pagetable and vmemmap on the 1G ram of
first cpu.

as all other ram are MOVABLE, so memblock_find_in_range will not use any
local ram on those nodes.

>> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be
>> just reverted now.
>>
>> Thanks
>>
>> Yinghai
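The effect described in the last paragraph of this message can be illustrated with a toy top-down allocator in Python. This is not memblock: `find_in_range` here is a simplified stand-in that skips any free range overlapping a movable range instead of splitting it, and the two-node layout is invented.

```python
GIB = 1 << 30
MIB = 1 << 20


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def find_in_range(free, movable, size):
    """Top-down first fit.

    free:    list of (start, end, node) free ranges
    movable: list of (start, end) ranges the allocator must avoid
    Returns (address, node) or None.
    """
    for start, end, node in sorted(free, reverse=True):
        if any(overlaps(start, end, m_start, m_end) for m_start, m_end in movable):
            continue  # early allocations are kept out of movable memory
        if end - start >= size:
            return end - size, node
    return None


if __name__ == "__main__":
    # Hypothetical layout: node 0 has 1G, node 1 has 32G and is hot-pluggable.
    free = [(0, GIB, 0), (GIB, 33 * GIB, 1)]
    print("movable honoured:", find_in_range(free, [(GIB, 33 * GIB)], 2 * MIB))
    print("no movable map:  ", find_in_range(free, [], 2 * MIB))
```

With the movable range honoured, every early allocation for node 1 (page tables, vmemmap) comes out of node 0's 1G, which is the situation being objected to; without it, the same request is satisfied from node 1's own memory.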
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Hi Yinghai, Please see below. :) On 02/27/2013 06:44 AM, Yinghai Lu wrote: that commit is totally broken, and it should be reverted. 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. please consider sequence is: numaq, srat, amd, dummy. You need to make fall back path working! 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i< MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved early before override from INITRD is settled. 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes to x86 code. 4, it does not CC to TJ and other numa guys... After looked at the code more, thought that theory that does not let kernel use ram on hotplug area is not right. after that commit, following range can not use movable ram: 1. real_mode code well..funny, legacy cpu0 [0,1M) could be hot-removed? 2. dma_continguous ? 3. log buff ring. 4. initrd... why it will be freed after booting, so it could be on movable... 5. crashkernel for kdump...: : looks like we can not put kdump kernel above 4G anymore 6. initmem_init: it will allocate page table to setup kernel mapping for memory..., it should be with BRK and near end of max_pfn AFAIK, Linux kernel now cannot migrate memory used by the kernel because. So any memory used by the kernel should not be on movable area. If node is hotplugable, the mem related stuff like page table and vmemmap could be on the that node without problem and should be on that node. page tables and vmemmap are kernel memory. They should not be movable, I think. 
> assume first cpu only have 1G ram, and other 31 socket will have bunch of
> ram and those cpu with ram could be hotadd and hotremoved.
> Now you want to put page table and vmemmap on first node.
> The system would not boot as not enough memory for cover whole system RAM.

Yes, you are right. And a more extreme situation has been talked about by
HPA.

"If all the memory is hot-pluggable, then the kernel won't be able to boot."

So, please refer to commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb:
acpi, memory-hotplug: support getting hotplug info from SRAT

I have excluded all the memory reserved by memblock, and any node that has
memory reserved by memblock will be set to un-hot-pluggable, which means we
will have enough memory (all the memory on the node) to boot the kernel.
So I think the problem you are talking about has been solved.

> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be just
> reverted now.
>
> Thanks
>
> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 7:44, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 1:36 PM, Yinghai Lu wrote: On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu wrote: On Mon, Feb 25, 2013 at 1:27 PM, Don Morris wrote: On 02/25/2013 10:32 AM, Tim Gardner wrote: On 02/25/2013 08:02 AM, Tim Gardner wrote: Is this an expected warning ? I'll boot a vanilla kernel just to be sure. rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo: Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft is having an impact: Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but still Sandy Bridge, though I don't think that matters). Bisection leads to: # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug: parse SRAT before memblock is ready Nothing terribly obvious leaps out as to *why* that reshuffling messes up the cpu<-->node bindings, but I wanted to put this out there while I poke around further. [Note that the SRAT: PXM -> APIC -> Node print outs during boot are the same either way -- if you look at the APIC numbers of the processors (from /proc/cpuinfo), the processors should be assigned to the correct node, but they aren't.] cc'ing Tang Chen in case this is obvious to him or he's already fixed it somewhere not on Linus's tree yet. Don Morris [0.170435] [ cut here ] [0.170450] WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x71/0x84() [0.170452] Hardware name: S2600CP [0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. 
[0.156000] smpboot: Booting Node 1, Processors #1 [0.170455] Modules linked in: [0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1 [0.170461] Call Trace: [0.170466] [] warn_slowpath_common+0x7f/0xc0 [0.170473] [] warn_slowpath_fmt+0x46/0x50 [0.170477] [] topology_sane.isra.2+0x71/0x84 [0.170482] [] set_cpu_sibling_map+0x23f/0x436 [0.170487] [] start_secondary+0x137/0x201 [0.170502] ---[ end trace 09222f596307ca1d ]--- that commit is totally broken, and it should be reverted. 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. please consider sequence is: numaq, srat, amd, dummy. You need to make fall back path working! 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i < MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved early before override from INITRD is settled. 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes to x86 code. 4, it does not CC to TJ and other numa guys... After looked at the code more, thought that theory that does not let kernel use ram on hotplug area is not right. after that commit, following range can not use movable ram: 1. real_mode code well..funny, legacy cpu0 [0,1M) could be hot-removed? 2. dma_continguous ? 3. log buff ring. 4. initrd... why it will be freed after booting, so it could be on movable... 5. crashkernel for kdump...: : looks like we can not put kdump kernel above 4G anymore 6. initmem_init: it will allocate page table to setup kernel mapping for memory..., it should be with BRK and near end of max_pfn If you use "movablemem_map=srat", abobe memory can not use movable memory. 
But in my understanding, current Linux cannot move the above memory. So the
above memory should not use movable memory.

> If node is hotplugable, the mem related stuff like page table and
> vmemmap could be on the that node without problem and should be on
> that node.
>
> assume first cpu only have 1G ram, and other 31 socket will have bunch of
> ram and those cpu with ram could be hotadd and hotremoved.
> Now you want to put page table and vmemmap on first node.
> The system would not boot as not enough memory for cover whole system RAM.

Even if we solve the issues you mention above, the system cannot boot.
In this case, user should:
o add ram to first cpu
o decrease hotpluggable ram by:
  - changing hotpluggable information of SRAT
  - using movablemem_map=nn[KMG]@ss[KMG]

Thanks,
Yasuaki Ishimatsu

> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be just
> reverted now.
>
> Thanks
>
> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 1:36 PM, Yinghai Lu wrote:
> On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu wrote:
>> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris wrote:
>>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>>> Is this an expected warning ? I'll boot a vanilla kernel just to be sure.
>>>>>
>>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>>
>>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>>> is having an impact:
>>>
>>> Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but
>>> still Sandy Bridge, though I don't think that matters).
>>>
>>> Bisection leads to:
>>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug:
>>> parse SRAT before memblock is ready
>>>
>>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>>> up the cpu<-->node bindings, but I wanted to put this out there while
>>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node print
>>> outs during boot are the same either way -- if you look at the APIC
>>> numbers of the processors (from /proc/cpuinfo), the processors should
>>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>>> in case this is obvious to him or he's already fixed it somewhere not
>>> on Linus's tree yet.
>>>
>>> Don Morris
>>>
>>>> [0.170435] ------------[ cut here ]------------
>>>> [0.170450] WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x71/0x84()
>>>> [0.170452] Hardware name: S2600CP
>>>> [0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
>>>> [0.156000] smpboot: Booting Node 1, Processors #1
>>>> [0.170455] Modules linked in:
>>>> [0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>>> [0.170461] Call Trace:
>>>> [0.170466] [810597bf] warn_slowpath_common+0x7f/0xc0
>>>> [0.170473] [810598b6] warn_slowpath_fmt+0x46/0x50
>>>> [0.170477] [816cc752] topology_sane.isra.2+0x71/0x84
>>>> [0.170482] [816cc9de] set_cpu_sibling_map+0x23f/0x436
>>>> [0.170487] [816ccd0c] start_secondary+0x137/0x201
>>>> [0.170502] ---[ end trace 09222f596307ca1d ]---
>>
>> that commit is totally broken, and it should be reverted.
>>
>> 1. numa_init is called several times, NOT just for srat. so those
>>        nodes_clear(numa_nodes_parsed)
>>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>    can not be just removed.
>>    please consider sequence is: numaq, srat, amd, dummy.
>>    You need to make fall back path working!
>>
>> 2. simply split acpi_numa_init to early_parse_srat.
>>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>           set_apicid_to_node(i, NUMA_NO_NODE)
>>       still left in numa_init. So it will just clear result from
>>       early_parse_srat. it should be moved before that
>
>    c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>       early before override from INITRD is settled.
>
>> 3. that patch TITLE is totally misleading, there is NO x86 in the title,
>>    but it changes to x86 code.
>>
>> 4. it does not CC to TJ and other numa guys...

After looking at the code more, I think the theory that the kernel
should not use ram in the hotplug area is not right.

After that commit, the following ranges can not use movable ram:
1. real_mode code.... well.. funny, legacy cpu0 [0,1M) could be hot-removed?
2. dma_contiguous ?
3. log buf ring.
4. initrd... why? it will be freed after booting, so it could be on movable...
5. crashkernel for kdump...: looks like we can not put kdump kernel
   above 4G anymore
6. initmem_init: it will allocate page table to set up kernel mapping
   for memory..., it should be with BRK and near end of max_pfn

If a node is hotpluggable, the mem related stuff like page table and
vmemmap could be on that node without problem and should be on
that node.

Assume the first cpu only has 1G ram, and the other 31 sockets have a
bunch of ram, and those cpus with ram could be hot-added and
hot-removed. Now you want to put page table and vmemmap on the first
node. The system would not boot, as there is not enough memory to
cover the whole system RAM.

e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be
just reverted now.

Thanks

Yinghai
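Point 1 of the quoted review (and, indirectly, point 2b) can be illustrated with a small userland model. This is only a sketch under stated assumptions: `struct numa_state`, `reset_state()`, `failing_srat_init()`, `dummy_init()` and `init_with_fallback()` are hypothetical names, not the kernel's code. It shows why the per-attempt reset matters when several detection methods are tried in turn (numaq, srat, amd, dummy in the thread): without it, data left behind by a failed method would leak into the fallback.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 4
#define MAX_APIC  8
#define NO_NODE   (-1)

/* Toy stand-ins for numa_nodes_parsed, numa_meminfo and the apicid table. */
struct numa_state {
    bool nodes_parsed[MAX_NODES];
    int  nr_memblks;
    int  apicid_to_node[MAX_APIC];
};

typedef int (*numa_init_fn)(struct numa_state *);

/* Wipe everything a previous, possibly failed, method may have recorded. */
static void reset_state(struct numa_state *s)
{
    memset(s->nodes_parsed, 0, sizeof(s->nodes_parsed));
    s->nr_memblks = 0;
    for (int i = 0; i < MAX_APIC; i++)
        s->apicid_to_node[i] = NO_NODE;
}

/* Hypothetical method that records partial data and then fails. */
static int failing_srat_init(struct numa_state *s)
{
    s->nodes_parsed[1] = true;
    s->apicid_to_node[1] = 1;
    s->nr_memblks = 1;
    return -1;
}

/* Hypothetical last-resort method: one node, no apicid information. */
static int dummy_init(struct numa_state *s)
{
    s->nodes_parsed[0] = true;
    s->nr_memblks = 1;
    return 0;
}

/* Try each method in order; return the index of the first success, or -1. */
static int init_with_fallback(struct numa_state *s,
                              const numa_init_fn *methods, int count)
{
    for (int i = 0; i < count; i++) {
        reset_state(s);          /* the step the review says must stay */
        if (methods[i](s) == 0)
            return i;
    }
    return -1;
}
```

Calling `init_with_fallback()` with `{ failing_srat_init, dummy_init }` ends up in the dummy method with no trace of node 1. The same reset is also why anything parsed before the loop runs (the early_parse_srat case in point 2b) would be cleared unless the clearing is moved ahead of the early parse.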
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu wrote:
> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris wrote:
>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>> Is this an expected warning ? I'll boot a vanilla kernel just to be sure.
>>>>
>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>
>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>> is having an impact:
>>
>> Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but
>> still Sandy Bridge, though I don't think that matters).
>>
>> Bisection leads to:
>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug:
>> parse SRAT before memblock is ready
>>
>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>> up the cpu<-->node bindings, but I wanted to put this out there while
>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node print
>> outs during boot are the same either way -- if you look at the APIC
>> numbers of the processors (from /proc/cpuinfo), the processors should
>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>> in case this is obvious to him or he's already fixed it somewhere not
>> on Linus's tree yet.
>>
>> Don Morris
>>
>>> [0.170435] ------------[ cut here ]------------
>>> [0.170450] WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x71/0x84()
>>> [0.170452] Hardware name: S2600CP
>>> [0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
>>> [0.156000] smpboot: Booting Node 1, Processors #1
>>> [0.170455] Modules linked in:
>>> [0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>> [0.170461] Call Trace:
>>> [0.170466] [810597bf] warn_slowpath_common+0x7f/0xc0
>>> [0.170473] [810598b6] warn_slowpath_fmt+0x46/0x50
>>> [0.170477] [816cc752] topology_sane.isra.2+0x71/0x84
>>> [0.170482] [816cc9de] set_cpu_sibling_map+0x23f/0x436
>>> [0.170487] [816ccd0c] start_secondary+0x137/0x201
>>> [0.170502] ---[ end trace 09222f596307ca1d ]---
>
> that commit is totally broken, and it should be reverted.
>
> 1. numa_init is called several times, NOT just for srat. so those
>        nodes_clear(numa_nodes_parsed)
>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>    can not be just removed.
>    please consider sequence is: numaq, srat, amd, dummy.
>    You need to make fall back path working!
>
> 2. simply split acpi_numa_init to early_parse_srat.
>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>           set_apicid_to_node(i, NUMA_NO_NODE)
>       still left in numa_init. So it will just clear result from
>       early_parse_srat. it should be moved before that

   c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
      early before override from INITRD is settled.

> 3. that patch TITLE is totally misleading, there is NO x86 in the title,
>    but it changes to x86 code.
>
> 4. it does not CC to TJ and other numa guys...
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Hi Yinghai,

Please see below. :)

On 02/27/2013 06:44 AM, Yinghai Lu wrote:
>>> that commit is totally broken, and it should be reverted.
>>>
>>> 1. numa_init is called several times, NOT just for srat. so those
>>>        nodes_clear(numa_nodes_parsed)
>>>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>>    can not be just removed.
>>>    please consider sequence is: numaq, srat, amd, dummy.
>>>    You need to make fall back path working!
>>>
>>> 2. simply split acpi_numa_init to early_parse_srat.
>>>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>>           set_apicid_to_node(i, NUMA_NO_NODE)
>>>       still left in numa_init. So it will just clear result from
>>>       early_parse_srat. it should be moved before that
>>
>>    c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>>       early before override from INITRD is settled.
>>
>>> 3. that patch TITLE is totally misleading, there is NO x86 in the title,
>>>    but it changes to x86 code.
>>>
>>> 4. it does not CC to TJ and other numa guys...
>
> After looking at the code more, I think the theory that the kernel
> should not use ram in the hotplug area is not right.
>
> After that commit, the following ranges can not use movable ram:
> 1. real_mode code.... well.. funny, legacy cpu0 [0,1M) could be hot-removed?
> 2. dma_contiguous ?
> 3. log buf ring.
> 4. initrd... why? it will be freed after booting, so it could be on movable...
> 5. crashkernel for kdump...: looks like we can not put kdump kernel
>    above 4G anymore
> 6. initmem_init: it will allocate page table to set up kernel mapping
>    for memory..., it should be with BRK and near end of max_pfn

AFAIK, the Linux kernel currently cannot migrate memory used by the
kernel. So any memory used by the kernel should not be on movable area.

> If a node is hotpluggable, the mem related stuff like page table and
> vmemmap could be on that node without problem and should be on
> that node.

page tables and vmemmap are kernel memory. They should not be movable,
I think.

> Assume the first cpu only has 1G ram, and the other 31 sockets have a
> bunch of ram, and those cpus with ram could be hot-added and
> hot-removed. Now you want to put page table and vmemmap on the first
> node. The system would not boot, as there is not enough memory to
> cover the whole system RAM.

Yes, you are right. And a more extreme situation has been talked about
by HPA. If all the memory is hot-pluggable, then the kernel won't be
able to boot.

So, please refer to commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb:
    acpi, memory-hotplug: support getting hotplug info from SRAT

I have excluded all the memory reserved by memblock, and any node that
has memory reserved by memblock will be set to un-hot-pluggable, which
means we will have enough memory (all the memory on the node) to boot
the kernel. So I think the problem you are talking about has been
solved.

> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be
> just reverted now.
>
> Thanks
>
> Yinghai
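The rule Tang Chen describes above (a node that contains any memblock-reserved memory loses its hot-pluggable status) can be sketched in a few lines. This is not the code from commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb; `struct range`, `struct node_info`, `overlaps()` and `pin_nodes_with_reserved()` are hypothetical names chosen for illustration, and ranges are assumed to be half-open `[start, end)`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* One NUMA node: its memory range and the hot-pluggable flag taken from SRAT. */
struct node_info {
    struct range mem;
    bool hotpluggable;
};

/* Two half-open ranges overlap if each one starts before the other ends. */
static bool overlaps(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * Clear the hot-pluggable flag on every node that contains at least one
 * reserved region, so memory the kernel already uses is never treated as
 * removable.
 */
static void pin_nodes_with_reserved(struct node_info *nodes, size_t nr_nodes,
                                    const struct range *resv, size_t nr_resv)
{
    for (size_t i = 0; i < nr_nodes; i++)
        for (size_t j = 0; j < nr_resv; j++)
            if (overlaps(nodes[i].mem, resv[j]))
                nodes[i].hotpluggable = false;
}
```

With two 4 GiB nodes and a single reservation below 2 MiB, only the first node is pinned. Note that the sketch says nothing about whether the pinned node is large enough to hold the page tables and vmemmap for the rest of the machine, which is the objection raised in the reply that follows.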
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 6:14 PM, Tang Chen tangc...@cn.fujitsu.com wrote:
> Hi Yinghai,
>
> Please see below. :)
>
> On 02/27/2013 06:44 AM, Yinghai Lu wrote:
>>>> that commit is totally broken, and it should be reverted.
>>>>
>>>> 1. numa_init is called several times, NOT just for srat. so those
>>>>        nodes_clear(numa_nodes_parsed)
>>>>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>>>    can not be just removed.
>>>>    please consider sequence is: numaq, srat, amd, dummy.
>>>>    You need to make fall back path working!
>>>>
>>>> 2. simply split acpi_numa_init to early_parse_srat.
>>>>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>>>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>>>           set_apicid_to_node(i, NUMA_NO_NODE)
>>>>       still left in numa_init. So it will just clear result from
>>>>       early_parse_srat. it should be moved before that
>>>
>>>    c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>>>       early before override from INITRD is settled.
>>>
>>>> 3. that patch TITLE is totally misleading, there is NO x86 in the title,
>>>>    but it changes to x86 code.
>>>>
>>>> 4. it does not CC to TJ and other numa guys...
>>
>> After looking at the code more, I think the theory that the kernel
>> should not use ram in the hotplug area is not right.
>>
>> After that commit, the following ranges can not use movable ram:
>> 1. real_mode code.... well.. funny, legacy cpu0 [0,1M) could be hot-removed?
>> 2. dma_contiguous ?
>> 3. log buf ring.
>> 4. initrd... why? it will be freed after booting, so it could be on movable...
>> 5. crashkernel for kdump...: looks like we can not put kdump kernel
>>    above 4G anymore
>> 6. initmem_init: it will allocate page table to set up kernel mapping
>>    for memory..., it should be with BRK and near end of max_pfn
>
> AFAIK, the Linux kernel currently cannot migrate memory used by the
> kernel. So any memory used by the kernel should not be on movable area.

that depends. initrd will be freed later, so it could be put anywhere
that is under max_pfn during boot.

>> If a node is hotpluggable, the mem related stuff like page table and
>> vmemmap could be on that node without problem and should be on
>> that node.
>
> page tables and vmemmap are kernel memory. They should not be movable,
> I think.

why do you need to migrate pagetable and vmemmap for the memory range
that will be offline ?

>> Assume the first cpu only has 1G ram, and the other 31 sockets have a
>> bunch of ram, and those cpus with ram could be hot-added and
>> hot-removed. Now you want to put page table and vmemmap on the first
>> node. The system would not boot, as there is not enough memory to
>> cover the whole system RAM.
>
> Yes, you are right. And a more extreme situation has been talked about
> by HPA. If all the memory is hot-pluggable, then the kernel won't be
> able to boot.
>
> So, please refer to commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb:
>     acpi, memory-hotplug: support getting hotplug info from SRAT
>
> I have excluded all the memory reserved by memblock, and any node that
> has memory reserved by memblock will be set to un-hot-pluggable, which
> means we will have enough memory (all the memory on the node) to boot
> the kernel. So I think the problem you are talking about has been
> solved.

I don't think that you understand the problem.

Take the system that will put all pagetable and vmemmap on the 1G ram
of the first cpu: as all other ram is MOVABLE, memblock_find_in_range
will not use any local ram on those nodes.

>> e8d1955258091e4c92d5a975ebd7fd8a98f5d30f and related commits should be
>> just reverted now.

Thanks

Yinghai
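To put rough numbers on the "1G on the first cpu, 31 other sockets" scenario, here is a back-of-the-envelope sketch. It rests on assumptions that are not in the thread: 4 KiB pages, a 64-byte `struct page` (a common size on x86_64, but configuration dependent), and 32 GiB per hot-pluggable socket. The helper names `vmemmap_bytes()` and `fits_on_node0()` are hypothetical. Page tables and every other boot-time allocation are ignored, so the real requirement would be higher.

```c
#include <stdint.h>

#define PAGE_SIZE        4096ULL
#define STRUCT_PAGE_SIZE 64ULL          /* assumption: typical x86_64 value */
#define GIB              (1ULL << 30)

/* Bytes of vmemmap (one struct page per 4 KiB page) needed to describe ram_bytes. */
static uint64_t vmemmap_bytes(uint64_t ram_bytes)
{
    return ram_bytes / PAGE_SIZE * STRUCT_PAGE_SIZE;
}

/* Could node 0 alone hold the vmemmap for the whole machine? (lower bound only) */
static int fits_on_node0(uint64_t node0_bytes, uint64_t total_ram_bytes)
{
    return vmemmap_bytes(total_ram_bytes) <= node0_bytes;
}
```

Under these assumptions vmemmap costs 1/64 of RAM, so 1 GiB on node 0 can describe at most 64 GiB in total. A machine with 1 GiB + 31 x 32 GiB = 993 GiB would need about 15.5 GiB of vmemmap, far more than the first node has, which is the point being made above about forcing these structures onto the one non-movable node.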
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 4:52 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote:
> 2013/02/27 7:44, Yinghai Lu wrote:
>>>> that commit is totally broken, and it should be reverted.
>>>>
>>>> 1. numa_init is called several times, NOT just for srat. so those
>>>>        nodes_clear(numa_nodes_parsed)
>>>>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>>>    can not be just removed.
>>>>    please consider sequence is: numaq, srat, amd, dummy.
>>>>    You need to make fall back path working!
>>>>
>>>> 2. simply split acpi_numa_init to early_parse_srat.
>>>>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>>>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>>>           set_apicid_to_node(i, NUMA_NO_NODE)
>>>>       still left in numa_init. So it will just clear result from
>>>>       early_parse_srat. it should be moved before that
>>>
>>>    c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>>>       early before override from INITRD is settled.
>>>
>>>> 3. that patch TITLE is totally misleading, there is NO x86 in the title,
>>>>    but it changes to x86 code.
>>>>
>>>> 4. it does not CC to TJ and other numa guys...
>>
>> After looking at the code more, I think the theory that the kernel
>> should not use ram in the hotplug area is not right.
>>
>> After that commit, the following ranges can not use movable ram:
>> 1. real_mode code.... well.. funny, legacy cpu0 [0,1M) could be hot-removed?
>> 2. dma_contiguous ?
>> 3. log buf ring.
>> 4. initrd... why? it will be freed after booting, so it could be on movable...
>> 5. crashkernel for kdump...: looks like we can not put kdump kernel
>>    above 4G anymore
>> 6. initmem_init: it will allocate page table to set up kernel mapping
>>    for memory..., it should be with BRK and near end of max_pfn
>
> If you use movablemem_map=srat, the above memory can not use movable
> memory. But in my understanding, current Linux cannot move the above
> memory. So the above memory should not use movable memory.

that depends, like relocating initrd to a different position.

>> If a node is hotpluggable, the mem related stuff like page table and
>> vmemmap could be on that node without problem and should be on
>> that node.
>>
>> Assume the first cpu only has 1G ram, and the other 31 sockets have a
>> bunch of ram, and those cpus with ram could be hot-added and
>> hot-removed. Now you want to put page table and vmemmap on the first
>> node. The system would not boot, as there is not enough memory to
>> cover the whole system RAM.
>
> Even if we solve the points you mention above, the system cannot boot.
> In this case, the user should:
>  o add ram to first cpu
>  o decrease hotpluggable ram by:
>    - changing hotpluggable information of SRAT
>    - using movablemem_map=nn[KMG]@ss[KMG]

Do you mean you can not boot a one socket system with 1G ram ?
Assume socket 0 does not support hotplug, and the other 31 sockets
support hot plug.

So we could boot the system with only socket0, and later hot add the
other cpus one by one.

We should simulate it that way, just like booting the system with PXM0
at first and later, during acpi scan, adding the other cpus/ram.

Thanks

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 11:30, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 4:52 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote:
>> 2013/02/27 7:44, Yinghai Lu wrote:
>>>>> that commit is totally broken, and it should be reverted.
>>>>>
>>>>> 1. numa_init is called several times, NOT just for srat. so those
>>>>>        nodes_clear(numa_nodes_parsed)
>>>>>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>>>>    can not be just removed.
>>>>>    please consider sequence is: numaq, srat, amd, dummy.
>>>>>    You need to make fall back path working!
>>>>>
>>>>> 2. simply split acpi_numa_init to early_parse_srat.
>>>>>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>>>>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>>>>           set_apicid_to_node(i, NUMA_NO_NODE)
>>>>>       still left in numa_init. So it will just clear result from
>>>>>       early_parse_srat. it should be moved before that
>>>>
>>>>    c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>>>>       early before override from INITRD is settled.
>>>>
>>>>> 3. that patch TITLE is totally misleading, there is NO x86 in the title,
>>>>>    but it changes to x86 code.
>>>>>
>>>>> 4. it does not CC to TJ and other numa guys...
>>>
>>> After looking at the code more, I think the theory that the kernel
>>> should not use ram in the hotplug area is not right.
>>>
>>> After that commit, the following ranges can not use movable ram:
>>> 1. real_mode code.... well.. funny, legacy cpu0 [0,1M) could be hot-removed?
>>> 2. dma_contiguous ?
>>> 3. log buf ring.
>>> 4. initrd... why? it will be freed after booting, so it could be on movable...
>>> 5. crashkernel for kdump...: looks like we can not put kdump kernel
>>>    above 4G anymore
>>> 6. initmem_init: it will allocate page table to set up kernel mapping
>>>    for memory..., it should be with BRK and near end of max_pfn
>>
>> If you use movablemem_map=srat, the above memory can not use movable
>> memory. But in my understanding, current Linux cannot move the above
>> memory. So the above memory should not use movable memory.
>
> that depends, like relocating initrd to a different position.
>
>>> If a node is hotpluggable, the mem related stuff like page table and
>>> vmemmap could be on that node without problem and should be on
>>> that node.
>>>
>>> Assume the first cpu only has 1G ram, and the other 31 sockets have a
>>> bunch of ram, and those cpus with ram could be hot-added and
>>> hot-removed. Now you want to put page table and vmemmap on the first
>>> node. The system would not boot, as there is not enough memory to
>>> cover the whole system RAM.
>>
>> Even if we solve the points you mention above, the system cannot boot.
>> In this case, the user should:
>>  o add ram to first cpu
>>  o decrease hotpluggable ram by:
>>    - changing hotpluggable information of SRAT
>>    - using movablemem_map=nn[KMG]@ss[KMG]
>
> Do you mean you can not boot a one socket system with 1G ram ?
> Assume socket 0 does not support hotplug, and the other 31 sockets
> support hot plug.
>
> So we could boot the system with only socket0, and later hot add the
> other cpus one by one.

In this case, the system can boot. But hot plug of the other cpus with
a bunch of ram may fail, since the system does not have enough memory
to cover the hot added memory. When hot adding a memory device, the
kernel objects for the memory are allocated from the 1G ram, since the
hot added memory has not been enabled yet.

Thanks,
Yasuaki Ishimatsu

> We should simulate it that way, just like booting the system with PXM0
> at first and later, during acpi scan, adding the other cpus/ram.
>
> Thanks
>
> Yinghai
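The `movablemem_map=nn[KMG]@ss[KMG]` notation quoted above can be read as "size, optional unit suffix, `@`, start address, optional unit suffix". The sketch below parses that shape in userland C. It is an illustration only: `parse_size()` is loosely modeled on the idea of the kernel's `memparse()` but is written from scratch with `strtoull()`, `parse_movablemem_map()` is a hypothetical name, and reading `nn` as the size and `ss` as the start address is an assumption based on the notation, not something the thread spells out. The `movablemem_map=srat` form mentioned in the thread is deliberately not handled and is rejected.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "<number>[KMG]" and report where parsing stopped.
 * The number may be decimal, octal or hex (strtoull with base 0).
 */
static uint64_t parse_size(const char *s, const char **end)
{
    char *p;
    uint64_t v = strtoull(s, &p, 0);

    switch (*p) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        p++;                    /* consume the suffix character */
        break;
    default:
        break;
    }
    *end = p;
    return v;
}

/*
 * Parse "nn[KMG]@ss[KMG]" into a size and a start address.
 * Returns 0 on success and -1 if the string does not have that shape.
 */
static int parse_movablemem_map(const char *arg, uint64_t *size, uint64_t *start)
{
    const char *p;
    const char *q;

    *size = parse_size(arg, &p);
    if (p == arg || *p != '@')
        return -1;              /* no size, or missing '@' */

    q = p + 1;
    *start = parse_size(q, &p);
    if (p == q || *p != '\0')
        return -1;              /* no start address, or trailing junk */

    return 0;
}
```

For example, `"4G@8G"` would describe a 4 GiB range starting at the 8 GiB boundary, while `"4G"` or `"srat"` are rejected by this sketch.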
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote:
> 2013/02/27 11:30, Yinghai Lu wrote:
>> Do you mean you can not boot a one socket system with 1G ram ?
>> Assume socket 0 does not support hotplug, and the other 31 sockets
>> support hot plug.
>>
>> So we could boot the system with only socket0, and later hot add the
>> other cpus one by one.
>
> In this case, the system can boot. But hot plug of the other cpus with
> a bunch of ram may fail, since the system does not have enough memory
> to cover the hot added memory. When hot adding a memory device, the
> kernel objects for the memory are allocated from the 1G ram, since the
> hot added memory has not been enabled yet.

yes, it may fail, if the page table and vmemmap needed for one node's
memory is more than 1G ...

for hot add memory we need to
1. add another wrapper for init_memory_mapping, just like
   init_mem_mapping() for the booting path.
2. make memblock more generic, so we can use it with hot add memory
   during runtime.
3. with that we can initialize page table for the hot added node with ram.
   a. initial page table for 2M near node top is from node0 (that does
      not support hot plug).
   b. then will use 2M for memory below node top...
   c. with that we will make sure page table stays on the local node.
   alloc_low_pages needs to be updated to support that.
4. need to make sure vmemmap is on the local node too.

so hot-remove of a node will work too later.

In the long run, we should make the booting path and hot adding more
similar and share as much code as possible.
That will make the code get more test coverage.

Thanks

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 10:24 AM, Yinghai Lu wrote: After looked at the code more, thought that theory that does not let kernel use ram on hotplug area is not right. after that commit, following range can not use movable ram: 1. real_mode code well..funny, legacy cpu0 [0,1M) could be hot-removed? 2. dma_continguous ? 3. log buff ring. 4. initrd... why it will be freed after booting, so it could be on movable... 5. crashkernel for kdump...: : looks like we can not put kdump kernel above 4G anymore 6. initmem_init: it will allocate page table to setup kernel mapping for memory..., it should be with BRK and near end of max_pfn AFAIK, Linux kernel now cannot migrate memory used by the kernel because. So any memory used by the kernel should not be on movable area. that depends. initrd will be freed later, so it should be put anywhere that is under max_pfn during boot. OK,but initrd is not that big. Actually, before my code start to work, memblock has reserved some memory. But it is not that big. On the other hand, it is not that easy to find out which memory should be kept in unmovable area, and which should not. If node is hotplugable, the mem related stuff like page table and vmemmap could be on the that node without problem and should be on that node. page tables and vmemmap are kernel memory. They should not be movable, I think. why do you need to migrate pagetable and vmemmap for the memory range that will be offline ? Hum, you are right. :) True, we can store pagetable and vmemmap on the node that is hot-pluggable. But just like the page_cgroup structs, we need additional work to handle it. But based on the existing code, we didn't do any special handling. I think we can improve it if needed. :) assume first cpu only have 1G ram, and other 31 socket will have bunch of ram and those cpu with ram could be hotadd and hotremoved. Now you want to put page table and vmemmap on first node. The system would not boot as not enough memory for cover whole system RAM. 
Yes, you are right. And a more extreme situation has been talked about by HPA. If all the memory is hot-pluggable, then the kernel won't be able to boot. So, please refer to commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb: acpi, memory-hotplug: support getting hotplug info from SRAT I have excluded all the memory reserved by memblock, and any node that has memory reserved by memblock will be set to un-hot-pluggable, which means we will have enough memory (all the memory on the node) to boot the kernel. So I think the problem you are talking about has been solved. I don't think that you understand the problem. for the system that will put all pagetable and vmemmap on the 1G ram of first cpu. as all other ram are MOVABLE, so memblock_find_in_range will not use any local ram on those nodes. Yes, I know that. :) In this case, the kernel will not be able to use local ram on those nodes. That will cost some performance. I mean if the 1G ram is not enough for the kernel to boot, the current code will set all the ram on the same node as un-hot-pluggable. If all the ram on the node is not enough for the kernel to boot, it is a really extreme situation, IIUC. I think users can solve this problem in two ways: 1) add more ram to the node. 2) use movablemem_map=nn[KMG]@ss[KMG] to configure more ram as unmovable. Thanks. :)
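For readers unfamiliar with the nn[KMG]@ss[KMG] notation, here is a minimal userspace sketch of how such an argument could be split into a size and a start address. It assumes the same convention as the memmap= parameter (nn is the size, ss is the start), and the helper names and the struct are hypothetical. It is not the kernel's parser.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for one parsed range. */
struct mem_range {
    uint64_t start;
    uint64_t size;
};

/* Parse a number with an optional K/M/G suffix; *end is set past it. */
static uint64_t parse_size(const char *s, const char **end)
{
    char *p;
    uint64_t v = strtoull(s, &p, 0);

    switch (*p) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        p++;
        break;
    default:
        break;
    }
    *end = p;
    return v;
}

/* Parse "nn[KMG]@ss[KMG]". Returns 0 on success, -1 on malformed input. */
static int parse_movablemem_map(const char *arg, struct mem_range *r)
{
    const char *p;
    uint64_t size, start;

    assert(arg && r);
    size = parse_size(arg, &p);
    if (p == arg || *p != '@')
        return -1;              /* no size given, or missing '@' */
    arg = p + 1;
    start = parse_size(arg, &p);
    if (p == arg || *p != '\0')
        return -1;              /* no start given, or trailing junk */
    r->size = size;
    r->start = start;
    return 0;
}
```

With this sketch, "4G@8G" describes a 4 GiB range starting at physical address 8 GiB. How the kernel then treats that range is decided by the movablemem_map patches discussed in this thread, not by this example.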
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 13:04, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote:
>> 2013/02/27 11:30, Yinghai Lu wrote:
>>> Do you mean you can not boot one socket system with 1G ram ?
>>> Assume socket 0 does not support hotplug, other 31 sockets support hot plug.
>>> So we could boot system only with socket0, and later one by one hot add other cpus.
>>
>> In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled.
>
> yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ...
>
> for hot add memory we need to
> 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path.
> 2. we need make memblock more generic, so we can use it with hot add memory during runtime.
> 3. with that we can initialize page table for hot added node with ram.
>    a. initial page table for 2M near node top is from node0 ( that does not support hot plug).
>    b. then will use 2M for memory below node top...
>    c. with that we will make sure page table stay on local node.
>    alloc_low_pages need to be updated to support that.
> 4. need to make sure vmemmap on local node too.

I think so too. By this, memory hot plug becomes more useful.

Thanks,
Yasuaki Ishimatsu

> so hot-remove node will work too later.
>
> In the long run, we should make booting path and hot adding more similar and share at most code.
> That will make code get more test coverage.
>
> Thanks
>
> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. Tim, Don, Can you try if attached reverting patch fix all the problems for you ? Thanks Yinghai revert_movable_map.patch Description: Binary data
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 14:11, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. I agree with your idea. But I think above ideas is future work. So at first we should use movable memory for memory hot plug. After that, we will implement above ideas. so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. 
We will fix this problem with no objection. So please wait a while. And the problem occurs by movablemem_map=srat not movablemem_map=nn[KMG]@ss[KMG] At least, if you want to revert it, you should revert only movablemem_map=srat part. Thanks, Yasuaki Ishimatsu Tim, Don, Can you try if attached reverting patch fix all the problems for you ? Thanks Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 9:49 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 14:11, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. I agree with your idea. But I think above ideas is future work. So at first we should use movable memory for memory hot plug. After that, we will implement above ideas. so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. 
Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. We will fix this problem with no objection. So please wait a while. And the problem occurs by movablemem_map=srat not movablemem_map=nn[KMG]@ss[KMG] At least, if you want to revert it, you should revert only movablemem_map=srat part. Those patches are tangled together. Also it looks funny to ask user to specify mem range in boot command line to enable mem hotplug. Thanks Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 02:54 PM, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 9:49 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 14:11, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. I agree with your idea. But I think above ideas is future work. So at first we should use movable memory for memory hot plug. After that, we will implement above ideas. so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. 
That will make code get more test coverage. Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. We will fix this problem with no objection. So please wait a while. And the problem occurs by movablemem_map=srat not movablemem_map=nn[KMG]@ss[KMG] At least, if you want to revert it, you should revert only movablemem_map=srat part. Those patches are tangled together. No, they are not. The following commits supports movablemem_map=nn[KMG]@ss[KMG]. commit fb06bc8e5f42f38c011de0e59481f464a82380f6 page_alloc: bootmem limit with movablecore_map commit 42f47e27e761fee07da69e04612ec7dd0d490edd page_alloc: make movablemem_map have higher priority commit 6981ec31146cf19454c55c130625f6cee89aab95 page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes commit 34b71f1e04fcba578e719e675b4882eeeb2a1f6f page_alloc: add movable_memmap kernel parameter commit 4d59a75125d5a4717e57e9fc62c64b3d346e603e x86: get pg_data_t's memory from other node And the following supports movablemem_map=srat. commit f7210e6c4ac795694106c1c5307134d3fc233e88 mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region(). commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb acpi, memory-hotplug: support getting hotplug info from SRAT commit 27168d38fa209073219abedbe6a9de7ba9acbfad acpi, memory-hotplug: extend movablemem_map ranges to the end of node commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f acpi, memory-hotplug: parse SRAT before memblock is ready Also it looks funny to ask user to specify mem range in boot command line to enable mem hotplug. Well, I think sometimes users don't like the SRAT memory style, and want to increase or reduce hot-pluggable memory by themselves. And also, it is useful for debuging firmware bugs. I agree that movablemem_map=srat functionality need more work to improve. Can we not revert it, and improve it during 3.9rc ? 
I think during rc time, at least we can fix the problems brought by early_parse_srat(). Thanks. :)
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Tue, Feb 26, 2013 at 11:11 PM, Tang Chen tangc...@cn.fujitsu.com wrote: On 02/27/2013 02:54 PM, Yinghai Lu wrote: Those patches are tangled together. No, they are not. The following commits supports movablemem_map=nn[KMG]@ss[KMG]. commit fb06bc8e5f42f38c011de0e59481f464a82380f6 page_alloc: bootmem limit with movablecore_map commit 42f47e27e761fee07da69e04612ec7dd0d490edd page_alloc: make movablemem_map have higher priority commit 6981ec31146cf19454c55c130625f6cee89aab95 page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes commit 34b71f1e04fcba578e719e675b4882eeeb2a1f6f page_alloc: add movable_memmap kernel parameter commit 4d59a75125d5a4717e57e9fc62c64b3d346e603e x86: get pg_data_t's memory from other node And the following supports movablemem_map=srat. commit f7210e6c4ac795694106c1c5307134d3fc233e88 mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region(). commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb acpi, memory-hotplug: support getting hotplug info from SRAT commit 27168d38fa209073219abedbe6a9de7ba9acbfad acpi, memory-hotplug: extend movablemem_map ranges to the end of node commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f acpi, memory-hotplug: parse SRAT before memblock is ready those four can be reverted cleanly? Also it looks funny to ask user to specify mem range in boot command line to enable mem hotplug. Well, I think sometimes users don't like the SRAT memory style, and want to increase or reduce hot-pluggable memory by themselves. And also, it is useful for debuging firmware bugs. I agree that movablemem_map=srat functionality need more work to improve. Can we not revert it, and improve it during 3.9rc ? I think during rc time, at least we can fix the problems brought by early_parse_srat(). looks like acpi_override can not be fixed. 
Thanks Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 03:25 PM, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 11:11 PM, Tang Chentangc...@cn.fujitsu.com wrote: On 02/27/2013 02:54 PM, Yinghai Lu wrote: Those patches are tangled together. No, they are not. The following commits supports movablemem_map=nn[KMG]@ss[KMG]. commit fb06bc8e5f42f38c011de0e59481f464a82380f6 page_alloc: bootmem limit with movablecore_map commit 42f47e27e761fee07da69e04612ec7dd0d490edd page_alloc: make movablemem_map have higher priority commit 6981ec31146cf19454c55c130625f6cee89aab95 page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes commit 34b71f1e04fcba578e719e675b4882eeeb2a1f6f page_alloc: add movable_memmap kernel parameter commit 4d59a75125d5a4717e57e9fc62c64b3d346e603e x86: get pg_data_t's memory from other node And the following supports movablemem_map=srat. commit f7210e6c4ac795694106c1c5307134d3fc233e88 mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region(). commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb acpi, memory-hotplug: support getting hotplug info from SRAT commit 27168d38fa209073219abedbe6a9de7ba9acbfad acpi, memory-hotplug: extend movablemem_map ranges to the end of node commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f acpi, memory-hotplug: parse SRAT before memblock is ready those four can be reverted cleanly? Sorry, if you want to revert, you just need to revert: commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f acpi, memory-hotplug: parse SRAT before memblock is ready commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb acpi, memory-hotplug: support getting hotplug info from SRAT The other two have nothing to do with SRAT. And they are necessary. Seeing from the code, I think it is clean. But we'd better test it. Also it looks funny to ask user to specify mem range in boot command line to enable mem hotplug. Well, I think sometimes users don't like the SRAT memory style, and want to increase or reduce hot-pluggable memory by themselves. 
And also, it is useful for debugging firmware bugs. I agree that movablemem_map=srat functionality need more work to improve. Can we not revert it, and improve it during 3.9rc ? I think during rc time, at least we can fix the problems brought by early_parse_srat(). looks like acpi_override can not be fixed. About this problem, I need to do some investigation, and I think we can have a try. I do hope we can keep these patches and do the improvement work in the future. :) Thanks. :)
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/27/2013 01:11 PM, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 8:43 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 13:04, Yinghai Lu wrote: On Tue, Feb 26, 2013 at 7:38 PM, Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote: 2013/02/27 11:30, Yinghai Lu wrote: Do you mean you can not boot one socket system with 1G ram ? Assume socket 0 does not support hotplug, other 31 sockets support hot plug. So we could boot system only with socket0, and later one by one hot add other cpus. In this case, system can boot. But other cpus with bunch of ram hot plug may fails, since system does not have enough memory for cover hot added memory. When hot adding memory device, kernel object for the memory is allocated from 1G ram since hot added memory has not been enabled. yes, it may fail, if the one node memory need page table and vmemmap is more than 1g ... for hot add memory we need to 1. add another wrapper for init_memory_mapping, just like init_mem_mapping() for booting path. 2. we need make memblock more generic, so we can use it with hot add memory during runtime. 3. with that we can initialize page table for hot added node with ram. a. initial page table for 2M near node top is from node0 ( that does not support hot plug). b. then will use 2M for memory below node top... c. with that we will make sure page table stay on local node. alloc_low_pages need to be updated to support that. 4. need to make sure vmemmap on local node too. I think so too. By this, memory hot plug becomes more useful. so hot-remove node will work too later. In the long run, we should make booting path and hot adding more similar and share at most code. That will make code get more test coverage. Tang, Yasuaki, Andrew, Please check if you are ok with attached reverting patch. Tim, Don, Can you try if attached reverting patch fix all the problems for you ? 
Hi, Yinghai, Andrew

From the mails and the changelog of the revert patch, I think Yinghai mainly worries about 3 problems.

1) The current implementation has bugs and bad code.

Yes. Any bug should be fixed. We should either fix it directly, or revert the related patches and then send fixed patches. But only one or two patches are involved, so it is not a good idea to revert the whole patchset or the whole feature. Right? Thank you all for pointing out the bug. We are working on a fix.

2) A lot of memory could be placed in hotpluggable memory, but we have not moved it there yet: vmemmap, some page tables, and so on.

This is a restriction of the current kernel. We can't convert it all quickly; we must convert it step by step. For example, we are currently converting the memory of page_cgroup to hotpluggable memory.

3) If the user (or firmware) specifies too little un-hotpluggable memory, the system can't work, or can't even boot.

Any feature or system has its own minimum requirements. The user should meet them and specify more un-hotpluggable memory, so I don't think this is a problem in kernel land. But problem 2) above makes this feature's minimum requirements much higher, and that is what Yinghai really worries about. All systems which use this feature can meet this higher requirement very easily, though. Users have to specify enough un-hotpluggable memory both before and after we lower the minimum requirements. The whole feature works very well if the user specifies enough un-hotpluggable memory.

So problems 2) and 3) are not urgent problems.

And our team has another problem: we are still not good at community work (for example, the patch TITLE is totally misleading), but we are growing up. We are sorry, and thank you for pointing out the mistakes.

The feature/patchset does have problems. But it is not good to tangle all the problems together and revert the whole feature.
Thanks, Lai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
Hi Yinghai,

2013/02/26 15:57, Yinghai Lu wrote:
> On Mon, Feb 25, 2013 at 10:09 PM, Tang Chen wrote:
>> On 02/26/2013 12:51 PM, Martin Bligh wrote:
>>>> Do you mean we can remove numaq x86 32bit code now?
>>>
>>> Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
>>> Was useful in the early days of getting NUMA up and running on Linux,
>>> but is now too old to be a museum piece, really.
>>>
>>> M.
>>
>> Hi Martin, Yinghai,
>>
>> It was me that I failed to make numa_init() fall back path working, and forgot
>> to call early_parse_srat in ia64. Sorry for the breaking of other platform. :)
>>
>> So now, is Yinghai's patch enough for this problem ?
>> Or we can encapsulate the following clear up work into one function ?
>>
>> +       for (i = 0; i < MAX_LOCAL_APIC; i++)
>> +               set_apicid_to_node(i, NUMA_NO_NODE);
>> +       nodes_clear(numa_nodes_parsed);
>> +       memset(&numa_meminfo, 0, sizeof(numa_meminfo));
>
> That is temporary workaround and your patch and this workaround make x86 acpi numa init too messy.
>
> I don't see the point to hack SRAT to make memory hotplug working.
>
> Do you guys check and use PMTT in ACPI spec instead?

I read PMTT specification in ACPI spec revision 5.0. But this table does not have hotpluggable information. So we cannot know which memory device can hotplug from this table.

Thanks,
Yasuaki Ishimatsu

> Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/26/2013 02:57 PM, Yinghai Lu wrote:
> That is temporary workaround and your patch and this workaround make x86 acpi numa init too messy.
>
> I don't see the point to hack SRAT to make memory hotplug working.
>
> Do you guys check and use PMTT in ACPI spec instead?

Hi Yinghai,

Thanks for the suggestion. :)

The reason we are using SRAT is that we need the hot-pluggable bit in SRAT. I didn't find such info in PMTT or elsewhere.

We use SRAT in this way to satisfy users who don't want to specify physical address ranges on the kernel command line. They want to use SRAT to determine which memory is hot-pluggable, and which is not.

To achieve this, we have to ensure we have the SRAT info before memblock starts to allocate memory, so that we can prevent memblock from allocating memory in the hot-pluggable area. So I have to parse SRAT earlier.

I don't think the code is that messy. I think we can encapsulate the clean-up job into one function, and call it where it is needed.

What do you think ?

Thanks. :)
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Mon, Feb 25, 2013 at 10:09 PM, Tang Chen wrote:
> On 02/26/2013 12:51 PM, Martin Bligh wrote:
>>> Do you mean we can remove numaq x86 32bit code now?
>>
>> Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
>> Was useful in the early days of getting NUMA up and running on Linux,
>> but is now too old to be a museum piece, really.
>>
>> M.
>
> Hi Martin, Yinghai,
>
> It was me that I failed to make numa_init() fall back path working, and forgot
> to call early_parse_srat in ia64. Sorry for the breaking of other platform. :)
>
> So now, is Yinghai's patch enough for this problem ?
> Or we can encapsulate the following clear up work into one function ?
>
> +       for (i = 0; i < MAX_LOCAL_APIC; i++)
> +               set_apicid_to_node(i, NUMA_NO_NODE);
> +       nodes_clear(numa_nodes_parsed);
> +       memset(&numa_meminfo, 0, sizeof(numa_meminfo));

That is temporary workaround and your patch and this workaround make x86 acpi numa init too messy.

I don't see the point to hack SRAT to make memory hotplug working.

Do you guys check and use PMTT in ACPI spec instead?

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On 02/26/2013 12:51 PM, Martin Bligh wrote:
>> Do you mean we can remove numaq x86 32bit code now?
>
> Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
> Was useful in the early days of getting NUMA up and running on Linux,
> but is now too old to be a museum piece, really.
>
> M.

Hi Martin, Yinghai,

It was me that I failed to make numa_init() fall back path working, and forgot to call early_parse_srat in ia64. Sorry for the breaking of other platform. :)

So now, is Yinghai's patch enough for this problem ?
Or we can encapsulate the following clear up work into one function ?

+       for (i = 0; i < MAX_LOCAL_APIC; i++)
+               set_apicid_to_node(i, NUMA_NO_NODE);
+       nodes_clear(numa_nodes_parsed);
+       memset(&numa_meminfo, 0, sizeof(numa_meminfo));

Thanks. :)
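The idea of wrapping that clean-up in one function can be illustrated with a userspace sketch. Everything here is a stand-in: the arrays, the struct layout, the constants and the helper name numa_reset_parsed_state() are assumptions made for illustration, and only mirror the shape of the four quoted lines, not the real kernel data structures.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins for kernel state; sizes and types are assumptions. */
#define MAX_LOCAL_APIC 256
#define MAX_NUMNODES   8
#define NUMA_NO_NODE   (-1)

struct numa_meminfo_mock {
    int nr_blks;
    unsigned long long start[MAX_NUMNODES];
    unsigned long long end[MAX_NUMNODES];
};

static int apicid_to_node[MAX_LOCAL_APIC];        /* APIC id -> node map */
static unsigned char nodes_parsed[MAX_NUMNODES];  /* stand-in for numa_nodes_parsed */
static struct numa_meminfo_mock meminfo;          /* stand-in for numa_meminfo */

/*
 * Hypothetical helper: drop everything a previous (failed or early)
 * NUMA parse left behind, so the next init method starts clean.
 */
static void numa_reset_parsed_state(void)
{
    int i;

    for (i = 0; i < MAX_LOCAL_APIC; i++)
        apicid_to_node[i] = NUMA_NO_NODE;
    memset(nodes_parsed, 0, sizeof(nodes_parsed));
    memset(&meminfo, 0, sizeof(meminfo));
}
```

A single helper like this would be called before each fallback attempt instead of repeating the four lines at every call site. Whether that is enough to untangle the early SRAT parsing is exactly what the rest of the thread argues about.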
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
> Do you mean we can remove numaq x86 32bit code now?

Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
Was useful in the early days of getting NUMA up and running on Linux,
but is now too old to be a museum piece, really.

M.
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Mon, Feb 25, 2013 at 7:21 PM, Martin Bligh wrote:
>>>> 4, it does not CC to TJ and other numa guys...
>>>
>>> attached workaround the problem for now.
>>> but it will assume NUMAQ would not have SRAT table.
>>
>> Martin, can you confirm that numaq does not have srat?
>
> No, it's pre-SRAT. I forget the exact name of the table, but no SRAT until
> x440.
>
> OTOH, you should probably feel free to break it by now, I can't
> imagine they are any use to man nor beast any more.

Do you mean we can remove numaq x86 32bit code now?

Thanks

Yinghai
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
>>> 4, it does not CC to TJ and other numa guys...
>>
>> attached workaround the problem for now.
>> but it will assume NUMAQ would not have SRAT table.
>
> Martin, can you confirm that numaq does not have srat?

No, it's pre-SRAT. I forget the exact name of the table, but no SRAT until
x440.

OTOH, you should probably feel free to break it by now, I can't
imagine they are any use to man nor beast any more.

M.
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
[ Add new address with Martin]

On Mon, Feb 25, 2013 at 4:35 PM, Yinghai Lu wrote:
> On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu wrote:
>> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris wrote:
>>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>>> Is this an expected warning ? I'll boot a vanilla kernel just to be sure.
>>>>>
>>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>>
>>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>>> is having an impact:
>>>
>>> Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but
>>> still Sandy Bridge, though I don't think that matters).
>>>
>>> Bisection leads to:
>>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug:
>>> parse SRAT before memblock is ready
>>>
>>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>>> up the cpu<-->node bindings, but I wanted to put this out there while
>>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node print
>>> outs during boot are the same either way -- if you look at the APIC
>>> numbers of the processors (from /proc/cpuinfo), the processors should
>>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>>> in case this is obvious to him or he's already fixed it somewhere not
>>> on Linus's tree yet.
>>>
>>> Don Morris
>>>
>>>> [    0.170435] ------------[ cut here ]------------
>>>> [    0.170450] WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x71/0x84()
>>>> [    0.170452] Hardware name: S2600CP
>>>> [    0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
>>>> [    0.156000] smpboot: Booting Node 1, Processors #1
>>>> [    0.170455] Modules linked in:
>>>> [    0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>>> [    0.170461] Call Trace:
>>>> [    0.170466] [] warn_slowpath_common+0x7f/0xc0
>>>> [    0.170473] [] warn_slowpath_fmt+0x46/0x50
>>>> [    0.170477] [] topology_sane.isra.2+0x71/0x84
>>>> [    0.170482] [] set_cpu_sibling_map+0x23f/0x436
>>>> [    0.170487] [] start_secondary+0x137/0x201
>>>> [    0.170502] ---[ end trace 09222f596307ca1d ]---
>>
>> that commit is totally broken, and it should be reverted.
>>
>> 1. numa_init is called several times, NOT just for srat. so those
>>        nodes_clear(numa_nodes_parsed)
>>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>    can not be just removed.
>>    please consider sequence is: numaq, srat, amd, dummy.
>>    You need to make fall back path working!
>>
>> 2. simply split acpi_numa_init to early_parse_srat.
>>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>               set_apicid_to_node(i, NUMA_NO_NODE)
>>       still left in numa_init. So it will just clear result from early_parse_srat.
>>       it should be moved before that
>>
>> 3. that patch TITLE is total misleading, there is NO x86 in the title,
>>    but it changes to x86 code.
>>
>> 4, it does not CC to TJ and other numa guys...
>
> attached workaround the problem for now.
> but it will assume NUMAQ would not have SRAT table.
>
> Martin, can you confirm that numaq does not have srat?

Thanks

Yinghai

x.patch
Description: Binary data
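Point 1 in the review above (numa_init is tried once per method, in the order numaq, srat, amd, dummy) can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: the function names, the two toy parsers, and the bitmask standing in for numa_nodes_parsed are not taken from the kernel. The sketch only shows why the reset has to run at the start of every attempt: a parser that fails halfway must not leak its partial results into the next one.

```c
#include <assert.h>

#define MAX_LOCAL_APIC 4
#define NUMA_NO_NODE   (-1)

int apicid_to_node[MAX_LOCAL_APIC];
unsigned long nodes_parsed;     /* bitmask standing in for numa_nodes_parsed */

/* Reset performed at the top of every attempt. */
void reset_state(void)
{
    for (int i = 0; i < MAX_LOCAL_APIC; i++)
        apicid_to_node[i] = NUMA_NO_NODE;
    nodes_parsed = 0;
}

/* Toy parser that records partial data and then fails. */
int broken_parser(void)
{
    apicid_to_node[0] = 1;      /* stale binding left behind */
    nodes_parsed |= 1UL << 1;
    return -1;
}

/* Toy last-resort parser: a single node 0, no APIC bindings. */
int dummy_parser(void)
{
    nodes_parsed |= 1UL << 0;
    return 0;
}

/* One attempt: always start from a clean slate. */
int numa_init_sketch(int (*init_func)(void))
{
    reset_state();
    return init_func();
}

/* Try each method in order; return the index of the first success, or -1. */
int numa_init_with_fallback(int (*const methods[])(void), int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (numa_init_sketch(methods[i]) == 0)
            return i;
    return -1;
}
```

If reset_state() were dropped from numa_init_sketch(), the dummy attempt in this toy model would inherit node 1 in nodes_parsed and the stale binding for APIC ID 0 from the failed parser.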
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
> [    0.170435] ------------[ cut here ]------------
> [    0.170450] WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x71/0x84()
> [    0.170452] Hardware name: S2600CP
> [    0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
> [    0.156000] smpboot: Booting Node 1, Processors #1
> [    0.170455] Modules linked in:
> [    0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
> [    0.170461] Call Trace:
> [    0.170466] [] warn_slowpath_common+0x7f/0xc0
> [    0.170473] [] warn_slowpath_fmt+0x46/0x50
> [    0.170477] [] topology_sane.isra.2+0x71/0x84
> [    0.170482] [] set_cpu_sibling_map+0x23f/0x436
> [    0.170487] [] start_secondary+0x137/0x201
> [    0.170502] ---[ end trace 09222f596307ca1d ]---
>
> that commit is totally broken, and it should be reverted.
>
> 1. numa_init is called several times, NOT just for srat. so those
>        nodes_clear(numa_nodes_parsed)
>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>    can not be just removed.
>    please consider sequence is: numaq, srat, amd, dummy.
>    You need to make fall back path working!
>
> 2. simply split acpi_numa_init to early_parse_srat.
>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>               set_apicid_to_node(i, NUMA_NO_NODE)
>       still left in numa_init. So it will just clear result from early_parse_srat.
>       it should be moved before that
>
> 3. that patch TITLE is total misleading, there is NO x86 in the title,
>    but it changes to x86 code.
>
> 4, it does not CC to TJ and other numa guys...

Hi Yinghai, Don,

OK, I see this. I'll fix it soon. :)

Thanks. :)
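Point 2b above is an ordering problem, and a toy model makes it concrete. The sketch below is an assumption-laden illustration, not the kernel's code path: early_parse_sketch() merely pretends that an early SRAT parse bound APIC IDs 0-1 to node 0 and 2-3 to node 1, and all function names are hypothetical. It shows that clearing the table after the early parse discards the parsed bindings, while clearing before it keeps them.

```c
#include <assert.h>

#define MAX_LOCAL_APIC 4
#define NUMA_NO_NODE   (-1)

int apicid_node[MAX_LOCAL_APIC];

void clear_apicid_nodes(void)
{
    for (int i = 0; i < MAX_LOCAL_APIC; i++)
        apicid_node[i] = NUMA_NO_NODE;
}

/* Pretend early parse: APIC IDs 0-1 on node 0, 2-3 on node 1 (made up). */
void early_parse_sketch(void)
{
    for (int i = 0; i < MAX_LOCAL_APIC; i++)
        apicid_node[i] = i / 2;
}

int node_of_apicid(int apicid)
{
    assert(apicid >= 0 && apicid < MAX_LOCAL_APIC);
    return apicid_node[apicid];
}

/* Order described as broken in the review: parse first, clear afterwards. */
void boot_parse_then_clear(void)
{
    early_parse_sketch();
    clear_apicid_nodes();
}

/* Order suggested in the review: clear first, then parse. */
void boot_clear_then_parse(void)
{
    clear_apicid_nodes();
    early_parse_sketch();
}
```

In this model, after boot_parse_then_clear() every lookup returns NUMA_NO_NODE, so whatever code later assigns CPUs to nodes no longer sees the parsed bindings. Whether that is the exact mechanism behind the warning in the log is not established by the thread itself.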
Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
On Mon, Feb 25, 2013 at 2:50 PM, Yinghai Lu wrote:
> On Mon, Feb 25, 2013 at 1:27 PM, Don Morris wrote:
>> On 02/25/2013 10:32 AM, Tim Gardner wrote:
>>> On 02/25/2013 08:02 AM, Tim Gardner wrote:
>>>> Is this an expected warning ? I'll boot a vanilla kernel just to be sure.
>>>>
>>>> rebased against ab7826595e9ec51a51f622c5fc91e2f59440481a in Linus' repo:
>>>
>>> Same with a vanilla kernel, so it doesn't appear that any Ubuntu cruft
>>> is having an impact:
>>
>> Reproduced on a HP z620 workstation (E5-2620 instead of E5-2680, but
>> still Sandy Bridge, though I don't think that matters).
>>
>> Bisection leads to:
>> # bad: [e8d1955258091e4c92d5a975ebd7fd8a98f5d30f] acpi, memory-hotplug:
>> parse SRAT before memblock is ready
>>
>> Nothing terribly obvious leaps out as to *why* that reshuffling messes
>> up the cpu<-->node bindings, but I wanted to put this out there while
>> I poke around further. [Note that the SRAT: PXM -> APIC -> Node print
>> outs during boot are the same either way -- if you look at the APIC
>> numbers of the processors (from /proc/cpuinfo), the processors should
>> be assigned to the correct node, but they aren't.] cc'ing Tang Chen
>> in case this is obvious to him or he's already fixed it somewhere not
>> on Linus's tree yet.
>>
>> Don Morris
>>
>>> [    0.170435] ------------[ cut here ]------------
>>> [    0.170450] WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x71/0x84()
>>> [    0.170452] Hardware name: S2600CP
>>> [    0.170454] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
>>> [    0.156000] smpboot: Booting Node 1, Processors #1
>>> [    0.170455] Modules linked in:
>>> [    0.170460] Pid: 0, comm: swapper/1 Not tainted 3.8.0+ #1
>>> [    0.170461] Call Trace:
>>> [    0.170466] [] warn_slowpath_common+0x7f/0xc0
>>> [    0.170473] [] warn_slowpath_fmt+0x46/0x50
>>> [    0.170477] [] topology_sane.isra.2+0x71/0x84
>>> [    0.170482] [] set_cpu_sibling_map+0x23f/0x436
>>> [    0.170487] [] start_secondary+0x137/0x201
>>> [    0.170502] ---[ end trace 09222f596307ca1d ]---
>
> that commit is totally broken, and it should be reverted.
>
> 1. numa_init is called several times, NOT just for srat. so those
>        nodes_clear(numa_nodes_parsed)
>        memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>    can not be just removed.
>    please consider sequence is: numaq, srat, amd, dummy.
>    You need to make fall back path working!
>
> 2. simply split acpi_numa_init to early_parse_srat.
>    a. that early_parse_srat is NOT called for ia64, so you break ia64.
>    b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>               set_apicid_to_node(i, NUMA_NO_NODE)
>       still left in numa_init. So it will just clear result from early_parse_srat.
>       it should be moved before that
>
> 3. that patch TITLE is total misleading, there is NO x86 in the title,
>    but it changes to x86 code.
>
> 4, it does not CC to TJ and other numa guys...

attached workaround the problem for now.
but it will assume NUMAQ would not have SRAT table.

Martin, can you confirm that numaq does not have srat?

Yinghai

x.patch
Description: Binary data