Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/21 2:23, Yasuaki Ishimatsu wrote:

> On Mon, 20 Apr 2015 11:42:10 +0800
> Xishi Qiu wrote:
>
>> On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:
>>
>>> On Mon, 20 Apr 2015 10:45:45 +0800
>>> Xishi Qiu wrote:
>>>
>>>> On 2015/4/20 9:42, Gu Zheng wrote:
>>>>
>>>>> Hi Xishi,
>>>>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>>>>
>>>>>> Your patches will fix your issue.
>>>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>>>> not be initialized.
>>>>>>
>>>>>> Memory hot add flows are as follows:
>>>>>>
>>>>>> add_memory
>>>>>>   ...
>>>>>>   -> hotadd_new_pgdat()
>>>>>>   ...
>>>>>>   -> node_set_online(nid)
>>>>>>
>>>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>>>> offline because node_set_online() is not called yet. So if applying
>>>>>> your patches, the pgdat is not initialized in this case.
>>>>>
>>>>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
>>>>> over-kill.
>>>>>
>>>>>> Thanks,
>>>>>> Yasuaki Ishimatsu
>>>>>>
>>>>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>>>>> Xishi Qiu wrote:
>>>>>>
>>>>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>>>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>>>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>>>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>>>>>
>>>>> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
>>>>> then the chaos occurred when init the zones. And this only happens to the re-hotadd
>>>>> node, so how about using the saved *node_start_pfn* (via
>>>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>>>>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>>>>
>>>>> Thanks,
>>>>> Gu
>>>>
>>>> Hi Gu,
>>>>
>>>> I first considered this method, but if the hot added node's start and size
>>>> are different from before, it makes the chaos.
>>>>
>>>> e.g.
>>>> nodeXX (8-16G)
>>>> remove nodeXX
>>>> BIOS report cpu first and online it
>>>> hotadd nodeXX
>>>> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
>>>> BIOS report mem(10-12G)
>>>> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
>>>> the start is still 8G, not 10G, this is chaos!
>>>
>>> If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
>>> pr_info()'s message.
>>>
>>> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>>>		unsigned long node_start_pfn, unsigned long *zholes_size)
>>> {
>>>	...
>>> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>>	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>>>		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
>>> #endif
>>> }
>>>
>>> Is the memory range of the message "8G - 16G"?
>>> If so, the reason is that memblk is not deleted at memory hot remove.
>>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>
>> Hi Yasuaki,
>>
>> By reading the code, I find memblk is not deleted at memory hot remove.
>> I am not sure whether we should remove it. If we remove it, we should also reset
>> "arch_zone_lowest_possible_pfn", right? It seems a little complicated.
>
> I think memblk should be added/removed by hot adding/removing memory.
> But, arch_zone_lowest_possible_pfn should not be changed.

Ok, thanks for your suggestion.

> Thanks,
> Yasuaki Ishimatsu
>
>> Thanks,
>> Xishi Qiu
>>
>>>> Thanks,
>>>> Xishi Qiu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On Mon, 20 Apr 2015 11:42:10 +0800
Xishi Qiu wrote:

> On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:
>
> > On Mon, 20 Apr 2015 10:45:45 +0800
> > Xishi Qiu wrote:
> >
> >> On 2015/4/20 9:42, Gu Zheng wrote:
> >>
> >>> Hi Xishi,
> >>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
> >>>
> >>>> Your patches will fix your issue.
> >>>> But, if BIOS reports memory first at node hot add, pgdat can
> >>>> not be initialized.
> >>>>
> >>>> Memory hot add flows are as follows:
> >>>>
> >>>> add_memory
> >>>>   ...
> >>>>   -> hotadd_new_pgdat()
> >>>>   ...
> >>>>   -> node_set_online(nid)
> >>>>
> >>>> When calling hotadd_new_pgdat() for a hot added node, the node is
> >>>> offline because node_set_online() is not called yet. So if applying
> >>>> your patches, the pgdat is not initialized in this case.
> >>>
> >>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> >>> over-kill.
> >>>
> >>>> Thanks,
> >>>> Yasuaki Ishimatsu
> >>>>
> >>>> On Fri, 17 Apr 2015 18:50:32 +0800
> >>>> Xishi Qiu wrote:
> >>>>
> >>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
> >>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
> >>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
> >>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
> >>>
> >>> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
> >>> then the chaos occurred when init the zones. And this only happens to the re-hotadd
> >>> node, so how about using the saved *node_start_pfn* (via
> >>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
> >>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
> >>>
> >>> Thanks,
> >>> Gu
> >>
> >> Hi Gu,
> >>
> >> I first considered this method, but if the hot added node's start and size
> >> are different from before, it makes the chaos.
> >>
> >> e.g.
> >> nodeXX (8-16G)
> >> remove nodeXX
> >> BIOS report cpu first and online it
> >> hotadd nodeXX
> >> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
> >> BIOS report mem(10-12G)
> >> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
> >> the start is still 8G, not 10G, this is chaos!
> >
> > If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
> > pr_info()'s message.
> >
> > void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> >		unsigned long node_start_pfn, unsigned long *zholes_size)
> > {
> >	...
> > #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> >	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> >	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
> >		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
> > #endif
> > }
> >
> > Is the memory range of the message "8G - 16G"?
> > If so, the reason is that memblk is not deleted at memory hot remove.
> >
> > Thanks,
> > Yasuaki Ishimatsu
>
> Hi Yasuaki,
>
> By reading the code, I find memblk is not deleted at memory hot remove.
> I am not sure whether we should remove it. If we remove it, we should also reset
> "arch_zone_lowest_possible_pfn", right? It seems a little complicated.

I think memblk should be added/removed by hot adding/removing memory.
But, arch_zone_lowest_possible_pfn should not be changed.

Thanks,
Yasuaki Ishimatsu

> Thanks,
> Xishi Qiu
>
> >> Thanks,
> >> Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:

> On Mon, 20 Apr 2015 10:45:45 +0800
> Xishi Qiu wrote:
>
>> On 2015/4/20 9:42, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>>
>>>> Your patches will fix your issue.
>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>> not be initialized.
>>>>
>>>> Memory hot add flows are as follows:
>>>>
>>>> add_memory
>>>>   ...
>>>>   -> hotadd_new_pgdat()
>>>>   ...
>>>>   -> node_set_online(nid)
>>>>
>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>> offline because node_set_online() is not called yet. So if applying
>>>> your patches, the pgdat is not initialized in this case.
>>>
>>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
>>> over-kill.
>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu
>>>>
>>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>>> Xishi Qiu wrote:
>>>>
>>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>>>
>>> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
>>> then the chaos occurred when init the zones. And this only happens to the re-hotadd
>>> node, so how about using the saved *node_start_pfn* (via
>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>>
>>> Thanks,
>>> Gu
>>
>> Hi Gu,
>>
>> I first considered this method, but if the hot added node's start and size
>> are different from before, it makes the chaos.
>>
>> e.g.
>> nodeXX (8-16G)
>> remove nodeXX
>> BIOS report cpu first and online it
>> hotadd nodeXX
>> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
>> BIOS report mem(10-12G)
>> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
>> the start is still 8G, not 10G, this is chaos!
>
> If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
> pr_info()'s message.
>
> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>		unsigned long node_start_pfn, unsigned long *zholes_size)
> {
>	...
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
> #endif
> }
>
> Is the memory range of the message "8G - 16G"?
> If so, the reason is that memblk is not deleted at memory hot remove.
>
> Thanks,
> Yasuaki Ishimatsu

Hi Yasuaki,

By reading the code, I find memblk is not deleted at memory hot remove.
I am not sure whether we should remove it. If we remove it, we should also reset
"arch_zone_lowest_possible_pfn", right? It seems a little complicated.

Thanks,
Xishi Qiu

>> Thanks,
>> Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 11:15, Yasuaki Ishimatsu wrote:

> On Mon, 20 Apr 2015 10:59:37 +0800
> Xishi Qiu wrote:
>
>> On 2015/4/20 10:09, Gu Zheng wrote:
>>
>>> Hi Ishimatsu, Xishi,
>>>
>>> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
>>>
>>>>> When hot adding memory and creating new node, the node is offline.
>>>>> And after calling node_set_online(), the node becomes online.
>>>>>
>>>>> Oh, sorry. I misread your patches.
>>>>
>>>> Please ignore it...
>>>
>>> Seems also a misread to me.
>>> I clear it (my worry) here:
>>> If we set the node size to 0 here, it may hide more things than we expected.
>>> All the init chunks around with the size (spanned/present/managed...) will
>>> be nonsense, and the user/caller will not get a summary of the hot added node
>>> because of the changes here.
>>> I am not sure the worry is necessary, please correct me if I am missing
>>> something.
>>>
>>> Regards,
>>> Gu
>>
>> Hi Gu,
>>
>> My patch just sets the size to 0 when hot adding a node (old or new). I know
>> your worry, but I think it is not necessary.
>>
>> When we calculate the size, it uses "arch_zone_lowest_possible_pfn[]" and
>> "memblock", and they are both from boot time. If we hotadd a new node, the
>> calculated size is 0 too. When adding memory, __add_zone() will grow the size
>> and start.
>
> If hot adding new node, you are right. But if hot removing a memory which
> is presented at boot time, memblock of the memory range is not deleted.
> So when hot adding the memory, the calculated size does not become 0.

Yes, so I just set it to 0. init_currently_empty_zone() and memmap_init() will be
called in __add_zone(), and start/size will also grow there.

Thanks,
Xishi Qiu

> Thanks,
> Yasuaki Ishimatsu
>
>> Thanks,
>> Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On Mon, 20 Apr 2015 10:45:45 +0800
Xishi Qiu wrote:

> On 2015/4/20 9:42, Gu Zheng wrote:
>
> > Hi Xishi,
> > On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
> >
> >> Your patches will fix your issue.
> >> But, if BIOS reports memory first at node hot add, pgdat can
> >> not be initialized.
> >>
> >> Memory hot add flows are as follows:
> >>
> >> add_memory
> >>   ...
> >>   -> hotadd_new_pgdat()
> >>   ...
> >>   -> node_set_online(nid)
> >>
> >> When calling hotadd_new_pgdat() for a hot added node, the node is
> >> offline because node_set_online() is not called yet. So if applying
> >> your patches, the pgdat is not initialized in this case.
> >
> > Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> > over-kill.
> >
> >> Thanks,
> >> Yasuaki Ishimatsu
> >>
> >> On Fri, 17 Apr 2015 18:50:32 +0800
> >> Xishi Qiu wrote:
> >>
> >>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
> >>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
> >>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
> >>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
> >
> > As your analysis said the root cause here is passing a *0* as the node_start_pfn,
> > then the chaos occurred when init the zones. And this only happens to the re-hotadd
> > node, so how about using the saved *node_start_pfn* (via
> > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
> > instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
> >
> > Thanks,
> > Gu
>
> Hi Gu,
>
> I first considered this method, but if the hot added node's start and size
> are different from before, it makes the chaos.
>
> e.g.
> nodeXX (8-16G)
> remove nodeXX
> BIOS report cpu first and online it
> hotadd nodeXX
> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
> BIOS report mem(10-12G)
> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
> the start is still 8G, not 10G, this is chaos!

If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
pr_info()'s message.

void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
		unsigned long node_start_pfn, unsigned long *zholes_size)
{
	...
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
#endif
}

Is the memory range of the message "8G - 16G"?
If so, the reason is that memblk is not deleted at memory hot remove.

Thanks,
Yasuaki Ishimatsu

> Thanks,
> Xishi Qiu
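To make the question about the log line concrete, here is a minimal user-space sketch of how that pr_info() format turns a PFN range into a printed physical range. It is not kernel code: format_node_range() is a hypothetical helper, PAGE_SHIFT is assumed to be 12 (4 KiB pages), and the kernel's %Lx is written as the standard C %llx.

```c
#include <stddef.h>
#include <stdio.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: formats a node's [start_pfn, end_pfn) range the way
 * the quoted pr_info() call does (inclusive last byte, 16 hex digits).
 */
static int format_node_range(char *buf, size_t len, int nid,
			     unsigned long start_pfn, unsigned long end_pfn)
{
	return snprintf(buf, len,
			"Initmem setup node %d [mem %#018llx-%#018llx]",
			nid,
			(unsigned long long)start_pfn << PAGE_SHIFT,
			((unsigned long long)end_pfn << PAGE_SHIFT) - 1);
}
```

Under these assumptions, a node spanning 8G-16G has start_pfn 0x200000 and end_pfn 0x400000, so the message would read "[mem 0x0000000200000000-0x00000003ffffffff]".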
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On Mon, 20 Apr 2015 10:59:37 +0800
Xishi Qiu wrote:

> On 2015/4/20 10:09, Gu Zheng wrote:
>
> > Hi Ishimatsu, Xishi,
> >
> > On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
> >
> >>> When hot adding memory and creating new node, the node is offline.
> >>> And after calling node_set_online(), the node becomes online.
> >>>
> >>> Oh, sorry. I misread your patches.
> >>
> >> Please ignore it...
> >
> > Seems also a misread to me.
> > I clear it (my worry) here:
> > If we set the node size to 0 here, it may hide more things than we expected.
> > All the init chunks around with the size (spanned/present/managed...) will
> > be nonsense, and the user/caller will not get a summary of the hot added node
> > because of the changes here.
> > I am not sure the worry is necessary, please correct me if I am missing
> > something.
> >
> > Regards,
> > Gu
>
> Hi Gu,
>
> My patch just sets the size to 0 when hot adding a node (old or new). I know
> your worry, but I think it is not necessary.
>
> When we calculate the size, it uses "arch_zone_lowest_possible_pfn[]" and
> "memblock", and they are both from boot time. If we hotadd a new node, the
> calculated size is 0 too. When adding memory, __add_zone() will grow the size
> and start.

If hot adding new node, you are right. But if hot removing a memory which
is presented at boot time, memblock of the memory range is not deleted.
So when hot adding the memory, the calculated size does not become 0.

Thanks,
Yasuaki Ishimatsu

> Thanks,
> Xishi Qiu
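The point about the stale memblock range can be illustrated with a small user-space model. This is only a sketch: struct region_model, pfn_range_for_nid_model() and hot_remove_model() are hypothetical names, and the lookup merely imitates the "lowest start, highest end over the node's regions" idea behind get_pfn_range_for_nid().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one boot-time memory range recorded for a node. */
struct region_model {
	unsigned long start_pfn;
	unsigned long end_pfn;
	int nid;
	bool present;
};

/* Lowest start and highest end over all recorded regions of nid; 0/0 if none. */
static void pfn_range_for_nid_model(const struct region_model *r, size_t n,
				    int nid, unsigned long *start_pfn,
				    unsigned long *end_pfn)
{
	size_t i;

	*start_pfn = (unsigned long)-1;
	*end_pfn = 0;
	for (i = 0; i < n; i++) {
		if (!r[i].present || r[i].nid != nid)
			continue;
		if (r[i].start_pfn < *start_pfn)
			*start_pfn = r[i].start_pfn;
		if (r[i].end_pfn > *end_pfn)
			*end_pfn = r[i].end_pfn;
	}
	if (*start_pfn == (unsigned long)-1)
		*start_pfn = 0;
}

/* Models hot remove; the record is only dropped when delete_region is true. */
static void hot_remove_model(struct region_model *r, size_t n, int nid,
			     bool delete_region)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (delete_region && r[i].nid == nid)
			r[i].present = false;
}
```

With delete_region set to false (the behaviour described in the reply above), a later lookup for the removed node still returns the old boot-time range, so a size computed from it is nonzero. With delete_region set to true the lookup returns an empty 0/0 range.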
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 9:42, Gu Zheng wrote:

> Hi Xishi,
> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>
>> Your patches will fix your issue.
>> But, if BIOS reports memory first at node hot add, pgdat can
>> not be initialized.
>>
>> Memory hot add flows are as follows:
>>
>> add_memory
>>   ...
>>   -> hotadd_new_pgdat()
>>   ...
>>   -> node_set_online(nid)
>>
>> When calling hotadd_new_pgdat() for a hot added node, the node is
>> offline because node_set_online() is not called yet. So if applying
>> your patches, the pgdat is not initialized in this case.
>
> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> over-kill.
>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> On Fri, 17 Apr 2015 18:50:32 +0800
>> Xishi Qiu wrote:
>>
>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>
> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
> then the chaos occurred when init the zones. And this only happens to the re-hotadd
> node, so how about using the saved *node_start_pfn* (via
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>
> Thanks,
> Gu

Hi Gu,

I first considered this method, but if the hot added node's start and size are
different from before, it makes the chaos.

e.g.
nodeXX (8-16G)
remove nodeXX
BIOS report cpu first and online it
hotadd nodeXX
use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
BIOS report mem(10-12G)
call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
the start is still 8G, not 10G, this is chaos!

Thanks,
Xishi Qiu
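The 8G/10G example can be checked with a small user-space model of a span-growing step. This is a sketch under assumptions, not the kernel's code: struct pgdat_model and grow_span_model() are hypothetical simplifications of the pgdat fields and of what grow_pgdat_span() is described as doing in this thread, and 4 KiB pages are assumed.

```c
/* Assumption: 4 KiB pages, so 1 GiB is 2^(30 - 12) page frames. */
#define PAGE_SHIFT 12
#define GB_TO_PFN(gb) ((unsigned long)(gb) << (30 - PAGE_SHIFT))

/* Hypothetical stand-in for the two pgdat fields discussed in the thread. */
struct pgdat_model {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
};

/* Simplified model: widen the node span so it covers [start_pfn, end_pfn). */
static void grow_span_model(struct pgdat_model *p,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long old_end = p->node_start_pfn + p->node_spanned_pages;

	/* The start only moves if the node is empty or the new range is lower. */
	if (!p->node_spanned_pages || start_pfn < p->node_start_pfn)
		p->node_start_pfn = start_pfn;

	p->node_spanned_pages =
		(old_end > end_pfn ? old_end : end_pfn) - p->node_start_pfn;
}
```

In this model, a pgdat preset to start 8G and size 8G stays at 8G-16G after adding 10-12G, because a grow step never shrinks the span. A pgdat that starts out empty (start 0, size 0) ends up spanning exactly 10-12G.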
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 10:09, Gu Zheng wrote:

> Hi Ishimatsu, Xishi,
>
> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
>
>>> When hot adding memory and creating new node, the node is offline.
>>> And after calling node_set_online(), the node becomes online.
>>>
>>> Oh, sorry. I misread your patches.
>>
>> Please ignore it...
>
> Seems also a misread to me.
> I clear it (my worry) here:
> If we set the node size to 0 here, it may hide more things than we expected.
> All the init chunks around with the size (spanned/present/managed...) will
> be nonsense, and the user/caller will not get a summary of the hot added node
> because of the changes here.
> I am not sure the worry is necessary, please correct me if I am missing
> something.
>
> Regards,
> Gu

Hi Gu,

My patch just sets the size to 0 when hot adding a node (old or new). I know your
worry, but I think it is not necessary.

When we calculate the size, it uses "arch_zone_lowest_possible_pfn[]" and "memblock",
and they are both from boot time. If we hotadd a new node, the calculated size is
0 too. When adding memory, __add_zone() will grow the size and start.

Thanks,
Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
Hi Ishimatsu, Xishi,

On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:

>> When hot adding memory and creating new node, the node is offline.
>> And after calling node_set_online(), the node becomes online.
>>
>> Oh, sorry. I misread your patches.
>
> Please ignore it...

Seems also a misread to me.
I clear it (my worry) here:
If we set the node size to 0 here, it may hide more things than we expected,
and all the init chunks around with the size (spanned/present/managed...) will
be nonsense, and the user/caller will not get a summary of the hot added node
because of the changes here.
I am not sure the worry is necessary, please correct me if I am missing something.

Regards,
Gu

> Thanks,
> Yasuaki Ishimatsu
>
> On
> Yasuaki Ishimatsu wrote:
>
>> When hot adding memory and creating new node, the node is offline.
>> And after calling node_set_online(), the node becomes online.
>>
>> Oh, sorry. I misread your patches.
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> On Mon, 20 Apr 2015 09:33:10 +0800
>> Xishi Qiu wrote:
>>
>>> On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
>>>
>>>> Your patches will fix your issue.
>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>> not be initialized.
>>>>
>>>> Memory hot add flows are as follows:
>>>>
>>>> add_memory
>>>>   ...
>>>>   -> hotadd_new_pgdat()
>>>>   ...
>>>>   -> node_set_online(nid)
>>>>
>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>> offline because node_set_online() is not called yet. So if applying
>>>> your patches, the pgdat is not initialized in this case.
>>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu
>>>
>>> Hi Yasuaki,
>>>
>>> I don't quite understand, when BIOS reports memory first, why pgdat
>>> can not be initialized?
>>> When hot adding a new node, hotadd_new_pgdat() will be called too, and
>>> when hot adding memory to an existing node, there is no need to call
>>> hotadd_new_pgdat(),
>>> right?
>>>
>>> Thanks,
>>> Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On Mon, 20 Apr 2015 09:33:10 +0800
Xishi Qiu wrote:

> On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
>
> > Your patches will fix your issue.
> > But, if BIOS reports memory first at node hot add, pgdat can
> > not be initialized.
> >
> > Memory hot add flows are as follows:
> >
> > add_memory
> >   ...
> >   -> hotadd_new_pgdat()
> >   ...
> >   -> node_set_online(nid)
> >
> > When calling hotadd_new_pgdat() for a hot added node, the node is
> > offline because node_set_online() is not called yet. So if applying
> > your patches, the pgdat is not initialized in this case.
> >
> > Thanks,
> > Yasuaki Ishimatsu
>
> Hi Yasuaki,
>
> I'm not quite understand, when BIOS reports memory first, why pgdat
> can not be initialized?
> When hotadd a new node, hotadd_new_pgdat() will be called too, and
> when hotadd memory to a existent node, it's no need to call
> hotadd_new_pgdat(),
> right?

Your patch skips initialization of pgdat when the node is offline.
But when hot adding a new node and calling hotadd_new_pgdat(), the node
is still offline. So pgdat is not initialized.

Thanks,
Yasuaki Ishimatsu

> Thanks,
> Xishi Qiu
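The ordering concern raised in this message can be sketched with a small userspace model. This is only an illustration under stated assumptions: every `toy_*` name is hypothetical, the `skip_if_offline` flag stands in for the behaviour the message attributes to the patch, and nothing here is taken from the patch itself. It shows why an "is the node online?" check inside the pgdat setup step would always fail for a freshly hot-added node, since the node is marked online only afterwards.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4

/* Toy stand-ins for kernel node state; not the real data structures. */
static bool toy_node_online[TOY_MAX_NODES];
static bool toy_pgdat_initialized[TOY_MAX_NODES];

/* Models a pgdat setup step that (hypothetically) skips offline nodes. */
static void toy_hotadd_new_pgdat(int nid, bool skip_if_offline)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	if (skip_if_offline && !toy_node_online[nid])
		return;
	toy_pgdat_initialized[nid] = true;
}

/*
 * Models the ordering quoted in the thread: pgdat setup runs first,
 * the node is set online only afterwards. Returns whether the pgdat
 * ended up initialized.
 */
static bool toy_add_memory(int nid, bool skip_if_offline)
{
	toy_node_online[nid] = false;
	toy_pgdat_initialized[nid] = false;

	toy_hotadd_new_pgdat(nid, skip_if_offline);
	toy_node_online[nid] = true;	/* node_set_online(nid) equivalent */

	return toy_pgdat_initialized[nid];
}
```

With `skip_if_offline` set, `toy_add_memory()` reports an uninitialized pgdat; without it, the pgdat is initialized before the node goes online.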
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
> When hot adding memory and creating new node, the node is offline.
> And after calling node_set_online(), the node becomes online.
>
> Oh, sorry. I misread your patches.

Please ignore it...

Thanks,
Yasuaki Ishimatsu

On Yasuaki Ishimatsu wrote:

> When hot adding memory and creating new node, the node is offline.
> And after calling node_set_online(), the node becomes online.
>
> Oh, sorry. I misread your patches.
>
> Thanks,
> Yasuaki Ishimatsu
>
> On Mon, 20 Apr 2015 09:33:10 +0800
> Xishi Qiu wrote:
>
> > On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
> >
> > > Your patches will fix your issue.
> > > But, if BIOS reports memory first at node hot add, pgdat can
> > > not be initialized.
> > >
> > > Memory hot add flows are as follows:
> > >
> > > add_memory
> > >   ...
> > >   -> hotadd_new_pgdat()
> > >   ...
> > >   -> node_set_online(nid)
> > >
> > > When calling hotadd_new_pgdat() for a hot added node, the node is
> > > offline because node_set_online() is not called yet. So if applying
> > > your patches, the pgdat is not initialized in this case.
> > >
> > > Thanks,
> > > Yasuaki Ishimatsu
> >
> > Hi Yasuaki,
> >
> > I'm not quite understand, when BIOS reports memory first, why pgdat
> > can not be initialized?
> > When hotadd a new node, hotadd_new_pgdat() will be called too, and
> > when hotadd memory to a existent node, it's no need to call
> > hotadd_new_pgdat(),
> > right?
> >
> > Thanks,
> > Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
Hi Xishi, On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote: > > Your patches will fix your issue. > But, if BIOS reports memory first at node hot add, pgdat can > not be initialized. > > Memory hot add flows are as follows: > > add_memory > ... > -> hotadd_new_pgdat() > ... > -> node_set_online(nid) > > When calling hotadd_new_pgdat() for a hot added node, the node is > offline because node_set_online() is not called yet. So if applying > your patches, the pgdat is not initialized in this case. Ishimtasu's worry is reasonable. And I am afraid the fix here is a bit over-kill. > > Thanks, > Yasuaki Ishimatsu > > On Fri, 17 Apr 2015 18:50:32 +0800 > Xishi Qiu wrote: > >> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will >> call >> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX >> exists at boot time, so pgdat->node_spanned_pages is the same as original. >> Then >> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero >> size. As your analysis said the root cause here is passing a *0* as the node_start_pfn, then the chaos occurred when init the zones. And this only happens to the re-hotadd node, so how about using the saved *node_start_pfn* (via get_pfn_range_for_nid(nid, _pfn, _pfn)) instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"? Thanks, Gu >> >> free_area_init_core() >> memmap_init() >> memmap_init_zone() >> early_pfn_in_nid() >> set_page_links() >> >> "if (!early_pfn_in_nid(pfn, nid))" will skip the pfn(memory in section), but >> it >> will not skip the pfn(hole in section), this will cover and relink the page >> to >> zone/nid, so page_zone() from memory and hole in the same section are >> different. >> The following call trace shows the bug. >> >> This patch will set the node size to 0 when hotadd a new node(original or >> new). >> init_currently_empty_zone() and memmap_init() will be called in add_zone(), >> so >> need not to change it. 
>> >> [90476.077469] kernel BUG at mm/page_alloc.c:1042! // move_freepages() -> >> BUG_ON(page_zone(start_page) != page_zone(end_page)); >> [90476.077469] invalid opcode: [#1] SMP >> [90476.077469] Modules linked in: iptable_nat nf_conntrack_ipv4 >> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack fuse btrfs zlib_deflate >> raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc bridge stp llc >> ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables >> cfg80211 rfkill sg iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp >> intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel >> ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd >> pcspkr igb vfat i2c_algo_bit dca fat sb_edac edac_core i2c_i801 lpc_ich >> i2c_core mfd_core shpchp acpi_pad ipmi_si ipmi_msghandler uinput nfsd >> auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif >> crct10dif_common ahci libahci megaraid_sas tg3 ptp libata pps_core dm_mirror >> dm_region_hash dm_log dm_mod [last unloaded: rasf] >> [90476.157382] CPU: 2 PID: 322803 Comm: updatedb Tainted: GF W >> O-- 3.10.0-229.1.2.5.hulk.rc14.x86_64 #1 >> [90476.157382] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. 
Huawei N1/Huawei >> N1, BIOS V100R001 04/13/2015 >> [90476.157382] task: 88006a6d5b00 ti: 880068eb8000 task.ti: >> 880068eb8000 >> [90476.157382] RIP: 0010:[] [] >> move_freepages+0x12f/0x140 >> [90476.157382] RSP: 0018:880068ebb640 EFLAGS: 00010002 >> [90476.157382] RAX: 880002316cc0 RBX: ea0001bd RCX: >> 0001 >> [90476.157382] RDX: 880002476e40 RSI: RDI: >> 880002316cc0 >> [90476.157382] RBP: 880068ebb690 R08: 0010 R09: >> ea0001bd7fc0 >> [90476.157382] R10: 0006f5ff R11: R12: >> 0001 >> [90476.157382] R13: 0003 R14: 880002316eb8 R15: >> ea0001bd7fc0 >> [90476.157382] FS: 7f4d3ab95740() GS:880033a0() >> knlGS: >> [90476.157382] CS: 0010 DS: ES: CR0: 80050033 >> [90476.157382] CR2: 7f4d3ae1a808 CR3: 00018907a000 CR4: >> 001407e0 >> [90476.157382] DR0: DR1: DR2: >> >> [90476.157382] DR3: DR6: fffe0ff0 DR7: >> 0400 >> [90476.157382] Stack: >> [90476.157382] 880068ebb698 880002316cc0 a800b5378098 >> 880068ebb698 >> [90476.157382] 810b11dc 880002316cc0 0001 >> 0003 >> [90476.157382] 880002316eb8 ea0001bd6420 880068ebb6a0 >> 8115a003 >> [90476.157382] Call Trace: >> [90476.157382] [] ? update_curr+0xcc/0x150 >> [90476.157382] [] move_freepages_block+0x73/0x80 >> [90476.157382] [] __rmqueue+0x26a/0x460 >> [90476.157382] [] ? native_sched_clock+0x13/0x80 >>
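The failing check named in the oops, `BUG_ON(page_zone(start_page) != page_zone(end_page))`, can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: `toy_page`, `toy_relink()` and the block size are hypothetical, and the model only shows how relinking part of a block to another zone makes the first and last page disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGEBLOCK_PAGES 8

/* Toy page descriptor: only the zone link matters for this sketch. */
struct toy_page {
	int zone_id;
};

static int toy_page_zone(const struct toy_page *page)
{
	return page->zone_id;
}

/*
 * True when the first and last page of the block agree on their zone.
 * The oops in the thread fires when the equivalent kernel check fails.
 */
static bool toy_block_zone_consistent(const struct toy_page *block,
				      size_t nr_pages)
{
	assert(nr_pages > 0);
	return toy_page_zone(&block[0]) == toy_page_zone(&block[nr_pages - 1]);
}

/* Relink pages [from, to) to zone_id, as a wrongly ranged init would. */
static void toy_relink(struct toy_page *block, size_t from, size_t to,
		       int zone_id)
{
	for (size_t i = from; i < to; i++)
		block[i].zone_id = zone_id;
}
```

Relinking only the tail of a block (for example the part that is a hole in the section) leaves the block inconsistent, which is the state the patch description attributes to the wrong start passed to `memmap_init()`.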
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:

> Your patches will fix your issue.
> But, if BIOS reports memory first at node hot add, pgdat can
> not be initialized.
>
> Memory hot add flows are as follows:
>
> add_memory
>   ...
>   -> hotadd_new_pgdat()
>   ...
>   -> node_set_online(nid)
>
> When calling hotadd_new_pgdat() for a hot added node, the node is
> offline because node_set_online() is not called yet. So if applying
> your patches, the pgdat is not initialized in this case.
>
> Thanks,
> Yasuaki Ishimatsu

Hi Yasuaki,

I don't quite understand: when the BIOS reports memory first, why can
pgdat not be initialized?
When hot adding a new node, hotadd_new_pgdat() will be called too, and
when hot adding memory to an existing node, there is no need to call
hotadd_new_pgdat(), right?

Thanks,
Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
When hot adding memory and creating a new node, the node is offline.
And after calling node_set_online(), the node becomes online.

Oh, sorry. I misread your patches.

Thanks,
Yasuaki Ishimatsu

On Mon, 20 Apr 2015 09:33:10 +0800
Xishi Qiu wrote:

> On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
>
> > Your patches will fix your issue.
> > But, if BIOS reports memory first at node hot add, pgdat can
> > not be initialized.
> >
> > Memory hot add flows are as follows:
> >
> > add_memory
> >   ...
> >   -> hotadd_new_pgdat()
> >   ...
> >   -> node_set_online(nid)
> >
> > When calling hotadd_new_pgdat() for a hot added node, the node is
> > offline because node_set_online() is not called yet. So if applying
> > your patches, the pgdat is not initialized in this case.
> >
> > Thanks,
> > Yasuaki Ishimatsu
>
> Hi Yasuaki,
>
> I'm not quite understand, when BIOS reports memory first, why pgdat
> can not be initialized?
> When hotadd a new node, hotadd_new_pgdat() will be called too, and
> when hotadd memory to a existent node, it's no need to call
> hotadd_new_pgdat(),
> right?
>
> Thanks,
> Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 10:09, Gu Zheng wrote:

> Hi Ishimatsu, Xishi,
>
> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
>
>>> When hot adding memory and creating new node, the node is offline.
>>> And after calling node_set_online(), the node becomes online.
>>>
>>> Oh, sorry. I misread your patches.
>>
>> Please ignore it...
>
> Seems also a misread to me.
> I clear it (my worry) here:
> If we set the node size to 0 here, it may hidden more things than we expected.
> All the init chunks around with the size (spanned/present/managed...) will
> be non-sense, and the user/caller will not get a summary of the hot added node
> because of the changes here.
> I am not sure the worry is necessary, please correct me if I missing something.
>
> Regards,
> Gu

Hi Gu,

My patch just sets the size to 0 when hot adding a node (old or new).
I understand your worry, but I think it is not necessary.

When we calculate the size, it uses arch_zone_lowest_possible_pfn[] and
memblock, and both come from boot time. If we hot add a new node, the
calculated size is 0 too. When memory is added, __add_zone() will grow
the size and start.

Thanks,
Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 9:42, Gu Zheng wrote:

> Hi Xishi,
>
> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>
>> Your patches will fix your issue.
>> But, if BIOS reports memory first at node hot add, pgdat can
>> not be initialized.
>>
>> Memory hot add flows are as follows:
>>
>> add_memory
>>   ...
>>   -> hotadd_new_pgdat()
>>   ...
>>   -> node_set_online(nid)
>>
>> When calling hotadd_new_pgdat() for a hot added node, the node is
>> offline because node_set_online() is not called yet. So if applying
>> your patches, the pgdat is not initialized in this case.
>
> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> over-kill.
>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> On Fri, 17 Apr 2015 18:50:32 +0800
>> Xishi Qiu <qiuxi...@huawei.com> wrote:
>>
>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>
> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
> then the chaos occurred when init the zones. And this only happens to the re-hotadd
> node, so how about using the saved *node_start_pfn* (via
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn)) instead if we find
> "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>
> Thanks,
> Gu

Hi Gu,

I first considered this method, but if the hot added node's start and
size are different from before, it causes chaos.

e.g.
nodeXX (8-16G)
remove nodeXX
BIOS reports cpu first and onlines it
hotadd nodeXX uses the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
BIOS reports mem (10-12G)
call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
the start is still 8G, not 10G, this is chaos!

Thanks,
Xishi Qiu
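The 8G/10G example above can be checked with a small userspace model of the span-growing step. This is a sketch only: `toy_pgdat`, `toy_grow_pgdat_span()` and `TOY_GB_TO_PFN()` are hypothetical names, 4 KiB pages are assumed, and the growth rule (move the start only when the span is empty or the new start is lower) reflects my reading of how grow_pgdat_span() behaves rather than a copy of the kernel source.

```c
#include <assert.h>

/* Assumes 4 KiB pages: 1 GiB is 2^18 page frames. */
#define TOY_GB_TO_PFN(gb) ((unsigned long)(gb) << 18)

/* Toy subset of the node descriptor fields discussed in the thread. */
struct toy_pgdat {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
};

static unsigned long toy_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Grow the node span so that it covers [start_pfn, end_pfn).
 * The start moves only if the span is empty or the new range
 * begins below the current start.
 */
static void toy_grow_pgdat_span(struct toy_pgdat *pgdat,
				unsigned long start_pfn,
				unsigned long end_pfn)
{
	unsigned long old_end = pgdat->node_start_pfn +
				pgdat->node_spanned_pages;

	assert(start_pfn < end_pfn);

	if (!pgdat->node_spanned_pages || start_pfn < pgdat->node_start_pfn)
		pgdat->node_start_pfn = start_pfn;

	pgdat->node_spanned_pages = toy_max(old_end, end_pfn) -
				    pgdat->node_start_pfn;
}
```

Starting from the stale 8G-16G values, adding 10G-12G leaves the start at 8G; starting from an empty span (size 0), the same call yields a node that starts at 10G and spans 2G, which is the outcome the size-0 approach relies on.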
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 11:15, Yasuaki Ishimatsu wrote:

> On Mon, 20 Apr 2015 10:59:37 +0800
> Xishi Qiu <qiuxi...@huawei.com> wrote:
>
>> On 2015/4/20 10:09, Gu Zheng wrote:
>>
>>> Hi Ishimatsu, Xishi,
>>>
>>> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
>>>
>>>>> When hot adding memory and creating new node, the node is offline.
>>>>> And after calling node_set_online(), the node becomes online.
>>>>>
>>>>> Oh, sorry. I misread your patches.
>>>>
>>>> Please ignore it...
>>>
>>> Seems also a misread to me.
>>> I clear it (my worry) here:
>>> If we set the node size to 0 here, it may hidden more things than we expected.
>>> All the init chunks around with the size (spanned/present/managed...) will
>>> be non-sense, and the user/caller will not get a summary of the hot added node
>>> because of the changes here.
>>> I am not sure the worry is necessary, please correct me if I missing something.
>>>
>>> Regards,
>>> Gu
>>
>> Hi Gu,
>>
>> My patch is just set size to 0 when hotadd a node(old or new). I know your worry,
>> but I think it is not necessary.
>>
>> When we calculate the size, it uses arch_zone_lowest_possible_pfn[] and memblock,
>> and they are both from boot time. If we hotadd a new node, the calculated size is
>> 0 too. When add memory, __add_zone() will grow the size and start.
>
> If hot adding new node, you are right. But if hot removing a memory which is
> presented at boot time, memblock of the memory range is not deleted. So when
> hot adding the memory, the calculated size does not become 0.

Yes, so I just set it to 0. init_currently_empty_zone() and memmap_init()
will be called in __add_zone(), and start/size will also grow there.

Thanks,
Xishi Qiu

> Thanks,
> Yasuaki Ishimatsu
>
>> Thanks,
>> Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
Hi Xishi, On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote: Your patches will fix your issue. But, if BIOS reports memory first at node hot add, pgdat can not be initialized. Memory hot add flows are as follows: add_memory ... - hotadd_new_pgdat() ... - node_set_online(nid) When calling hotadd_new_pgdat() for a hot added node, the node is offline because node_set_online() is not called yet. So if applying your patches, the pgdat is not initialized in this case. Ishimtasu's worry is reasonable. And I am afraid the fix here is a bit over-kill. Thanks, Yasuaki Ishimatsu On Fri, 17 Apr 2015 18:50:32 +0800 Xishi Qiu qiuxi...@huawei.com wrote: Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call hotadd_new_pgdat(nid, 0), this will set pgdat-node_start_pfn to 0. As nodeXX exists at boot time, so pgdat-node_spanned_pages is the same as original. Then free_area_init_core()-memmap_init() will pass a wrong start and a nonzero size. As your analysis said the root cause here is passing a *0* as the node_start_pfn, then the chaos occurred when init the zones. And this only happens to the re-hotadd node, so how about using the saved *node_start_pfn* (via get_pfn_range_for_nid(nid, start_pfn, end_pfn)) instead if we find pgdat-node_start_pfn == 0 !node_online(XXX)? Thanks, Gu free_area_init_core() memmap_init() memmap_init_zone() early_pfn_in_nid() set_page_links() if (!early_pfn_in_nid(pfn, nid)) will skip the pfn(memory in section), but it will not skip the pfn(hole in section), this will cover and relink the page to zone/nid, so page_zone() from memory and hole in the same section are different. The following call trace shows the bug. This patch will set the node size to 0 when hotadd a new node(original or new). init_currently_empty_zone() and memmap_init() will be called in add_zone(), so need not to change it. [90476.077469] kernel BUG at mm/page_alloc.c:1042! 
// move_freepages() - BUG_ON(page_zone(start_page) != page_zone(end_page)); [90476.077469] invalid opcode: [#1] SMP [90476.077469] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack fuse btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables cfg80211 rfkill sg iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr igb vfat i2c_algo_bit dca fat sb_edac edac_core i2c_i801 lpc_ich i2c_core mfd_core shpchp acpi_pad ipmi_si ipmi_msghandler uinput nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif crct10dif_common ahci libahci megaraid_sas tg3 ptp libata pps_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: rasf] [90476.157382] CPU: 2 PID: 322803 Comm: updatedb Tainted: GF W O-- 3.10.0-229.1.2.5.hulk.rc14.x86_64 #1 [90476.157382] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. 
Huawei N1/Huawei N1, BIOS V100R001 04/13/2015 [90476.157382] task: 88006a6d5b00 ti: 880068eb8000 task.ti: 880068eb8000 [90476.157382] RIP: 0010:[81159f7f] [81159f7f] move_freepages+0x12f/0x140 [90476.157382] RSP: 0018:880068ebb640 EFLAGS: 00010002 [90476.157382] RAX: 880002316cc0 RBX: ea0001bd RCX: 0001 [90476.157382] RDX: 880002476e40 RSI: RDI: 880002316cc0 [90476.157382] RBP: 880068ebb690 R08: 0010 R09: ea0001bd7fc0 [90476.157382] R10: 0006f5ff R11: R12: 0001 [90476.157382] R13: 0003 R14: 880002316eb8 R15: ea0001bd7fc0 [90476.157382] FS: 7f4d3ab95740() GS:880033a0() knlGS: [90476.157382] CS: 0010 DS: ES: CR0: 80050033 [90476.157382] CR2: 7f4d3ae1a808 CR3: 00018907a000 CR4: 001407e0 [90476.157382] DR0: DR1: DR2: [90476.157382] DR3: DR6: fffe0ff0 DR7: 0400 [90476.157382] Stack: [90476.157382] 880068ebb698 880002316cc0 a800b5378098 880068ebb698 [90476.157382] 810b11dc 880002316cc0 0001 0003 [90476.157382] 880002316eb8 ea0001bd6420 880068ebb6a0 8115a003 [90476.157382] Call Trace: [90476.157382] [810b11dc] ? update_curr+0xcc/0x150 [90476.157382] [8115a003] move_freepages_block+0x73/0x80 [90476.157382] [8115b9ba] __rmqueue+0x26a/0x460 [90476.157382] [8101ba53] ? native_sched_clock+0x13/0x80 [90476.157382] [8115e172] get_page_from_freelist+0x7f2/0xd30 [90476.157382]
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On Mon, 20 Apr 2015 10:59:37 +0800
Xishi Qiu <qiuxi...@huawei.com> wrote:

> On 2015/4/20 10:09, Gu Zheng wrote:
>
> > Hi Ishimatsu, Xishi,
> >
> > On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
> >
> > > > When hot adding memory and creating new node, the node is offline.
> > > > And after calling node_set_online(), the node becomes online.
> > > >
> > > > Oh, sorry. I misread your patches.
> > >
> > > Please ignore it...
> >
> > Seems also a misread to me.
> > I clear it (my worry) here:
> > If we set the node size to 0 here, it may hidden more things than we expected.
> > All the init chunks around with the size (spanned/present/managed...) will
> > be non-sense, and the user/caller will not get a summary of the hot added node
> > because of the changes here.
> > I am not sure the worry is necessary, please correct me if I missing something.
> >
> > Regards,
> > Gu
>
> Hi Gu,
>
> My patch is just set size to 0 when hotadd a node(old or new). I know your worry,
> but I think it is not necessary.
>
> When we calculate the size, it uses arch_zone_lowest_possible_pfn[] and memblock,
> and they are both from boot time. If we hotadd a new node, the calculated size is
> 0 too. When add memory, __add_zone() will grow the size and start.

If a new node is hot added, you are right. But when memory that was
present at boot time is hot removed, the memblock of that memory range is
not deleted. So when that memory is hot added again, the calculated size
does not become 0.

Thanks,
Yasuaki Ishimatsu

> Thanks,
> Xishi Qiu
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On Mon, 20 Apr 2015 10:45:45 +0800
Xishi Qiu qiuxi...@huawei.com wrote:

> On 2015/4/20 9:42, Gu Zheng wrote:
>
>> Hi Xishi,
>>
>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>
>>> Your patches will fix your issue.
>>> But, if BIOS reports memory first at node hot add, pgdat can
>>> not be initialized.
>>>
>>> Memory hot add flows are as follows:
>>>
>>> add_memory
>>>   ...
>>>   -> hotadd_new_pgdat()
>>>   ...
>>>   -> node_set_online(nid)
>>>
>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>> offline because node_set_online() is not called yet. So if applying
>>> your patches, the pgdat is not initialized in this case.
>>
>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
>> over-kill.
>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>> Xishi Qiu qiuxi...@huawei.com wrote:
>>>
>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it
>>>> will call hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn
>>>> to 0. As nodeXX exists at boot time, so pgdat->node_spanned_pages is the
>>>> same as original. Then free_area_init_core()->memmap_init() will pass a
>>>> wrong start and a nonzero size.
>>
>> As your analysis said the root cause here is passing a *0* as the
>> node_start_pfn, then the chaos occurred when init the zones. And this only
>> happens to the re-hotadd node, so how about using the saved
>> *node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>
>> Thanks,
>> Gu
>
> Hi Gu,
>
> I first considered this method, but if the hot added node's start and size
> are different from before, it makes the chaos.
>
> e.g.
> nodeXX (8-16G)
> remove nodeXX
> BIOS report cpu first and online it
> hotadd nodeXX use the original value, so pgdat->node_start_pfn is set to 8G,
> and size is 8G
> BIOS report mem(10-12G)
> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
> the start is still 8G, not 10G, this is chaos!

If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
pr_info()'s message.

void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
		unsigned long node_start_pfn, unsigned long *zholes_size)
{
...
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
#endif
}

Is the memory range of the message "8G - 16G"?
If so, the reason is that memblk is not deleted at memory hot remove.

Thanks,
Yasuaki Ishimatsu

> Thanks,
> Xishi Qiu
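For reference, the text that this pr_info() would produce for an 8G-16G node can be reproduced in user space. The sketch below is not kernel code: `format_initmem_line()` is a hypothetical helper, 4 KiB pages (PAGE_SHIFT of 12) are assumed, and the kernel's `%#018Lx` is written as `%#018llx` for a user-space printf.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: builds the same text as the quoted pr_info(),
 * converting a [start_pfn, end_pfn) range into an inclusive byte range.
 */
int format_initmem_line(char *buf, size_t len, int nid,
                        uint64_t start_pfn, uint64_t end_pfn)
{
    return snprintf(buf, len,
                    "Initmem setup node %d [mem %#018llx-%#018llx]",
                    nid,
                    (unsigned long long)(start_pfn << PAGE_SHIFT),
                    (unsigned long long)((end_pfn << PAGE_SHIFT) - 1));
}
```

With 4 KiB pages, 8G corresponds to pfn 0x200000 and 16G to pfn 0x400000, so a node still described as 8G-16G would show up as [mem 0x0000000200000000-0x00000003ffffffff].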
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:

> On Mon, 20 Apr 2015 10:45:45 +0800
> Xishi Qiu qiuxi...@huawei.com wrote:
>
>> On 2015/4/20 9:42, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>>
>>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>>
>>>> Your patches will fix your issue.
>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>> not be initialized.
>>>>
>>>> Memory hot add flows are as follows:
>>>>
>>>> add_memory
>>>>   ...
>>>>   -> hotadd_new_pgdat()
>>>>   ...
>>>>   -> node_set_online(nid)
>>>>
>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>> offline because node_set_online() is not called yet. So if applying
>>>> your patches, the pgdat is not initialized in this case.
>>>
>>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
>>> over-kill.
>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu
>>>>
>>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>>> Xishi Qiu qiuxi...@huawei.com wrote:
>>>>
>>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it
>>>>> will call hotadd_new_pgdat(nid, 0), this will set
>>>>> pgdat->node_start_pfn to 0. As nodeXX exists at boot time, so
>>>>> pgdat->node_spanned_pages is the same as original. Then
>>>>> free_area_init_core()->memmap_init() will pass a wrong start and a
>>>>> nonzero size.
>>>
>>> As your analysis said the root cause here is passing a *0* as the
>>> node_start_pfn, then the chaos occurred when init the zones. And this
>>> only happens to the re-hotadd node, so how about using the saved
>>> *node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>>
>>> Thanks,
>>> Gu
>>
>> Hi Gu,
>>
>> I first considered this method, but if the hot added node's start and size
>> are different from before, it makes the chaos.
>>
>> e.g.
>> nodeXX (8-16G)
>> remove nodeXX
>> BIOS report cpu first and online it
>> hotadd nodeXX use the original value, so pgdat->node_start_pfn is set to
>> 8G, and size is 8G
>> BIOS report mem(10-12G)
>> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
>> the start is still 8G, not 10G, this is chaos!
>
> If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
> pr_info()'s message.
>
> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> 		unsigned long node_start_pfn, unsigned long *zholes_size)
> {
> ...
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> 	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> 	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
> 		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
> #endif
> }
>
> Is the memory range of the message "8G - 16G"?
> If so, the reason is that memblk is not deleted at memory hot remove.
>
> Thanks,
> Yasuaki Ishimatsu

Hi Yasuaki,

By reading the code, I find memblk is not deleted at memory hot remove.
I am not sure whether we should remove it. If remove it, we should also reset
"arch_zone_lowest_possible_pfn", right? It seems a little complicated.

Thanks,
Xishi Qiu

>> Thanks,
>> Xishi Qiu
>
> .
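The 8-16G versus 10-12G example quoted in this thread can be replayed with a simplified model of how a node span grows. This is a sketch, not the kernel's grow_pgdat_span(): `struct node_span` and `grow_span()` are hypothetical names, the logic is assumed to be "take the new start only if the span is empty or the new start is lower", and 4 KiB pages are assumed for the GiB-to-pfn conversion.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define GIB_TO_PFN(g) ((uint64_t)(g) << (30 - PAGE_SHIFT))

/* Hypothetical stand-in for the two pgdat fields discussed in the thread. */
struct node_span {
    uint64_t start_pfn;      /* models pgdat->node_start_pfn */
    uint64_t spanned_pages;  /* models pgdat->node_spanned_pages */
};

/* Grow the span so that it covers [start_pfn, end_pfn). */
void grow_span(struct node_span *n, uint64_t start_pfn, uint64_t end_pfn)
{
    uint64_t old_end = n->start_pfn + n->spanned_pages;

    /* The start only moves if the span is empty or the new start is lower. */
    if (n->spanned_pages == 0 || start_pfn < n->start_pfn)
        n->start_pfn = start_pfn;

    n->spanned_pages = (old_end > end_pfn ? old_end : end_pfn) - n->start_pfn;
}
```

Starting from a restored span of 8G plus 8G and growing it by 10-12G leaves the start at 8G, which is the case described as chaos. Starting from an empty span (size 0) gives a start of 10G and a size of 2G.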
Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
Your patches will fix your issue.
But, if BIOS reports memory first at node hot add, pgdat can
not be initialized.

Memory hot add flows are as follows:

add_memory
  ...
  -> hotadd_new_pgdat()
  ...
  -> node_set_online(nid)

When calling hotadd_new_pgdat() for a hot added node, the node is
offline because node_set_online() is not called yet. So if applying
your patches, the pgdat is not initialized in this case.

Thanks,
Yasuaki Ishimatsu

On Fri, 17 Apr 2015 18:50:32 +0800
Xishi Qiu wrote:

> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>
> free_area_init_core()
>   memmap_init()
>     memmap_init_zone()
>       early_pfn_in_nid()
>       set_page_links()
>
> "if (!early_pfn_in_nid(pfn, nid))" will skip the pfn(memory in section), but it
> will not skip the pfn(hole in section), this will cover and relink the page to
> zone/nid, so page_zone() from memory and hole in the same section are different.
> The following call trace shows the bug.
>
> This patch will set the node size to 0 when hotadd a new node(original or new).
> init_currently_empty_zone() and memmap_init() will be called in add_zone(), so
> need not to change it.
>
> [90476.077469] kernel BUG at mm/page_alloc.c:1042! // move_freepages() -> BUG_ON(page_zone(start_page) != page_zone(end_page));
> [90476.077469] invalid opcode: [#1] SMP
> [90476.077469] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack fuse btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables cfg80211 rfkill sg iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr igb vfat i2c_algo_bit dca fat sb_edac edac_core i2c_i801 lpc_ich i2c_core mfd_core shpchp acpi_pad ipmi_si ipmi_msghandler uinput nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif crct10dif_common ahci libahci megaraid_sas tg3 ptp libata pps_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: rasf]
> [90476.157382] CPU: 2 PID: 322803 Comm: updatedb Tainted: GF W O-- 3.10.0-229.1.2.5.hulk.rc14.x86_64 #1
> [90476.157382] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 04/13/2015
> [90476.157382] task: 88006a6d5b00 ti: 880068eb8000 task.ti: 880068eb8000
> [90476.157382] RIP: 0010:[81159f7f] [81159f7f] move_freepages+0x12f/0x140
> [90476.157382] RSP: 0018:880068ebb640 EFLAGS: 00010002
> [90476.157382] RAX: 880002316cc0 RBX: ea0001bd RCX: 0001
> [90476.157382] RDX: 880002476e40 RSI: RDI: 880002316cc0
> [90476.157382] RBP: 880068ebb690 R08: 0010 R09: ea0001bd7fc0
> [90476.157382] R10: 0006f5ff R11: R12: 0001
> [90476.157382] R13: 0003 R14: 880002316eb8 R15: ea0001bd7fc0
> [90476.157382] FS: 7f4d3ab95740() GS:880033a0() knlGS:
> [90476.157382] CS: 0010 DS: ES: CR0: 80050033
> [90476.157382] CR2: 7f4d3ae1a808 CR3: 00018907a000 CR4: 001407e0
> [90476.157382] DR0: DR1: DR2:
> [90476.157382] DR3: DR6: fffe0ff0 DR7: 0400
> [90476.157382] Stack:
> [90476.157382]  880068ebb698 880002316cc0 a800b5378098 880068ebb698
> [90476.157382]  810b11dc 880002316cc0 0001 0003
> [90476.157382]  880002316eb8 ea0001bd6420 880068ebb6a0 8115a003
> [90476.157382] Call Trace:
> [90476.157382]  [810b11dc] ? update_curr+0xcc/0x150
> [90476.157382]  [8115a003] move_freepages_block+0x73/0x80
> [90476.157382]  [8115b9ba] __rmqueue+0x26a/0x460
> [90476.157382]  [8101ba53] ? native_sched_clock+0x13/0x80
> [90476.157382]  [8115e172] get_page_from_freelist+0x7f2/0xd30
> [90476.157382]  [81012639] ? __switch_to+0x179/0x4a0
> [90476.157382]  [a01fc0d7] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
> [90476.157382]  [8115e871] __alloc_pages_nodemask+0x1c1/0xc90
> [90476.157382]  [a01ab24c] ? _xfs_buf_ioapply+0x31c/0x420 [xfs]
> [90476.157382]  [8109cb0d] ? down_trylock+0x2d/0x40
> [90476.157382]  [a01abfff] ? xfs_buf_trylock+0x1f/0x80 [xfs]
> [90476.157382]  [8119d229] alloc_pages_current+0xa9/0x170
> [90476.157382]  [811a7225] new_slab+0x275/0x300
> [90476.157382]  [] __slab_alloc+0x315/0x48f
> [90476.157382]  [] ? kmem_zone_alloc+0x77/0x100 [xfs]
> [90476.157382]  [] ? xfs_bmap_search_extents+0x5c/0xc0 [xfs]
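The relinking described in the patch text can be sketched with a toy page array. Everything here is a hypothetical model rather than kernel code: `struct page_model`, `pfn_in_nid()`, `init_range()` and `block_is_consistent()` are made-up names, a node id stands in for the zone, and the only behaviour taken from the description is that pfns belonging to another node's memory are skipped while holes are not.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_NODE (-1)

/* Toy page descriptor; node ids stand in for zone links. */
struct page_model {
    int memblock_nid;  /* node whose memory covers this pfn, NO_NODE for a hole */
    int linked_nid;    /* node the page is currently linked to */
};

/* Only memory that belongs to a different node is rejected; holes pass. */
bool pfn_in_nid(const struct page_model *p, int nid)
{
    return !(p->memblock_nid >= 0 && p->memblock_nid != nid);
}

/* Models the init loop: link every non-skipped pfn in the range to nid. */
void init_range(struct page_model *pages, size_t start_pfn, size_t size, int nid)
{
    for (size_t pfn = start_pfn; pfn < start_pfn + size; pfn++) {
        if (!pfn_in_nid(&pages[pfn], nid))
            continue;               /* other node's memory: left alone */
        pages[pfn].linked_nid = nid; /* holes end up relinked here */
    }
}

/* Models the BUG_ON condition: first and last page must agree. */
bool block_is_consistent(const struct page_model *pages, size_t first, size_t last)
{
    return pages[first].linked_nid == pages[last].linked_nid;
}
```

With a block whose first pages are node 0 memory and whose last pages are a hole, running `init_range()` for a re-added node with start 0 and a nonzero size relinks only the hole, so the first and last page of the block no longer agree. With a size of 0 the loop does nothing and the block stays consistent.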