Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-20 Thread Xishi Qiu
On 2015/4/21 2:23, Yasuaki Ishimatsu wrote:

> 
> On Mon, 20 Apr 2015 11:42:10 +0800
> Xishi Qiu  wrote:
> 
>> On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:
>>
>>>
>>> On Mon, 20 Apr 2015 10:45:45 +0800
>>> Xishi Qiu  wrote:
>>>
>>>> On 2015/4/20 9:42, Gu Zheng wrote:
>>>>
>>>>> Hi Xishi,
>>>>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>>>>
>>>>>>
>>>>>> Your patches will fix your issue.
>>>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>>>> not be initialized.
>>>>>>
>>>>>> Memory hot add flows are as follows:
>>>>>>
>>>>>> add_memory
>>>>>>   ...
>>>>>>   -> hotadd_new_pgdat()
>>>>>>   ...
>>>>>>   -> node_set_online(nid)
>>>>>>
>>>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>>>> offline because node_set_online() is not called yet. So if applying
>>>>>> your patches, the pgdat is not initialized in this case.
>>>>>
>>>>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
>>>>> over-kill.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Yasuaki Ishimatsu
>>>>>>
>>>>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>>>>> Xishi Qiu  wrote:
>>>>>>
>>>>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>>>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>>>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>>>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>>>>>
>>>>> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
>>>>> then the chaos occurred when init the zones. And this only happens to the re-hotadd
>>>>> node, so how about using the saved *node_start_pfn* (via
>>>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>>>>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>>>>
>>>>> Thanks,
>>>>> Gu
>>>>>
>>>>
>>>> Hi Gu,
>>>>
>>>> I first considered this method, but if the hot added node's start and size are different
>>>> from before, it makes the chaos.
>>>>
>>>
>>>> e.g.
>>>> nodeXX (8-16G)
>>>> remove nodeXX
>>>> BIOS report cpu first and online it
>>>> hotadd nodeXX
>>>> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
>>>> BIOS report mem(10-12G)
>>>> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
>>>> the start is still 8G, not 10G, this is chaos!
>>>
>>> If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
>>> pr_info()'s message.
>>>
>>> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>>> unsigned long node_start_pfn, unsigned long *zholes_size)
>>> {
>>> ...
>>> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>> pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>>> (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) 
>>> - 1);
>>> #endif
>>> }
>>>
>>> Is the memory range of the message "8G - 16G"?
>>> If so, the reason is that memblk is not deleted at memory hot remove.
>>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>
>> Hi Yasuaki,
>>
> 
>> By reading the code, I find memblk is not deleted at memory hot remove.
>> I am not sure whether we should remove it. If remove it, we should also reset
>> "arch_zone_lowest_possible_pfn", right? It seems a little complicated.
> 
> I think memblk should be added/removed by hot adding/removing memory.
> But, arch_zone_lowest_possible_pfn should not be changed.
> 

Ok, thanks for your suggestion.

> Thanks,
> Yasuaki Ishimatsu
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>>>
>>>

>>>> Thanks,
>>>> Xishi Qiu

>>>
>>> .
>>>
>>
>>
>>
> 
> .
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-20 Thread Yasuaki Ishimatsu

On Mon, 20 Apr 2015 11:42:10 +0800
Xishi Qiu  wrote:

> On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:
> 
> > 
> > On Mon, 20 Apr 2015 10:45:45 +0800
> > Xishi Qiu  wrote:
> > 
> >> On 2015/4/20 9:42, Gu Zheng wrote:
> >>
> >>> Hi Xishi,
> >>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
> >>>
> >>>>
> >>>> Your patches will fix your issue.
> >>>> But, if BIOS reports memory first at node hot add, pgdat can
> >>>> not be initialized.
> >>>>
> >>>> Memory hot add flows are as follows:
> >>>>
> >>>> add_memory
> >>>>   ...
> >>>>   -> hotadd_new_pgdat()
> >>>>   ...
> >>>>   -> node_set_online(nid)
> >>>>
> >>>> When calling hotadd_new_pgdat() for a hot added node, the node is
> >>>> offline because node_set_online() is not called yet. So if applying
> >>>> your patches, the pgdat is not initialized in this case.
> >>>
> >>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> >>> over-kill.
> >>>
> >>>>
> >>>> Thanks,
> >>>> Yasuaki Ishimatsu
> >>>>
> >>>> On Fri, 17 Apr 2015 18:50:32 +0800
> >>>> Xishi Qiu  wrote:
> >>>>
> >>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
> >>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
> >>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
> >>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
> >>>
> >>> As your analysis said the root cause here is passing a *0* as the 
> >>> node_start_pfn,
> >>> then the chaos occurred when init the zones. And this only happens to the 
> >>> re-hotadd
> >>> node, so how about using the saved *node_start_pfn* (via 
> >>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
> >>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
> >>>
> >>> Thanks,
> >>> Gu
> >>>
> >>
> >> Hi Gu,
> >>
> >> I first considered this method, but if the hot added node's start and size 
> >> are different
> >> from before, it makes the chaos.
> >>
> > 
> >> e.g.
> >> nodeXX (8-16G)
> >> remove nodeXX 
> >> BIOS report cpu first and online it
> >> hotadd nodeXX
> >> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 
> >> 8G
> >> BIOS report mem(10-12G)
> >> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
> >> the start is still 8G, not 10G, this is chaos!
> > 
> > If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
> > pr_info()'s message.
> > 
> > void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > unsigned long node_start_pfn, unsigned long *zholes_size)
> > {
> > ...
> > #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> > pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
> > (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) 
> > - 1);
> > #endif
> > }
> > 
> > Is the memory range of the message "8G - 16G"?
> > If so, the reason is that memblk is not deleted at memory hot remove.
> > 
> > Thanks,
> > Yasuaki Ishimatsu
> > 
> 
> Hi Yasuaki,
> 

> By reading the code, I find memblk is not deleted at memory hot remove.
> I am not sure whether we should remove it. If remove it, we should also reset
> "arch_zone_lowest_possible_pfn", right? It seems a little complicated.

I think memblk should be added/removed when memory is hot added/removed.
But arch_zone_lowest_possible_pfn should not be changed.
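
A minimal sketch of that bookkeeping, as a userspace model only. The table
and the helpers region_add(), region_remove() and region_pages() are
hypothetical names and are not the memblock API; they just show the range
record following hot add and hot remove. PFN values assume 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 8

/* Hypothetical stand-in for one recorded memory range of a node. */
struct region {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn; /* exclusive */
};

struct region_table {
	struct region regions[MAX_REGIONS];
	size_t count;
};

/* Record a hot-added range; returns 0 on success, -1 if the table is full. */
static int region_add(struct region_table *t, int nid,
		      unsigned long start_pfn, unsigned long end_pfn)
{
	assert(start_pfn < end_pfn);
	if (t->count == MAX_REGIONS)
		return -1;
	t->regions[t->count++] = (struct region){ nid, start_pfn, end_pfn };
	return 0;
}

/* Forget a hot-removed range; returns 0 if an exact match was dropped. */
static int region_remove(struct region_table *t,
			 unsigned long start_pfn, unsigned long end_pfn)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->regions[i].start_pfn == start_pfn &&
		    t->regions[i].end_pfn == end_pfn) {
			/* Fill the hole with the last entry. */
			t->regions[i] = t->regions[--t->count];
			return 0;
		}
	}
	return -1;
}

/* Total pages still recorded for nid. */
static unsigned long region_pages(const struct region_table *t, int nid)
{
	unsigned long pages = 0;
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->regions[i].nid == nid)
			pages += t->regions[i].end_pfn - t->regions[i].start_pfn;
	return pages;
}
```

With this model, adding 8G-16G (0x200000-0x400000) for a node and removing
it again leaves no pages recorded for that node, so a later hot add of
10G-12G (0x280000-0x300000) starts from a clean record.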

Thanks,
Yasuaki Ishimatsu

> 
> Thanks,
> Xishi Qiu
> 
> > 
> > 
> >>
> >> Thanks,
> >> Xishi Qiu
> >>
> > 
> > .
> > 
> 
> 
> 


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:

> 
> On Mon, 20 Apr 2015 10:45:45 +0800
> Xishi Qiu  wrote:
> 
>> On 2015/4/20 9:42, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>>

>>>> Your patches will fix your issue.
>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>> not be initialized.
>>>>
>>>> Memory hot add flows are as follows:
>>>>
>>>> add_memory
>>>>   ...
>>>>   -> hotadd_new_pgdat()
>>>>   ...
>>>>   -> node_set_online(nid)
>>>>
>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>> offline because node_set_online() is not called yet. So if applying
>>>> your patches, the pgdat is not initialized in this case.
>>>
>>> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
>>> over-kill.
>>>
>>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu
>>>>
>>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>>> Xishi Qiu  wrote:
>>>>
>>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>>>
>>> As your analysis said the root cause here is passing a *0* as the 
>>> node_start_pfn,
>>> then the chaos occurred when init the zones. And this only happens to the 
>>> re-hotadd
>>> node, so how about using the saved *node_start_pfn* (via 
>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>>
>>> Thanks,
>>> Gu
>>>
>>
>> Hi Gu,
>>
>> I first considered this method, but if the hot added node's start and size 
>> are different
>> from before, it makes the chaos.
>>
> 
>> e.g.
>> nodeXX (8-16G)
>> remove nodeXX 
>> BIOS report cpu first and online it
>> hotadd nodeXX
>> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
>> BIOS report mem(10-12G)
>> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
>> the start is still 8G, not 10G, this is chaos!
> 
> If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
> pr_info()'s message.
> 
> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> unsigned long node_start_pfn, unsigned long *zholes_size)
> {
> ...
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
> (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 
> 1);
> #endif
> }
> 
> Is the memory range of the message "8G - 16G"?
> If so, the reason is that memblk is not deleted at memory hot remove.
> 
> Thanks,
> Yasuaki Ishimatsu
> 

Hi Yasuaki,

From reading the code, I see that memblk is not deleted at memory hot remove.
I am not sure whether we should remove it. If we remove it, we should also reset
"arch_zone_lowest_possible_pfn", right? It seems a little complicated.

Thanks,
Xishi Qiu

> 
> 
>>
>> Thanks,
>> Xishi Qiu
>>
> 
> .
> 





Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 11:15, Yasuaki Ishimatsu wrote:

> 
> On Mon, 20 Apr 2015 10:59:37 +0800
> Xishi Qiu  wrote:
> 
>> On 2015/4/20 10:09, Gu Zheng wrote:
>>
>>> Hi Ishimatsu, Xishi,
>>>
>>> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
>>>

>>>>> When hot adding memory and creating new node, the node is offline.
>>>>> And after calling node_set_online(), the node becomes online.
>>>>>
>>>>> Oh, sorry. I misread your patches.
>>>>>
>>>>
>>>> Please ignore it...
>>>
>>> Seems also a misread to me.
>>> I clear it (my worry) here:
>>> If we set the node size to 0 here, it may hide more things than we
>>> expected.
>>> All the init chunks around with the size (spanned/present/managed...) will
>>> be non-sense, and the user/caller will not get a summary of the hot added 
>>> node
>>> because of the changes here.
>>> I am not sure the worry is necessary, please correct me if I missing 
>>> something.
>>>
>>> Regards,
>>> Gu
>>>
>>
>> Hi Gu,
>>
>> My patch is just set size to 0 when hotadd a node(old or new). I know your 
>> worry,
>> but I think it is not necessary.
>>
> 
>> When we calculate the size, it uses "arch_zone_lowest_possible_pfn[]" and 
>> "memblock",
>> and they are both from boot time. If we hotadd a new node, the calculated 
>> size is
> 0 too. When add memory, __add_zone() will grow the size and start.
> 
> If hot adding new node, you are right. But if hot removing a memory which
> is presented at boot time, memblock of the memory range is not deleted.
> So when hot adding the memory, the calculated size does not become 0.
> 

Yes, so I just set it to 0. init_currently_empty_zone() and memmap_init() will
be called in __add_zone(), and the start/size will also grow there.

Thanks,
Xishi Qiu

> Thanks,
> Yasuaki Ishimatsu
> 
>>
>> Thanks,
>> Xishi Qiu
>>
> 
> .
> 





Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Yasuaki Ishimatsu

On Mon, 20 Apr 2015 10:45:45 +0800
Xishi Qiu  wrote:

> On 2015/4/20 9:42, Gu Zheng wrote:
> 
> > Hi Xishi,
> > On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
> > 
> >>
> >> Your patches will fix your issue.
> >> But, if BIOS reports memory first at node hot add, pgdat can
> >> not be initialized.
> >>
> >> Memory hot add flows are as follows:
> >>
> >> add_memory
> >>   ...
> >>   -> hotadd_new_pgdat()
> >>   ...
> >>   -> node_set_online(nid)
> >>
> >> When calling hotadd_new_pgdat() for a hot added node, the node is
> >> offline because node_set_online() is not called yet. So if applying
> >> your patches, the pgdat is not initialized in this case.
> > 
> > Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> > over-kill. 
> > 
> >>
> >> Thanks,
> >> Yasuaki Ishimatsu
> >>
> >> On Fri, 17 Apr 2015 18:50:32 +0800
> >> Xishi Qiu  wrote:
> >>
> >>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will 
> >>> call
> >>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As 
> >>> nodeXX
> >>> exists at boot time, so pgdat->node_spanned_pages is the same as 
> >>> original. Then
> >>> free_area_init_core()->memmap_init() will pass a wrong start and a 
> >>> nonzero size.
> > 
> > As your analysis said the root cause here is passing a *0* as the 
> > node_start_pfn,
> > then the chaos occurred when init the zones. And this only happens to the 
> > re-hotadd
> > node, so how about using the saved *node_start_pfn* (via 
> > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
> > instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
> > 
> > Thanks,
> > Gu
> > 
> 
> Hi Gu,
> 
> I first considered this method, but if the hot added node's start and size 
> are different
> from before, it makes the chaos.
> 

> e.g.
> nodeXX (8-16G)
> remove nodeXX 
> BIOS report cpu first and online it
> hotadd nodeXX
> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
> BIOS report mem(10-12G)
> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
> the start is still 8G, not 10G, this is chaos!

If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, the kernel prints the following
pr_info() message:

void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
		unsigned long node_start_pfn, unsigned long *zholes_size)
{
	...
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
#endif
}

Does the message show the memory range as "8G - 16G"?
If so, the reason is that memblk is not deleted at memory hot remove.
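
For reference, a rough userspace sketch of how such a range could be derived
from per-node regions. This is not the kernel's code: struct mem_region and
pfn_range_for_nid() are hypothetical names that only model the idea behind
get_pfn_range_for_nid(), and the PFN values assume 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one memblock region. */
struct mem_region {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn; /* exclusive */
};

/*
 * Report the lowest start and the highest end of all regions that
 * belong to nid. If the node has no regions, both outputs are 0.
 */
static void pfn_range_for_nid(const struct mem_region *regions, size_t count,
			      int nid, unsigned long *start_pfn,
			      unsigned long *end_pfn)
{
	size_t i;

	assert(start_pfn != NULL && end_pfn != NULL);

	*start_pfn = -1UL;
	*end_pfn = 0;

	for (i = 0; i < count; i++) {
		if (regions[i].nid != nid)
			continue;
		if (regions[i].start_pfn < *start_pfn)
			*start_pfn = regions[i].start_pfn;
		if (regions[i].end_pfn > *end_pfn)
			*end_pfn = regions[i].end_pfn;
	}

	/* No region found for this node: report an empty range. */
	if (*start_pfn == -1UL)
		*start_pfn = 0;
}
```

If the 8G-16G region of the node (0x200000-0x400000) is still in the table
after hot remove, the lookup keeps returning that range; with no region left
for the node, it returns 0-0.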

Thanks,
Yasuaki Ishimatsu



> 
> Thanks,
> Xishi Qiu
> 


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Yasuaki Ishimatsu

On Mon, 20 Apr 2015 10:59:37 +0800
Xishi Qiu  wrote:

> On 2015/4/20 10:09, Gu Zheng wrote:
> 
> > Hi Ishimatsu, Xishi,
> > 
> > On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
> > 
> >>
> >>> When hot adding memory and creating new node, the node is offline.
> >>> And after calling node_set_online(), the node becomes online.
> >>>
> >>> Oh, sorry. I misread your patches.
> >>>
> >>
> >> Please ignore it...
> > 
> > Seems also a misread to me.
> > I clear it (my worry) here:
> > If we set the node size to 0 here, it may hide more things than we
> > expected.
> > All the init chunks around with the size (spanned/present/managed...) will
> > be non-sense, and the user/caller will not get a summary of the hot added 
> > node
> > because of the changes here.
> > I am not sure the worry is necessary, please correct me if I missing 
> > something.
> > 
> > Regards,
> > Gu
> > 
> 
> Hi Gu,
> 
> My patch is just set size to 0 when hotadd a node(old or new). I know your 
> worry,
> but I think it is not necessary.
> 

> When we calculate the size, it uses "arch_zone_lowest_possible_pfn[]" and 
> "memblock",
> and they are both from boot time. If we hotadd a new node, the calculated 
> size is
> 0 too. When add memory, __add_zone() will grow the size and start.

If hot adding a new node, you are right. But when hot removing memory that
was present at boot time, the memblock of that memory range is not deleted.
So when the memory is hot added again, the calculated size does not become 0.

Thanks,
Yasuaki Ishimatsu

> 
> Thanks,
> Xishi Qiu
> 


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 9:42, Gu Zheng wrote:

> Hi Xishi,
> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
> 
>>
>> Your patches will fix your issue.
>> But, if BIOS reports memory first at node hot add, pgdat can
>> not be initialized.
>>
>> Memory hot add flows are as follows:
>>
>> add_memory
>>   ...
>>   -> hotadd_new_pgdat()
>>   ...
>>   -> node_set_online(nid)
>>
>> When calling hotadd_new_pgdat() for a hot added node, the node is
>> offline because node_set_online() is not called yet. So if applying
>> your patches, the pgdat is not initialized in this case.
> 
> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> over-kill. 
> 
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> On Fri, 17 Apr 2015 18:50:32 +0800
>> Xishi Qiu  wrote:
>>
>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will 
>>> call
>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As 
>>> nodeXX
>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. 
>>> Then
>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero 
>>> size.
> 
> As your analysis said the root cause here is passing a *0* as the 
> node_start_pfn,
> then the chaos occurred when init the zones. And this only happens to the 
> re-hotadd
> node, so how about using the saved *node_start_pfn* (via 
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
> 
> Thanks,
> Gu
> 

Hi Gu,

I first considered this method, but if the hot added node's start and size
differ from before, it causes chaos.

e.g.
nodeXX (8-16G)
remove nodeXX
BIOS reports cpu first and onlines it
hotadd nodeXX
use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
BIOS reports mem (10-12G)
call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
the start is still 8G, not 10G, this is chaos!
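
To make the example above concrete, here is a rough userspace sketch of the
span-growing step. It is only a model under stated assumptions: struct
node_span, gib_to_pfn() and grow_span() are hypothetical names, 4 KiB pages
are assumed, and the logic only approximates what grow_pgdat_span() does.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the span fields of pg_data_t. */
struct node_span {
	unsigned long start_pfn;     /* models pgdat->node_start_pfn */
	unsigned long spanned_pages; /* models pgdat->node_spanned_pages */
};

/* Convert a size in GiB to a page frame number, assuming 4 KiB pages. */
static unsigned long gib_to_pfn(unsigned long gib)
{
	return gib << (30 - 12);
}

/*
 * Grow the span so that it covers [start_pfn, end_pfn).
 * The span can only be extended here, never shrunk.
 */
static void grow_span(struct node_span *n, unsigned long start_pfn,
		      unsigned long end_pfn)
{
	unsigned long old_end = n->start_pfn + n->spanned_pages;

	assert(start_pfn < end_pfn);

	/* An empty span simply adopts the new start. */
	if (!n->spanned_pages || start_pfn < n->start_pfn)
		n->start_pfn = start_pfn;

	if (end_pfn < old_end)
		end_pfn = old_end;
	n->spanned_pages = end_pfn - n->start_pfn;
}
```

Starting from a stale span of 8G-16G and growing it with 10G-12G leaves the
start at 8G and the size at 8G, while starting from an empty span (size 0)
yields exactly 10G-12G.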

Thanks,
Xishi Qiu



Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 10:09, Gu Zheng wrote:

> Hi Ishimatsu, Xishi,
> 
> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
> 
>>
>>> When hot adding memory and creating new node, the node is offline.
>>> And after calling node_set_online(), the node becomes online.
>>>
>>> Oh, sorry. I misread your patches.
>>>
>>
>> Please ignore it...
> 
> Seems also a misread to me.
> I clear it (my worry) here:
> If we set the node size to 0 here, it may hide more things than we expected.
> All the init chunks around with the size (spanned/present/managed...) will
> be non-sense, and the user/caller will not get a summary of the hot added node
> because of the changes here.
> I am not sure the worry is necessary, please correct me if I missing 
> something.
> 
> Regards,
> Gu
> 

Hi Gu,

My patch just sets the size to 0 when hot adding a node (old or new). I
understand your worry, but I think it is not necessary.

When we calculate the size, we use "arch_zone_lowest_possible_pfn[]" and
"memblock", and both come from boot time. If we hot add a new node, the
calculated size is 0 too. When memory is added, __add_zone() will grow the
size and start.

Thanks,
Xishi Qiu



Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Gu Zheng
Hi Ishimatsu, Xishi,

On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:

> 
>> When hot adding memory and creating new node, the node is offline.
>> And after calling node_set_online(), the node becomes online.
>>
>> Oh, sorry. I misread your patches.
>>
> 
> Please ignore it...

That also looks like a misread to me.
Let me clarify my worry here:
If we set the node size to 0 here, it may hide more things than we expect,
and all the init chunks dealing with the size (spanned/present/managed...) will
make no sense, and the user/caller will not get a summary of the hot added node
because of the changes here.
I am not sure whether the worry is necessary; please correct me if I am
missing something.

Regards,
Gu

> 
> Thanks,
> Yasuaki Ishimatsu
> 
> On 
> Yasuaki Ishimatsu  wrote:
> 
>>
>> When hot adding memory and creating new node, the node is offline.
>> And after calling node_set_online(), the node becomes online.
>>
>> Oh, sorry. I misread your patches.
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> On Mon, 20 Apr 2015 09:33:10 +0800
>> Xishi Qiu  wrote:
>>
>>> On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
>>>

>>>> Your patches will fix your issue.
>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>> not be initialized.
>>>>
>>>> Memory hot add flows are as follows:
>>>>
>>>> add_memory
>>>>   ...
>>>>   -> hotadd_new_pgdat()
>>>>   ...
>>>>   -> node_set_online(nid)
>>>>
>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>> offline because node_set_online() is not called yet. So if applying
>>>> your patches, the pgdat is not initialized in this case.
>>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu

>>>
>>> Hi Yasuaki,
>>>
>>> I'm not quite understand, when BIOS reports memory first, why pgdat
>>> can not be initialized?
>>> When hotadd a new node, hotadd_new_pgdat() will be called too, and
>>> when hotadd memory to a existent node, it's no need to call 
>>> hotadd_new_pgdat(),
>>> right?
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
> .
> 




Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Gu Zheng
Hi Ishimatsu, Xishi,

On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:

> 
>> When hot adding memory and creating new node, the node is offline.
>> And after calling node_set_online(), the node becomes online.
>>
>> Oh, sorry. I misread your ptaches.
>>
> 
> Please ignore it...

Seems also a misread to me.
I clear it (my worry) here:
If we set the node size to 0 here, it may hidden more things than we experted.
All the init chunks around with the size (spanned/present/managed...) will
be non-sense, and the user/caller will not get a summary of the hot added node
because of the changes here.
I am not sure the worry is necessary, please correct me if I missing something.

Regards,
Gu

> 
> Thanks,
> Yasuaki Ishimatsu
> 
> On 
> Yasuaki Ishimatsu  wrote:
> 
>>
>> When hot adding memory and creating new node, the node is offline.
>> And after calling node_set_online(), the node becomes online.
>>
>> Oh, sorry. I misread your ptaches.
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> On Mon, 20 Apr 2015 09:33:10 +0800
>> Xishi Qiu  wrote:
>>
>>> On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
>>>

>>>> Your patches will fix your issue.
>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>> not be initialized.
>>>>
>>>> Memory hot add flows are as follows:
>>>>
>>>> add_memory
>>>>   ...
>>>>   -> hotadd_new_pgdat()
>>>>   ...
>>>>   -> node_set_online(nid)
>>>>
>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>> offline because node_set_online() is not called yet. So if applying
>>>> your patches, the pgdat is not initialized in this case.
>>>>
>>>> Thanks,
>>>> Yasuaki Ishimatsu

>>>
>>> Hi Yasuaki,
>>>
>>> I'm not quite understand, when BIOS reports memory first, why pgdat
>>> can not be initialized?
>>> When hotadd a new node, hotadd_new_pgdat() will be called too, and
>>> when hotadd memory to a existent node, it's no need to call 
>>> hotadd_new_pgdat(),
>>> right?
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
> .
> 




Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Yasuaki Ishimatsu

On Mon, 20 Apr 2015 09:33:10 +0800
Xishi Qiu  wrote:

> On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
> 
> > 
> > Your patches will fix your issue.
> > But, if BIOS reports memory first at node hot add, pgdat can
> > not be initialized.
> > 
> > Memory hot add flows are as follows:
> > 
> > add_memory
> >   ...
> >   -> hotadd_new_pgdat()
> >   ...
> >   -> node_set_online(nid)
> > 
> > When calling hotadd_new_pgdat() for a hot added node, the node is
> > offline because node_set_online() is not called yet. So if applying
> > your patches, the pgdat is not initialized in this case.
> > 
> > Thanks,
> > Yasuaki Ishimatsu
> > 
> 
> Hi Yasuaki,
> 

> I'm not quite understand, when BIOS reports memory first, why pgdat
> can not be initialized?
> When hotadd a new node, hotadd_new_pgdat() will be called too, and
> when hotadd memory to a existent node, it's no need to call 
> hotadd_new_pgdat(),
> right?

Your patch skips initialization of pgdat when the node is offline.
But when hot adding a new node and calling hotadd_new_pgdat(), the node
is still offline. So pgdat is not initialized.

Thanks,
Yasuaki Ishimatsu

> 
> Thanks,
> Xishi Qiu
> 
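
To make the ordering concern in this reply concrete, here is a minimal user-space sketch in C. It is not kernel code: toy_node_online, toy_pgdats, toy_hotadd_new_pgdat() and the skip_when_offline flag are hypothetical stand-ins, and the guard models the behaviour only as this reply describes it, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4

/* Toy stand-in for pg_data_t; only tracks whether init ran. */
struct toy_pgdat {
    bool initialized;
};

static bool toy_node_online[TOY_MAX_NODES];
static struct toy_pgdat toy_pgdats[TOY_MAX_NODES];

/* Hypothetical guard: skip pgdat init while the node is still offline. */
static void toy_hotadd_new_pgdat(int nid, bool skip_when_offline)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    if (skip_when_offline && !toy_node_online[nid])
        return; /* the case the reply is worried about */
    toy_pgdats[nid].initialized = true;
}

/*
 * Follows the quoted flow: hotadd_new_pgdat() runs first and
 * node_set_online() only afterwards. Returns whether pgdat was set up.
 */
static bool toy_add_memory(int nid, bool skip_when_offline)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    toy_node_online[nid] = false;
    toy_pgdats[nid].initialized = false;

    toy_hotadd_new_pgdat(nid, skip_when_offline);
    toy_node_online[nid] = true; /* stands in for node_set_online(nid) */

    return toy_pgdats[nid].initialized;
}
```

With the guard disabled, toy_add_memory() reports an initialized pgdat; with it enabled, the node comes online without one, which is the situation described above.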


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Yasuaki Ishimatsu

> When hot adding memory and creating new node, the node is offline.
> And after calling node_set_online(), the node becomes online.
> 
> Oh, sorry. I misread your ptaches.
> 

Please ignore it...

Thanks,
Yasuaki Ishimatsu

On 
Yasuaki Ishimatsu  wrote:

> 
> When hot adding memory and creating new node, the node is offline.
> And after calling node_set_online(), the node becomes online.
> 
> Oh, sorry. I misread your ptaches.
> 
> Thanks,
> Yasuaki Ishimatsu
> 
> On Mon, 20 Apr 2015 09:33:10 +0800
> Xishi Qiu  wrote:
> 
> > On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
> > 
> > > 
> > > Your patches will fix your issue.
> > > But, if BIOS reports memory first at node hot add, pgdat can
> > > not be initialized.
> > > 
> > > Memory hot add flows are as follows:
> > > 
> > > add_memory
> > >   ...
> > >   -> hotadd_new_pgdat()
> > >   ...
> > >   -> node_set_online(nid)
> > > 
> > > When calling hotadd_new_pgdat() for a hot added node, the node is
> > > offline because node_set_online() is not called yet. So if applying
> > > your patches, the pgdat is not initialized in this case.
> > > 
> > > Thanks,
> > > Yasuaki Ishimatsu
> > > 
> > 
> > Hi Yasuaki,
> > 
> > I'm not quite understand, when BIOS reports memory first, why pgdat
> > can not be initialized?
> > When hotadd a new node, hotadd_new_pgdat() will be called too, and
> > when hotadd memory to a existent node, it's no need to call 
> > hotadd_new_pgdat(),
> > right?
> > 
> > Thanks,
> > Xishi Qiu
> > 


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Gu Zheng
Hi Xishi,
On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:

> 
> Your patches will fix your issue.
> But, if BIOS reports memory first at node hot add, pgdat can
> not be initialized.
> 
> Memory hot add flows are as follows:
> 
> add_memory
>   ...
>   -> hotadd_new_pgdat()
>   ...
>   -> node_set_online(nid)
> 
> When calling hotadd_new_pgdat() for a hot added node, the node is
> offline because node_set_online() is not called yet. So if applying
> your patches, the pgdat is not initialized in this case.

Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
overkill.

> 
> Thanks,
> Yasuaki Ishimatsu
> 
> On Fri, 17 Apr 2015 18:50:32 +0800
> Xishi Qiu  wrote:
> 
>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will 
>> call
>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>> exists at boot time, so pgdat->node_spanned_pages is the same as original. 
>> Then
>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero 
>> size.

As your analysis says, the root cause here is passing a *0* as the
node_start_pfn, which causes chaos when the zones are initialized. This only
happens to the re-hot-added node, so how about using the saved
*node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?

Thanks,
Gu

>>
>> free_area_init_core()
>>  memmap_init()
>>  memmap_init_zone()
>>  early_pfn_in_nid()
>>  set_page_links()
>>
>> "if (!early_pfn_in_nid(pfn, nid))" will skip the pfn(memory in section), but 
>> it
>> will not skip the pfn(hole in section), this will cover and relink the page 
>> to
>> zone/nid, so page_zone() from memory and hole in the same section are 
>> different.
>> The following call trace shows the bug.
>>
>> This patch will set the node size to 0 when hotadd a new node(original or 
>> new).
>> init_currently_empty_zone() and memmap_init() will be called in add_zone(), 
>> so
>> need not to change it.
>>
>> [90476.077469] kernel BUG at mm/page_alloc.c:1042!  // move_freepages() -> 
>> BUG_ON(page_zone(start_page) != page_zone(end_page));
>> [90476.077469] invalid opcode:  [#1] SMP 
>> [90476.077469] Modules linked in: iptable_nat nf_conntrack_ipv4 
>> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack fuse btrfs zlib_deflate 
>> raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc bridge stp llc 
>> ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables 
>> cfg80211 rfkill sg iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp 
>> intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel 
>> ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd 
>> pcspkr igb vfat i2c_algo_bit dca fat sb_edac edac_core i2c_i801 lpc_ich 
>> i2c_core mfd_core shpchp acpi_pad ipmi_si ipmi_msghandler uinput nfsd 
>> auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif 
>> crct10dif_common ahci libahci megaraid_sas tg3 ptp libata pps_core dm_mirror 
>> dm_region_hash dm_log dm_mod [last unloaded: rasf]
>> [90476.157382] CPU: 2 PID: 322803 Comm: updatedb Tainted: GF   W  
>> O--   3.10.0-229.1.2.5.hulk.rc14.x86_64 #1
>> [90476.157382] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei 
>> N1, BIOS V100R001 04/13/2015
>> [90476.157382] task: 88006a6d5b00 ti: 880068eb8000 task.ti: 
>> 880068eb8000
>> [90476.157382] RIP: 0010:[81159f7f]  [81159f7f] 
>> move_freepages+0x12f/0x140
>> [90476.157382] RSP: 0018:880068ebb640  EFLAGS: 00010002
>> [90476.157382] RAX: 880002316cc0 RBX: ea0001bd RCX: 
>> 0001
>> [90476.157382] RDX: 880002476e40 RSI:  RDI: 
>> 880002316cc0
>> [90476.157382] RBP: 880068ebb690 R08: 0010 R09: 
>> ea0001bd7fc0
>> [90476.157382] R10: 0006f5ff R11:  R12: 
>> 0001
>> [90476.157382] R13: 0003 R14: 880002316eb8 R15: 
>> ea0001bd7fc0
>> [90476.157382] FS:  7f4d3ab95740() GS:880033a0() 
>> knlGS:
>> [90476.157382] CS:  0010 DS:  ES:  CR0: 80050033
>> [90476.157382] CR2: 7f4d3ae1a808 CR3: 00018907a000 CR4: 
>> 001407e0
>> [90476.157382] DR0:  DR1:  DR2: 
>> 
>> [90476.157382] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>> [90476.157382] Stack:
>> [90476.157382]  880068ebb698 880002316cc0 a800b5378098 
>> 880068ebb698
>> [90476.157382]  810b11dc 880002316cc0 0001 
>> 0003
>> [90476.157382]  880002316eb8 ea0001bd6420 880068ebb6a0 
>> 8115a003
>> [90476.157382] Call Trace:
>> [90476.157382]  [810b11dc] ? update_curr+0xcc/0x150
>> [90476.157382]  [8115a003] move_freepages_block+0x73/0x80
>> [90476.157382]  [8115b9ba] __rmqueue+0x26a/0x460
>> [90476.157382]  [8101ba53] ? native_sched_clock+0x13/0x80
>> [90476.157382]  [8115e172] get_page_from_freelist+0x7f2/0xd30
>> 
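
The zone mismatch that triggers the BUG_ON in the trace above can be modelled in user space. The sketch below is a simplified illustration, not the kernel implementation: the toy_* names and the 8-page layout are made up, and toy_early_pfn_in_nid() only follows the behaviour described in the patch description (pfns known to belong to another node are skipped, holes are not).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES 8
#define TOY_HOLE (-1)

/* Boot-time owner of each pfn; TOY_HOLE marks a hole in the section. */
static const int toy_pfn_nid[TOY_NR_PAGES] = {
    0, 0, 0, 0, 0, 0, TOY_HOLE, TOY_HOLE
};

/* Node (standing in for the zone) that each struct page is linked to. */
static int toy_page_link[TOY_NR_PAGES];

/* Only a pfn known to belong to a different node is "not in nid". */
static bool toy_early_pfn_in_nid(int pfn, int nid)
{
    int owner = toy_pfn_nid[pfn];

    return !(owner >= 0 && owner != nid);
}

/* Stand-in for memmap_init_zone(): relink every pfn that is not skipped. */
static void toy_memmap_init(int nid, int start_pfn, int nr_pages)
{
    for (int pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
        assert(pfn >= 0 && pfn < TOY_NR_PAGES);
        if (!toy_early_pfn_in_nid(pfn, nid))
            continue;
        toy_page_link[pfn] = nid; /* set_page_links() stand-in */
    }
}

/* The condition that move_freepages() checks with BUG_ON(). */
static bool toy_block_spans_two_zones(int start_pfn, int end_pfn)
{
    return toy_page_link[start_pfn] != toy_page_link[end_pfn];
}

static bool toy_reproduce(void)
{
    toy_memmap_init(0, 0, TOY_NR_PAGES); /* boot: all pages linked to node 0 */
    toy_memmap_init(1, 0, TOY_NR_PAGES); /* re-hot-add: start 0, stale size */
    return toy_block_spans_two_zones(0, TOY_NR_PAGES - 1);
}
```

After the second pass the memory pages still point at node 0 while the hole pages have been relinked to node 1, so the first and last page of the same block disagree.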

Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:

> 
> Your patches will fix your issue.
> But, if BIOS reports memory first at node hot add, pgdat can
> not be initialized.
> 
> Memory hot add flows are as follows:
> 
> add_memory
>   ...
>   -> hotadd_new_pgdat()
>   ...
>   -> node_set_online(nid)
> 
> When calling hotadd_new_pgdat() for a hot added node, the node is
> offline because node_set_online() is not called yet. So if applying
> your patches, the pgdat is not initialized in this case.
> 
> Thanks,
> Yasuaki Ishimatsu
> 

Hi Yasuaki,

I don't quite understand: when BIOS reports memory first, why can pgdat
not be initialized?
When hot-adding a new node, hotadd_new_pgdat() will be called too, and
when hot-adding memory to an existing node, there is no need to call
hotadd_new_pgdat(), right?

Thanks,
Xishi Qiu



Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Yasuaki Ishimatsu

When hot adding memory and creating a new node, the node is offline.
And after calling node_set_online(), the node becomes online.

Oh, sorry. I misread your patches.

Thanks,
Yasuaki Ishimatsu

On Mon, 20 Apr 2015 09:33:10 +0800
Xishi Qiu  wrote:

> On 2015/4/18 4:05, Yasuaki Ishimatsu wrote:
> 
> > 
> > Your patches will fix your issue.
> > But, if BIOS reports memory first at node hot add, pgdat can
> > not be initialized.
> > 
> > Memory hot add flows are as follows:
> > 
> > add_memory
> >   ...
> >   -> hotadd_new_pgdat()
> >   ...
> >   -> node_set_online(nid)
> > 
> > When calling hotadd_new_pgdat() for a hot added node, the node is
> > offline because node_set_online() is not called yet. So if applying
> > your patches, the pgdat is not initialized in this case.
> > 
> > Thanks,
> > Yasuaki Ishimatsu
> > 
> 
> Hi Yasuaki,
> 
> I'm not quite understand, when BIOS reports memory first, why pgdat
> can not be initialized?
> When hotadd a new node, hotadd_new_pgdat() will be called too, and
> when hotadd memory to a existent node, it's no need to call 
> hotadd_new_pgdat(),
> right?
> 
> Thanks,
> Xishi Qiu
> 


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 10:09, Gu Zheng wrote:

> Hi Ishimatsu, Xishi,
>
> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
>
>>
>>> When hot adding memory and creating new node, the node is offline.
>>> And after calling node_set_online(), the node becomes online.
>>>
>>> Oh, sorry. I misread your patches.
>>>
>>
>> Please ignore it...
>
> Seems I misread it as well.
> Let me clarify my worry here:
> If we set the node size to 0 here, it may hide more things than we expected.
> All the init chunks that deal with the size (spanned/present/managed...) will
> be nonsense, and the user/caller will not get a summary of the hot added node
> because of the changes here.
> I am not sure the worry is necessary, please correct me if I am missing
> something.
>
> Regards,
> Gu
>

Hi Gu,

My patch just sets the size to 0 when hot-adding a node (old or new). I know
your worry, but I think it is not necessary.

When we calculate the size, it uses arch_zone_lowest_possible_pfn[] and
memblock, and they are both from boot time. If we hot-add a new node, the
calculated size is 0 too. When adding memory, __add_zone() will grow the size
and start.

Thanks,
Xishi Qiu



Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 9:42, Gu Zheng wrote:

> Hi Xishi,
> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>
>>
>> Your patches will fix your issue.
>> But, if BIOS reports memory first at node hot add, pgdat can
>> not be initialized.
>>
>> Memory hot add flows are as follows:
>>
>> add_memory
>>   ...
>>   -> hotadd_new_pgdat()
>>   ...
>>   -> node_set_online(nid)
>>
>> When calling hotadd_new_pgdat() for a hot added node, the node is
>> offline because node_set_online() is not called yet. So if applying
>> your patches, the pgdat is not initialized in this case.
>
> Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
> overkill.
>
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
>> On Fri, 17 Apr 2015 18:50:32 +0800
>> Xishi Qiu <qiuxi...@huawei.com> wrote:
>>
>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will
>>> call hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0.
>>> As nodeXX exists at boot time, so pgdat->node_spanned_pages is the same as
>>> original. Then free_area_init_core()->memmap_init() will pass a wrong
>>> start and a nonzero size.
>
> As your analysis says, the root cause here is passing a *0* as the
> node_start_pfn, which causes chaos when the zones are initialized. This only
> happens to the re-hot-added node, so how about using the saved
> *node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>
> Thanks,
> Gu
>

Hi Gu,

I considered this method first, but if the hot-added node's start and size
are different from before, it causes chaos.

e.g.
nodeXX (8-16G)
remove nodeXX
BIOS reports cpu first and onlines it
hotadd nodeXX
use the original values, so pgdat->node_start_pfn is set to 8G, and size is 8G
BIOS reports mem (10-12G)
call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
the start is still 8G, not 10G; this is chaos!

Thanks,
Xishi Qiu
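
The 8G/10G example above can be checked with a small model of span growing. This is a hedged sketch, not the kernel's grow_pgdat_span(): struct toy_span and toy_grow_span() are hypothetical, use GiB instead of pfns, and assume the start only moves when the span is empty or the new range begins lower.

```c
#include <assert.h>

/* Toy node span in GiB; size 0 means the span is empty. */
struct toy_span {
    unsigned long start;
    unsigned long size;
};

/* Grow the span to cover [start, end). */
static void toy_grow_span(struct toy_span *s, unsigned long start,
                          unsigned long end)
{
    unsigned long old_end = s->start + s->size;

    assert(start < end);
    /* The start moves only for an empty span or a lower new start. */
    if (s->size == 0 || start < s->start)
        s->start = start;
    s->size = (old_end > end ? old_end : end) - s->start;
}

/* Re-add with the saved boot-time range (8-16G), then add 10-12G. */
static struct toy_span toy_readd_with_saved_range(void)
{
    struct toy_span s = { 8, 8 };

    toy_grow_span(&s, 10, 12);
    return s;
}

/* Re-add with start and size set to 0, then add 10-12G. */
static struct toy_span toy_readd_with_zero_size(void)
{
    struct toy_span s = { 0, 0 };

    toy_grow_span(&s, 10, 12);
    return s;
}
```

Starting from the saved range, the span stays at 8G with size 8G even though only 10-12G was added; starting from zero, it ends up as 10G with size 2G.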



Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 11:15, Yasuaki Ishimatsu wrote:

 
> On Mon, 20 Apr 2015 10:59:37 +0800
> Xishi Qiu <qiuxi...@huawei.com> wrote:
>
>> On 2015/4/20 10:09, Gu Zheng wrote:
>>
>>> Hi Ishimatsu, Xishi,
>>>
>>> On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
>>>
>>>>
>>>>> When hot adding memory and creating new node, the node is offline.
>>>>> And after calling node_set_online(), the node becomes online.
>>>>>
>>>>> Oh, sorry. I misread your patches.
>>>>>
>>>>
>>>> Please ignore it...
>>>
>>> Seems I misread it as well.
>>> Let me clarify my worry here:
>>> If we set the node size to 0 here, it may hide more things than we
>>> expected.
>>> All the init chunks that deal with the size (spanned/present/managed...)
>>> will be nonsense, and the user/caller will not get a summary of the hot
>>> added node because of the changes here.
>>> I am not sure the worry is necessary, please correct me if I am missing
>>> something.
>>>
>>> Regards,
>>> Gu
>>>
>>
>> Hi Gu,
>>
>> My patch just sets the size to 0 when hot-adding a node (old or new). I
>> know your worry, but I think it is not necessary.
>>
>
>> When we calculate the size, it uses arch_zone_lowest_possible_pfn[] and
>> memblock, and they are both from boot time. If we hot-add a new node, the
>> calculated size is 0 too. When adding memory, __add_zone() will grow the
>> size and start.
>
> If hot adding a new node, you are right. But when hot removing memory that
> was present at boot time, the memblock of the memory range is not deleted.
> So when hot adding that memory, the calculated size does not become 0.
>

Yes, so I just set it to 0. init_currently_empty_zone() and memmap_init() will
be called in __add_zone(), and the start/size will also be grown there.

Thanks,
Xishi Qiu

> Thanks,
> Yasuaki Ishimatsu
>
>>
>> Thanks,
>> Xishi Qiu
>>
>
> .
>





Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Yasuaki Ishimatsu

On Mon, 20 Apr 2015 10:59:37 +0800
Xishi Qiu qiuxi...@huawei.com wrote:

 On 2015/4/20 10:09, Gu Zheng wrote:
 
  Hi Ishimatsu, Xishi,
  
  On 04/20/2015 10:11 AM, Yasuaki Ishimatsu wrote:
  
 
  When hot adding memory and creating new node, the node is offline.
  And after calling node_set_online(), the node becomes online.
 
  Oh, sorry. I misread your patches.
 
 
  Please ignore it...
  
  It seems I misread it as well.
  Let me clarify my worry here:
  If we set the node size to 0 here, it may hide more things than we
  expected.
  All the init code that works with the sizes (spanned/present/managed...) will
  be meaningless, and the user/caller will not get a summary of the hot added
  node because of the changes here.
  I am not sure the worry is necessary, please correct me if I am missing
  something.
  
  Regards,
  Gu
  
 
 Hi Gu,
 
 My patch just sets the size to 0 when hot adding a node (old or new). I
 understand your worry, but I think it is not necessary.
 

 When we calculate the size, it uses arch_zone_lowest_possible_pfn[] and
 memblock, and both come from boot time. If we hot add a new node, the
 calculated size is 0 too. When adding memory, __add_zone() will grow the
 size and start.

When hot adding a new node, you are right. But when hot removing memory that
was present at boot time, the memblock for that memory range is not deleted.
So when hot adding the memory again, the calculated size does not become 0.
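
The point can be sketched in user-space C. This is only a model under stated assumptions: struct region, pfn_range_for_nid() and spanned_pages_for_nid() are hypothetical names standing in for memblock and a get_pfn_range_for_nid()-style lookup, and the PFNs assume 4 KiB pages (8G = 0x200000, 16G = 0x400000).

```c
#include <stddef.h>

/* Hypothetical stand-in for one memblock region: [start_pfn, end_pfn) on nid. */
struct region {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/*
 * Simplified model of a get_pfn_range_for_nid()-style lookup: take the
 * lowest start and the highest end over all regions recorded for the node.
 * A node with no recorded regions yields start == end == 0.
 */
static void pfn_range_for_nid(const struct region *map, size_t n, int nid,
			      unsigned long *start_pfn, unsigned long *end_pfn)
{
	unsigned long start = (unsigned long)-1;
	unsigned long end = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (map[i].nid != nid)
			continue;
		if (map[i].start_pfn < start)
			start = map[i].start_pfn;
		if (map[i].end_pfn > end)
			end = map[i].end_pfn;
	}
	if (start == (unsigned long)-1)
		start = 0;
	*start_pfn = start;
	*end_pfn = end;
}

/* Size a re-initialised node would be given from the recorded map. */
static unsigned long spanned_pages_for_nid(const struct region *map, size_t n,
					   int nid)
{
	unsigned long start, end;

	pfn_range_for_nid(map, n, nid, &start, &end);
	return end - start;
}
```

If the map still holds the boot-time entry {1, 0x200000, 0x400000} after node 1 is hot removed, spanned_pages_for_nid() returns 0x200000 for node 1, while a node that never had an entry gets 0.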

Thanks,
Yasuaki Ishimatsu

 
 Thanks,
 Xishi Qiu
 


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Yasuaki Ishimatsu

On Mon, 20 Apr 2015 10:45:45 +0800
Xishi Qiu qiuxi...@huawei.com wrote:

 On 2015/4/20 9:42, Gu Zheng wrote:
 
  Hi Xishi,
  On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
  
 
  Your patches will fix your issue.
  But, if BIOS reports memory first at node hot add, pgdat can
  not be initialized.
 
  Memory hot add flows are as follows:
 
  add_memory
    ...
    -> hotadd_new_pgdat()
    ...
    -> node_set_online(nid)
 
  When calling hotadd_new_pgdat() for a hot added node, the node is
  offline because node_set_online() is not called yet. So if applying
  your patches, the pgdat is not initialized in this case.
  
  Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
  overkill.
  
 
  Thanks,
  Yasuaki Ishimatsu
 
  On Fri, 17 Apr 2015 18:50:32 +0800
  Xishi Qiu qiuxi...@huawei.com wrote:
 
  Hot remove nodeXX, then hot add nodeXX. If BIOS reports the cpu first, it will
  call hotadd_new_pgdat(nid, 0), which sets pgdat->node_start_pfn to 0. As
  nodeXX existed at boot time, pgdat->node_spanned_pages is the same as the
  original. Then free_area_init_core()->memmap_init() will pass a wrong start
  and a nonzero size.
  
  As your analysis said, the root cause here is passing *0* as the
  node_start_pfn, and the chaos then occurs when initializing the zones. This
  only happens to the re-hot-added node, so how about using the saved
  *node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
  instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
  
  Thanks,
  Gu
  
 
 Hi Gu,
 
 I considered this method first, but if the hot added node's start and size
 are different from before, it causes chaos.
 

 e.g.
 nodeXX (8-16G)
 remove nodeXX 
 BIOS report cpu first and online it
 hotadd nodeXX
 use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
 BIOS report mem(10-12G)
 call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
 the start is still 8G, not 10G, this is chaos!
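
The 8G/10G example above can be modelled in user-space C. The names pgdat_model and grow_span() are hypothetical, and the rule is an assumption modelled on how grow_pgdat_span() is understood to behave, not a copy of kernel code. PFNs assume 4 KiB pages, so 8G = 0x200000, 10G = 0x280000 and 12G = 0x300000.

```c
/* Minimal stand-in for the two pgdat fields discussed in the thread. */
struct pgdat_model {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
};

/*
 * Span-growing rule (assumed): the start only moves if the node is
 * currently empty or the new range begins below the old start; the end
 * only ever moves up.
 */
static void grow_span(struct pgdat_model *pgdat, unsigned long start_pfn,
		      unsigned long end_pfn)
{
	unsigned long old_end = pgdat->node_start_pfn + pgdat->node_spanned_pages;

	if (!pgdat->node_spanned_pages || start_pfn < pgdat->node_start_pfn)
		pgdat->node_start_pfn = start_pfn;

	pgdat->node_spanned_pages =
		(old_end > end_pfn ? old_end : end_pfn) - pgdat->node_start_pfn;
}
```

Seeding the model with the saved values (start 0x200000, 0x200000 spanned pages) and then growing by [0x280000, 0x300000) leaves the start at 0x200000. Starting from an empty node (size 0) instead moves the start to 0x280000 with 0x80000 spanned pages.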

If CONFIG_HAVE_MEMBLOCK_NODE_MAP is set, the kernel prints the following
pr_info() message.

void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
		unsigned long node_start_pfn, unsigned long *zholes_size)
{
...
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
#endif
}

Is the memory range of the message 8G - 16G?
If so, the reason is that memblk is not deleted at memory hot remove.
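
To see what that message would look like for an 8G - 16G range, here is a small user-space sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12) and uses the userspace "ll" length modifier in place of the kernel's "L". format_initmem() is a hypothetical helper, and node number 1 in the usage note is only a placeholder for nodeXX.

```c
#include <stdio.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86_64 */

/*
 * Format the node range the same way as the pr_info() call above:
 * start address and inclusive end address, both derived from PFNs.
 */
static int format_initmem(char *buf, size_t len, int nid,
			  unsigned long long start_pfn,
			  unsigned long long end_pfn)
{
	return snprintf(buf, len,
			"Initmem setup node %d [mem %#018llx-%#018llx]",
			nid, start_pfn << PAGE_SHIFT,
			(end_pfn << PAGE_SHIFT) - 1);
}
```

With start_pfn 0x200000 and end_pfn 0x400000 for node 1, the buffer holds "Initmem setup node 1 [mem 0x0000000200000000-0x00000003ffffffff]", which is the 8G - 16G range.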

Thanks,
Yasuaki Ishimatsu



 
 Thanks,
 Xishi Qiu
 


Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-19 Thread Xishi Qiu
On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:

 
 On Mon, 20 Apr 2015 10:45:45 +0800
 Xishi Qiu qiuxi...@huawei.com wrote:
 
 On 2015/4/20 9:42, Gu Zheng wrote:

 Hi Xishi,
 On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:


 Your patches will fix your issue.
 But, if BIOS reports memory first at node hot add, pgdat can
 not be initialized.

 Memory hot add flows are as follows:

 add_memory
   ...
   -> hotadd_new_pgdat()
   ...
   -> node_set_online(nid)

 When calling hotadd_new_pgdat() for a hot added node, the node is
 offline because node_set_online() is not called yet. So if applying
 your patches, the pgdat is not initialized in this case.

 Ishimatsu's worry is reasonable. And I am afraid the fix here is a bit
 overkill.


 Thanks,
 Yasuaki Ishimatsu

 On Fri, 17 Apr 2015 18:50:32 +0800
 Xishi Qiu qiuxi...@huawei.com wrote:

 Hot remove nodeXX, then hot add nodeXX. If BIOS reports the cpu first, it will
 call hotadd_new_pgdat(nid, 0), which sets pgdat->node_start_pfn to 0. As
 nodeXX existed at boot time, pgdat->node_spanned_pages is the same as the
 original. Then free_area_init_core()->memmap_init() will pass a wrong start
 and a nonzero size.

 As your analysis said, the root cause here is passing *0* as the
 node_start_pfn, and the chaos then occurs when initializing the zones. This
 only happens to the re-hot-added node, so how about using the saved
 *node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
 instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?

 Thanks,
 Gu


 Hi Gu,

 I considered this method first, but if the hot added node's start and size
 are different from before, it causes chaos.

 
 e.g.
 nodeXX (8-16G)
 remove nodeXX 
 BIOS report cpu first and online it
 hotadd nodeXX
 use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
 BIOS report mem(10-12G)
 call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
 the start is still 8G, not 10G, this is chaos!
 
 If CONFIG_HAVE_MEMBLOCK_NODE_MAP is set, the kernel prints the following
 pr_info() message.
 
 void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 		unsigned long node_start_pfn, unsigned long *zholes_size)
 {
 ...
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
 		(u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
 #endif
 }
 
 Is the memory range of the message 8G - 16G?
 If so, the reason is that memblk is not deleted at memory hot remove.
 
 Thanks,
 Yasuaki Ishimatsu
 

Hi Yasuaki,

From reading the code, I see that memblk is not deleted at memory hot remove.
I am not sure whether we should remove it. If we remove it, we should also reset
arch_zone_lowest_possible_pfn, right? It seems a little complicated.

Thanks,
Xishi Qiu

 
 

 Thanks,
 Xishi Qiu

 
 .
 





Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()

2015-04-17 Thread Yasuaki Ishimatsu

Your patches will fix your issue.
But, if BIOS reports memory first at node hot add, pgdat can
not be initialized.

Memory hot add flows are as follows:

add_memory
  ...
  -> hotadd_new_pgdat()
  ...
  -> node_set_online(nid)

When calling hotadd_new_pgdat() for a hot added node, the node is
offline because node_set_online() is not called yet. So if applying
your patches, the pgdat is not initialized in this case.

Thanks,
Yasuaki Ishimatsu

On Fri, 17 Apr 2015 18:50:32 +0800
Xishi Qiu qiuxi...@huawei.com wrote:

 Hot remove nodeXX, then hot add nodeXX. If BIOS reports the cpu first, it will
 call hotadd_new_pgdat(nid, 0), which sets pgdat->node_start_pfn to 0. As
 nodeXX existed at boot time, pgdat->node_spanned_pages is the same as the
 original. Then free_area_init_core()->memmap_init() will pass a wrong start
 and a nonzero size.
 
 free_area_init_core()
   memmap_init()
   memmap_init_zone()
   early_pfn_in_nid()
   set_page_links()
 
 "if (!early_pfn_in_nid(pfn, nid))" will skip a pfn that is memory in the
 section, but it will not skip a pfn that is a hole in the section. This
 overwrites the page's links and relinks it to the new zone/nid, so page_zone()
 differs between memory and hole pages in the same section.
 The following call trace shows the bug.
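
The relinking described above can be sketched as a toy model in user-space C. Everything here is an assumption made for illustration: page_model, pfn_in_nid() and init_range() are hypothetical names, the owner[] array stands in for the boot-time memory map, and holes are marked NO_NID. The model only reproduces the behaviour as described in this message, where a pfn owned by another node is skipped but a hole is accepted.

```c
#include <stddef.h>

#define NO_NID (-1)	/* pfn is a hole: no memory range claims it */

/* Hypothetical per-page record: which node the page is linked to. */
struct page_model {
	int linked_nid;
};

/*
 * Model of the early_pfn_in_nid() check as described: a pfn owned by a
 * different node is rejected, but a hole (owner unknown) is accepted.
 */
static int pfn_in_nid(const int *owner, unsigned long pfn, int nid)
{
	return owner[pfn] == NO_NID || owner[pfn] == nid;
}

/* Relink every accepted pfn in [start, start + size) to nid; return count. */
static size_t init_range(struct page_model *pages, const int *owner,
			 unsigned long start, unsigned long size, int nid)
{
	size_t relinked = 0;
	unsigned long pfn;

	for (pfn = start; pfn < start + size; pfn++) {
		if (!pfn_in_nid(owner, pfn, nid))
			continue;
		pages[pfn].linked_nid = nid;
		relinked++;
	}
	return relinked;
}
```

With eight pages all linked to node 0, of which pfns 2 and 3 are holes, calling init_range() for node 3 with start 0 and size 8 relinks only the two hole pages. Pages in the same block then disagree about their node, which is the kind of mismatch the BUG_ON in move_freepages() checks for.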
 
 This patch sets the node size to 0 when hot adding a node (original or
 new).
 init_currently_empty_zone() and memmap_init() will be called in add_zone(), so
 there is no need to change it.
 
 [90476.077469] kernel BUG at mm/page_alloc.c:1042!  // move_freepages() ->
 BUG_ON(page_zone(start_page) != page_zone(end_page));
 [90476.077469] invalid opcode:  [#1] SMP 
 [90476.077469] Modules linked in: iptable_nat nf_conntrack_ipv4 
 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack fuse btrfs zlib_deflate 
 raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc bridge stp llc 
 ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables 
 cfg80211 rfkill sg iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp 
 intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel 
 ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd 
 pcspkr igb vfat i2c_algo_bit dca fat sb_edac edac_core i2c_i801 lpc_ich 
 i2c_core mfd_core shpchp acpi_pad ipmi_si ipmi_msghandler uinput nfsd 
 auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif 
 crct10dif_common ahci libahci megaraid_sas tg3 ptp libata pps_core dm_mirror 
 dm_region_hash dm_log dm_mod [last unloaded: rasf]
 [90476.157382] CPU: 2 PID: 322803 Comm: updatedb Tainted: GF   W  
 O--   3.10.0-229.1.2.5.hulk.rc14.x86_64 #1
 [90476.157382] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei 
 N1, BIOS V100R001 04/13/2015
 [90476.157382] task: 88006a6d5b00 ti: 880068eb8000 task.ti: 
 880068eb8000
 [90476.157382] RIP: 0010:[81159f7f]  [81159f7f] 
 move_freepages+0x12f/0x140
 [90476.157382] RSP: 0018:880068ebb640  EFLAGS: 00010002
 [90476.157382] RAX: 880002316cc0 RBX: ea0001bd RCX: 
 0001
 [90476.157382] RDX: 880002476e40 RSI:  RDI: 
 880002316cc0
 [90476.157382] RBP: 880068ebb690 R08: 0010 R09: 
 ea0001bd7fc0
 [90476.157382] R10: 0006f5ff R11:  R12: 
 0001
 [90476.157382] R13: 0003 R14: 880002316eb8 R15: 
 ea0001bd7fc0
 [90476.157382] FS:  7f4d3ab95740() GS:880033a0() 
 knlGS:
 [90476.157382] CS:  0010 DS:  ES:  CR0: 80050033
 [90476.157382] CR2: 7f4d3ae1a808 CR3: 00018907a000 CR4: 
 001407e0
 [90476.157382] DR0:  DR1:  DR2: 
 
 [90476.157382] DR3:  DR6: fffe0ff0 DR7: 
 0400
 [90476.157382] Stack:
 [90476.157382]  880068ebb698 880002316cc0 a800b5378098 
 880068ebb698
 [90476.157382]  810b11dc 880002316cc0 0001 
 0003
 [90476.157382]  880002316eb8 ea0001bd6420 880068ebb6a0 
 8115a003
 [90476.157382] Call Trace:
 [90476.157382]  [810b11dc] ? update_curr+0xcc/0x150
 [90476.157382]  [8115a003] move_freepages_block+0x73/0x80
 [90476.157382]  [8115b9ba] __rmqueue+0x26a/0x460
 [90476.157382]  [8101ba53] ? native_sched_clock+0x13/0x80
 [90476.157382]  [8115e172] get_page_from_freelist+0x7f2/0xd30
 [90476.157382]  [81012639] ? __switch_to+0x179/0x4a0
 [90476.157382]  [a01fc0d7] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
 [90476.157382]  [8115e871] __alloc_pages_nodemask+0x1c1/0xc90
 [90476.157382]  [a01ab24c] ? _xfs_buf_ioapply+0x31c/0x420 [xfs]
 [90476.157382]  [8109cb0d] ? down_trylock+0x2d/0x40
 [90476.157382]  [a01abfff] ? xfs_buf_trylock+0x1f/0x80 [xfs]
 [90476.157382]  [8119d229] alloc_pages_current+0xa9/0x170
 [90476.157382]  [811a7225] new_slab+0x275/0x300