Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-03-14 Thread Sachin Sant


> On 13-Mar-2020, at 5:05 PM, Vlastimil Babka  wrote:
> 
> On 3/13/20 12:12 PM, Srikar Dronamraju wrote:
>> * Michael Ellerman  [2020-03-13 21:48:06]:
>> 
>>> Sachin Sant  writes:
> The patch below might work. Sachin can you test this? I tried faking up
> a system with a memoryless node zero but couldn't get it to even start
> booting.
> 
 The patch did not help. The kernel crashed during
 the boot with the same call trace.
 
 BUG_ON() introduced with the patch was not triggered.
>>> 
>>> OK, that's weird.
>>> 
>>> I eventually managed to get a memoryless node going in sim, and it
>>> appears to work there.
>>> 
>>> eg in dmesg:
>>> 
>>>  [0.00][T0] numa:   NODE_DATA [mem 0x2000fffa2f80-0x2000fffa7fff]
>>>  [0.00][T0] numa: NODE_DATA(0) on node 1
>>>  [0.00][T0] numa:   NODE_DATA [mem 0x2000fff9df00-0x2000fffa2f7f]
>>>  ...
>>>  [0.00][T0] Early memory node ranges
>>>  [0.00][T0]   node   1: [mem 0x-0x]
>>>  [0.00][T0]   node   1: [mem 0x2000-0x2000]
>>>  [0.00][T0] Could not find start_pfn for node 0
>>>  [0.00][T0] Initmem setup node 0 [mem 0x-0x]
>>>  [0.00][T0] On node 0 totalpages: 0
>>>  [0.00][T0] Initmem setup node 1 [mem 0x-0x2000]
>>>  [0.00][T0] On node 1 totalpages: 131072
>>> 
>>>  # dmesg | grep set_numa
>>>  [0.00][T0] set_numa_mem: mem node for 0 = 1
>>>  [0.005654][T0] set_numa_mem: mem node for 1 = 1
>>> 
>>> So is the problem more than just node zero having no memory?
>>> 

I tried with just the patch Michael suggested on top of March 13 next tree.
I still see the same failure. Here is a snippet from the log

[0.00] numa:   NODE_DATA [mem 0x8bfedc900-0x8bfee3fff]
[0.00] numa: NODE_DATA(0) on node 1
[0.00] numa:   NODE_DATA [mem 0x8bfed5200-0x8bfedc8ff]
[0.00] rfi-flush: fallback displacement flush available
[0.00] rfi-flush: mttrig type flush available
[0.00] link-stack-flush: software flush enabled.
[0.00] count-cache-flush: software flush disabled.
[0.00] stf-barrier: eieio barrier available
[0.00] lpar: H_BLOCK_REMOVE supports base psize:0 psize:0 block size:8
[0.00] lpar: H_BLOCK_REMOVE supports base psize:0 psize:2 block size:8
[0.00] lpar: H_BLOCK_REMOVE supports base psize:0 psize:10 block size:8
[0.00] lpar: H_BLOCK_REMOVE supports base psize:2 psize:2 block size:8
[0.00] lpar: H_BLOCK_REMOVE supports base psize:2 psize:10 block size:8
[0.00] PPC64 nvram contains 15360 bytes
[0.00] barrier-nospec: using ORI speculation barrier
[0.00] Zone ranges:
[0.00]   Normal   [mem 0x-0x0008bfff]
[0.00]   Device   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   1: [mem 0x-0x0008bfff]
[0.00] Could not find start_pfn for node 0
[0.00] Initmem setup node 0 [mem 0x-0x]
[0.00] Initmem setup node 1 [mem 0x-0x0008bfff]
[0.00] percpu: Embedded 11 pages/cpu s624024 r0 d96872 u1048576
[0.00] Built 2 zonelists, mobility grouping on.  Total pages: 572880

Have attached the complete boot log.

Thanks
-Sachin



kernel-boot.log
Description: Binary data


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-03-13 Thread Vlastimil Babka
On 3/13/20 12:12 PM, Srikar Dronamraju wrote:
> * Michael Ellerman  [2020-03-13 21:48:06]:
> 
>> Sachin Sant  writes:
>> >> The patch below might work. Sachin can you test this? I tried faking up
>> >> a system with a memoryless node zero but couldn't get it to even start
>> >> booting.
>> >> 
>> > The patch did not help. The kernel crashed during
>> > the boot with the same call trace.
>> >
>> > BUG_ON() introduced with the patch was not triggered.
>> 
>> OK, that's weird.
>> 
>> I eventually managed to get a memoryless node going in sim, and it
>> appears to work there.
>> 
>> eg in dmesg:
>> 
>>   [0.00][T0] numa:   NODE_DATA [mem 0x2000fffa2f80-0x2000fffa7fff]
>>   [0.00][T0] numa: NODE_DATA(0) on node 1
>>   [0.00][T0] numa:   NODE_DATA [mem 0x2000fff9df00-0x2000fffa2f7f]
>>   ...
>>   [0.00][T0] Early memory node ranges
>>   [0.00][T0]   node   1: [mem 0x-0x]
>>   [0.00][T0]   node   1: [mem 0x2000-0x2000]
>>   [0.00][T0] Could not find start_pfn for node 0
>>   [0.00][T0] Initmem setup node 0 [mem 0x-0x]
>>   [0.00][T0] On node 0 totalpages: 0
>>   [0.00][T0] Initmem setup node 1 [mem 0x-0x2000]
>>   [0.00][T0] On node 1 totalpages: 131072
>>   
>>   # dmesg | grep set_numa
>>   [0.00][T0] set_numa_mem: mem node for 0 = 1
>>   [0.005654][T0] set_numa_mem: mem node for 1 = 1
>> 
>> So is the problem more than just node zero having no memory?
>> 
> 
> The problem would happen with possible nodes which are not yet present, i.e.
> no CPUs and no memory attached to those nodes.
> 
> Please look at
> http://lore.kernel.org/lkml/20200312131438.gb3...@linux.vnet.ibm.com/t/#u
> for more details.
> 
> The summary being: pgdat/Node_Data for such nodes is not allocated. Hence

Michael's log shows that his pgdat is still allocated. But perhaps Sachin also
had your 3 patches from the other thread applied, in addition to Michael's
patch. In that case the pgdat for node 0 would indeed no longer be allocated,
and thus the SLUB code was crashing in node_present_pages() instead.

> a node_present_pages(nid) call where nid is a possible but not yet
> present node fails. Currently node_present_pages(nid) and node_to_mem_node()
> don't seem to be equipped to handle possible but not present nodes.
> 
>> cheers
> 



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-03-13 Thread Srikar Dronamraju
* Michael Ellerman  [2020-03-13 21:48:06]:

> Sachin Sant  writes:
> >> The patch below might work. Sachin can you test this? I tried faking up
> >> a system with a memoryless node zero but couldn't get it to even start
> >> booting.
> >> 
> > The patch did not help. The kernel crashed during
> > the boot with the same call trace.
> >
> > BUG_ON() introduced with the patch was not triggered.
> 
> OK, that's weird.
> 
> I eventually managed to get a memoryless node going in sim, and it
> appears to work there.
> 
> eg in dmesg:
> 
>   [0.00][T0] numa:   NODE_DATA [mem 0x2000fffa2f80-0x2000fffa7fff]
>   [0.00][T0] numa: NODE_DATA(0) on node 1
>   [0.00][T0] numa:   NODE_DATA [mem 0x2000fff9df00-0x2000fffa2f7f]
>   ...
>   [0.00][T0] Early memory node ranges
>   [0.00][T0]   node   1: [mem 0x-0x]
>   [0.00][T0]   node   1: [mem 0x2000-0x2000]
>   [0.00][T0] Could not find start_pfn for node 0
>   [0.00][T0] Initmem setup node 0 [mem 0x-0x]
>   [0.00][T0] On node 0 totalpages: 0
>   [0.00][T0] Initmem setup node 1 [mem 0x-0x2000]
>   [0.00][T0] On node 1 totalpages: 131072
>   
>   # dmesg | grep set_numa
>   [0.00][T0] set_numa_mem: mem node for 0 = 1
>   [0.005654][T0] set_numa_mem: mem node for 1 = 1
> 
> So is the problem more than just node zero having no memory?
> 

The problem would happen with possible nodes which are not yet present, i.e.
no CPUs and no memory attached to those nodes.

Please look at
http://lore.kernel.org/lkml/20200312131438.gb3...@linux.vnet.ibm.com/t/#u
for more details.

The summary being: pgdat/Node_Data for such nodes is not allocated. Hence
a node_present_pages(nid) call where nid is a possible but not yet
present node fails. Currently node_present_pages(nid) and node_to_mem_node()
don't seem to be equipped to handle possible but not present nodes.
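
To make the failure mode concrete, here is a small userspace C model. It is
not kernel code: struct pgdat_model, node_data_model and present_pages_safe()
are hypothetical names invented for illustration. The only assumption taken
from the description above is that a possible but not present node has no
pgdat allocated, so an unguarded dereference would be invalid.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for the per-node pgdat; only one field is modelled. */
struct pgdat_model {
	unsigned long node_present_pages;
};

static struct pgdat_model node1 = { .node_present_pages = 131072 };

/* Nodes 0, 2 and 3 are "possible" but have no pgdat allocated (NULL). */
static struct pgdat_model *node_data_model[MAX_NODES] = {
	NULL, &node1, NULL, NULL
};

/*
 * Guarded lookup: report zero present pages for a node without a pgdat
 * instead of dereferencing a NULL pointer.
 */
static unsigned long present_pages_safe(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	if (node_data_model[nid] == NULL)
		return 0;
	return node_data_model[nid]->node_present_pages;
}
```

Whether a guard of this kind belongs in the accessor or in the callers is
exactly the open question; the model only shows what "equipped to handle"
could mean.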

> cheers

-- 
Thanks and Regards
Srikar Dronamraju



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-03-13 Thread Michael Ellerman
Sachin Sant  writes:
>> The patch below might work. Sachin can you test this? I tried faking up
>> a system with a memoryless node zero but couldn't get it to even start
>> booting.
>> 
> The patch did not help. The kernel crashed during
> the boot with the same call trace.
>
> BUG_ON() introduced with the patch was not triggered.

OK, that's weird.

I eventually managed to get a memoryless node going in sim, and it
appears to work there.

eg in dmesg:

  [0.00][T0] numa:   NODE_DATA [mem 0x2000fffa2f80-0x2000fffa7fff]
  [0.00][T0] numa: NODE_DATA(0) on node 1
  [0.00][T0] numa:   NODE_DATA [mem 0x2000fff9df00-0x2000fffa2f7f]
  ...
  [0.00][T0] Early memory node ranges
  [0.00][T0]   node   1: [mem 0x-0x]
  [0.00][T0]   node   1: [mem 0x2000-0x2000]
  [0.00][T0] Could not find start_pfn for node 0
  [0.00][T0] Initmem setup node 0 [mem 0x-0x]
  [0.00][T0] On node 0 totalpages: 0
  [0.00][T0] Initmem setup node 1 [mem 0x-0x2000]
  [0.00][T0] On node 1 totalpages: 131072
  
  # dmesg | grep set_numa
  [0.00][T0] set_numa_mem: mem node for 0 = 1
  [0.005654][T0] set_numa_mem: mem node for 1 = 1

So is the problem more than just node zero having no memory?

cheers


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-03-12 Thread Sachin Sant
> The patch below might work. Sachin can you test this? I tried faking up
> a system with a memoryless node zero but couldn't get it to even start
> booting.
> 
The patch did not help. The kernel crashed during
the boot with the same call trace.

BUG_ON() introduced with the patch was not triggered.

Thanks
-Sachin

> cheers
> 
> 
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 9b4f5fb719e0..d1f11437f6c4 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -282,6 +282,9 @@ void __init mem_init(void)
>*/
>   BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
> 
> + BUG_ON(smp_processor_id() != boot_cpuid);
> + set_numa_mem(local_memory_node(numa_cpu_lookup_table[boot_cpuid]));
> +
> #ifdef CONFIG_SWIOTLB
>   /*
>* Some platforms (e.g. 85xx) limit DMA-able memory way below



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-03-12 Thread Michael Ellerman
Michal Hocko  writes:
> On Thu 27-02-20 19:26:54, Michal Hocko wrote:
>> [Cc ppc maintainers]
> [...]
>> Please have a look at 
>> http://lkml.kernel.org/r/52ef4673-7292-4c4c-b459-af583951b...@linux.vnet.ibm.com
>> for the boot log with the debugging patch which tracks set_numa_mem.
>> This seems to lead to a crash in the slab allocator because
>> node_to_mem_node(0) for memory less node resolves to the memory less
>> node http://lkml.kernel.org/r/dd450314-d428-6776-af07-f92c04c7b...@suse.cz.
>> The original report is 
>> http://lkml.kernel.org/r/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com
>
> ping 

The obvious fix is:

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 37c12e3bab9e..33b1fca0b258 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -892,6 +892,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
+   set_numa_mem(local_memory_node(numa_cpu_lookup_table[boot_cpuid]));
current_set[boot_cpuid] = current;
 }


But that doesn't work because smp_prepare_boot_cpu() is called too
early:

asmlinkage __visible void __init start_kernel(void)
{
...
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
boot_cpu_hotplug_init();

build_all_zonelists(NULL);


And local_memory_node() uses first_zones_zonelist() which doesn't work
prior to build_all_zonelists() being called.
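
The ordering problem can be sketched with a toy userspace model. Everything
here is hypothetical (the *_model names, the NO_ANSWER marker, the two-node
layout). In particular, the real kernel does not return a clean sentinel
before the zonelists exist; the marker only stands in for "the lookup is not
usable yet".

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 2
#define NO_ANSWER (-1)	/* model-only marker: lookup not usable yet */

/* Node 0 is memoryless, node 1 has memory (as in the report). */
static const unsigned long pages_model[MAX_NODES] = { 0, 131072 };

static int nearest_mem_node[MAX_NODES];
static bool zonelists_built;

/*
 * Model of the zonelist build step: for each node, record the first node
 * with memory, scanning from the node itself and wrapping around.
 */
static void build_all_zonelists_model(void)
{
	for (int n = 0; n < MAX_NODES; n++) {
		nearest_mem_node[n] = NO_ANSWER;
		for (int i = 0; i < MAX_NODES; i++) {
			int cand = (n + i) % MAX_NODES;

			if (pages_model[cand] > 0) {
				nearest_mem_node[n] = cand;
				break;
			}
		}
	}
	zonelists_built = true;
}

/* Model of local_memory_node(): only meaningful after the build step. */
static int local_memory_node_model(int node)
{
	assert(node >= 0 && node < MAX_NODES);
	if (!zonelists_built)
		return NO_ANSWER;
	return nearest_mem_node[node];
}
```

Calling local_memory_node_model(0) before build_all_zonelists_model() gives
no usable node, while the same call afterwards resolves node 0 to node 1.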


The patch below might work. Sachin can you test this? I tried faking up
a system with a memoryless node zero but couldn't get it to even start
booting.

cheers


diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9b4f5fb719e0..d1f11437f6c4 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -282,6 +282,9 @@ void __init mem_init(void)
 */
BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
 
+   BUG_ON(smp_processor_id() != boot_cpuid);
+   set_numa_mem(local_memory_node(numa_cpu_lookup_table[boot_cpuid]));
+
 #ifdef CONFIG_SWIOTLB
/*
 * Some platforms (e.g. 85xx) limit DMA-able memory way below


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-03-10 Thread Michal Hocko
On Thu 27-02-20 19:26:54, Michal Hocko wrote:
> [Cc ppc maintainers]
[...]
> Please have a look at 
> http://lkml.kernel.org/r/52ef4673-7292-4c4c-b459-af583951b...@linux.vnet.ibm.com
> for the boot log with the debugging patch which tracks set_numa_mem.
> This seems to lead to a crash in the slab allocator because
> node_to_mem_node(0) for memory less node resolves to the memory less
> node http://lkml.kernel.org/r/dd450314-d428-6776-af07-f92c04c7b...@suse.cz.
> The original report is 
> http://lkml.kernel.org/r/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com

ping 
-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-27 Thread Michal Hocko
[Cc ppc maintainers]
On Thu 27-02-20 17:16:41, Vlastimil Babka wrote:
> On 2/27/20 5:00 PM, Sachin Sant wrote:
> > 
> > 
> >> On 27-Feb-2020, at 5:42 PM, Michal Hocko  wrote:
> >> 
> >> A very good hint indeed. I would do this
> >> diff --git a/include/linux/topology.h b/include/linux/topology.h
> >> index eb2fe6edd73c..d9f1b6737e4d 100644
> >> --- a/include/linux/topology.h
> >> +++ b/include/linux/topology.h
> >> @@ -137,6 +137,8 @@ static inline void set_numa_mem(int node)
> >> {
> >>this_cpu_write(_numa_mem_, node);
> >>_node_numa_mem_[numa_node_id()] = node;
> >> +  pr_info("%s %d -> %d\n", __FUNCTION__, numa_node_id(), node);
> >> +  dump_stack();
> >> }
> >> #endif
> >> 
> >> Btw. it would be also helpful to get
> >> `faddr2line ___slab_alloc+0x334' from your kernel Sachin.
> > 
> > [linux-next]# ./scripts/faddr2line ./vmlinux ___slab_alloc+0x334 
> > ___slab_alloc+0x334/0x760:
> > new_slab_objects at mm/slub.c:2478
> > (inlined by) ___slab_alloc at mm/slub.c:2628
> > [linux-next]# 
> 
> Hmm that doesn't look relevant, but that address was marked as unreliable, no?
> Don't we actually need this one?
> 
> [8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
> 
> > I have also attached the boot log from a kernel that includes the above change.
> > I see the following o/p during boot:
> > 
> > [0.005269] set_numa_mem 1 -> 1
> 
> So there's no "set_numa_mem 0 -> X", specifically not
> "set_numa_mem 0 -> 1" which I would have expected. That seems to confirm my
> suspicion that the arch code doesn't set up the memoryless node 0 properly.

Please have a look at 
http://lkml.kernel.org/r/52ef4673-7292-4c4c-b459-af583951b...@linux.vnet.ibm.com
for the boot log with the debugging patch which tracks set_numa_mem.
This seems to lead to a crash in the slab allocator because
node_to_mem_node(0) for memory less node resolves to the memory less
node http://lkml.kernel.org/r/dd450314-d428-6776-af07-f92c04c7b...@suse.cz.
The original report is 
http://lkml.kernel.org/r/3381cd91-ab3d-4773-ba04-e7a072a63...@linux.vnet.ibm.com

> 
> > [0.005270] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.6.0-rc3-next-20200227-autotest+ #6
> > [0.005271] Call Trace:
> > [0.005272] [c008b37dfe80] [c0b5d948] dump_stack+0xbc/0x104 (unreliable)
> > [0.005274] [c008b37dfec0] [c0059320] start_secondary+0x600/0x6e0
> > [0.005277] [c008b37dff90] [c000ac54] start_secondary_prolog+0x10/0x14
> > 
> > Thanks
> > -Sachin
> > 

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-27 Thread Vlastimil Babka
On 2/27/20 5:00 PM, Sachin Sant wrote:
> 
> 
>> On 27-Feb-2020, at 5:42 PM, Michal Hocko  wrote:
>> 
>> A very good hint indeed. I would do this
>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>> index eb2fe6edd73c..d9f1b6737e4d 100644
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -137,6 +137,8 @@ static inline void set_numa_mem(int node)
>> {
>>  this_cpu_write(_numa_mem_, node);
>>  _node_numa_mem_[numa_node_id()] = node;
>> +pr_info("%s %d -> %d\n", __FUNCTION__, numa_node_id(), node);
>> +dump_stack();
>> }
>> #endif
>> 
>> Btw. it would be also helpful to get
>> `faddr2line ___slab_alloc+0x334' from your kernel Sachin.
> 
> [linux-next]# ./scripts/faddr2line ./vmlinux ___slab_alloc+0x334 
> ___slab_alloc+0x334/0x760:
> new_slab_objects at mm/slub.c:2478
> (inlined by) ___slab_alloc at mm/slub.c:2628
> [linux-next]# 

Hmm that doesn't look relevant, but that address was marked as unreliable, no?
Don't we actually need this one?

[8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760

> I have also attached the boot log from a kernel that includes the above change.
> I see the following o/p during boot:
> 
> [0.005269] set_numa_mem 1 -> 1

So there's no "set_numa_mem 0 -> X", specifically not
"set_numa_mem 0 -> 1" which I would have expected. That seems to confirm my
suspicion that the arch code doesn't set up the memoryless node 0 properly.
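
A rough userspace model of why no "set_numa_mem 0 -> X" line would ever
appear: in the quoted diff, set_numa_mem() only writes the table entry for the
node of the CPU that calls it. The *_model names and current_cpu_node are
hypothetical, and the zero-initialised table is an assumption about how the
real array starts out.

```c
#include <assert.h>

#define MAX_NODES 2

/* Assumed to start zeroed, so every entry initially maps to node 0. */
static int node_numa_mem_model[MAX_NODES];

/* Hypothetical: node of the CPU that is currently running. */
static int current_cpu_node;

/*
 * Same shape as set_numa_mem() in the quoted diff: only the entry for the
 * calling CPU's own node is written.
 */
static void set_numa_mem_model(int mem_node)
{
	assert(current_cpu_node >= 0 && current_cpu_node < MAX_NODES);
	node_numa_mem_model[current_cpu_node] = mem_node;
}

static int node_to_mem_node_model(int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return node_numa_mem_model[node];
}

/* Bring up "CPUs": all of them sit on node 1, as in the numactl output. */
static void bring_up_cpus_model(int ncpus)
{
	for (int cpu = 0; cpu < ncpus; cpu++) {
		current_cpu_node = 1;
		set_numa_mem_model(1);	/* would log "set_numa_mem 1 -> 1" */
	}
}
```

Since node 0 has no CPUs, nothing in this model ever writes entry 0, and
node_to_mem_node_model(0) keeps returning the memoryless node 0.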

> [0.005270] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.6.0-rc3-next-20200227-autotest+ #6
> [0.005271] Call Trace:
> [0.005272] [c008b37dfe80] [c0b5d948] dump_stack+0xbc/0x104 (unreliable)
> [0.005274] [c008b37dfec0] [c0059320] start_secondary+0x600/0x6e0
> [0.005277] [c008b37dff90] [c000ac54] start_secondary_prolog+0x10/0x14
> 
> Thanks
> -Sachin
> 



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-27 Thread Sachin Sant


> On 27-Feb-2020, at 5:42 PM, Michal Hocko  wrote:
> 
> A very good hint indeed. I would do this
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index eb2fe6edd73c..d9f1b6737e4d 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -137,6 +137,8 @@ static inline void set_numa_mem(int node)
> {
>   this_cpu_write(_numa_mem_, node);
>   _node_numa_mem_[numa_node_id()] = node;
> + pr_info("%s %d -> %d\n", __FUNCTION__, numa_node_id(), node);
> + dump_stack();
> }
> #endif
> 
> Btw. it would be also helpful to get
> `faddr2line ___slab_alloc+0x334' from your kernel Sachin.

[linux-next]# ./scripts/faddr2line ./vmlinux ___slab_alloc+0x334 
___slab_alloc+0x334/0x760:
new_slab_objects at mm/slub.c:2478
(inlined by) ___slab_alloc at mm/slub.c:2628
[linux-next]# 

I have also attached the boot log from a kernel that includes the above change.
I see the following o/p during boot:

[0.005269] set_numa_mem 1 -> 1
[0.005270] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.6.0-rc3-next-20200227-autotest+ #6
[0.005271] Call Trace:
[0.005272] [c008b37dfe80] [c0b5d948] dump_stack+0xbc/0x104 (unreliable)
[0.005274] [c008b37dfec0] [c0059320] start_secondary+0x600/0x6e0
[0.005277] [c008b37dff90] [c000ac54] start_secondary_prolog+0x10/0x14

Thanks
-Sachin



boot.log
Description: Binary data


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-27 Thread Michal Hocko
On Wed 26-02-20 23:29:24, Vlastimil Babka wrote:
> On 2/26/20 10:45 PM, Vlastimil Babka wrote:
> > 
> > 
> > if (node == NUMA_NO_NODE)
> > page = alloc_pages(flags, order);
> > else
> > page = __alloc_pages_node(node, flags, order);
> > 
> > So yeah looks like SLUB's kmalloc_node() is supposed to behave like the
> > page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not
> > enforce it by itself. There's probably just some missing data structure
> > initialization somewhere right now for memoryless nodes.
> 
> Upon more digging, I think the problem could manifest if
> node_to_mem_node(0) (_node_numa_mem_[0]) returned 0 instead of 1,
> because it wasn't initialized properly for a memoryless node. Can you
> e.g. print it somewhere?

A very good hint indeed. I would do this
diff --git a/include/linux/topology.h b/include/linux/topology.h
index eb2fe6edd73c..d9f1b6737e4d 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -137,6 +137,8 @@ static inline void set_numa_mem(int node)
 {
this_cpu_write(_numa_mem_, node);
_node_numa_mem_[numa_node_id()] = node;
+   pr_info("%s %d -> %d\n", __FUNCTION__, numa_node_id(), node);
+   dump_stack();
 }
 #endif
 
Btw. it would be also helpful to get
`faddr2line ___slab_alloc+0x334' from your kernel Sachin.
-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-27 Thread Michal Hocko
On Wed 26-02-20 22:45:52, Vlastimil Babka wrote:
> On 2/26/20 7:41 PM, Michal Hocko wrote:
> > On Wed 26-02-20 18:25:28, Cristopher Lameter wrote:
> >> On Mon, 24 Feb 2020, Michal Hocko wrote:
> >>
> >>> Hmm, nasty. Is there any reason why kmalloc_node behaves differently
> >>> from the page allocator?
> >>
> >> The page allocator will do the same thing if you pass GFP_THISNODE and
> >> insist on allocating memory from a node that does not exist.
> > 
> > I do not think that the page allocator would blow up even with
> > GFP_THISNODE. The allocation would just fail on memory less node.
> > 
> > Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
> > semantic right? At least I do not see anything like that documented
> > anywhere.
> 
> Seems like SLAB at least behaves like the page allocator. See
> cache_alloc_node() where it basically does:
> 
> page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid);
> ...
> if (!page)
>   fallback_alloc(cachep, flags)
> 
> gfp_exact_node() adds __GFP_THISNODE among other things, so the initial
> attempt does try to stick only to the given node. But fallback_alloc()
> doesn't. In fact, even if kmalloc_node() was called with __GFP_THISNODE
> then it wouldn't work as intended, as fallback_alloc() doesn't get the
> nodeid, but instead will use numa_mem_id(). That part could probably be
> improved.
> 
> SLUB's ___slab_alloc() has for example this:
> if (node != NUMA_NO_NODE && !node_present_pages(node))

Hmm, just a quick note. Shouldn't this be node_managed_pages? In most
cases the difference is negligible but I can imagine crazy setups where
all present pages are simply consumed.

> searchnode = node_to_mem_node(node);
> 
> That's from Joonsoo's 2014 commit a561ce00b09e ("slub: fall back to
> node_to_mem_node() node if allocating on memoryless node"), suggesting
> that the scenario in this bug report should work. Perhaps it just got
> broken unintentionally later.

A very good reference. Thanks!

> And AFAICS the whole path leading to alloc_slab_page() also doesn't add
> __GFP_THISNODE, but will keep it if caller passed it, and ultimately it
> does:
> 
> 
> if (node == NUMA_NO_NODE)
> page = alloc_pages(flags, order);
> else
> page = __alloc_pages_node(node, flags, order);
> 
> So yeah looks like SLUB's kmalloc_node() is supposed to behave like the
> page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not
> enforce it by itself. There's probably just some missing data structure
> initialization somewhere right now for memoryless nodes.

Thanks for the confirmation!
-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-26 Thread Vlastimil Babka
On 2/26/20 10:45 PM, Vlastimil Babka wrote:
> 
> 
> if (node == NUMA_NO_NODE)
> page = alloc_pages(flags, order);
> else
> page = __alloc_pages_node(node, flags, order);
> 
> So yeah looks like SLUB's kmalloc_node() is supposed to behave like the
> page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not
> enforce it by itself. There's probably just some missing data structure
> initialization somewhere right now for memoryless nodes.

Upon more digging, I think the problem could manifest if
node_to_mem_node(0) (_node_numa_mem_[0]) returned 0 instead of 1,
because it wasn't initialized properly for a memoryless node. Can you
e.g. print it somewhere?


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-26 Thread Vlastimil Babka
On 2/26/20 7:41 PM, Michal Hocko wrote:
> On Wed 26-02-20 18:25:28, Cristopher Lameter wrote:
>> On Mon, 24 Feb 2020, Michal Hocko wrote:
>>
>>> Hmm, nasty. Is there any reason why kmalloc_node behaves differently
>>> from the page allocator?
>>
>> The page allocator will do the same thing if you pass GFP_THISNODE and
>> insist on allocating memory from a node that does not exist.
> 
> I do not think that the page allocator would blow up even with
> GFP_THISNODE. The allocation would just fail on memory less node.
> 
> Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
> semantic right? At least I do not see anything like that documented
> anywhere.

Seems like SLAB at least behaves like the page allocator. See
cache_alloc_node() where it basically does:

page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid);
...
if (!page)
fallback_alloc(cachep, flags)

gfp_exact_node() adds __GFP_THISNODE among other things, so the initial
attempt does try to stick only to the given node. But fallback_alloc()
doesn't. In fact, even if kmalloc_node() was called with __GFP_THISNODE
then it wouldn't work as intended, as fallback_alloc() doesn't get the
nodeid, but instead will use numa_mem_id(). That part could probably be
improved.

SLUB's ___slab_alloc() has for example this:
if (node != NUMA_NO_NODE && !node_present_pages(node))
searchnode = node_to_mem_node(node);

That's from Joonsoo's 2014 commit a561ce00b09e ("slub: fall back to
node_to_mem_node() node if allocating on memoryless node"), suggesting
that the scenario in this bug report should work. Perhaps it just got
broken unintentionally later.

And AFAICS the whole path leading to alloc_slab_page() also doesn't add
__GFP_THISNODE, but will keep it if caller passed it, and ultimately it
does:


if (node == NUMA_NO_NODE)
page = alloc_pages(flags, order);
else
page = __alloc_pages_node(node, flags, order);

So yeah looks like SLUB's kmalloc_node() is supposed to behave like the
page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not
enforce it by itself. There's probably just some missing data structure
initialization somewhere right now for memoryless nodes.
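
For illustration, the two-line SLUB check quoted above can be modelled in
userspace C. This is a sketch under assumptions, not the mm/slub.c code:
pick_searchnode(), searchnode_has_memory(), NUMA_NO_NODE_MODEL and the
two-node layout are hypothetical, and the node-to-memory-node table is passed
in so that a correctly initialised table can be compared with a broken one.

```c
#include <assert.h>

#define NUMA_NO_NODE_MODEL (-1)
#define MAX_NODES 2

/* Node 0 is memoryless, node 1 has memory. */
static const unsigned long present_pages_model[MAX_NODES] = { 0, 131072 };

/*
 * Pick the node to search: if the requested node has no present pages,
 * redirect to whatever the node-to-memory-node table says.
 */
static int pick_searchnode(int node, const int node_to_mem[MAX_NODES])
{
	int searchnode = node;

	assert(node == NUMA_NO_NODE_MODEL || (node >= 0 && node < MAX_NODES));
	if (node != NUMA_NO_NODE_MODEL && present_pages_model[node] == 0)
		searchnode = node_to_mem[node];
	return searchnode;
}

/* True if the chosen node could actually back an allocation. */
static int searchnode_has_memory(int searchnode)
{
	return searchnode == NUMA_NO_NODE_MODEL ||
	       present_pages_model[searchnode] > 0;
}
```

With a table of { 1, 1 } a request for node 0 is redirected to node 1. With
{ 0, 1 }, which is what an uninitialised entry for node 0 would look like in
this model, the redirect lands back on the memoryless node.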


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-26 Thread Michal Hocko
On Wed 26-02-20 12:31:56, David Rientjes wrote:
> On Wed, 26 Feb 2020, Michal Hocko wrote:
> 
> > On Wed 26-02-20 18:44:13, Cristopher Lameter wrote:
> > > On Wed, 26 Feb 2020, Michal Hocko wrote:
> > > 
> > > > Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
> > > > semantic right? At least I do not see anything like that documented
> > > > anywhere.
> > > 
> > > Kmalloc_node does not support memory policies etc. Only kmalloc does.
> > > kmalloc_node is mostly used by subsystems that have determined the active
> > > nodes and want a targeted allocation on those nodes.
> >  
> > I am sorry but I have a hard time following your responses here. They open
> > more questions than they answer for me. The primary point here is that
> > kmalloc_node on a memoryless node blows up and panics the kernel. I
> > strongly believe this is a bug. We cannot really make all callers of
> > kmalloc_node and co. hotplug aware.
> > 
> > Another question is the semantic of kmalloc_node when the node cannot
> > satisfy the request. I have always thought that the allocation would
> > simply fall back to any other node unless __GFP_THISNODE is explicitly
> > specified.
> > 
> 
> Am I right in classifying this as a trade-off between an 
> unlikely(!node_state(nid, N_MEMORY)) directly in kmalloc_node() vs fixing 
> up a caller passing a memoryless nid?

The thing is that any check for node online/populated followed by the
allocation is inherently racy without using memory hotplug locking
around that and I am pretty sure this is a step into a wrong direction.

Is there any problem with initializing SLUB's internal data structures for all
possible nodes? This wouldn't require any checks in hot paths.

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-26 Thread Michal Hocko
On Wed 26-02-20 18:44:13, Cristopher Lameter wrote:
> On Wed, 26 Feb 2020, Michal Hocko wrote:
> 
> > Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
> > semantic right? At least I do not see anything like that documented
> > anywhere.
> 
> Kmalloc_node does not support memory policies etc. Only kmalloc does.
> kmalloc_node is mostly used by subsystems that have determined the active
> nodes and want a targeted allocation on those nodes.
 
I am sorry but I have a hard time following your responses here. They open
more questions than they answer for me. The primary point here is that
kmalloc_node on a memoryless node blows up and panics the kernel. I
strongly believe this is a bug. We cannot really make all callers of
kmalloc_node and co. hotplug aware.

Another question is the semantic of kmalloc_node when the node cannot
satisfy the request. I have always thought that the allocation would
simply fall back to any other node unless __GFP_THISNODE is explicitly
specified.

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-26 Thread Christopher Lameter
On Wed, 26 Feb 2020, Michal Hocko wrote:

> Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
> semantic right? At least I do not see anything like that documented
> anywhere.

Kmalloc_node does not support memory policies etc. Only kmalloc does.
kmalloc_node is mostly used by subsystems that have determined the active
nodes and want a targeted allocation on those nodes.




Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-26 Thread Michal Hocko
On Wed 26-02-20 18:25:28, Cristopher Lameter wrote:
> On Mon, 24 Feb 2020, Michal Hocko wrote:
> 
> > Hmm, nasty. Is there any reason why kmalloc_node behaves differently
> > from the page allocator?
> 
> The page allocator will do the same thing if you pass GFP_THISNODE and
> insist on allocating memory from a node that does not exist.

I do not think that the page allocator would blow up even with
GFP_THISNODE. The allocation would just fail on memory less node.

Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
semantic right? At least I do not see anything like that documented
anywhere.

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-26 Thread Christopher Lameter
On Mon, 24 Feb 2020, Michal Hocko wrote:

> Hmm, nasty. Is there any reason why kmalloc_node behaves differently
> from the page allocator?

The page allocator will do the same thing if you pass GFP_THISNODE and
insist on allocating memory from a node that does not exist.


> > > A short summary. kmalloc_node blows up when trying to allocate from a
> > > memory less node.
> >
> > Use kmalloc instead? And set a memory allocation policy?
>
> The current code (memcg_expand_one_shrinker_map resp. memcg_alloc_shrinker_maps)
> already uses kvmalloc. Kirill's patch wanted to place those data structures
> on the respective node and kvmalloc_node sounded like the right thing to
> do. It comes as a surprise that the kernel simply blows up on a memoryless
> node rather than falling back to a close node gracefully. I suspect
> this already happens when the target node is out of memory, right?

No. If the target node is out of memory then direct reclaim is going to be
invoked.

> How would a memory allocation policy help in this case btw.?

It would allow fallback to other nodes.




Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-24 Thread Michal Hocko
On Sat 22-02-20 03:38:11, Cristopher Lameter wrote:
> On Tue, 18 Feb 2020, Michal Hocko wrote:
> 
> > Anyway, I do not think it is expected that kmalloc_node just blows up
> > on those nodes. The page allocator simply falls back to the closest
> > node. Something for kmalloc maintainers I believe.
> 
> That is the case for an unconstrained allocation. kmalloc_node means that
> you want memory from that node. And if there is no such node then it is an
> error.

Hmm, nasty. Is there any reason why kmalloc_node behaves differently
from the page allocator?

> > A short summary. kmalloc_node blows up when trying to allocate from a
> > memory less node.
> 
> Use kmalloc instead? And set a memory allocation policy?

The current code (memcg_expand_one_shrinker_map resp. memcg_alloc_shrinker_maps)
already uses kvmalloc. Kirill's patch wanted to place those data structures
on the respective node and kvmalloc_node sounded like the right thing to
do. It comes as a surprise that the kernel simply blows up on a memoryless
node rather than falling back to a close node gracefully. I suspect
this already happens when the target node is out of memory, right?

How would a memory allocation policy help in this case btw.?

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-21 Thread Christopher Lameter
On Tue, 18 Feb 2020, Michal Hocko wrote:

> Anyway, I do not think it is expected that kmalloc_node just blows up
> on those nodes. The page allocator simply falls back to the closest
> node. Something for kmalloc maintainers I believe.

That is the case for an unconstrained allocation. kmalloc_node means that
you want memory from that node. And if there is no such node then it is an
error.

> A short summary. kmalloc_node blows up when trying to allocate from a
> memory less node.

Use kmalloc instead? And set a memory allocation policy?



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Michal Hocko
On Tue 18-02-20 20:41:12, Sachin Sant wrote:
> 
> >> Yes, I can recreate the same problem with the patch applied on top of
> >> 5.6.0-rc2. 
> > 
> > And just to make sure. This was with 
> > http://lkml.kernel.org/r/fff0e636-4c36-ed10-281c-8cdb0687c...@virtuozzo.com
> > right?
> > 
> Yes, the same patch.
> 
> > If yes, is it possible that the specific node is somehow crippled (e.g.
> > some nodes don't have any memory and thus the allocator blows up)? In
> > other words what is the numa topology? (numactl -H)
> > 
> 
> Here is the o/p of numactl
> 
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB

OK, so this is what I expected. Node 0 is memoryless or simply not present
at all. Fun!

Anyway, I do not think it is expected that kmalloc_node just blows up
on those nodes. The page allocator simply falls back to the closest
node. Something for kmalloc maintainers I believe.

A short summary: kmalloc_node blows up when trying to allocate from a
memoryless node.

> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> 25 26 27 28 29 30 31
> node 1 size: 35247 MB
> node 1 free: 30907 MB
> node distances:
> node   0   1 
>   0:  10  40 
>   1:  40  10 
> # 
> 
> Thanks
> -Sachin

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Sachin Sant


>> Yes, I can recreate the same problem with the patch applied on top of
>> 5.6.0-rc2. 
> 
> And just to make sure. This was with 
> http://lkml.kernel.org/r/fff0e636-4c36-ed10-281c-8cdb0687c...@virtuozzo.com
> right?
> 
Yes, the same patch.

> If yes, is it possible that the specific node is somehow crippled (e.g.
> some nodes don't have any memory and thus the allocator blows up)? In
> other words what is the numa topology? (numactl -H)
> 

Here is the o/p of numactl

# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node   0   1 
  0:  10  40 
  1:  40  10 
# 

Thanks
-Sachin


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Michal Hocko
On Tue 18-02-20 19:30:33, Sachin Sant wrote:
> 
> 
> > On 18-Feb-2020, at 5:25 PM, Michal Hocko  wrote:
> > 
> > On Tue 18-02-20 17:10:47, Sachin Sant wrote:
> >> 
>  could you please test your boot with original patch from here:
>  
>  https://patchwork.kernel.org/patch/11360007/
> >>> 
> >>> After you tried the above patch instead of the problem patch,
> >>> do one more test and apply the below on current linux-next.
> >>> Please, say which of the patches makes your kernel bootable again.
> >>> 
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index 63bb6a2aab81..7b9b48dcbc60 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct 
> >>> mem_cgroup *memcg,
> >>>   if (!old)
> >>>   return 0;
> >>> 
> >>> - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> >>> + new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> >>>   if (!new)
> >>>   return -ENOMEM;
> >>> 
> >>> @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct 
> >>> mem_cgroup *memcg)
> >>>   mutex_lock(&memcg_shrinker_map_mutex);
> >>>   size = memcg_shrinker_map_size;
> >>>   for_each_node(nid) {
> >>> - map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> >>> + map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> >>>   if (!map) {
> >>>   memcg_free_shrinker_maps(memcg);
> >>>   ret = -ENOMEM;
> >> 
> >> With this incremental patch applied on top of current linux-next, machine 
> >> fails to boot
> > 
> > Your calltrace points to a standard system call path. I do not see any
> > reason why that commit should cause any problems. Do you see the
> > same when applying the patch you managed to bisect to on top of Linus
> > tree? Just to rule out any other potential problems in linux-next?
> 
> Yes, I can recreate the same problem with the patch applied on top of
> 5.6.0-rc2. 

And just to make sure. This was with 
http://lkml.kernel.org/r/fff0e636-4c36-ed10-281c-8cdb0687c...@virtuozzo.com
right?

If yes, is it possible that the specific node is somehow crippled (e.g.
some nodes don't have any memory and thus the allocator blows up)? In
other words what is the numa topology? (numactl -H)

> CONFIG_SLUB is enabled in my case. I have attached the .config.
> The LPAR has 34GB of memory allocated.
> 
> [8.766078] BUG: Kernel NULL pointer dereference on read at 0x73b0
> [8.766083] Faulting instruction address: 0xc03d38a4
> [8.766089] Oops: Kernel access of bad area, sig: 11 [#1]
> [8.766093] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [8.766098] Modules linked in:
> [8.766103] CPU: 12 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-autotest+ #2
> [8.766107] NIP:  c03d38a4 LR: c03d3e44 CTR: 
> 
> [8.766113] REGS: c008b37836e0 TRAP: 0300   Not tainted  
> (5.6.0-rc2-autotest+)
> [8.766118] MSR:  80009033   CR: 24004844  
> XER: 
> [8.766125] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
> IRQMASK: 1 
> [8.766125] GPR00: c03d3e44 c008b3783970 c155d500 
> c008b301f500 
> [8.766125] GPR04: 0dc0 0002 c03443f8 
> c008bac98620 
> [8.766125] GPR08: 0008b9bf 0001  
>  
> [8.766125] GPR12: 24004844 c0001ec5d200  
>  
> [8.766125] GPR16: c7be2048 c1595818 c1750c98 
> 0002 
> [8.766125] GPR20: c1750ca8 c1624470 000fffe0 
> 5deadbeef122 
> [8.766125] GPR24: 0001 0dc0 0002 
> c03443f8 
> [8.766125] GPR28: c008b301f500 c008bac98620  
> c00c02286fc0 
> [8.766172] NIP [c03d38a4] ___slab_alloc+0x1f4/0x760
> [8.766177] LR [c03d3e44] __slab_alloc+0x34/0x60
> [8.766181] Call Trace:
> [8.766184] [c008b3783970] [c03d39e4] 
> ___slab_alloc+0x334/0x760 (unreliable)
> [8.766191] [c008b3783a50] [c03d3e44] __slab_alloc+0x34/0x60
> [8.766196] [c008b3783a80] [c03d5250] 
> __kmalloc_node+0x110/0x490
> [8.766203] [c008b3783b00] [c03443f8] kvmalloc_node+0x58/0x110
> [8.766208] [c008b3783b40] [c03fcf58] 
> mem_cgroup_css_online+0x108/0x270
> [8.766215] [c008b3783ba0] [c0236078] online_css+0x48/0xd0
> [8.766220] [c008b3783bd0] [c023eebc] 
> cgroup_apply_control_enable+0x2ec/0x4d0
> [8.766226] [c008b3783cb0] [c0242728] cgroup_mkdir+0x228/0x5f0
> [8.766232] [c008b3783d20] [c051ab48] 
> kernfs_iop_mkdir+0xb8/0x170
> [8.766238] [c008b3783d50] [c043a7c0] vfs_mkdir+0x110/0x230
> [8.766243] 

Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Michal Hocko
On Tue 18-02-20 17:10:47, Sachin Sant wrote:
> 
> >> could you please test your boot with original patch from here:
> >> 
> >> https://patchwork.kernel.org/patch/11360007/
> > 
> > After you tried the above patch instead of the problem patch,
> > do one more test and apply the below on current linux-next.
> > Please, say which of the patches makes your kernel bootable again.
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 63bb6a2aab81..7b9b48dcbc60 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct 
> > mem_cgroup *memcg,
> > if (!old)
> > return 0;
> > 
> > -   new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> > +   new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> > if (!new)
> > return -ENOMEM;
> > 
> > @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup 
> > *memcg)
> > mutex_lock(&memcg_shrinker_map_mutex);
> > size = memcg_shrinker_map_size;
> > for_each_node(nid) {
> > -   map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> > +   map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> > if (!map) {
> > memcg_free_shrinker_maps(memcg);
> > ret = -ENOMEM;
> 
> With this incremental patch applied on top of current linux-next, machine 
> fails to boot

Your calltrace points to a standard system call path. I do not see any
reason why that commit should cause any problems. Do you see the
same when applying the patch you managed to bisect to on top of Linus
tree? Just to rule out any other potential problems in linux-next?
This all smells like a corrupted slab allocator. Which allocator do
you use?

> [8.868433] BUG: Kernel NULL pointer dereference on read at 0x73b0
> [8.868439] Faulting instruction address: 0xc03d55f4
> [8.868444] Oops: Kernel access of bad area, sig: 11 [#1]
> [8.868449] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [8.868453] Modules linked in:
> [8.868458] CPU: 18 PID: 1 Comm: systemd Not tainted 
> 5.6.0-rc2-next-20200218-autotest+ #4
> [8.868463] NIP:  c03d55f4 LR: c03d5b94 CTR: 
> 
> [8.868468] REGS: c008b3783710 TRAP: 0300   Not tainted  
> (5.6.0-rc2-next-20200218-autotest+)
> [8.868474] MSR:  80009033   CR: 24004844  
> XER: 
> [8.868481] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
> IRQMASK: 1 
> [8.868481] GPR00: c03d5b94 c008b37839a0 c155d400 
> c008b301f500 
> [8.868481] GPR04: 0dc0 0002 c03fee38 
> c008bb298620 
> [8.868481] GPR08: 0008ba1f 0001  
>  
> [8.868481] GPR12: 24004844 c0001ec54200  
>  
> [8.868481] GPR16: c008a1a60048 c1595898 c1750c18 
> 0002 
> [8.868481] GPR20: c1750c28 c1624470 000fffe0 
> 5deadbeef122 
> [8.868481] GPR24: 0001 0dc0 0002 
> c03fee38 
> [8.868481] GPR28: c008b301f500 c008bb298620  
> c00c02286d00 
> [8.868529] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
> [8.868534] LR [c03d5b94] __slab_alloc+0x34/0x60
> [8.868538] Call Trace:
> [8.868541] [c008b37839a0] [c03d5734] 
> ___slab_alloc+0x334/0x760 (unreliable)
> [8.868547] [c008b3783a80] [c03d5b94] __slab_alloc+0x34/0x60
> [8.868553] [c008b3783ab0] [c03d6fa0] 
> __kmalloc_node+0x110/0x490
> [8.868559] [c008b3783b30] [c03fee38] 
> mem_cgroup_css_online+0x108/0x270
> [8.868565] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
> [8.868571] [c008b3783bc0] [c023eaec] 
> cgroup_apply_control_enable+0x2ec/0x4d0
> [8.868577] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
> [8.868583] [c008b3783d10] [c051e170] 
> kernfs_iop_mkdir+0x90/0xf0
> [8.868589] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
> [8.868594] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
> [8.868601] [c008b3783e20] [c000b278] system_call+0x5c/0x68
> [8.868605] Instruction dump:
> [8.868608] 7c421378 e95f 714a0001 4082fff0 4b64 6000 6000 
> faa10088 
> [8.868615] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
> 409e0394 3d02002a 
> [8.868623] ---[ end trace f9b8e3c36493f430 ]---
> [8.870690] 
> [9.870701] Kernel panic - not syncing: Fatal exception
> 
> Thanks
> -Sachin

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
On 18.02.2020 14:38, Sachin Sant wrote:
> 
> 
>> On 18-Feb-2020, at 4:20 PM, Kirill Tkhai  wrote:
>>
>> Hi, Sachin,
>>
>> On 18.02.2020 13:45, Sachin Sant wrote:
>>>
>>> commit a75056fc1e7c 
>>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>>>
>>> I can boot the kernel successfully if the patch is reverted. 
>>
>>
>> could you please test your boot with original patch from here:
>>
>> https://patchwork.kernel.org/patch/11360007/
>>
>> ?
> With this original patch I can boot the machine successfully.

Ok, thanks.

I think there is no problem in the committed patch, since mem_cgroup_css_alloc()
is called from a place where any memory allocation has to be allowed. This
is one of the reasons memory_cgrp_subsys.early_init is 0, and allocations on
all nodes should be available there.

The problem is not in vmalloc() itself, since the second patch with kmalloc_node()
also fails on your setup. Maybe the reproduction depends on the amount of allocated
memory. To me this looks like a problem in powerpc, but it would be interesting
to hear some comments from the powerpc guys.

For now we may replace the committed patch with v2
(https://patchwork.kernel.org/patch/11360007/),
which contains the workaround we already have for the allocations in
alloc_mem_cgroup_per_node_info().

Kirill
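
For readers unfamiliar with that workaround: as far as I understand it, the
idea is to check whether the requested node has normal memory and, if it does
not, to pass "no node preference" to the allocator instead of the node id.
Below is a small userspace model of that decision only. The table
node_has_memory[], the helper alloc_target_node() and the NO_NODE constant
are stand-ins invented for this sketch; they are not the kernel's symbols.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2
#define NO_NODE  (-1)	/* stands in for "no node preference" */

/* Modelled on the reported topology: node 0 has no memory, node 1 does. */
static const bool node_has_memory[NR_NODES] = { false, true };

/*
 * Hypothetical helper: return the node id to hand to a node-aware
 * allocator. A memoryless node is replaced by NO_NODE so that the
 * allocator is free to pick any node that can satisfy the request.
 */
static int alloc_target_node(int nid)
{
	assert(nid >= 0 && nid < NR_NODES);

	return node_has_memory[nid] ? nid : NO_NODE;
}
```

In this model a per-node loop would request memory with NO_NODE for node 0
and with node 1 for node 1, so the allocation for the memoryless node is
never pinned to it.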


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Sachin Sant


>> could you please test your boot with original patch from here:
>> 
>> https://patchwork.kernel.org/patch/11360007/
> 
> After you tried the above patch instead of the problem patch,
> do one more test and apply the below on current linux-next.
> Please, say which of the patches makes your kernel bootable again.
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 63bb6a2aab81..7b9b48dcbc60 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct 
> mem_cgroup *memcg,
>   if (!old)
>   return 0;
> 
> - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> + new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
>   if (!new)
>   return -ENOMEM;
> 
> @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup 
> *memcg)
>   mutex_lock(&memcg_shrinker_map_mutex);
>   size = memcg_shrinker_map_size;
>   for_each_node(nid) {
> - map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> + map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
>   if (!map) {
>   memcg_free_shrinker_maps(memcg);
>   ret = -ENOMEM;

With this incremental patch applied on top of current linux-next, the machine
fails to boot:

[8.868433] BUG: Kernel NULL pointer dereference on read at 0x73b0
[8.868439] Faulting instruction address: 0xc03d55f4
[8.868444] Oops: Kernel access of bad area, sig: 11 [#1]
[8.868449] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[8.868453] Modules linked in:
[8.868458] CPU: 18 PID: 1 Comm: systemd Not tainted 
5.6.0-rc2-next-20200218-autotest+ #4
[8.868463] NIP:  c03d55f4 LR: c03d5b94 CTR: 
[8.868468] REGS: c008b3783710 TRAP: 0300   Not tainted  
(5.6.0-rc2-next-20200218-autotest+)
[8.868474] MSR:  80009033   CR: 24004844  
XER: 
[8.868481] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
IRQMASK: 1 
[8.868481] GPR00: c03d5b94 c008b37839a0 c155d400 
c008b301f500 
[8.868481] GPR04: 0dc0 0002 c03fee38 
c008bb298620 
[8.868481] GPR08: 0008ba1f 0001  
 
[8.868481] GPR12: 24004844 c0001ec54200  
 
[8.868481] GPR16: c008a1a60048 c1595898 c1750c18 
0002 
[8.868481] GPR20: c1750c28 c1624470 000fffe0 
5deadbeef122 
[8.868481] GPR24: 0001 0dc0 0002 
c03fee38 
[8.868481] GPR28: c008b301f500 c008bb298620  
c00c02286d00 
[8.868529] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
[8.868534] LR [c03d5b94] __slab_alloc+0x34/0x60
[8.868538] Call Trace:
[8.868541] [c008b37839a0] [c03d5734] ___slab_alloc+0x334/0x760 
(unreliable)
[8.868547] [c008b3783a80] [c03d5b94] __slab_alloc+0x34/0x60
[8.868553] [c008b3783ab0] [c03d6fa0] __kmalloc_node+0x110/0x490
[8.868559] [c008b3783b30] [c03fee38] 
mem_cgroup_css_online+0x108/0x270
[8.868565] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
[8.868571] [c008b3783bc0] [c023eaec] 
cgroup_apply_control_enable+0x2ec/0x4d0
[8.868577] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
[8.868583] [c008b3783d10] [c051e170] kernfs_iop_mkdir+0x90/0xf0
[8.868589] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
[8.868594] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
[8.868601] [c008b3783e20] [c000b278] system_call+0x5c/0x68
[8.868605] Instruction dump:
[8.868608] 7c421378 e95f 714a0001 4082fff0 4b64 6000 6000 
faa10088 
[8.868615] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 409e0394 
3d02002a 
[8.868623] ---[ end trace f9b8e3c36493f430 ]---
[8.870690] 
[9.870701] Kernel panic - not syncing: Fatal exception

Thanks
-Sachin



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Sachin Sant



> On 18-Feb-2020, at 4:20 PM, Kirill Tkhai  wrote:
> 
> Hi, Sachin,
> 
> On 18.02.2020 13:45, Sachin Sant wrote:
>> 
>> commit a75056fc1e7c 
>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>> 
>> I can boot the kernel successfully if the patch is reverted. 
> 
> 
> could you please test your boot with original patch from here:
> 
> https://patchwork.kernel.org/patch/11360007/
> 
> ?
With this original patch I can boot the machine successfully.

Thanks
-Sachin



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
On 18.02.2020 14:01, Kirill Tkhai wrote:
> On 18.02.2020 13:50, Kirill Tkhai wrote:
>> Hi, Sachin,
>>
>> On 18.02.2020 13:45, Sachin Sant wrote:
>>> Todays next fails to boot on a POWER9 PowerVM logical partition
>>> with following trace:
>>>
>>> [8.767660] random: systemd: uninitialized urandom read (16 bytes read)
>>> [8.768629] BUG: Kernel NULL pointer dereference on read at 0x73b0
>>> [8.768635] Faulting instruction address: 0xc03d55f4
>>> [8.768641] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [8.768645] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>>> [8.768650] Modules linked in:
>>> [8.768655] CPU: 19 PID: 1 Comm: systemd Not tainted 
>>> 5.6.0-rc2-next-20200218-autotest #1
>>> [8.768660] NIP:  c03d55f4 LR: c03d5b94 CTR: 
>>> 
>>> [8.768666] REGS: c008b37836d0 TRAP: 0300   Not tainted  
>>> (5.6.0-rc2-next-20200218-autotest)
>>> [8.768671] MSR:  80009033   CR: 24004844  
>>> XER: 
>>> [8.768679] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
>>> IRQMASK: 1
>>> [8.768679] GPR00: c03d5b94 c008b3783960 c155d400 
>>> c008b301f500
>>> [8.768679] GPR04: 0dc0 0002 c03443d8 
>>> c008bb398620
>>> [8.768679] GPR08: 0008ba2f 0001  
>>> 
>>> [8.768679] GPR12: 24004844 c0001ec52a00  
>>> 
>>> [8.768679] GPR16: c008a1b20048 c1595898 c1750c18 
>>> 0002
>>> [8.768679] GPR20: c1750c28 c1624470 000fffe0 
>>> 5deadbeef122
>>> [8.768679] GPR24: 0001 0dc0 0002 
>>> c03443d8
>>> [8.768679] GPR28: c008b301f500 c008bb398620  
>>> c00c02287180
>>> [8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
>>> [8.768732] LR [c03d5b94] __slab_alloc+0x34/0x60
>>> [8.768735] Call Trace:
>>> [8.768739] [c008b3783960] [c03d5734] 
>>> ___slab_alloc+0x334/0x760 (unreliable)
>>> [8.768745] [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
>>> [8.768751] [c008b3783a70] [c03d6fa0] 
>>> __kmalloc_node+0x110/0x490
>>> [8.768757] [c008b3783af0] [c03443d8] 
>>> kvmalloc_node+0x58/0x110
>>> [8.768763] [c008b3783b30] [c03fee38] 
>>> mem_cgroup_css_online+0x108/0x270
>>> [8.768769] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
>>> [8.768775] [c008b3783bc0] [c023eaec] 
>>> cgroup_apply_control_enable+0x2ec/0x4d0
>>> [8.768781] [c008b3783ca0] [c0242318] 
>>> cgroup_mkdir+0x228/0x5f0
>>> [8.768786] [c008b3783d10] [c051e170] 
>>> kernfs_iop_mkdir+0x90/0xf0
>>> [8.768792] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
>>> [8.768797] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
>>> [8.768804] [c008b3783e20] [c000b278] system_call+0x5c/0x68
>>> [8.768808] Instruction dump:
>>> [8.768811] 7c421378 e95f 714a0001 4082fff0 4b64 6000 
>>> 6000 faa10088
>>> [8.768818] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
>>> 409e0394 3d02002a
>>> [8.768826] ---[ end trace 631af2cb73507891 ]---
>>> [8.770876]
>>> [9.770887] Kernel panic - not syncing: Fatal exception
>>>
>>> Bisect reveals the problem was introduced in next-20200217 by following 
>>> commit 
>>>
>>> commit a75056fc1e7c 
>>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>>>
>>> I can boot the kernel successfully if the patch is reverted. 
>>
>>
>> could you please test your boot with original patch from here:
>>
>> https://patchwork.kernel.org/patch/11360007/
> 
> After you tried the above patch instead of the problem patch,
> do one more test and apply the below on current linux-next.
> Please, say which of the patches makes your kernel bootable again.
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 63bb6a2aab81..7b9b48dcbc60 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct 
> mem_cgroup *memcg,
>   if (!old)
>   return 0;
>  
> - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> + new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
>   if (!new)
>   return -ENOMEM;
>  
> @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup 
> *memcg)
>   mutex_lock(&memcg_shrinker_map_mutex);
>   size = memcg_shrinker_map_size;
>   for_each_node(nid) {
> - map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> + map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
>   if (!map) {
>   

Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
Hi, Sachin,

On 18.02.2020 13:45, Sachin Sant wrote:
> Todays next fails to boot on a POWER9 PowerVM logical partition
> with following trace:
> 
> [8.767660] random: systemd: uninitialized urandom read (16 bytes read)
> [8.768629] BUG: Kernel NULL pointer dereference on read at 0x73b0
> [8.768635] Faulting instruction address: 0xc03d55f4
> [8.768641] Oops: Kernel access of bad area, sig: 11 [#1]
> [8.768645] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [8.768650] Modules linked in:
> [8.768655] CPU: 19 PID: 1 Comm: systemd Not tainted 
> 5.6.0-rc2-next-20200218-autotest #1
> [8.768660] NIP:  c03d55f4 LR: c03d5b94 CTR: 
> 
> [8.768666] REGS: c008b37836d0 TRAP: 0300   Not tainted  
> (5.6.0-rc2-next-20200218-autotest)
> [8.768671] MSR:  80009033   CR: 24004844  
> XER: 
> [8.768679] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
> IRQMASK: 1
> [8.768679] GPR00: c03d5b94 c008b3783960 c155d400 
> c008b301f500
> [8.768679] GPR04: 0dc0 0002 c03443d8 
> c008bb398620
> [8.768679] GPR08: 0008ba2f 0001  
> 
> [8.768679] GPR12: 24004844 c0001ec52a00  
> 
> [8.768679] GPR16: c008a1b20048 c1595898 c1750c18 
> 0002
> [8.768679] GPR20: c1750c28 c1624470 000fffe0 
> 5deadbeef122
> [8.768679] GPR24: 0001 0dc0 0002 
> c03443d8
> [8.768679] GPR28: c008b301f500 c008bb398620  
> c00c02287180
> [8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
> [8.768732] LR [c03d5b94] __slab_alloc+0x34/0x60
> [8.768735] Call Trace:
> [8.768739] [c008b3783960] [c03d5734] 
> ___slab_alloc+0x334/0x760 (unreliable)
> [8.768745] [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
> [8.768751] [c008b3783a70] [c03d6fa0] 
> __kmalloc_node+0x110/0x490
> [8.768757] [c008b3783af0] [c03443d8] kvmalloc_node+0x58/0x110
> [8.768763] [c008b3783b30] [c03fee38] 
> mem_cgroup_css_online+0x108/0x270
> [8.768769] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
> [8.768775] [c008b3783bc0] [c023eaec] 
> cgroup_apply_control_enable+0x2ec/0x4d0
> [8.768781] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
> [8.768786] [c008b3783d10] [c051e170] 
> kernfs_iop_mkdir+0x90/0xf0
> [8.768792] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
> [8.768797] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
> [8.768804] [c008b3783e20] [c000b278] system_call+0x5c/0x68
> [8.768808] Instruction dump:
> [8.768811] 7c421378 e95f 714a0001 4082fff0 4b64 6000 6000 
> faa10088
> [8.768818] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
> 409e0394 3d02002a
> [8.768826] ---[ end trace 631af2cb73507891 ]---
> [8.770876]
> [9.770887] Kernel panic - not syncing: Fatal exception
> 
> Bisect reveals the problem was introduced in next-20200217 by following 
> commit 
> 
> commit a75056fc1e7c 
> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
> 
> I can boot the kernel successfully if the patch is reverted. 


could you please test your boot with original patch from here:

https://patchwork.kernel.org/patch/11360007/

?


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
On 18.02.2020 13:50, Kirill Tkhai wrote:
> Hi, Sachin,
> 
> On 18.02.2020 13:45, Sachin Sant wrote:
>> Todays next fails to boot on a POWER9 PowerVM logical partition
>> with following trace:
>>
>> [8.767660] random: systemd: uninitialized urandom read (16 bytes read)
>> [8.768629] BUG: Kernel NULL pointer dereference on read at 0x73b0
>> [8.768635] Faulting instruction address: 0xc03d55f4
>> [8.768641] Oops: Kernel access of bad area, sig: 11 [#1]
>> [8.768645] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [8.768650] Modules linked in:
>> [8.768655] CPU: 19 PID: 1 Comm: systemd Not tainted 
>> 5.6.0-rc2-next-20200218-autotest #1
>> [8.768660] NIP:  c03d55f4 LR: c03d5b94 CTR: 
>> 
>> [8.768666] REGS: c008b37836d0 TRAP: 0300   Not tainted  
>> (5.6.0-rc2-next-20200218-autotest)
>> [8.768671] MSR:  80009033   CR: 24004844  
>> XER: 
>> [8.768679] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
>> IRQMASK: 1
>> [8.768679] GPR00: c03d5b94 c008b3783960 c155d400 
>> c008b301f500
>> [8.768679] GPR04: 0dc0 0002 c03443d8 
>> c008bb398620
>> [8.768679] GPR08: 0008ba2f 0001  
>> 
>> [8.768679] GPR12: 24004844 c0001ec52a00  
>> 
>> [8.768679] GPR16: c008a1b20048 c1595898 c1750c18 
>> 0002
>> [8.768679] GPR20: c1750c28 c1624470 000fffe0 
>> 5deadbeef122
>> [8.768679] GPR24: 0001 0dc0 0002 
>> c03443d8
>> [8.768679] GPR28: c008b301f500 c008bb398620  
>> c00c02287180
>> [8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
>> [8.768732] LR [c03d5b94] __slab_alloc+0x34/0x60
>> [8.768735] Call Trace:
>> [8.768739] [c008b3783960] [c03d5734] 
>> ___slab_alloc+0x334/0x760 (unreliable)
>> [8.768745] [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
>> [8.768751] [c008b3783a70] [c03d6fa0] 
>> __kmalloc_node+0x110/0x490
>> [8.768757] [c008b3783af0] [c03443d8] kvmalloc_node+0x58/0x110
>> [8.768763] [c008b3783b30] [c03fee38] 
>> mem_cgroup_css_online+0x108/0x270
>> [8.768769] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
>> [8.768775] [c008b3783bc0] [c023eaec] 
>> cgroup_apply_control_enable+0x2ec/0x4d0
>> [8.768781] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
>> [8.768786] [c008b3783d10] [c051e170] 
>> kernfs_iop_mkdir+0x90/0xf0
>> [8.768792] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
>> [8.768797] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
>> [8.768804] [c008b3783e20] [c000b278] system_call+0x5c/0x68
>> [8.768808] Instruction dump:
>> [8.768811] 7c421378 e95f 714a0001 4082fff0 4b64 6000 
>> 6000 faa10088
>> [8.768818] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
>> 409e0394 3d02002a
>> [8.768826] ---[ end trace 631af2cb73507891 ]---
>> [8.770876]
>> [9.770887] Kernel panic - not syncing: Fatal exception
>>
>> Bisect reveals the problem was introduced in next-20200217 by following 
>> commit 
>>
>> commit a75056fc1e7c 
>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>>
>> I can boot the kernel successfully if the patch is reverted. 
> 
> 
> could you please test your boot with original patch from here:
> 
> https://patchwork.kernel.org/patch/11360007/

After you tried the above patch instead of the problem patch,
do one more test and apply the below on current linux-next.
Please, say which of the patches makes your kernel bootable again.

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..7b9b48dcbc60 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct mem_cgroup *memcg,
 		if (!old)
 			return 0;
 
-		new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
+		new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
 		if (!new)
 			return -ENOMEM;
 
@@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup *memcg)
 	mutex_lock(&memcg_shrinker_map_mutex);
 	size = memcg_shrinker_map_size;
 	for_each_node(nid) {
-		map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
+		map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
 		if (!map) {
 			memcg_free_shrinker_maps(memcg);
 			ret = -ENOMEM;