Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-10-06 Thread Robert Richter
On 27.09.16 14:26:08, Hanjun Guo wrote:
> On 09/20/2016 09:21 PM, Robert Richter wrote:
> >On 20.09.16 19:32:34, Hanjun Guo wrote:
> >>On 09/20/2016 06:43 PM, Robert Richter wrote:
> >
> >>>Unfortunately, neither your code nor mine fixes the BUG_ON() I see with
> >>>the NUMA kernel:
> >>>
> >>>  kernel BUG at mm/page_alloc.c:1848!
> >>>
> >>>See below for the core dump. It looks like this happens when moving
> >>>a mem block whose first and last pages are mapped to different NUMA
> >>>nodes, thus triggering the BUG_ON().
> >>
> >>We didn't trigger it on our NUMA hardware; could you provide your
> >>config so we can have a try?
> >
> >Config attached. Other configs with an initrd fail too.
> 
> Hmm, we can't reproduce it on our hardware. Do we need
> to run some specific stress test on it?

No, it depends on the EFI memory zones marked reserved. See my other
thread on this, where I have attached the mem ranges from the log. I
already have a fix available.
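
For reference, the check that fires at mm/page_alloc.c:1848 is the
zone-consistency assertion in move_freepages(). Roughly (a paraphrased
sketch of the 4.8-era source, not a verbatim quote; it is only fatal
when CONFIG_DEBUG_VM is enabled):

/*
 * move_freepages(), mm/page_alloc.c (paraphrased sketch): if the first
 * and last page of the block being moved resolve to different zones --
 * e.g. because the block straddles two NUMA nodes -- the assertion
 * fires while grouping pages by mobility.
 */
#ifndef CONFIG_HOLES_IN_ZONE
	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
#endif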

Thanks,

-Robert



Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-26 Thread Hanjun Guo

On 09/20/2016 09:21 PM, Robert Richter wrote:

On 20.09.16 19:32:34, Hanjun Guo wrote:

On 09/20/2016 06:43 PM, Robert Richter wrote:



Unfortunately, neither your code nor mine fixes the BUG_ON() I see with
the NUMA kernel:

  kernel BUG at mm/page_alloc.c:1848!

See below for the core dump. It looks like this happens when moving
a mem block whose first and last pages are mapped to different NUMA
nodes, thus triggering the BUG_ON().


We didn't trigger it on our NUMA hardware; could you provide your
config so we can have a try?


Config attached. Other configs with an initrd fail too.


Hmm, we can't reproduce it on our hardware. Do we need
to run some specific stress test on it?

Thanks
Hanjun


Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-21 Thread Jon Masters
On 09/20/2016 10:12 AM, Hanjun Guo wrote:
> On 09/20/2016 09:38 PM, Robert Richter wrote:
>> On 20.09.16 19:32:34, Hanjun Guo wrote:
>>> On 09/20/2016 06:43 PM, Robert Richter wrote:
>>
 Instead, we need to make sure the set_*numa_node() functions are called
 earlier, before secondary CPUs are booted. My suggested change for that
 is this:


 diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
 index d93d43352504..952365c2f100 100644
 --- a/arch/arm64/kernel/smp.c
 +++ b/arch/arm64/kernel/smp.c
 @@ -204,7 +204,6 @@ int __cpu_up(unsigned int cpu, struct
 task_struct *idle)
   static void smp_store_cpu_info(unsigned int cpuid)
   {
   store_cpu_topology(cpuid);
 -numa_store_cpu_info(cpuid);
   }

   /*
 @@ -719,6 +718,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
   continue;

   set_cpu_present(cpu, true);
 +numa_store_cpu_info(cpu);
   }
   }
>>>
>>> We tried a similar approach, which adds numa_store_cpu_info() in
>>> early_map_cpu_to_node() and removes it from smp_store_cpu_info(),
>>> but it didn't work for us; we will try your approach to see if it works.
> 
> And it works :)

Great. I'm curious about further (immediate) feedback on David's updated
patch in the other thread, due to some time-sensitive needs on our end.

Jon.



-- 
Computer Architect | Sent from my Fedora powered laptop


Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread David Daney

On 09/20/2016 03:43 AM, Robert Richter wrote:
[...]


Instead, we need to make sure the set_*numa_node() functions are called
earlier, before secondary CPUs are booted. My suggested change for that
is this:


diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d43352504..952365c2f100 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -204,7 +204,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
  static void smp_store_cpu_info(unsigned int cpuid)
  {
store_cpu_topology(cpuid);
-   numa_store_cpu_info(cpuid);
  }

  /*
@@ -719,6 +718,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
continue;

set_cpu_present(cpu, true);
+   numa_store_cpu_info(cpu);
}
  }


I have tested the code and it properly sets up all per-cpu workqueues.



Thanks Robert,

I have tested a slightly modified version of that, and it seems to also 
fix the problem for me.


I will submit a cleaned-up patch.

David Daney




Unfortunately, neither your code nor mine fixes the BUG_ON() I see with
the NUMA kernel:

  kernel BUG at mm/page_alloc.c:1848!

See below for the core dump. It looks like this happens when moving
a mem block whose first and last pages are mapped to different NUMA
nodes, thus triggering the BUG_ON().

Continuing with my investigations...

-Robert



[9.674272] [ cut here ]
[9.678881] kernel BUG at mm/page_alloc.c:1848!
[9.683406] Internal error: Oops - BUG: 0 [#1] SMP
[9.688190] Modules linked in:
[9.691247] CPU: 77 PID: 1 Comm: swapper/0 Tainted: GW   
4.8.0-rc5.vanilla5-00030-ga2b86cb3ce72 #38
[9.701322] Hardware name: www.cavium.com ThunderX CRB-2S/ThunderX CRB-2S, 
BIOS 0.3 Aug 24 2016
[9.710008] task: 800fe4561400 task.stack: 800ffbe0c000
[9.715939] PC is at move_freepages+0x160/0x168
[9.720460] LR is at move_freepages+0x160/0x168
[9.724979] pc : [] lr : [] pstate: 
60c5
[9.732362] sp : 800ffbe0f510
[9.735666] x29: 800ffbe0f510 x28: 7fe043f80020
[9.740975] x27: 7fe043f8 x26: 000c
[9.746283] x25: 000c x24: 810af0e0
[9.751591] x23: 0001 x22: 
[9.756898] x21: 7fe043c0 x20: 810aeb00
[9.762206] x19: 7fe043f8 x18: 0010
[9.767513] x17:  x16: 0001
[9.772821] x15: 88f03f37 x14: 6e2c303d64696e2c
[9.778128] x13: 30383365 x12: 66303038
[9.783436] x11: 3d656e6f7a203a64 x10: 0536
[9.788744] x9 : 0060 x8 : 303062656166
[9.794051] x7 : 66303138 x6 : 08f03f97
[9.799359] x5 : 0006 x4 : 000c
[9.804667] x3 : 0001 x2 : 0001
[9.809975] x1 : 08da7be0 x0 : 0050

[   10.517213] Call trace:
[   10.519651] Exception stack(0x800ffbe0f340 to 0x800ffbe0f470)
[   10.526081] f340: 7fe043f8 0001 800ffbe0f510 
081ec7d0
[   10.533900] f360: 08f03988 08da7bc8 800ffbe0f410 
081275fc
[   10.541718] f380: 800ffbe0f470 08ac5a00 7fe043c0 

[   10.549536] f3a0: 0001 810af0e0 000c 
000c
[   10.557355] f3c0: 7fe043f8 7fe043f80020 0030 

[   10.565173] f3e0: 0050 08da7be0 0001 
0001
[   10.572991] f400: 000c 0006 08f03f97 
66303138
[   10.580809] f420: 303062656166 0060 0536 
3d656e6f7a203a64
[   10.588628] f440: 66303038 30383365 6e2c303d64696e2c 
88f03f37
[   10.596446] f460: 0001 
[   10.601316] [] move_freepages+0x160/0x168
[   10.606879] [] move_freepages_block+0xa8/0xb8
[   10.612788] [] __rmqueue+0x610/0x670
[   10.617918] [] get_page_from_freelist+0x3cc/0xb40
[   10.624174] [] __alloc_pages_nodemask+0x12c/0xd40
[   10.630438] [] alloc_page_interleave+0x60/0xb0
[   10.636434] [] alloc_pages_current+0x108/0x168
[   10.642430] [] __page_cache_alloc+0x104/0x140
[   10.648339] [] pagecache_get_page+0x118/0x2e8
[   10.654248] [] grab_cache_page_write_begin+0x48/0x68
[   10.660769] [] simple_write_begin+0x40/0x150
[   10.666591] [] generic_perform_write+0xb8/0x1a0
[   10.672674] [] __generic_file_write_iter+0x178/0x1c8
[   10.679191] [] generic_file_write_iter+0xcc/0x1c8
[   10.685448] [] __vfs_write+0xcc/0x140
[   10.690663] [] vfs_write+0xa8/0x1c0
[   10.695704] [] SyS_write+0x54/0xb0
[   10.700666] [] xwrite+0x34/0x7c
[   10.705359] [] do_copy+0x9c/0xf4
[   10.710140] [] write_buffer+0x34/0x50
[   10.715354] [] flush_buffer+0x48/0xb8
[   10.720579] [] __gunzip+0x27c/0x324
[   10.725620] [] gunzip+0x18/0x20
[   10.730314] [] unpack_to_rootfs+0x168/0x280
[   

Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Hanjun Guo

On 09/20/2016 09:38 PM, Robert Richter wrote:

On 20.09.16 19:32:34, Hanjun Guo wrote:

On 09/20/2016 06:43 PM, Robert Richter wrote:



Instead, we need to make sure the set_*numa_node() functions are called
earlier, before secondary CPUs are booted. My suggested change for that
is this:


diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d43352504..952365c2f100 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -204,7 +204,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
  static void smp_store_cpu_info(unsigned int cpuid)
  {
store_cpu_topology(cpuid);
-   numa_store_cpu_info(cpuid);
  }

  /*
@@ -719,6 +718,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
continue;

set_cpu_present(cpu, true);
+   numa_store_cpu_info(cpu);
}
  }


We tried a similar approach, which adds numa_store_cpu_info() in
early_map_cpu_to_node() and removes it from smp_store_cpu_info(),
but it didn't work for us; we will try your approach to see if it works.


And it works :)



Calling it in early_map_cpu_to_node() is probably too early, since
setup_node_to_cpumask_map() is called in numa_init() afterwards,
overwriting it again.

Actually, early_map_cpu_to_node() is used to temporarily store the
mapping until it can be set up in numa_store_cpu_info().


Thanks for the clarification; let's wait for David's reply on this one.

Thanks
Hanjun



Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Robert Richter
On 20.09.16 19:32:34, Hanjun Guo wrote:
> On 09/20/2016 06:43 PM, Robert Richter wrote:

> >Instead, we need to make sure the set_*numa_node() functions are called
> >earlier, before secondary CPUs are booted. My suggested change for that
> >is this:
> >
> >
> >diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> >index d93d43352504..952365c2f100 100644
> >--- a/arch/arm64/kernel/smp.c
> >+++ b/arch/arm64/kernel/smp.c
> >@@ -204,7 +204,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> >  static void smp_store_cpu_info(unsigned int cpuid)
> >  {
> > store_cpu_topology(cpuid);
> >-numa_store_cpu_info(cpuid);
> >  }
> >
> >  /*
> >@@ -719,6 +718,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> > continue;
> >
> > set_cpu_present(cpu, true);
> >+numa_store_cpu_info(cpu);
> > }
> >  }
> 
> We tried a similar approach, which adds numa_store_cpu_info() in
> early_map_cpu_to_node() and removes it from smp_store_cpu_info(),
> but it didn't work for us; we will try your approach to see if it works.

Calling it in early_map_cpu_to_node() is probably too early, since
setup_node_to_cpumask_map() is called in numa_init() afterwards,
overwriting it again.

Actually, early_map_cpu_to_node() is used to temporarily store the
mapping until it can be set up in numa_store_cpu_info().
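
To illustrate the ordering, the two helpers in arch/arm64/mm/numa.c
look roughly like this (a paraphrased sketch, not the verbatim source):

/* early_map_cpu_to_node() only records the firmware-provided node id
 * in a static table; nothing is visible to cpu_to_node() until
 * numa_store_cpu_info() publishes it via set_cpu_numa_node() and
 * node_to_cpumask_map.
 */
static int cpu_to_node_map[NR_CPUS] = {
	[0 ... NR_CPUS - 1] = NUMA_NO_NODE
};

void __init early_map_cpu_to_node(unsigned int cpu, int nid)
{
	/* fall back to node 0 for an invalid node id */
	if (nid < 0 || nid >= MAX_NUMNODES)
		nid = 0;

	cpu_to_node_map[cpu] = nid;	/* remembered for later */
}

void numa_store_cpu_info(unsigned int cpu)
{
	/* publish the mapping for the fast-path lookups */
	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
}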

-Robert


Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Robert Richter
On 20.09.16 19:32:34, Hanjun Guo wrote:
> On 09/20/2016 06:43 PM, Robert Richter wrote:

> >Unfortunately, neither your code nor mine fixes the BUG_ON() I see with
> >the NUMA kernel:
> >
> >  kernel BUG at mm/page_alloc.c:1848!
> >
> >See below for the core dump. It looks like this happens when moving
> >a mem block whose first and last pages are mapped to different NUMA
> >nodes, thus triggering the BUG_ON().
> 
> We didn't trigger it on our NUMA hardware; could you provide your
> config so we can have a try?

Config attached. Other configs with an initrd fail too.

-Robert


config-4.8.0-rc5.vanilla5.xz
Description: application/xz


Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Hanjun Guo

+Cc Yisheng,

On 09/20/2016 06:43 PM, Robert Richter wrote:

David,

On 19.09.16 11:49:30, David Daney wrote:

Fix by supplying a cpu_to_node() implementation that returns correct
node mappings.



+int cpu_to_node(int cpu)
+{
+   int nid;
+
+   /*
+* Return 0 for unknown mapping so that we report something
+* sensible if firmware doesn't supply a proper mapping.
+*/
+   if (cpu < 0 || cpu >= NR_CPUS)
+   return 0;
+
+   nid = cpu_to_node_map[cpu];
+   if (nid == NUMA_NO_NODE)
+   nid = 0;
+   return nid;
+}
+EXPORT_SYMBOL(cpu_to_node);


this implementation fixes the per-cpu workqueue initialization, but I
don't think a cpu_to_node() implementation private to arm64 is the
proper solution.

Apart from making better use of generic code, the cpu_to_node() function is
called in the kernel's fast path. I think your implementation is too
expensive, and it also does not use per-cpu data for the lookup as the
generic code does. Secondly, numa_off is not considered at all.

Instead, we need to make sure the set_*numa_node() functions are called
earlier, before secondary CPUs are booted. My suggested change for that
is this:


diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d43352504..952365c2f100 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -204,7 +204,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
  static void smp_store_cpu_info(unsigned int cpuid)
  {
store_cpu_topology(cpuid);
-   numa_store_cpu_info(cpuid);
  }

  /*
@@ -719,6 +718,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
continue;

set_cpu_present(cpu, true);
+   numa_store_cpu_info(cpu);
}
  }


We tried a similar approach, which adds numa_store_cpu_info() in
early_map_cpu_to_node() and removes it from smp_store_cpu_info(),
but it didn't work for us; we will try your approach to see if it works.




I have tested the code and it properly sets up all per-cpu workqueues.

Unfortunately, neither your code nor mine fixes the BUG_ON() I see with
the NUMA kernel:

  kernel BUG at mm/page_alloc.c:1848!

See below for the core dump. It looks like this happens when moving
a mem block whose first and last pages are mapped to different NUMA
nodes, thus triggering the BUG_ON().


We didn't trigger it on our NUMA hardware; could you provide your
config so we can have a try?

Thanks
Hanjun


Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Mark Rutland
On Tue, Sep 20, 2016 at 12:43:48PM +0200, Robert Richter wrote:
> Unfortunately, neither your code nor mine fixes the BUG_ON() I see with
> the NUMA kernel:
> 
>  kernel BUG at mm/page_alloc.c:1848!
> 
> See below for the core dump. It looks like this happens when moving
> a mem block whose first and last pages are mapped to different NUMA
> nodes, thus triggering the BUG_ON().

FWIW, I'm seeing a potentially-related BUG in the same function on a
v4.8-rc7 kernel without CONFIG_NUMA enabled. I have a number of debug
options set, including CONFIG_PAGE_POISONING and CONFIG_DEBUG_PAGEALLOC.

I've included the full log for that below, including subsequent
failures.

I'm triggering this by running $(hackbench 100 process 1000).

Thanks,
Mark.

[  742.923329] [ cut here ]
[  742.927951] kernel BUG at mm/page_alloc.c:1844!
[  742.932475] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[  742.937951] Modules linked in:
[  742.941075] CPU: 4 PID: 3608 Comm: hackbench Not tainted 4.8.0-rc7 #1
[  742.947506] Hardware name: AMD Seattle (Rev.B0) Development Board 
(Overdrive) (DT)
[  742.955066] task: 800341c4af80 task.stack: 800341c84000
[  742.960981] PC is at move_freepages+0x220/0x338
[  742.965503] LR is at move_freepages_block+0x164/0x1e0
[  742.970544] pc : [] lr : [] pstate: 
21c5
[  742.977928] sp : 800341c86ce0
[  742.981233] x29: 800341c86ce0 x28: 2c8d9308
[  742.986541] x27: 7e000ffbffc0 x26: 0100
[  742.991850] x25:  x24: 083fefff
[  742.997157] x23: 7e000ffb8000 x22: 2c8d9000
[  743.002465] x21: 2c8d9000 x20: 0700
[  743.007772] x19: 7e000ffb8000 x18: 
[  743.013079] x17: 0001 x16: 28553d78
[  743.018387] x15: 0002 x14: 
[  743.023694] x13:  x12: 2b311000
[  743.029000] x11:  x10: 2d54
[  743.034307] x9 : dfff2000 x8 : 2c8d9318
[  743.039614] x7 : 0001 x6 : 1fffe4000149ab66
[  743.044921] x5 : dfff2000 x4 : 2a4d5b30
[  743.050228] x3 :  x2 : 7e000ffbffc0
[  743.055535] x1 :  x0 : 
[  743.060841]
[  743.062324] Process hackbench (pid: 3608, stack limit = 0x800341c84020)
[  743.069276] Stack: (0x800341c86ce0 to 0x800341c88000)
[  743.075013] 6ce0: 800341c86d70 283e0134 083fee00 
7e000ffb8000
[  743.082833] 6d00: 2c8d9000 0840 7e000ffb8000 
083fefff
[  743.090653] 6d20:  0100 2a9873a0 
2c8d9308
[  743.098472] 6d40:  0003 0005 
2c8d9000
[  743.106292] 6d60:  0100 800341c86dc0 
283e11bc
[  743.114111] 6d80: 0005 0001  
0003
[  743.121930] 6da0: 0005 2c8d9000 2b311000 
1000683896e8
[  743.129750] 6dc0: 800341c86f50 283e41fc 0003 
800341c873a0
[  743.137570] 6de0: 002156c0 0008 dfff2000 
2a4d5000
[  743.145389] 6e00:  002156c0 dfff2000 
2c8d9000
[  743.153209] 6e20: 800341c86ee0 281b3758 0001 
800341c84000
[  743.161028] 6e40:  7e000ffbb000 100068390dce 
800341c86f50
[  743.168848] 6e60: 7e000ffbb020 7e000ffbb000 41b58ab3 
2a2777d8
[  743.176668] 6e80: 283e06f8 b394 800341c86e01 
2823cd10
[  743.184487] 6ea0: 2b311000 01c0 2c8d9598 
002156c0
[  743.192307] 6ec0: dfff2000  800341c86f20 
29c28074
[  743.200126] 6ee0: 2c8d9580 0140 283e41c8 
0001
[  743.207946] 6f00: dfff2000 2a4d5000  

[  743.215765] 6f20: 800341c86f50 283e41c8 2c8d96e8 
800341c873a0
[  743.223585] 6f40: 002156c0 283e4884 800341c87160 
283e74d4
[  743.231404] 6f60: 800341c4af80 800341c873a0 002156c0 
0001
[  743.239224] 6f80: dfff2000 2a4d5000  
002156c0
[  743.247043] 6fa0: 100068390e50 0001 002156c0 
2a9827c0
[  743.254863] 6fc0: 800341c873b0 0003 0268 
0008002156c0
[  743.262682] 6fe0: 100068390e18 0001 8003ffef9000 
0008
[  743.270501] 7000: 8003002156c0 0002  
28236b1c
[  743.278321] 7020: 800341c870a0 100068390e76 0140 
800341c4b740
[  743.286141] 7040: 80030003 28236b1c 800341c870e0 
28236b1c
[  743.293960] 7060: 800341c870e0 28236dac 000

Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Robert Richter
David,

On 19.09.16 11:49:30, David Daney wrote:
> Fix by supplying a cpu_to_node() implementation that returns correct
> node mappings.

> +int cpu_to_node(int cpu)
> +{
> + int nid;
> +
> + /*
> +  * Return 0 for unknown mapping so that we report something
> +  * sensible if firmware doesn't supply a proper mapping.
> +  */
> + if (cpu < 0 || cpu >= NR_CPUS)
> + return 0;
> +
> + nid = cpu_to_node_map[cpu];
> + if (nid == NUMA_NO_NODE)
> + nid = 0;
> + return nid;
> +}
> +EXPORT_SYMBOL(cpu_to_node);

this implementation fixes the per-cpu workqueue initialization, but I
don't think a cpu_to_node() implementation private to arm64 is the
proper solution.

Apart from making better use of generic code, the cpu_to_node() function is
called in the kernel's fast path. I think your implementation is too
expensive, and it also does not use per-cpu data for the lookup as the
generic code does. Secondly, numa_off is not considered at all.
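
For comparison, the generic fast-path lookup in include/linux/topology.h
(used when CONFIG_USE_PERCPU_NUMA_NODE_ID is selected) is a single
per-cpu load; roughly (paraphrased, not verbatim):

DECLARE_PER_CPU(int, numa_node);

#ifndef cpu_to_node
static inline int cpu_to_node(int cpu)
{
	/* one per-cpu read, no bounds checks, no out-of-line call */
	return per_cpu(numa_node, cpu);
}
#endif

#ifndef set_cpu_numa_node
static inline void set_cpu_numa_node(int cpu, int node)
{
	per_cpu(numa_node, cpu) = node;
}
#endif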

Instead, we need to make sure the set_*numa_node() functions are called
earlier, before secondary CPUs are booted. My suggested change for that
is this:


diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d43352504..952365c2f100 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -204,7 +204,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 static void smp_store_cpu_info(unsigned int cpuid)
 {
store_cpu_topology(cpuid);
-   numa_store_cpu_info(cpuid);
 }
 
 /*
@@ -719,6 +718,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
continue;
 
set_cpu_present(cpu, true);
+   numa_store_cpu_info(cpu);
}
 }
 

I have tested the code and it properly sets up all per-cpu workqueues.

Unfortunately, neither your code nor mine fixes the BUG_ON() I see with
the NUMA kernel:

 kernel BUG at mm/page_alloc.c:1848!

See below for the core dump. It looks like this happens when moving
a mem block whose first and last pages are mapped to different NUMA
nodes, thus triggering the BUG_ON().

Continuing with my investigations...

-Robert



[9.674272] [ cut here ]
[9.678881] kernel BUG at mm/page_alloc.c:1848!
[9.683406] Internal error: Oops - BUG: 0 [#1] SMP
[9.688190] Modules linked in:
[9.691247] CPU: 77 PID: 1 Comm: swapper/0 Tainted: GW   
4.8.0-rc5.vanilla5-00030-ga2b86cb3ce72 #38
[9.701322] Hardware name: www.cavium.com ThunderX CRB-2S/ThunderX CRB-2S, 
BIOS 0.3 Aug 24 2016
[9.710008] task: 800fe4561400 task.stack: 800ffbe0c000
[9.715939] PC is at move_freepages+0x160/0x168
[9.720460] LR is at move_freepages+0x160/0x168
[9.724979] pc : [] lr : [] pstate: 
60c5
[9.732362] sp : 800ffbe0f510
[9.735666] x29: 800ffbe0f510 x28: 7fe043f80020
[9.740975] x27: 7fe043f8 x26: 000c
[9.746283] x25: 000c x24: 810af0e0
[9.751591] x23: 0001 x22: 
[9.756898] x21: 7fe043c0 x20: 810aeb00
[9.762206] x19: 7fe043f8 x18: 0010
[9.767513] x17:  x16: 0001
[9.772821] x15: 88f03f37 x14: 6e2c303d64696e2c
[9.778128] x13: 30383365 x12: 66303038
[9.783436] x11: 3d656e6f7a203a64 x10: 0536 
[9.788744] x9 : 0060 x8 : 303062656166 
[9.794051] x7 : 66303138 x6 : 08f03f97 
[9.799359] x5 : 0006 x4 : 000c 
[9.804667] x3 : 0001 x2 : 0001 
[9.809975] x1 : 08da7be0 x0 : 0050 

[   10.517213] Call trace:
[   10.519651] Exception stack(0x800ffbe0f340 to 0x800ffbe0f470)
[   10.526081] f340: 7fe043f8 0001 800ffbe0f510 
081ec7d0
[   10.533900] f360: 08f03988 08da7bc8 800ffbe0f410 
081275fc
[   10.541718] f380: 800ffbe0f470 08ac5a00 7fe043c0 

[   10.549536] f3a0: 0001 810af0e0 000c 
000c
[   10.557355] f3c0: 7fe043f8 7fe043f80020 0030 

[   10.565173] f3e0: 0050 08da7be0 0001 
0001
[   10.572991] f400: 000c 0006 08f03f97 
66303138
[   10.580809] f420: 303062656166 0060 0536 
3d656e6f7a203a64
[   10.588628] f440: 66303038 30383365 6e2c303d64696e2c 
88f03f37
[   10.596446] f460: 0001 
[   10.601316] [] move_freepages+0x160/0x168
[   10.606879] [] move_freepages_block+0xa8/0xb8
[   10.612788] [] __rmqueue+0x610/0x670
[   10.617918] [] get_page_from_freelist+0x3cc/0xb40
[   10.624174] [] __alloc_pages_nodemask+0x12c/0xd40
[   10.630438] [] alloc_page_interleave+0x60/0xb0
[   10.636434] [] alloc_

Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Hanjun Guo

On 09/20/2016 02:49 AM, David Daney wrote:

From: David Daney 

The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online.  Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.

When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes.  The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array.  This results in:

[0.881970] Unable to handle kernel paging request at virtual address 
fb1008b926a4
[1.970095] pgd = fc00094b
[1.973530] [fb1008b926a4] *pgd=, *pud=, 
*pmd=
[1.982610] Internal error: Oops: 9604 [#1] SMP
[1.987541] Modules linked in:
[1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: GW   
4.8.0-rc6-preempt-vol+ #9
[1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
[2.005159] task: fe0fe89cc300 task.stack: fe0fe8b8c000
[2.011158] PC is at try_to_wake_up+0x194/0x34c
[2.015737] LR is at try_to_wake_up+0x150/0x34c
[2.020318] pc : [] lr : [] pstate: 
60c5
[2.027803] sp : fe0fe8b8fb10
[2.031149] x29: fe0fe8b8fb10 x28: 
[2.036522] x27: fc0008c63bc8 x26: 1000
[2.041896] x25: fc0008c63c80 x24: fc0008bfb200
[2.047270] x23: 00c0 x22: 0004
[2.052642] x21: fe0fe89d25bc x20: 1000
[2.058014] x19: fe0fe89d1d00 x18: 
[2.063386] x17:  x16: 
[2.068760] x15: 0018 x14: 
[2.074133] x13:  x12: 
[2.079505] x11:  x10: 
[2.084879] x9 :  x8 : 
[2.090251] x7 : 0040 x6 : 
[2.095621] x5 :  x4 : 
[2.100991] x3 :  x2 : 
[2.106364] x1 : fc0008be4c24 x0 : ff0ada80
[2.111737]
[2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfe0fe8b8c020)
[2.120102] Stack: (0xfe0fe8b8fb10 to 0xfe0fe8b9)
[2.125914] fb00:   fe0fe8b8fb80 
fc00080e7648
.
.
.
[2.442859] Call trace:
[2.445327] Exception stack(0xfe0fe8b8f940 to 0xfe0fe8b8fa70)
[2.451843] f940: fe0fe89d1d00 0400 fe0fe8b8fb10 
fc00080e7468
[2.459767] f960: fe0fe8b8f980 fc00080e4958 ff0ff91ab200 
fc00080e4b64
[2.467690] f980: fe0fe8b8f9d0 fc00080e515c fe0fe8b8fa80 

[2.475614] f9a0: fe0fe8b8f9d0 fc00080e58e4 fe0fe8b8fa80 

[2.483540] f9c0: fe0fe8d1 0040 fe0fe8b8fa50 
fc00080e5ac4
[2.491465] f9e0: ff0ada80 fc0008be4c24  

[2.499387] fa00:    
0040
[2.507309] fa20:    

[2.515233] fa40:    
0018
[2.523156] fa60:  
[2.528089] [] try_to_wake_up+0x194/0x34c
[2.533723] [] wake_up_process+0x28/0x34
[2.539275] [] create_worker+0x110/0x19c
[2.544824] [] alloc_unbound_pwq+0x3cc/0x4b0
[2.550724] [] wq_update_unbound_numa+0x10c/0x1e4
[2.557066] [] workqueue_online_cpu+0x220/0x28c
[2.563234] [] cpuhp_invoke_callback+0x6c/0x168
[2.569398] [] cpuhp_up_callbacks+0x44/0xe4
[2.575210] [] cpuhp_thread_fun+0x13c/0x148
[2.581027] [] smpboot_thread_fn+0x19c/0x1a8
[2.586929] [] kthread+0xdc/0xf0
[2.591776] [] ret_from_fork+0x10/0x50
[2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
[2.603464] ---[ end trace 58c0cd36b88802bc ]---
[2.608138] Kernel panic - not syncing: Fatal exception

Fix by supplying a cpu_to_node() implementation that returns correct
node mappings.

Cc:  # 4.7.x-
Signed-off-by: David Daney 

---
  arch/arm64/include/asm/topology.h |  3 +++
  arch/arm64/mm/numa.c  | 18 ++
  2 files changed, 21 insertions(+)

diff --git a/arch/arm64/include/asm/topology.h 
b/arch/arm64/include/asm/topology.h
index 8b57339..8d935447 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -30,6 +30,9 @@ int pcibus_to_node(struct pci_bus *bus);
 cpu_all_mask : 

Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-20 Thread Yisheng Xie


On 2016/9/20 2:49, David Daney wrote:
> From: David Daney 
> 
> The wq_numa_init() function makes a private CPU to node map by calling
> cpu_to_node() early in the boot process, before the non-boot CPUs are
> brought online.  Since the default implementation of cpu_to_node()
> returns zero for CPUs that have never been brought online, the
> workqueue system's view is that *all* CPUs are on node zero.
> 
> When the unbound workqueue for a non-zero node is created, the
> tsk_cpus_allowed() for the worker threads is the empty set because
> there are, in the view of the workqueue system, no CPUs on non-zero
> nodes.  The code in try_to_wake_up() using this empty cpumask ends up
> using the cpumask empty set value of NR_CPUS as an index into the
> per-CPU area pointer array, and gets garbage as it is one past the end
> of the array.  This results in:
> 
> [0.881970] Unable to handle kernel paging request at virtual address 
> fb1008b926a4
> [1.970095] pgd = fc00094b
> [1.973530] [fb1008b926a4] *pgd=, 
> *pud=, *pmd=
> [1.982610] Internal error: Oops: 9604 [#1] SMP
> [1.987541] Modules linked in:
> [1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: GW   
> 4.8.0-rc6-preempt-vol+ #9
> [1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
> [2.005159] task: fe0fe89cc300 task.stack: fe0fe8b8c000
> [2.011158] PC is at try_to_wake_up+0x194/0x34c
> [2.015737] LR is at try_to_wake_up+0x150/0x34c
> [2.020318] pc : [] lr : [] pstate: 
> 60c5
> [2.027803] sp : fe0fe8b8fb10
> [2.031149] x29: fe0fe8b8fb10 x28: 
> [2.036522] x27: fc0008c63bc8 x26: 1000
> [2.041896] x25: fc0008c63c80 x24: fc0008bfb200
> [2.047270] x23: 00c0 x22: 0004
> [2.052642] x21: fe0fe89d25bc x20: 1000
> [2.058014] x19: fe0fe89d1d00 x18: 
> [2.063386] x17:  x16: 
> [2.068760] x15: 0018 x14: 
> [2.074133] x13:  x12: 
> [2.079505] x11:  x10: 
> [2.084879] x9 :  x8 : 
> [2.090251] x7 : 0040 x6 : 
> [2.095621] x5 :  x4 : 
> [2.100991] x3 :  x2 : 
> [2.106364] x1 : fc0008be4c24 x0 : ff0ada80
> [2.111737]
> [2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfe0fe8b8c020)
> [2.120102] Stack: (0xfe0fe8b8fb10 to 0xfe0fe8b9)
> [2.125914] fb00:   fe0fe8b8fb80 
> fc00080e7648
> .
> .
> .
> [2.442859] Call trace:
> [2.445327] Exception stack(0xfe0fe8b8f940 to 0xfe0fe8b8fa70)
> [2.451843] f940: fe0fe89d1d00 0400 fe0fe8b8fb10 
> fc00080e7468
> [2.459767] f960: fe0fe8b8f980 fc00080e4958 ff0ff91ab200 
> fc00080e4b64
> [2.467690] f980: fe0fe8b8f9d0 fc00080e515c fe0fe8b8fa80 
> 
> [2.475614] f9a0: fe0fe8b8f9d0 fc00080e58e4 fe0fe8b8fa80 
> 
> [2.483540] f9c0: fe0fe8d1 0040 fe0fe8b8fa50 
> fc00080e5ac4
> [2.491465] f9e0: ff0ada80 fc0008be4c24  
> 
> [2.499387] fa00:    
> 0040
> [2.507309] fa20:    
> 
> [2.515233] fa40:    
> 0018
> [2.523156] fa60:  
> [2.528089] [] try_to_wake_up+0x194/0x34c
> [2.533723] [] wake_up_process+0x28/0x34
> [2.539275] [] create_worker+0x110/0x19c
> [2.544824] [] alloc_unbound_pwq+0x3cc/0x4b0
> [2.550724] [] wq_update_unbound_numa+0x10c/0x1e4
> [2.557066] [] workqueue_online_cpu+0x220/0x28c
> [2.563234] [] cpuhp_invoke_callback+0x6c/0x168
> [2.569398] [] cpuhp_up_callbacks+0x44/0xe4
> [2.575210] [] cpuhp_thread_fun+0x13c/0x148
> [2.581027] [] smpboot_thread_fn+0x19c/0x1a8
> [2.586929] [] kthread+0xdc/0xf0
> [2.591776] [] ret_from_fork+0x10/0x50
> [2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
> [2.603464] ---[ end trace 58c0cd36b88802bc ]---
> [2.608138] Kernel panic - not syncing: Fatal exception
> 
> Fix by supplying a cpu_to_node() implementation that returns correct
> node mappings.
> 
> Cc:  # 4.7.x-
> Signed-off-by: David Daney 
> 
Tested-by: Yisheng Xie 
> ---
>  arch/arm64/include/asm/topology.h |  3 +++
>  arch/arm64/mm/numa.c  | 18 ++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/topology.h 
> b/arch/arm64/include/asm/topology.h

Re: [PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-19 Thread Ganapatrao Kulkarni
[sending again, previous email was not text]

On Tue, Sep 20, 2016 at 12:19 AM, David Daney  wrote:
> From: David Daney 
>
> The wq_numa_init() function makes a private CPU to node map by calling
> cpu_to_node() early in the boot process, before the non-boot CPUs are
> brought online.  Since the default implementation of cpu_to_node()
> returns zero for CPUs that have never been brought online, the
> workqueue system's view is that *all* CPUs are on node zero.
>
> When the unbound workqueue for a non-zero node is created, the
> tsk_cpus_allowed() for the worker threads is the empty set because
> there are, in the view of the workqueue system, no CPUs on non-zero
> nodes.  The code in try_to_wake_up() using this empty cpumask ends up
> using the cpumask empty set value of NR_CPUS as an index into the
> per-CPU area pointer array, and gets garbage as it is one past the end
> of the array.  This results in:
>
> [0.881970] Unable to handle kernel paging request at virtual address 
> fb1008b926a4
> [1.970095] pgd = fc00094b
> [1.973530] [fb1008b926a4] *pgd=, 
> *pud=, *pmd=
> [1.982610] Internal error: Oops: 9604 [#1] SMP
> [1.987541] Modules linked in:
> [1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: GW   
> 4.8.0-rc6-preempt-vol+ #9
> [1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
> [2.005159] task: fe0fe89cc300 task.stack: fe0fe8b8c000
> [2.011158] PC is at try_to_wake_up+0x194/0x34c
> [2.015737] LR is at try_to_wake_up+0x150/0x34c
> [2.020318] pc : [] lr : [] pstate: 
> 60c5
> [2.027803] sp : fe0fe8b8fb10
> [2.031149] x29: fe0fe8b8fb10 x28: 
> [2.036522] x27: fc0008c63bc8 x26: 1000
> [2.041896] x25: fc0008c63c80 x24: fc0008bfb200
> [2.047270] x23: 00c0 x22: 0004
> [2.052642] x21: fe0fe89d25bc x20: 1000
> [2.058014] x19: fe0fe89d1d00 x18: 
> [2.063386] x17:  x16: 
> [2.068760] x15: 0018 x14: 
> [2.074133] x13:  x12: 
> [2.079505] x11:  x10: 
> [2.084879] x9 :  x8 : 
> [2.090251] x7 : 0040 x6 : 
> [2.095621] x5 :  x4 : 
> [2.100991] x3 :  x2 : 
> [2.106364] x1 : fc0008be4c24 x0 : ff0ada80
> [2.111737]
> [2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfe0fe8b8c020)
> [2.120102] Stack: (0xfe0fe8b8fb10 to 0xfe0fe8b9)
> [2.125914] fb00:   fe0fe8b8fb80 
> fc00080e7648
> .
> .
> .
> [2.442859] Call trace:
> [2.445327] Exception stack(0xfe0fe8b8f940 to 0xfe0fe8b8fa70)
> [2.451843] f940: fe0fe89d1d00 0400 fe0fe8b8fb10 
> fc00080e7468
> [2.459767] f960: fe0fe8b8f980 fc00080e4958 ff0ff91ab200 
> fc00080e4b64
> [2.467690] f980: fe0fe8b8f9d0 fc00080e515c fe0fe8b8fa80 
> 
> [2.475614] f9a0: fe0fe8b8f9d0 fc00080e58e4 fe0fe8b8fa80 
> 
> [2.483540] f9c0: fe0fe8d1 0040 fe0fe8b8fa50 
> fc00080e5ac4
> [2.491465] f9e0: ff0ada80 fc0008be4c24  
> 
> [2.499387] fa00:    
> 0040
> [2.507309] fa20:    
> 
> [2.515233] fa40:    
> 0018
> [2.523156] fa60:  
> [2.528089] [] try_to_wake_up+0x194/0x34c
> [2.533723] [] wake_up_process+0x28/0x34
> [2.539275] [] create_worker+0x110/0x19c
> [2.544824] [] alloc_unbound_pwq+0x3cc/0x4b0
> [2.550724] [] wq_update_unbound_numa+0x10c/0x1e4
> [2.557066] [] workqueue_online_cpu+0x220/0x28c
> [2.563234] [] cpuhp_invoke_callback+0x6c/0x168
> [2.569398] [] cpuhp_up_callbacks+0x44/0xe4
> [2.575210] [] cpuhp_thread_fun+0x13c/0x148
> [2.581027] [] smpboot_thread_fn+0x19c/0x1a8
> [2.586929] [] kthread+0xdc/0xf0
> [2.591776] [] ret_from_fork+0x10/0x50
> [2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
> [2.603464] ---[ end trace 58c0cd36b88802bc ]---
> [2.608138] Kernel panic - not syncing: Fatal exception
>
> Fix by supplying a cpu_to_node() implementation that returns correct
> node mappings.
>
> Cc:  # 4.7.x-
> Signed-off-by: David Daney 
>

Acked-by: Ganapatrao Kulkarni 

> ---
>  arch/arm64/include/asm/topology.h |  3 +++
>  arch/arm64/mm/numa.c  | 18 ++
>  2 files changed, 21 insertions(+)
>
> diff --git a/arch/arm64/

[PATCH] arm64, numa: Add cpu_to_node() implementation.

2016-09-19 Thread David Daney
From: David Daney 

The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online.  Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.

When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes.  The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array.  This results in:

[0.881970] Unable to handle kernel paging request at virtual address 
fb1008b926a4
[1.970095] pgd = fc00094b
[1.973530] [fb1008b926a4] *pgd=, *pud=, 
*pmd=
[1.982610] Internal error: Oops: 9604 [#1] SMP
[1.987541] Modules linked in:
[1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: GW   
4.8.0-rc6-preempt-vol+ #9
[1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
[2.005159] task: fe0fe89cc300 task.stack: fe0fe8b8c000
[2.011158] PC is at try_to_wake_up+0x194/0x34c
[2.015737] LR is at try_to_wake_up+0x150/0x34c
[2.020318] pc : [] lr : [] pstate: 
60c5
[2.027803] sp : fe0fe8b8fb10
[2.031149] x29: fe0fe8b8fb10 x28: 
[2.036522] x27: fc0008c63bc8 x26: 1000
[2.041896] x25: fc0008c63c80 x24: fc0008bfb200
[2.047270] x23: 00c0 x22: 0004
[2.052642] x21: fe0fe89d25bc x20: 1000
[2.058014] x19: fe0fe89d1d00 x18: 
[2.063386] x17:  x16: 
[2.068760] x15: 0018 x14: 
[2.074133] x13:  x12: 
[2.079505] x11:  x10: 
[2.084879] x9 :  x8 : 
[2.090251] x7 : 0040 x6 : 
[2.095621] x5 :  x4 : 
[2.100991] x3 :  x2 : 
[2.106364] x1 : fc0008be4c24 x0 : ff0ada80
[2.111737]
[2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfe0fe8b8c020)
[2.120102] Stack: (0xfe0fe8b8fb10 to 0xfe0fe8b9)
[2.125914] fb00:   fe0fe8b8fb80 
fc00080e7648
.
.
.
[2.442859] Call trace:
[2.445327] Exception stack(0xfe0fe8b8f940 to 0xfe0fe8b8fa70)
[2.451843] f940: fe0fe89d1d00 0400 fe0fe8b8fb10 
fc00080e7468
[2.459767] f960: fe0fe8b8f980 fc00080e4958 ff0ff91ab200 
fc00080e4b64
[2.467690] f980: fe0fe8b8f9d0 fc00080e515c fe0fe8b8fa80 

[2.475614] f9a0: fe0fe8b8f9d0 fc00080e58e4 fe0fe8b8fa80 

[2.483540] f9c0: fe0fe8d1 0040 fe0fe8b8fa50 
fc00080e5ac4
[2.491465] f9e0: ff0ada80 fc0008be4c24  

[2.499387] fa00:    
0040
[2.507309] fa20:    

[2.515233] fa40:    
0018
[2.523156] fa60:  
[2.528089] [] try_to_wake_up+0x194/0x34c
[2.533723] [] wake_up_process+0x28/0x34
[2.539275] [] create_worker+0x110/0x19c
[2.544824] [] alloc_unbound_pwq+0x3cc/0x4b0
[2.550724] [] wq_update_unbound_numa+0x10c/0x1e4
[2.557066] [] workqueue_online_cpu+0x220/0x28c
[2.563234] [] cpuhp_invoke_callback+0x6c/0x168
[2.569398] [] cpuhp_up_callbacks+0x44/0xe4
[2.575210] [] cpuhp_thread_fun+0x13c/0x148
[2.581027] [] smpboot_thread_fn+0x19c/0x1a8
[2.586929] [] kthread+0xdc/0xf0
[2.591776] [] ret_from_fork+0x10/0x50
[2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
[2.603464] ---[ end trace 58c0cd36b88802bc ]---
[2.608138] Kernel panic - not syncing: Fatal exception

Fix by supplying a cpu_to_node() implementation that returns correct
node mappings.

Cc:  # 4.7.x-
Signed-off-by: David Daney 

---
 arch/arm64/include/asm/topology.h |  3 +++
 arch/arm64/mm/numa.c  | 18 ++
 2 files changed, 21 insertions(+)

diff --git a/arch/arm64/include/asm/topology.h 
b/arch/arm64/include/asm/topology.h
index 8b57339..8d935447 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -30,6 +30,9 @@ int pcibus_to_node(struct pci_bus *bus);
 cpu_all_mask : \