Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-06-04 Thread Grant Likely
On Mon, 13 Apr 2015 11:49:31 -0500, Rob Herring wrote:
> On Mon, Apr 13, 2015 at 8:38 AM, Konstantin Khlebnikov
>  wrote:
> > On 13.04.2015 16:22, Rob Herring wrote:
> >>
> >> On Wed, Apr 8, 2015 at 11:59 AM, Konstantin Khlebnikov
> >>  wrote:
> >>>
> >>> Node 0 might be offline as well as any other numa node,
> >>> in this case kernel cannot handle memory allocation and crashes.
> >>>
> >>> Signed-off-by: Konstantin Khlebnikov 
> >>> Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
> >>> ---
> >>>   drivers/of/base.c  |2 +-
> >>>   include/linux/of.h |5 ++++-
> >>>   2 files changed, 5 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/of/base.c b/drivers/of/base.c
> >>> index 8f165b112e03..51f4bd16e613 100644
> >>> --- a/drivers/of/base.c
> >>> +++ b/drivers/of/base.c
> >>> @@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
> >>>   #ifdef CONFIG_NUMA
> >>>   int __weak of_node_to_nid(struct device_node *np)
> >>>   {
> >>> -   return numa_node_id();
> >>> +   return NUMA_NO_NODE;
> >>
> >>
> >> This is going to break any NUMA machine that enables OF and expects
> >> the weak function to work.
> >
> >
> > Why? NUMA_NO_NODE == -1 -- this's standard "no-affinity" signal.
> > As I see powerpc/sparc versions of of_node_to_nid returns -1 if they
> > cannot find out which node should be used.
> 
> Ah, I was thinking those platforms were relying on the default
> implementation. I guess any real NUMA support is going to need to
> override this function. The arm64 patch series does that as well. We
> need to be sure this change is correct for metag which appears to be
> the only other OF enabled platform with NUMA support.
> 
> In that case, then there is little reason to keep the inline and we
> can just always enable the weak function (with your change). It is
> slightly less optimal, but the few callers hardly appear to be hot
> paths.

Sounds like you're in agreement with this patch then? Shall I apply it?

g.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-29 Thread Konstantin Khlebnikov

+x...@kernel.org
+linux-me...@vger.kernel.org

Here is the proposed fix:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg864009.html

It returns NUMA_NO_NODE from both the static-inline (CONFIG_OF=n) and
the weak version of of_node_to_nid(). This change might affect the few
arches which have CONFIG_OF=y but don't implement of_node_to_nid()
(i.e. depend on the default behavior of the weak function). It seems
this is only metag.

From the mm/ point of view, returning NUMA_NO_NODE is the right choice
when the code has no idea which NUMA node should be used -- memory
allocation functions then choose the current NUMA node (but they might
use any).
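
As a rough illustration of the point above, here is a user-space sketch of how a node hint might be resolved. It is not kernel code: resolve_node(), current_node(), node_usable() and the two-node table are hypothetical stand-ins invented for this example; only NUMA_NO_NODE == -1 and the "node 0 offline" topology come from the thread.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)  /* "no affinity" value, as discussed in the thread */
#define MAX_NUMNODES 2     /* hypothetical: a two-node machine */

/* Hypothetical topology: node 0 offline, node 1 online. */
static const bool node_online_tbl[MAX_NUMNODES] = { false, true };

/* Hypothetical stand-in for the node of the CPU running the caller. */
static int current_node(void)
{
    return 1;
}

/* A node can serve allocations only if it is in range and online. */
static bool node_usable(int nid)
{
    return nid >= 0 && nid < MAX_NUMNODES && node_online_tbl[nid];
}

/* Resolve a node hint: "no preference" falls back to the current node. */
static int resolve_node(int nid)
{
    return nid == NUMA_NO_NODE ? current_node() : nid;
}
```

In this model a fallback that hardcodes 0 resolves to the offline node 0, while a fallback that returns NUMA_NO_NODE resolves to node 1, which is online.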

On 29.04.2015 04:11, songxium...@inspur.com wrote:

When we tested the CPU and memory hotplug feature on an x86 server
with kernel 4.0-rc4, we hit a similar problem.

The situation is that when the memory in node 0 is offline, the system
goes down during boot.

The bug information follows:
[0.335176] BUG: unable to handle kernel paging request at
1b08
[0.342164] IP: [] __alloc_pages_nodemask+0xb7/0x940
[0.348706] PGD 0
[0.350735] Oops:  [#1] SMP
[ 0.353993] Modules linked in:
[0.357063] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc4 #1
[0.363232] Hardware name: Inspur TS860/TS860, BIOS TS860_2.0.0
2015/03/24
[0.370095] task: 88085b1e ti: 88085b1e8000 task.ti:
88085b1e8000
[0.377564] RIP: 0010:[]  []
__alloc_pages_nodemask+0xb7/0x940
[0.386524] RSP: :88085b1ebac8  EFLAGS: 00010246
[0.391828] RAX: 1b00 RBX: 0010 RCX:

[0.398953] RDX:  RSI:  RDI:
002052d0
[0.406075] RBP: 88085b1ebbb8 R08: 88085b13fec0 R09:
5b13fe01
[0.413198] R10: 88085e807300 R11: 810d4bc1 R12:
0001002a
[0.420321] R13: 002052d0 R14: 0001 R15:
40d0
[0.427446] FS: () GS:88085ee0()
knlGS:
[0.435522] CS:  0010 DS:  ES:  CR0: 80050033
[0.441259] CR2: 1b08 CR3: 019ae000 CR4:
001406f0
[0.448382] Stack:
[ 0.450392]  88085b1e 0400 88085b1e
88085b1ebb68
[0.457846]  007b 88085b12d140 88085b249000
007b
[ 0.465298]  88085b1ebb28 81af2900 
002052d05b12d140
[0.472750] Call Trace:
[0.475206]  [] ? deactivate_slab+0x383/0x400
[0.481123] [] new_slab+0xa7/0x460
[ 0.486174]  [] __slab_alloc+0x310/0x470
[0.491655] [] ? dmar_msi_set_affinity+0x8f/0xc0
[0.497921] [] ? __irq_domain_add+0x41/0x100
[ 0.503838]  [] ? irq_do_set_affinity+0x5e/0x70
[0.509920] [] __kmalloc_node+0xad/0x2e0
[ 0.515483]  [] ? __irq_domain_add+0x41/0x100
[0.521392] [] __irq_domain_add+0x41/0x100
[ 0.527133]  [] mp_irqdomain_create+0x9e/0x120
[0.533140] [] setup_IO_APIC+0x64/0x1be
[ 0.538622]  [] apic_bsp_setup+0xa2/0xae
[0.544099] [] native_smp_prepare_cpus+0x267/0x2b2
[0.550531] [] kernel_init_freeable+0xf2/0x253
[0.556625] [] ? rest_init+0x80/0x80
[ 0.561845]  [] kernel_init+0xe/0xf0
[0.566979] [] ret_from_fork+0x58/0x90
[ 0.572374]  [] ? rest_init+0x80/0x80
[0.577591] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3
f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b
45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01 00 00 00 44 89 f1 d3 e0
[0.597537] RIP [] __alloc_pages_nodemask+0xb7/0x940
[0.604158]  RSP 
[0.607643] CR2: 1b08
[0.610962] ---[ end trace 0a600c0841386992 ]---
[0.615573] Kernel panic - not syncing: Fatal exception
[0.620792] ---[ end Kernel panic - not syncing: Fatal exception
*From:* Rob Herring <mailto:robherri...@gmail.com>
*Date:* 2015-04-14 00:49
*To:* Konstantin Khlebnikov <mailto:khlebni...@yandex-team.ru>
*CC:* Grant Likely <mailto:grant.lik...@linaro.org>;
devicet...@vger.kernel.org <mailto:devicet...@vger.kernel.org>; Rob
Herring <mailto:robh...@kernel.org>; linux-kernel@vger.kernel.org
<mailto:linux-kernel@vger.kernel.org>; sparcli...@vger.kernel.org
<mailto:sparcli...@vger.kernel.org>; linux...@kvack.org
<mailto:linux...@kvack.org>; linuxppc-dev
<mailto:linuxppc-...@lists.ozlabs.org>
*Subject:* Re: [PATCH] of: return NUMA_NO_NODE from fallback
of_node_to_nid()
On Mon, Apr 13, 2015 at 8:38 AM, Konstantin Khlebnikov
 wrote:
 > On 13.04.2015 16:22, Rob Herring wrote:
 >>
 >> On Wed, Apr 8, 2015 at 11:59 AM, Konstantin Khlebnikov
 >>  wrote:
 >>>
 >>> Node 0 might be offline as well as any other numa node,
 >>> in this case kernel cannot handle memory allocation and crashes.
 >>>
 >>> Signed-off-by: Konstantin Khlebnikov 
 >>> Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
 >>> ---
 >>> 

Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-13 Thread Rob Herring
On Mon, Apr 13, 2015 at 8:38 AM, Konstantin Khlebnikov
 wrote:
> On 13.04.2015 16:22, Rob Herring wrote:
>>
>> On Wed, Apr 8, 2015 at 11:59 AM, Konstantin Khlebnikov
>>  wrote:
>>>
>>> Node 0 might be offline as well as any other numa node,
>>> in this case kernel cannot handle memory allocation and crashes.
>>>
>>> Signed-off-by: Konstantin Khlebnikov 
>>> Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
>>> ---
>>>   drivers/of/base.c  |2 +-
>>>   include/linux/of.h |5 ++++-
>>>   2 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index 8f165b112e03..51f4bd16e613 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
>>>   #ifdef CONFIG_NUMA
>>>   int __weak of_node_to_nid(struct device_node *np)
>>>   {
>>> -   return numa_node_id();
>>> +   return NUMA_NO_NODE;
>>
>>
>> This is going to break any NUMA machine that enables OF and expects
>> the weak function to work.
>
>
> Why? NUMA_NO_NODE == -1 -- this's standard "no-affinity" signal.
> As I see powerpc/sparc versions of of_node_to_nid returns -1 if they
> cannot find out which node should be used.

Ah, I was thinking those platforms were relying on the default
implementation. I guess any real NUMA support is going to need to
override this function. The arm64 patch series does that as well. We
need to be sure this change is correct for metag which appears to be
the only other OF enabled platform with NUMA support.

In that case, then there is little reason to keep the inline and we
can just always enable the weak function (with your change). It is
slightly less optimal, but the few callers hardly appear to be hot
paths.

Rob
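
For readers unfamiliar with the weak-function fallback discussed in this message, a minimal user-space sketch follows. It assumes a GCC- or Clang-compatible compiler, where the kernel's __weak is spelled __attribute__((weak)); struct device_node is left opaque, and the body only mirrors the proposed patch.

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)

struct device_node;  /* opaque in this sketch */

/*
 * Weak default: the linker uses this definition only when no other
 * object file provides a regular (strong) of_node_to_nid().
 */
__attribute__((weak)) int of_node_to_nid(struct device_node *np)
{
    (void)np;             /* the fallback knows nothing about the node */
    return NUMA_NO_NODE;  /* "no affinity", as in the proposed patch */
}
```

An architecture with real NUMA support would supply its own non-weak of_node_to_nid() in a separate file, and that definition would take precedence at link time.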


Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-13 Thread Konstantin Khlebnikov

On 13.04.2015 16:22, Rob Herring wrote:
> On Wed, Apr 8, 2015 at 11:59 AM, Konstantin Khlebnikov
>  wrote:
>> Node 0 might be offline as well as any other numa node,
>> in this case kernel cannot handle memory allocation and crashes.
>>
>> Signed-off-by: Konstantin Khlebnikov 
>> Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
>> ---
>>   drivers/of/base.c  |2 +-
>>   include/linux/of.h |5 ++++-
>>   2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index 8f165b112e03..51f4bd16e613 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c
>> @@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
>>   #ifdef CONFIG_NUMA
>>   int __weak of_node_to_nid(struct device_node *np)
>>   {
>> -   return numa_node_id();
>> +   return NUMA_NO_NODE;
>
> This is going to break any NUMA machine that enables OF and expects
> the weak function to work.

Why? NUMA_NO_NODE == -1 -- this is the standard "no-affinity" signal.
As far as I can see, the powerpc/sparc versions of of_node_to_nid()
return -1 if they cannot find out which node should be used.

> Rob
>
>>   }
>>   #endif
>>
>> diff --git a/include/linux/of.h b/include/linux/of.h
>> index dfde07e77a63..78a04ee85a9c 100644
>> --- a/include/linux/of.h
>> +++ b/include/linux/of.h
>> @@ -623,7 +623,10 @@ static inline const char *of_prop_next_string(struct 
>> property *prop,
>>   #if defined(CONFIG_OF) && defined(CONFIG_NUMA)
>>   extern int of_node_to_nid(struct device_node *np);
>>   #else
>> -static inline int of_node_to_nid(struct device_node *device) { return 0; }
>> +static inline int of_node_to_nid(struct device_node *device)
>> +{
>> +   return NUMA_NO_NODE;
>> +}
>>   #endif
>>
>>   static inline struct device_node *of_find_matching_node(
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe devicetree" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Konstantin


Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-13 Thread Rob Herring
On Wed, Apr 8, 2015 at 11:59 AM, Konstantin Khlebnikov
 wrote:
> Node 0 might be offline as well as any other numa node,
> in this case kernel cannot handle memory allocation and crashes.
>
> Signed-off-by: Konstantin Khlebnikov 
> Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
> ---
>  drivers/of/base.c  |2 +-
>  include/linux/of.h |5 ++++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 8f165b112e03..51f4bd16e613 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
>  #ifdef CONFIG_NUMA
>  int __weak of_node_to_nid(struct device_node *np)
>  {
> -   return numa_node_id();
> +   return NUMA_NO_NODE;

This is going to break any NUMA machine that enables OF and expects
the weak function to work.

Rob

>  }
>  #endif
>
> diff --git a/include/linux/of.h b/include/linux/of.h
> index dfde07e77a63..78a04ee85a9c 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -623,7 +623,10 @@ static inline const char *of_prop_next_string(struct 
> property *prop,
>  #if defined(CONFIG_OF) && defined(CONFIG_NUMA)
>  extern int of_node_to_nid(struct device_node *np);
>  #else
> -static inline int of_node_to_nid(struct device_node *device) { return 0; }
> +static inline int of_node_to_nid(struct device_node *device)
> +{
> +   return NUMA_NO_NODE;
> +}
>  #endif
>
>  static inline struct device_node *of_find_matching_node(
>


Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-10 Thread Nishanth Aravamudan
On 10.04.2015 [14:37:19 +0300], Konstantin Khlebnikov wrote:
> On 10.04.2015 01:58, Nishanth Aravamudan wrote:
> >On 09.04.2015 [07:27:28 +0300], Konstantin Khlebnikov wrote:
> >>On Thu, Apr 9, 2015 at 2:07 AM, Nishanth Aravamudan
> >> wrote:
> >>>On 08.04.2015 [20:04:04 +0300], Konstantin Khlebnikov wrote:
> On 08.04.2015 19:59, Konstantin Khlebnikov wrote:
> >Node 0 might be offline as well as any other numa node,
> >in this case kernel cannot handle memory allocation and crashes.
> >>>
> >>>Isn't the bug that numa_node_id() returned an offline node? That
> >>>shouldn't happen.
> >>
> >>Offline node 0 came from static-inline copy of that function from of.h
> >>I've patched weak function for keeping consistency.
> >
> >Got it, that's not necessarily clear in the original commit message.
> 
> Sorry.
> 
> >
> >>>#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> >>>...
> >>>#ifndef numa_node_id
> >>>/* Returns the number of the current Node. */
> >>>static inline int numa_node_id(void)
> >>>{
> >>> return raw_cpu_read(numa_node);
> >>>}
> >>>#endif
> >>>...
> >>>#else   /* !CONFIG_USE_PERCPU_NUMA_NODE_ID */
> >>>
> >>>/* Returns the number of the current Node. */
> >>>#ifndef numa_node_id
> >>>static inline int numa_node_id(void)
> >>>{
> >>> return cpu_to_node(raw_smp_processor_id());
> >>>}
> >>>#endif
> >>>...
> >>>
> >>>So that's either the per-cpu numa_node value, right? Or the result of
> >>>cpu_to_node on the current processor.
> >>>
> Example:
> 
> [0.027133] [ cut here ]
> [0.027938] kernel BUG at include/linux/gfp.h:322!
> >>>
> >>>This is
> >>>
> >>>VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> >>>
> >>>in
> >>>
> >>>alloc_pages_exact_node().
> >>>
> >>>And based on the trace below, that's
> >>>
> >>>__slab_alloc -> alloc
> >>>
> >>>alloc_pages_exact_node
> >>> <- alloc_slab_page
> >>> <- allocate_slab
> >>> <- new_slab
> >>> <- new_slab_objects
> >>> < __slab_alloc?
> >>>
> >>>which is just passing the node value down, right? Which I think was
> >>>from:
> >>>
> >>> domain = kzalloc_node(sizeof(*domain) + (sizeof(unsigned int) * 
> >>> size),
> >>>   GFP_KERNEL, of_node_to_nid(of_node));
> >>>
> >>>?
> >>>
> >>>
> >>>What platform is this on, looks to be x86? qemu emulation of a
> >>>pathological topology? What was the topology?
> >>
> >>qemu x86_64, 2 cpu, 2 numa nodes, all memory in second.
> >
> >Ok, this worked before? That is, this is a regression?
> 
> Seems like that worked before 3.17 where
> bug was exposed by commit 44767bfaaed782d6d635ecbb13f3980041e6f33e
> (x86, irq: Enhance mp_register_ioapic() to support irqdomain)
> this is first usage of  *irq_domain_add*() in x86.

Ok.

> >>I've slightly patched it to allow that setup (qemu hardcodes 1 MB
> >>of memory connected to node 0). And I've found an unrelated bug:
> >>if a numa node has less than 4 MB of RAM, the kernel crashes even
> >>earlier, because the numa code ignores that node
> >>but the buddy allocator still tries to use its pages.
> >
> >So this isn't an actually supported topology by qemu?
> 
> Qemu easily created memoryless numa nodes but node 0 have hardcoded
> 1Mb of ram. This seems like legacy prop for DOS era software.

Well, the problem is that x86 doesn't support memoryless nodes.

git grep MEMORYLESS_NODES
arch/ia64/Kconfig:config HAVE_MEMORYLESS_NODES
arch/powerpc/Kconfig:config HAVE_MEMORYLESS_NODES

> >>>Note that there is a ton of code that seems to assume node 0 is online.
> >>>I started working on removing this assumption myself and it just led
> >>>down a rathole (on power, we always have node 0 online, even if it is
> >>>memoryless and cpuless, as a result).
> >>>
> >>>I am guessing this is just happening early in boot before the per-cpu
> >>>areas are setup? That's why (I think) x86 has the early_cpu_to_node()
> >>>function...
> >>>
> >>>Or do you not have CONFIG_OF set? So isn't the only change necessary to
> >>>the include file, and it should just return first_online_node rather
> >>>than 0?
> >>>
> >>>Ah and there's more of those node 0 assumptions :)
> >>
> >>That was x86 where is no CONFIG_OF at all.
> >>
> >>I don't know what's wrong with that machine but ACPI reports the
> >>cpus and memory from node 0 as connected to node 1 and everything
> >>seemed to work fine until the latest upgrade -- it seems the buggy
> >>static-inline of_node_to_nid was introduced in 3.13 but x86 ioapic
> >>has used it during early allocations only since 3.17. The machine
> >>owner tells me that 3.15 worked fine.
> >
> >So, this was a qemu emulation of this actual physical machine without a
> >node 0?
> 
> Yep. Also I have crash from real machine but that stacktrace is messy
> because CONFIG_DEBUG_VM wasn't enabled and kernel crashed inside
> buddy allocator when tried to touch unallocated numa node structure.
> 
> >

Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Konstantin Khlebnikov
On Thu, Apr 9, 2015 at 2:12 AM, Julian Calaby  wrote:
> Hi Konstantin,
>
> On Thu, Apr 9, 2015 at 3:04 AM, Konstantin Khlebnikov
>  wrote:
>> On 08.04.2015 19:59, Konstantin Khlebnikov wrote:
>>>
>>> Node 0 might be offline as well as any other numa node,
>>> in this case kernel cannot handle memory allocation and crashes.
>>
>>
>> Example:
>>
>> [0.027133] [ cut here ]
>> [0.027938] kernel BUG at include/linux/gfp.h:322!
>> [0.028000] invalid opcode:  [#1] SMP
>> [0.028000] Modules linked in:
>> [0.028000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc7 #12
>> [0.028000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
>> [0.028000] task: 88007d3f8000 ti: 88007d3dc000 task.ti:
>> 88007d3dc000
>> [0.028000] RIP: 0010:[]  []
>> new_slab+0x30c/0x3c0
>> [0.028000] RSP: :88007d3dfc28  EFLAGS: 00010246
>> [0.028000] RAX:  RBX: 88007d001800 RCX:
>> 0001
>> [0.028000] RDX:  RSI:  RDI:
>> 002012d0
>> [0.028000] RBP: 88007d3dfc58 R08:  R09:
>> 
>> [0.028000] R10: 0001 R11: 88007d02fe40 R12:
>> 00d0
>> [0.028000] R13: 00c0 R14: 0015 R15:
>> 
>> [0.028000] FS:  () GS:88007fc0()
>> knlGS:
>> [0.028000] CS:  0010 DS:  ES:  CR0: 8005003b
>> [0.028000] CR2:  CR3: 01e0e000 CR4:
>> 06f0
>> [0.028000] DR0:  DR1:  DR2:
>> 
>> [0.028000] DR3:  DR6: fffe0ff0 DR7:
>> 0400
>> [0.028000] Stack:
>> [0.028000]   88007fc175d0 ea0001f40bc0
>> 00c0
>> [0.028000]  88007d001800 80d0 88007d3dfd48
>> 8192da27
>> [0.028000]  000d 81e27038 
>> 
>> [0.028000] Call Trace:
>> [0.028000]  [] __slab_alloc+0x3df/0x55d
>> [0.028000]  [] ? __lock_acquire+0xc1b/0x1f40
>> [0.028000]  [] ? __irq_domain_add+0x3c/0xe0
>> [0.028000]  [] ? trace_hardirqs_on_caller+0x105/0x1d0
>> [0.028000]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> [0.028000]  [] __kmalloc_node+0xab/0x210
>> [0.028000]  [] ? ioapic_read_entry+0x1f/0x50
>> [0.028000]  [] ? __irq_domain_add+0x3c/0xe0
>> [0.028000]  [] __irq_domain_add+0x3c/0xe0
>> [0.028000]  [] mp_irqdomain_create+0x9e/0x120
>> [0.028000]  [] setup_IO_APIC+0x6b/0x798
>> [0.028000]  [] ? clear_IO_APIC+0x45/0x70
>> [0.028000]  [] apic_bsp_setup+0x87/0x96
>> [0.028000]  [] native_smp_prepare_cpus+0x237/0x275
>> [0.028000]  [] kernel_init_freeable+0x120/0x265
>> [0.028000]  [] ? kernel_init+0x9/0xf0
>> [0.028000]  [] ? rest_init+0x130/0x130
>> [0.028000]  [] kernel_init+0x9/0xf0
>> [0.028000]  [] ret_from_fork+0x58/0x90
>> [0.028000]  [] ? rest_init+0x130/0x130
>> [0.028000] Code: 6b b6 ff ff 49 89 c5 e9 ce fd ff ff 31 c0 90 e9 74 ff
>> ff ff 49 c7 04 04 00 00 00 00 e9 05 ff ff ff 4c 89 e7 ff d0 e9 d9 fe ff ff
>> <0f> 0b 4c 8b 73 38 44 89 e7 81 cf 00 00 20 00 4c 89 f6 48 c1 ee
>> [0.028000] RIP  [] new_slab+0x30c/0x3c0
>> [0.028000]  RSP 
>> [0.028039] ---[ end trace f03690e70d7e4be6 ]---
>
> Shouldn't this be in the commit message?

I don't think this will help anybody; the kernel crashes here
only on a rare hardware setup. This stack trace came from a specially
patched qemu (because stock qemu cannot configure a memoryless node 0).

>
> Thanks,
>
> --
> Julian Calaby
>
> Email: julian.cal...@gmail.com
> Profile: http://www.google.com/profiles/julian.calaby/
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .


Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Konstantin Khlebnikov
On Thu, Apr 9, 2015 at 2:07 AM, Nishanth Aravamudan
 wrote:
> On 08.04.2015 [20:04:04 +0300], Konstantin Khlebnikov wrote:
>> On 08.04.2015 19:59, Konstantin Khlebnikov wrote:
>> >Node 0 might be offline as well as any other numa node,
>> >in this case kernel cannot handle memory allocation and crashes.
>
> Isn't the bug that numa_node_id() returned an offline node? That
> shouldn't happen.

The offline node 0 came from the static-inline copy of that function
in of.h. I've patched the weak function as well, to keep them consistent.

>
> #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> ...
> #ifndef numa_node_id
> /* Returns the number of the current Node. */
> static inline int numa_node_id(void)
> {
> return raw_cpu_read(numa_node);
> }
> #endif
> ...
> #else   /* !CONFIG_USE_PERCPU_NUMA_NODE_ID */
>
> /* Returns the number of the current Node. */
> #ifndef numa_node_id
> static inline int numa_node_id(void)
> {
> return cpu_to_node(raw_smp_processor_id());
> }
> #endif
> ...
>
> So that's either the per-cpu numa_node value, right? Or the result of
> cpu_to_node on the current processor.
>
>> Example:
>>
>> [0.027133] [ cut here ]
>> [0.027938] kernel BUG at include/linux/gfp.h:322!
>
> This is
>
> VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>
> in
>
> alloc_pages_exact_node().
>
> And based on the trace below, that's
>
> __slab_alloc -> alloc
>
> alloc_pages_exact_node
> <- alloc_slab_page
> <- allocate_slab
> <- new_slab
> <- new_slab_objects
> < __slab_alloc?
>
> which is just passing the node value down, right? Which I think was
> from:
>
> domain = kzalloc_node(sizeof(*domain) + (sizeof(unsigned int) * size),
>   GFP_KERNEL, of_node_to_nid(of_node));
>
> ?
>
>
> What platform is this on, looks to be x86? qemu emulation of a
> pathological topology? What was the topology?

qemu x86_64, 2 cpus, 2 numa nodes, all memory in the second.
I've slightly patched it to allow that setup (qemu hardcodes 1 MB
of memory connected to node 0). And I've found an unrelated bug:
if a numa node has less than 4 MB of RAM, the kernel crashes even
earlier, because the numa code ignores that node
but the buddy allocator still tries to use its pages.

>
> Note that there is a ton of code that seems to assume node 0 is online.
> I started working on removing this assumption myself and it just led
> down a rathole (on power, we always have node 0 online, even if it is
> memoryless and cpuless, as a result).
>
> I am guessing this is just happening early in boot before the per-cpu
> areas are setup? That's why (I think) x86 has the early_cpu_to_node()
> function...
>
> Or do you not have CONFIG_OF set? So isn't the only change necessary to
> the include file, and it should just return first_online_node rather
> than 0?
>
> Ah and there's more of those node 0 assumptions :)

That was x86, where there is no CONFIG_OF at all.

I don't know what's wrong with that machine but ACPI reports the
cpus and memory from node 0 as connected to node 1 and everything
seemed to work fine until the latest upgrade -- it seems the buggy
static-inline of_node_to_nid was introduced in 3.13 but x86 ioapic
has used it during early allocations only since 3.17. The machine
owner tells me that 3.15 worked fine.

>
> #define first_online_node   0
> #define first_memory_node   0
>
> if MAX_NUMNODES == 1...
>
> -Nish
>


Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Julian Calaby
Hi Konstantin,

On Thu, Apr 9, 2015 at 3:04 AM, Konstantin Khlebnikov
 wrote:
> On 08.04.2015 19:59, Konstantin Khlebnikov wrote:
>>
>> Node 0 might be offline as well as any other numa node,
>> in this case kernel cannot handle memory allocation and crashes.
>
>
> Example:
>
> [0.027133] [ cut here ]
> [0.027938] kernel BUG at include/linux/gfp.h:322!
> [0.028000] invalid opcode:  [#1] SMP
> [0.028000] Modules linked in:
> [0.028000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc7 #12
> [0.028000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
> [0.028000] task: 88007d3f8000 ti: 88007d3dc000 task.ti:
> 88007d3dc000
> [0.028000] RIP: 0010:[]  []
> new_slab+0x30c/0x3c0
> [0.028000] RSP: :88007d3dfc28  EFLAGS: 00010246
> [0.028000] RAX:  RBX: 88007d001800 RCX:
> 0001
> [0.028000] RDX:  RSI:  RDI:
> 002012d0
> [0.028000] RBP: 88007d3dfc58 R08:  R09:
> 
> [0.028000] R10: 0001 R11: 88007d02fe40 R12:
> 00d0
> [0.028000] R13: 00c0 R14: 0015 R15:
> 
> [0.028000] FS:  () GS:88007fc0()
> knlGS:
> [0.028000] CS:  0010 DS:  ES:  CR0: 8005003b
> [0.028000] CR2:  CR3: 01e0e000 CR4:
> 06f0
> [0.028000] DR0:  DR1:  DR2:
> 
> [0.028000] DR3:  DR6: fffe0ff0 DR7:
> 0400
> [0.028000] Stack:
> [0.028000]   88007fc175d0 ea0001f40bc0
> 00c0
> [0.028000]  88007d001800 80d0 88007d3dfd48
> 8192da27
> [0.028000]  000d 81e27038 
> 
> [0.028000] Call Trace:
> [0.028000]  [] __slab_alloc+0x3df/0x55d
> [0.028000]  [] ? __lock_acquire+0xc1b/0x1f40
> [0.028000]  [] ? __irq_domain_add+0x3c/0xe0
> [0.028000]  [] ? trace_hardirqs_on_caller+0x105/0x1d0
> [0.028000]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [0.028000]  [] __kmalloc_node+0xab/0x210
> [0.028000]  [] ? ioapic_read_entry+0x1f/0x50
> [0.028000]  [] ? __irq_domain_add+0x3c/0xe0
> [0.028000]  [] __irq_domain_add+0x3c/0xe0
> [0.028000]  [] mp_irqdomain_create+0x9e/0x120
> [0.028000]  [] setup_IO_APIC+0x6b/0x798
> [0.028000]  [] ? clear_IO_APIC+0x45/0x70
> [0.028000]  [] apic_bsp_setup+0x87/0x96
> [0.028000]  [] native_smp_prepare_cpus+0x237/0x275
> [0.028000]  [] kernel_init_freeable+0x120/0x265
> [0.028000]  [] ? kernel_init+0x9/0xf0
> [0.028000]  [] ? rest_init+0x130/0x130
> [0.028000]  [] kernel_init+0x9/0xf0
> [0.028000]  [] ret_from_fork+0x58/0x90
> [0.028000]  [] ? rest_init+0x130/0x130
> [0.028000] Code: 6b b6 ff ff 49 89 c5 e9 ce fd ff ff 31 c0 90 e9 74 ff
> ff ff 49 c7 04 04 00 00 00 00 e9 05 ff ff ff 4c 89 e7 ff d0 e9 d9 fe ff ff
> <0f> 0b 4c 8b 73 38 44 89 e7 81 cf 00 00 20 00 4c 89 f6 48 c1 ee
> [0.028000] RIP  [] new_slab+0x30c/0x3c0
> [0.028000]  RSP 
> [0.028039] ---[ end trace f03690e70d7e4be6 ]---

Shouldn't this be in the commit message?

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Nishanth Aravamudan
On 08.04.2015 [20:04:04 +0300], Konstantin Khlebnikov wrote:
> On 08.04.2015 19:59, Konstantin Khlebnikov wrote:
> >Node 0 might be offline as well as any other numa node,
> >in this case kernel cannot handle memory allocation and crashes.

Isn't the bug that numa_node_id() returned an offline node? That
shouldn't happen.

#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
...
#ifndef numa_node_id
/* Returns the number of the current Node. */
static inline int numa_node_id(void)
{
        return raw_cpu_read(numa_node);
}
#endif
...
#else   /* !CONFIG_USE_PERCPU_NUMA_NODE_ID */

/* Returns the number of the current Node. */
#ifndef numa_node_id
static inline int numa_node_id(void)
{
        return cpu_to_node(raw_smp_processor_id());
}
#endif
...

So that's either the per-cpu numa_node value, right? Or the result of
cpu_to_node on the current processor.

> Example:
> 
> [0.027133] [ cut here ]
> [0.027938] kernel BUG at include/linux/gfp.h:322!

This is 

VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));

in

alloc_pages_exact_node().
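
To make the quoted check concrete, here is a small user-space sketch of the same condition. The table-backed node_online(), the MAX_NUMNODES value of 4, and the helper name exact_node_check_fails() are hypothetical and chosen only for this example; the condition itself is copied from the VM_BUG_ON() above.

```c
#include <stdbool.h>

#define MAX_NUMNODES 4  /* hypothetical limit for this sketch */

/* Hypothetical topology: node 1 online, every other node offline. */
static const bool node_online_tbl[MAX_NUMNODES] = { false, true, false, false };

/* Only called with an in-range nid, thanks to short-circuit evaluation below. */
static bool node_online(int nid)
{
    return node_online_tbl[nid];
}

/* True when the quoted VM_BUG_ON() condition would fire for this nid. */
static bool exact_node_check_fails(int nid)
{
    return nid < 0 || nid >= MAX_NUMNODES || !node_online(nid);
}
```

With node 0 offline the check fires for nid 0 and passes for nid 1. It also fires for any negative value, so a caller that holds a "no affinity" value of -1 presumably has to take a different path before reaching a check like this one.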

And based on the trace below, that's

__slab_alloc -> alloc

alloc_pages_exact_node
<- alloc_slab_page
<- allocate_slab
<- new_slab
<- new_slab_objects
<- __slab_alloc?

which is just passing the node value down, right? Which I think was
from:

domain = kzalloc_node(sizeof(*domain) + (sizeof(unsigned int) * size),
  GFP_KERNEL, of_node_to_nid(of_node));

?


What platform is this on, looks to be x86? qemu emulation of a
pathological topology? What was the topology?

Note that there is a ton of code that seems to assume node 0 is online.
I started working on removing this assumption myself and it just led
down a rathole (on power, we always have node 0 online, even if it is
memoryless and cpuless, as a result).

I am guessing this is just happening early in boot before the per-cpu
areas are setup? That's why (I think) x86 has the early_cpu_to_node()
function...

Or do you not have CONFIG_OF set? So isn't the only change necessary to
the include file, and it should just return first_online_node rather
than 0?

Ah and there's more of those node 0 assumptions :)

#define first_online_node   0
#define first_memory_node   0

if MAX_NUMNODES == 1...

-Nish



Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Konstantin Khlebnikov

On 08.04.2015 19:59, Konstantin Khlebnikov wrote:

Node 0 might be offline as well as any other numa node,
in this case kernel cannot handle memory allocation and crashes.


Example:

[0.027133] [ cut here ]
[0.027938] kernel BUG at include/linux/gfp.h:322!
[0.028000] invalid opcode:  [#1] SMP
[0.028000] Modules linked in:
[0.028000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc7 #12
[0.028000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[0.028000] task: 88007d3f8000 ti: 88007d3dc000 task.ti: 88007d3dc000
[0.028000] RIP: 0010:[]  [] new_slab+0x30c/0x3c0
[0.028000] RSP: :88007d3dfc28  EFLAGS: 00010246
[0.028000] RAX:  RBX: 88007d001800 RCX: 0001
[0.028000] RDX:  RSI:  RDI: 002012d0
[0.028000] RBP: 88007d3dfc58 R08:  R09:
[0.028000] R10: 0001 R11: 88007d02fe40 R12: 00d0
[0.028000] R13: 00c0 R14: 0015 R15:
[0.028000] FS:  () GS:88007fc0() knlGS:
[0.028000] CS:  0010 DS:  ES:  CR0: 8005003b
[0.028000] CR2:  CR3: 01e0e000 CR4: 06f0
[0.028000] DR0:  DR1:  DR2:
[0.028000] DR3:  DR6: fffe0ff0 DR7: 0400
[0.028000] Stack:
[0.028000]   88007fc175d0 ea0001f40bc0 00c0
[0.028000]  88007d001800 80d0 88007d3dfd48 8192da27
[0.028000]  000d 81e27038

[0.028000] Call Trace:
[0.028000]  [] __slab_alloc+0x3df/0x55d
[0.028000]  [] ? __lock_acquire+0xc1b/0x1f40
[0.028000]  [] ? __irq_domain_add+0x3c/0xe0
[0.028000]  [] ? trace_hardirqs_on_caller+0x105/0x1d0
[0.028000]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
[0.028000]  [] __kmalloc_node+0xab/0x210
[0.028000]  [] ? ioapic_read_entry+0x1f/0x50
[0.028000]  [] ? __irq_domain_add+0x3c/0xe0
[0.028000]  [] __irq_domain_add+0x3c/0xe0
[0.028000]  [] mp_irqdomain_create+0x9e/0x120
[0.028000]  [] setup_IO_APIC+0x6b/0x798
[0.028000]  [] ? clear_IO_APIC+0x45/0x70
[0.028000]  [] apic_bsp_setup+0x87/0x96
[0.028000]  [] native_smp_prepare_cpus+0x237/0x275
[0.028000]  [] kernel_init_freeable+0x120/0x265
[0.028000]  [] ? kernel_init+0x9/0xf0
[0.028000]  [] ? rest_init+0x130/0x130
[0.028000]  [] kernel_init+0x9/0xf0
[0.028000]  [] ret_from_fork+0x58/0x90
[0.028000]  [] ? rest_init+0x130/0x130
[0.028000] Code: 6b b6 ff ff 49 89 c5 e9 ce fd ff ff 31 c0 90 e9 74 ff ff ff 49 c7 04 04 00 00 00 00 e9 05 ff ff ff 4c 89 e7 ff d0 e9 d9 fe ff ff <0f> 0b 4c 8b 73 38 44 89 e7 81 cf 00 00 20 00 4c 89 f6 48 c1 ee
[0.028000] RIP  [] new_slab+0x30c/0x3c0
[0.028000]  RSP 
[0.028039] ---[ end trace f03690e70d7e4be6 ]---
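
The trace is consistent with an allocation being pinned to a node that is
not online. The sketch below is a userspace model, not kernel code. It
assumes that the BUG at include/linux/gfp.h:322 is a node-validity check and
that the slab path treats NUMA_NO_NODE as "no preference" and falls back to
the local node. All model_* names and MODEL_* constants are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE     (-1)
#define MODEL_MAX_NODES  4	/* hypothetical stand-in for MAX_NUMNODES */
#define MODEL_WOULD_BUG  (-2)	/* hypothetical marker: the check would fire */

/* Caller must pass 0 <= nid < MODEL_MAX_NODES. */
static bool model_node_online(unsigned int online_mask, int nid)
{
	return (online_mask >> nid) & 1u;
}

/*
 * Decide which node an allocation would land on.
 * NUMA_NO_NODE means "no preference": use the caller's local node.
 * Any other hint must name a valid, online node.
 */
static int model_pick_node(unsigned int online_mask, int nid, int local_nid)
{
	if (nid == NUMA_NO_NODE)
		return local_nid;
	if (nid < 0 || nid >= MODEL_MAX_NODES ||
	    !model_node_online(online_mask, nid))
		return MODEL_WOULD_BUG;
	return nid;
}

static void model_pick_node_check(void)
{
	const unsigned int only_node1 = 1u << 1;	/* node 0 offline */

	/* Hint of 0 (old fallback) names an offline node. */
	assert(model_pick_node(only_node1, 0, 1) == MODEL_WOULD_BUG);
	/* NUMA_NO_NODE (new fallback) falls back to the local node. */
	assert(model_pick_node(only_node1, NUMA_NO_NODE, 1) == 1);
}
```

Under these assumptions, a hint of 0 with node 0 offline is the failing case,
while NUMA_NO_NODE never reaches the validity check.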




Signed-off-by: Konstantin Khlebnikov 
Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
---
  drivers/of/base.c  |    2 +-
  include/linux/of.h |    5 ++++-
  2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 8f165b112e03..51f4bd16e613 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
  #ifdef CONFIG_NUMA
  int __weak of_node_to_nid(struct device_node *np)
  {
-   return numa_node_id();
+   return NUMA_NO_NODE;
  }
  #endif

diff --git a/include/linux/of.h b/include/linux/of.h
index dfde07e77a63..78a04ee85a9c 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -623,7 +623,10 @@ static inline const char *of_prop_next_string(struct property *prop,
  #if defined(CONFIG_OF) && defined(CONFIG_NUMA)
  extern int of_node_to_nid(struct device_node *np);
  #else
-static inline int of_node_to_nid(struct device_node *device) { return 0; }
+static inline int of_node_to_nid(struct device_node *device)
+{
+   return NUMA_NO_NODE;
+}
  #endif

  static inline struct device_node *of_find_matching_node(




--
Konstantin


[PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Konstantin Khlebnikov
Node 0 might be offline as well as any other numa node,
in this case kernel cannot handle memory allocation and crashes.

Signed-off-by: Konstantin Khlebnikov 
Fixes: 0c3f061c195c ("of: implement of_node_to_nid as a weak function")
---
 drivers/of/base.c  |    2 +-
 include/linux/of.h |    5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 8f165b112e03..51f4bd16e613 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -89,7 +89,7 @@ EXPORT_SYMBOL(of_n_size_cells);
 #ifdef CONFIG_NUMA
 int __weak of_node_to_nid(struct device_node *np)
 {
-   return numa_node_id();
+   return NUMA_NO_NODE;
 }
 #endif
 
diff --git a/include/linux/of.h b/include/linux/of.h
index dfde07e77a63..78a04ee85a9c 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -623,7 +623,10 @@ static inline const char *of_prop_next_string(struct property *prop,
 #if defined(CONFIG_OF) && defined(CONFIG_NUMA)
 extern int of_node_to_nid(struct device_node *np);
 #else
-static inline int of_node_to_nid(struct device_node *device) { return 0; }
+static inline int of_node_to_nid(struct device_node *device)
+{
+   return NUMA_NO_NODE;
+}
 #endif
 
 static inline struct device_node *of_find_matching_node(
