Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 05:41:20PM +, Luck, Tony wrote:
> >>The BIOS always sends CPU hot-addition events before memory
> >> hot-addition events, so it's hard to change the order.
> >>And we couldn't completely solve this performance penalty because the
> >> affected code tries to allocate memory for all possible
> >> CPUs instead of onlined CPUs.
> >
> > So the BIOS is fucked, news at 11, one would have hoped Intel would have
> > _some_ say in it, but alas. So how about instead you force memory online
> > when you online the first CPU, screw whatever the BIOS does or does not?
> 
> Certainly an interesting implementation choice by the BIOS. The only logical
> order to use to bring components of a modern cpu online is:
> 
> 1) Memory - so we have a place to allocate structure needed for following 
> steps
> 2) Cores - so we have a place to direct interrupts from next step
> 3) I/O
> 
> We should log a bug against the BIOS ... but systems are already shipping so 
> we will
> have to deal with this.

Someone want to clue me in what systems these are so I can try and stay
the hell away from them?

> Either we use your existing patch - and systems with silly BIOS will work, 
> but with a
> small NUMA penalty for objects allocated remotely

Depending on how this all is constructed, I can imagine the worst case
where we bring up a medium to large system (8+ nodes, non fully
connected etc) and we only have memory for the first node online from
booting. The cpu bringup could be concurrent/fast-enough to not have any
other memory online.

This would result in all cpus having their memory on the first node
(including per-cpu chunks I would imagine), that's entirely retarded.

We should really refuse to bring up CPUs and boot in reduced capacity
for such demented systems.

> or ... we implement some crazy queuing scheme ... where we delay bringing 
> cores
> online for a while to see whether more things like memory and I/O start 
> showing
> up too.  We can't wait forever - people sometimes do configure systems with
> memory-less nodes.

Is there no distinction between the cases? I've really no idea how the
BIOS communicates this (and honestly no real desire to know), but it
would be best if we can kludge around this in the arch code and keep it
out of core code.

Did I already say that memory-less nodes are stupid? ;-)

> I think your existing solution is the better choice ... the penalties 
> probably aren't
> all that big ... so extensive workarounds for BIOS bugs seem like the wrong 
> direction.

Why can't we have the architecture code generate a memory add event on
the first cpu up of which there is no memory yet?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-24 Thread Luck, Tony
>>  The BIOS always sends CPU hot-addition events before memory
>> hot-addition events, so it's hard to change the order.
>>  And we couldn't completely solve this performance penalty because the
>> affected code tries to allocate memory for all possible
>> CPUs instead of onlined CPUs.
>
> So the BIOS is fucked, news at 11, one would have hoped Intel would have
> _some_ say in it, but alas. So how about instead you force memory online
> when you online the first CPU, screw whatever the BIOS does or does not?

Certainly an interesting implementation choice by the BIOS. The only logical
order to use to bring components of a modern cpu online is:

1) Memory - so we have a place to allocate structure needed for following steps
2) Cores - so we have a place to direct interrupts from next step
3) I/O

We should log a bug against the BIOS ... but systems are already shipping so we 
will
have to deal with this.

Either we use your existing patch - and systems with silly BIOS will work, but 
with a
small NUMA penalty for objects allocated remotely

or ... we implement some crazy queuing scheme ... where we delay bringing cores
online for a while to see whether more things like memory and I/O start showing
up too.  We can't wait forever - people sometimes do configure systems with
memory-less nodes.

I think your existing solution is the better choice ... the penalties probably 
aren't
all that big ... so extensive workarounds for BIOS bugs seem like the wrong 
direction.

Maybe a one-time printk() so the user knows they have a buggy BIOS might help
provide back pressure to BIOS teams to do this right in the future? But it isn't
a bug for the memory-less node case.

-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 10:59:45AM +0800, Jiang Liu wrote:
> On 2014/4/24 1:46, Luck, Tony wrote:
>  1) Handle CPU hot-addition event
>  1.a) gather platform specific information
>  1.b) associate hot-added CPU with a node
>  1.c) create CPU device
>  2) User online hot-added CPUs through sysfs:
>  2.a) cpu_up()
>  2.b) ->try_online_node()
>  2.c) ->hotadd_new_pgdat()
>  2.d) ->node_set_online()
> 
>  So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
>  memory access without the node_online(nid) check.
> >>>
> >>> Any why was all this not in the Changelog?
> >>
> >> Also, do explain what kind of hardware you needed to trigger this. This
> >> code has been like this for a good while.
> > 
> > With your proposed fix in place the allocations will succeed - but they
> > will be done from other nodes ... and this cpu will have to do a remote
> > NUMA access for the rest of time.
> > 
> > It would be better to switch the order above - add the memory first,
> > then add the cpus.  Is that possible?
> Hi Tony,
>   The BIOS always sends CPU hot-addition events before memory
> hot-addition events, so it's hard to change the order.
>   And we couldn't completely solve this performance penalty because the
> affected code tries to allocate memory for all possible
> CPUs instead of onlined CPUs.

So the BIOS is fucked, news at 11, one would have hoped Intel would have
_some_ say in it, but alas. So how about instead you force memory online
when you online the first CPU, screw whatever the BIOS does or does not?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 10:59:45AM +0800, Jiang Liu wrote:
 On 2014/4/24 1:46, Luck, Tony wrote:
  1) Handle CPU hot-addition event
  1.a) gather platform specific information
  1.b) associate hot-added CPU with a node
  1.c) create CPU device
  2) User online hot-added CPUs through sysfs:
  2.a) cpu_up()
  2.b) -try_online_node()
  2.c) -hotadd_new_pgdat()
  2.d) -node_set_online()
 
  So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
  memory access without the node_online(nid) check.
 
  Any why was all this not in the Changelog?
 
  Also, do explain what kind of hardware you needed to trigger this. This
  code has been like this for a good while.
  
  With your proposed fix in place the allocations will succeed - but they
  will be done from other nodes ... and this cpu will have to do a remote
  NUMA access for the rest of time.
  
  It would be better to switch the order above - add the memory first,
  then add the cpus.  Is that possible?
 Hi Tony,
   The BIOS always sends CPU hot-addition events before memory
 hot-addition events, so it's hard to change the order.
   And we couldn't completely solve this performance penalty because the
 affected code tries to allocate memory for all possible
 CPUs instead of onlined CPUs.

So the BIOS is fucked, news at 11, one would have hoped Intel would have
_some_ say in it, but alas. So how about instead you force memory online
when you online the first CPU, screw whatever the BIOS does or does not?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-24 Thread Luck, Tony
  The BIOS always sends CPU hot-addition events before memory
 hot-addition events, so it's hard to change the order.
  And we couldn't completely solve this performance penalty because the
 affected code tries to allocate memory for all possible
 CPUs instead of onlined CPUs.

 So the BIOS is fucked, news at 11, one would have hoped Intel would have
 _some_ say in it, but alas. So how about instead you force memory online
 when you online the first CPU, screw whatever the BIOS does or does not?

Certainly an interesting implementation choice by the BIOS. The only logical
order to use to bring components of a modern cpu online is:

1) Memory - so we have a place to allocate structure needed for following steps
2) Cores - so we have a place to direct interrupts from next step
3) I/O

We should log a bug against the BIOS ... but systems are already shipping so we 
will
have to deal with this.

Either we use your existing patch - and systems with silly BIOS will work, but 
with a
small NUMA penalty for objects allocated remotely

or ... we implement some crazy queuing scheme ... where we delay bringing cores
online for a while to see whether more things like memory and I/O start showing
up too.  We can't wait forever - people sometimes do configure systems with
memory-less nodes.

I think your existing solution is the better choice ... the penalties probably 
aren't
all that big ... so extensive workarounds for BIOS bugs seem like the wrong 
direction.

Maybe a one-time printk() so the user knows they have a buggy BIOS might help
provide back pressure to BIOS teams to do this right in the future? But it isn't
a bug for the memory-less node case.

-Tony
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-24 Thread Peter Zijlstra
On Thu, Apr 24, 2014 at 05:41:20PM +, Luck, Tony wrote:
 The BIOS always sends CPU hot-addition events before memory
  hot-addition events, so it's hard to change the order.
 And we couldn't completely solve this performance penalty because the
  affected code tries to allocate memory for all possible
  CPUs instead of onlined CPUs.
 
  So the BIOS is fucked, news at 11, one would have hoped Intel would have
  _some_ say in it, but alas. So how about instead you force memory online
  when you online the first CPU, screw whatever the BIOS does or does not?
 
 Certainly an interesting implementation choice by the BIOS. The only logical
 order to use to bring components of a modern cpu online is:
 
 1) Memory - so we have a place to allocate structure needed for following 
 steps
 2) Cores - so we have a place to direct interrupts from next step
 3) I/O
 
 We should log a bug against the BIOS ... but systems are already shipping so 
 we will
 have to deal with this.

Someone want to clue me in what systems these are so I can try and stay
the hell away from them?

 Either we use your existing patch - and systems with silly BIOS will work, 
 but with a
 small NUMA penalty for objects allocated remotely

Depending on how this all is constructed, I can imagine the worst case
where we bring up a medium to large system (8+ nodes, non fully
connected etc) and we only have memory for the first node online from
booting. The cpu bringup could be concurrent/fast-enough to not have any
other memory online.

This would result in all cpus having their memory on the first node
(including per-cpu chunks I would imagine), that's entirely retarded.

We should really refuse to bring up CPUs and boot in reduced capacity
for such demented systems.

 or ... we implement some crazy queuing scheme ... where we delay bringing 
 cores
 online for a while to see whether more things like memory and I/O start 
 showing
 up too.  We can't wait forever - people sometimes do configure systems with
 memory-less nodes.

Is there no distinction between the cases? I've really no idea how the
BIOS communicates this (and honestly no real desire to know), but it
would be best if we can kludge around this in the arch code and keep it
out of core code.

Did I already say that memory-less nodes are stupid? ;-)

 I think your existing solution is the better choice ... the penalties 
 probably aren't
 all that big ... so extensive workarounds for BIOS bugs seem like the wrong 
 direction.

Why can't we have the architecture code generate a memory add event on
the first cpu up of which there is no memory yet?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-23 Thread Jiang Liu
On 2014/4/24 1:46, Luck, Tony wrote:
 1) Handle CPU hot-addition event
 1.a) gather platform specific information
 1.b) associate hot-added CPU with a node
 1.c) create CPU device
 2) User online hot-added CPUs through sysfs:
 2.a)   cpu_up()
 2.b)   ->try_online_node()
 2.c)   ->hotadd_new_pgdat()
 2.d)   ->node_set_online()

 So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
 memory access without the node_online(nid) check.
>>>
>>> Any why was all this not in the Changelog?
>>
>> Also, do explain what kind of hardware you needed to trigger this. This
>> code has been like this for a good while.
> 
> With your proposed fix in place the allocations will succeed - but they
> will be done from other nodes ... and this cpu will have to do a remote
> NUMA access for the rest of time.
> 
> It would be better to switch the order above - add the memory first,
> then add the cpus.  Is that possible?
Hi Tony,
The BIOS always sends CPU hot-addition events before memory
hot-addition events, so it's hard to change the order.
And we couldn't completely solve this performance penalty because the
affected code tries to allocate memory for all possible
CPUs instead of onlined CPUs.

Best Regards!
Gerry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-23 Thread Luck, Tony
> > > 1) Handle CPU hot-addition event
> > > 1.a) gather platform specific information
> > > 1.b) associate hot-added CPU with a node
> > > 1.c) create CPU device
> > > 2) User online hot-added CPUs through sysfs:
> > > 2.a)  cpu_up()
> > > 2.b)  ->try_online_node()
> > > 2.c)  ->hotadd_new_pgdat()
> > > 2.d)  ->node_set_online()
> > > 
> > > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> > > memory access without the node_online(nid) check.
> > 
> > Any why was all this not in the Changelog?
>
> Also, do explain what kind of hardware you needed to trigger this. This
> code has been like this for a good while.

With your proposed fix in place the allocations will succeed - but they
will be done from other nodes ... and this cpu will have to do a remote
NUMA access for the rest of time.

It would be better to switch the order above - add the memory first,
then add the cpus.  Is that possible?

-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-23 Thread Luck, Tony
   1) Handle CPU hot-addition event
   1.a) gather platform specific information
   1.b) associate hot-added CPU with a node
   1.c) create CPU device
   2) User online hot-added CPUs through sysfs:
   2.a)  cpu_up()
   2.b)  -try_online_node()
   2.c)  -hotadd_new_pgdat()
   2.d)  -node_set_online()
   
   So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
   memory access without the node_online(nid) check.
  
  Any why was all this not in the Changelog?

 Also, do explain what kind of hardware you needed to trigger this. This
 code has been like this for a good while.

With your proposed fix in place the allocations will succeed - but they
will be done from other nodes ... and this cpu will have to do a remote
NUMA access for the rest of time.

It would be better to switch the order above - add the memory first,
then add the cpus.  Is that possible?

-Tony
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-23 Thread Jiang Liu
On 2014/4/24 1:46, Luck, Tony wrote:
 1) Handle CPU hot-addition event
 1.a) gather platform specific information
 1.b) associate hot-added CPU with a node
 1.c) create CPU device
 2) User online hot-added CPUs through sysfs:
 2.a)   cpu_up()
 2.b)   -try_online_node()
 2.c)   -hotadd_new_pgdat()
 2.d)   -node_set_online()

 So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
 memory access without the node_online(nid) check.

 Any why was all this not in the Changelog?

 Also, do explain what kind of hardware you needed to trigger this. This
 code has been like this for a good while.
 
 With your proposed fix in place the allocations will succeed - but they
 will be done from other nodes ... and this cpu will have to do a remote
 NUMA access for the rest of time.
 
 It would be better to switch the order above - add the memory first,
 then add the cpus.  Is that possible?
Hi Tony,
The BIOS always sends CPU hot-addition events before memory
hot-addition events, so it's hard to change the order.
And we couldn't completely solve this performance penalty because the
affected code tries to allocate memory for all possible
CPUs instead of onlined CPUs.

Best Regards!
Gerry
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Wed, Apr 23, 2014 at 07:32:13AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
> > Hi Peter,
> > It's not for memoryless node, but to solve a race window
> > in CPU hot-addition. The related CPU hot-addition flow is:
> > 1) Handle CPU hot-addition event
> > 1.a) gather platform specific information
> > 1.b) associate hot-added CPU with a node
> > 1.c) create CPU device
> > 2) User online hot-added CPUs through sysfs:
> > 2.a)cpu_up()
> > 2.b)->try_online_node()
> > 2.c)->hotadd_new_pgdat()
> > 2.d)->node_set_online()
> > 
> > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> > memory access without the node_online(nid) check.
> 
> Any why was all this not in the Changelog?

Also, do explain what kind of hardware you needed to trigger this. This
code has been like this for a good while.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Jiang Liu
On 2014/4/23 13:32, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
>> Hi Peter,
>>  It's not for memoryless node, but to solve a race window
>> in CPU hot-addition. The related CPU hot-addition flow is:
>> 1) Handle CPU hot-addition event
>> 1.a) gather platform specific information
>> 1.b) associate hot-added CPU with a node
>> 1.c) create CPU device
>> 2) User online hot-added CPUs through sysfs:
>> 2.a) cpu_up()
>> 2.b) ->try_online_node()
>> 2.c) ->hotadd_new_pgdat()
>> 2.d) ->node_set_online()
>>
>> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
>> memory access without the node_online(nid) check.
> 
> Any why was all this not in the Changelog?
Sorry, will add above message into changelog.
Thanks!
Gerry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
> Hi Peter,
>   It's not for memoryless node, but to solve a race window
> in CPU hot-addition. The related CPU hot-addition flow is:
> 1) Handle CPU hot-addition event
> 1.a) gather platform specific information
> 1.b) associate hot-added CPU with a node
> 1.c) create CPU device
> 2) User online hot-added CPUs through sysfs:
> 2.a)  cpu_up()
> 2.b)  ->try_online_node()
> 2.c)  ->hotadd_new_pgdat()
> 2.d)  ->node_set_online()
> 
> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> memory access without the node_online(nid) check.

Any why was all this not in the Changelog?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Jiang Liu


On 2014/4/23 9:59, David Rientjes wrote:
> On Tue, 22 Apr 2014, Peter Zijlstra wrote:
> 
>> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
>>> On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra  
>>> wrote:
>>>
 On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> When calling kzalloc_node(size, flags, node), we should first check
> whether node is onlined, otherwise it may cause invalid memory access
> as below.

 But this is only for memory less node crap, right? 
>>>
>>> um, why are memoryless nodes crap?
>>
>> Why wouldn't they be? Having CPUs with no local memory seems decidedly
>> suboptimal.
> 
> The quick fix for memoryless node issues is usually just do cpu_to_mem() 
> rather than cpu_to_node() in the caller.  This assumes that the arch is 
> setup correctly to handle memoryless nodes with 
> CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with 
> memoryless nodes not being configured correctly on powerpc).
> 
> That type of a fix would probably be better handled in the slab allocator, 
> though, since kmalloc_node(nid) shouldn't crash just because nid is 
> memoryless, we should be doing local_memory_node(node) when allocating the 
> slab pages.
> 
> However, I don't think memoryless nodes are the problem here since Jiang 
> is testing for !node_online(nid) in his patch, so it's a problem with 
> cpu_to_node() pointing to an offline node.  It makes sense for the page 
> allocator to crash in such a case, the node id is erroneous.
> 
> So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() 
> is allocating memory for a cpu on an offline node.  The 
> for_each_possible_cpu() looks suspicious.  There's no guarantee that 
> local_memory_node(node) for an offline node will return anything with 
> affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's 
> patch.
Hi David,
That's the case, alloc_fair_sched_group() is trying to allocate
memory for CPU in offline node, which then access non-exist NODE_DATA.
Thanks!
Gerry

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Jiang Liu
Hi Peter,
It's not for memoryless node, but to solve a race window
in CPU hot-addition. The related CPU hot-addition flow is:
1) Handle CPU hot-addition event
1.a) gather platform specific information
1.b) associate hot-added CPU with a node
1.c) create CPU device
2) User online hot-added CPUs through sysfs:
2.a)cpu_up()
2.b)->try_online_node()
2.c)->hotadd_new_pgdat()
2.d)->node_set_online()

So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
memory access without the node_online(nid) check.

Best Regards!
Gerry

On 2014/4/22 16:15, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
>> When calling kzalloc_node(size, flags, node), we should first check
>> whether node is onlined, otherwise it may cause invalid memory access
>> as below.
> 
> But this is only for memory less node crap, right? 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Mike Galbraith
On Tue, 2014-04-22 at 22:04 +0200, Peter Zijlstra wrote: 
> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra  
> > wrote:
> > 
> > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > > When calling kzalloc_node(size, flags, node), we should first check
> > > > whether node is onlined, otherwise it may cause invalid memory access
> > > > as below.
> > > 
> > > But this is only for memory less node crap, right? 
> > 
> > um, why are memoryless nodes crap?
> 
> Why wouldn't they be? Having CPUs with no local memory seems decidedly
> suboptimal.

This ain't exactly wonderful either, makes CPU domains to crawl over.
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 150 151 152 153 154 
155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 
175 176 177 178 179
node 0 size: 31723 MB
node 0 free: 27949 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 
52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 
119 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
node 1 size: 32308 MB
node 1 free: 31033 MB
node 2 cpus:
node 2 size: 32768 MB
node 2 free: 16631 MB
node 3 cpus:
node 3 size: 32768 MB
node 3 free: 32640 MB
node 4 cpus:
node 4 size: 32768 MB
node 4 free: 32640 MB
node 5 cpus:
node 5 size: 32768 MB
node 5 free: 32638 MB
node distances:
node   0   1   2   3   4   5 
  0:  10  12  12  15  12  15 
  1:  12  10  15  12  15  15 
  2:  12  15  10  12  15  15 
  3:  15  12  12  10  15  12 
  4:  12  15  15  15  10  12 
  5:  15  15  15  12  12  10


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread David Rientjes
On Tue, 22 Apr 2014, Peter Zijlstra wrote:

> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra  
> > wrote:
> > 
> > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > > When calling kzalloc_node(size, flags, node), we should first check
> > > > whether node is onlined, otherwise it may cause invalid memory access
> > > > as below.
> > > 
> > > But this is only for memory less node crap, right? 
> > 
> > um, why are memoryless nodes crap?
> 
> Why wouldn't they be? Having CPUs with no local memory seems decidedly
> suboptimal.

The quick fix for memoryless node issues is usually just do cpu_to_mem() 
rather than cpu_to_node() in the caller.  This assumes that the arch is 
setup correctly to handle memoryless nodes with 
CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with 
memoryless nodes not being configured correctly on powerpc).

That type of a fix would probably be better handled in the slab allocator, 
though, since kmalloc_node(nid) shouldn't crash just because nid is 
memoryless, we should be doing local_memory_node(node) when allocating the 
slab pages.

However, I don't think memoryless nodes are the problem here since Jiang 
is testing for !node_online(nid) in his patch, so it's a problem with 
cpu_to_node() pointing to an offline node.  It makes sense for the page 
allocator to crash in such a case, the node id is erroneous.

So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() 
is allocating memory for a cpu on an offline node.  The 
for_each_possible_cpu() looks suspicious.  There's no guarantee that 
local_memory_node(node) for an offline node will return anything with 
affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's 
patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra  
> wrote:
> 
> > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > When calling kzalloc_node(size, flags, node), we should first check
> > > whether node is onlined, otherwise it may cause invalid memory access
> > > as below.
> > 
> > But this is only for memory less node crap, right? 
> 
> um, why are memoryless nodes crap?

Why wouldn't they be? Having CPUs with no local memory seems decidedly
suboptimal.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Andrew Morton
On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra  wrote:

> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > When calling kzalloc_node(size, flags, node), we should first check
> > whether node is onlined, otherwise it may cause invalid memory access
> > as below.
> 
> But this is only for memory less node crap, right? 

um, why are memoryless nodes crap?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> When calling kzalloc_node(size, flags, node), we should first check
> whether node is onlined, otherwise it may cause invalid memory access
> as below.

But this is only for memory less node crap, right? 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
 When calling kzalloc_node(size, flags, node), we should first check
 whether node is onlined, otherwise it may cause invalid memory access
 as below.

But this is only for memory less node crap, right? 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Andrew Morton
On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org wrote:

 On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
  When calling kzalloc_node(size, flags, node), we should first check
  whether node is onlined, otherwise it may cause invalid memory access
  as below.
 
 But this is only for memory less node crap, right? 

um, why are memoryless nodes crap?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
 On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org 
 wrote:
 
  On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
   When calling kzalloc_node(size, flags, node), we should first check
   whether node is onlined, otherwise it may cause invalid memory access
   as below.
  
  But this is only for memory less node crap, right? 
 
 um, why are memoryless nodes crap?

Why wouldn't they be? Having CPUs with no local memory seems decidedly
suboptimal.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread David Rientjes
On Tue, 22 Apr 2014, Peter Zijlstra wrote:

 On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
  On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org 
  wrote:
  
   On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
When calling kzalloc_node(size, flags, node), we should first check
whether node is onlined, otherwise it may cause invalid memory access
as below.
   
   But this is only for memory less node crap, right? 
  
  um, why are memoryless nodes crap?
 
 Why wouldn't they be? Having CPUs with no local memory seems decidedly
 suboptimal.

The quick fix for memoryless node issues is usually just do cpu_to_mem() 
rather than cpu_to_node() in the caller.  This assumes that the arch is 
setup correctly to handle memoryless nodes with 
CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with 
memoryless nodes not being configured correctly on powerpc).

That type of a fix would probably be better handled in the slab allocator, 
though, since kmalloc_node(nid) shouldn't crash just because nid is 
memoryless, we should be doing local_memory_node(node) when allocating the 
slab pages.

However, I don't think memoryless nodes are the problem here since Jiang 
is testing for !node_online(nid) in his patch, so it's a problem with 
cpu_to_node() pointing to an offline node.  It makes sense for the page 
allocator to crash in such a case, the node id is erroneous.

So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() 
is allocating memory for a cpu on an offline node.  The 
for_each_possible_cpu() looks suspicious.  There's no guarantee that 
local_memory_node(node) for an offline node will return anything with 
affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's 
patch.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Mike Galbraith
On Tue, 2014-04-22 at 22:04 +0200, Peter Zijlstra wrote: 
 On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
  On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org 
  wrote:
  
   On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
When calling kzalloc_node(size, flags, node), we should first check
whether node is onlined, otherwise it may cause invalid memory access
as below.
   
   But this is only for memory less node crap, right? 
  
  um, why are memoryless nodes crap?
 
 Why wouldn't they be? Having CPUs with no local memory seems decidedly
 suboptimal.

This ain't exactly wonderful either, makes CPU domains to crawl over.
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 150 151 152 153 154 
155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 
175 176 177 178 179
node 0 size: 31723 MB
node 0 free: 27949 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 
52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 
119 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
node 1 size: 32308 MB
node 1 free: 31033 MB
node 2 cpus:
node 2 size: 32768 MB
node 2 free: 16631 MB
node 3 cpus:
node 3 size: 32768 MB
node 3 free: 32640 MB
node 4 cpus:
node 4 size: 32768 MB
node 4 free: 32640 MB
node 5 cpus:
node 5 size: 32768 MB
node 5 free: 32638 MB
node distances:
node   0   1   2   3   4   5 
  0:  10  12  12  15  12  15 
  1:  12  10  15  12  15  15 
  2:  12  15  10  12  15  15 
  3:  15  12  12  10  15  12 
  4:  12  15  15  15  10  12 
  5:  15  15  15  12  12  10


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Jiang Liu
Hi Peter,
It's not for memoryless node, but to solve a race window
in CPU hot-addition. The related CPU hot-addition flow is:
1) Handle CPU hot-addition event
1.a) gather platform specific information
1.b) associate hot-added CPU with a node
1.c) create CPU device
2) User online hot-added CPUs through sysfs:
2.a)cpu_up()
2.b)-try_online_node()
2.c)-hotadd_new_pgdat()
2.d)-node_set_online()

So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
memory access without the node_online(nid) check.

Best Regards!
Gerry

On 2014/4/22 16:15, Peter Zijlstra wrote:
 On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
 When calling kzalloc_node(size, flags, node), we should first check
 whether node is onlined, otherwise it may cause invalid memory access
 as below.
 
 But this is only for memory less node crap, right? 
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Jiang Liu


On 2014/4/23 9:59, David Rientjes wrote:
 On Tue, 22 Apr 2014, Peter Zijlstra wrote:
 
 On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
 On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org 
 wrote:

 On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
 When calling kzalloc_node(size, flags, node), we should first check
 whether node is onlined, otherwise it may cause invalid memory access
 as below.

 But this is only for memory less node crap, right? 

 um, why are memoryless nodes crap?

 Why wouldn't they be? Having CPUs with no local memory seems decidedly
 suboptimal.
 
 The quick fix for memoryless node issues is usually just do cpu_to_mem() 
 rather than cpu_to_node() in the caller.  This assumes that the arch is 
 setup correctly to handle memoryless nodes with 
 CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with 
 memoryless nodes not being configured correctly on powerpc).
 
 That type of a fix would probably be better handled in the slab allocator, 
 though, since kmalloc_node(nid) shouldn't crash just because nid is 
 memoryless, we should be doing local_memory_node(node) when allocating the 
 slab pages.
 
 However, I don't think memoryless nodes are the problem here since Jiang 
 is testing for !node_online(nid) in his patch, so it's a problem with 
 cpu_to_node() pointing to an offline node.  It makes sense for the page 
 allocator to crash in such a case, the node id is erroneous.
 
 So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() 
 is allocating memory for a cpu on an offline node.  The 
 for_each_possible_cpu() looks suspicious.  There's no guarantee that 
 local_memory_node(node) for an offline node will return anything with 
 affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's 
 patch.
Hi David,
That's the case, alloc_fair_sched_group() is trying to allocate
memory for CPU in offline node, which then access non-exist NODE_DATA.
Thanks!
Gerry

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
 Hi Peter,
   It's not for memoryless node, but to solve a race window
 in CPU hot-addition. The related CPU hot-addition flow is:
 1) Handle CPU hot-addition event
 1.a) gather platform specific information
 1.b) associate hot-added CPU with a node
 1.c) create CPU device
 2) User online hot-added CPUs through sysfs:
 2.a)  cpu_up()
 2.b)  -try_online_node()
 2.c)  -hotadd_new_pgdat()
 2.d)  -node_set_online()
 
 So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
 memory access without the node_online(nid) check.

Any why was all this not in the Changelog?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Jiang Liu
On 2014/4/23 13:32, Peter Zijlstra wrote:
 On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
 Hi Peter,
  It's not for memoryless node, but to solve a race window
 in CPU hot-addition. The related CPU hot-addition flow is:
 1) Handle CPU hot-addition event
 1.a) gather platform specific information
 1.b) associate hot-added CPU with a node
 1.c) create CPU device
 2) User online hot-added CPUs through sysfs:
 2.a) cpu_up()
 2.b) -try_online_node()
 2.c) -hotadd_new_pgdat()
 2.d) -node_set_online()

 So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
 memory access without the node_online(nid) check.
 
 Any why was all this not in the Changelog?
Sorry, will add above message into changelog.
Thanks!
Gerry
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-22 Thread Peter Zijlstra
On Wed, Apr 23, 2014 at 07:32:13AM +0200, Peter Zijlstra wrote:
 On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
  Hi Peter,
  It's not for memoryless node, but to solve a race window
  in CPU hot-addition. The related CPU hot-addition flow is:
  1) Handle CPU hot-addition event
  1.a) gather platform specific information
  1.b) associate hot-added CPU with a node
  1.c) create CPU device
  2) User online hot-added CPUs through sysfs:
  2.a)cpu_up()
  2.b)-try_online_node()
  2.c)-hotadd_new_pgdat()
  2.d)-node_set_online()
  
  So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
  memory access without the node_online(nid) check.
 
 Any why was all this not in the Changelog?

Also, do explain what kind of hardware you needed to trigger this. This
code has been like this for a good while.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-21 Thread Jiang Liu
When calling kzalloc_node(size, flags, node), we should first check
whether node is onlined, otherwise it may cause invalid memory access
as below.

[ 3663.324476] BUG: unable to handle kernel paging request at 1f08
[ 3663.332348] IP: [] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0
[ 3663.344773] Oops:  [#1] SMP
[ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis 
ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca 
ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic 
usbhid hid
[ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: GW3.14.0-rc5+ 
#21
[ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
BRIVTIN1.86B.0047.F03.1403031049 03/03/2014
[ 3663.414299] task: 88082fe54b00 ti: 880845fba000 task.ti: 
880845fba000
[ 3663.422741] RIP: 0010:[]  [] 
__alloc_pages_nodemask+0xb9/0x2d0
[ 3663.432857] RSP: 0018:880845fbbcd0  EFLAGS: 00010246
[ 3663.439265] RAX: 1f00 RBX:  RCX: 
[ 3663.447291] RDX:  RSI: 0a8d RDI: 81a8d950
[ 3663.455318] RBP: 880845fbbd58 R08: 880823293400 R09: 0001
[ 3663.463345] R10: 0001 R11:  R12: 002052d0
[ 3663.471363] R13: 880854c07600 R14: 0002 R15: 
[ 3663.479389] FS:  7f2e8b99e800() GS:88105a40() 
knlGS:
[ 3663.488514] CS:  0010 DS:  ES:  CR0: 80050033
[ 3663.495018] CR2: 1f08 CR3: 0008237b1000 CR4: 001407e0
[ 3663.503476] Stack:
[ 3663.505757]  811bd74d 880854c01d98 880854c01df0 
880854c01dd0
[ 3663.514167]  0003208ca420 00075a5d84d0 88082fe54b00 
811bb35f
[ 3663.522567]  880854c07600 0003 1f00 
880845fbbd48
[ 3663.530976] Call Trace:
[ 3663.533753]  [] ? deactivate_slab+0x41d/0x4f0
[ 3663.540421]  [] ? new_slab+0x3f/0x2d0
[ 3663.546307]  [] new_slab+0xa5/0x2d0
[ 3663.552001]  [] __slab_alloc+0x35d/0x54a
[ 3663.558185]  [] ? local_clock+0x25/0x30
[ 3663.564686]  [] ? __do_page_fault+0x4ec/0x5e0
[ 3663.571356]  [] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.578609]  [] ? __raw_spin_lock_init+0x21/0x60
[ 3663.585570]  [] kmem_cache_alloc_node_trace+0xa6/0x1d0
[ 3663.593112]  [] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.600363]  [] alloc_fair_sched_group+0xc4/0x190
[ 3663.607423]  [] sched_create_group+0x3f/0x80
[ 3663.613994]  [] sched_autogroup_create_attach+0x3f/0x1b0
[ 3663.621732]  [] sys_setsid+0xea/0x110
[ 3663.628020]  [] system_call_fastpath+0x1a/0x1f
[ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 
0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 <48> 83 
78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89
[ 3663.657032] RIP  [] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.664491]  RSP 
[ 3663.668429] CR2: 1f08
[ 3663.672659] ---[ end trace df13f08ed9de18ad ]---

Signed-off-by: Jiang Liu 
---
 kernel/sched/fair.c |   12 +++-
 kernel/sched/rt.c   |   11 +++
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e9bd0b1fa9e..1c093d213b0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7485,7 +7485,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct 
task_group *parent)
 {
struct cfs_rq *cfs_rq;
struct sched_entity *se;
-   int i;
+   int i, nid;
 
tg->cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL);
if (!tg->cfs_rq)
@@ -7499,13 +7499,15 @@ int alloc_fair_sched_group(struct task_group *tg, 
struct task_group *parent)
init_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
for_each_possible_cpu(i) {
-   cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
- GFP_KERNEL, cpu_to_node(i));
+   nid = cpu_to_node(i);
+   if (nid != NUMA_NO_NODE && !node_online(nid))
+   nid = NUMA_NO_NODE;
+
+   cfs_rq = kzalloc_node(sizeof(struct cfs_rq), GFP_KERNEL, nid);
if (!cfs_rq)
goto err;
 
-   se = kzalloc_node(sizeof(struct sched_entity),
- GFP_KERNEL, cpu_to_node(i));
+   se = kzalloc_node(sizeof(struct sched_entity), GFP_KERNEL, nid);
if (!se)
goto err_free_rq;
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d8cdf1618551..5a5a32e1115c 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -161,7 +161,7 @@ int 

[Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-21 Thread Jiang Liu
When calling kzalloc_node(size, flags, node), we should first check
whether node is onlined, otherwise it may cause invalid memory access
as below.

[ 3663.324476] BUG: unable to handle kernel paging request at 1f08
[ 3663.332348] IP: [81172219] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0
[ 3663.344773] Oops:  [#1] SMP
[ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis 
ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca 
ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic 
usbhid hid
[ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: GW3.14.0-rc5+ 
#21
[ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
BRIVTIN1.86B.0047.F03.1403031049 03/03/2014
[ 3663.414299] task: 88082fe54b00 ti: 880845fba000 task.ti: 
880845fba000
[ 3663.422741] RIP: 0010:[81172219]  [81172219] 
__alloc_pages_nodemask+0xb9/0x2d0
[ 3663.432857] RSP: 0018:880845fbbcd0  EFLAGS: 00010246
[ 3663.439265] RAX: 1f00 RBX:  RCX: 
[ 3663.447291] RDX:  RSI: 0a8d RDI: 81a8d950
[ 3663.455318] RBP: 880845fbbd58 R08: 880823293400 R09: 0001
[ 3663.463345] R10: 0001 R11:  R12: 002052d0
[ 3663.471363] R13: 880854c07600 R14: 0002 R15: 
[ 3663.479389] FS:  7f2e8b99e800() GS:88105a40() 
knlGS:
[ 3663.488514] CS:  0010 DS:  ES:  CR0: 80050033
[ 3663.495018] CR2: 1f08 CR3: 0008237b1000 CR4: 001407e0
[ 3663.503476] Stack:
[ 3663.505757]  811bd74d 880854c01d98 880854c01df0 
880854c01dd0
[ 3663.514167]  0003208ca420 00075a5d84d0 88082fe54b00 
811bb35f
[ 3663.522567]  880854c07600 0003 1f00 
880845fbbd48
[ 3663.530976] Call Trace:
[ 3663.533753]  [811bd74d] ? deactivate_slab+0x41d/0x4f0
[ 3663.540421]  [811bb35f] ? new_slab+0x3f/0x2d0
[ 3663.546307]  [811bb3c5] new_slab+0xa5/0x2d0
[ 3663.552001]  [81768c97] __slab_alloc+0x35d/0x54a
[ 3663.558185]  [810a4845] ? local_clock+0x25/0x30
[ 3663.564686]  [8177a34c] ? __do_page_fault+0x4ec/0x5e0
[ 3663.571356]  [810b0054] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.578609]  [810c77f1] ? __raw_spin_lock_init+0x21/0x60
[ 3663.585570]  [811be476] kmem_cache_alloc_node_trace+0xa6/0x1d0
[ 3663.593112]  [810b0054] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.600363]  [810b0054] alloc_fair_sched_group+0xc4/0x190
[ 3663.607423]  [810a359f] sched_create_group+0x3f/0x80
[ 3663.613994]  [810b611f] sched_autogroup_create_attach+0x3f/0x1b0
[ 3663.621732]  [8108258a] sys_setsid+0xea/0x110
[ 3663.628020]  [8177f42d] system_call_fastpath+0x1a/0x1f
[ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 
0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 48 83 
78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89
[ 3663.657032] RIP  [81172219] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.664491]  RSP 880845fbbcd0
[ 3663.668429] CR2: 1f08
[ 3663.672659] ---[ end trace df13f08ed9de18ad ]---

Signed-off-by: Jiang Liu jiang@linux.intel.com
---
 kernel/sched/fair.c |   12 +++-
 kernel/sched/rt.c   |   11 +++
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e9bd0b1fa9e..1c093d213b0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7485,7 +7485,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct 
task_group *parent)
 {
struct cfs_rq *cfs_rq;
struct sched_entity *se;
-   int i;
+   int i, nid;
 
tg-cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL);
if (!tg-cfs_rq)
@@ -7499,13 +7499,15 @@ int alloc_fair_sched_group(struct task_group *tg, 
struct task_group *parent)
init_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
for_each_possible_cpu(i) {
-   cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
- GFP_KERNEL, cpu_to_node(i));
+   nid = cpu_to_node(i);
+   if (nid != NUMA_NO_NODE  !node_online(nid))
+   nid = NUMA_NO_NODE;
+
+   cfs_rq = kzalloc_node(sizeof(struct cfs_rq), GFP_KERNEL, nid);
if (!cfs_rq)
goto err;
 
-   se = kzalloc_node(sizeof(struct sched_entity),
- GFP_KERNEL, cpu_to_node(i));