Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Thu, Apr 24, 2014 at 05:41:20PM +, Luck, Tony wrote: > >>The BIOS always sends CPU hot-addition events before memory > >> hot-addition events, so it's hard to change the order. > >>And we couldn't completely solve this performance penalty because the > >> affected code tries to allocate memory for all possible > >> CPUs instead of onlined CPUs. > > > > So the BIOS is fucked, news at 11, one would have hoped Intel would have > > _some_ say in it, but alas. So how about instead you force memory online > > when you online the first CPU, screw whatever the BIOS does or does not? > > Certainly an interesting implementation choice by the BIOS. The only logical > order to use to bring components of a modern cpu online is: > > 1) Memory - so we have a place to allocate structure needed for following > steps > 2) Cores - so we have a place to direct interrupts from next step > 3) I/O > > We should log a bug against the BIOS ... but systems are already shipping so > we will > have to deal with this. Someone want to clue me in what systems these are so I can try and stay the hell away from them? > Either we use your existing patch - and systems with silly BIOS will work, > but with a > small NUMA penalty for objects allocated remotely Depending on how this all is constructed, I can imagine the worst case where we bring up a medium to large system (8+ nodes, non fully connected etc) and we only have memory for the first node online from booting. The cpu bringup could be concurrent/fast-enough to not have any other memory online. This would result in all cpus having their memory on the first node (including per-cpu chunks I would imagine), that's entirely retarded. We should really refuse to bring up CPUs and boot in reduced capacity for such demented systems. > or ... we implement some crazy queuing scheme ... where we delay bringing > cores > online for a while to see whether more things like memory and I/O start > showing > up too. We can't wait forever - people sometimes do configure systems with > memory-less nodes. Is there no distinction between the cases? I've really no idea how the BIOS communicates this (and honestly no real desire to know), but it would be best if we can kludge around this in the arch code and keep it out of core code. Did I already say that memory-less nodes are stupid? ;-) > I think your existing solution is the better choice ... the penalties > probably aren't > all that big ... so extensive workarounds for BIOS bugs seem like the wrong > direction. Why can't we have the architecture code generate a memory add event on the first cpu up of which there is no memory yet? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
>> The BIOS always sends CPU hot-addition events before memory >> hot-addition events, so it's hard to change the order. >> And we couldn't completely solve this performance penalty because the >> affected code tries to allocate memory for all possible >> CPUs instead of onlined CPUs. > > So the BIOS is fucked, news at 11, one would have hoped Intel would have > _some_ say in it, but alas. So how about instead you force memory online > when you online the first CPU, screw whatever the BIOS does or does not? Certainly an interesting implementation choice by the BIOS. The only logical order to use to bring components of a modern cpu online is: 1) Memory - so we have a place to allocate structure needed for following steps 2) Cores - so we have a place to direct interrupts from next step 3) I/O We should log a bug against the BIOS ... but systems are already shipping so we will have to deal with this. Either we use your existing patch - and systems with silly BIOS will work, but with a small NUMA penalty for objects allocated remotely or ... we implement some crazy queuing scheme ... where we delay bringing cores online for a while to see whether more things like memory and I/O start showing up too. We can't wait forever - people sometimes do configure systems with memory-less nodes. I think your existing solution is the better choice ... the penalties probably aren't all that big ... so extensive workarounds for BIOS bugs seem like the wrong direction. Maybe a one-time printk() so the user knows they have a buggy BIOS might help provide back pressure to BIOS teams to do this right in the future? But it isn't a bug for the memory-less node case. -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Thu, Apr 24, 2014 at 10:59:45AM +0800, Jiang Liu wrote: > On 2014/4/24 1:46, Luck, Tony wrote: > 1) Handle CPU hot-addition event > 1.a) gather platform specific information > 1.b) associate hot-added CPU with a node > 1.c) create CPU device > 2) User online hot-added CPUs through sysfs: > 2.a) cpu_up() > 2.b) ->try_online_node() > 2.c) ->hotadd_new_pgdat() > 2.d) ->node_set_online() > > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid > memory access without the node_online(nid) check. > >>> > >>> Any why was all this not in the Changelog? > >> > >> Also, do explain what kind of hardware you needed to trigger this. This > >> code has been like this for a good while. > > > > With your proposed fix in place the allocations will succeed - but they > > will be done from other nodes ... and this cpu will have to do a remote > > NUMA access for the rest of time. > > > > It would be better to switch the order above - add the memory first, > > then add the cpus. Is that possible? > Hi Tony, > The BIOS always sends CPU hot-addition events before memory > hot-addition events, so it's hard to change the order. > And we couldn't completely solve this performance penalty because the > affected code tries to allocate memory for all possible > CPUs instead of onlined CPUs. So the BIOS is fucked, news at 11, one would have hoped Intel would have _some_ say in it, but alas. So how about instead you force memory online when you online the first CPU, screw whatever the BIOS does or does not? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Thu, Apr 24, 2014 at 10:59:45AM +0800, Jiang Liu wrote: On 2014/4/24 1:46, Luck, Tony wrote: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a) cpu_up() 2.b) -try_online_node() 2.c) -hotadd_new_pgdat() 2.d) -node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Any why was all this not in the Changelog? Also, do explain what kind of hardware you needed to trigger this. This code has been like this for a good while. With your proposed fix in place the allocations will succeed - but they will be done from other nodes ... and this cpu will have to do a remote NUMA access for the rest of time. It would be better to switch the order above - add the memory first, then add the cpus. Is that possible? Hi Tony, The BIOS always sends CPU hot-addition events before memory hot-addition events, so it's hard to change the order. And we couldn't completely solve this performance penalty because the affected code tries to allocate memory for all possible CPUs instead of onlined CPUs. So the BIOS is fucked, news at 11, one would have hoped Intel would have _some_ say in it, but alas. So how about instead you force memory online when you online the first CPU, screw whatever the BIOS does or does not? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
The BIOS always sends CPU hot-addition events before memory hot-addition events, so it's hard to change the order. And we couldn't completely solve this performance penalty because the affected code tries to allocate memory for all possible CPUs instead of onlined CPUs. So the BIOS is fucked, news at 11, one would have hoped Intel would have _some_ say in it, but alas. So how about instead you force memory online when you online the first CPU, screw whatever the BIOS does or does not? Certainly an interesting implementation choice by the BIOS. The only logical order to use to bring components of a modern cpu online is: 1) Memory - so we have a place to allocate structure needed for following steps 2) Cores - so we have a place to direct interrupts from next step 3) I/O We should log a bug against the BIOS ... but systems are already shipping so we will have to deal with this. Either we use your existing patch - and systems with silly BIOS will work, but with a small NUMA penalty for objects allocated remotely or ... we implement some crazy queuing scheme ... where we delay bringing cores online for a while to see whether more things like memory and I/O start showing up too. We can't wait forever - people sometimes do configure systems with memory-less nodes. I think your existing solution is the better choice ... the penalties probably aren't all that big ... so extensive workarounds for BIOS bugs seem like the wrong direction. Maybe a one-time printk() so the user knows they have a buggy BIOS might help provide back pressure to BIOS teams to do this right in the future? But it isn't a bug for the memory-less node case. -Tony -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Thu, Apr 24, 2014 at 05:41:20PM +, Luck, Tony wrote: The BIOS always sends CPU hot-addition events before memory hot-addition events, so it's hard to change the order. And we couldn't completely solve this performance penalty because the affected code tries to allocate memory for all possible CPUs instead of onlined CPUs. So the BIOS is fucked, news at 11, one would have hoped Intel would have _some_ say in it, but alas. So how about instead you force memory online when you online the first CPU, screw whatever the BIOS does or does not? Certainly an interesting implementation choice by the BIOS. The only logical order to use to bring components of a modern cpu online is: 1) Memory - so we have a place to allocate structure needed for following steps 2) Cores - so we have a place to direct interrupts from next step 3) I/O We should log a bug against the BIOS ... but systems are already shipping so we will have to deal with this. Someone want to clue me in what systems these are so I can try and stay the hell away from them? Either we use your existing patch - and systems with silly BIOS will work, but with a small NUMA penalty for objects allocated remotely Depending on how this all is constructed, I can imagine the worst case where we bring up a medium to large system (8+ nodes, non fully connected etc) and we only have memory for the first node online from booting. The cpu bringup could be concurrent/fast-enough to not have any other memory online. This would result in all cpus having their memory on the first node (including per-cpu chunks I would imagine), that's entirely retarded. We should really refuse to bring up CPUs and boot in reduced capacity for such demented systems. or ... we implement some crazy queuing scheme ... where we delay bringing cores online for a while to see whether more things like memory and I/O start showing up too. We can't wait forever - people sometimes do configure systems with memory-less nodes. Is there no distinction between the cases? I've really no idea how the BIOS communicates this (and honestly no real desire to know), but it would be best if we can kludge around this in the arch code and keep it out of core code. Did I already say that memory-less nodes are stupid? ;-) I think your existing solution is the better choice ... the penalties probably aren't all that big ... so extensive workarounds for BIOS bugs seem like the wrong direction. Why can't we have the architecture code generate a memory add event on the first cpu up of which there is no memory yet? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/24 1:46, Luck, Tony wrote: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a) cpu_up() 2.b) ->try_online_node() 2.c) ->hotadd_new_pgdat() 2.d) ->node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. >>> >>> Any why was all this not in the Changelog? >> >> Also, do explain what kind of hardware you needed to trigger this. This >> code has been like this for a good while. > > With your proposed fix in place the allocations will succeed - but they > will be done from other nodes ... and this cpu will have to do a remote > NUMA access for the rest of time. > > It would be better to switch the order above - add the memory first, > then add the cpus. Is that possible? Hi Tony, The BIOS always sends CPU hot-addition events before memory hot-addition events, so it's hard to change the order. And we couldn't completely solve this performance penalty because the affected code tries to allocate memory for all possible CPUs instead of onlined CPUs. Best Regards! Gerry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
> > > 1) Handle CPU hot-addition event > > > 1.a) gather platform specific information > > > 1.b) associate hot-added CPU with a node > > > 1.c) create CPU device > > > 2) User online hot-added CPUs through sysfs: > > > 2.a) cpu_up() > > > 2.b) ->try_online_node() > > > 2.c) ->hotadd_new_pgdat() > > > 2.d) ->node_set_online() > > > > > > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid > > > memory access without the node_online(nid) check. > > > > Any why was all this not in the Changelog? > > Also, do explain what kind of hardware you needed to trigger this. This > code has been like this for a good while. With your proposed fix in place the allocations will succeed - but they will be done from other nodes ... and this cpu will have to do a remote NUMA access for the rest of time. It would be better to switch the order above - add the memory first, then add the cpus. Is that possible? -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a) cpu_up() 2.b) -try_online_node() 2.c) -hotadd_new_pgdat() 2.d) -node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Any why was all this not in the Changelog? Also, do explain what kind of hardware you needed to trigger this. This code has been like this for a good while. With your proposed fix in place the allocations will succeed - but they will be done from other nodes ... and this cpu will have to do a remote NUMA access for the rest of time. It would be better to switch the order above - add the memory first, then add the cpus. Is that possible? -Tony -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/24 1:46, Luck, Tony wrote: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a) cpu_up() 2.b) -try_online_node() 2.c) -hotadd_new_pgdat() 2.d) -node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Any why was all this not in the Changelog? Also, do explain what kind of hardware you needed to trigger this. This code has been like this for a good while. With your proposed fix in place the allocations will succeed - but they will be done from other nodes ... and this cpu will have to do a remote NUMA access for the rest of time. It would be better to switch the order above - add the memory first, then add the cpus. Is that possible? Hi Tony, The BIOS always sends CPU hot-addition events before memory hot-addition events, so it's hard to change the order. And we couldn't completely solve this performance penalty because the affected code tries to allocate memory for all possible CPUs instead of onlined CPUs. Best Regards! Gerry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 07:32:13AM +0200, Peter Zijlstra wrote: > On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote: > > Hi Peter, > > It's not for memoryless node, but to solve a race window > > in CPU hot-addition. The related CPU hot-addition flow is: > > 1) Handle CPU hot-addition event > > 1.a) gather platform specific information > > 1.b) associate hot-added CPU with a node > > 1.c) create CPU device > > 2) User online hot-added CPUs through sysfs: > > 2.a)cpu_up() > > 2.b)->try_online_node() > > 2.c)->hotadd_new_pgdat() > > 2.d)->node_set_online() > > > > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid > > memory access without the node_online(nid) check. > > Any why was all this not in the Changelog? Also, do explain what kind of hardware you needed to trigger this. This code has been like this for a good while. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 13:32, Peter Zijlstra wrote: > On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote: >> Hi Peter, >> It's not for memoryless node, but to solve a race window >> in CPU hot-addition. The related CPU hot-addition flow is: >> 1) Handle CPU hot-addition event >> 1.a) gather platform specific information >> 1.b) associate hot-added CPU with a node >> 1.c) create CPU device >> 2) User online hot-added CPUs through sysfs: >> 2.a) cpu_up() >> 2.b) ->try_online_node() >> 2.c) ->hotadd_new_pgdat() >> 2.d) ->node_set_online() >> >> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid >> memory access without the node_online(nid) check. > > Any why was all this not in the Changelog? Sorry, will add above message into changelog. Thanks! Gerry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote: > Hi Peter, > It's not for memoryless node, but to solve a race window > in CPU hot-addition. The related CPU hot-addition flow is: > 1) Handle CPU hot-addition event > 1.a) gather platform specific information > 1.b) associate hot-added CPU with a node > 1.c) create CPU device > 2) User online hot-added CPUs through sysfs: > 2.a) cpu_up() > 2.b) ->try_online_node() > 2.c) ->hotadd_new_pgdat() > 2.d) ->node_set_online() > > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid > memory access without the node_online(nid) check. Any why was all this not in the Changelog? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 9:59, David Rientjes wrote: > On Tue, 22 Apr 2014, Peter Zijlstra wrote: > >> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: >>> On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra >>> wrote: >>> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: > When calling kzalloc_node(size, flags, node), we should first check > whether node is onlined, otherwise it may cause invalid memory access > as below. But this is only for memory less node crap, right? >>> >>> um, why are memoryless nodes crap? >> >> Why wouldn't they be? Having CPUs with no local memory seems decidedly >> suboptimal. > > The quick fix for memoryless node issues is usually just do cpu_to_mem() > rather than cpu_to_node() in the caller. This assumes that the arch is > setup correctly to handle memoryless nodes with > CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with > memoryless nodes not being configured correctly on powerpc). > > That type of a fix would probably be better handled in the slab allocator, > though, since kmalloc_node(nid) shouldn't crash just because nid is > memoryless, we should be doing local_memory_node(node) when allocating the > slab pages. > > However, I don't think memoryless nodes are the problem here since Jiang > is testing for !node_online(nid) in his patch, so it's a problem with > cpu_to_node() pointing to an offline node. It makes sense for the page > allocator to crash in such a case, the node id is erroneous. > > So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() > is allocating memory for a cpu on an offline node. The > for_each_possible_cpu() looks suspicious. There's no guarantee that > local_memory_node(node) for an offline node will return anything with > affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's > patch. Hi David, That's the case, alloc_fair_sched_group() is trying to allocate memory for CPU in offline node, which then access non-exist NODE_DATA. Thanks! Gerry > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
Hi Peter, It's not for memoryless node, but to solve a race window in CPU hot-addition. The related CPU hot-addition flow is: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a)cpu_up() 2.b)->try_online_node() 2.c)->hotadd_new_pgdat() 2.d)->node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Best Regards! Gerry On 2014/4/22 16:15, Peter Zijlstra wrote: > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: >> When calling kzalloc_node(size, flags, node), we should first check >> whether node is onlined, otherwise it may cause invalid memory access >> as below. > > But this is only for memory less node crap, right? > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 2014-04-22 at 22:04 +0200, Peter Zijlstra wrote: > On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: > > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra > > wrote: > > > > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: > > > > When calling kzalloc_node(size, flags, node), we should first check > > > > whether node is onlined, otherwise it may cause invalid memory access > > > > as below. > > > > > > But this is only for memory less node crap, right? > > > > um, why are memoryless nodes crap? > > Why wouldn't they be? Having CPUs with no local memory seems decidedly > suboptimal. This ain't exactly wonderful either, makes CPU domains to crawl over. node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 node 0 size: 31723 MB node 0 free: 27949 MB node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 node 1 size: 32308 MB node 1 free: 31033 MB node 2 cpus: node 2 size: 32768 MB node 2 free: 16631 MB node 3 cpus: node 3 size: 32768 MB node 3 free: 32640 MB node 4 cpus: node 4 size: 32768 MB node 4 free: 32640 MB node 5 cpus: node 5 size: 32768 MB node 5 free: 32638 MB node distances: node 0 1 2 3 4 5 0: 10 12 12 15 12 15 1: 12 10 15 12 15 15 2: 12 15 10 12 15 15 3: 15 12 12 10 15 12 4: 12 15 15 15 10 12 5: 15 15 15 12 12 10 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 22 Apr 2014, Peter Zijlstra wrote: > On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: > > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra > > wrote: > > > > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: > > > > When calling kzalloc_node(size, flags, node), we should first check > > > > whether node is onlined, otherwise it may cause invalid memory access > > > > as below. > > > > > > But this is only for memory less node crap, right? > > > > um, why are memoryless nodes crap? > > Why wouldn't they be? Having CPUs with no local memory seems decidedly > suboptimal. The quick fix for memoryless node issues is usually just do cpu_to_mem() rather than cpu_to_node() in the caller. This assumes that the arch is setup correctly to handle memoryless nodes with CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with memoryless nodes not being configured correctly on powerpc). That type of a fix would probably be better handled in the slab allocator, though, since kmalloc_node(nid) shouldn't crash just because nid is memoryless, we should be doing local_memory_node(node) when allocating the slab pages. However, I don't think memoryless nodes are the problem here since Jiang is testing for !node_online(nid) in his patch, so it's a problem with cpu_to_node() pointing to an offline node. It makes sense for the page allocator to crash in such a case, the node id is erroneous. So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() is allocating memory for a cpu on an offline node. The for_each_possible_cpu() looks suspicious. There's no guarantee that local_memory_node(node) for an offline node will return anything with affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's patch. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra > wrote: > > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: > > > When calling kzalloc_node(size, flags, node), we should first check > > > whether node is onlined, otherwise it may cause invalid memory access > > > as below. > > > > But this is only for memory less node crap, right? > > um, why are memoryless nodes crap? Why wouldn't they be? Having CPUs with no local memory seems decidedly suboptimal. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra wrote: > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: > > When calling kzalloc_node(size, flags, node), we should first check > > whether node is onlined, otherwise it may cause invalid memory access > > as below. > > But this is only for memory less node crap, right? um, why are memoryless nodes crap? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: > When calling kzalloc_node(size, flags, node), we should first check > whether node is onlined, otherwise it may cause invalid memory access > as below. But this is only for memory less node crap, right? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. But this is only for memory less node crap, right? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org wrote: On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. But this is only for memory less node crap, right? um, why are memoryless nodes crap? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org wrote: On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. But this is only for memory less node crap, right? um, why are memoryless nodes crap? Why wouldn't they be? Having CPUs with no local memory seems decidedly suboptimal. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 22 Apr 2014, Peter Zijlstra wrote: On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org wrote: On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. But this is only for memory less node crap, right? um, why are memoryless nodes crap? Why wouldn't they be? Having CPUs with no local memory seems decidedly suboptimal. The quick fix for memoryless node issues is usually just do cpu_to_mem() rather than cpu_to_node() in the caller. This assumes that the arch is setup correctly to handle memoryless nodes with CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with memoryless nodes not being configured correctly on powerpc). That type of a fix would probably be better handled in the slab allocator, though, since kmalloc_node(nid) shouldn't crash just because nid is memoryless, we should be doing local_memory_node(node) when allocating the slab pages. However, I don't think memoryless nodes are the problem here since Jiang is testing for !node_online(nid) in his patch, so it's a problem with cpu_to_node() pointing to an offline node. It makes sense for the page allocator to crash in such a case, the node id is erroneous. So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() is allocating memory for a cpu on an offline node. The for_each_possible_cpu() looks suspicious. There's no guarantee that local_memory_node(node) for an offline node will return anything with affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's patch. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 2014-04-22 at 22:04 +0200, Peter Zijlstra wrote: On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org wrote: On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. But this is only for memory less node crap, right? um, why are memoryless nodes crap? Why wouldn't they be? Having CPUs with no local memory seems decidedly suboptimal. This ain't exactly wonderful either, makes CPU domains to crawl over. node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 node 0 size: 31723 MB node 0 free: 27949 MB node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 node 1 size: 32308 MB node 1 free: 31033 MB node 2 cpus: node 2 size: 32768 MB node 2 free: 16631 MB node 3 cpus: node 3 size: 32768 MB node 3 free: 32640 MB node 4 cpus: node 4 size: 32768 MB node 4 free: 32640 MB node 5 cpus: node 5 size: 32768 MB node 5 free: 32638 MB node distances: node 0 1 2 3 4 5 0: 10 12 12 15 12 15 1: 12 10 15 12 15 15 2: 12 15 10 12 15 15 3: 15 12 12 10 15 12 4: 12 15 15 15 10 12 5: 15 15 15 12 12 10 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
Hi Peter, It's not for memoryless node, but to solve a race window in CPU hot-addition. The related CPU hot-addition flow is: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a)cpu_up() 2.b)-try_online_node() 2.c)-hotadd_new_pgdat() 2.d)-node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Best Regards! Gerry On 2014/4/22 16:15, Peter Zijlstra wrote: On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. But this is only for memory less node crap, right? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 9:59, David Rientjes wrote: On Tue, 22 Apr 2014, Peter Zijlstra wrote: On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote: On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra pet...@infradead.org wrote: On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote: When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. But this is only for memory less node crap, right? um, why are memoryless nodes crap? Why wouldn't they be? Having CPUs with no local memory seems decidedly suboptimal. The quick fix for memoryless node issues is usually just do cpu_to_mem() rather than cpu_to_node() in the caller. This assumes that the arch is setup correctly to handle memoryless nodes with CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with memoryless nodes not being configured correctly on powerpc). That type of a fix would probably be better handled in the slab allocator, though, since kmalloc_node(nid) shouldn't crash just because nid is memoryless, we should be doing local_memory_node(node) when allocating the slab pages. However, I don't think memoryless nodes are the problem here since Jiang is testing for !node_online(nid) in his patch, so it's a problem with cpu_to_node() pointing to an offline node. It makes sense for the page allocator to crash in such a case, the node id is erroneous. So either the cpu-to-node mapping is invalid or alloc_fair_sched_group() is allocating memory for a cpu on an offline node. The for_each_possible_cpu() looks suspicious. There's no guarantee that local_memory_node(node) for an offline node will return anything with affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's patch. Hi David, That's the case, alloc_fair_sched_group() is trying to allocate memory for CPU in offline node, which then access non-exist NODE_DATA. Thanks! Gerry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote: Hi Peter, It's not for memoryless node, but to solve a race window in CPU hot-addition. The related CPU hot-addition flow is: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a) cpu_up() 2.b) -try_online_node() 2.c) -hotadd_new_pgdat() 2.d) -node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Any why was all this not in the Changelog? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 13:32, Peter Zijlstra wrote: On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote: Hi Peter, It's not for memoryless node, but to solve a race window in CPU hot-addition. The related CPU hot-addition flow is: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a) cpu_up() 2.b) -try_online_node() 2.c) -hotadd_new_pgdat() 2.d) -node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Any why was all this not in the Changelog? Sorry, will add above message into changelog. Thanks! Gerry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 07:32:13AM +0200, Peter Zijlstra wrote: On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote: Hi Peter, It's not for memoryless node, but to solve a race window in CPU hot-addition. The related CPU hot-addition flow is: 1) Handle CPU hot-addition event 1.a) gather platform specific information 1.b) associate hot-added CPU with a node 1.c) create CPU device 2) User online hot-added CPUs through sysfs: 2.a)cpu_up() 2.b)-try_online_node() 2.c)-hotadd_new_pgdat() 2.d)-node_set_online() So between 1.b and 2.c, kmalloc_node(nid) may cause invalid memory access without the node_online(nid) check. Any why was all this not in the Changelog? Also, do explain what kind of hardware you needed to trigger this. This code has been like this for a good while. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. [ 3663.324476] BUG: unable to handle kernel paging request at 1f08 [ 3663.332348] IP: [] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0 [ 3663.344773] Oops: [#1] SMP [ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic usbhid hid [ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: GW3.14.0-rc5+ #21 [ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.F03.1403031049 03/03/2014 [ 3663.414299] task: 88082fe54b00 ti: 880845fba000 task.ti: 880845fba000 [ 3663.422741] RIP: 0010:[] [] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.432857] RSP: 0018:880845fbbcd0 EFLAGS: 00010246 [ 3663.439265] RAX: 1f00 RBX: RCX: [ 3663.447291] RDX: RSI: 0a8d RDI: 81a8d950 [ 3663.455318] RBP: 880845fbbd58 R08: 880823293400 R09: 0001 [ 3663.463345] R10: 0001 R11: R12: 002052d0 [ 3663.471363] R13: 880854c07600 R14: 0002 R15: [ 3663.479389] FS: 7f2e8b99e800() GS:88105a40() knlGS: [ 3663.488514] CS: 0010 DS: ES: CR0: 80050033 [ 3663.495018] CR2: 1f08 CR3: 0008237b1000 CR4: 001407e0 [ 3663.503476] Stack: [ 3663.505757] 811bd74d 880854c01d98 880854c01df0 880854c01dd0 [ 3663.514167] 0003208ca420 00075a5d84d0 88082fe54b00 811bb35f [ 3663.522567] 880854c07600 0003 1f00 880845fbbd48 [ 3663.530976] Call Trace: [ 3663.533753] [] ? deactivate_slab+0x41d/0x4f0 [ 3663.540421] [] ? new_slab+0x3f/0x2d0 [ 3663.546307] [] new_slab+0xa5/0x2d0 [ 3663.552001] [] __slab_alloc+0x35d/0x54a [ 3663.558185] [] ? local_clock+0x25/0x30 [ 3663.564686] [] ? __do_page_fault+0x4ec/0x5e0 [ 3663.571356] [] ? alloc_fair_sched_group+0xc4/0x190 [ 3663.578609] [] ? __raw_spin_lock_init+0x21/0x60 [ 3663.585570] [] kmem_cache_alloc_node_trace+0xa6/0x1d0 [ 3663.593112] [] ? alloc_fair_sched_group+0xc4/0x190 [ 3663.600363] [] alloc_fair_sched_group+0xc4/0x190 [ 3663.607423] [] sched_create_group+0x3f/0x80 [ 3663.613994] [] sched_autogroup_create_attach+0x3f/0x1b0 [ 3663.621732] [] sys_setsid+0xea/0x110 [ 3663.628020] [] system_call_fastpath+0x1a/0x1f [ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 <48> 83 78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89 [ 3663.657032] RIP [] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.664491] RSP [ 3663.668429] CR2: 1f08 [ 3663.672659] ---[ end trace df13f08ed9de18ad ]--- Signed-off-by: Jiang Liu --- kernel/sched/fair.c | 12 +++- kernel/sched/rt.c | 11 +++ 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7e9bd0b1fa9e..1c093d213b0f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7485,7 +7485,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent) { struct cfs_rq *cfs_rq; struct sched_entity *se; - int i; + int i, nid; tg->cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL); if (!tg->cfs_rq) @@ -7499,13 +7499,15 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent) init_cfs_bandwidth(tg_cfs_bandwidth(tg)); for_each_possible_cpu(i) { - cfs_rq = kzalloc_node(sizeof(struct cfs_rq), - GFP_KERNEL, cpu_to_node(i)); + nid = cpu_to_node(i); + if (nid != NUMA_NO_NODE && !node_online(nid)) + nid = NUMA_NO_NODE; + + cfs_rq = kzalloc_node(sizeof(struct cfs_rq), GFP_KERNEL, nid); if (!cfs_rq) goto err; - se = kzalloc_node(sizeof(struct sched_entity), - GFP_KERNEL, cpu_to_node(i)); + se = kzalloc_node(sizeof(struct sched_entity), GFP_KERNEL, nid); if (!se) goto err_free_rq; diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index d8cdf1618551..5a5a32e1115c 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -161,7 +161,7 @@ int
[Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
When calling kzalloc_node(size, flags, node), we should first check whether node is onlined, otherwise it may cause invalid memory access as below. [ 3663.324476] BUG: unable to handle kernel paging request at 1f08 [ 3663.332348] IP: [81172219] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0 [ 3663.344773] Oops: [#1] SMP [ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic usbhid hid [ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: GW3.14.0-rc5+ #21 [ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.F03.1403031049 03/03/2014 [ 3663.414299] task: 88082fe54b00 ti: 880845fba000 task.ti: 880845fba000 [ 3663.422741] RIP: 0010:[81172219] [81172219] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.432857] RSP: 0018:880845fbbcd0 EFLAGS: 00010246 [ 3663.439265] RAX: 1f00 RBX: RCX: [ 3663.447291] RDX: RSI: 0a8d RDI: 81a8d950 [ 3663.455318] RBP: 880845fbbd58 R08: 880823293400 R09: 0001 [ 3663.463345] R10: 0001 R11: R12: 002052d0 [ 3663.471363] R13: 880854c07600 R14: 0002 R15: [ 3663.479389] FS: 7f2e8b99e800() GS:88105a40() knlGS: [ 3663.488514] CS: 0010 DS: ES: CR0: 80050033 [ 3663.495018] CR2: 1f08 CR3: 0008237b1000 CR4: 001407e0 [ 3663.503476] Stack: [ 3663.505757] 811bd74d 880854c01d98 880854c01df0 880854c01dd0 [ 3663.514167] 0003208ca420 00075a5d84d0 88082fe54b00 811bb35f [ 3663.522567] 880854c07600 0003 1f00 880845fbbd48 [ 3663.530976] Call Trace: [ 3663.533753] [811bd74d] ? deactivate_slab+0x41d/0x4f0 [ 3663.540421] [811bb35f] ? new_slab+0x3f/0x2d0 [ 3663.546307] [811bb3c5] new_slab+0xa5/0x2d0 [ 3663.552001] [81768c97] __slab_alloc+0x35d/0x54a [ 3663.558185] [810a4845] ? local_clock+0x25/0x30 [ 3663.564686] [8177a34c] ? __do_page_fault+0x4ec/0x5e0 [ 3663.571356] [810b0054] ? alloc_fair_sched_group+0xc4/0x190 [ 3663.578609] [810c77f1] ? __raw_spin_lock_init+0x21/0x60 [ 3663.585570] [811be476] kmem_cache_alloc_node_trace+0xa6/0x1d0 [ 3663.593112] [810b0054] ? alloc_fair_sched_group+0xc4/0x190 [ 3663.600363] [810b0054] alloc_fair_sched_group+0xc4/0x190 [ 3663.607423] [810a359f] sched_create_group+0x3f/0x80 [ 3663.613994] [810b611f] sched_autogroup_create_attach+0x3f/0x1b0 [ 3663.621732] [8108258a] sys_setsid+0xea/0x110 [ 3663.628020] [8177f42d] system_call_fastpath+0x1a/0x1f [ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 48 83 78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89 [ 3663.657032] RIP [81172219] __alloc_pages_nodemask+0xb9/0x2d0 [ 3663.664491] RSP 880845fbbcd0 [ 3663.668429] CR2: 1f08 [ 3663.672659] ---[ end trace df13f08ed9de18ad ]--- Signed-off-by: Jiang Liu jiang@linux.intel.com --- kernel/sched/fair.c | 12 +++- kernel/sched/rt.c | 11 +++ 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7e9bd0b1fa9e..1c093d213b0f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7485,7 +7485,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent) { struct cfs_rq *cfs_rq; struct sched_entity *se; - int i; + int i, nid; tg-cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL); if (!tg-cfs_rq) @@ -7499,13 +7499,15 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent) init_cfs_bandwidth(tg_cfs_bandwidth(tg)); for_each_possible_cpu(i) { - cfs_rq = kzalloc_node(sizeof(struct cfs_rq), - GFP_KERNEL, cpu_to_node(i)); + nid = cpu_to_node(i); + if (nid != NUMA_NO_NODE !node_online(nid)) + nid = NUMA_NO_NODE; + + cfs_rq = kzalloc_node(sizeof(struct cfs_rq), GFP_KERNEL, nid); if (!cfs_rq) goto err; - se = kzalloc_node(sizeof(struct sched_entity), - GFP_KERNEL, cpu_to_node(i));