Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Thu, Apr 24, 2014 at 05:41:20PM +, Luck, Tony wrote:
> >> The BIOS always sends CPU hot-addition events before memory
> >> hot-addition events, so it's hard to change the order.
> >> And we couldn't completely solve this performance penalty because the
> >> affected code tries to allocate memory for all possible
> >> CPUs instead of onlined CPUs.
> >
> > So the BIOS is fucked, news at 11, one would have hoped Intel would have
> > _some_ say in it, but alas. So how about instead you force memory online
> > when you online the first CPU, screw whatever the BIOS does or does not?
>
> Certainly an interesting implementation choice by the BIOS. The only logical
> order to use to bring components of a modern cpu online is:
>
> 1) Memory - so we have a place to allocate structure needed for following steps
> 2) Cores - so we have a place to direct interrupts from next step
> 3) I/O
>
> We should log a bug against the BIOS ... but systems are already shipping so
> we will have to deal with this.

Someone want to clue me in what systems these are so I can try and stay
the hell away from them?

> Either we use your existing patch - and systems with silly BIOS will work,
> but with a small NUMA penalty for objects allocated remotely

Depending on how this all is constructed, I can imagine the worst case
where we bring up a medium to large system (8+ nodes, non fully
connected etc) and we only have memory for the first node online from
booting.

The cpu bringup could be concurrent/fast-enough to not have any other
memory online.

This would result in all cpus having their memory on the first node
(including per-cpu chunks I would imagine), that's entirely retarded.

We should really refuse to bring up CPUs and boot in reduced capacity
for such demented systems.

> or ... we implement some crazy queuing scheme ... where we delay bringing
> cores online for a while to see whether more things like memory and I/O
> start showing up too. We can't wait forever - people sometimes do configure
> systems with memory-less nodes.

Is there no distinction between the cases? I've really no idea how the
BIOS communicates this (and honestly no real desire to know), but it
would be best if we can kludge around this in the arch code and keep it
out of core code.

Did I already say that memory-less nodes are stupid? ;-)

> I think your existing solution is the better choice ... the penalties
> probably aren't all that big ... so extensive workarounds for BIOS bugs
> seem like the wrong direction.

Why can't we have the architecture code generate a memory add event on
the first cpu up of which there is no memory yet?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
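The suggestion to have architecture code generate a memory-add event on the first CPU-up can be modelled in userspace roughly as follows. This is only a sketch: `struct model_node`, `model_force_memory_online()` and `model_cpu_up()` are hypothetical names, not kernel interfaces, and the `mem_present` flag assumes the arch code has some way to know that memory exists behind the node, which the thread leaves open.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_NODES 2

/* Hypothetical per-node state; none of these fields exist in the kernel. */
struct model_node {
    bool mem_online;   /* has a memory-add event been processed yet? */
    bool mem_present;  /* assumption: arch code can tell memory exists */
    int  cpus_online;
};

static struct model_node model_nodes[MODEL_NR_NODES] = {
    { .mem_online = true,  .mem_present = true, .cpus_online = 1 },
    { .mem_online = false, .mem_present = true, .cpus_online = 0 },
};

/* Stand-in for an arch hook that would synthesize the memory-add event. */
static void model_force_memory_online(struct model_node *node)
{
    node->mem_online = true;
}

/*
 * Model of bringing one CPU up on node nid. Returns true when the
 * memory-add had to be forced because the CPU event arrived first.
 */
static bool model_cpu_up(int nid)
{
    struct model_node *node;
    bool forced = false;

    assert(nid >= 0 && nid < MODEL_NR_NODES);
    node = &model_nodes[nid];

    if (!node->mem_online && node->mem_present) {
        model_force_memory_online(node);
        forced = true;
    }
    node->cpus_online++;
    return forced;
}
```

In this model only the first CPU brought up on node 1 triggers the forced memory online; a genuinely memoryless node (`mem_present == false`) is left alone.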
RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
>> The BIOS always sends CPU hot-addition events before memory
>> hot-addition events, so it's hard to change the order.
>> And we couldn't completely solve this performance penalty because the
>> affected code tries to allocate memory for all possible
>> CPUs instead of onlined CPUs.
>
> So the BIOS is fucked, news at 11, one would have hoped Intel would have
> _some_ say in it, but alas. So how about instead you force memory online
> when you online the first CPU, screw whatever the BIOS does or does not?

Certainly an interesting implementation choice by the BIOS. The only logical
order to use to bring components of a modern cpu online is:

1) Memory - so we have a place to allocate structure needed for following steps
2) Cores - so we have a place to direct interrupts from next step
3) I/O

We should log a bug against the BIOS ... but systems are already shipping so
we will have to deal with this.

Either we use your existing patch - and systems with silly BIOS will work, but
with a small NUMA penalty for objects allocated remotely

or ... we implement some crazy queuing scheme ... where we delay bringing cores
online for a while to see whether more things like memory and I/O start showing
up too. We can't wait forever - people sometimes do configure systems with
memory-less nodes.

I think your existing solution is the better choice ... the penalties probably
aren't all that big ... so extensive workarounds for BIOS bugs seem like the
wrong direction. Maybe a one-time printk() so the user knows they have a buggy
BIOS might help provide back pressure to BIOS teams to do this right in the
future? But it isn't a bug for the memory-less node case.

-Tony
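The one-time printk() idea can be sketched in userspace as below. The kernel already has a printk_once() helper for this pattern; the function here, `model_warn_bios_order_once()`, is a hypothetical stand-in that uses fprintf(). The `memoryless_by_design` parameter is an assumption: the thread does not say how the kernel could tell a late memory event from an intentionally memory-less node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print the warning at most once per run. Returns true only on the
 * call that actually printed. 'memoryless_by_design' is a hypothetical
 * input; intentionally memory-less nodes are not a BIOS bug.
 */
static bool model_warn_bios_order_once(int nid, bool memoryless_by_design)
{
    static bool warned;

    assert(nid >= 0);

    if (memoryless_by_design || warned)
        return false;

    warned = true;
    fprintf(stderr,
            "node %d: CPU hot-add seen before memory hot-add, possible BIOS bug\n",
            nid);
    return true;
}
```

The static flag keeps later hot-add events from flooding the log, which is the point of making the message one-time.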
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Thu, Apr 24, 2014 at 10:59:45AM +0800, Jiang Liu wrote:
> On 2014/4/24 1:46, Luck, Tony wrote:
> >>>> 1) Handle CPU hot-addition event
> >>>> 1.a) gather platform specific information
> >>>> 1.b) associate hot-added CPU with a node
> >>>> 1.c) create CPU device
> >>>> 2) User online hot-added CPUs through sysfs:
> >>>> 2.a) cpu_up()
> >>>> 2.b) ->try_online_node()
> >>>> 2.c) ->hotadd_new_pgdat()
> >>>> 2.d) ->node_set_online()
> >>>>
> >>>> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> >>>> memory access without the node_online(nid) check.
> >>>
> >>> And why was all this not in the Changelog?
> >>
> >> Also, do explain what kind of hardware you needed to trigger this. This
> >> code has been like this for a good while.
> >
> > With your proposed fix in place the allocations will succeed - but they
> > will be done from other nodes ... and this cpu will have to do a remote
> > NUMA access for the rest of time.
> >
> > It would be better to switch the order above - add the memory first,
> > then add the cpus. Is that possible?
> Hi Tony,
> The BIOS always sends CPU hot-addition events before memory
> hot-addition events, so it's hard to change the order.
> And we couldn't completely solve this performance penalty because the
> affected code tries to allocate memory for all possible
> CPUs instead of onlined CPUs.

So the BIOS is fucked, news at 11, one would have hoped Intel would have
_some_ say in it, but alas. So how about instead you force memory online
when you online the first CPU, screw whatever the BIOS does or does not?
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/24 1:46, Luck, Tony wrote:
>>>> 1) Handle CPU hot-addition event
>>>> 1.a) gather platform specific information
>>>> 1.b) associate hot-added CPU with a node
>>>> 1.c) create CPU device
>>>> 2) User online hot-added CPUs through sysfs:
>>>> 2.a) cpu_up()
>>>> 2.b) ->try_online_node()
>>>> 2.c) ->hotadd_new_pgdat()
>>>> 2.d) ->node_set_online()
>>>>
>>>> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
>>>> memory access without the node_online(nid) check.
>>>
>>> And why was all this not in the Changelog?
>>
>> Also, do explain what kind of hardware you needed to trigger this. This
>> code has been like this for a good while.
>
> With your proposed fix in place the allocations will succeed - but they
> will be done from other nodes ... and this cpu will have to do a remote
> NUMA access for the rest of time.
>
> It would be better to switch the order above - add the memory first,
> then add the cpus. Is that possible?
Hi Tony,
The BIOS always sends CPU hot-addition events before memory
hot-addition events, so it's hard to change the order.
And we couldn't completely solve this performance penalty because the
affected code tries to allocate memory for all possible
CPUs instead of onlined CPUs.
Best Regards!
Gerry
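Why the penalty cannot be fully removed can be illustrated with a small userspace model: when per-CPU objects are allocated up front for every possible CPU, the ones whose node is still offline are placed without node affinity and stay where they landed after the node comes online. All names here (`model_node_online`, `model_cpu_node`, `model_alloc_home`, `model_alloc_for_all_possible_cpus()`) are hypothetical; only NUMA_NO_NODE mirrors the kernel constant.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_POSSIBLE_CPUS 4
#define MODEL_NR_NODES         2
#define NUMA_NO_NODE           (-1)

/* Hypothetical state: node 1 is not online yet, but CPUs 2-3 map to it. */
static bool model_node_online[MODEL_NR_NODES] = { true, false };
static const int model_cpu_node[MODEL_NR_POSSIBLE_CPUS] = { 0, 0, 1, 1 };

/* Node each per-CPU object was placed on, recorded at allocation time. */
static int model_alloc_home[MODEL_NR_POSSIBLE_CPUS];

/*
 * Allocate one object per possible CPU, as a for_each_possible_cpu()
 * loop would. Returns how many objects got no node affinity.
 */
static int model_alloc_for_all_possible_cpus(void)
{
    int unplaced = 0;

    for (int cpu = 0; cpu < MODEL_NR_POSSIBLE_CPUS; cpu++) {
        int nid = model_cpu_node[cpu];

        assert(nid >= 0 && nid < MODEL_NR_NODES);
        if (model_node_online[nid]) {
            model_alloc_home[cpu] = nid;
        } else {
            model_alloc_home[cpu] = NUMA_NO_NODE;
            unplaced++;
        }
    }
    return unplaced;
}
```

Bringing node 1 online afterwards does not move the objects already allocated for CPUs 2 and 3, which is the remaining remote-access cost described above.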
RE: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
> > > 1) Handle CPU hot-addition event
> > > 1.a) gather platform specific information
> > > 1.b) associate hot-added CPU with a node
> > > 1.c) create CPU device
> > > 2) User online hot-added CPUs through sysfs:
> > > 2.a) cpu_up()
> > > 2.b) ->try_online_node()
> > > 2.c) ->hotadd_new_pgdat()
> > > 2.d) ->node_set_online()
> > >
> > > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> > > memory access without the node_online(nid) check.
> >
> > And why was all this not in the Changelog?
>
> Also, do explain what kind of hardware you needed to trigger this. This
> code has been like this for a good while.

With your proposed fix in place the allocations will succeed - but they
will be done from other nodes ... and this cpu will have to do a remote
NUMA access for the rest of time.

It would be better to switch the order above - add the memory first,
then add the cpus. Is that possible?

-Tony
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 07:32:13AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
> > Hi Peter,
> > It's not for memoryless node, but to solve a race window
> > in CPU hot-addition. The related CPU hot-addition flow is:
> > 1) Handle CPU hot-addition event
> > 1.a) gather platform specific information
> > 1.b) associate hot-added CPU with a node
> > 1.c) create CPU device
> > 2) User online hot-added CPUs through sysfs:
> > 2.a) cpu_up()
> > 2.b) ->try_online_node()
> > 2.c) ->hotadd_new_pgdat()
> > 2.d) ->node_set_online()
> >
> > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> > memory access without the node_online(nid) check.
>
> And why was all this not in the Changelog?

Also, do explain what kind of hardware you needed to trigger this. This
code has been like this for a good while.
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 13:32, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
>> Hi Peter,
>> It's not for memoryless node, but to solve a race window
>> in CPU hot-addition. The related CPU hot-addition flow is:
>> 1) Handle CPU hot-addition event
>> 1.a) gather platform specific information
>> 1.b) associate hot-added CPU with a node
>> 1.c) create CPU device
>> 2) User online hot-added CPUs through sysfs:
>> 2.a) cpu_up()
>> 2.b) ->try_online_node()
>> 2.c) ->hotadd_new_pgdat()
>> 2.d) ->node_set_online()
>>
>> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
>> memory access without the node_online(nid) check.
>
> And why was all this not in the Changelog?
Sorry, will add the above message to the changelog.
Thanks!
Gerry
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
> Hi Peter,
> It's not for memoryless node, but to solve a race window
> in CPU hot-addition. The related CPU hot-addition flow is:
> 1) Handle CPU hot-addition event
> 1.a) gather platform specific information
> 1.b) associate hot-added CPU with a node
> 1.c) create CPU device
> 2) User online hot-added CPUs through sysfs:
> 2.a) cpu_up()
> 2.b) ->try_online_node()
> 2.c) ->hotadd_new_pgdat()
> 2.d) ->node_set_online()
>
> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> memory access without the node_online(nid) check.

And why was all this not in the Changelog?
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 9:59, David Rientjes wrote:
> On Tue, 22 Apr 2014, Peter Zijlstra wrote:
>
>> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
>>> On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra wrote:
>>>
>>>> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
>>>>> When calling kzalloc_node(size, flags, node), we should first check
>>>>> whether node is onlined, otherwise it may cause invalid memory access
>>>>> as below.
>>>>
>>>> But this is only for memoryless node crap, right?
>>>
>>> um, why are memoryless nodes crap?
>>
>> Why wouldn't they be? Having CPUs with no local memory seems decidedly
>> suboptimal.
>
> The quick fix for memoryless node issues is usually just do cpu_to_mem()
> rather than cpu_to_node() in the caller. This assumes that the arch is
> set up correctly to handle memoryless nodes with
> CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with
> memoryless nodes not being configured correctly on powerpc).
>
> That type of a fix would probably be better handled in the slab allocator,
> though, since kmalloc_node(nid) shouldn't crash just because nid is
> memoryless, we should be doing local_memory_node(node) when allocating the
> slab pages.
>
> However, I don't think memoryless nodes are the problem here since Jiang
> is testing for !node_online(nid) in his patch, so it's a problem with
> cpu_to_node() pointing to an offline node. It makes sense for the page
> allocator to crash in such a case, the node id is erroneous.
>
> So either the cpu-to-node mapping is invalid or alloc_fair_sched_group()
> is allocating memory for a cpu on an offline node. The
> for_each_possible_cpu() looks suspicious. There's no guarantee that
> local_memory_node(node) for an offline node will return anything with
> affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's
> patch.
Hi David,
That's the case: alloc_fair_sched_group() is trying to allocate
memory for a CPU in an offline node, which then accesses the nonexistent
NODE_DATA.
Thanks!
Gerry
>
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
Hi Peter,
It's not for memoryless nodes, but to solve a race window
in CPU hot-addition. The related CPU hot-addition flow is:
1) Handle CPU hot-addition event
1.a) gather platform specific information
1.b) associate hot-added CPU with a node
1.c) create CPU device
2) User online hot-added CPUs through sysfs:
2.a) cpu_up()
2.b) ->try_online_node()
2.c) ->hotadd_new_pgdat()
2.d) ->node_set_online()

So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
memory access without the node_online(nid) check.
Best Regards!
Gerry

On 2014/4/22 16:15, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
>> When calling kzalloc_node(size, flags, node), we should first check
>> whether node is onlined, otherwise it may cause invalid memory access
>> as below.
>
> But this is only for memoryless node crap, right?
>
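The race window in the flow above (CPU already mapped to a node at 1.b, node not marked online until 2.d) and the node_online() guard can be sketched in userspace like this. It is a model under stated assumptions, not the patch itself: `model_cpu_to_node()`, `model_node_online()`, `model_node_set_online()` and `model_alloc_node_for_cpu()` are hypothetical stand-ins for the kernel helpers they are named after, and the NUMA_NO_NODE fallback follows the description of the patch given elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS  8
#define MODEL_NR_NODES 2
#define NUMA_NO_NODE   (-1)

/* Hypothetical state after step 1.b: CPUs 4-7 map to node 1, still offline. */
static bool model_online_nodes[MODEL_NR_NODES] = { true, false };
static const int model_cpu_map[MODEL_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Stand-in for cpu_to_node(): valid as soon as step 1.b has run. */
static int model_cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_cpu_map[cpu];
}

/* Stand-in for node_online(nid). */
static bool model_node_online(int nid)
{
    return nid >= 0 && nid < MODEL_NR_NODES && model_online_nodes[nid];
}

/* Stand-in for step 2.d, node_set_online(). */
static void model_node_set_online(int nid)
{
    assert(nid >= 0 && nid < MODEL_NR_NODES);
    model_online_nodes[nid] = true;
}

/*
 * Node id to hand to a kzalloc_node()-style allocator: the CPU's own
 * node when it is online, otherwise no node preference at all.
 */
static int model_alloc_node_for_cpu(int cpu)
{
    int nid = model_cpu_to_node(cpu);

    return model_node_online(nid) ? nid : NUMA_NO_NODE;
}
```

Before the node is set online the guard returns NUMA_NO_NODE for CPU 4, so the allocator never dereferences per-node data that does not exist yet; afterwards the same call returns node 1.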
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 2014-04-22 at 22:04 +0200, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra wrote:
> >
> > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > > When calling kzalloc_node(size, flags, node), we should first check
> > > > whether node is onlined, otherwise it may cause invalid memory access
> > > > as below.
> > >
> > > But this is only for memory less node crap, right?
> >
> > um, why are memoryless nodes crap?
>
> Why wouldn't they be? Having CPUs with no local memory seems decidedly
> suboptimal.

This ain't exactly wonderful either, makes CPU domains to crawl over.

node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
node 0 size: 31723 MB
node 0 free: 27949 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
node 1 size: 32308 MB
node 1 free: 31033 MB
node 2 cpus:
node 2 size: 32768 MB
node 2 free: 16631 MB
node 3 cpus:
node 3 size: 32768 MB
node 3 free: 32640 MB
node 4 cpus:
node 4 size: 32768 MB
node 4 free: 32640 MB
node 5 cpus:
node 5 size: 32768 MB
node 5 free: 32638 MB
node distances:
node   0   1   2   3   4   5
  0:  10  12  12  15  12  15
  1:  12  10  15  12  15  15
  2:  12  15  10  12  15  15
  3:  15  12  12  10  15  12
  4:  12  15  15  15  10  12
  5:  15  15  15  12  12  10
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 22 Apr 2014, Peter Zijlstra wrote:

> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra wrote:
> >
> > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > > When calling kzalloc_node(size, flags, node), we should first check
> > > > whether node is onlined, otherwise it may cause invalid memory access
> > > > as below.
> > >
> > > But this is only for memoryless node crap, right?
> >
> > um, why are memoryless nodes crap?
>
> Why wouldn't they be? Having CPUs with no local memory seems decidedly
> suboptimal.

The quick fix for memoryless node issues is usually just do cpu_to_mem()
rather than cpu_to_node() in the caller. This assumes that the arch is
set up correctly to handle memoryless nodes with
CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with
memoryless nodes not being configured correctly on powerpc).

That type of a fix would probably be better handled in the slab allocator,
though, since kmalloc_node(nid) shouldn't crash just because nid is
memoryless, we should be doing local_memory_node(node) when allocating the
slab pages.

However, I don't think memoryless nodes are the problem here since Jiang
is testing for !node_online(nid) in his patch, so it's a problem with
cpu_to_node() pointing to an offline node. It makes sense for the page
allocator to crash in such a case, the node id is erroneous.

So either the cpu-to-node mapping is invalid or alloc_fair_sched_group()
is allocating memory for a cpu on an offline node. The
for_each_possible_cpu() looks suspicious. There's no guarantee that
local_memory_node(node) for an offline node will return anything with
affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's
patch.
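The difference between cpu_to_node() and cpu_to_mem() for a memoryless node can be illustrated with a small userspace model. Everything here is hypothetical: the three-node topology, the distance table, and the `model_*` functions. In particular, picking the nearest node with memory by distance is an assumption made for the sketch; it is not a description of how the kernel's local_memory_node() is implemented.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_NODES 3
#define MODEL_NR_CPUS  6

/* Hypothetical topology: node 1 has CPUs but no memory. */
static const bool model_node_has_memory[MODEL_NR_NODES] = { true, false, true };
static const int model_distance[MODEL_NR_NODES][MODEL_NR_NODES] = {
    { 10, 12, 15 },
    { 12, 10, 15 },
    { 15, 15, 10 },
};
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1, 2, 2 };

/* Stand-in for cpu_to_node(): the node the CPU physically sits on. */
static int model_cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_cpu_node[cpu];
}

/* Stand-in for local_memory_node(): nearest node that has memory. */
static int model_local_memory_node(int nid)
{
    int best = -1;

    assert(nid >= 0 && nid < MODEL_NR_NODES);
    for (int n = 0; n < MODEL_NR_NODES; n++) {
        if (!model_node_has_memory[n])
            continue;
        if (best < 0 || model_distance[nid][n] < model_distance[nid][best])
            best = n;
    }
    return best;
}

/* Stand-in for cpu_to_mem(): the node allocations should come from. */
static int model_cpu_to_mem(int cpu)
{
    return model_local_memory_node(model_cpu_to_node(cpu));
}
```

For CPU 2 the model returns node 1 from `model_cpu_to_node()` but node 0 from `model_cpu_to_mem()`. This only helps for online nodes without memory; it does not address a CPU mapped to an offline node, which is the case the patch guards against.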
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra wrote:
>
> > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > When calling kzalloc_node(size, flags, node), we should first check
> > > whether node is onlined, otherwise it may cause invalid memory access
> > > as below.
> >
> > But this is only for memoryless node crap, right?
>
> um, why are memoryless nodes crap?

Why wouldn't they be? Having CPUs with no local memory seems decidedly
suboptimal.
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra wrote:

> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > When calling kzalloc_node(size, flags, node), we should first check
> > whether node is onlined, otherwise it may cause invalid memory access
> > as below.
>
> But this is only for memoryless node crap, right?

um, why are memoryless nodes crap?
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> When calling kzalloc_node(size, flags, node), we should first check
> whether node is onlined, otherwise it may cause invalid memory access
> as below.

But this is only for memoryless node crap, right?
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 22 Apr 2014, Peter Zijlstra wrote:

> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > > When calling kzalloc_node(size, flags, node), we should first check
> > > > whether node is onlined, otherwise it may cause invalid memory access
> > > > as below.
> > >
> > > But this is only for memory less node crap, right?
> >
> > um, why are memoryless nodes crap?
>
> Why wouldn't they be? Having CPUs with no local memory seems decidedly
> suboptimal.

The quick fix for memoryless node issues is usually just do cpu_to_mem()
rather than cpu_to_node() in the caller. This assumes that the arch is
setup correctly to handle memoryless nodes with
CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with
memoryless nodes not being configured correctly on powerpc).

That type of a fix would probably be better handled in the slab allocator,
though, since kmalloc_node(nid) shouldn't crash just because nid is
memoryless, we should be doing local_memory_node(node) when allocating the
slab pages.

However, I don't think memoryless nodes are the problem here since Jiang
is testing for !node_online(nid) in his patch, so it's a problem with
cpu_to_node() pointing to an offline node. It makes sense for the page
allocator to crash in such a case, the node id is erroneous.

So either the cpu-to-node mapping is invalid or alloc_fair_sched_group()
is allocating memory for a cpu on an offline node. The
for_each_possible_cpu() looks suspicious. There's no guarantee that
local_memory_node(node) for an offline node will return anything with
affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's
patch.
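The fallback discussed in this message (use the CPU's node only if that
node is online, otherwise pass "no node preference") can be sketched in
plain userspace C. This is only a model under stated assumptions, not the
actual patch: every model_* name and MODEL_NO_NODE are hypothetical
stand-ins for cpu_to_node(), node_online() and NUMA_NO_NODE, and the two
arrays replace real kernel state.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names are kernel APIs. */
#define MODEL_MAX_NODES 8
#define MODEL_MAX_CPUS  16
#define MODEL_NO_NODE   (-1)   /* stands in for the kernel's NUMA_NO_NODE */

/* Hypothetical state: which nodes are online, and each CPU's node id. */
static bool model_node_is_online[MODEL_MAX_NODES];
static int  model_cpu_node[MODEL_MAX_CPUS];

/*
 * Pick the node id to hand to a node-aware allocator for this CPU.
 * If the CPU maps to a node that is out of range or not online,
 * return MODEL_NO_NODE so the allocator is free to choose any node.
 */
static int model_pick_alloc_node(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_MAX_CPUS);

    int nid = model_cpu_node[cpu];

    if (nid < 0 || nid >= MODEL_MAX_NODES || !model_node_is_online[nid])
        return MODEL_NO_NODE;
    return nid;
}
```

With node 0 marked online and CPU 0 mapped to it, model_pick_alloc_node(0)
returns 0; a CPU mapped to a node that was never marked online gets
MODEL_NO_NODE instead of an id the allocator cannot use.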
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Tue, 2014-04-22 at 22:04 +0200, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > > When calling kzalloc_node(size, flags, node), we should first check
> > > > whether node is onlined, otherwise it may cause invalid memory access
> > > > as below.
> > >
> > > But this is only for memory less node crap, right?
> >
> > um, why are memoryless nodes crap?
>
> Why wouldn't they be? Having CPUs with no local memory seems decidedly
> suboptimal.

This ain't exactly wonderful either, makes CPU domains to crawl over.

node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
node 0 size: 31723 MB
node 0 free: 27949 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
node 1 size: 32308 MB
node 1 free: 31033 MB
node 2 cpus:
node 2 size: 32768 MB
node 2 free: 16631 MB
node 3 cpus:
node 3 size: 32768 MB
node 3 free: 32640 MB
node 4 cpus:
node 4 size: 32768 MB
node 4 free: 32640 MB
node 5 cpus:
node 5 size: 32768 MB
node 5 free: 32638 MB
node distances:
node   0   1   2   3   4   5
  0:  10  12  12  15  12  15
  1:  12  10  15  12  15  15
  2:  12  15  10  12  15  15
  3:  15  12  12  10  15  12
  4:  12  15  15  15  10  12
  5:  15  15  15  12  12  10
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
Hi Peter,
	It's not for memoryless node, but to solve a race window
in CPU hot-addition. The related CPU hot-addition flow is:
1) Handle CPU hot-addition event
1.a) gather platform specific information
1.b) associate hot-added CPU with a node
1.c) create CPU device
2) User online hot-added CPUs through sysfs:
2.a) cpu_up()
2.b)   ->try_online_node()
2.c)     ->hotadd_new_pgdat()
2.d)     ->node_set_online()

So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
memory access without the node_online(nid) check.

Best Regards!
Gerry

On 2014/4/22 16:15, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > When calling kzalloc_node(size, flags, node), we should first check
> > whether node is onlined, otherwise it may cause invalid memory access
> > as below.
>
> But this is only for memory less node crap, right?
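The window between steps 1.b and 2.c in the flow above can be modeled in
userspace C roughly as follows. This is a sketch under assumptions, not
kernel code: all model_* names are hypothetical, model_node_data[] stands
in for the per-node data that the flow says hotadd_new_pgdat() creates,
and static storage replaces any real allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names are kernel APIs. */
#define MODEL_MAX_NODES 4
#define MODEL_MAX_CPUS  8

struct model_pgdat { int node_id; };

static struct model_pgdat  model_pgdat_storage[MODEL_MAX_NODES];
static struct model_pgdat *model_node_data[MODEL_MAX_NODES]; /* NULL until 2.c */
static bool model_online[MODEL_MAX_NODES];
static int  model_cpu_to_node[MODEL_MAX_CPUS];

/* Step 1.b: the hot-added CPU is associated with a node. */
static void model_hot_add_cpu(int cpu, int nid)
{
    assert(cpu >= 0 && cpu < MODEL_MAX_CPUS);
    assert(nid >= 0 && nid < MODEL_MAX_NODES);
    model_cpu_to_node[cpu] = nid;
}

/* Steps 2.c and 2.d: create the per-node data, then mark the node online. */
static void model_online_node(int nid)
{
    assert(nid >= 0 && nid < MODEL_MAX_NODES);
    model_pgdat_storage[nid].node_id = nid;
    model_node_data[nid] = &model_pgdat_storage[nid]; /* 2.c */
    model_online[nid] = true;                         /* 2.d */
}

/* True if a node-targeted allocation for this CPU would find per-node data. */
static bool model_node_alloc_is_safe(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_MAX_CPUS);
    int nid = model_cpu_to_node[cpu];
    return model_online[nid] && model_node_data[nid] != NULL;
}
```

Calling model_hot_add_cpu(5, 2) and then model_node_alloc_is_safe(5)
gives false, which is the window described; after model_online_node(2)
the same check gives true.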
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 9:59, David Rientjes wrote:
> On Tue, 22 Apr 2014, Peter Zijlstra wrote:
>
> > On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
> > > On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra <pet...@infradead.org> wrote:
> > >
> > > > On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
> > > > > When calling kzalloc_node(size, flags, node), we should first check
> > > > > whether node is onlined, otherwise it may cause invalid memory access
> > > > > as below.
> > > >
> > > > But this is only for memory less node crap, right?
> > >
> > > um, why are memoryless nodes crap?
> >
> > Why wouldn't they be? Having CPUs with no local memory seems decidedly
> > suboptimal.
>
> The quick fix for memoryless node issues is usually just do cpu_to_mem()
> rather than cpu_to_node() in the caller. This assumes that the arch is
> setup correctly to handle memoryless nodes with
> CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with
> memoryless nodes not being configured correctly on powerpc).
>
> That type of a fix would probably be better handled in the slab allocator,
> though, since kmalloc_node(nid) shouldn't crash just because nid is
> memoryless, we should be doing local_memory_node(node) when allocating the
> slab pages.
>
> However, I don't think memoryless nodes are the problem here since Jiang
> is testing for !node_online(nid) in his patch, so it's a problem with
> cpu_to_node() pointing to an offline node. It makes sense for the page
> allocator to crash in such a case, the node id is erroneous.
>
> So either the cpu-to-node mapping is invalid or alloc_fair_sched_group()
> is allocating memory for a cpu on an offline node. The
> for_each_possible_cpu() looks suspicious. There's no guarantee that
> local_memory_node(node) for an offline node will return anything with
> affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's
> patch.

Hi David,
	That's the case: alloc_fair_sched_group() is trying to allocate
memory for a CPU on an offline node, which then accesses a non-existent
NODE_DATA.
Thanks!
Gerry
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
> Hi Peter,
> 	It's not for memoryless node, but to solve a race window
> in CPU hot-addition. The related CPU hot-addition flow is:
> 1) Handle CPU hot-addition event
> 1.a) gather platform specific information
> 1.b) associate hot-added CPU with a node
> 1.c) create CPU device
> 2) User online hot-added CPUs through sysfs:
> 2.a) cpu_up()
> 2.b)   ->try_online_node()
> 2.c)     ->hotadd_new_pgdat()
> 2.d)     ->node_set_online()
>
> So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> memory access without the node_online(nid) check.

And why was all this not in the Changelog?
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On 2014/4/23 13:32, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
> > Hi Peter,
> > 	It's not for memoryless node, but to solve a race window
> > in CPU hot-addition. The related CPU hot-addition flow is:
> > 1) Handle CPU hot-addition event
> > 1.a) gather platform specific information
> > 1.b) associate hot-added CPU with a node
> > 1.c) create CPU device
> > 2) User online hot-added CPUs through sysfs:
> > 2.a) cpu_up()
> > 2.b)   ->try_online_node()
> > 2.c)     ->hotadd_new_pgdat()
> > 2.d)     ->node_set_online()
> >
> > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> > memory access without the node_online(nid) check.
>
> And why was all this not in the Changelog?

Sorry, will add the above message to the changelog.
Thanks!
Gerry
Re: [Bugfix] sched: fix possible invalid memory access caused by CPU hot-addition
On Wed, Apr 23, 2014 at 07:32:13AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 10:45:13AM +0800, Jiang Liu wrote:
> > Hi Peter,
> > 	It's not for memoryless node, but to solve a race window
> > in CPU hot-addition. The related CPU hot-addition flow is:
> > 1) Handle CPU hot-addition event
> > 1.a) gather platform specific information
> > 1.b) associate hot-added CPU with a node
> > 1.c) create CPU device
> > 2) User online hot-added CPUs through sysfs:
> > 2.a) cpu_up()
> > 2.b)   ->try_online_node()
> > 2.c)     ->hotadd_new_pgdat()
> > 2.d)     ->node_set_online()
> >
> > So between 1.b and 2.c, kmalloc_node(nid) may cause invalid
> > memory access without the node_online(nid) check.
>
> And why was all this not in the Changelog?

Also, do explain what kind of hardware you needed to trigger this. This
code has been like this for a good while.