Re: [Bugfix v2] sched: fix possible invalid memory access caused by CPU hot-addition
Thanks Peter, I will try to find other solutions.

On 2014/4/28 15:09, Peter Zijlstra wrote:
> On Mon, Apr 28, 2014 at 10:48:13AM +0800, Jiang Liu wrote:
>> Intel platforms with Nehalem/Westmere/IvyBridge CPUs may support socket
>> hotplug/online at runtime. The CPU hot-addition flow is:
>> 1) handle CPU hot-addition event
>>    1.a) gather platform specific information
>>    1.b) associate hot-added CPU with NUMA node
>>    1.c) create CPU device
>> 2) online hot-added CPU through sysfs:
>>    2.a) cpu_up()
>>    2.b) ->try_online_node()
>>    2.c) ->hotadd_new_pgdat()
>>    2.d) ->node_set_online()
>>
>> Between 1.b and 2.c, hot-added CPUs are associated with NUMA nodes
>> but those NUMA nodes may still be in offlined state. So we should
>> check node_online(nid) before calling kmalloc_node(nid) and friends,
>> otherwise it may cause invalid memory access as below.
>
> So complete and full NAK on this. This is a workaround for a fucked in
> the head BIOS. If you're going to do a work around for that they should
> live in arch/ space, not in core code.
>
> The code in question is nearly 7 years old (2.6.24), which leads me to
> believe it works just fine for (regular) memory less nodes as I've not
> had complaints about it before.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Bugfix v2] sched: fix possible invalid memory access caused by CPU hot-addition
On Mon, Apr 28, 2014 at 10:48:13AM +0800, Jiang Liu wrote:
> Intel platforms with Nehalem/Westmere/IvyBridge CPUs may support socket
> hotplug/online at runtime. The CPU hot-addition flow is:
> 1) handle CPU hot-addition event
>    1.a) gather platform specific information
>    1.b) associate hot-added CPU with NUMA node
>    1.c) create CPU device
> 2) online hot-added CPU through sysfs:
>    2.a) cpu_up()
>    2.b) ->try_online_node()
>    2.c) ->hotadd_new_pgdat()
>    2.d) ->node_set_online()
>
> Between 1.b and 2.c, hot-added CPUs are associated with NUMA nodes
> but those NUMA nodes may still be in offlined state. So we should
> check node_online(nid) before calling kmalloc_node(nid) and friends,
> otherwise it may cause invalid memory access as below.

So complete and full NAK on this. This is a workaround for a fucked in
the head BIOS. If you're going to do a work around for that they should
live in arch/ space, not in core code.

The code in question is nearly 7 years old (2.6.24), which leads me to
believe it works just fine for (regular) memory less nodes as I've not
had complaints about it before.
[Bugfix v2] sched: fix possible invalid memory access caused by CPU hot-addition
Intel platforms with Nehalem/Westmere/IvyBridge CPUs may support socket
hotplug/online at runtime. The CPU hot-addition flow is:
1) handle CPU hot-addition event
   1.a) gather platform specific information
   1.b) associate hot-added CPU with NUMA node
   1.c) create CPU device
2) online hot-added CPU through sysfs:
   2.a) cpu_up()
   2.b) ->try_online_node()
   2.c) ->hotadd_new_pgdat()
   2.d) ->node_set_online()

Between 1.b and 2.c, hot-added CPUs are associated with NUMA nodes
but those NUMA nodes may still be in offlined state. So we should
check node_online(nid) before calling kmalloc_node(nid) and friends,
otherwise it may cause invalid memory access as below.

[ 3663.324476] BUG: unable to handle kernel paging request at 1f08
[ 3663.332348] IP: [81172219] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0
[ 3663.344773] Oops: [#1] SMP
[ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic usbhid hid
[ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: G    W    3.14.0-rc5+ #21
[ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.F03.1403031049 03/03/2014
[ 3663.414299] task: 88082fe54b00 ti: 880845fba000 task.ti: 880845fba000
[ 3663.422741] RIP: 0010:[81172219] [81172219] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.432857] RSP: 0018:880845fbbcd0 EFLAGS: 00010246
[ 3663.439265] RAX: 1f00 RBX: RCX:
[ 3663.447291] RDX: RSI: 0a8d RDI: 81a8d950
[ 3663.455318] RBP: 880845fbbd58 R08: 880823293400 R09: 0001
[ 3663.463345] R10: 0001 R11: R12: 002052d0
[ 3663.471363] R13: 880854c07600 R14: 0002 R15:
[ 3663.479389] FS: 7f2e8b99e800() GS:88105a40() knlGS:
[ 3663.488514] CS: 0010 DS: ES: CR0: 80050033
[ 3663.495018] CR2: 1f08 CR3: 0008237b1000 CR4: 001407e0
[ 3663.503476] Stack:
[ 3663.505757] 811bd74d 880854c01d98 880854c01df0 880854c01dd0
[ 3663.514167] 0003208ca420 00075a5d84d0 88082fe54b00 811bb35f
[ 3663.522567] 880854c07600 0003 1f00 880845fbbd48
[ 3663.530976] Call Trace:
[ 3663.533753] [811bd74d] ? deactivate_slab+0x41d/0x4f0
[ 3663.540421] [811bb35f] ? new_slab+0x3f/0x2d0
[ 3663.546307] [811bb3c5] new_slab+0xa5/0x2d0
[ 3663.552001] [81768c97] __slab_alloc+0x35d/0x54a
[ 3663.558185] [810a4845] ? local_clock+0x25/0x30
[ 3663.564686] [8177a34c] ? __do_page_fault+0x4ec/0x5e0
[ 3663.571356] [810b0054] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.578609] [810c77f1] ? __raw_spin_lock_init+0x21/0x60
[ 3663.585570] [811be476] kmem_cache_alloc_node_trace+0xa6/0x1d0
[ 3663.593112] [810b0054] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.600363] [810b0054] alloc_fair_sched_group+0xc4/0x190
[ 3663.607423] [810a359f] sched_create_group+0x3f/0x80
[ 3663.613994] [810b611f] sched_autogroup_create_attach+0x3f/0x1b0
[ 3663.621732] [8108258a] sys_setsid+0xea/0x110
[ 3663.628020] [8177f42d] system_call_fastpath+0x1a/0x1f
[ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 <48> 83 78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89
[ 3663.657032] RIP [81172219] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.664491] RSP 880845fbbcd0
[ 3663.668429] CR2: 1f08
[ 3663.672659] ---[ end trace df13f08ed9de18ad ]---

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
Hi all,
	We have improved log messages according to Peter's suggestion, no code changes.
Thanks!
Gerry
---
 kernel/sched/fair.c | 12 +++-
 kernel/sched/rt.c   | 11 +++
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7570dd969c28..71be1b96662e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,7 +7487,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se;
-	int i;
+	int i, nid;
 
 	tg->cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL);
 	if (!tg->cfs_rq)
@@ -7501,13 +7501,15 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 	init_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
 	for_each_possible_cpu(i) {
-		cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
-				      GFP_KERNEL, cpu_to_node(i));
+		nid =