Re: [Bugfix v2] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-29 Thread Jiang Liu
Thanks Peter, I will try to find other solutions.

On 2014/4/28 15:09, Peter Zijlstra wrote:
> On Mon, Apr 28, 2014 at 10:48:13AM +0800, Jiang Liu wrote:
>> Intel platforms with Nehalem/Westmere/IvyBridge CPUs may support socket
>> hotplug/online at runtime. The CPU hot-addition flow is:
>> 1) handle CPU hot-addition event
>>  1.a) gather platform specific information
>>  1.b) associate hot-added CPU with NUMA node
>>  1.c) create CPU device
>> 2) online hot-added CPU through sysfs:
>>  2.a) cpu_up()
>>  2.b) ->try_online_node()
>>  2.c) ->hotadd_new_pgdat()
>>  2.d) ->node_set_online()
>>
>> Between 1.b and 2.c, hot-added CPUs are associated with NUMA nodes
>> but those NUMA nodes may still be in offlined state. So we should
>> check node_online(nid) before calling kmalloc_node(nid) and friends,
>> otherwise it may cause invalid memory access as below.
> 
> So complete and full NAK on this. This is a workaround for a fucked in
> the head BIOS. If you're going to do a work around for that they should
> live in arch/ space, not in core code.
> 
> The code in question is nearly 7 years old (2.6.24), which leads me to
> believe it works just fine for (regular) memory less nodes as I've not
> had complaints about it before.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugfix v2] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-28 Thread Peter Zijlstra
On Mon, Apr 28, 2014 at 10:48:13AM +0800, Jiang Liu wrote:
> Intel platforms with Nehalem/Westmere/IvyBridge CPUs may support socket
> hotplug/online at runtime. The CPU hot-addition flow is:
> 1) handle CPU hot-addition event
>   1.a) gather platform specific information
>   1.b) associate hot-added CPU with NUMA node
>   1.c) create CPU device
> 2) online hot-added CPU through sysfs:
>   2.a) cpu_up()
>   2.b) ->try_online_node()
>   2.c) ->hotadd_new_pgdat()
>   2.d) ->node_set_online()
> 
> Between 1.b and 2.c, hot-added CPUs are associated with NUMA nodes
> but those NUMA nodes may still be in offlined state. So we should
> check node_online(nid) before calling kmalloc_node(nid) and friends,
> otherwise it may cause invalid memory access as below.

So complete and full NAK on this. This is a workaround for a fucked in
the head BIOS. If you're going to do a work around for that they should
live in arch/ space, not in core code.

The code in question is nearly 7 years old (2.6.24), which leads me to
believe it works just fine for (regular) memory less nodes as I've not
had complaints about it before.


[Bugfix v2] sched: fix possible invalid memory access caused by CPU hot-addition

2014-04-27 Thread Jiang Liu
Intel platforms with Nehalem/Westmere/IvyBridge CPUs may support socket
hotplug/online at runtime. The CPU hot-addition flow is:
1) handle CPU hot-addition event
1.a) gather platform specific information
1.b) associate hot-added CPU with NUMA node
1.c) create CPU device
2) online hot-added CPU through sysfs:
2.a) cpu_up()
2.b) ->try_online_node()
2.c) ->hotadd_new_pgdat()
2.d) ->node_set_online()

Between 1.b and 2.c, hot-added CPUs are associated with NUMA nodes
but those NUMA nodes may still be in offlined state. So we should
check node_online(nid) before calling kmalloc_node(nid) and friends,
otherwise it may cause invalid memory access as below.

[ 3663.324476] BUG: unable to handle kernel paging request at 0000000000001f08
[ 3663.332348] IP: [<ffffffff81172219>] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0
[ 3663.344773] Oops: 0000 [#1] SMP
[ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic usbhid hid
[ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: G        W    3.14.0-rc5+ #21
[ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.F03.1403031049 03/03/2014
[ 3663.414299] task: ffff88082fe54b00 ti: ffff880845fba000 task.ti: ffff880845fba000
[ 3663.422741] RIP: 0010:[<ffffffff81172219>]  [<ffffffff81172219>] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.432857] RSP: 0018:ffff880845fbbcd0  EFLAGS: 00010246
[ 3663.439265] RAX: 0000000000001f00 RBX: 0000000000000000 RCX: 0000000000000000
[ 3663.447291] RDX: 0000000000000000 RSI: 0000000000000a8d RDI: ffffffff81a8d950
[ 3663.455318] RBP: ffff880845fbbd58 R08: ffff880823293400 R09: 0000000000000001
[ 3663.463345] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000002052d0
[ 3663.471363] R13: ffff880854c07600 R14: 0000000000000002 R15: 0000000000000000
[ 3663.479389] FS:  00007f2e8b99e800(0000) GS:ffff88105a400000(0000) knlGS:0000000000000000
[ 3663.488514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3663.495018] CR2: 0000000000001f08 CR3: 00000008237b1000 CR4: 00000000001407e0
[ 3663.503476] Stack:
[ 3663.505757]  ffffffff811bd74d ffff880854c01d98 ffff880854c01df0 ffff880854c01dd0
[ 3663.514167]  00000003208ca420 000000075a5d84d0 ffff88082fe54b00 ffffffff811bb35f
[ 3663.522567]  ffff880854c07600 0000000000000003 0000000000001f00 ffff880845fbbd48
[ 3663.530976] Call Trace:
[ 3663.533753]  [<ffffffff811bd74d>] ? deactivate_slab+0x41d/0x4f0
[ 3663.540421]  [<ffffffff811bb35f>] ? new_slab+0x3f/0x2d0
[ 3663.546307]  [<ffffffff811bb3c5>] new_slab+0xa5/0x2d0
[ 3663.552001]  [<ffffffff81768c97>] __slab_alloc+0x35d/0x54a
[ 3663.558185]  [<ffffffff810a4845>] ? local_clock+0x25/0x30
[ 3663.564686]  [<ffffffff8177a34c>] ? __do_page_fault+0x4ec/0x5e0
[ 3663.571356]  [<ffffffff810b0054>] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.578609]  [<ffffffff810c77f1>] ? __raw_spin_lock_init+0x21/0x60
[ 3663.585570]  [<ffffffff811be476>] kmem_cache_alloc_node_trace+0xa6/0x1d0
[ 3663.593112]  [<ffffffff810b0054>] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.600363]  [<ffffffff810b0054>] alloc_fair_sched_group+0xc4/0x190
[ 3663.607423]  [<ffffffff810a359f>] sched_create_group+0x3f/0x80
[ 3663.613994]  [<ffffffff810b611f>] sched_autogroup_create_attach+0x3f/0x1b0
[ 3663.621732]  [<ffffffff8108258a>] sys_setsid+0xea/0x110
[ 3663.628020]  [<ffffffff8177f42d>] system_call_fastpath+0x1a/0x1f
[ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 <48> 83 78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89
[ 3663.657032] RIP  [<ffffffff81172219>] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.664491]  RSP <ffff880845fbbcd0>
[ 3663.668429] CR2: 0000000000001f08
[ 3663.672659] ---[ end trace df13f08ed9de18ad ]---
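For illustration, the guard described in the changelog (check node_online() before handing a node id to kmalloc_node() and friends) can be sketched as a standalone userspace program. node_online(), NUMA_NO_NODE and the online map below are stand-ins for the kernel's nodemask machinery, not real kernel APIs, and the map values are hypothetical:

```c
#include <assert.h>

/* Stand-ins for the kernel's nodemask machinery. Nodes 2 and 3 model
 * hot-added sockets whose NUMA nodes are not yet onlined. */
#define NR_NODES      4
#define NUMA_NO_NODE  (-1)

static const int node_online_map[NR_NODES] = { 1, 1, 0, 0 };

static int node_online(int nid)
{
    return nid >= 0 && nid < NR_NODES && node_online_map[nid];
}

/* Resolve the node id to pass to a node-aware allocator: keep the
 * requested node while it is online, otherwise fall back to "no
 * preference" so the allocator picks a valid node itself instead of
 * dereferencing uninitialized per-node state. */
static int alloc_node_for(int nid)
{
    return node_online(nid) ? nid : NUMA_NO_NODE;
}
```

Falling back to NUMA_NO_NODE only costs locality for the short window between hot-addition and node onlining, rather than crashing in the allocator.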

Signed-off-by: Jiang Liu 
---
Hi all,
We have improved log messages according to Peter's suggestion,
no code changes.
Thanks!
Gerry
---
 kernel/sched/fair.c |   12 +++-
 kernel/sched/rt.c   |   11 +++
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7570dd969c28..71be1b96662e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,7 +7487,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 {
struct cfs_rq *cfs_rq;
struct sched_entity *se;
-   int i;
+   int i, nid;
 
tg->cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL);
if (!tg->cfs_rq)
@@ -7501,13 +7501,15 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
init_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
for_each_possible_cpu(i) {
-   cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
- GFP_KERNEL, cpu_to_node(i));
+   nid = 
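The hunk above is cut off mid-line in the archive. Based on the surrounding context (replacing a direct cpu_to_node(i) argument with a checked nid), a plausible shape of the fixed loop can be simulated in userspace; cpu_to_node_map, kzalloc_node_sim and the other helpers here are hypothetical stand-ins, not the patch's actual code:

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS       8
#define NR_NODES      4
#define NUMA_NO_NODE  (-1)

/* Hypothetical topology: CPUs 4-7 sit on nodes 2-3, which model
 * hot-added sockets whose NUMA nodes are not yet onlined. */
static const int cpu_to_node_map[NR_CPUS] = { 0, 0, 1, 1, 2, 2, 3, 3 };
static const int node_online_map[NR_NODES] = { 1, 1, 0, 0 };

static int cpu_to_node(int cpu) { return cpu_to_node_map[cpu]; }
static int node_online(int nid) { return node_online_map[nid]; }

/* Records which node each per-CPU allocation actually targeted. */
static int used_nid[NR_CPUS];

static void *kzalloc_node_sim(size_t size, int nid)
{
    /* A real kzalloc_node() would dereference per-node allocator state
     * here and oops on an offline node; the caller's guard prevents
     * that nid from ever reaching this point. */
    (void)nid;
    return calloc(1, size);
}

static int alloc_per_cpu(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        int nid = cpu_to_node(i);
        if (!node_online(nid))
            nid = NUMA_NO_NODE;   /* fall back instead of oopsing */
        used_nid[i] = nid;

        void *p = kzalloc_node_sim(64, nid);
        if (!p)
            return -1;
        free(p);
    }
    return 0;
}
```

CPUs on online nodes keep node-local allocations; CPUs whose node is still offline get the "no preference" fallback until the node comes up.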
