Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-20 Thread Will Deacon
Hi Yisheng,

On Tue, Sep 20, 2016 at 11:29:24AM +0800, Yisheng Xie wrote:
> On 2016/9/19 22:07, Mark Rutland wrote:
> > On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> > Can you modify the warning in cpumask.h to dump the bad CPU number? That
> > would make it fairly clear if that's the case.
> > 
> hi Mark,
> I dump the bad CPU number, it is 64,
> And the cpumask get from task is ,.
> 
> [3.873044] select_task_rq: allowed 0, allow_cpumask ,
> [3.879727] cpumask_check: cpu 64, nr_cpumask_bits:64, nr_cpu_ids= 64
> [3.895989] [ cut here ]
> [3.900652] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:122 
> try_to_wake_up+0x410/0x4ac

Can you look at this patch from David, please:

http://lists.infradead.org/pipermail/linux-arm-kernel/2016-September/458110.html

and offer a Tested-by if it fixes your problem?

Thanks,

Will


Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Yisheng Xie


On 2016/9/20 10:01, Ming Lei wrote:
> On Mon, Sep 19, 2016 at 9:05 PM, Yisheng Xie  wrote:
>> hi all,
>> When I enable NUMA in BIOS for arm64, it failed to boot on 
>> v4.8-rc4-162-g071e31e.
>> For the crash log, it seems caused by error number of cpumask.
>> Any ideas about it?
> 
> When I played v4.7 on ARM64 with NUMA, I saw the same issue[1] too,
> but it can be avoided by reverting e9d867a(sched: Allow per-cpu kernel
> threads to run on online && !active).
> 
> But with v4.8-rc6, looks the issue can't be observed any more, so I guess
> it has been fixed with some recent patch.
> 
> 
> [1] https://lkml.org/lkml/2016/8/8/74
> 
> Thanks,
> 
Hi Ming,
Thanks for this info.
Are you using the same config as me? I have also tried 4.8.0-rc6-00331-gb01cf67
and see the same problem.

Thanks
Yisheng Xie

>>
>> Thanks.
>>
>> The related config and detail dmesg can be seen in the attachment.
>>
>> --- crash messages ---
>> [1.279155] [ cut here ]
>> [1.537146] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:121 
>> try_to_wake_up+0x298/0x300
>> [1.546112] Modules linked in:
>> [1.549190]
>> [1.550687] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   
>> 4.8.0-rc4-00163-g803ea3a #21
>> [1.559741] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [1.565896] task: 8013e9678000 task.stack: 8013e9674000
>> [1.571874] PC is at try_to_wake_up+0x298/0x300
>> [1.576446] LR is at try_to_wake_up+0x278/0x300
>> [1.581019] pc : [] lr : [] pstate: 
>> 20c5
>> [1.588490] sp : 8013e9677b90
>> [1.591832] x29: 8013e9677b90 x28: 8413eb81a4b0
>> [1.597196] x27: 008c x26: 08d6e840
>> [1.602561] x25: 0004 x24: 8013e96e82e0
>> [1.607925] x23: 0040 x22: 00c0
>> [1.613289] x21: 8013e96e868c x20: 
>> [1.618653] x19: 8013e96e8000 x18: 
>> [1.624018] x17:  x16: 03010066
>> [1.629381] x15: 08ca8000 x14: 0003
>> [1.634745] x13: 0026 x12: 0009
>> [1.640109] x11: 0009 x10: 
>> [1.645472] x9 :  x8 : 0014
>> [1.650837] x7 : 8013e9452e00 x6 : 
>> [1.656200] x5 :  x4 : 
>> [1.661565] x3 :  x2 : 0040
>> [1.666929] x1 : 0001 x0 : 08d63df9
>> [1.672293]
>> [1.673788] ---[ end trace b58e70f3295a8cd8 ]---
>> [1.678448] Call trace:
>> [1.680911] Exception stack(0x8013e96779c0 to 0x8013e9677af0)
>> [1.687417] 79c0: 8013e96e8000 0001 8013e9677b90 
>> 080df66c
>> [1.695329] 79e0:  0808e1f4  
>> 8013e9d30c80
>> [1.703242] 7a00: 8013e9677a20 0882b6f4 8013e9677a60 
>> 080dd384
>> [1.711153] 7a20:  8013e9677b00 08cbaa00 
>> 08d6e000
>> [1.719065] 7a40:   0001 
>> 0080
>> [1.726977] 7a60: 08d63df9 0001 0040 
>> 
>> [1.734889] 7a80:    
>> 8013e9452e00
>> [1.742801] 7aa0: 0014   
>> 0009
>> [1.750713] 7ac0: 0009 0026 0003 
>> 08ca8000
>> [1.758624] 7ae0: 03010066 
>> [1.763548] [] try_to_wake_up+0x298/0x300
>> [1.769175] [] wake_up_process+0x14/0x1c
>> [1.774716] [] create_worker+0x108/0x194
>> [1.780255] [] alloc_unbound_pwq+0x1e4/0x398
>> [1.786146] [] wq_update_unbound_numa+0xdc/0x190
>> [1.792389] [] workqueue_online_cpu+0x254/0x2a8
>> [1.798545] [] cpuhp_up_callbacks+0x54/0x100
>> [1.804436] [] cpuhp_thread_fun+0x12c/0x13c
>> [1.810240] [] smpboot_thread_fn+0x1a8/0x1cc
>> [1.816130] [] kthread+0xd4/0xe8
>> [1.820967] [] ret_from_fork+0x10/0x40
>> [1.826334] Unable to handle kernel paging request at virtual address 
>> fffe841404c71524
>> [1.834333] pgd = 08dae000
>> [1.837762] [fffe841404c71524] *pgd=0413fbfee003, 
>> *pud=
>> [1.844797] Internal error: Oops: 9604 [#1] SMP
>> [1.849720] Modules linked in:
>> [1.852799] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   
>> 4.8.0-rc4-00163-g803ea3a #21
>> [1.861853] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [1.868007] task: 8013e9678000 task.stack: 8013e9674000
>> [1.873985] PC is at try_to_wake_up+0x148/0x300
>> [1.878557] LR is at try_to_wake_up+0x11c/0x300
>> [1.883129] pc : [] lr : [] pstate: 
>> 60c5
>> [1.890602] sp : 8013e9677b90
>> [1.893943] x29: 

Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Yisheng Xie


On 2016/9/19 22:07, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]
> 
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> hi all,
> 
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
> 
hi Mark,
I dumped the bad CPU number: it is 64.
The cpumask taken from the task is ,.

[3.873044] select_task_rq: allowed 0, allow_cpumask ,
[3.879727] cpumask_check: cpu 64, nr_cpumask_bits:64, nr_cpu_ids= 64
[3.895989] [ cut here ]
[3.900652] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:122 
try_to_wake_up+0x410/0x4ac

Thanks.
Yisheng Xie

> Thanks,
> Mark.
> 
>> [0.297337] Detected PIPT I-cache on CPU1
>> [0.297347] GICv3: CPU1: found redistributor 10001 region 
>> 1:0x4d14
>> [0.297356] CPU1: Booted secondary processor [410fd082]
>> [0.297375] [ cut here ]
>> [0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 
>> gic_raise_softirq+0x128/0x17c
>> [0.329356] Modules linked in:
>> [0.332434] 
>> [0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
>> 4.8.0-rc4-00163-g803ea3a #21
>> [0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [0.347735] task: 8013e9dd task.stack: 8013e9dcc000
>> [0.353714] PC is at gic_raise_softirq+0x128/0x17c
>> [0.358550] LR is at gic_raise_softirq+0xa0/0x17c
>> [0.363298] pc : [] lr : [] pstate: 
>> 21c5
>> [0.370770] sp : 8013e9dcfde0
>> [0.374112] x29: 8013e9dcfde0 x28:  
>> [0.379476] x27: 0083207c x26: 08ca5d70 
>> [0.384841] x25: 00010001 x24: 08d63ff3 
>> [0.390205] x23:  x22: 08cb 
>> [0.395569] x21: 0884edb0 x20: 0001 
>> [0.400933] x19: 0001 x18:  
>> [0.406298] x17:  x16: 03010066 
>> [0.411661] x15: 08ca8000 x14: 0013 
>> [0.417025] x13:  x12: 0013 
>> [0.422389] x11: 0013 x10: 02e92aa7 
>> [0.427754] x9 :  x8 : 8413eb6ca668 
>> [0.433118] x7 : 8413eb6ca690 x6 :  
>> [0.438482] x5 : fffe x4 :  
>> [0.443845] x3 : 0040 x2 : 0041 
>> [0.449209] x1 :  x0 : 0001 
>> [0.454573] 
>> [0.456069] ---[ end trace b58e70f3295a8cd7 ]---
>> [0.460730] Call trace:
>> [0.463193] Exception stack(0x8013e9dcfc10 to 0x8013e9dcfd40)
>> [0.469699] fc00:   0001 
>> 0001
>> [0.477611] fc20: 8013e9dcfde0 0838c124 08d72228 
>> 8013e9dcff70
>> [0.485524] fc40: 08d72608 08ab02a4  
>> 
>> [0.493436] fc60:  3464313430303030  
>> 
>> [0.501348] fc80: 8013e9dcfc90 0836e678 8013e9dcfca0 
>> 0836e910
>> [0.509259] fca0: 8013e9dcfd30 0836ec10 0001 
>> 
>> [0.517171] fcc0: 0041 0040  
>> fffe
>> [0.525083] fce0:  8413eb6ca690 8413eb6ca668 
>> 
>> [0.532995] fd00: 02e92aa7 0013 0013 
>> 
>> [0.540907] fd20: 0013 08ca8000 03010066 
>> 
>> [0.548819] [] gic_raise_softirq+0x128/0x17c
>> [0.554713] [] smp_send_reschedule+0x34/0x3c
>> [0.560605] [] resched_curr+0x40/0x5c
>> [0.565881] [] check_preempt_curr+0x58/0xa0
>> [0.571685] [] ttwu_do_wakeup+0x18/0x80
>> [0.577136] [] ttwu_do_activate+0x78/0x88
>> [0.582763] [] try_to_wake_up+0x1f8/0x300
>> [0.588390] [] default_wake_function+0x10/0x18
>> [0.594458] [] __wake_up_common+0x5c/0x9c
>> [0.600085] [] __wake_up_locked+0x14/0x1c
>> [0.605712] [] complete+0x40/0x5c
>> [0.610635] [] secondary_start_kernel+0x148/0x1a8
>> [0.616965] [<000831a8>] 0x831a8
> 
> .
> 



Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Hanjun Guo
On 2016/9/19 22:07, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]
>
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> hi all,
> Hi,
>
> In future, please make sure to Cc LAKML along with relevant parties when
> sending arm64 patches/queries.
>
> For everyone newly Cc'd, the original message (with attachments) can be
> found at:
>
> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1...@huawei.com
>
>> When I enable NUMA in BIOS for arm64, it failed to boot on 
>> v4.8-rc4-162-g071e31e.
> That commit ID doesn't seem to be in mainline (I can't find it in my
> local tree). Which tree are you using? Do you have local patches
> applied?

Yes, we have GICv3 ITS and mbigen patches on top, which try to enable PCI MSI
and native SAS on the board.

>
> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
> OS?

Yes, SRAT and SLIT.

>
>> For the crash log, it seems caused by error number of cpumask.
>> Any ideas about it?
> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW? 

It works OK without NUMA/SRAT enabled; we will check the SRAT table.

> I don't see how the
> SRAT should affect the secondaries we try to bring online.

Yes, the CPU masks and secondary CPU boot-up depend on the MADT, not the SRAT.

Thanks
Hanjun

>
> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
> logical ID with a physical ID somewhere, and it just so happens that the
> NUMA code is more likely to poke something based on that.
>
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
>
> Thanks,
> Mark.
>
>> [0.297337] Detected PIPT I-cache on CPU1
>> [0.297347] GICv3: CPU1: found redistributor 10001 region 
>> 1:0x4d14
>> [0.297356] CPU1: Booted secondary processor [410fd082]
>> [0.297375] [ cut here ]
>> [0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 
>> gic_raise_softirq+0x128/0x17c
>> [0.329356] Modules linked in:
>> [0.332434] 
>> [0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
>> 4.8.0-rc4-00163-g803ea3a #21
>> [0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [0.347735] task: 8013e9dd task.stack: 8013e9dcc000
>> [0.353714] PC is at gic_raise_softirq+0x128/0x17c
>> [0.358550] LR is at gic_raise_softirq+0xa0/0x17c
>> [0.363298] pc : [] lr : [] pstate: 
>> 21c5
>> [0.370770] sp : 8013e9dcfde0
>> [0.374112] x29: 8013e9dcfde0 x28:  
>> [0.379476] x27: 0083207c x26: 08ca5d70 
>> [0.384841] x25: 00010001 x24: 08d63ff3 
>> [0.390205] x23:  x22: 08cb 
>> [0.395569] x21: 0884edb0 x20: 0001 
>> [0.400933] x19: 0001 x18:  
>> [0.406298] x17:  x16: 03010066 
>> [0.411661] x15: 08ca8000 x14: 0013 
>> [0.417025] x13:  x12: 0013 
>> [0.422389] x11: 0013 x10: 02e92aa7 
>> [0.427754] x9 :  x8 : 8413eb6ca668 
>> [0.433118] x7 : 8413eb6ca690 x6 :  
>> [0.438482] x5 : fffe x4 :  
>> [0.443845] x3 : 0040 x2 : 0041 
>> [0.449209] x1 :  x0 : 0001 
>> [0.454573] 
>> [0.456069] ---[ end trace b58e70f3295a8cd7 ]---
>> [0.460730] Call trace:
>> [0.463193] Exception stack(0x8013e9dcfc10 to 0x8013e9dcfd40)
>> [0.469699] fc00:   0001 
>> 0001
>> [0.477611] fc20: 8013e9dcfde0 0838c124 08d72228 
>> 8013e9dcff70
>> [0.485524] fc40: 08d72608 08ab02a4  
>> 
>> [0.493436] fc60:  3464313430303030  
>> 
>> [0.501348] fc80: 8013e9dcfc90 0836e678 8013e9dcfca0 
>> 0836e910
>> [0.509259] fca0: 8013e9dcfd30 0836ec10 0001 
>> 
>> [0.517171] fcc0: 0041 0040  
>> fffe
>> [0.525083] fce0:  8413eb6ca690 8413eb6ca668 
>> 
>> [0.532995] fd00: 02e92aa7 0013 0013 
>> 
>> [0.540907] fd20: 0013 08ca8000 03010066 
>> 
>> [0.548819] [] gic_raise_softirq+0x128/0x17c
>> [0.554713] [] smp_send_reschedule+0x34/0x3c
>> [0.560605] [] resched_curr+0x40/0x5c
>> [0.565881] [] check_preempt_curr+0x58/0xa0
>> [0.571685] [] ttwu_do_wakeup+0x18/0x80
>> [0.577136] [] ttwu_do_activate+0x78/0x88
>> [0.582763] [] try_to_wake_up+0x1f8/0x300
>> [0.588390] 

Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Ming Lei
On Mon, Sep 19, 2016 at 9:05 PM, Yisheng Xie  wrote:
> hi all,
> When I enable NUMA in BIOS for arm64, it failed to boot on 
> v4.8-rc4-162-g071e31e.
> For the crash log, it seems caused by error number of cpumask.
> Any ideas about it?

When I tried v4.7 on ARM64 with NUMA, I saw the same issue[1] too,
but it could be avoided by reverting e9d867a (sched: Allow per-cpu kernel
threads to run on online && !active).

But with v4.8-rc6, the issue no longer seems to show up, so I guess
it has been fixed by some recent patch.


[1] https://lkml.org/lkml/2016/8/8/74

Thanks,

>
> Thanks.
>
> The related config and detail dmesg can be seen in the attachment.
>
> --- crash messages ---
> [1.279155] [ cut here ]
> [1.537146] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:121 
> try_to_wake_up+0x298/0x300
> [1.546112] Modules linked in:
> [1.549190]
> [1.550687] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   
> 4.8.0-rc4-00163-g803ea3a #21
> [1.559741] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> [1.565896] task: 8013e9678000 task.stack: 8013e9674000
> [1.571874] PC is at try_to_wake_up+0x298/0x300
> [1.576446] LR is at try_to_wake_up+0x278/0x300
> [1.581019] pc : [] lr : [] pstate: 
> 20c5
> [1.588490] sp : 8013e9677b90
> [1.591832] x29: 8013e9677b90 x28: 8413eb81a4b0
> [1.597196] x27: 008c x26: 08d6e840
> [1.602561] x25: 0004 x24: 8013e96e82e0
> [1.607925] x23: 0040 x22: 00c0
> [1.613289] x21: 8013e96e868c x20: 
> [1.618653] x19: 8013e96e8000 x18: 
> [1.624018] x17:  x16: 03010066
> [1.629381] x15: 08ca8000 x14: 0003
> [1.634745] x13: 0026 x12: 0009
> [1.640109] x11: 0009 x10: 
> [1.645472] x9 :  x8 : 0014
> [1.650837] x7 : 8013e9452e00 x6 : 
> [1.656200] x5 :  x4 : 
> [1.661565] x3 :  x2 : 0040
> [1.666929] x1 : 0001 x0 : 08d63df9
> [1.672293]
> [1.673788] ---[ end trace b58e70f3295a8cd8 ]---
> [1.678448] Call trace:
> [1.680911] Exception stack(0x8013e96779c0 to 0x8013e9677af0)
> [1.687417] 79c0: 8013e96e8000 0001 8013e9677b90 
> 080df66c
> [1.695329] 79e0:  0808e1f4  
> 8013e9d30c80
> [1.703242] 7a00: 8013e9677a20 0882b6f4 8013e9677a60 
> 080dd384
> [1.711153] 7a20:  8013e9677b00 08cbaa00 
> 08d6e000
> [1.719065] 7a40:   0001 
> 0080
> [1.726977] 7a60: 08d63df9 0001 0040 
> 
> [1.734889] 7a80:    
> 8013e9452e00
> [1.742801] 7aa0: 0014   
> 0009
> [1.750713] 7ac0: 0009 0026 0003 
> 08ca8000
> [1.758624] 7ae0: 03010066 
> [1.763548] [] try_to_wake_up+0x298/0x300
> [1.769175] [] wake_up_process+0x14/0x1c
> [1.774716] [] create_worker+0x108/0x194
> [1.780255] [] alloc_unbound_pwq+0x1e4/0x398
> [1.786146] [] wq_update_unbound_numa+0xdc/0x190
> [1.792389] [] workqueue_online_cpu+0x254/0x2a8
> [1.798545] [] cpuhp_up_callbacks+0x54/0x100
> [1.804436] [] cpuhp_thread_fun+0x12c/0x13c
> [1.810240] [] smpboot_thread_fn+0x1a8/0x1cc
> [1.816130] [] kthread+0xd4/0xe8
> [1.820967] [] ret_from_fork+0x10/0x40
> [1.826334] Unable to handle kernel paging request at virtual address 
> fffe841404c71524
> [1.834333] pgd = 08dae000
> [1.837762] [fffe841404c71524] *pgd=0413fbfee003, *pud=
> [1.844797] Internal error: Oops: 9604 [#1] SMP
> [1.849720] Modules linked in:
> [1.852799] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   
> 4.8.0-rc4-00163-g803ea3a #21
> [1.861853] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> [1.868007] task: 8013e9678000 task.stack: 8013e9674000
> [1.873985] PC is at try_to_wake_up+0x148/0x300
> [1.878557] LR is at try_to_wake_up+0x11c/0x300
> [1.883129] pc : [] lr : [] pstate: 
> 60c5
> [1.890602] sp : 8013e9677b90
> [1.893943] x29: 8013e9677b90 x28: 8413eb81a4b0
> [1.899307] x27: 008c x26: 08d6e840
> [1.904670] x25: 08ca5f10 x24: 08c77600
> [1.910033] x23: 0040 x22: 00c0
> [1.915398] x21: 8013e96e868c x20: 0004
> [1.920761] x19: 8013e96e8000 x18: 

Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Ming Lei
On Mon, Sep 19, 2016 at 9:05 PM, Yisheng Xie  wrote:
> hi all,
> When I enable NUMA in BIOS for arm64, it failed to boot on 
> v4.8-rc4-162-g071e31e.
> For the crash log, it seems caused by error number of cpumask.
> Any ideas about it?

When I played v4.7 on ARM64 with NUMA, I saw the same issue[1] too,
but it can be avoided by reverting e9d867a(sched: Allow per-cpu kernel
threads to run on online && !active).

But with v4.8-rc6, looks the issue can't be observed any more, so I guess
it has been fixed with some recent patch.


[1] https://lkml.org/lkml/2016/8/8/74

Thanks,

>
> Thanks.
>
> The related config and detail dmesg can be seen in the attachment.
>
> --- crash messages ---
> [1.279155] [ cut here ]
> [1.537146] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:121 
> try_to_wake_up+0x298/0x300
> [1.546112] Modules linked in:
> [1.549190]
> [1.550687] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   
> 4.8.0-rc4-00163-g803ea3a #21
> [1.559741] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> [1.565896] task: 8013e9678000 task.stack: 8013e9674000
> [1.571874] PC is at try_to_wake_up+0x298/0x300
> [1.576446] LR is at try_to_wake_up+0x278/0x300
> [1.581019] pc : [] lr : [] pstate: 
> 20c5
> [1.588490] sp : 8013e9677b90
> [1.591832] x29: 8013e9677b90 x28: 8413eb81a4b0
> [1.597196] x27: 008c x26: 08d6e840
> [1.602561] x25: 0004 x24: 8013e96e82e0
> [1.607925] x23: 0040 x22: 00c0
> [1.613289] x21: 8013e96e868c x20: 
> [1.618653] x19: 8013e96e8000 x18: 
> [1.624018] x17:  x16: 03010066
> [1.629381] x15: 08ca8000 x14: 0003
> [1.634745] x13: 0026 x12: 0009
> [1.640109] x11: 0009 x10: 
> [1.645472] x9 :  x8 : 0014
> [1.650837] x7 : 8013e9452e00 x6 : 
> [1.656200] x5 :  x4 : 
> [1.661565] x3 :  x2 : 0040
> [1.666929] x1 : 0001 x0 : 08d63df9
> [1.672293]
> [1.673788] ---[ end trace b58e70f3295a8cd8 ]---
> [1.678448] Call trace:
> [1.680911] Exception stack(0x8013e96779c0 to 0x8013e9677af0)
> [1.687417] 79c0: 8013e96e8000 0001 8013e9677b90 
> 080df66c
> [1.695329] 79e0:  0808e1f4  
> 8013e9d30c80
> [1.703242] 7a00: 8013e9677a20 0882b6f4 8013e9677a60 
> 080dd384
> [1.711153] 7a20:  8013e9677b00 08cbaa00 
> 08d6e000
> [1.719065] 7a40:   0001 
> 0080
> [1.726977] 7a60: 08d63df9 0001 0040 
> 
> [1.734889] 7a80:    
> 8013e9452e00
> [1.742801] 7aa0: 0014   
> 0009
> [1.750713] 7ac0: 0009 0026 0003 
> 08ca8000
> [1.758624] 7ae0: 03010066 
> [1.763548] [] try_to_wake_up+0x298/0x300
> [1.769175] [] wake_up_process+0x14/0x1c
> [1.774716] [] create_worker+0x108/0x194
> [1.780255] [] alloc_unbound_pwq+0x1e4/0x398
> [1.786146] [] wq_update_unbound_numa+0xdc/0x190
> [1.792389] [] workqueue_online_cpu+0x254/0x2a8
> [1.798545] [] cpuhp_up_callbacks+0x54/0x100
> [1.804436] [] cpuhp_thread_fun+0x12c/0x13c
> [1.810240] [] smpboot_thread_fn+0x1a8/0x1cc
> [1.816130] [] kthread+0xd4/0xe8
> [1.820967] [] ret_from_fork+0x10/0x40
> [1.826334] Unable to handle kernel paging request at virtual address 
> fffe841404c71524
> [1.834333] pgd = 08dae000
> [1.837762] [fffe841404c71524] *pgd=0413fbfee003, *pud=
> [1.844797] Internal error: Oops: 9604 [#1] SMP
> [1.849720] Modules linked in:
> [1.852799] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   
> 4.8.0-rc4-00163-g803ea3a #21
> [1.861853] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> [1.868007] task: 8013e9678000 task.stack: 8013e9674000
> [1.873985] PC is at try_to_wake_up+0x148/0x300
> [1.878557] LR is at try_to_wake_up+0x11c/0x300
> [1.883129] pc : [] lr : [] pstate: 
> 60c5
> [1.890602] sp : 8013e9677b90
> [1.893943] x29: 8013e9677b90 x28: 8413eb81a4b0
> [1.899307] x27: 008c x26: 08d6e840
> [1.904670] x25: 08ca5f10 x24: 08c77600
> [1.910033] x23: 0040 x22: 00c0
> [1.915398] x21: 8013e96e868c x20: 0004
> [1.920761] x19: 8013e96e8000 x18: 

Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Leizhen (ThunderTown)


On 2016/9/19 22:45, Will Deacon wrote:
> On Mon, Sep 19, 2016 at 03:07:19PM +0100, Mark Rutland wrote:
>> [adding LAKML, arm64 maintainers]
> 
> I've also looped in Euler ThunderTown, since (a) he's at Huawei and is
> assumedly testing this stuff and (b) he has a fairly big NUMA patch
> series doing the rounds (some of which I've queued).
In my patch series, only one patch fixes a crash, but it is
related to device-tree.

> 
>> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> In future, please make sure to Cc LAKML along with relevant parties when
>> sending arm64 patches/queries.
>>
>> For everyone newly Cc'd, the original message (with attachments) can be
>> found at:
>>
>> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1...@huawei.com
>>
>>> When I enable NUMA in the BIOS on arm64, the kernel fails to boot on
>>> v4.8-rc4-162-g071e31e.
>>
>> That commit ID doesn't seem to be in mainline (I can't find it in my
>> local tree). Which tree are you using? Do you have local patches
>> applied?
> 
> That commit is in mainline:
> 
>   http://git.kernel.org/linus/071e31e
> 
> It would be nice to know if the problem also exists on the arm64
> for-next/core branch.
> 
> Will
> 
> 
>> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
>> OS?
>>
>>> From the crash log, it seems to be caused by an invalid CPU number in a cpumask.
>>> Any ideas about it?
>>
>> Much earlier in your log, there was a (non-fatal) warning, as below. Do
>> you see this without NUMA/SRAT enabled in your FW? I don't see how the
>> SRAT should affect the secondaries we try to bring online.
>>
>> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
>> logical ID with a physical ID somewhere, and it just so happens that the
>> NUMA code is more likely to poke something based on that.
>>
>> Can you modify the warning in cpumask.h to dump the bad CPU number? That
>> would make it fairly clear if that's the case.
>>
>> Thanks,
>> Mark.
>>
>>> [0.297337] Detected PIPT I-cache on CPU1
>>> [0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x4d14
>>> [0.297356] CPU1: Booted secondary processor [410fd082]
>>> [0.297375] [ cut here ]
>>> [0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
>>> [0.329356] Modules linked in:
>>> [0.332434] 
>>> [0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
>>> [0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>>> [0.347735] task: 8013e9dd task.stack: 8013e9dcc000
>>> [0.353714] PC is at gic_raise_softirq+0x128/0x17c
>>> [0.358550] LR is at gic_raise_softirq+0xa0/0x17c
>>> [0.363298] pc : [] lr : [] pstate: 21c5
>>> [0.370770] sp : 8013e9dcfde0
>>> [0.374112] x29: 8013e9dcfde0 x28:  
>>> [0.379476] x27: 0083207c x26: 08ca5d70 
>>> [0.384841] x25: 00010001 x24: 08d63ff3 
>>> [0.390205] x23:  x22: 08cb 
>>> [0.395569] x21: 0884edb0 x20: 0001 
>>> [0.400933] x19: 0001 x18:  
>>> [0.406298] x17:  x16: 03010066 
>>> [0.411661] x15: 08ca8000 x14: 0013 
>>> [0.417025] x13:  x12: 0013 
>>> [0.422389] x11: 0013 x10: 02e92aa7 
>>> [0.427754] x9 :  x8 : 8413eb6ca668 
>>> [0.433118] x7 : 8413eb6ca690 x6 :  
>>> [0.438482] x5 : fffe x4 :  
>>> [0.443845] x3 : 0040 x2 : 0041 
>>> [0.449209] x1 :  x0 : 0001 
>>> [0.454573] 
>>> [0.456069] ---[ end trace b58e70f3295a8cd7 ]---
>>> [0.460730] Call trace:
>>> [0.463193] Exception stack(0x8013e9dcfc10 to 0x8013e9dcfd40)
>>> [0.469699] fc00:   0001 0001
>>> [0.477611] fc20: 8013e9dcfde0 0838c124 08d72228 8013e9dcff70
>>> [0.485524] fc40: 08d72608 08ab02a4
>>> [0.493436] fc60:  3464313430303030
>>> [0.501348] fc80: 8013e9dcfc90 0836e678 8013e9dcfca0 0836e910
>>> [0.509259] fca0: 8013e9dcfd30 0836ec10 0001
>>> [0.517171] fcc0: 0041 0040  fffe
>>> [0.525083] fce0:  8413eb6ca690 8413eb6ca668
>>> [0.532995] fd00: 02e92aa7 0013 0013
>>> [0.540907] fd20: 0013 08ca8000 03010066

Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread James Morse
On 19/09/16 15:07, Mark Rutland wrote:
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> From the crash log, it seems to be caused by an invalid CPU number in a cpumask.
>> Any ideas about it?

> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW?

>> [0.297337] Detected PIPT I-cache on CPU1
>> [0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x4d14
>> [0.297356] CPU1: Booted secondary processor [410fd082]
>> [0.297375] [ cut here ]
>> [0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
>> [0.329356] Modules linked in:
>> [0.332434] 
>> [0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
>> [0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [0.347735] task: 8013e9dd task.stack: 8013e9dcc000
>> [0.353714] PC is at gic_raise_softirq+0x128/0x17c
>> [0.358550] LR is at gic_raise_softirq+0xa0/0x17c

I've seen this first trace when built with DEBUG_PER_CPU_MAPS. My version of
this trace[0] was just noise due to gic_compute_target_list() and
gic_raise_softirq() sharing an iterator.

This patch silenced it for me:
https://lkml.org/lkml/2016/9/19/623

Yours may be a different problem with the same symptom.
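The shared-iterator problem described above can be illustrated with a small user-space model. This is a sketch, not the GICv3 driver code: next_cpu(), gather_cluster() and send_to_mask() are hypothetical stand-ins for cpumask_next(), gic_compute_target_list() and gic_raise_softirq(), and the 8-CPU mask with 4 CPUs per cluster is an assumption made for the example. When the inner helper leaves the shared iterator at the "no more CPUs" value, the outer loop's next call receives an out-of-range CPU number, which is the condition the debug check reports.

```c
#include <assert.h>
#include <stdint.h>

#define NBITS 8	/* hypothetical mask size (stand-in for nr_cpumask_bits) */

static unsigned int bad_calls;	/* out-of-range values passed to next_cpu() */

/* Model of cpumask_next(): next set bit after n, or NBITS if none.
 * n == -1 is legal; n >= NBITS is counted, as the debug check would warn. */
static int next_cpu(int n, uint32_t mask)
{
	if (n != -1 && n >= NBITS)
		bad_calls++;
	for (int i = n + 1; i < NBITS; i++)
		if (mask & (1u << i))
			return i;
	return NBITS;
}

/* Hypothetical helper: collect the CPUs of one cluster (4 CPUs per
 * cluster in this model), advancing the caller's iterator past them. */
static uint32_t gather_cluster(int *base, uint32_t mask, int fixed)
{
	int cpu = *base;
	int cluster = cpu / 4;
	uint32_t tlist = 0;

	assert(cpu >= 0 && cpu < NBITS);	/* must be called with a valid CPU */
	for (;;) {
		tlist |= 1u << (cpu % 4);
		int next = next_cpu(cpu, mask);
		if (next >= NBITS) {
			/* Buggy variant leaves the iterator at NBITS; the fixed
			 * variant keeps it on the last valid CPU. */
			if (!fixed)
				cpu = next;
			break;
		}
		if (next / 4 != cluster)
			break;	/* next CPU belongs to another cluster */
		cpu = next;
	}
	*base = cpu;
	return tlist;
}

/* Outer loop in the style of for_each_cpu(); it shares 'cpu' with the helper. */
static unsigned int send_to_mask(uint32_t mask, int fixed)
{
	unsigned int groups = 0;

	for (int cpu = next_cpu(-1, mask); cpu < NBITS; cpu = next_cpu(cpu, mask)) {
		(void)gather_cluster(&cpu, mask, fixed);	/* would send one SGI */
		groups++;
	}
	return groups;
}
```

With mask 0x83 (CPUs 0, 1 and 7), both variants visit two clusters, but only the buggy one makes an out-of-range next_cpu() call. Keeping the iterator on the last valid CPU is one way to avoid the warning in this model; it is not necessarily what the linked patch does.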


Thanks,

James


[0] gicv3 trace when built with DEBUG_PER_CPU_MAPS
[3.077738] GICv3: CPU1: found redistributor 1 region 0:0x2f12
[3.077943] CPU1: Booted secondary processor [410fd0f0]
[3.078542] [ cut here ]
[3.078746] WARNING: CPU: 1 PID: 0 at ../include/linux/cpumask.h:121 gic_raise_softirq+0x12c/0x170
[3.078812] Modules linked in:
[3.078869]
[3.078930] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc5+ #5188
[3.078994] Hardware name: Foundation-v8A (DT)
[3.079059] task: 80087a1a0080 task.stack: 80087a19c000
[3.079145] PC is at gic_raise_softirq+0x12c/0x170
[3.079226] LR is at gic_raise_softirq+0xa4/0x170




Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Will Deacon
On Mon, Sep 19, 2016 at 03:07:19PM +0100, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]

I've also looped in Euler ThunderTown, since (a) he's at Huawei and is
assumedly testing this stuff and (b) he has a fairly big NUMA patch
series doing the rounds (some of which I've queued).

> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> In future, please make sure to Cc LAKML along with relevant parties when
> sending arm64 patches/queries.
> 
> For everyone newly Cc'd, the original message (with attachments) can be
> found at:
> 
> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1...@huawei.com
> 
> > When I enable NUMA in the BIOS on arm64, the kernel fails to boot on
> > v4.8-rc4-162-g071e31e.
> 
> That commit ID doesn't seem to be in mainline (I can't find it in my
> local tree). Which tree are you using? Do you have local patches
> applied?

That commit is in mainline:

  http://git.kernel.org/linus/071e31e

It would be nice to know if the problem also exists on the arm64
for-next/core branch.

Will


> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
> OS?
> 
> > From the crash log, it seems to be caused by an invalid CPU number in a cpumask.
> > Any ideas about it?
> 
> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW? I don't see how the
> SRAT should affect the secondaries we try to bring online.
> 
> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
> logical ID with a physical ID somewhere, and it just so happens that the
> NUMA code is more likely to poke something based on that.
> 
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
> 
> Thanks,
> Mark.
> 
> > [0.297337] Detected PIPT I-cache on CPU1
> > [0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x4d14
> > [0.297356] CPU1: Booted secondary processor [410fd082]
> > [0.297375] [ cut here ]
> > [0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
> > [0.329356] Modules linked in:
> > [0.332434] 
> > [0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
> > [0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> > [0.347735] task: 8013e9dd task.stack: 8013e9dcc000
> > [0.353714] PC is at gic_raise_softirq+0x128/0x17c
> > [0.358550] LR is at gic_raise_softirq+0xa0/0x17c
> > [0.363298] pc : [] lr : [] pstate: 21c5
> > [0.370770] sp : 8013e9dcfde0
> > [0.374112] x29: 8013e9dcfde0 x28:  
> > [0.379476] x27: 0083207c x26: 08ca5d70 
> > [0.384841] x25: 00010001 x24: 08d63ff3 
> > [0.390205] x23:  x22: 08cb 
> > [0.395569] x21: 0884edb0 x20: 0001 
> > [0.400933] x19: 0001 x18:  
> > [0.406298] x17:  x16: 03010066 
> > [0.411661] x15: 08ca8000 x14: 0013 
> > [0.417025] x13:  x12: 0013 
> > [0.422389] x11: 0013 x10: 02e92aa7 
> > [0.427754] x9 :  x8 : 8413eb6ca668 
> > [0.433118] x7 : 8413eb6ca690 x6 :  
> > [0.438482] x5 : fffe x4 :  
> > [0.443845] x3 : 0040 x2 : 0041 
> > [0.449209] x1 :  x0 : 0001 
> > [0.454573] 
> > [0.456069] ---[ end trace b58e70f3295a8cd7 ]---
> > [0.460730] Call trace:
> > [0.463193] Exception stack(0x8013e9dcfc10 to 0x8013e9dcfd40)
> > [0.469699] fc00:   0001 0001
> > [0.477611] fc20: 8013e9dcfde0 0838c124 08d72228 8013e9dcff70
> > [0.485524] fc40: 08d72608 08ab02a4
> > [0.493436] fc60:  3464313430303030
> > [0.501348] fc80: 8013e9dcfc90 0836e678 8013e9dcfca0 0836e910
> > [0.509259] fca0: 8013e9dcfd30 0836ec10 0001
> > [0.517171] fcc0: 0041 0040  fffe
> > [0.525083] fce0:  8413eb6ca690 8413eb6ca668
> > [0.532995] fd00: 02e92aa7 0013 0013
> > [0.540907] fd20: 0013 08ca8000 03010066
> > [0.548819] [] gic_raise_softirq+0x128/0x17c
> > [0.554713] [] smp_send_reschedule+0x34/0x3c
> > [0.560605] [] resched_curr+0x40/0x5c
> > [0.565881] [] check_preempt_curr+0x58/0xa0

Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Mark Rutland
[adding LAKML, arm64 maintainers]

On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> hi all,

Hi,

In future, please make sure to Cc LAKML along with relevant parties when
sending arm64 patches/queries.

For everyone newly Cc'd, the original message (with attachments) can be
found at:

http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1...@huawei.com

> When I enable NUMA in the BIOS on arm64, the kernel fails to boot on
> v4.8-rc4-162-g071e31e.

That commit ID doesn't seem to be in mainline (I can't find it in my
local tree). Which tree are you using? Do you have local patches
applied?

I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
OS?

> From the crash log, it seems to be caused by an invalid CPU number in a cpumask.
> Any ideas about it?

Much earlier in your log, there was a (non-fatal) warning, as below. Do
you see this without NUMA/SRAT enabled in your FW? I don't see how the
SRAT should affect the secondaries we try to bring online.

Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
logical ID with a physical ID somewhere, and it just so happens that the
NUMA code is more likely to poke something based on that.
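To make the logical/physical distinction concrete: MPIDR_EL1 packs the affinity levels as Aff0 in bits [7:0], Aff1 in [15:8], Aff2 in [23:16] and Aff3 in [39:32]. The GICv3 line for CPU1 reports redistributor 10001, which appears to be that CPU's MPIDR affinity value, i.e. Aff2 = 1 and Aff0 = 1. Read as a plain integer that is 65537, nowhere near a valid logical CPU number. The helpers below are a hypothetical user-space sketch, not kernel code, and the nr_cpu_ids value passed in is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Extract one affinity level from an MPIDR_EL1 value:
 * Aff0 = [7:0], Aff1 = [15:8], Aff2 = [23:16], Aff3 = [39:32]. */
static unsigned int mpidr_aff(uint64_t mpidr, unsigned int level)
{
	static const unsigned int shift[4] = { 0, 8, 16, 32 };

	assert(level < 4);
	return (unsigned int)((mpidr >> shift[level]) & 0xff);
}

/* Hypothetical sanity check: a value can only be a logical CPU number
 * if it is below nr_cpu_ids; a raw MPIDR with Aff2 set will usually fail. */
static int looks_like_logical_id(uint64_t value, unsigned int nr_cpu_ids)
{
	return value < nr_cpu_ids;
}
```

For example, mpidr_aff(0x10001, 2) is 1 and mpidr_aff(0x10001, 0) is 1, while looks_like_logical_id(0x10001, 64) is false; code that indexed a cpumask with the raw MPIDR would go far out of range.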

Can you modify the warning in cpumask.h to dump the bad CPU number? That
would make it fairly clear if that's the case.
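One way to do this, sketched as a user-space model rather than a kernel patch: as we understand it, in v4.8 cpumask_check() only does WARN_ON_ONCE(cpu >= nr_cpumask_bits), and only when CONFIG_DEBUG_PER_CPU_MAPS is enabled, so switching to a WARN_ONCE() with a format string would make the warning carry the bad value. The globals below stand in for the kernel's values (64 is an assumption), and unlike WARN_ONCE() the model reports every bad call.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the kernel values (assumed, not read from a real system). */
static unsigned int nr_cpumask_bits = 64;
static unsigned int nr_cpu_ids = 64;
static unsigned int warnings;	/* how many times the check fired */

/* Model of cpumask_check(). In the kernel, the idea would be roughly
 *   WARN_ONCE(cpu >= nr_cpumask_bits, "cpu %u ...\n", cpu, ...);
 * so that the warning prints the offending CPU number. */
static unsigned int cpumask_check(unsigned int cpu)
{
	assert(nr_cpumask_bits >= nr_cpu_ids);	/* mask must cover every CPU id */
	if (cpu >= nr_cpumask_bits) {
		warnings++;
		fprintf(stderr,
			"cpumask_check: cpu %u, nr_cpumask_bits %u, nr_cpu_ids %u\n",
			cpu, nr_cpumask_bits, nr_cpu_ids);
	}
	return cpu;
}
```

Like the kernel helper, the model returns the CPU unchanged; the only difference a diagnostic build makes is that the out-of-range value appears in the log.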

Thanks,
Mark.

> [0.297337] Detected PIPT I-cache on CPU1
> [0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x4d14
> [0.297356] CPU1: Booted secondary processor [410fd082]
> [0.297375] [ cut here ]
> [0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
> [0.329356] Modules linked in:
> [0.332434] 
> [0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
> [0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> [0.347735] task: 8013e9dd task.stack: 8013e9dcc000
> [0.353714] PC is at gic_raise_softirq+0x128/0x17c
> [0.358550] LR is at gic_raise_softirq+0xa0/0x17c
> [0.363298] pc : [] lr : [] pstate: 21c5
> [0.370770] sp : 8013e9dcfde0
> [0.374112] x29: 8013e9dcfde0 x28:  
> [0.379476] x27: 0083207c x26: 08ca5d70 
> [0.384841] x25: 00010001 x24: 08d63ff3 
> [0.390205] x23:  x22: 08cb 
> [0.395569] x21: 0884edb0 x20: 0001 
> [0.400933] x19: 0001 x18:  
> [0.406298] x17:  x16: 03010066 
> [0.411661] x15: 08ca8000 x14: 0013 
> [0.417025] x13:  x12: 0013 
> [0.422389] x11: 0013 x10: 02e92aa7 
> [0.427754] x9 :  x8 : 8413eb6ca668 
> [0.433118] x7 : 8413eb6ca690 x6 :  
> [0.438482] x5 : fffe x4 :  
> [0.443845] x3 : 0040 x2 : 0041 
> [0.449209] x1 :  x0 : 0001 
> [0.454573] 
> [0.456069] ---[ end trace b58e70f3295a8cd7 ]---
> [0.460730] Call trace:
> [0.463193] Exception stack(0x8013e9dcfc10 to 0x8013e9dcfd40)
> [0.469699] fc00:   0001 0001
> [0.477611] fc20: 8013e9dcfde0 0838c124 08d72228 8013e9dcff70
> [0.485524] fc40: 08d72608 08ab02a4
> [0.493436] fc60:  3464313430303030
> [0.501348] fc80: 8013e9dcfc90 0836e678 8013e9dcfca0 0836e910
> [0.509259] fca0: 8013e9dcfd30 0836ec10 0001
> [0.517171] fcc0: 0041 0040  fffe
> [0.525083] fce0:  8413eb6ca690 8413eb6ca668
> [0.532995] fd00: 02e92aa7 0013 0013
> [0.540907] fd20: 0013 08ca8000 03010066
> [0.548819] [] gic_raise_softirq+0x128/0x17c
> [0.554713] [] smp_send_reschedule+0x34/0x3c
> [0.560605] [] resched_curr+0x40/0x5c
> [0.565881] [] check_preempt_curr+0x58/0xa0
> [0.571685] [] ttwu_do_wakeup+0x18/0x80
> [0.577136] [] ttwu_do_activate+0x78/0x88
> [0.582763] [] try_to_wake_up+0x1f8/0x300
> [0.588390] [] default_wake_function+0x10/0x18
> [0.594458] [] __wake_up_common+0x5c/0x9c
> [0.600085] [] __wake_up_locked+0x14/0x1c
> [0.605712] [] complete+0x40/0x5c
> [0.610635] [] secondary_start_kernel+0x148/0x1a8
> [0.616965] [<000831a8>] 0x831a8


[RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Yisheng Xie
hi all,
When I enable NUMA in the BIOS on arm64, the kernel fails to boot on
v4.8-rc4-162-g071e31e.
From the crash log, it seems to be caused by an invalid CPU number in a cpumask.
Any ideas about it?

Thanks.

The relevant config and the full dmesg are in the attachment.
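For context on how a cpumask operation can produce an out-of-range CPU number at all: search helpers such as cpumask_first() do not return an error code; when no bit is set they return a value >= nr_cpu_ids, and a caller that uses the result without checking indexes past the last CPU. The sketch below is a hypothetical user-space model of that behaviour (first_cpu() and pick_cpu() are illustrative names, and the 64-CPU size is an assumption). It shows one possible way to get such a number, not a diagnosis of this crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 64u	/* hypothetical number of possible CPUs */

/* Model of cpumask_first(): index of the first set bit, or the mask
 * size (NR_CPUS_MODEL) when no bit is set. */
static unsigned int first_cpu(uint64_t mask)
{
	for (unsigned int i = 0; i < NR_CPUS_MODEL; i++)
		if (mask & ((uint64_t)1 << i))
			return i;
	return NR_CPUS_MODEL;
}

/* Pick a CPU from an affinity mask. Returns 1 and stores the CPU if the
 * search found one; returns 0 (leaving *cpu untouched) for an empty mask,
 * where the raw result would be the invalid "CPU 64". */
static int pick_cpu(uint64_t allowed, unsigned int *cpu)
{
	unsigned int c;

	assert(cpu != NULL);
	c = first_cpu(allowed);
	if (c >= NR_CPUS_MODEL)
		return 0;
	*cpu = c;
	return 1;
}
```

The check in pick_cpu() is the step an unguarded caller would skip; without it, the value 64 would flow straight into a cpumask or per-CPU lookup.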

--- crash messages ---
[1.279155] [ cut here ]
[1.537146] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:121 try_to_wake_up+0x298/0x300
[1.546112] Modules linked in:
[1.549190]
[1.550687] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   4.8.0-rc4-00163-g803ea3a #21
[1.559741] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
[1.565896] task: 8013e9678000 task.stack: 8013e9674000
[1.571874] PC is at try_to_wake_up+0x298/0x300
[1.576446] LR is at try_to_wake_up+0x278/0x300
[1.581019] pc : [] lr : [] pstate: 20c5
[1.588490] sp : 8013e9677b90
[1.591832] x29: 8013e9677b90 x28: 8413eb81a4b0
[1.597196] x27: 008c x26: 08d6e840
[1.602561] x25: 0004 x24: 8013e96e82e0
[1.607925] x23: 0040 x22: 00c0
[1.613289] x21: 8013e96e868c x20: 
[1.618653] x19: 8013e96e8000 x18: 
[1.624018] x17:  x16: 03010066
[1.629381] x15: 08ca8000 x14: 0003
[1.634745] x13: 0026 x12: 0009
[1.640109] x11: 0009 x10: 
[1.645472] x9 :  x8 : 0014
[1.650837] x7 : 8013e9452e00 x6 : 
[1.656200] x5 :  x4 : 
[1.661565] x3 :  x2 : 0040
[1.666929] x1 : 0001 x0 : 08d63df9
[1.672293]
[1.673788] ---[ end trace b58e70f3295a8cd8 ]---
[1.678448] Call trace:
[1.680911] Exception stack(0x8013e96779c0 to 0x8013e9677af0)
[1.687417] 79c0: 8013e96e8000 0001 8013e9677b90 080df66c
[1.695329] 79e0:  0808e1f4  8013e9d30c80
[1.703242] 7a00: 8013e9677a20 0882b6f4 8013e9677a60 080dd384
[1.711153] 7a20:  8013e9677b00 08cbaa00 08d6e000
[1.719065] 7a40:   0001 0080
[1.726977] 7a60: 08d63df9 0001 0040
[1.734889] 7a80:    8013e9452e00
[1.742801] 7aa0: 0014   0009
[1.750713] 7ac0: 0009 0026 0003 08ca8000
[1.758624] 7ae0: 03010066
[1.763548] [] try_to_wake_up+0x298/0x300
[1.769175] [] wake_up_process+0x14/0x1c
[1.774716] [] create_worker+0x108/0x194
[1.780255] [] alloc_unbound_pwq+0x1e4/0x398
[1.786146] [] wq_update_unbound_numa+0xdc/0x190
[1.792389] [] workqueue_online_cpu+0x254/0x2a8
[1.798545] [] cpuhp_up_callbacks+0x54/0x100
[1.804436] [] cpuhp_thread_fun+0x12c/0x13c
[1.810240] [] smpboot_thread_fn+0x1a8/0x1cc
[1.816130] [] kthread+0xd4/0xe8
[1.820967] [] ret_from_fork+0x10/0x40
[1.826334] Unable to handle kernel paging request at virtual address fffe841404c71524
[1.834333] pgd = 08dae000
[1.837762] [fffe841404c71524] *pgd=0413fbfee003, *pud=
[1.844797] Internal error: Oops: 9604 [#1] SMP
[1.849720] Modules linked in:
[1.852799] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW   4.8.0-rc4-00163-g803ea3a #21
[1.861853] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
[1.868007] task: 8013e9678000 task.stack: 8013e9674000
[1.873985] PC is at try_to_wake_up+0x148/0x300
[1.878557] LR is at try_to_wake_up+0x11c/0x300
[1.883129] pc : [] lr : [] pstate: 60c5
[1.890602] sp : 8013e9677b90
[1.893943] x29: 8013e9677b90 x28: 8413eb81a4b0
[1.899307] x27: 008c x26: 08d6e840
[1.904670] x25: 08ca5f10 x24: 08c77600
[1.910033] x23: 0040 x22: 00c0
[1.915398] x21: 8013e96e868c x20: 0004
[1.920761] x19: 8013e96e8000 x18: 
[1.926125] x17:  x16: 03010066
[1.931489] x15: 08ca8000 x14: 0003
[1.936853] x13: 0026 x12: 0009
[1.942217] x11: 0009 x10: 
[1.947581] x9 :  x8 : 0014
[1.952945] x7 : 8013e9452e00 x6 : 
[1.958309] x5 : 8413eb6ca700 x4 : 
[1.963674] x3 : 8413e2ba3000 x2 : 0010
[1.969037] x1 : 8413fbfffa80 x0 : 08c71aa4
[1.974401]
[1.975897] Process cpuhp/16 (pid: 103, 

[RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Yisheng Xie
hi all,
When I enable NUMA in the BIOS on arm64, the kernel fails to boot on
v4.8-rc4-162-g071e31e.
From the crash log, it seems to be caused by an invalid CPU number in a cpumask.
Any ideas about it?

Thanks.

The related config and detail dmesg can be seen in the attachment.

--- crash messages ---
[1.279155] [ cut here ]
[1.537146] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:121 try_to_wake_up+0x298/0x300
[1.546112] Modules linked in:
[1.549190]
[1.550687] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW 4.8.0-rc4-00163-g803ea3a #21
[1.559741] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
[1.565896] task: 8013e9678000 task.stack: 8013e9674000
[1.571874] PC is at try_to_wake_up+0x298/0x300
[1.576446] LR is at try_to_wake_up+0x278/0x300
[1.581019] pc : [] lr : [] pstate: 20c5
[1.588490] sp : 8013e9677b90
[1.591832] x29: 8013e9677b90 x28: 8413eb81a4b0
[1.597196] x27: 008c x26: 08d6e840
[1.602561] x25: 0004 x24: 8013e96e82e0
[1.607925] x23: 0040 x22: 00c0
[1.613289] x21: 8013e96e868c x20: 
[1.618653] x19: 8013e96e8000 x18: 
[1.624018] x17:  x16: 03010066
[1.629381] x15: 08ca8000 x14: 0003
[1.634745] x13: 0026 x12: 0009
[1.640109] x11: 0009 x10: 
[1.645472] x9 :  x8 : 0014
[1.650837] x7 : 8013e9452e00 x6 : 
[1.656200] x5 :  x4 : 
[1.661565] x3 :  x2 : 0040
[1.666929] x1 : 0001 x0 : 08d63df9
[1.672293]
[1.673788] ---[ end trace b58e70f3295a8cd8 ]---
[1.678448] Call trace:
[1.680911] Exception stack(0x8013e96779c0 to 0x8013e9677af0)
[1.687417] 79c0: 8013e96e8000 0001 8013e9677b90 080df66c
[1.695329] 79e0:  0808e1f4  8013e9d30c80
[1.703242] 7a00: 8013e9677a20 0882b6f4 8013e9677a60 080dd384
[1.711153] 7a20:  8013e9677b00 08cbaa00 08d6e000
[1.719065] 7a40:   0001 0080
[1.726977] 7a60: 08d63df9 0001 0040
[1.734889] 7a80:    8013e9452e00
[1.742801] 7aa0: 0014   0009
[1.750713] 7ac0: 0009 0026 0003 08ca8000
[1.758624] 7ae0: 03010066 
[1.763548] [] try_to_wake_up+0x298/0x300
[1.769175] [] wake_up_process+0x14/0x1c
[1.774716] [] create_worker+0x108/0x194
[1.780255] [] alloc_unbound_pwq+0x1e4/0x398
[1.786146] [] wq_update_unbound_numa+0xdc/0x190
[1.792389] [] workqueue_online_cpu+0x254/0x2a8
[1.798545] [] cpuhp_up_callbacks+0x54/0x100
[1.804436] [] cpuhp_thread_fun+0x12c/0x13c
[1.810240] [] smpboot_thread_fn+0x1a8/0x1cc
[1.816130] [] kthread+0xd4/0xe8
[1.820967] [] ret_from_fork+0x10/0x40
[1.826334] Unable to handle kernel paging request at virtual address fffe841404c71524
[1.834333] pgd = 08dae000
[1.837762] [fffe841404c71524] *pgd=0413fbfee003, *pud=
[1.844797] Internal error: Oops: 9604 [#1] SMP
[1.849720] Modules linked in:
[1.852799] CPU: 16 PID: 103 Comm: cpuhp/16 Tainted: GW 4.8.0-rc4-00163-g803ea3a #21
[1.861853] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
[1.868007] task: 8013e9678000 task.stack: 8013e9674000
[1.873985] PC is at try_to_wake_up+0x148/0x300
[1.878557] LR is at try_to_wake_up+0x11c/0x300
[1.883129] pc : [] lr : [] pstate: 60c5
[1.890602] sp : 8013e9677b90
[1.893943] x29: 8013e9677b90 x28: 8413eb81a4b0
[1.899307] x27: 008c x26: 08d6e840
[1.904670] x25: 08ca5f10 x24: 08c77600
[1.910033] x23: 0040 x22: 00c0
[1.915398] x21: 8013e96e868c x20: 0004
[1.920761] x19: 8013e96e8000 x18: 
[1.926125] x17:  x16: 03010066
[1.931489] x15: 08ca8000 x14: 0003
[1.936853] x13: 0026 x12: 0009
[1.942217] x11: 0009 x10: 
[1.947581] x9 :  x8 : 0014
[1.952945] x7 : 8013e9452e00 x6 : 
[1.958309] x5 : 8413eb6ca700 x4 : 
[1.963674] x3 : 8413e2ba3000 x2 : 0010
[1.969037] x1 : 8413fbfffa80 x0 : 08c71aa4
[1.974401]
[1.975897] Process cpuhp/16 (pid: 103,