Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-08 Thread Christophe Leroy




On 08/08/2019 at 10:46, Christophe Leroy wrote:
> On 07/08/2019 at 03:24, Chris Packham wrote:
>> On Wed, 2019-08-07 at 11:13 +1000, Michael Ellerman wrote:
>>> Chris Packham writes:
>>>> On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
>>>> The difference between a working and non-working defconfig is
>>>> CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang
>>>> at boot.
>>>>
>>>> Is that now intentionally prohibited on 64-bit powerpc?
>>>
>>> It's not prohibited, but it probably should be because no one really
>>> tests it properly. I have a handful of IBM machines where I boot a
>>> PREEMPT kernel but that's about it.
>>>
>>> The corenet configs don't have PREEMPT enabled, which suggests it was
>>> never really supported on those machines.
>>>
>>> But maybe someone from NXP can tell me otherwise.
>>
>> I think our workloads need CONFIG_PREEMPT=y because our systems have
>> switch ASIC drivers implemented in userland and we need to be able to
>> react quickly to network events in order to prevent loops. We have seen
>> instances of this not happening simply because some other process is in
>> the middle of a syscall.
>>
>> One thing I am working on here is a setup with a few vendor boards and
>> some of our own kit that we can test the upstream kernels on. Hopefully
>> that'd make these kinds of reports more timely rather than just
>> whenever we decide to move to a new kernel version.
>
> The defconfig also sets CONFIG_DEBUG_PREEMPT. Have you tried without
> CONFIG_DEBUG_PREEMPT?




Reproduced on QEMU. CONFIG_DEBUG_PREEMPT is the trigger:
early_init_this_mmu() calls smp_processor_id(), and when
CONFIG_DEBUG_PREEMPT is set that resolves to debug_smp_processor_id()
instead of raw_smp_processor_id(). It is too early in boot for
debug_smp_processor_id() to be called.


As this call is useless, just drop it.
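
The failure mode described above can be pictured with a small toy model. It is written in Python rather than kernel C, and it is only a sketch: the class, the `task_state_ready` flag, and the exception are hypothetical names, and the model assumes that the checked variant needs some state that is not yet set up in early boot (the message does not say which state). The only parts taken from the thread are the three function names and the fact that CONFIG_DEBUG_PREEMPT selects the checked variant.

```python
class TooEarlyError(RuntimeError):
    """Raised by the toy checked variant when called before setup."""


class ToyCpu:
    def __init__(self, cpu_id, debug_preempt):
        self.cpu_id = cpu_id
        self.debug_preempt = debug_preempt  # models CONFIG_DEBUG_PREEMPT
        self.task_state_ready = False       # hypothetical "late enough" flag

    def raw_smp_processor_id(self):
        # Unchecked variant: usable at any point in this model.
        return self.cpu_id

    def debug_smp_processor_id(self):
        # Checked variant: consults extra state before answering.
        if not self.task_state_ready:
            raise TooEarlyError("checked variant called before setup")
        return self.raw_smp_processor_id()

    def smp_processor_id(self):
        # A build-time choice in the kernel; a runtime branch in this model.
        if self.debug_preempt:
            return self.debug_smp_processor_id()
        return self.raw_smp_processor_id()


def early_init(cpu):
    """Stand-in for an early-boot function that asks for the CPU id."""
    return cpu.smp_processor_id()
```

In this model `early_init(ToyCpu(1, debug_preempt=False))` returns the id, while the same call with `debug_preempt=True` raises until `task_state_ready` is set. Dropping the call from the early path, as proposed, avoids the problem in both configurations.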

Can you test the patch at https://patchwork.ozlabs.org/patch/1144005/ ?

Thanks
Christophe



Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-08 Thread Christophe Leroy




On 07/08/2019 at 03:24, Chris Packham wrote:
> On Wed, 2019-08-07 at 11:13 +1000, Michael Ellerman wrote:
>> Chris Packham writes:
>>> On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
>>> The difference between a working and non-working defconfig is
>>> CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang
>>> at boot.
>>>
>>> Is that now intentionally prohibited on 64-bit powerpc?
>>
>> It's not prohibited, but it probably should be because no one really
>> tests it properly. I have a handful of IBM machines where I boot a
>> PREEMPT kernel but that's about it.
>>
>> The corenet configs don't have PREEMPT enabled, which suggests it was
>> never really supported on those machines.
>>
>> But maybe someone from NXP can tell me otherwise.
>
> I think our workloads need CONFIG_PREEMPT=y because our systems have
> switch ASIC drivers implemented in userland and we need to be able to
> react quickly to network events in order to prevent loops. We have seen
> instances of this not happening simply because some other process is in
> the middle of a syscall.
>
> One thing I am working on here is a setup with a few vendor boards and
> some of our own kit that we can test the upstream kernels on. Hopefully
> that'd make these kinds of reports more timely rather than just
> whenever we decide to move to a new kernel version.

The defconfig also sets CONFIG_DEBUG_PREEMPT. Have you tried without
CONFIG_DEBUG_PREEMPT?


Christophe


Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Chris Packham
On Wed, 2019-08-07 at 11:13 +1000, Michael Ellerman wrote:
> Chris Packham writes:
> > On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
> > > Chris Packham writes:
> > > > On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
> > > > > Hi All,
> > > > >
> > > > > I have a custom board that uses the Freescale/NXP T2080 SoC.
> > > > >
> > > > > The board boots fine using v4.19.60 but when I use v5.1.21 it
> > > > > locks up waiting for the other CPUs to come online (earlyprintk
> > > > > output below). If I set maxcpus=0 then the system boots all the
> > > > > way through to userland. The same thing happens with 5.3-rc2.
> > > > >
> > > > > The defconfig I'm using is
> > > > > https://gist.github.com/cpackham/f24d0b426f3de0eaaba17b82c3528a9d
> > > > > It was updated from the working v4.19.60 defconfig using make
> > > > > olddefconfig.
> > > > >
> > > > > Does this ring any bells for anyone?
> > > > >
> > > > > I haven't dug into the differences between the working and
> > > > > non-working versions yet. I'll start looking now.
> > > > I've bisected this to the following commit
> > > Thanks that's super helpful.
> > >
> > > > commit ed1cd6deb013a11959d17a94e35ce159197632da
> > > > Author: Christophe Leroy
> > > > Date:   Thu Jan 31 10:08:58 2019 +
> > > >
> > > > powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
> > > >
> > > > This patch activates CONFIG_THREAD_INFO_IN_TASK which
> > > > moves the thread_info into task_struct.
> > > >
> > > > I'll be the first to admit this is well beyond my area of
> > > > knowledge so I'm unsure what about this patch is problematic but
> > > > I can be fairly sure that a build immediately before this patch
> > > > works while a build with this patch hangs.
> > > It makes a pretty fundamental change to the way the kernel stores
> > > some information about each task, moving it off the stack and into
> > > the task struct.
> > >
> > > It definitely has the potential to break things, but I thought we
> > > had reasonable test coverage of the Book3E platforms, I have a
> > > p5020ds (e5500) that I boot as part of my CI.
> > >
> > > Aha. If I take your config and try to boot it on my p5020ds I get
> > > the same behaviour, stuck at SMP bringup. So it seems it's
> > > something in your config vs corenet64_smp_defconfig that is
> > > triggering the bug.
> > >
> > > Can you try bisecting what in the config triggers it?
> > >
> > > To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da,
> > > then you build/boot with corenet64_smp_defconfig to confirm it
> > > works. Then you use tools/testing/ktest/config-bisect.pl to bisect
> > > the changes in the .config.
> > >
> > The difference between a working and non-working defconfig is
> > CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang
> > at boot.
> >
> > Is that now intentionally prohibited on 64-bit powerpc?
> It's not prohibited, but it probably should be because no one really
> tests it properly. I have a handful of IBM machines where I boot a
> PREEMPT kernel but that's about it.
>
> The corenet configs don't have PREEMPT enabled, which suggests it was
> never really supported on those machines.
>
> But maybe someone from NXP can tell me otherwise.
>

I think our workloads need CONFIG_PREEMPT=y because our systems have
switch ASIC drivers implemented in userland and we need to be able to
react quickly to network events in order to prevent loops. We have seen
instances of this not happening simply because some other process is in
the middle of a syscall.

One thing I am working on here is a setup with a few vendor boards and
some of our own kit that we can test the upstream kernels on. Hopefully
that'd make these kinds of reports more timely rather than just
whenever we decide to move to a new kernel version.




Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Michael Ellerman
Chris Packham writes:
> On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
>> Chris Packham writes:
>> > On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
>> > > Hi All,
>> > >
>> > > I have a custom board that uses the Freescale/NXP T2080 SoC.
>> > >
>> > > The board boots fine using v4.19.60 but when I use v5.1.21 it
>> > > locks up waiting for the other CPUs to come online (earlyprintk
>> > > output below). If I set maxcpus=0 then the system boots all the
>> > > way through to userland. The same thing happens with 5.3-rc2.
>> > >
>> > > The defconfig I'm using is
>> > > https://gist.github.com/cpackham/f24d0b426f3de0eaaba17b82c3528a9d
>> > > It was updated from the working v4.19.60 defconfig using make
>> > > olddefconfig.
>> > >
>> > > Does this ring any bells for anyone?
>> > >
>> > > I haven't dug into the differences between the working and
>> > > non-working versions yet. I'll start looking now.
>> > I've bisected this to the following commit
>> Thanks that's super helpful.
>>
>> > commit ed1cd6deb013a11959d17a94e35ce159197632da
>> > Author: Christophe Leroy
>> > Date:   Thu Jan 31 10:08:58 2019 +
>> >
>> > powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
>> >
>> > This patch activates CONFIG_THREAD_INFO_IN_TASK which
>> > moves the thread_info into task_struct.
>> >
>> > I'll be the first to admit this is well beyond my area of knowledge
>> > so I'm unsure what about this patch is problematic but I can be
>> > fairly sure that a build immediately before this patch works while
>> > a build with this patch hangs.
>> It makes a pretty fundamental change to the way the kernel stores
>> some information about each task, moving it off the stack and into
>> the task struct.
>>
>> It definitely has the potential to break things, but I thought we had
>> reasonable test coverage of the Book3E platforms, I have a p5020ds
>> (e5500) that I boot as part of my CI.
>>
>> Aha. If I take your config and try to boot it on my p5020ds I get the
>> same behaviour, stuck at SMP bringup. So it seems it's something in
>> your config vs corenet64_smp_defconfig that is triggering the bug.
>>
>> Can you try bisecting what in the config triggers it?
>>
>> To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da,
>> then you build/boot with corenet64_smp_defconfig to confirm it works.
>> Then you use tools/testing/ktest/config-bisect.pl to bisect the
>> changes in the .config.
>>
>
> The difference between a working and non-working defconfig is
> CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang at
> boot.
>
> Is that now intentionally prohibited on 64-bit powerpc?

It's not prohibited, but it probably should be because no one really
tests it properly. I have a handful of IBM machines where I boot a
PREEMPT kernel but that's about it.

The corenet configs don't have PREEMPT enabled, which suggests it was
never really supported on those machines.

But maybe someone from NXP can tell me otherwise.

cheers


Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Chris Packham
On Tue, 2019-08-06 at 21:32 +1000, Michael Ellerman wrote:
> Chris Packham writes:
> > On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
> > > Hi All,
> > >
> > > I have a custom board that uses the Freescale/NXP T2080 SoC.
> > >
> > > The board boots fine using v4.19.60 but when I use v5.1.21 it
> > > locks up waiting for the other CPUs to come online (earlyprintk
> > > output below). If I set maxcpus=0 then the system boots all the
> > > way through to userland. The same thing happens with 5.3-rc2.
> > >
> > > The defconfig I'm using is
> > > https://gist.github.com/cpackham/f24d0b426f3de0eaaba17b82c3528a9d
> > > It was updated from the working v4.19.60 defconfig using make
> > > olddefconfig.
> > >
> > > Does this ring any bells for anyone?
> > >
> > > I haven't dug into the differences between the working and
> > > non-working versions yet. I'll start looking now.
> > I've bisected this to the following commit
> Thanks that's super helpful.
>
> > commit ed1cd6deb013a11959d17a94e35ce159197632da
> > Author: Christophe Leroy
> > Date:   Thu Jan 31 10:08:58 2019 +
> >
> > powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
> >
> > This patch activates CONFIG_THREAD_INFO_IN_TASK which
> > moves the thread_info into task_struct.
> >
> > I'll be the first to admit this is well beyond my area of knowledge
> > so I'm unsure what about this patch is problematic but I can be
> > fairly sure that a build immediately before this patch works while
> > a build with this patch hangs.
> It makes a pretty fundamental change to the way the kernel stores
> some information about each task, moving it off the stack and into
> the task struct.
>
> It definitely has the potential to break things, but I thought we had
> reasonable test coverage of the Book3E platforms, I have a p5020ds
> (e5500) that I boot as part of my CI.
>
> Aha. If I take your config and try to boot it on my p5020ds I get the
> same behaviour, stuck at SMP bringup. So it seems it's something in
> your config vs corenet64_smp_defconfig that is triggering the bug.
>
> Can you try bisecting what in the config triggers it?
>
> To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da,
> then you build/boot with corenet64_smp_defconfig to confirm it works.
> Then you use tools/testing/ktest/config-bisect.pl to bisect the
> changes in the .config.
>
> cheers
>

The difference between a working and non-working defconfig is
CONFIG_PREEMPT; specifically, CONFIG_PREEMPT=y makes my system hang at
boot.

Is that now intentionally prohibited on 64-bit powerpc?

Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-06 Thread Michael Ellerman
Chris Packham writes:
> On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
>> Hi All,
>>
>> I have a custom board that uses the Freescale/NXP T2080 SoC.
>>
>> The board boots fine using v4.19.60 but when I use v5.1.21 it locks
>> up waiting for the other CPUs to come online (earlyprintk output
>> below). If I set maxcpus=0 then the system boots all the way through
>> to userland. The same thing happens with 5.3-rc2.
>>
>> The defconfig I'm using is
>> https://gist.github.com/cpackham/f24d0b426f3de0eaaba17b82c3528a9d
>> It was updated from the working v4.19.60 defconfig using make
>> olddefconfig.
>>
>> Does this ring any bells for anyone?
>>
>> I haven't dug into the differences between the working and non-working
>> versions yet. I'll start looking now.
>
> I've bisected this to the following commit

Thanks that's super helpful.

> commit ed1cd6deb013a11959d17a94e35ce159197632da
> Author: Christophe Leroy 
> Date:   Thu Jan 31 10:08:58 2019 +
>
> powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
> 
> This patch activates CONFIG_THREAD_INFO_IN_TASK which
> moves the thread_info into task_struct.
>
> I'll be the first to admit this is well beyond my area of knowledge so
> I'm unsure what about this patch is problematic but I can be fairly
> sure that a build immediately before this patch works while a build
> with this patch hangs.

It makes a pretty fundamental change to the way the kernel stores some
information about each task, moving it off the stack and into the task
struct.
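
To picture what "off the stack and into the task struct" means, here is a small userspace sketch using Python's ctypes. The struct fields, the 16 KB `THREAD_SIZE`, and the helper names are illustrative assumptions, not the real powerpc definitions. The one kernel fact it leans on is that with CONFIG_THREAD_INFO_IN_TASK the thread_info is the first member of task_struct, so a task pointer and its thread_info pointer are the same address.

```python
import ctypes

THREAD_SIZE = 16 * 1024  # assumed kernel stack size, for illustration only


class ThreadInfo(ctypes.Structure):
    # Simplified stand-in; the real struct has more fields.
    _fields_ = [("flags", ctypes.c_ulong),
                ("preempt_count", ctypes.c_int)]


class TaskStruct(ctypes.Structure):
    # New layout: thread_info embedded as the first member.
    _fields_ = [("thread_info", ThreadInfo),
                ("pid", ctypes.c_int)]


def thread_info_on_stack(stack_pointer):
    """Old scheme: thread_info sits at the base of the kernel stack,
    found by masking off the low bits of the stack pointer."""
    return stack_pointer & ~(THREAD_SIZE - 1)


def thread_info_in_task(task):
    """New scheme: thread_info is reached through the task itself."""
    return ctypes.addressof(task.thread_info)
```

With the old scheme any code holding a valid stack pointer can find thread_info. With the new one it needs a valid task pointer instead, which is why code that runs before the current task is set up is a natural place for regressions.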

It definitely has the potential to break things, but I thought we had
reasonable test coverage of the Book3E platforms, I have a p5020ds
(e5500) that I boot as part of my CI.

Aha. If I take your config and try to boot it on my p5020ds I get the
same behaviour, stuck at SMP bringup. So it seems it's something in your
config vs corenet64_smp_defconfig that is triggering the bug.

Can you try bisecting what in the config triggers it?

To do that you checkout ed1cd6deb013a11959d17a94e35ce159197632da, then
you build/boot with corenet64_smp_defconfig to confirm it works. Then
you use tools/testing/ktest/config-bisect.pl to bisect the changes in
the .config.
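
Before running the bisect it can help to see how many options actually differ between the two configs. The following is a minimal sketch and not a substitute for config-bisect.pl: the helper names are hypothetical, the sample config text is made up for the example, and treating a missing option as "n" is a simplification that only really fits boolean options.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name to value; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def config_diff(good_text, bad_text):
    """Return {option: (good_value, bad_value)} for options that differ."""
    good = parse_config(good_text)
    bad = parse_config(bad_text)
    diff = {}
    for name in sorted(set(good) | set(bad)):
        g, b = good.get(name, "n"), bad.get(name, "n")
        if g != b:
            diff[name] = (g, b)
    return diff


# Made-up sample input, not the configs from this thread.
good_sample = "CONFIG_SMP=y\n# CONFIG_PREEMPT is not set\nCONFIG_NR_CPUS=24\n"
bad_sample = "CONFIG_SMP=y\nCONFIG_PREEMPT=y\nCONFIG_DEBUG_PREEMPT=y\nCONFIG_NR_CPUS=24\n"
candidates = config_diff(good_sample, bad_sample)
```

Each entry in `candidates` is an option the bisect would have to consider; the bisect itself still has to build and boot kernels to find which one matters.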

cheers

Re: SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-05 Thread Chris Packham
On Mon, 2019-08-05 at 14:06 +1200, Chris Packham wrote:
> Hi All,
>
> I have a custom board that uses the Freescale/NXP T2080 SoC.
>
> The board boots fine using v4.19.60 but when I use v5.1.21 it locks
> up waiting for the other CPUs to come online (earlyprintk output
> below). If I set maxcpus=0 then the system boots all the way through
> to userland. The same thing happens with 5.3-rc2.
>
> The defconfig I'm using is
> https://gist.github.com/cpackham/f24d0b426f3de0eaaba17b82c3528a9d
> It was updated from the working v4.19.60 defconfig using make
> olddefconfig.
>
> Does this ring any bells for anyone?
>
> I haven't dug into the differences between the working and non-working
> versions yet. I'll start looking now.

I've bisected this to the following commit

commit ed1cd6deb013a11959d17a94e35ce159197632da
Author: Christophe Leroy 
Date:   Thu Jan 31 10:08:58 2019 +

powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

I'll be the first to admit this is well beyond my area of knowledge so
I'm unsure what about this patch is problematic but I can be fairly
sure that a build immediately before this patch works while a build
with this patch hangs.

SMP lockup at boot on Freescale/NXP T2080 (powerpc 64)

2019-08-04 Thread Chris Packham
Hi All,

I have a custom board that uses the Freescale/NXP T2080 SoC.

The board boots fine using v4.19.60 but when I use v5.1.21 it locks up
waiting for the other CPUs to come online (earlyprintk output below).
If I set maxcpus=0 then the system boots all the way through to
userland. The same thing happens with 5.3-rc2.

The defconfig I'm using is
https://gist.github.com/cpackham/f24d0b426f3de0eaaba17b82c3528a9d
It was updated from the working v4.19.60 defconfig using make
olddefconfig.

Does this ring any bells for anyone?

I haven't dug into the differences between the working and non-working
versions yet. I'll start looking now.

Booting...
MMU: Supported page sizes
 4 KB as direct
  2048 KB as direct & indirect
  4096 KB as direct
 16384 KB as direct
 65536 KB as direct
262144 KB as direct
   1048576 KB as direct
MMU: Book3E HW tablewalk enabled
Linux version 5.1.21-at1+ (@chrisp-dl) (gcc version 4.9.3 (crosstool-NG 
crosstool-ng-1.22.0)) #24 SMP PREEMPT Mon Aug 5 01:42:00 UTC 2019
Found initrd at 0xc0002f045000:0xc0003000
Using CoreNet Generic machine description
Found legacy serial port 0 for /soc@ffe00/serial@11c500
  mem=ffe11c500, taddr=ffe11c500, irq=0, clk=3, speed=0
Found legacy serial port 1 for /soc@ffe00/serial@11c600
  mem=ffe11c600, taddr=ffe11c600, irq=0, clk=3, speed=0
Found legacy serial port 2 for /soc@ffe00/serial@11d500
  mem=ffe11d500, taddr=ffe11d500, irq=0, clk=3, speed=0
Found legacy serial port 3 for /soc@ffe00/serial@11d600
  mem=ffe11d600, taddr=ffe11d600, irq=0, clk=3, speed=0
printk: bootconsole [udbg0] enabled
CPU maps initialized for 2 threads per core
 (thread shift is 1)
Allocated 1856 bytes for 8 pacas
-
phys_mem_size = 0x1
dcache_bsize  = 0x40
icache_bsize  = 0x40
cpu_features  = 0x0003009003b6
  possible= 0x0003009003b6
  always  = 0x0003008003b4
cpu_user_features = 0xdc008000 0x0800
mmu_features  = 0x000a0010
firmware_features = 0x
-
CoreNet Generic board
barrier-nospec: using isync; sync as speculation barrier
barrier-nospec: patched 412 locations
Top of RAM: 0x1, Total RAM: 0x1
Memory hole size: 0MB
Zone ranges:
  DMA  [mem 0x-0x7fffefff]
  Normal   [mem 0x7000-0x]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x]
Initmem setup node 0 [mem 0x-0x]
On node 0 totalpages: 1048576
  DMA zone: 7168 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 524287 pages, LIFO batch:63
  Normal zone: 7169 pages used for memmap
  Normal zone: 524289 pages, LIFO batch:63
MMU: Allocated 2112 bytes of context maps for 255 contexts
percpu: Embedded 22 pages/cpu s49304 r0 d40808 u131072
pcpu-alloc: s49304 r0 d40808 u131072 alloc=1*1048576
pcpu-alloc: [0] 0 1 2 3 4 5 6 7 
Built 1 zonelists, mobility grouping on.  Total pages: 1034239
Kernel command line: console=ttyS0,115200 root=/dev/ram0
releasefile=linuxbox_ppc64_e6500mc-tb233.rel bootversion=6.2.7
loglevel=8 mtdoops.mtddev=errlog
mtdparts=fff80.flash:4088M(user),8M(errlog)
earlyprintk=ttyS0,115200 real_init=
/bin/sh securitylevel=1 reladdr=0x100,1522523
printk: log_buf_len individual max cpu contribution: 4096 bytes
printk: log_buf_len total cpu_extra contributions: 28672 bytes
printk: log_buf_len min size: 16384 bytes
printk: log_buf_len: 65536 bytes
printk: early log buf free: 12412(75%)
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 3979284K/4194304K available (8704K kernel code, 1584K rwdata,
2496K rodata, 472K init, 299K bss, 215020K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
rcu: Preemptible hierarchical RCU implementation.
rcu:RCU event tracing is enabled.
Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
mpic: Setting up MPIC " OpenPIC  " version 1.2 at ffe04, max 8 CPUs
mpic: ISU size: 512, shift: 9, mask: 1ff
mpic: Initializing for 512 sources
time_init: decrementer frequency = 37.50 MHz
time_init: processor frequency   = 1500.00 MHz
clocksource: timebase: mask: 0x max_cycles:
0x8a60dd6a9, max_idle_ns: 440795204056 ns
clocksource: timebase mult[1aab] shift[24] registered
clockevent: decrementer mult[99a] shift[32] cpu[0]
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
e6500 family performance monitor hardware support registered
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary