Re: [stable 4.9] PANIC: double fault, error_code: 0x0 - clang boot failed on x86_64

2020-12-01 Thread Nick Desaulniers
On Tue, Dec 1, 2020 at 12:19 AM Greg Kroah-Hartman
 wrote:
>
> On Mon, Nov 30, 2020 at 12:12:39PM -0800, Nick Desaulniers wrote:
> > On Wed, Nov 25, 2020 at 10:38 PM Greg Kroah-Hartman
> >  wrote:
> > >
> > > Is the mainline 4.9 tree supposed to work with clang?  I didn't think
> > > that upstream effort started until 4.19 or so.
> >
> > (For historical records, separate from the initial bug report that
> > started this thread)
> >
> > I consider 785f11aa595b ("kbuild: Add better clang cross build
> > support") to be the starting point of a renewed effort to upstream
> > clang support. 785f11aa595b landed in v4.12-rc1.  I think most patches
> > landed between there and 4.15 (would have been my guess).  From there,
> > support was backported to 4.14, 4.9, and 4.4 for x86_64 and aarch64.
> > We still have CI coverage of those branches+arches with Clang today.
> > Pixel 2 shipped with 4.4+clang, Pixel 3 and 3a with 4.9+clang, Pixel 4
> > and 4a with 4.14+clang.  CrOS has also shipped clang built kernels
> > since 4.4+.
>
> Thanks for the info.  Naresh, does this help explain why maybe testing
> these kernel branches with clang might not be the best thing to do?

On the contrary, I think it's very much worthwhile to test these
branches with Clang, particularly since CrOS has been shipping x86_64
devices built with Clang since 4.4.y.  This looks like a problem
that has potentially been fixed already, but whose fix has not yet
been identified and backported.  It would be good for us to identify
and fix the issue before it becomes a problem for CrOS.

Though, it looks like CrOS just skipped 4.9...? Looking at:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+refs
I don't see a chromeos-4.9 branch.

That said, I still find such reports helpful to track.
-- 
Thanks,
~Nick Desaulniers


Re: [stable 4.9] PANIC: double fault, error_code: 0x0 - clang boot failed on x86_64

2020-12-01 Thread Naresh Kamboju
On Tue, 1 Dec 2020 at 13:49, Greg Kroah-Hartman
 wrote:
>
> On Mon, Nov 30, 2020 at 12:12:39PM -0800, Nick Desaulniers wrote:
> >
> > (For historical records, separate from the initial bug report that
> > started this thread)
> >
> > I consider 785f11aa595b ("kbuild: Add better clang cross build
> > support") to be the starting point of a renewed effort to upstream
> > clang support. 785f11aa595b landed in v4.12-rc1.  I think most patches
> > landed between there and 4.15 (would have been my guess).  From there,
> > support was backported to 4.14, 4.9, and 4.4 for x86_64 and aarch64.
> > We still have CI coverage of those branches+arches with Clang today.
> > Pixel 2 shipped with 4.4+clang, Pixel 3 and 3a with 4.9+clang, Pixel 4
> > and 4a with 4.14+clang.  CrOS has also shipped clang built kernels
> > since 4.4+.
>
> Thanks for the info.  Naresh, does this help explain why maybe testing
> these kernel branches with clang might not be the best thing to do?

It is clear now.

FYI, with this in mind, LKFT will stop testing 4.14 and older branches
with clang.

- Naresh


Re: [stable 4.9] PANIC: double fault, error_code: 0x0 - clang boot failed on x86_64

2020-12-01 Thread Greg Kroah-Hartman
On Mon, Nov 30, 2020 at 12:12:39PM -0800, Nick Desaulniers wrote:
> On Wed, Nov 25, 2020 at 10:38 PM Greg Kroah-Hartman
>  wrote:
> >
> > Is the mainline 4.9 tree supposed to work with clang?  I didn't think
> > that upstream effort started until 4.19 or so.
> 
> (For historical records, separate from the initial bug report that
> started this thread)
> 
> I consider 785f11aa595b ("kbuild: Add better clang cross build
> support") to be the starting point of a renewed effort to upstream
> clang support. 785f11aa595b landed in v4.12-rc1.  I think most patches
> landed between there and 4.15 (would have been my guess).  From there,
> support was backported to 4.14, 4.9, and 4.4 for x86_64 and aarch64.
> We still have CI coverage of those branches+arches with Clang today.
> Pixel 2 shipped with 4.4+clang, Pixel 3 and 3a with 4.9+clang, Pixel 4
> and 4a with 4.14+clang.  CrOS has also shipped clang built kernels
> since 4.4+.

Thanks for the info.  Naresh, does this help explain why maybe testing
these kernel branches with clang might not be the best thing to do?

greg k-h


Re: [stable 4.9] PANIC: double fault, error_code: 0x0 - clang boot failed on x86_64

2020-11-30 Thread Nick Desaulniers
On Wed, Nov 25, 2020 at 10:38 PM Greg Kroah-Hartman
 wrote:
>
> Is the mainline 4.9 tree supposed to work with clang?  I didn't think
> that upstream effort started until 4.19 or so.

(For historical records, separate from the initial bug report that
started this thread)

I consider 785f11aa595b ("kbuild: Add better clang cross build
support") to be the starting point of a renewed effort to upstream
clang support. 785f11aa595b landed in v4.12-rc1.  I think most patches
landed between there and 4.15 (would have been my guess).  From there,
support was backported to 4.14, 4.9, and 4.4 for x86_64 and aarch64.
We still have CI coverage of those branches+arches with Clang today.
Pixel 2 shipped with 4.4+clang, Pixel 3 and 3a with 4.9+clang, Pixel 4
and 4a with 4.14+clang.  CrOS has also shipped clang built kernels
since 4.4+.
-- 
Thanks,
~Nick Desaulniers


Re: [stable 4.9] PANIC: double fault, error_code: 0x0 - clang boot failed on x86_64

2020-11-28 Thread Nathan Chancellor
On Thu, Nov 26, 2020 at 07:39:33AM +0100, Greg Kroah-Hartman wrote:
> On Thu, Nov 26, 2020 at 10:14:43AM +0530, Naresh Kamboju wrote:
> > [...]
> 
> Is the mainline 4.9 tree supposed to work with clang?  I didn't think
> that upstream effort started until 4.19 or so.
> 
> thanks,
> 
> greg k-h
> 

We have been building and boot testing the mainline 4.9 tree for quite
some time. This issue appears to be exposed by the rootfs that Linaro is
using for testing; ours is incredibly simple (it prints the version
string and then shuts down; there is no systemd or other complex init).

Some initial notes (I am not sure how much time I will have to look at
this in the near future):

1. This does not happen with the same configuration file on
   linux-4.14.y.

2. This happens with the latest version of clang on linux-4.9.y.

3. Bisecting v4.9 to v4.14 will be rather difficult because clang
   support was only backported to 4.9 somewhere in the 4.9.130s.

Either a clang backport is missing, or the bug was unintentionally
fixed by some other change.
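The second possibility could be tested with a reverse bisection, which looks for the commit that made the boot succeed rather than the one that broke it. Below is a minimal sketch of a `git bisect run` helper for that, under loudly stated assumptions that do not come from the thread: the kernel is built in-tree with `CC=clang`, the LKFT kernel config is saved locally as `lkft.config` (overridable via a hypothetical `KCONFIG` variable), `$ROOTFS` points at the LKFT rootfs image, a successful boot is recognized by a hypothetical `BOOT_OK_PATTERN` defaulting to `login:`, the tap networking from the report is dropped, and each boot is cut off after five minutes. The script name `bisect-step.sh` is likewise only illustrative.

```sh
#!/bin/sh
# Hypothetical `git bisect run` helper for a *reverse* bisection: it
# reports whether the commit under test still hits the double fault
# (the "old"/broken state) or reaches userspace (the "new"/fixed state).
# Assumptions, not taken from the thread: the kernel is built in-tree
# with CC=clang, the LKFT config is saved as ./lkft.config (override
# with KCONFIG), and $ROOTFS points at the LKFT rootfs image.

# Exit codes understood by `git bisect run`:
#   0   -> old term (here: still broken)
#   1   -> new term (here: fixed)
#   125 -> skip this commit (build failed or result inconclusive)
STILL_BROKEN=0
FIXED=1
UNTESTABLE=125

# Assumed marker for a successful boot; adjust for the actual rootfs.
BOOT_OK_PATTERN=${BOOT_OK_PATTERN:-login:}

# Classify a serial console log read from stdin.
classify_boot_log() {
    _log=$(cat)
    if printf '%s\n' "$_log" | grep -q 'PANIC: double fault'; then
        return "$STILL_BROKEN"
    fi
    if printf '%s\n' "$_log" | grep -q -- "$BOOT_OK_PATTERN"; then
        return "$FIXED"
    fi
    return "$UNTESTABLE"
}

# Build the checked-out commit with clang and boot it once under QEMU.
bisect_step() {
    cp "${KCONFIG:-lkft.config}" .config || return "$UNTESTABLE"
    make CC=clang olddefconfig >/dev/null 2>&1 || return "$UNTESTABLE"
    make CC=clang -j"$(nproc)" bzImage >/dev/null 2>&1 || return "$UNTESTABLE"
    # QEMU options follow the report minus the tap networking. After a
    # panic (or at a login prompt) the guest may not power off by itself,
    # so timeout(1) ends the run after 5 minutes.
    timeout 300 qemu-system-x86_64 -cpu host -enable-kvm -nographic \
        -m 4096 -smp 4 -monitor none \
        -kernel arch/x86/boot/bzImage \
        -append "root=/dev/sda rootwait console=ttyS0,115200" \
        -hda "$ROOTFS" </dev/null 2>&1 | classify_boot_log
}

# Only act when called as "<script> run", so the file can be sourced safely.
if [ "${1:-}" = run ]; then
    : "${ROOTFS:?set ROOTFS to the LKFT rootfs image}"
    bisect_step
    exit $?
fi
```

Under those assumptions, the helper could be driven with `git bisect start --term-old=broken --term-new=fixed <fixed-commit> <broken-commit>` followed by `git bisect run ./bisect-step.sh run`. Commits that fail to build with clang are skipped with exit code 125. Given point 3, choosing two endpoints that both build with clang is likely to remain the hard part.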

Cheers,
Nathan


Re: [stable 4.9] PANIC: double fault, error_code: 0x0 - clang boot failed on x86_64

2020-11-25 Thread Greg Kroah-Hartman
On Thu, Nov 26, 2020 at 10:14:43AM +0530, Naresh Kamboju wrote:
> [...]

Is the mainline 4.9 tree supposed to work with clang?  I didn't think
that upstream effort started until 4.19 or so.

thanks,

greg k-h


[stable 4.9] PANIC: double fault, error_code: 0x0 - clang boot failed on x86_64

2020-11-25 Thread Naresh Kamboju
Linaro recently started building and testing the stable branches with clang.
The stable 4.9 branch kernel built with clang 10 crashed during boot on x86
and qemu_x86.  We do not have baseline results to compare against.

Steps to build and boot:

# build kernel with tuxmake
# sudo pip3 install -U tuxmake
# tuxmake --runtime docker --target-arch x86 --toolchain clang-10 \
    --kconfig defconfig \
    --kconfig-add https://builds.tuxbuild.com/1kgtX7QEDmhvj6OfbZBdlGaEple/config

# boot qemu_x86_64
# /usr/bin/qemu-system-x86_64 -cpu host -enable-kvm -nographic \
    -net nic,model=virtio,macaddr=DE:AD:BE:EF:66:14 -net tap \
    -m 1024 -monitor none -kernel kernel/bzImage \
    --append "root=/dev/sda  rootwait console=ttyS0,115200" \
    -hda rootfs/rpb-console-image-lkft-intel-corei7-64-20201022181159-3085.rootfs.ext4 \
    -m 4096 -smp 4 -nographic

Crash log:
---
[   14.121499] Freeing unused kernel memory: 1896K
[   14.126962] random: fast init done
[   14.206005] PANIC: double fault, error_code: 0x0
[   14.210633] CPU: 1 PID: 1 Comm: systemd Not tainted 4.9.246-rc1 #2
[   14.216809] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.2 05/23/2018
[   14.224196] task: 88026e2c task.stack: c902
[   14.230105] RIP: 0010:[]  [] proc_dostring+0x13b/0x1e0
[   14.238374] RSP: 0018:000c  EFLAGS: 00010297
[   14.243676] RAX: 5638939fb850 RBX: 000c RCX: 5638939fb850
[   14.250799] RDX: 000c RSI:  RDI: 007f
[   14.257925] RBP: c9023d98 R08: c9023ef8 R09: 5638939fb850
[   14.265049] R10:  R11: 8117f9e0 R12: 82479cf0
[   14.272171] R13: c9023ef8 R14: c9023dd8 R15: 007f
[   14.279298] FS:  7f57fbce8840() GS:88027788() knlGS:
[   14.287384] CS:  0010 DS:  ES:  CR0: 80050033
[   14.293120] CR2: fff8 CR3: 00026d58a000 CR4: 00360670
[   14.300243] DR0:  DR1:  DR2: 
[   14.307368] DR3:  DR6: fffe0ff0 DR7: 0400
[   14.314491] Stack:
[   14.316504] Call Trace:
[   14.318955] Code: c3 49 8b 10 31 f6 48 01 da 49 89 10 49 83 3e 00 74 49 41 83 c7 ff 49 63 ff 4c 89 c9 0f 1f 40 00 48 39 fe 73 36 48 89 c8 48 89 dc  b0 9d 3a 00 85 c0 0f 85 8c 00 00 00 84 d2 74 1f 80 fa 0a 74
[   14.338906] Kernel panic - not syncing: Machine halted.
[   14.344123] CPU: 1 PID: 1 Comm: systemd Not tainted 4.9.246-rc1 #2
[   14.350291] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.2 05/23/2018
[   14.357677]  880277888e80 81518ae9 880277888e98 82971a10
[   14.365129]  000f  0086 820c5d57
[   14.372584]  880277888f08 81175736 0038 880277888f18
[   14.380038] Call Trace:
[   14.382481]  <#DF> [   14.384406]  [] dump_stack+0xa9/0x100
[   14.389641]  [] panic+0xe6/0x2a0
[   14.394432]  [] df_debug+0x31/0x40
[   14.399389]  [] do_double_fault+0x102/0x140
[   14.405128]  [] double_fault+0x27/0x30
[   14.410440]  [] ? proc_put_long+0xc0/0xc0
[   14.416004]  [] ? proc_dostring+0x13b/0x1e0
[   14.421739]   [   14.423703] Kernel Offset: disabled
[   14.427209] ---[ end Kernel panic - not syncing: Machine halted.

Reported-by: Naresh Kamboju 

Full test logs:
https://lkft.validation.linaro.org/scheduler/job/1978901#L916
https://lkft.validation.linaro.org/scheduler/job/1980839#L578

-- 
Linaro LKFT
https://lkft.linaro.org