Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c

2022-09-28 Nathan Chancellor
On Wed, Sep 28, 2022 at 12:13:53PM -0700, Josh Poimboeuf wrote:
> On Wed, Sep 28, 2022 at 08:44:27AM -0700, Nathan Chancellor wrote:
> > This crash appears to just be a symptom of objtool erroring throughout
> > the entire build, which means things like the jump label hacks do not
> > get applied. I see a flood of
> > 
> >   error: objtool: --mnop requires --mcount
> > 
> > throughout the build. The configuration has CONFIG_HAVE_NOP_MCOUNT=y
> > because CONFIG_HAVE_OBJTOOL_MCOUNT is unconditionally enabled for
> > x86_64 via CONFIG_HAVE_OBJTOOL, but '--mcount' is only actually used
> > when CONFIG_FTRACE_MCOUNT_USE_OBJTOOL is enabled, so '--mnop' gets
> > passed in without '--mcount'. This should obviously be fixed somehow,
> > perhaps by moving the '--mnop' addition into the '--mcount' conditional,
> > even if that makes the line really long.
> > 
> > A secondary issue is that it seems like if objtool encounters a fatal
> > error like this, it should completely fail the build to make it obvious
> > that something is wrong, rather than allowing it to continue and
> > generate a broken kernel, especially since x86_64 requires objtool to
> > build a working kernel at this point.
> 
> Grrr... I really dislike that objtool is capable of bricking the kernel
> like this.  We just saw something similar in RHEL.
> 
> IMO, we should just get rid of this "short JMP" feature in the jump
> label code, those saved three bytes aren't worth the pain.
> 
> But yes, we do need to fix that config issue.

Right, I now see that the report I was CC'd on was part of a larger
thread, in which Naveen already suggested a fix for this problem (which
does not appear to be clang-specific):

https://lore.kernel.org/1663223588.wppdx3129x.nav...@linux.ibm.com/

> And yes, maybe fatal objtool warnings should cause a build failure.  We
> used to do that, but it brought a different sort of pain.  But if
> objtool is going to be in the kernel's critical boot path then I guess
> we have to do that.

Right, that was

  644592d32837 ("objtool: Fail the kernel build on fatal errors")

which was reverted in

  655cf86548a3 ("objtool: Don't fail the kernel build on fatal errors")

objtool should not fail the build on warnings, but shouldn't it fail on
invalid option combinations and other misconfiguration problems? Did this
regress with commit b51277eb9775 ("objtool: Ditch subcommands")? As far
as I can tell, the return code of the subcommands used to be passed back
via exit() (?), so objtool could fail the build if there was a true
problem. After that change, however, the return value of objtool_run() is
not checked, so any errors that happen are not passed back up. Perhaps
the following diff alone would resolve this? I assume we would need to
audit all the different return values to know whether this is safe,
though.

Cheers,
Nathan

diff --git a/tools/objtool/objtool.c b/tools/objtool/objtool.c
index a7ecc32e3512..cda649644e32 100644
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -146,7 +146,5 @@ int main(int argc, const char **argv)
 	exec_cmd_init("objtool", UNUSED, UNUSED, UNUSED);
 	pager_init(UNUSED);
 
-	objtool_run(argc, argv);
-
-	return 0;
+	return objtool_run(argc, argv);
 }


Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c

2022-09-28 Josh Poimboeuf
On Wed, Sep 28, 2022 at 08:44:27AM -0700, Nathan Chancellor wrote:
> This crash appears to just be a symptom of objtool erroring throughout
> the entire build, which means things like the jump label hacks do not
> get applied. I see a flood of
> 
>   error: objtool: --mnop requires --mcount
> 
> throughout the build. The configuration has CONFIG_HAVE_NOP_MCOUNT=y
> because CONFIG_HAVE_OBJTOOL_MCOUNT is unconditionally enabled for
> x86_64 via CONFIG_HAVE_OBJTOOL, but '--mcount' is only actually used
> when CONFIG_FTRACE_MCOUNT_USE_OBJTOOL is enabled, so '--mnop' gets
> passed in without '--mcount'. This should obviously be fixed somehow,
> perhaps by moving the '--mnop' addition into the '--mcount' conditional,
> even if that makes the line really long.
> 
> A secondary issue is that it seems like if objtool encounters a fatal
> error like this, it should completely fail the build to make it obvious
> that something is wrong, rather than allowing it to continue and
> generate a broken kernel, especially since x86_64 requires objtool to
> build a working kernel at this point.

Grrr... I really dislike that objtool is capable of bricking the kernel
like this.  We just saw something similar in RHEL.

IMO, we should just get rid of this "short JMP" feature in the jump
label code, those saved three bytes aren't worth the pain.

But yes, we do need to fix that config issue.

And yes, maybe fatal objtool warnings should cause a build failure.  We
used to do that, but it brought a different sort of pain.  But if
objtool is going to be in the kernel's critical boot path then I guess
we have to do that.

-- 
Josh


Re: [objtool] ca5e2b42c0: kernel_BUG_at_arch/x86/kernel/jump_label.c

2022-09-28 Nathan Chancellor
Hi all,

On Wed, Sep 28, 2022 at 08:48:53AM +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed the following commit (built with clang-14):
> 
> commit: ca5e2b42c0d4438ba93623579b6860b98f3598f3 ("[PATCH v3 11/16] objtool: Add --mnop as an option to --mcount")
> url: https://github.com/intel-lab-lkp/linux/commits/Sathvika-Vasireddy/objtool-Enable-and-implement-mcount-option-on-powerpc/20220912-163023
> base: https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git topic/ppc-kvm
> patch link: https://lore.kernel.org/linuxppc-dev/20220912082020.226755-12...@linux.ibm.com
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> [  152.068363][T0] jump_label: Fatal kernel bug, unexpected op at trace_initcall_start+0xc/0x180 [810016ec] (e9 c9 00 00 00 != 0f 1f 44 00 00)) size:5 type:1
> [  152.070368][T0] [ cut here ]
> [  152.071050][T0] kernel BUG at arch/x86/kernel/jump_label.c:73!
> [  152.071825][T0] invalid opcode:  [#1] SMP KASAN PTI
> [  152.072427][T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc2-00011-gca5e2b42c0d4 #1 96a19ca45386d518c4bccc5b3bc53f548a2dc122
> [  152.073837][T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [  152.075461][T0] RIP: 0010:__jump_label_patch+0x340/0x350
> [  152.076162][T0] Code: 00 48 89 da e9 51 fe ff ff 48 c7 c7 00 d1 80 83 4c 89 fe 4c 89 fa 4c 89 f9 49 89 d8 45 89 e9 41 54 e8 f2 91 34 02 48 83 c4 08 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 84 00 00 00 00 00 48 c7 c7 00 09 69
> [  152.078374][T0] RSP: :84607cb8 EFLAGS: 00010086
> [  152.079159][T0] RAX: 0092 RBX: 8380f62a RCX: 84634d80
> [  152.080100][T0] RDX:  RSI: ffea RDI: fffe
> [  152.081020][T0] RBP: 855d9f60 R08: 8124f17c R09: fbfff08c0f53
> [  152.081936][T0] R10: d7fff08c0f54 R11: 108c0f52 R12: 0001
> [  152.082832][T0] R13: 0005 R14: 8380f62a R15: 810016ec
> [  152.083744][T0] FS:  () GS:8883aee0() knlGS:
> [  152.084763][T0] CS:  0010 DS:  ES:  CR0: 80050033
> [  152.085567][T0] CR2: 88843000 CR3: 04628000 CR4: 000406b0
> [  152.086472][T0] DR0:  DR1:  DR2:
> [  152.087407][T0] DR3:  DR6: fffe0ff0 DR7: 0400
> [  152.088326][T0] Call Trace:
> [  152.088702][T0]  
> [  152.089042][T0]  ? trace_initcall_start+0xc/0x180
> [  152.089660][T0]  ? trace_initcall_start+0x1b/0x180
> [  152.090281][T0]  ? trace_initcall_start+0x11/0x180
> [  152.091237][T0]  ? jump_label_transform+0x25/0xd0
> [  152.091923][T0]  ? arch_jump_label_transform_queue+0x87/0xd0
> [  152.092651][T0]  ? __jump_label_update+0x192/0x3b0
> [  152.093320][T0]  ? static_key_enable_cpuslocked+0x129/0x250
> [  152.094020][T0]  ? rcu_lock_release+0x20/0x20
> [  152.094573][T0]  ? static_key_enable+0x16/0x20
> [  152.095167][T0]  ? tracepoint_add_func+0x87e/0x9d0
> [  152.095822][T0]  ? rcu_lock_release+0x20/0x20
> [  152.096394][T0]  ? tracepoint_probe_register+0x99/0xd0
> [  152.097055][T0]  ? rcu_lock_release+0x20/0x20
> [  152.097606][T0]  ? initcall_debug_enable+0x21/0x6b
> [  152.098305][T0]  ? start_kernel+0x24b/0x4e6
> [  152.098861][T0]  ? secondary_startup_64_no_verify+0xce/0xdb
> [  152.099556][T0]  
> [  152.099891][T0] Modules linked in:
> [  152.100352][T0] ---[ end trace  ]---
> [  152.100980][T0] RIP: 0010:__jump_label_patch+0x340/0x350
> [  152.101652][T0] Code: 00 48 89 da e9 51 fe ff ff 48 c7 c7 00 d1 80 83 4c 89 fe 4c 89 fa 4c 89 f9 49 89 d8 45 89 e9 41 54 e8 f2 91 34 02 48 83 c4 08 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 84 00 00 00 00 00 48 c7 c7 00 09 69
> [  152.103892][T0] RSP: :84607cb8 EFLAGS: 00010086
> [  152.104544][T0] RAX: 0092 RBX: 8380f62a RCX: 84634d80
> [  152.105421][T0] RDX:  RSI: ffea RDI: fffe
> [  152.106280][T0] RBP: 855d9f60 R08: 8124f17c R09: fbfff08c0f53
> [  152.107182][T0] R10: d7fff08c0f54 R11: 108c0f52 R12: 0001
> [  152.108110][T0] R13: 0005 R14: 8380f62a R15: 810016ec
> [  152.109002][T0] FS:  () GS:8883aee0() knlGS:
> [  152.109986][T0] CS:  0010 DS:  ES:  CR0: 80050033
> [  152.110796][T0] CR2: 88843000 CR3: 04628000