[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #14 from Jason Vas Dias --- RE: Comment #13: > You said that Andi Kleen had a comment. Can you point me to it? Here is a quote, from LKML message : Subject: Re: [PATCH v4.16-rc5 2/2] x86/vdso: \ VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) \ without syscall Kernel From: Andi Kleen Date: 17 March 2018 at 23:00 To: jason.vas.d...@gmail.com Cc: linux-ker...@vger.kernel.org, x...@kernel.org, t...@linutronix.de, mi...@kernel.org, pet...@infradead.org, a...@firstfloor.org >On Sat, Mar 17, 2018 at 02:29:34PM +, jason.vas.d...@gmail.com wrote: >> This patch allows compilation to succeed with compilers that >> support -DRETPOLINE - >> it was kindly contributed by H.J. Liu in GCC Bugzilla: 84908 : >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 >> >> Apparently the GCC retpoline implementation has a limitation >> that it cannot handle switch statements with more than 5 clauses, >> which vclock_gettime.c's >> __vdso_clock_gettime function now conts. > >That's quite a mischaracterization of the issue. gcc works as intended, >but the kernel did not correctly supply a indirect call retpoline thunk >to the vdso, and it just happened to work by accident with the old >vdso. > >> >> The automated test builds should now succeed with this patch. > >How about just adding the thunk function to the vdso object instead of >this cheap hack? > >The other option would be to build vdso with inline thunks. > >But just disabling is completely the wrong action. > >-Andi So, in their wisdom, the kernel developers have decided it is best to compile the VDSO with indirect-branch("thunk-extern,register"), and we need to work around the limitations this imposes. RE: Comment #13: > I also think that static inlines have anything to do with it. > Nor so I see why any function attributes make any sense. On the contrary, I think function attributes and static inline-ness have ALOT to do with it - the problem does not occur if the called functions have the SAME indirect-branch attributes as the caller function, or if the called function is not static inline. It definitely appears to be something to do with GCC having to instantiate a callable version of a static inline, when that inline has different indirect-branch attributes than its caller. The '-mindirect-branch=thunk-extern -mindirect-branch-register -DRETPOLINE' flags should have the effect of making the default function attributes to be '__attribute__((indirect-branch("thunk-extern,register"))' for any function that does not specify other indirect-branch attributes, so when instantiating this callable version of the inline GCC needs to generate a thunk-extern relocation. The code which produces the special stripped & augmented version of the VDSO object does not include support for such relocations and complains about them, failing the builds. I also have many unanswered questions about this bug, mentioned above - perhaps H.J. could clarify ? : o why exactly is it that the 6-clause switch in the first post triggers the relocation, and the 5-clause switch does not? why cannot this limitation be removed from GCC? o why should GCC be trying to generate an extern thunk for an indirect branch to a static inline function anyway ? I think static-inlineness should always trump indirect-branch("thunk-extern"). o GCC appears to provide no way for users to construct their own jump tables without relocations - why does every reference to a PC-relative label produce a relocation? I do believe this
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #13 from Andy Lutomirski --- I find this whole discussion very confusing. The problem has nothing to do with relocations AFAICT. The problem is that gcc is (as requested) generating retpolines, and it's set up to do it by calling __x86_indirect_thunk_*, and those helpers don't exist in the vDSO. I also think that static inlines have anything to do with it. Nor so I see why any function attributes make any sense. The trivial fix is: diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 261802b1cc50..8140176b8b41 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -74,7 +74,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 -fasynchronous-unwind-tables -m64 \ -fno-omit-frame-pointer -foptimize-sibling-calls \ -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO -$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL) +$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL) # # vDSO code runs in userspace and -pg doesn't help with profiling anyway. @@ -138,6 +138,7 @@ KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32)) +KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector) KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls) but it might be better to use the internal thunk mechanism. You said that Andi Kleen had a comment. Can you point me to it?
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #12 from Jason Vas Dias --- RE: Comment #11 : > notrace int _RETPOLINE_FUNC_ATTR_ > __vdso_clock_gettime(clockid_t clock, struct timespec *ts) should of course be notrace _RETPOLINE_FUNC_ATTR_ int __vdso_clock_gettime(clockid_t clock, struct timespec *ts) - sorry, I was typing from memory, not pasting.
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #11 from Jason Vas Dias --- In reply to Comment #9 : Thanks Andy - I think it is because when the retpoline flags are enabled , the 'static inline' function calls in vclock_gettime.c have default function attributes which differ from those that __vdso_clock_gettime ( the function containing the switch ) gets, so GCC finds it must instantiate a callable version of the normally static inline function calls this function makes, and generates relocations for them which are not getting copied into the VDSO for some reason. Rather than making more radical changes to the VDSO assembly code to handle these types of relocations, I think it is better to avoid them being generated by either: A) compiling vclock_gettime.c without '-mindirect-branch=thunk-extern -mindirect-branch-register -DRETPOLINE' (rejected by Andi Kleen) B) Specifying ALL entry points that __vdso_clock_gettime calls to have attributes that do not require relocation for GCC to handle if the above flags are in effect: #ifdef RETPOLINE +# define _NO_THUNK_RELOCS_()(indirect_branch("keep"),\ + function_return("keep")) +# define _RETPOLINE_FUNC_ATTR_ __attribute__(_NO_THUNK_RELOCS_()) +# define _RETPOLINE_INLINE_ inline +#else +# define _RETPOLINE_FUNC_ATTR_ +# define _RETPOLINE_INLINE_ __always_inline +#endif So then one declares any function called by __vdso_clock_gettime() like: notrace static _RETPOLINE_FUNC_ATTR_ _RETPOLINE_INLINE_ int do_monotonic_raw( int clock, struct timespec *ts ) { ... } and then must declare notrace int _RETPOLINE_FUNC_ATTR_ __vdso_clock_gettime(clockid_t clock, struct timespec *ts) {... do_monotonic_raw(ts); ...} to avoid the relocations being generated. I did submit a patch to vclock_gettime.c to do this - should I re-submit it? Or one could fix the code which produces the resulting vdso to include the generated (unecessary and redundant) relocations, or generate the relocation functions in assembler somehow, but I don't think this is the way the VDSO should go. There is no reason to make functions in the VDSO have function attributes indirect_branch("thunk-extern,register") - they always WILL generate relocations if more that 5 such calls are made in same switch, with all current versions of GCC, it seems. Every function that directly invokes these functions (in glibc), must also have these function attributes, which are entirely unnecessary and inefficient compared to the indirect_branch("keep") style; and those relocations then must be in the VDSO for callers to invoke.
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #10 from H.J. Lu --- (In reply to Andy Lutomirski from comment #9) > I haven't fully dug into this, but I do one one immediate question: why is > GCC generating a jump table for a five-entry switch statement if retpolines > are on? This has got to be a *huge* performance loss. The retpoline > sequence is very, very slow, and branches aren't that slow. A five-entry > switch is only three branches deep. I opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 Andy Lutomirski changed: What|Removed |Added CC||luto at kernel dot org --- Comment #9 from Andy Lutomirski --- I haven't fully dug into this, but I do one one immediate question: why is GCC generating a jump table for a five-entry switch statement if retpolines are on? This has got to be a *huge* performance loss. The retpoline sequence is very, very slow, and branches aren't that slow. A five-entry switch is only three branches deep.
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #8 from Jason Vas Dias --- Thanks for the clarification, and I hope the kernel developers stop compiling the mainline vDSO with -mindirect-branch=thunk-extern -mindirect-branch-register . But there are still a few things I am trying to figure out : o Why is the thunk entry reference & relocation inserted for 6 switch clauses and not for 5 ? o So do I understand correctly: __x86_indirect_thunk_rax is resolved and jumped to save %rax , jump to the address in %rcx, and then restore %rax, and return ? it is normally A) generated for libraries / executable as required or B) is in libgcc ? If (A), then what triggers it and why is it not being generated for the vDSO? If (B), then where is the code in libgcc ? I can't find it. I'd also like more details on why it is wrong to compile the vDSO with these flags - it does work, and yes causes compilation problems like this one which can be worked around by declaring all 'notrace static inline' entry points in vclock_gettime.c to have the function attributes: static inline __attribute__(( indirect_branch("keep"), function_return("keep") )) int do_monotonic_raw( int clock, struct timespec *ts ); which effectively disables the effect of -mindirect-branch=thunk-extern -mindirect-branch-register for these functions ; also __vdso_clock_gettime itself MUST be then declared: __attribute__(( indirect_branch("keep"), function_return("keep") )) int __vdso_clock_gettime ( ... ) { ... But does it cause any other issues ? I did investigate changing the switch in __vdso_clock_gettime to : #define _GTOD_LABEL_PREFIX_ _vcg_ #define _SYMCAT_(_S1_,_S2_) _S1_##_S2_ #define _GTOD_CLK_LABEL_(_CLK_) _SYMCAT_(_GTOD_LABEL_PREFIX_,_CLK_) notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts) { static const void * clk_jmp_tbl[ MAX_CLOCKS ] = { [ CLOCK_REALTIME ] = &&_GTOD_CLK_LABEL_(CLOCK_REALTIME) , [ CLOCK_MONOTONIC ] = &&_GTOD_CLK_LABEL_(CLOCK_MONOTONIC) , [ CLOCK_PROCESS_CPUTIME_ID ]= & , [ CLOCK_THREAD_CPUTIME_ID ]= & , [ CLOCK_MONOTONIC_RAW ]= &&_GTOD_CLK_LABEL_(CLOCK_MONOTONIC_RAW) , [ CLOCK_REALTIME_COARSE]= &&_GTOD_CLK_LABEL_(CLOCK_REALTIME_COARSE) , [ CLOCK_MONOTONIC_COARSE ]= &&_GTOD_CLK_LABEL_(CLOCK_MONOTONIC_COARSE) , [ CLOCK_BOOTTIME ]= & , [ CLOCK_BOOTTIME_ALARM ]= & , [ CLOCK_SGI_CYCLE ]= & , [ CLOCK_TAI]= & , // unused clocks [ 12 ]= &_supported , [ 13 ]= &_supported , [ 14 ]= &_supported , [ 15 ]= &_supported }; goto *clk_jmp_tbl [ clock & 0xf ] ; _GTOD_CLK_LABEL_(CLOCK_REALTIME): if (do_realtime(ts) == VCLOCK_NONE) goto fallback; goto return_ok; _GTOD_CLK_LABEL_(CLOCK_MONOTONIC) : if (do_monotonic(ts) == VCLOCK_NONE) goto fallback; goto return_ok; _GTOD_CLK_LABEL_(CLOCK_MONOTONIC_RAW) : if (do_monotonic_raw(ts) == VCLOCK_NONE) goto fallback; goto return_ok; _GTOD_CLK_LABEL_(CLOCK_REALTIME_COARSE) : do_realtime_coarse(ts); goto return_ok; _GTOD_CLK_LABEL_(CLOCK_MONOTONIC_COARSE): do_monotonic_coarse(ts); goto return_ok; return_ok: return 0; not_supported: return -1; fallback: return vdso_fallback_gettime(clock, ts); } Results in 16 dynamic relocations, shown in objdump -R arch/x86/entry/vdso/vdso64.so.dbg output: DYNAMIC RELOCATION RECORDS OFFSET TYPE VALUE 04a0 R_X86_64_RELATIVE *ABS*+0x0c49 ... 518 R_X86_64_RELATIVE *ABS*+0x0b31 what's the problem with dyn relocs ? I can't understand why GCC decides to generate relocations here - it knows all the information before hand - why can't it insert code to compute ( ( LOAD_ADDRESS_OF(__vdso_clock_gettime) == %pc at entry) + label_offset ) and populate the clk_jmp_table in an initializer in the above version of the function rather than generate relocations ? Thanks & Best Regards, Jason
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 H.J. Lu changed: What|Removed |Added Resolution|INVALID |MOVED --- Comment #7 from H.J. Lu --- It is a kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=199129
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 Richard Biener changed: What|Removed |Added Target||x86_64-*-*, i?86-*-* Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #6 from Richard Biener --- So not a gcc bug?.
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #5 from H.J. Lu --- (In reply to H.J. Lu from comment #3) > (In reply to Jason Vas Dias from comment #2) > > Thanks H.J. - > > > > RE: > > > vDSO isn't compiled with -mindirect-branch=thunk-extern in kernel > > > 4.16-rc5. Why isn't it the case for you? > > > > All I know is , when submitting a patched vclock_gettime.c > > in which the switch in vdso_clock_gettime() had 6 clauses > > instead of 5, I received in response a mail from > > 4.16-rc5 has > > ifdef CONFIG_RETPOLINE > ifneq ($(RETPOLINE_CFLAGS),) > KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE > endif > endif > > vDSO isn't compiled with $(KBUILD_CFLAGS). Why does your kernel do it? > Please try my kernel patch at comment 4.
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #4 from H.J. Lu --- Created attachment 43685 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43685=edit Kernel patch for 4.16-rc5
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #3 from H.J. Lu --- (In reply to Jason Vas Dias from comment #2) > Thanks H.J. - > > RE: > > vDSO isn't compiled with -mindirect-branch=thunk-extern in kernel > > 4.16-rc5. Why isn't it the case for you? > > All I know is , when submitting a patched vclock_gettime.c > in which the switch in vdso_clock_gettime() had 6 clauses > instead of 5, I received in response a mail from 4.16-rc5 has ifdef CONFIG_RETPOLINE ifneq ($(RETPOLINE_CFLAGS),) KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE endif endif vDSO isn't compiled with $(KBUILD_CFLAGS). Why does your kernel do it? > > But I don't want my patch to have to depend on this being removed - > the removal of these flags effectively removes retpoline support, no ? > It is a shame to have to do that after it was carefully added. > > My gripe is really that nothing in GCC should work with a switch > with < 6 clauses, and fail with a switch with >= 6 clauses. > This is if statement vs jump table for switch implementation.
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #2 from Jason Vas Dias --- Thanks H.J. - RE: > vDSO isn't compiled with -mindirect-branch=thunk-extern in kernel > 4.16-rc5. Why isn't it the case for you? All I know is , when submitting a patched vclock_gettime.c in which the switch in vdso_clock_gettime() had 6 clauses instead of 5, I received in response a mail from kbuild test robot: >Hi > >Thank you for the patch! Yet something to improve: > >[auto build test ERROR on v4.16-rc4] >[also build test ERROR on next-20180315] >[if your patch is applied to the wrong git tree, please drop us a note to help >>improve the system] > >url: >https://github.com/0day-ci/linux/commits/jason-vas-dias-gmail-com/x86-vdso-on-Intel-VDSO-should-handle-CLOCK_MONOTONIC_RAW/20180316-070319 >config: x86_64-rhel (attached as .config) >compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 >reproduce: ># save the attached .config to linux build tree >make ARCH=x86_64 > >All errors (new ones prefixed by >>): > > arch/x86/entry/vdso/vclock_gettime.o: In function `__vdso_clock_gettime': > vclock_gettime.c:(.text+0xf7): undefined reference to > >`__x86_indirect_thunk_rax' > /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: relocation R_X86_64_PC32 > >against undefined symbol `__x86_indirect_thunk_rax' can not be used when > making >a shared object; recompile with -fPIC > /usr/bin/ld: final link failed: Bad value >>> collect2: error: ld returned 1 exit status >-- >>> arch/x86/entry/vdso/vdso32.so.dbg: undefined symbols found >-- >>> objcopy: 'arch/x86/entry/vdso/vdso64.so.dbg': No such file >--- >0-DAY kernel test infrastructureOpen Source Technology Center >https://lists.01.org/pipermail/kbuild-all Intel Corporation That github URL does not point to anything anymore. So it was 4.16-rc4 they were compiling against. Maybe it is fixed in 4.16-rc5. I had seen exactly the same problem when compiling the same patch with the RHEL-7 4.8.5-16-el7_4.2.x86_64 compiler, which did not happen when compiling the same code with the previous RHEL-7 4.8.5-16-el7_4.1.x86_64 compiler. So it is caused by -mindirect-branch=thunk-extern ? Aha, I see in kernel Makefile of the RHEL 3.10.0-21.1.el7 kernel: RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register) And it is still in the CFLAGS of the kernel tagged 'v4.16-rc5:' $ grep RETPOLINE linux-4.16-rc5/Makefile RETPOLINE_CFLAGS_GCC := -mindirect-branch=thunk-extern -mindirect-branch-register RETPOLINE_CFLAGS_CLANG := -mretpoline-external-thunk RETPOLINE_CFLAGS := $(call cc-option,$(RETPOLINE_CFLAGS_GCC),$(call cc-option,$(RETPOLINE_CFLAGS_CLANG))) But the compiler I am using for the 4.x builds is 6.4.1 , which does not support -mindirect-branch=thunk-extern / -mindirect-branch-register . I will get round to building one of the latest compilers with retpoline support eventually, but it is not a priority for me at the moment, for my Linux 4.x builds. I just want to be able to build the patched vclock_gettime.c under RHEL, with the new 4.8.5-16 retpoline supporting compiler, and to be able to submit patches that don't break to the 4.16-rc5 branch . YES, I have just tested, removing '-mindirect-branch=thunk-extern -mindirect-branch-register' from the 3.10.0-693.21.1.el7 build DOES allow it to succeed with the 6-clause version of the switch in vclock_gettime.c . But I don't want my patch to have to depend on this being removed - the removal of these flags effectively removes retpoline support, no ? It is a shame to have to do that after it was carefully added. My gripe is really that nothing in GCC should work with a switch with < 6 clauses, and fail with a switch with >= 6 clauses. Just reporting this issue so it can be fixed. Thanks & Best Regards, Jason
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 H.J. Lu changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2018-03-16 Ever confirmed|0 |1 --- Comment #1 from H.J. Lu --- vDSO isn't compiled with -mindirect-branch=thunk-extern in kernel 4.16-rc5. Why isn't it the case for you?