[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
Andrew Pinski changed:
           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |12.0
         Resolution|---                         |FIXED
--- Comment #16 from Andrew Pinski ---
Fixed for GCC 12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
Nick Desaulniers changed:
           What    |Removed                     |Added
                 CC|                            |ndesaulniers at google dot com
--- Comment #15 from Nick Desaulniers ---
Any chance we could get -mdirect-extern-access implemented for aarch64?
Otherwise we're discussing the use of `#pragma GCC visibility push(hidden)` in
the Linux kernel, since it's slightly more portable at the moment.
https://lore.kernel.org/linux-arm-kernel/[email protected]/
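For readers unfamiliar with the pragma mentioned in comment #15, the following is a minimal sketch of how it is typically used. The names `kernel_helper` and `kernel_helper_address` are hypothetical and exist only for illustration; nothing here is taken from the kernel patch under discussion.

```c
/* Declarations seen while the pragma is active get hidden visibility, so the
   compiler may assume they are defined in the same module and can address
   them directly, without going through the GOT or a PLT entry. */
#pragma GCC visibility push(hidden)
int kernel_helper(int x);          /* hypothetical symbol */
#pragma GCC visibility pop

/* The definition keeps the hidden visibility of the declaration above. */
int kernel_helper(int x)
{
    return x + 1;
}

/* Address-taking of a hidden function; no GOT load should be required. */
int (*kernel_helper_address(void))(int)
{
    return kernel_helper;
}
```

Because the pragma is understood by both GCC and Clang on ELF targets regardless of architecture, it does not depend on an x86-only option.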
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #14 from CVS Commits ---
The master branch has been updated by H.J. Lu:
https://gcc.gnu.org/g:ab0b5fbfe90168d2e470aefb19e0cf31526290bc
commit r12-7126-gab0b5fbfe90168d2e470aefb19e0cf31526290bc
Author: H.J. Lu
Date:   Sat Jun 19 05:12:48 2021 -0700

    x86: Add -m[no-]direct-extern-access

    Add -m[no-]direct-extern-access and nodirect_extern_access attribute.
    -mdirect-extern-access is the default.  With nodirect_extern_access
    attribute, GOT is always used to access undefined data and function
    symbols that carry the nodirect_extern_access attribute, including in
    PIE and non-PIE.  With -mno-direct-extern-access:

    1. Always use GOT to access undefined data and function symbols,
       including in PIE and non-PIE.  These will avoid copy relocations in
       executables.  This is compatible with existing executables and shared
       libraries.
    2. In executable and shared library, bind symbols with the STV_PROTECTED
       visibility locally:
       a. The address of data symbol is the address of data body.
       b. For systems without function descriptor, the function pointer is
          the address of function body.
       c. The resulting shared libraries may be incompatible with
          executables which have copy relocations on protected symbols or
          use executable PLT entries as function addresses for protected
          functions in shared libraries.
    3. Update asm_preferred_eh_data_format to select PC relative EH encoding
       format with -mno-direct-extern-access to avoid copy relocation.
    4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
       relocation with -mno-direct-extern-access.

    gcc/

            PR target/35513
            PR target/100593
            * config/i386/gnu-property.cc: Include "i386-protos.h".
            (file_end_indicate_exec_stack_and_gnu_property): Generate
            a GNU_PROPERTY_1_NEEDED note for -mno-direct-extern-access or
            nodirect_extern_access attribute.
            * config/i386/i386-options.cc
            (handle_nodirect_extern_access_attribute): New function.
            (ix86_attribute_table): Add nodirect_extern_access attribute.
            * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
            bool argument.
            (ix86_has_no_direct_extern_access): New.
            * config/i386/i386.cc (ix86_has_no_direct_extern_access): New.
            (ix86_force_load_from_GOT_p): Add a bool argument to indicate
            call operand.  Force non-call load from GOT for
            -mno-direct-extern-access or nodirect_extern_access attribute.
            (legitimate_pic_address_disp_p): Avoid copy relocation in PIE
            for -mno-direct-extern-access or nodirect_extern_access attribute.
            (ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
            for call operand.
            (asm_preferred_eh_data_format): Use PC-relative format for
            -mno-direct-extern-access to avoid copy relocation.  Check
            ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
            (ix86_binds_local_p): Set ix86_has_no_direct_extern_access to
            true for -mno-direct-extern-access or nodirect_extern_access
            attribute.  Don't treat protected data as extern and avoid copy
            relocation on common symbol with -mno-direct-extern-access or
            nodirect_extern_access attribute.
            (ix86_reloc_rw_mask): New to avoid copy relocation for
            -mno-direct-extern-access.
            (TARGET_ASM_RELOC_RW_MASK): New.
            * config/i386/i386.opt: Add -mdirect-extern-access.
            * doc/extend.texi: Document nodirect_extern_access attribute.
            * doc/invoke.texi: Document -m[no-]direct-extern-access.

    gcc/testsuite/

            PR target/35513
            PR target/100593
            * g++.target/i386/pr35513-1.C: New file.
            * g++.target/i386/pr35513-2.C: Likewise.
            * gcc.target/i386/pr35513-1a.c: Likewise.
            * gcc.target/i386/pr35513-1b.c: Likewise.
            * gcc.target/i386/pr35513-2a.c: Likewise.
            * gcc.target/i386/pr35513-2b.c: Likewise.
            * gcc.target/i386/pr35513-3a.c: Likewise.
            * gcc.target/i386/pr35513-3b.c: Likewise.
            * gcc.target/i386/pr35513-4a.c: Likewise.
            * gcc.target/i386/pr35513-4b.c: Likewise.
            * gcc.target/i386/pr35513-5a.c: Likewise.
            * gcc.target/i386/pr35513-5b.c: Likewise.
            * gcc.target/i386/pr35513-6a.c: Likewise.
            * gcc.target/i386/pr35513-6b.c: Likewise.
            * gcc.target/i386/pr35513-7a.c: Likewise.
            * gcc.target/i386/pr35513-7b.c: Likewise.
            * gcc.target/i386/pr35513-8.c: Likewise.
            * gcc.target/i386/pr35513-9a.c: Likewise.
            * gcc.target/i386/pr35513-9b.c: Likewise.
            * gcc.target/
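To illustrate the per-symbol form described in the commit message, here is a minimal sketch of applying the nodirect_extern_access attribute to symbols that are undefined in the current translation unit. The `NODIRECT` macro and the two accessor functions are hypothetical names introduced for this example, and the choice of `environ` and `rand` as the external symbols is an assumption made only so the example links against libc. On compilers without the attribute, the macro expands to nothing and ordinary code generation applies.

```c
/* Use the attribute only where the compiler reports it; per the commit above
   it is an x86 attribute added in GCC 12.  NODIRECT is a hypothetical name. */
#if defined(__has_attribute)
#  if __has_attribute(nodirect_extern_access)
#    define NODIRECT __attribute__((nodirect_extern_access))
#  endif
#endif
#ifndef NODIRECT
#  define NODIRECT
#endif

/* Both symbols are defined in libc, so they are undefined here. */
extern char **environ NODIRECT;
extern int rand(void) NODIRECT;

/* With the attribute in effect, the address of the data symbol is expected
   to be loaded from the GOT, which avoids a copy relocation. */
char ***environ_slot(void)
{
    return &environ;
}

/* Likewise, the function address is expected to come from the GOT rather
   than from a PLT entry in the executable. */
int (*rand_address(void))(void)
{
    return rand;
}
```

Compiling with `-S` and comparing the output with and without the attribute is one way to check which form a given compiler actually emits.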
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #13 from Fangrui Song ---
(In reply to H.J. Lu from comment #12)
> We should handle it in the whole Linux software stack:
>
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8
>
> not just in compiler.
It is great that you have the desire to fix these fundamental issues :)
I think a GNU_PROPERTY marker is over-engineering. See
https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8 for details.
Many things (including this and PR98112) can be changed today.
Once -fno-direct-access-external-data/-fno-direct-access-external-function as
the -fno-pic default becomes prevalent, make ld warn by default about
R_*_COPY/canonical PLT entries.
After a while (say one or two years), let glibc ld.so warn about
R_*_COPY/canonical PLT entries.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
H.J. Lu changed:
           What    |Removed                     |Added
                 CC|                            |hjl.tools at gmail dot com
--- Comment #12 from H.J. Lu ---
We should handle it in the whole Linux software stack:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8
not just in compiler.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #11 from Fangrui Song ---
(In reply to Alexander Monakov from comment #10)
> Is there something wrong or undesirable with making this under -fno-plt (or
> the noplt attribute as in your example)?
>
> (after all, it is a kind of PLT-avoidance transformation, just for
> addressing rather than direct calling/jumping)
-fno-plt is generally undesirable due to longer branch instructions and the
performance loss when the branch target is defined in the exe/so and the linker
is gold/ld.lld (they cannot optimize jmp *got to jmp target).
For non-x86, -fno-plt doesn't exist at all. If implemented, it would require
many more instructions, which is certainly undesirable.
So -fno-plt can never be a default.
Using GOT to take the address of an external function in -fno-pic is just a
better default.
I want this behavior to become the default behavior, so it should not be under
-fno-plt.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #10 from Alexander Monakov ---
Is there something wrong or undesirable with making this under -fno-plt (or
the noplt attribute as in your example)?
(after all, it is a kind of PLT-avoidance transformation, just for
addressing rather than direct calling/jumping)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #9 from Fangrui Song ---
I have a patch to implement this in Clang. It'd be good to have a name even if
GCC wants to postpone the implementation for now.
How about -fdirect-access-external-function &
-fno-direct-access-external-function?
It is similar to the feature request -fdirect-access-external-data.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #8 from Fangrui Song ---
Seems that -fno-plt -fno-pic does have the required properties.
A side effect is that all external calls use the call *f@GOTPCREL(%rip)
(x86-64) or call *f@GOT (x86-32) form.
The instruction is one byte longer. (Calling a function is the common case;
taking the address outside of a vtable is uncommon, so I'd rather penalize the
uncommon address-taking.)
When the linker notices that the branch target is defined in the executable, it
can optimize out the GOT load and use a direct call with an addr32 prefix instead.
(gold and ld.lld haven't implemented the optimization for 32-bit.)
__attribute__((noplt))
int f();
void h() {}
void *g()
{
  h();      // call h
  f();      // call *f@GOTPCREL(%rip)
  return f; // movq f@GOTPCREL(%rip), %rax
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #7 from Alexander Monakov ---
Thanks. I agree that inferring address significance on the linker side is
problematic.
Thinking about your original request, I was about to say that it would be very
reasonable to do under -fno-plt flag, but then I found it was already
implemented for x86-64 in gcc-7 and for 32-bit x86 in gcc-8. Compiling
int f();
void *g()
{
  return f;
}
with -fno-pic -fno-plt yields
g:
        movq    f@GOTPCREL(%rip), %rax
        ret
(yields GOTPCRELX relocation) and
g:
        movl    f@GOT, %eax
        ret
on 32-bit (yields GOT32X relocation), so on x86 it's already implemented?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #6 from Fangrui Song ---
(In reply to Alexander Monakov from comment #5)
> Hm, I still don't think I'm misunderstanding what you're saying. I'm
> familiar with the ELF standard (and FWIW I have read your blog posts on
> related matters). I am responding to this sentiment from the opening comment:
>
> > I believe ld -Bsymbolic-functions can materialize most of the savings other
> > implementations provide, without introducing complex things to ELF.
> > However, since -Bsymbolic-functions doesn't play well with -fno-pic's
> > canonical PLT entries, we should fix -fno-pic.
>
> I am saying that fixing -fno-pic is not the only possible way forward.
> Rather, a restricted -Bsymbolic-functions that relaxes relocations that are
> not address-significant allows to still get some (but not all) of the
> benefits for unchanged -fno-pic executables.
You are right. A pure linker approach is possible. However, I think the
approach is inelegant, because the linker would have different preemptibility
ideas for different relocation types and (as you said) indirect calls like
vtable definitions are not optimized.
Let's say the proposed linker option for shared objects is -Bsymbolic-plt.
The discussion below focuses on default visibility definitions which would
otherwise be preemptible.
Let's categorize relocation types first.
PLT-generating: R_X86_64_PLT32
GOT-generating: R_X86_64_GOTPCREL, R_X86_64_GOTPCRELX, R_X86_64_REX_GOTPCRELX
absolute (symbolic): R_X86_64_64
There are three choices.
(a) If all relocation types are PLT-generating, bind branch targets directly
and suppress the PLT entry. If GOT-generating/absolute relocations are present,
don't change behaviors. This choice is less effective for some otherwise
address-insignificant functions, e.g. non-vague-linkage virtual functions.
(b) If all relocation types are R_X86_64_PLT32 or GOT-generating, bind branch
targets directly and suppress the PLT entry. If GOT-generating relocations are
present, produce a GOT entry and an associated R_X86_64_GLOB_DAT. If absolute
relocations are present, don't change behaviors.
(c) Always bind branch targets directly and suppress the PLT entry. If
GOT-generating relocations are present, produce a GOT entry and an associated
R_X86_64_GLOB_DAT. If absolute relocations are present, produce outstanding
dynamic relocations of the same type.
> > You misunderstand this. Emitting GOT-generating relocation in -fno-pic mode
> > is the only way to avoid canonical PLT entry, if the function turns out to
> > be defined in a shared object. No -Bsymbolic variant can make this
> > compatible.
>
> Well, if you frame the goal as "eliminate canonical PLT entries", then yes,
> but that in itself surely is not the end goal? The end goals are reducing
> startup time (which my idea helps only partially since it may bind direct
> calls but not e.g. vtable definitions) and runtime overheads (where again my
> proposal is weaker but not significantly so, assuming address loads are
> rarely on hot paths).
Yes, the end goal is to reduce startup time and bind call targets directly if
feasible.
Yes, -Bsymbolic-plt can help the goal partially.
> To clarify once more. I am not outright rejecting the idea in your opening
> comment. I am saying that there potentially is a lighter-weight alternative,
> which may be implementable purely in the linker, and still gets most of the
> benefit you're promoting (like in your Clang example). Which is nice,
> because it can be rolled out sooner, individual libraries/distros/users can
> opt-in and experiment as they like, etc.
Such a -Bsymbolic-plt can achieve some goals. But given that the function
pointer equality problems are usually benign (-fno-pic is relatively uncommon
in many areas; making use of such pointer equality is not a common practice),
I'd hope we just don't add that intermediate linker option.
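The three choices (a), (b) and (c) from comment #6 can be read as a small decision table. The sketch below models them in C purely for illustration; every name in it (`reloc_class`, `choice`, `outcome`, `decide`) is hypothetical, -Bsymbolic-plt is only a proposed option, and no existing linker is claimed to work this way. An outcome with all fields false stands for "don't change behaviors".

```c
#include <stdbool.h>

/* Relocation classes from comment #6, as a bit set of what references
   a given default-visibility function definition. */
enum reloc_class {
    RELOC_PLT = 1 << 0,   /* R_X86_64_PLT32 */
    RELOC_GOT = 1 << 1,   /* R_X86_64_GOTPCREL, _GOTPCRELX, _REX_GOTPCRELX */
    RELOC_ABS = 1 << 2    /* R_X86_64_64 */
};

enum choice { CHOICE_A, CHOICE_B, CHOICE_C };

struct outcome {
    bool bind_direct;      /* branch targets bound locally, PLT entry suppressed */
    bool emit_glob_dat;    /* GOT entry plus an R_X86_64_GLOB_DAT */
    bool keep_abs_dynrel;  /* absolute relocation kept as a dynamic relocation */
};

struct outcome decide(enum choice c, unsigned seen)
{
    const struct outcome unchanged = { false, false, false };
    const struct outcome relaxed = {
        true,
        (seen & RELOC_GOT) != 0,
        (seen & RELOC_ABS) != 0
    };

    switch (c) {
    case CHOICE_A:   /* only when every reference is PLT-generating */
        return seen == RELOC_PLT ? relaxed : unchanged;
    case CHOICE_B:   /* PLT and GOT references are fine, absolute ones are not */
        return (seen & RELOC_ABS) ? unchanged : relaxed;
    case CHOICE_C:   /* always relax */
        return relaxed;
    }
    return unchanged;
}
```

For example, `decide(CHOICE_B, RELOC_PLT | RELOC_GOT)` binds the calls directly and keeps a GLOB_DAT for the address load, whereas `decide(CHOICE_A, RELOC_PLT | RELOC_GOT)` leaves everything as it is today.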
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #5 from Alexander Monakov ---
Hm, I still don't think I'm misunderstanding what you're saying. I'm
familiar with the ELF standard (and FWIW I have read your blog posts on
related matters). I am responding to this sentiment from the opening comment:
> I believe ld -Bsymbolic-functions can materialize most of the savings other
> implementations provide, without introducing complex things to ELF.
> However, since -Bsymbolic-functions doesn't play well with -fno-pic's
> canonical PLT entries, we should fix -fno-pic.
I am saying that fixing -fno-pic is not the only possible way forward.
Rather, a restricted -Bsymbolic-functions that relaxes relocations that are
not address-significant allows to still get some (but not all) of the
benefits for unchanged -fno-pic executables.
> You misunderstand this. Emitting GOT-generating relocation in -fno-pic mode
> is the only way to avoid canonical PLT entry, if the function turns out to
> be defined in a shared object. No -Bsymbolic variant can make this
> compatible.
Well, if you frame the goal as "eliminate canonical PLT entries", then yes,
but that in itself surely is not the end goal? The end goals are reducing
startup time (which my idea helps only partially since it may bind direct
calls but not e.g. vtable definitions) and runtime overheads (where again my
proposal is weaker but not significantly so, assuming address loads are
rarely on hot paths).
To clarify once more. I am not outright rejecting the idea in your opening
comment. I am saying that there potentially is a lighter-weight alternative,
which may be implementable purely in the linker, and still gets most of the
benefit you're promoting (like in your Clang example). Which is nice,
because it can be rolled out sooner, individual libraries/distros/users can
opt-in and experiment as they like, etc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #4 from Fangrui Song ---
(In reply to Alexander Monakov from comment #3)
> I understand what you're saying, but it seems we're talking past each other.
>
> I agree that if a library is linked with any -Bsymbolic* flag, the main
> executable is at risk of broken address uniqueness unless it uses GOT
> indirection.
>
> I am saying that if the library was linked with a more restrictive variant
> of -Bsymbolic (that I called -Bsymbolic-plt), it would still get most of the
> benefit of -Bsymbolic, while remaining compatible with unmodified
> executables.
>
> Would you agree?
You misunderstand this. Emitting a GOT-generating relocation in -fno-pic mode
is the only way to avoid a canonical PLT entry if the function turns out to be
defined in a shared object. No -Bsymbolic variant can make this compatible.
Our goal is to eliminate symbol lookup for the function definition in the
shared object.
We must eliminate symbolic dynamic relocations, i.e. no JUMP_SLOT, no GLOB_DAT,
no R_X86_64_64.
The linker must set an address in the shared object and bind references to that
address.
In many programs (not long-running, not all code paths are exercised), the
symbol lookup may cost more than the PLT indirection, given the sheer number of
symbol lookups.
Now a -fno-pic program uses an absolute/PC-relative relocation => the linker
must set an address in the executable's address space as well.
The traditional ELF hack (st_value!=0, st_shndx=0) achieves this and lets the
shared object's symbol reference bind to the executable's definition.
Note that we have explicitly eliminated symbol lookup for the defining shared
object, so pointer equality cannot be satisfied at all.
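The "(st_value!=0, st_shndx=0)" convention mentioned in comment #4 can be spotted in an executable's dynamic symbol table. A minimal sketch of that check follows, assuming a Linux system where `<elf.h>` is available. The function name `looks_like_canonical_plt` is hypothetical, and the additional `STT_FUNC` test is an assumption made here to limit the check to function symbols.

```c
#include <elf.h>
#include <stdbool.h>

/* A canonical PLT entry shows up as a symbol that is still undefined
   (st_shndx == SHN_UNDEF) yet carries a non-zero address: the address of
   the PLT entry in the executable, which then serves as the function's
   address for the whole process. */
bool looks_like_canonical_plt(const Elf64_Sym *sym)
{
    return sym->st_shndx == SHN_UNDEF
        && sym->st_value != 0
        && ELF64_ST_TYPE(sym->st_info) == STT_FUNC;  /* assumption: functions only */
}
```

An ordinary undefined function symbol has `st_value == 0`, so the predicate separates the two cases.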
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #3 from Alexander Monakov ---
I understand what you're saying, but it seems we're talking past each other.
I agree that if a library is linked with any -Bsymbolic* flag, the main
executable is at risk of broken address uniqueness unless it uses GOT
indirection.
I am saying that if the library was linked with a more restrictive variant
of -Bsymbolic (that I called -Bsymbolic-plt), it would still get most of the
benefit of -Bsymbolic, while remaining compatible with unmodified
executables.
Would you agree?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #2 from Fangrui Song ---
(In reply to Alexander Monakov from comment #1)
> It is not necessary to change -fno-pic code generation to gain most of the
> -Bsymbolic benefit
It is necessary, otherwise the function address taken from the
-Bsymbolic/-Bsymbolic-functions/-Bsymbolic-global-functions shared object may
be different from the address taken from the -fno-pic code.
The ELF hack is called canonical PLT entry, similar to copy relocations.
> as you say, the most important point is to avoid jumping
> via PLT trampolines (or, with -fno-plt, GOT loads) for function calls, so
> the linker could do -Bsymbolic relaxation for sites where address doesn't
> matter (calls and jumps) while keeping a dynamic relocation for address
> loads? Under some new option of course, like -Bsymbolic-plt. Right?
There are two points: (1) R_*_JUMP_SLOT symbol lookup cost (2) whether call
sites get penalized by the PLT indirection.
-fno-pic code must use GOT (instead of an absolute relocation) for default
visibility external function access to be compatible with a
-Bsymbolic/-Bsymbolic-functions/-Bsymbolic-global-functions shared object.
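To make the address mismatch described in comment #2 concrete, the sketch below shows the kind of code that depends on function pointer equality. All names are hypothetical. Within a single module, as here, the comparison always succeeds. The concern in the thread is the case where one module passes the address it sees for a function to `register_handler` and another module passes a different address for the same function to `unregister_handler`, so the lookup would not match.

```c
#include <stddef.h>

typedef void (*handler_fn)(void);

#define MAX_HANDLERS 8
static handler_fn handlers[MAX_HANDLERS];
static size_t handler_count;

/* Remember a callback; returns 0 on success, -1 if the table is full. */
int register_handler(handler_fn fn)
{
    if (handler_count == MAX_HANDLERS)
        return -1;
    handlers[handler_count++] = fn;
    return 0;
}

/* Remove a callback by comparing addresses; returns -1 if no stored
   pointer compares equal to fn. */
int unregister_handler(handler_fn fn)
{
    for (size_t i = 0; i < handler_count; i++) {
        if (handlers[i] == fn) {
            handlers[i] = handlers[--handler_count];
            return 0;
        }
    }
    return -1;
}

void sample_handler(void) {}
```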
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
Alexander Monakov changed:
           What    |Removed                     |Added
                 CC|                            |amonakov at gcc dot gnu.org
--- Comment #1 from Alexander Monakov ---
It is not necessary to change -fno-pic code generation to gain most of the
-Bsymbolic benefit: as you say, the most important point is to avoid jumping
via PLT trampolines (or, with -fno-plt, GOT loads) for function calls, so
the linker could do -Bsymbolic relaxation for sites where address doesn't
matter (calls and jumps) while keeping a dynamic relocation for address
loads? Under some new option of course, like -Bsymbolic-plt. Right?
