[Bug target/55522] -funsafe-math-optimizations is unexpectedly harmful, especially w/ -shared
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 Andy Lutomirski changed: Severity: enhancement → major --- Comment #9 from Andy Lutomirski --- I'm changing this from an enhancement to a bug. It's a poor design, no one seems to think it's a good idea, and it's broken more than one thing in real life.
[Bug target/70146] missed-optimization: i386 hidden references should use PC32 relocations instead of GOTOFF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70146 --- Comment #3 from Andy Lutomirski --- I think a similar optimization could be applied to default-visibility references as well. For example, gcc can generate things like this:

	call	__x86.get_pc_thunk.ax
	addl	$_GLOBAL_OFFSET_TABLE_, %eax
	movl	foo@GOT(%eax), %eax

The addl is unnecessary.
[Bug target/70146] missed-optimization: i386 hidden references should use PC32 relocations instead of GOTOFF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70146 --- Comment #6 from Andy Lutomirski --- The PLT case could be reduced significantly by -fno-plt, but the TLS issue is just sad. I don't suppose it could be fixed with just toolchain changes while retaining compatibility with existing libc versions?
[Bug target/70146] New: missed-optimization: i386 hidden references should use PC32 relocations instead of GOTOFF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70146

Bug ID: 70146
Summary: missed-optimization: i386 hidden references should use PC32 relocations instead of GOTOFF
Product: gcc
Version: 5.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---

If I compile:

	extern char foo[] __attribute__((visibility("hidden")));
	char *get_foo(void) { return foo; }

with gcc -m32 -O2 -fPIC, I get (annotations trimmed for brevity):

get_foo:
	call	__x86.get_pc_thunk.ax
	addl	$_GLOBAL_OFFSET_TABLE_, %eax
	leal	foo@GOTOFF(%eax), %eax
	ret

I think this should generate:

get_foo:
	call	__x86.get_pc_thunk.ax
.Lpcbase:
	leal	foo-.Lpcbase(%eax), %eax
	ret

That would be smaller, faster, and result in fewer relocations and thus less time spent linking.
[Bug target/70148] New: Feature request: allow overriding the SSP canary location
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70148

Bug ID: 70148
Summary: Feature request: allow overriding the SSP canary location
Product: gcc
Version: 5.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---

The 32-bit x86 Linux kernel jumps through ridiculous hoops to put the SSP canary where GCC expects it. (If I had been paying attention when that code was written, I would have NAKed it outright -- it's a disaster.) The 64-bit kernel jumps through less ridiculous hoops. Can we have the ability to tell GCC what segment override to use and what offset to use? It would be even better if we could tell GCC what segment override to use and make GCC emit an absolute symbol reference for the offset so we could relocate it. (The 32-bit issue is that GCC insists on using %gs, and Linux would *much* prefer %fs.)
[Bug rtl-optimization/66795] Incorrect and missed optimizations of __builtin_frame_address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66795 --- Comment #2 from Andy Lutomirski --- (In reply to Andrew Pinski from comment #1)
> The code below is invalid as __builtin_frame_address is only for reading
> from the current frame.

If this code is invalid, then what exactly is valid code that uses __builtin_frame_address? This is way outside the scope of the C standard. Apparently one cannot reliably write to pointers derived from __builtin_frame_address? Can one reliably *read* them?

What I'm trying to do is actually the most well-defined thing I can think of to do with __builtin_frame_address. Roughly:

	__attribute__((externally_visible)) void func(void)
	{
		unsigned long *ptr = (unsigned long *)__builtin_frame_address(0) - 2;
		*ptr = some_other_value;
	}

	pushq some_value
	call func
	; the top of the stack should now contain some_other_value

As it stands, gcc can't usefully compile this because the store is deleted.

> Also -O2 enables -fomit-frame-pointer which might
> cause the frame pointer to go away.

It's the other way around. Using __builtin_frame_address(0) prevents the frame pointer from being omitted. This is an unnecessary constraint, at least for my use case. If __builtin_frame_address(0) returned a pointer to the stack one slot below the saved return address regardless of whether the frame pointer exists, it would work for me.

Would it make sense to add a new, well-defined intrinsic that returns the address of the current function's return address? As long as the function is noinline, this would have sensible semantics. (Arguably it should be an error to use such an intrinsic in a function that isn't either noinline or forcibly inlined.)
[Bug target/81708] The x86 stack canary location should be customizable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 --- Comment #2 from Andy Lutomirski --- (In reply to H. Peter Anvin from comment #1)
> Well, you can choose between "__stack_chk_guard(%rip)" and "%gs:40"... :)

Wow, I guess I didn't even consider the former. That would be option 5: symbol(%rip). Let's not use it for the kernel. For x86_32, we want %fs:symbol. %fs:constant would be tolerable, too. For x86_64, it depends on the end game for percpu access, I suppose.
[Bug target/81708] New: The x86 stack canary location should be customizable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708

Bug ID: 81708
Summary: The x86 stack canary location should be customizable
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---

There are four credible ways to find the stack canary:

1. %fs:symbol
2. %fs:symbol(%rip) [with a PCREL relocation]
3. %gs:symbol
4. %gs:symbol(%rip)

(Obviously the %rip variants only work on x86_64.) The current code is roughly equivalent to (1) or (3) where symbol is an absolute symbol equal to 0x28 or similar.

Please give a command line option to choose any of the four modes and specify the symbol name. (Or just hardcode the symbol name to __gcc_stack_canary or whatever if the option is set.) My preferred solution would be -mstack-protector-cookie=gs:symname or -mstack-protector-cookie=gs:symname(%rip) or -mstack-protector-cookie=gs:0x28, depending on what's desired.

I personally consider it to have been a mistake for Linux to support a stack canary without insisting that GCC fix this issue first. The x86_32 case, in particular, is a colossal mess in the kernel that slows kernel entries down and seriously overcomplicates the kernel code because the stack canary addressing mode that GCC chooses is nonsensical.
[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490 --- Comment #21 from Andy Lutomirski --- Re: the stack canary, I filed bug 81708. It seems to me that __seg_gs is analogous and users should be able to directly specify the addressing mode.
[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490 --- Comment #20 from Andy Lutomirski --- We have issues putting modules more than 2G from the main kernel no matter what, but I don't see what this has to do with %gs addressing. I still think that GCC should let us directly control the addressing mode for __seg accesses. Can someone elaborate on what the @GPREL suffix actually means?
[Bug target/81825] New: x86_64 stack realignment code is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81825

Bug ID: 81825
Summary: x86_64 stack realignment code is suboptimal
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---

I compiled this:

	void func() {
		int var __attribute__((aligned(32)));
		asm volatile ("" :: "m" (var));
	}

using gcc (GCC) 7.1.1 20170622 (Red Hat 7.1.1-3). I got (after stripping CFI stuff):

func:
	leaq	8(%rsp), %r10
	andq	$-32, %rsp
	pushq	-8(%r10)
	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r10
	popq	%r10
	popq	%rbp
	leaq	-8(%r10), %rsp
	ret

I have three objections to this code.

1. The push and immediate pop of %r10 seems pointless. Maybe it's due to some weird DWARF limitation? A register allocation limitation sounds more likely, though.

2. The addressing modes used for r10 are suboptimal. Shouldn't the first instruction be just movq %rsp, %r10? By my count, this would save 12 bytes of text.

3. Couldn't the whole thing just be:

	pushq	%rbp
	movq	%rsp, %rbp
	andq	$-32, %rsp
	[function body here; rbp can't be used to locate stack variables, but rsp can]
	leaveq
	ret
[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490 --- Comment #14 from Andy Lutomirski --- (In reply to H. Peter Anvin from comment #13)
> On July 20, 2017 10:47:54 AM PDT, ubizjak at gmail dot com wrote:
> > OTOH, we can avoid RIP relative addresses for non-default address
> > spaces, when -mcmodel=kernel compile flag is in effect.
>
> It's not just about the kernel model. The example I have is for the
> small-pic model.

We are considering moving away from the kernel model. The kernel, and possibly future userspace, can use very strange relocations for non-standard segments. The actual percpu space is some contiguous (presumably) block that starts at some offset from the actual GS base, but that offset is flexible. There are several possible models:

a) Percpu data starts at %gs:0 (or %gs:small number) and access uses absolute addressing. This means that percpu symbols will resolve to numbers between 0 and 2G and do *not* change if the kernel is relocated.

b) Percpu data starts at %gs:0 (or %gs:small number) and access uses rip-relative addressing. This requires that the kernel be located at a very high address so that the percpu data is reachable. Percpu symbols are largish numbers but still less than 2G. The symbols do change if the kernel is relocated (which is totally the opposite of normal PIC addressing, where PIC offsets are not relocated).

(a) and (b) could plausibly coexist with appropriate magic in the symbol tables.

c) Percpu data starts at %gs:x where x is the kernel load address plus a smallish constant. Access is like %gs:symbol(%rip). No dynamic relocations. Also no constraints on where the kernel is loaded.

d) Percpu data starts at %gs:x where x is the kernel load address plus a smallish constant. Access is like %gs:symbol. There are dynamic relocations.

Different choices here have different tradeoffs, and, for any given kernel code model, more than one of these is possible.
I don't think -mcmodel should be the determining factor. (Models c and d would require gcc to allow a flexible stack cookie location.)
[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490 Andy Lutomirski changed: CC: added luto at kernel dot org --- Comment #8 from Andy Lutomirski --- (In reply to H. Peter Anvin from comment #7)
> Thinking about this some more, this is really not an aspect of __seg_* but
> rather the section the symbol is placed in. An embedded system kernel, for
> example, could quite possibly want to access an absolute section while the
> kernel itself is position-independent; maybe it is even running XIP.
>
> As such perhaps a better solution would be to have attributes that control
> these specific parameters. A quick off-the-top-of-my-head proposal:
>
> __attribute__((absolute))
>
> ... this symbol should be accessed using absolute address references.
>
> __attribute__((range(from,to)))
>
> ... specifies the valid address range for this symbol.

I would like to see something along these lines but even stronger: a way to instruct GCC to use a particular type of access and relocation. For example, I think I should be able to ask GCC to reference symbol "foo" using a PC-relative relocation or using an absolute relocation at my option.
[Bug target/81490] x86: Handling of symbol ranges for __seg_fs/__seg_gs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490 --- Comment #28 from Andy Lutomirski --- hjl, I don't understand what problem you're trying to solve here. Your patch makes it relatively straightforward to access global variables using __seg_gs if the environment sets up the %gs base exactly the way GCC wants it to be set up. Why is this useful? As far as I know, anyone using __seg_gs is trying to do something that is outside the scope of what normal C code does. They have some addressing model that they want to use, and they're probably trying to access data that is *not* a normal global variable. On Linux, we want to use __seg_gs to access *percpu* data, which is allocated by a fancy percpu allocator and relocated by the kernel's internal relocation processing. The latter is indeed based on data emitted by binutils, but it is *not* a conventional ELF relocation engine. It will, of course, adapt as needed. When I do:

	extern int __seg_gs [insert more addressing details here?] variable;
	int val = variable;

I am *not* trying to access something that would necessarily be credibly defined by:

	int __seg_gs [whatever] variable;

I'm trying to access something that is defined by a mechanism that is completely invisible to gcc, may involve linker script magic, and may well involve magic page table manipulation.

Also, please do not consider the stack cookie at %gs:0x28 to be a design constraint in any respect. We're talking about using a new GCC version to solve problems that aren't cleanly solved on existing GCC. That new GCC can and should fix the silly stack cookie addressing, too, so it stops getting in the way.
[Bug target/81708] The x86 stack canary location should be customizable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 --- Comment #7 from Andy Lutomirski --- Hmm. This is a big improvement, but it's still going to be awkward to use -- if we want to use a normal Linux percpu variable, we're stuck putting it in a fixed location that's known at compile time as opposed to just at link time. It also still prevents Linux from switching to PC-relative percpu addressing to reduce text size. Can we do -mstack-protector-offset=[symbol name]? And could it be extended to ask for PC-relative addressing?
[Bug target/68491] libgcc calls __get_cpuid with 0 level breaks on early 486
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68491 Andy Lutomirski changed: CC: added luto at kernel dot org --- Comment #9 from Andy Lutomirski --- I'm a bit late to the party, but this patch seems dubious to me. __get_cpuid_max() fails to distinguish between CPUs that have max level 0 (although I doubt that any such CPUs exist) and CPUs that don't have CPUID at all. Shouldn't __get_cpuid_max return -1 if CPUID isn't there or, alternatively, just generally return the last level + 1 (and 0 if CPUID isn't there)?
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 --- Comment #13 from Andy Lutomirski --- I find this whole discussion very confusing. The problem has nothing to do with relocations AFAICT. The problem is that gcc is (as requested) generating retpolines, and it's set up to do it by calling __x86_indirect_thunk_*, and those helpers don't exist in the vDSO. I also don't think that static inlines have anything to do with it. Nor do I see why any function attributes make any sense. The trivial fix is:

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b1cc50..8140176b8b41 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -74,7 +74,7 @@ CFL := $(PROFILING) -mcmodel=small -fPIC -O2 -fasynchronous-unwind-tables -m64 \
        -fno-omit-frame-pointer -foptimize-sibling-calls \
        -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
 
-$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
+$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
 
 #
 # vDSO code runs in userspace and -pg doesn't help with profiling anyway.
@@ -138,6 +138,7 @@ KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic
 KBUILD_CFLAGS_32 += $(call cc-option, -fno-stack-protector)
 KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)

but it might be better to use the internal thunk mechanism. You said that Andi Kleen had a comment. Can you point me to it?
[Bug target/84908] retpoline weirdness: 7.3.0-1 and 4.8.5-16: with -fPIC: R_X86_64_PC32 against undefined symbol `__x86_indirect_thunk_rax'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84908 Andy Lutomirski changed: CC: added luto at kernel dot org --- Comment #9 from Andy Lutomirski --- I haven't fully dug into this, but I do have one immediate question: why is GCC generating a jump table for a five-entry switch statement if retpolines are on? This has got to be a *huge* performance loss. The retpoline sequence is very, very slow, and branches aren't that slow. A five-entry switch is only three branches deep.
[Bug libstdc++/91357] New: _GLIBCXX_ASSERTIONS rejects possibly-valid code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91357

Bug ID: 91357
Summary: _GLIBCXX_ASSERTIONS rejects possibly-valid code
Product: gcc
Version: 9.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---

Created attachment 46672 (Example) --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46672

For a non-empty vector v, calling v.operator[](v.size()) fires this assertion:

/usr/include/c++/9/bits/stl_vector.h:1042: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.

After some digging into the standard, I'm not entirely sure whether that particular invocation of operator[] is UB or otherwise invalid. It's certainly poor form -- the right way in modern C++ to form a one-past-the-end pointer to a vector is (v.data() + v.size()) -- but, if it's indeed legal, then _GLIBCXX_ASSERTIONS should not be firing. I suggest one of the following resolutions:

1. Determine that merely calling v.operator[](v.size()) is UB or that it is otherwise permissible for a conforming implementation to crash when this happens. This would require some careful standard reading and maybe even a change to the standard.

2. Weaken the assertion to only crash if the index is strictly greater than the size. Calling v.operator[](v.size() + 1) is definitely UB (except perhaps in the pathological case where v.size() + 1 == 0, but I doubt it's possible to make a vector that large on any supported platform).

3. Use compiler magic to only fire the assertion if the pointer returned by operator[] is dereferenced for real, e.g. dereferenced and converted to an lvalue.
Maybe gcc could gain a builtin like __builtin_return_value_is_definitely_dereferenced().

4. Improve the assertion so that, when it fires, a very clear message is printed to stderr saying something like "This assertion fired due to the formation of a one-past-the-end reference. This may indicate a stylistic issue with the program but is not necessarily a bug."
[Bug c++/57998] Unhelpful error message when a class has no move constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57998 Andy Lutomirski changed: Resolution: --- → DUPLICATE; Status: UNCONFIRMED → RESOLVED --- Comment #2 from Andy Lutomirski --- Indeed. That's slightly embarrassing. I admit that after more than six years I don't remember how I made this mistake :) *** This bug has been marked as a duplicate of bug 53234 ***
[Bug c++/53234] [c++0x] unfriendly error message for missing move constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53234 --- Comment #2 from Andy Lutomirski --- *** Bug 57998 has been marked as a duplicate of this bug. ***
[Bug rtl-optimization/48877] Inline asm for rdtsc generates silly code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877 Andy Lutomirski changed: Resolution: --- → FIXED; Status: NEW → RESOLVED --- Comment #3 from Andy Lutomirski --- (In reply to Ivan Sorokin from comment #2)
> Modern GCC doesn't generate excessive moves for this example. It looks like
> the problem was fixed in 4.9.0: https://godbolt.org/z/MqE7sP .
>
> I think the bug can be closed now.

Indeed.
[Bug target/98997] New: Linking mismatched -fcf-protection objects generates incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98997

Bug ID: 98997
Summary: Linking mismatched -fcf-protection objects generates incorrect code
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---

$ gcc --version
gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat cet1.c
#include <setjmp.h>

void test(jmp_buf buf)
{
	__builtin_longjmp(buf, 1);
}

$ cat cet2.c
#include <setjmp.h>

void test(jmp_buf buf);

int main()
{
	jmp_buf buf;
	if (__builtin_setjmp(buf) == 0)
		test(buf);
	return 0;
}

$ gcc -c -O2 cet1.c -fcf-protection=full
$ gcc -c -O2 cet2.c
$ gcc -o cet -O2 cet1.o cet2.o
$ ./cet
Illegal instruction (core dumped)

Presumably the magic CET fixup in the builtin setjmp/longjmp requires that matching code is generated for both. I'm testing on a non-CET system. I haven't even tried to figure out if anything sensible happens to the CET GNU property, but I think the current GCC behavior is impolite at best. I think GCC should do one of a few things:

1. Generate working code.
2. Fail to link.
3. At least document this pitfall very clearly.

I really hope this doesn't affect shared objects.
[Bug target/98997] Linking mismatched -fcf-protection objects generates incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98997 --- Comment #4 from Andy Lutomirski --- I should add: on brief inspection, that patch looks like an ABI break for -fcf-protection=none
[Bug target/98997] Linking mismatched -fcf-protection objects generates incorrect code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98997 --- Comment #3 from Andy Lutomirski --- What is -fcf-protection=stack actually supposed to do as compared to -fcf-protection=none? Is it valid to run code compiled with -fcf-protection=none with SHSTK enabled? If so, then I wonder why -fcf-protection=stack exists at all. If not, then I'm wondering why your patch seems to be effectively hardcoding "stack" mode for SJLJ? You could probably fix this bug differently by changing __builtin_setjmp() to store 0 for SSP in "none" mode. Then at least -fcf-protection=none wouldn't emit CET code.
[Bug target/55522] -funsafe-math-optimizations is unexpectedly harmful, especially w/ -shared
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 --- Comment #16 from Andy Lutomirski --- (In reply to Andrew Pinski from comment #13)
> Patient: Doctor it hurts when I do this.
> Doctor: then don't do that, and if you read the instructions I gave you, I
> told you it would hurt this way.

I think this analogy is off. Maybe for most questionable optimization or codegen options it would apply, but -ffast-math is more like:

Patient: Doctor, it hurts me when my neighbor does this.
Doctor: Then tell your neighbor not to do that and to read the fine print on the box.

If nitroglycerin pills have side effects and patients need to read the label, maybe fine. But if they start exploding and breaking other people's windows, that's a whole different problem.