[Bug target/114641] New: sh: fdpic optimization of function address breaks pointer equality
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114641 Bug ID: 114641 Summary: sh: fdpic optimization of function address breaks pointer equality Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

Created attachment 57904 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57904&action=edit rough fix

For FDPIC targets, where the canonical value of a function address for equality purposes is determined by the address of the function descriptor, the function symbol being locally defined is not a sufficient condition for using a GOT-relative descriptor address. The address cannot be determined at link time, only at ldso time, and thus must be loaded through the GOT. sh.c's legitimize_pic_address wrongly optimizes references with SYMBOL_REF_LOCAL_P to @GOTOFFFUNCDESC form unless they are weak (for undef-weak reasons), but it also needs to refrain from doing this optimization if the symbol is external and not hidden.

The test case I was working with is:

    #include <stdio.h>
    #include <dlfcn.h>

    int main()
    {
        printf("%p %p\n", (void *)main, dlsym(RTLD_DEFAULT, "main"));
    }

but you can see the problem without executing anything, just by looking at the emitted assembly. The attached patch fixes it but is probably not idiomatic.

Note that there is a related binutils bug that prevents the fix from having an effect on the test program when linked: https://sourceware.org/bugzilla/show_bug.cgi?id=31619

With both applied, the linked output is correct too.
[Bug target/114158] Wrong FDPIC special-casing in crtstuff produces invalid pointer in init_array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114158 --- Comment #5 from Rich Felker ---

I don't know how I ended up copying the wrong commit id, but the one I meant to reference was 9c560cf23996271ee26dfc4a1d8484b85173cd12.

Actually, I do know now: I got it out of the gitweb URL, which gratuitously has the parent hash in a place where it's easy to copy by accident instead of the hash of the commit you're viewing (one of the many reasons I prefer cgit): https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9c560cf23996271ee26dfc4a1d8484b85173cd12;hp=6bcbf80c6e2bd8a60d88bbcac3d70ffb67f4888f

So indeed, the breakage was detected upstream and worked around, as I said.
[Bug libgcc/114158] New: Wrong FDPIC special-casing in crtstuff produces invalid pointer in init_array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114158 Bug ID: 114158 Summary: Wrong FDPIC special-casing in crtstuff produces invalid pointer in init_array Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

Commit 11189793b6ef60645d5d1126d0bd9d0dd83e6583 introduced wrong special-casing of FDPIC into the __do_global_dtors_aux handling in crtstuff.c. For some reason, it was assumed that, on FDPIC targets, init/fini arrays would contain instruction addresses rather than function addresses (which are addresses of descriptors, on FDPIC targets). This is NOT the case. The gABI contract for the init/fini arrays is that they contain ABI-callable function pointers, and in fact GCC correctly emits FUNCDESC-type relocations referencing them when translating ctors/dtors, on ARM as well as sh. It seems to have been realized that this was not working, as 6bcbf80c6e2bd8a60d88bbcac3d70ffb67f4888f disabled initfini arrays on ARM/FDPIC, but that didn't identify the root cause.

Commit 11189793b6ef60645d5d1126d0bd9d0dd83e6583 should be reverted ASAP, and the revert backported to all maintained versions, as it's actively breaking other targets by putting an invalid function pointer in the init_array. Commit 6bcbf80c6e2bd8a60d88bbcac3d70ffb67f4888f should also be reverted in theory, but that may need coordination with uclibc if they want to work around binaries built with broken versions.

Further discussion of the issue can be found on the musl mailing list, in this thread where I and the author of the in-progress xtensa/fdpic port were trying to figure out what's going on: https://www.openwall.com/lists/musl/2024/02/28/12
[Bug target/114060] asm constraints getting GOT address for ARM/FDPIC look wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114060 --- Comment #2 from Rich Felker --- How could there be such a contract? In order to call any other function, the GOT address of the callee needs to be loaded, replacing the caller's value, which must be spilled and reloaded if it's needed again -- but if it's not needed again, it makes sense to just discard it. On SH (and AFAIK FRV, the original FDPIC), GCC happily discards the FDPIC/GOT register when it won't be used again. Maybe as an implementation detail GCC is not doing that on ARM right now, but if not, that's probably a big missed optimization and not something libgcc unwinder code should be relying on.
[Bug libgcc/114060] New: asm constraints getting GOT address for ARM/FDPIC look wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114060 Bug ID: 114060 Summary: asm constraints getting GOT address for ARM/FDPIC look wrong Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

Reading the code added to unwind-pe.h for FDPIC, I came across the ARM implementation that uses FDPIC_REGNUM as an input constraint to __asm to get the GOT register value. As I understand it, this is not correct, as there is no contract that this register permanently hold the GOT address for the executing code; it's just a hidden argument register for making function calls, which the callee can throw away if it does not need to access the GOT or any global data, or spill and reload. To reliably get the GOT register, I think you need to make an actual external call to an asm function that moves the GOT register to the return-value register and returns.
[Bug c/113653] Failure to diagnose use of (non-constant-expr) const objects in static initializers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653 --- Comment #6 from Rich Felker --- I'm aware of the allowance to accept "other forms". It's unfortunately underspecified (does the implementation need to be specific about what forms? document them per the normal rules for implementation-defined behavior? etc.) but indeed it exists. Regardless, at least -pedantic should diagnose this, because it's a big footgun for writing code that is not valid C and only works with certain compilers that implement C++-like behavior in C. I would also be happy with a separate warning option controlling it, named something like -Wextended-constant-expressions.
[Bug c/113653] Failure to diagnose use of (non-constant-expr) const objects in static initializers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653 Rich Felker changed: What|Removed |Added Resolution|DUPLICATE |--- Status|RESOLVED|UNCONFIRMED --- Comment #4 from Rich Felker --- This is NOT a duplicate of the marked bug - that bug was complaining that invalid code didn't compile. This bug is that GCC accepts invalid code, even with -pedantic, with no diagnostic, making it impossible to catch invalid C. This bug bit me in the wild - I accepted code that should have been rejected as a constraint violation, and thereby made the project impossible to compile with other compilers for a couple releases. In standards-conforming and/or pedantic mode, the code should be rejected.
[Bug c/113653] Failure to diagnose use of (non-constant-expr) const objects in static initializers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653 --- Comment #1 from Rich Felker --- FWIW -pedantic also does not help.
[Bug c/113653] New: Failure to diagnose use of (non-constant-expr) const objects in static initializers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653 Bug ID: 113653 Summary: Failure to diagnose use of (non-constant-expr) const objects in static initializers Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

The following is a constraint violation:

    int foo()
    {
        static const int x = 1;
        static const int y = x; // not a constant expression
        return y;
    }

However, gcc does not diagnose it as such, even with -Wall -Wextra. This appears to have been a regression somewhere between the gcc 4 era and now.

I'm not sure what component this should be assigned to. I chose "c" because it's C-specific that this is not a constant expression; it would be in C++.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #62 from Rich Felker --- The process described there would have to end at least N bits before the end of the destination buffer. The point was that it would destroy information internal to the buffer at each step along the way, before it got to the end.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #60 from Rich Felker --- Nobody said anything about writing past end of buffer. Obviously you can't do that.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #57 from Rich Felker ---

I think one could reasonably envision an implementation that does some sort of vector loads/stores where, due to some performance constraint or to avoid special-casing a possible page boundary past the end of the copy, it only wants to load N bits at a time, but the efficient store instruction always stores a full vector of 2N bits. Of course, one could also argue quite reasonably that this is a weird enough thing to do that the implementation should then just check for src==dest and early-out.

I'm far less concerned about whether such mechanical breakage exists, and more concerned about the consequences of LTO/whole-program analysis, where something in the translation process can see the violated restrict qualifier, infer UB, and blow everything up. The change being requested here is really one of removing the restrict qualification from the arguments and making a custom weaker condition. This may in turn have consequences on what types of transformations are possible.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #44 from Rich Felker --- My naive expectation is that "if ((uintptr_t)src == 0x400400)" is and should be UB, but I may be misremembering the details of the formalism by which the spec for restrict is implemented. If so, that's kinda a help, but I still think you would want to remove restrict from the arguments and apply it later, so that the fast-path head/tail copies can avoid any branch, and the check for equality can be deferred until it's known that there's a "body remainder" to copy. That's the part where you really want the benefits of restrict anyway -- without restrict it's not vectorizable because the compiler has to assume there might be nonexact overlap, in which case reordering the loads and stores in any way could change the result.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #42 from Rich Felker ---

> I'm not saying that such an implementation will be a good idea, but just a
> remark: You could, in fact, keep restrict for the arguments in this case,
> because the object pointed to by src and dest is not accessed at all when
> src==dest. So this is correct code according to the standard. (The exact
> semantics of restrict are a bit involved...)

Nope, UB is invoked as soon as you evaluate src==dest, even with no dereferencing. The semantics of restrict are such that the behavior of the code must be unchanged if the pointer were replaced by a pointer to a relocated copy of the pointed-to object. Since this would alter the result of the == operator, that constraint is not satisfied, and thereby the behavior is undefined.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #37 from Rich Felker --- Also: has anyone even looked at what happens if a C memcpy with proper use of restrict gets LTO-inlined into a caller with a GCC-generated memcpy call where src==dest? That sounds like a case likely to blow up...
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #36 from Rich Felker ---

> the assembly generated by the current implementations already supports that
> case.

Our memcpy is not written in asm but in C, and it has the restrict qualifier on src and dest. This entitles a compiler to emit asm equivalent to

    if (src==dest) system("rm -rf /");

if it likes. I don't know how you can write a valid C implementation of memcpy that "doesn't care" about 100% overlap without giving up restrict (and the benefits it entails) entirely. If you're happy with a branch, you could probably take restrict off the arguments and do something like:

    if (src==dest) return;
    const char *restrict src2 = src;
    char *restrict dest2 = dest;
    ...

but that's shoving the branch into memcpy, where it's a cost on every caller making dynamic memcpys with potentially tiny size (like qsort, etc.) while obeying the contract not to call with overlapping src/dest, rather than just imposing it on bad callers.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #28 from Rich Felker ---

> No, that is not a reasonable fix, because it severely pessimizes common code
> for a theoretical only problem.

Far less than a call to memmove (which necessarily has something comparable to that branch, plus other unnecessary branches) pessimizes it. I also disagree that it's severe. On basically any machine with branch prediction, the branch will be predicted correctly all the time and has basically zero cost. On the other hand, the branches in memmove could go different ways depending on the caller, so it's much more machine-capability-dependent whether they can be predicted.

In some sense the optimal thing to do is "nothing", just assuming it would be hard to write a memcpy that fails on src==dest. However, at the very least this precludes a hardened memcpy from trapping on src==dest (or rather on a range test for overlap, which would happen to also catch exact overlap), which might be a useful hardening feature. So it would be nice if it were fixed.

FWIW, I don't think single branches are relevant to overall performance in cases where the compiler is doing something reasonable by emitting a call to memcpy to implement assignment. If the object is small enough that the branch is relevant, the call overhead is even more of a big deal, and it should be inlining loads/stores to perform the assignment.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 --- Comment #26 from Rich Felker ---

> The only reasonable fix on the compiler side is to never emit memcpy but
> always use memmove.

No, it can literally just emit (the equivalent, at whatever intermediate form, of):

    cmp src,dst
    je 1f
    call memcpy
    1:

in place of the memcpy call. It can even optimize that out in the case where it's provable that they're not equal, e.g. presence of restrict, or one of the two objects not having had its address taken/leaked.
[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #24 from Rich Felker --- If the copy is such that gcc is happy to emit an external call to memcpy for it, there is no significant size or performance cost to emitting a branch checking for equality before making the call, and performing this branch would greatly optimize the (maybe rare in the caller, maybe not) case of self-assignment! On the other hand, expecting the libc memcpy to make this check greatly pessimizes every reasonable small use of memcpy with a gratuitous branch for what is undefined behavior and should never appear in any valid program. Fix it on the compiler side please.
[Bug middle-end/111849] GCC replaces volatile struct assignments with memcpy calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111849 --- Comment #2 from Rich Felker ---

I agree that volatile isn't the best way to handle memcpy suppression for other purposes - it was just one of the methods I experimented with that led to me discovering this issue, which I found surprising and reported.

With regards to the impact of this bug: in discussion within the musl libc community where it was found, I did encounter one potentially affected user who is using volatile struct stores to write entire bitfields at once on mmio registers instead of (possibly invalid, at least inefficient) read-modify-write cycles on each bitfield member. I believe their use was unaffected, probably because the whole struct is small enough that it gets emitted as direct load/store rather than a memcpy call.
[Bug target/111849] New: GCC replaces volatile struct assignments with memcpy calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111849 Bug ID: 111849 Summary: GCC replaces volatile struct assignments with memcpy calls Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

On at least some targets where GCC compiles struct assignments to memcpy calls, this pattern is also used when the struct objects involved are volatile-qualified. This is invalid; the memcpy function has no contract to work on volatile objects, and making it compatible with volatile objects would impose extreme implementation constraints that would limit its performance. For example, memcpy may copy the same byte more than once to avoid branches, may use special store instructions with particular cache semantics or data transfer sizes that aren't compatible with various volatile objects like memory-mapped registers, etc.

I don't think the C standard is very clear on what is supposed to happen for volatile struct assignments, but they should at least be done in a way that's known to be compatible with any memory-mapped interfaces supported on the target architecture, and the safe behavior is probably implementing them as member-by-member assignment with some fixup for padding.

I found this while looking at ways to suppress generation of external calls to memcpy when compiling very restrictive TUs that aren't allowed to make any external calls, and being surprised that "just add volatile" was not one of the ways.

I'm filing this as target component because I think the transformation is taking place at the target backend layer on affected targets rather than earlier, but I'm not certain. This should be reviewed and possibly reclassified if that's wrong.
[Bug tree-optimization/107107] [10/11/12/13 Regression] Wrong codegen from TBAA when stores to distinct same-mode types are collapsed?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107 --- Comment #7 from Rich Felker --- Second one filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115
[Bug middle-end/107115] New: Wrong codegen from TBAA under stores that change effective type?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115 Bug ID: 107115 Summary: Wrong codegen from TBAA under stores that change effective type? Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

Created attachment 53648 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53648&action=edit original test case by supercat

The attached test case is from user supercat on Stack Overflow (original source: https://stackoverflow.com/questions/42178179/will-casting-around-sockaddr-storage-and-sockaddr-in-break-strict-aliasing/42178347?noredirect=1#comment130510083_42178347, https://godbolt.org/z/jfv1Ge6v4) and demonstrates what appears to be wrong TBAA optimization on an object with allocated storage whose effective type changes under stores. It was first presented as another example of this kind of problem alongside the example that became https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107, but it seems likely that the root cause is distinct. Reportedly clang/LLVM also transforms this example wrongly.

On 64-bit targets, the test program outputs 2/1 with optimization levels that enable -fstrict-aliasing. The expected output is 2/2. Using -fno-strict-aliasing fixes it.
[Bug middle-end/107107] Wrong codegen from TBAA when stores to distinct same-mode types are collapsed?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107 --- Comment #1 from Rich Felker --- There's also a potentially related test case at https://godbolt.org/z/jfv1Ge6v4 - I'm not yet clear on whether it's likely to have the same root cause.
[Bug middle-end/107107] New: Wrong codegen from TBAA when stores to distinct same-mode types are collapsed?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107 Bug ID: 107107 Summary: Wrong codegen from TBAA when stores to distinct same-mode types are collapsed? Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

Created attachment 53646 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53646&action=edit original test case by supercat

The attached test case is from user supercat on Stack Overflow (original source: https://stackoverflow.com/questions/42178179/will-casting-around-sockaddr-storage-and-sockaddr-in-break-strict-aliasing/42178347?noredirect=1#comment130509588_42178347, https://godbolt.org/z/83v4ssrn4) and demonstrates TBAA apparently wrongly assuming that an object of type long long was not modified, after the code path modifying it was collapsed with a different code path performing the modification via an lvalue of type long.

On 64-bit targets, the test program outputs 1/2 with optimization levels that enable -fstrict-aliasing. The expected output is 2/2. Using -fno-strict-aliasing fixes it. I have not checked this myself, but according to others who have looked at the test case, the regression came between GCC 4.7 and 4.8.
[Bug ipa/95558] [9/10/11/12 Regression] Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558 --- Comment #11 from Rich Felker --- Are you sure? If pure/const discovery is no longer applied to weak definitions, it shouldn't be able to propagate to a non-inlined caller. Of course the fix may be incomplete or not working, which I guess we could tell from whether it happened prior to or after comment 5. :)
[Bug ipa/95558] [9/10/11/12 Regression] Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558 --- Comment #9 from Rich Felker --- Can you provide a link to the commit that might have fixed it? I imagine it's simple enough to backport, in which case I'd like to do so.
[Bug ipa/95558] [9/10/11/12 Regression] Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558 --- Comment #7 from Rich Felker --- > Do weak aliases fall under some implicit ODR here? The whole definition of "weak" is that it entitles you to make a definition that will be exempt from ODR, where a non-weak definition, if any, replaces it.
[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189 --- Comment #30 from Rich Felker --- This is a critical codegen issue. Is it really still not fixed in 9.4.0?
[Bug rtl-optimization/98555] Functions optimized to zero length break function pointer inequality
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98555 --- Comment #5 from Rich Felker --- Ping. Could this be solved without the need for target-specific logic by, in some earlier layer, transforming entirely empty function bodies to __builtin_trap()? (And thereby relying on the target's implementation thereof, which defaults to a call to abort() if the target doesn't provide one.)
[Bug target/99491] New: [mips64] over-strict refusal to emit tail calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99491 Bug ID: 99491 Summary: [mips64] over-strict refusal to emit tail calls Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

mips_function_ok_for_sibcall refuses to generate sibcalls (except local ones) on mips64 due to %gp being call-saved and the possibility that the callee is a lazy resolver stub. This is presumably correct-ish on dynamic-linked platforms with a lazy resolver, due to the resolver using the caller's value of %gp, but completely gratuitous on platforms that are statically linked or don't use a lazy resolver, such as musl libc.

Moreover, the problem could be fixed even for lazy-resolver targets by generating an indirect function call reference that forcibly loads the address and can't go through a lazy resolver, rather than a PLT-like reference that might.
[Bug libstdc++/66146] call_once not C++11-compliant on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146 --- Comment #46 from Rich Felker --- It's a standard and completely reasonable assumption that, if you statically linked libstdc++ into your shared library, the copy there is for *internal use only* and cannot share objects of the standard library's types across boundaries with other libraries or the main application. The problem only comes when the library's implementation (via templates or inline code in headers) imposes the same requirement on normal dynamic linking, where it's a nonstandard and unreasonable one.
[Bug libstdc++/66146] call_once not C++11-compliant on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146 --- Comment #44 from Rich Felker --- Ugh. I don't know what kind of retroactive fix for that is possible, if any, but going forward this kind of thing (assumptions that impose ABI boundaries) should not be inlined by the template. It should just expand to an external call so that the implementation details can be kept as implementation details and changed as needed.
[Bug libstdc++/66146] call_once not C++11-compliant on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146 --- Comment #42 from Rich Felker --- I'm confused why this is an ABI boundary at all. Was the old implementation of std::call_once being inlined into callers? Otherwise all code operating on the same once object should be using a common implementation, either the old one or the new one, from libstdc++.
[Bug rtl-optimization/98555] Functions optimized to zero length break function pointer inequality
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98555 --- Comment #3 from Rich Felker --- > Due to "undefined behavior" of course means this isn't unexpected That would only be the case if undefined behavior were reached during execution, but it's not. This bug affects programs that do not and cannot call the zero-length function.
[Bug middle-end/98555] New: Functions optimized to zero length break function pointer inequality
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98555 Bug ID: 98555 Summary: Functions optimized to zero length break function pointer inequality Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

Given a function such as

    void foo()
    {
        __builtin_unreachable();
    }

or optimized to such due to unconditional undefined behavior when the function is reached, GCC emits a zero-length function. This causes the address of foo to be equal to the address of whatever function happens to follow foo, breaking the language requirement that distinct functions' addresses compare not-equal. As far as I can tell, all versions back to 4.x or earlier are affected.
[Bug target/97431] [SH] Python crashes with 'Segmentation fault with -finline-small-functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97431 --- Comment #1 from Rich Felker --- Do you have a complete disassembly of the function it crashed in and register dump at the point of crash? That would help.
[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189 --- Comment #26 from Rich Felker --- Is that complete, or is it unclear whether there are code paths other than builtin memcmp by which this is hit? Am I correct in assuming that with builtin memcmp expansion returning NULL_RTX, GCC always expands it to a function call?
[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189 --- Comment #24 from Rich Felker --- The fixes do not seem trivial to backport; lots of conflicts. It would be really helpful to have versions of the patch that are minimal and applicable to all affected versions that might be shipping in distros (looks like 9.2, 9.3, 10.1, and 10.2), since this is a critical codegen regression.
[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #20 from Rich Felker --- For what it's worth, -fno-builtin is a workaround for this entire class of bug.
[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952 --- Comment #5 from Rich Felker --- The whole point of __has_builtin is to let you avoid the configure-time checks on compilers that support __has_builtin. If __has_builtin doesn't actually work, it's pointless that it even exists and indeed everyone should just pretend it doesn't exist and keep using configure-time checks for everything.
[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421 --- Comment #7 from Rich Felker --- Indeed, the direct clock_gettime syscall stuff is just unnecessary on any modern system, certainly any time64 one. I read the patch briefly and I don't see anywhere it would break anything, but it also wouldn't produce a useful Y2038-ready configuration, so I don't think it makes sense. Configure or source-level assertions should just ensure that, if time_t is larger than long and there's a distinct time64 syscall, the direct syscall is never used.
[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421 --- Comment #4 from Rich Felker --- Actually I didn't see it, I just saw Florian added to CC and it reminded me of the issue, which reminded me I needed to check this for riscv32 issues with the riscv32 port pending merge. :-)
[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421 --- Comment #2 from Rich Felker ---

Rather than #if defined(SYS_futex_time64), I think it should be made:

    #if defined(SYS_futex_time64) && SYS_futex_time64 != SYS_futex

This is in consideration of support for riscv32 and future archs without legacy syscalls. It's my intent in musl to accept the riscv32 port with SYS_futex defined to be equal to SYS_futex_time64; otherwise all software making use of SYS_futex gratuitously breaks.
[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #3 from Rich Felker --- This answer does not seem satisfactory. Whether it will be optimized is not the question. Just whether it's semantically defined. That should either be universally true on GCC versions that offer the builtin (via a libgcc function if nothing else is available) or target-specific (which is known at preprocessing time).
[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921 --- Comment #4 from Rich Felker --- The related issue I meant to link to is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93681 which is for x87, but the equivalent happens on m68k due to FLT_EVAL_METHOD being 2 here as well.
[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921 --- Comment #3 from Rich Felker --- Yes, I'm aware m68k has FLT_EVAL_METHOD=2. That's not license for *functions* to return excess precision. The language specification is very clear about where excess precision is and isn't kept, and here it must not be. All results are deterministic even with excess precision. Moreover if there's excess precision where gcc's middle end didn't expect it, it will turn into cascadingly wrong optimization, possibly even making pure integer results wrong.
[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921 --- Comment #1 from Rich Felker --- I wonder if the fact that GCC thinks the output of the insn is already double suggests other similar bugs in the m68k backend, though... If extended precision were working correctly, I'd think it would at least expect the result to have extended precision and be trying to drop the excess precision separately. But it's not; it's just returning. Here's my test case: double my_sqrt(double x) { return __builtin_sqrt(x); } with -O2 -std=c11 -fno-math-errno -fomit-frame-pointer The last 2 options are non-critical (GCC still uses the inline insn even with -fmath-errno and branches only for the exceptional case) but clean up the output so it's more clear what's going on.
[Bug target/95921] New: [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921 Bug ID: 95921 Summary: [m68k] invalid codegen for __builtin_sqrt Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- On ISA levels below 68040, __builtin_sqrt expands to code that performs an extended-precision sqrt operation rather than a double-precision one. Not only does this give the wrong result; it enables further cascadingly-wrong optimization ala #93806 and related bugs, because the compiler thinks the value in the output register is a double, but it's not. I think the right fix is making the rtl in m68k.md only allow long double operands unless ISA level is at least 68040, in which case the correctly-rounding instruction can be used. Then the standard function will be used instead of a builtin definition, and it can patch up the result accordingly.
[Bug ipa/95558] Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558 --- Comment #3 from Rich Felker --- In addition to a fix, this is going to need a workaround as well. Do you have ideas for a clean one? A dummy asm in the dummy function to kill pureness is certainly a big hammer that would work, but it precludes LTO optimization if the weak definition doesn't actually get replaced, so I don't like that. One idea I think would work, but not sure: make an external __weak_dummy_tail function that all the weak dummies tail call to. This should only take a few bytes more than just returning, and precludes pureness analysis in the TU it's in, while still allowing DCE at LTO time when the definition of __weak_dummy_tail becomes available. Is my reasoning correct here?
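The proposed workaround could be sketched as below. In a real build, __weak_dummy_tail would be defined in a separate translation unit, so the compiler compiling the weak dummy cannot infer pureness from its body; here it is defined locally (with a side effect, purely for illustration) just so the example links and runs.

```c
static int tail_called;

/* Stand-in definition; in the real scheme this lives in another TU and
 * is an empty function, becoming visible (and DCE-able) only at LTO time. */
void __weak_dummy_tail(void)
{
    tail_called = 1;
}

__attribute__((__weak__))
void reclaim_gaps(void)
{
    /* Tail call: costs a few bytes over an empty body, but defeats
     * pureness analysis in this TU. */
    __weak_dummy_tail();
}
```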
[Bug ipa/95558] Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558 --- Comment #2 from Rich Felker --- Wow. It's interesting that we've never seen this lead to incorrect codegen before, though. All weak dummies should be affected, but only in some cases does the pure get used to optimize out the external call. This suggests there's a major missed optimization around pure functions too, in addition to the wrong application of pure (transferring it from the weak definition to the external declaration) that's the bug.
[Bug middle-end/95558] New: Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558 Bug ID: 95558 Summary: Invalid IPA optimizations based on weak definition Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Created attachment 48689 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48689&action=edit test case Here is a case that came up in WIP code on musl libc, where I wanted to provide a weak dummy definition for functionality that would optionally be replaced by a strong definition elsewhere at ld time. I've been looking for some plausible explanation aside from an IPA bug, like interaction with UB, but I can't find any. In the near-minimal test case here, the function reclaim() still has all of the logic it should, but reclaim_gaps gets optimized down to a nop. What seems to be happening is that the dummy weak definition does not leak into its direct caller via IPA optimizations, but does leak to the caller's caller.
[Bug middle-end/95249] Stack protector runtime has to waste one byte on null terminator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249 --- Comment #2 from Rich Felker --- Indeed, using an extra zero pad byte could bump the stack frame size by 4 or 8 or 16 bytes, or could leave it unchanged, depending on alignment prior to adding the byte and the alignment requirements of the target.
[Bug middle-end/95249] New: Stack protector runtime has to waste one byte on null terminator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249 Bug ID: 95249 Summary: Stack protector runtime has to waste one byte on null terminator Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- At least glibc presently stores a null byte in the first byte of the stack protector canary value, so that string-based read overflows can't leak the canary value. On 32-bit targets, this wastes a significant portion of the randomness, making it possible that massive-scale attacks (e.g. against millions of mobile or IoT devices) will have a decent chance of some success bypassing stack protector. musl presently does not zero the first byte, but I received a suggestion that we should do so, and got to thinking about the tradeoffs involved. If GCC would skip one byte below the canary, the full range of values could be used by the stack protector runtime without the risk of string-read-based disclosure. This should be inexpensive in terms of space and time to store a single 0 byte on the stack.
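The glibc-style mitigation described above can be sketched as follows (the function name is hypothetical): zeroing the first, lowest-address byte of the canary stops string-based over-reads from leaking it, at the cost of 8 bits of entropy — the cost the report wants GCC to make avoidable by skipping one byte below the canary instead.

```c
#include <stdint.h>
#include <string.h>

/* harden_canary is an illustrative name, not a real runtime function.
 * It zeroes the first byte in memory order, regardless of endianness. */
uintptr_t harden_canary(uintptr_t random_value)
{
    unsigned char b[sizeof random_value];
    memcpy(b, &random_value, sizeof b);
    b[0] = 0;   /* a string read stops here and cannot leak the rest */
    memcpy(&random_value, b, sizeof b);
    return random_value;
}
```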
[Bug tree-optimization/95097] New: Missed optimization with bitfield value ranges
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95097 Bug ID: 95097 Summary: Missed optimization with bitfield value ranges Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: ---

#include <stdint.h>

struct foo { uint32_t x:20; };

int bar(struct foo f)
{
    if (f.x) {
        uint32_t y = (uint32_t)f.x*4096;
        if (y<200) return 1;
        else return 2;
    }
    return 3;
}

Here, truth of the condition f.x implies y>=4096, but GCC does not DCE the y<200 test and the return 1 codepath. This actually came up in real-world code: I was considering use of an inline function with nontrivial low-size cases when a "page count" bitfield is zero, and expected those nontrivial cases to be optimized out based on the page count already having been tested nonzero, but GCC was unable to do it. LLVM/clang does it.
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 --- Comment #12 from Rich Felker --- There's some awful hand-written asm in libgcc/config/arm/ieee754-df.S replacing the standard libgcc2.c versions; that's the problem. But in order to use the latter it would need to be compiled with -mfloat-abi=softfp since the __aeabi_l2d function (and all the __aeabi_* apparently) use the standard soft-float EABI even on EABIHF targets. I'm not sure why you want a library function to be called for this on hardfloat targets anyway. Inlining the hi*0x1p32+lo is almost surely smaller than the function call, counting spills and conversion of the result back from GP registers to an FP register. It seems like GCC should be able to inline this idiom at a high level for *all* targets that lack a floatdidf operation but have floatsidf. Of course a high level fix is going to be hell to backport, and this really needs a backportable fix or workaround (maintained in mcm not upstream gcc) from musl perspective. Maybe the easiest way to do that is just to hack the right preprocessor conditions for a hardfloat implementation into ieee754-df.S...
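The hi*0x1p32+lo idiom mentioned above can be sketched portably (the function name is illustrative): both half-conversions are exact in double, so the single addition is the only rounding step and therefore honors the current rounding mode, unlike a soft-float __aeabi_l2d call.

```c
#include <stdint.h>

/* Sketch of inlining int64 -> double conversion as hi*0x1p32 + lo.
 * (double)hi * 0x1p32 is exact, (double)lo is exact, and the one
 * addition rounds once in the current rounding mode. */
double int64_to_double(int64_t v)
{
    int32_t hi = (int32_t)(v >> 32);
    uint32_t lo = (uint32_t)v;
    return (double)hi * 0x1p32 + (double)lo;
}
```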
[Bug target/94646] New: [arm] invalid codegen for conversion from 64-bit int to double hardfloat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94646 Bug ID: 94646 Summary: [arm] invalid codegen for conversion from 64-bit int to double hardfloat Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- GCC emits a call to __aeabi_l2d to convert from long long to double. This is invalid for hardfloat ABI because it does not honor rounding modes or raise exception flags. That in turn causes the implementation of fma in musl libc to produce wrong results for non-default rounding modes.
[Bug target/94643] New: [x86_64] gratuitous sign extension of nonnegative value from 32 to 64 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94643 Bug ID: 94643 Summary: [x86_64] gratuitous sign extension of nonnegative value from 32 to 64 bits Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Test case:

#include <stdint.h>

uint16_t a[];

uint64_t f(int i)
{
    return a[i]*16;
}

Produces:

    movslq %edi, %rdi
    movzwl a(%rdi,%rdi), %eax
    sall $4, %eax
    cltq
    ret

The value is necessarily in the range [0,1M) (in particular, nonnegative) and the operation on eax has already cleared the upper bits of rax, so cltq is completely gratuitous. I've observed the same in nontrivial examples where movslq gets used.
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631 --- Comment #8 from Rich Felker --- OK, I think it's in 6.3.1.1 Boolean, characters, and integers, ¶2, but somewhat poorly worded: "The following may be used in an expression wherever an int or unsigned int may be used: - An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int. - A bit-field of type _Bool, int, signed int, or unsigned int. If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions." The first sentence together with the second bullet point suggests it should behave as unsigned int, but the "as restricted by the width, for a bit-field" in the paragraph after the bulleted list seems to confirm your interpretation.
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631 --- Comment #7 from Rich Felker --- Can you provide a citation for that?
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631 --- Comment #5 from Rich Felker --- No, GCC's treatment also seems to mess up bitfields smaller than int and fully governed by the standard (no implementation-defined use of non-int types): struct foo { unsigned x:31; }; struct foo bar = {0}; bar.x-1 should yield UINT_MAX but yields -1 (same representation but different type) because it behaves as a promotion from a phantom type unsigned:31 to int rather than as having type unsigned to begin with. This can of course be observed by comparing it against 0. It's subtle and dangerous because it may also trigger optimization around UB of signed overflow when the correct behavior would be well-defined modular arithmetic.
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631 --- Comment #2 from Rich Felker --- So basically the outcome of DR120 was allowing the GCC behavior? It still seems like a bad thing, not required, and likely to produce exploitable bugs (due to truncation of arithmetic) as well as very poor-performance code (due to constant masking).
[Bug c/94631] New: Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631 Bug ID: 94631 Summary: Wrong codegen for arithmetic on bitfields Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Test case:

struct foo { unsigned long long low:12, hi:52; };

unsigned long long bar(struct foo *p)
{
    return p->hi*4096;
}

Should generate only a mask off of the low bits, but gcc generates code to mask off the low 12 bits and the high 12 bits (reducing the result to 52 bits). Presumably GCC is interpreting the expression p->hi as having a phantom type that's only 52 bits wide, rather than having type unsigned long long. clang/LLVM compiles it correctly. I don't believe there's any language in the standard supporting what GCC is doing here.
[Bug tree-optimization/14441] [tree-ssa] missed sib calling when types change
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14441 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #11 from Rich Felker --- I've hit what seems to be this same issue on x86_64 with minimal test case: long g(void); int f(void) { return g(); } It's actually really annoying because it causes all of the intended tail-call handling of syscall returns in musl to be non-tail calls since __syscall_ret returns long (needed for a few syscalls) but most thin syscall-wrapper functions return int. If the x86_64 version is not this same issue but something separate I can open a new bug for it.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #35 from Rich Felker --- > Oh, your real code is different, and $10 doesn't work for that? I see. No, the real code is exactly that. What you're missing is that the kernel, entered through syscall, has a jump back to the addu after it's clobbered all the registers in the clobberlist if the syscall is interrupted and needs to be restarted.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #33 from Rich Felker --- > An asm clobber just means "may be an output", and no operand will be assigned > a register mentioned in a clobber. There is no magic. This, plus the compiler cannot assume the value in any of the clobbered registers is preserved across the asm statement. > This is inlined just fine? It produces *wrong code* so it doesn't matter if it inlines fine. $10 is modified by the kernel in the event the syscall is restarted, so the wrong value will be loaded on restart.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #30 from Rich Felker --- > You need to make $r10 not a clobber but an inout, of course. And not That's not a correct constraint, because it's clobbered by the kernel between the first syscall instruction's execution and the second execution of the addu instruction after the kernel returns to restart it. $10 absolutely needs to be a clobber because the kernel clobbers it. The asm block can't use any registers the kernel clobbers. > allowing the "i" just costs one more register move, not so bad imo. > So you do have a workaround now. Of course we should see if this can > actually be fixed instead ;-) I don't follow. As long as the "i" gets chosen, the asm inlines nicely. If not, it forces a gratuitous stack frame to spill a non-clobberlisted register to use as the input. The code has been working for the past 8 years with the "0"(r2) input constraint added, and would clearly be valid if r2 were pre-initialized with something.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #28 from Rich Felker --- And it looks like I actually hit this exact bug back in 2012 but misattributed it: https://git.musl-libc.org/cgit/musl/commit/?id=4221f154ff29ab0d6be1e7beaa5ea2d1731bc58e I assumed things went haywire from using two separate "r" constraints, rather than "r" and "0", to bind the same register, but it seems the real problem was that the "=&r"(r2) was not binding at all, and the "0"(r2) served to fix that.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #27 from Rich Felker --- Also just realized: > Rich, forcing "n" to be in "$r10" seems to do the trick? Is that a reasonable solution for you? It doesn't even work, because the syscall clobbers basically all call-clobbered registers. Current kernels are preserving at least $25 (t9) and $28 (gp) and the syscall argument registers, so $25 may be usable, but it was deemed not clear in 2012. I'm looking back through musl git history, and this is actually why the "i" alternative was wanted -- in basically all uses, "i" is satisfiable, and avoids needing to setup a stack frame and spill a call-saved register to the stack in order to use it to hold the syscall number to reload on restart.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #26 from Rich Felker --- Indeed, I just confirmed that binding the n input to a particular register prevents the "i" part of the "ir" alternative from working.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #24 from Rich Felker --- The reasons I was hesitant to force n to a particular register through an extra register __asm__ temp var was that I was unsure how it would interact with the "i" constraint (maybe prevent it from being used?) and that this is code that needs to be inlined all over the place, and adding more specific-register constraints usually hurts register allocation in all functions where it's used. If the "0"(r2) input constraint seems unsafe to rely on with r2 being uninitialized (is this a real concern I should have?) just writing 0 or n to r2 before the asm would only waste one instruction and shouldn't really hurt.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #22 from Rich Felker --- What should I call the new bug? The description sounds the same as this one, and it's fixed in gcc 9.x, just not earlier versions, so it seems to be the same bug.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #19 from Rich Felker --- > This looks like bad inline asm. You seem to be using $2, $8, $9 and $sp > explicitly and not letting the compiler know you are using them. $2, $8, and $9 are all explicitly outputs. All changes to $sp are reversed before the asm ends and there are no memory operands which could be sp-based and thereby invalidated by temp changes to it. > I think you want to change those to %0, %2 and %3 and adding one for $sp? All that does is make the code harder to read and more fragile against changes to the order the constraints are written in. > ...and "n" is an argument register, so why use "ir" for n's constraint? > Shouldn't that just be "r"? Maybe that is confusing IRA/LRA/reload? The code has been reduced as a standalone example that still reproduced the bug, from a static inline function that was inlined into a function with exactly the same signature. The static inline has a constant n after constant propagation for almost all places it gets inlined, so the "ir" constraint makes sense there. However, removing the "i" does not make the problem go away anyway.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #16 from Rich Felker --- > I didn't say this very well... The only issue is using the same hard > register for two different operands. You don't need to do this for > syscalls (and you do not *need* that *ever*, of course). I hit the bug without using the same hard register for two operands. At least I'm pretty sure it's the same bug because the behavior matches and it's present in 6.3.0 but not 9.2.0. > Can you post some code that fails? If you think this is a GCC bug (in > some older branch?) that we should fix, please open a new PR for it. Here's the relevant code extracted out of musl:

#define SYSCALL_CLOBBERLIST \
    "$1", "$3", "$11", "$12", "$13", \
    "$14", "$15", "$24", "$25", "hi", "lo", "memory"

long syscall6(long n, long a, long b, long c, long d, long e, long f)
{
    register long r4 __asm__("$4") = a;
    register long r5 __asm__("$5") = b;
    register long r6 __asm__("$6") = c;
    register long r7 __asm__("$7") = d;
    register long r8 __asm__("$8") = e;
    register long r9 __asm__("$9") = f;
    register long r2 __asm__("$2");
    __asm__ __volatile__ (
        "subu $sp,$sp,32 ; sw $8,16($sp) ; sw $9,20($sp) ; "
        "addu $2,$0,%4 ; syscall ;"
        "addu $sp,$sp,32"
        : "=&r"(r2), "+r"(r7), "+r"(r8), "+r"(r9)
        : "ir"(n), "r"(r4), "r"(r5), "r"(r6)
        : SYSCALL_CLOBBERLIST, "$10");
    return r7 && r2>0 ? -r2 : r2;
}

Built with gcc 6.3.0, %4 ends up expanding to $2, violating the earlyclobber, and %0 gets bound to $16 rather than $2 (which is why the violation is allowed, it seems). With "0"(r2) added to input constraints, the bug goes away. I don't particularly think this bug is something that needs to be fixed in older branches, especially if doing so is hard, but I do think it's something we need a solid reliable workaround for.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #12 from Rich Felker --- > You can work around it on older GCC by simply not using a register var > for more than one asm operand, I think? Nope. Making a syscall inherently requires binding specific registers for all of the inputs/outputs, unless you want to spill everything to an explicit structure in memory and load them all explicitly in the asm block. So it really is a big deal. In particular, all mips variants need an earlyclobber constraint for the output register $2 because the old Linux kernel syscall contract was that, when restartable syscalls were interrupted, the syscall number passed in through $2 was lost, and the kernel returned to $pc-8 and expected a userspace instruction to reload $2 with the syscall number from an immediate or another register. If the input to load into $2 were itself passed in $2 (possible without earlyclobber), the reloading would be ineffective and restarting syscalls would execute the wrong syscall. The original mips port of musl had undocumented and seemingly useless "0"(r2) input constraints that were suppressing this bug, using the input to bind the register where the earlyclobber output failed to do so. After some recent changes broke compatibility with older kernels requiring the above contract, I manually reverted them (due to intervening conflicting diffs) and omitted the seemingly useless constraint, and it broke horribly. Eventually I found this bug searching the tracker. My plan for now is just to add back the "0"(r2) constraint, but since r2 is uninitialized, it's not clear that having it as an input constraint is even well-defined. Is this the best thing to do?
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #10 from Rich Felker --- This is a rather huge bug to have been fixed silently. Could someone who knows the commit that fixed it and information on what versions are affected attach the info to the tracker here? And ideally some information on working around it for older GCCs? From what I can tell experimenting so far, adding a dummy "0"(r0) constraint, or using + instead of =, makes the problem go away, but potentially has other ill effects from use of an uninitialized object..?
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #32 from Rich Felker --- > A slightly modified version of the example, showing the issue with GCC 5 to 7 > (as the noipa attribute directive has been added in GCC 8): Note that __attribute__((__weak__)) necessarily applies noipa and works in basically all GCC versions, so you can use it where you want this kind of example for older GCC.
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #14 from Rich Felker --- Indeed, without Annex F, division by zero is UB, so it's fine to do anything if the program performs division by zero. So we need examples without division by zero.
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #12 from Rich Felker --- To me the meaning of internal consistency is very clear: that the semantics of the C language specification are honored and that the only valid transformations are those that follow the "as-if rule". Since C without Annex F allows arbitrarily awful floating point results, your example in comment 11 is fine. Each instance of 1/a can evaluate to a different value. They could even evaluate to random values. However, if you had written: int b = 1/a == 1/0.; int c = b; return b == c; then the function must necessarily return 1, because the single instance of 1/a==1/0. in the abstract machine has a single value, either 0 or 1, and in the abstract machine that value is stored to b, then copied to c, and b and c necessarily have the same value. While I don't think it's likely that GCC would mess up this specific example, it seems that it currently _can_ make transformations such that a more elaborate version of the same idea would be broken.
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #10 from Rich Felker --- I don't think it's at all clear that -fno-signed-zeros is supposed to mean the programmer is promising that their code has behavior independent of the sign of zeros, and that any construct which would be influenced by the sign of a zero has undefined behavior. I've always read it as a license to optimize in ways that disregard the sign of a zero or change the sign of a zero, but with internal consistency of the program preserved. If -fno-signed-zeros is really supposed to be an option that vastly expands the scope of what's undefined behavior, rather than just removing part of Annex F and allowing the unspecified quality of floating point results that C otherwise allows in the absence of Annex F, it really needs a much much bigger warning in its documentation!
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #25 from Rich Felker --- I think standards-conforming excess precision should be forced on, and added to C++; there are just too many dangerous ways things can break as it is now. If you really think this is a platform of dwindling relevance (though I question that; due to the way patent lifetimes work, the first viable open-hardware x86 clones will almost surely lack sse, no?) then we should not have dangerous hacks for the sake of marginal performance gains, with too few people spending the time to deal with their fallout. I'd be fine with an option to change the behavior of constants, and have it set by default for -std=gnu* as long as the unsafe behavior is removed from -std=gnu*.
[Bug tree-optimization/93682] Wrong optimization: on x87 -fexcess-precision=standard is incompatible with -mpc64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93682 --- Comment #2 from Rich Felker --- I think the underlying issue here is just that -mpc64 (along with -mpc32) is just hopelessly broken and should be documented as such. It could probably be made to work, but there are all sorts of issues like float.h being wrong, math library code breaking, etc. On a more fundamental level (but seemingly unrelated to the mechanism of breakage here), the underlying x87 precision control modes are also hopelessly broken. They're not actually single/double precision modes, but single/double mantissa with ld80 exponent. So I don't think it's possible to make the optimizer aware of them without making it aware of two new floating point formats that it doesn't presently know about. If you just pretended they were single/double, the same sort of issue would arise again as soon as someone uses small or large values that should be denormal/underflow/overflow but which retain their full-precision values by virtue of the excess exponent precision.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #19 from Rich Felker --- Test case provided by Szabolcs Nagy showing that GCC does seem to spill right if it can't assume there's no excess precision to begin with: double h(); double ff(double x, double y) { return x+y+h(); } In theory this doesn't force a spill, but GCC seems to choose to do one, I guess to avoid having to preserve two incoming values (although they're already in stack slots that would be naturally preserved). Here, with -fexcess-precision=standard -O3, GCC 9.2 emits fstpt/fldt.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #18 from Rich Felker --- It was just pointed out to me that this might be an invalid test since GCC assumes (correctly or not) that the return value of a function does not have excess precision. I'll see if I can make a better test.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #17 from Rich Felker --- And indeed you're right that GCC does it wrong. This can be seen from a minimal example: double g(),h(); double f() { return g()+h(); } where gcc emits fstpl/fldl around the second call rather than fstpt/fldt. So this is all even more broken than I thought. It looks like the only way to get deterministic behavior from GCC right now is to get the wrong deterministic behavior via -ffloat-store. Note that libfirm/cparser gets the right result, emitting fstpt/fldt.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #16 from Rich Felker --- > And GCC does not do spills in this format, as see in bug 323. In my experience it seems to (assuming -fexcess-precision=standard), though I have not done extensive testing. I'll check and follow up. > This is conforming as there is no requirement to keep intermediate results in > excess precision and range. Such behavior absolutely is non-conforming. The standard reads (5.2.4.2.2 ¶9): "Except for assignment and cast (which remove all extra range and precision), the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type" Note "are evaluated", not "may be evaluated depending on what spills the compiler chooses to perform".
[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318 --- Comment #9 from Rich Felker --- Indeed, I don't think the ABI says anything about this; a bug against the psABI should probably be opened.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #14 from Rich Felker --- > No problems: FLT_EVAL_METHOD==2 means "evaluate all operations and constants > to the range and precision of the long double type", which is what really > occurs. The consequence is indeed double rounding when storing in memory, but > this can happen at *any* time even without -ffloat-store (due to spilling), > because you are never sure that registers are still available; see some > reports in bug 323. It sounds like you misunderstand the standard's requirements on, and GCC's implementation of, FLT_EVAL_METHOD==2/excess-precision. The availability of registers does not in any way affect the result, because when expressions are evaluated with excess precision, any spills must take place in the format of float_t or double_t (long double) and are thereby transparent to the application. The buggy behavior prior to -fexcess-precision=standard (and now produced with -fexcess-precision=fast which is default in "gnu" modes) spills in the nominal type, producing nondeterministic results that depend on the compiler's transformations and that lead to situations like this bug (where the optimizer has been lied to that two expressions are equal, but they're not). > Double rounding can be a problem with some codes, but this just means that > the code is not compatible with FLT_EVAL_METHOD==2. For some floating-point > algorithms, double rounding is not a problem at all, while keeping a result > in extended precision will make them fail. With standards-conforming behavior, the rounding of an operation and of storage to an object of float/double type are discrete roundings and you can observe and handle the intermediate value between them. With -ffloat-store, every operation inherently has a double-rounding attached to it. This behavior is non-conforming but at least deterministic, and is what I was referring to in my previous comment. 
But I think this is largely a distraction from the issue at hand; I was only pointing out that -ffloat-store is a workaround, but one with its own (often severe) problems.
[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318 --- Comment #7 from Rich Felker --- I'll inquire about it. Note that F.6 already requires this for C functions; the loophole is just that the implementation itself does not inherently have to consist of C functions. If it's determined that C won't require the library functions not bound to IEEE operations to return values representable in their nominal type, then GCC needs to be aware of whether the target libc can be expected to do so, and if not, it needs to, as a special case, assume there might be excess precision in the return value, so that (double)retval==retval can't be assumed to be true in the optimizer. Note that such an option would be nice to have anyway, for arbitrary functions, since it's necessary for being able to call code that was compiled with -fexcess-precision=fast from code that can't accept the non-conforming/optimizer-unsafe behavior and safely use the return value. It should probably be an attribute, with a flag to set the global default. For example, __attribute__((__returns_excess_precision__)).
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #12 from Rich Felker --- Note that -fexcess-precision=standard is not available in C++ mode to fix this. However, -ffloat-store should also ensure consistency to the optimizer (necessary to prevent this bug, and other variants of it, from happening) at the expense of some extreme performance and code size costs and making the floating point results even more semantically incorrect (double-rounding all over the place, mismatching FLT_EVAL_METHOD==2) and -ffloat-store is available in C++ mode. Despite all these nasty effects, it may be a suitable workaround, and at least it avoids letting the optimizer prove 0==1, thereby effectively treating any affected code as if it contained UB. Note that in code written to be excess-precision-aware, making use of float_t and double_t for intermediate operands and only using float and double for in-memory storage, -ffloat-store should yield behavior equivalent to -fexcess-precision=standard.
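A short sketch of the excess-precision-aware style described in that last sentence (the function and its name are illustrative, not from the bug):

```c
#include <math.h>  /* double_t: the evaluation format; long double when
                      FLT_EVAL_METHOD==2 */

/* Intermediates are kept in double_t, so any store the compiler (or
   -ffloat-store) performs happens in the evaluation format and stays
   invisible; only the return narrows, once and explicitly. */
double dot3(const double *a, const double *b) {
    double_t acc = 0;
    for (int i = 0; i < 3; i++)
        acc += (double_t)a[i] * b[i];
    return (double)acc;  /* single, deliberate final rounding */
}
```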
[Bug middle-end/323] optimized code gives strange floating point results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 --- Comment #214 from Rich Felker --- I'm not particular in terms of the path it takes as long as this gets back to a status where it's on the radar for fixing.
[Bug middle-end/323] optimized code gives strange floating point results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 --- Comment #211 from Rich Felker --- If new reports are going to be marked as duplicates of this, then can it please be moved from SUSPENDED status to REOPENED? The situation is far worse than what seems to have been realized the last time this was worked on, as evidenced by pr 85957. These issues just came up again breaking real-world software in https://github.com/OSGeo/PROJ/issues/1906
[Bug c++/93620] New: Floating point is broken in C++ on targets with excess precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93620 Bug ID: 93620 Summary: Floating point is broken in C++ on targets with excess precision Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Attempting to use -fexcess-precision=standard with g++ produces: cc1plus: sorry, unimplemented: '-fexcess-precision=standard' for C++ In light of eldritch horrors like pr 85957 this means floating point is essentially catastrophically broken on i386 and m68k. This came to my attention while analyzing https://github.com/OSGeo/PROJ/issues/1906. Most of the problems stem from g++ incorrectly handling excess precision, and they're having to put awful hacks with volatile objects in place to work around it.
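The volatile hack referred to typically looks like this (an illustrative sketch; `force_eval` is a hypothetical name, not taken from PROJ):

```c
/* Forcing a value through a volatile object strips x87 excess
   precision, at the cost of an extra store/load on every use --
   the kind of workaround C++ code is reduced to without a working
   -fexcess-precision=standard. */
static double force_eval(double x) {
    volatile double v = x;
    return v;
}
```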
[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #5 from Rich Felker --- My understanding is that C2x is fixing this underspecification and will require the library functions to drop excess precision as if they used a return statement. So this really should be fixed in glibc if it's still an issue; if they accept fixing that I don't think GCC needs any action on this. I just fixed it in musl.
[Bug target/65249] unable to find a register to spill in class 'R0_REGS' when compiling protobuf on sh4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65249 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #27 from Rich Felker --- We've hit what seems like almost the exact same issue on gcc 8.3.0 with this minimized testcase: void fg(int *); int get_response(int a) { int b; if (a) fg(); return 0; } compiled with -O -c -fstack-protector-strong for sh2eb-linux-muslfdpic. With gcc 9.2.0 it compiles successfully. I looked for a record of such a fix having been made, but couldn't find one. Was it a known issue that was fixed silently, or might it be a lurking bug that's just no longer being hit?
[Bug middle-end/93509] New: Stack protector should offer trap-only handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93509 Bug ID: 93509 Summary: Stack protector should offer trap-only handling Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Presently stack protector functionality depends on making a call to __stack_chk_fail (possibly via __stack_chk_fail_local to avoid PLT-call-ABI constraint in the caller). This is less secure than it could be, since it depends on the ability to make function calls (and possibly operate on global data and make syscalls in the callee) in a process whose state is compromised. For example the GOT entries used by PLT could be clobbered or %gs:0x10 (i386 syscall vector) could be clobbered by the same stack-based overflow that caused the stack protector event in the first place. In https://gcc.gnu.org/ml/gcc/2020-01/msg00483.html where the topic is being discussed for other reasons (contract between gcc and libc for where these symbols are provided), I proposed that GCC should offer an option to emit a trapping instruction directly, instead of making a function call, analogous to -fsanitize-undefined-trap-on-error for UBSan. This would work well on all targets where __builtin_trap is defined, but would regress (requiring PLT call) on targets where it uses the default abort() definition (are there any relevant ones?). Segher Boessenkool then requested I file this here on the GCC tracker. Note: I'm filing this for middle-end because that was my best guess of where GCC handles it, but it's possible all this logic is repeated in each target or takes place somewhere else entirely; if so please reassign to appropriate component.
[Bug libstdc++/93421] New: futex.cc use of futex syscall is not time64-compatible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421 Bug ID: 93421 Summary: futex.cc use of futex syscall is not time64-compatible Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Created attachment 47704 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47704&action=edit simple fix, not necessarily right for upstream This code directly passes a userspace timespec struct to the SYS_futex syscall, which does not work if the userspace type is 64-bit but the syscall expects legacy 32-bit timespec. I'm attaching the patch we're using in musl-cross-make to fix this. It does not attempt to use the SYS_futex_time64 syscall, since that would require fallback logic with cost tradeoffs for which to try first, and since the timeout is relative and therefore doesn't even need to be 64-bit. Instead it just uses the existence of SYS_futex_time64 to infer that the plain SYS_futex uses a pair of longs, and converts the relative timestamp into that. This assumes that any system where the libc timespec type has been changed for time64 will also have had its headers updated to define SYS_futex_time64. Error handling for extreme out-of-bound values should probably be added.
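A hedged sketch of the conversion described above — the struct and function names are illustrative, not from the attached patch. When SYS_futex_time64 exists, the legacy SYS_futex timeout is known to be a pair of longs, and since the timeout is relative, extreme values can simply saturate (addressing the out-of-bounds concern at the end):

```c
#include <limits.h>
#include <stdint.h>

/* Legacy kernel timespec shape, inferred from SYS_futex_time64's
   existence implying plain SYS_futex predates time64. */
struct kernel_old_timespec { long tv_sec; long tv_nsec; };

/* Narrow a possibly-64-bit relative timeout into the legacy shape.
   Clamping a huge relative timeout to LONG_MAX seconds is harmless:
   the wait is effectively unbounded either way. */
static struct kernel_old_timespec narrow_timeout(int64_t sec, long nsec) {
    struct kernel_old_timespec ts;
    ts.tv_sec = sec > (int64_t)LONG_MAX ? LONG_MAX : (long)sec;
    ts.tv_nsec = nsec;
    return ts;
}
```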
[Bug libstdc++/93325] New: libstdc++ wrongly uses direct clock_gettime syscall on non-glibc, breaks time64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93325 Bug ID: 93325 Summary: libstdc++ wrongly uses direct clock_gettime syscall on non-glibc, breaks time64 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- The configure logic for libstdc++ is choosing to make direct clock_gettime syscalls (via syscall()) rather than using the clock_gettime function except on glibc 2.17 or later (when it was moved from librt to libc). This is incompatible with time64 (because struct timespec mismatches the form the old clock_gettime syscall uses) and also undesirable because it can't take advantage of vdso. The hard-coded glibc version dependency is a configure anti-pattern and should be removed; the right way to test this would be just probing for the clock_gettime function without adding any libs (like -lrt).
[Bug c/61579] -Wwrite-strings does not behave as a warning option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61579 --- Comment #6 from Rich Felker --- Ping.