[Bug target/114641] New: sh: fdpic optimization of function address breaks pointer equality

2024-04-08 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114641

            Bug ID: 114641
           Summary: sh: fdpic optimization of function address breaks
                    pointer equality
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Created attachment 57904
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57904&action=edit
rough fix

For FDPIC targets where the canonical value of function address for equality
purposes is determined by the address of the function descriptor, the function
symbol being locally defined is not a sufficient condition for using a
GOT-relative descriptor address. The address cannot be determined at link-time,
only at ldso-time, and thus must be loaded through the GOT.

sh.c's legitimize_pic_address wrongly optimizes references with
SYMBOL_REF_LOCAL_P to @GOTOFFFUNCDESC form unless they are weak (for
undef-weak reasons), but it also needs to refrain from doing this optimization
if the symbol is external and not hidden.
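The rule above can be modeled as a small predicate. This is a minimal sketch, not the actual sh.c code: struct fdpic_sym, its fields and can_use_gotofffuncdesc are hypothetical stand-ins for what legitimize_pic_address would query (SYMBOL_REF_LOCAL_P, weakness, visibility). The predicate only decides between a link-time GOT-relative descriptor address (@GOTOFFFUNCDESC) and a descriptor address loaded through the GOT.

```c
#include <stdbool.h>

/* Hypothetical summary of what the backend knows about a function symbol. */
struct fdpic_sym {
	bool local;    /* SYMBOL_REF_LOCAL_P: defined in this module */
	bool weak;     /* weak, possibly undefined-weak */
	bool external; /* external linkage, visible outside the module */
	bool hidden;   /* hidden visibility */
};

/* Returns true if the descriptor address may be computed GOT-relative at
   link time (@GOTOFFFUNCDESC), false if it must be loaded from the GOT. */
bool can_use_gotofffuncdesc(const struct fdpic_sym *s)
{
	if (!s->local || s->weak)
		return false;
	/* For an external, non-hidden function the canonical descriptor is
	   only known at ldso time, so pointer equality requires the GOT. */
	if (s->external && !s->hidden)
		return false;
	return true;
}
```

In the test case, main is locally defined but external with default visibility, so under this model the predicate returns false and the descriptor address must come from the GOT.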

The test case I was working with is:

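#define _GNU_SOURCE /* needed for RTLD_DEFAULT on glibc */
#include <stdio.h>
#include <dlfcn.h>

int main()
{
	/* Both values should be the same (canonical) function address. */
	printf("%p %p\n", (void *)main, dlsym(RTLD_DEFAULT, "main"));
}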

but you can see the problem without executing anything, just looking at the
emitted assembly.

The attached patch fixes it but is probably not idiomatic.

Note that there is a related binutils bug that prevents the fix from having an
effect on the test program when linked:

https://sourceware.org/bugzilla/show_bug.cgi?id=31619

With both applied, the linked output is correct too.

[Bug target/114158] Wrong FDPIC special-casing in crtstuff produces invalid pointer in init_array

2024-02-28 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114158

--- Comment #5 from Rich Felker  ---
I don't know how I ended up copying the wrong commit id, but the one I meant to
reference was 9c560cf23996271ee26dfc4a1d8484b85173cd12.

Actually, I do know now. I got it out of the gitweb url which gratuitously has
the parent hash in a place where it's easy to accidentally copy instead of the
hash of the commit you're viewing (one of the many reasons I prefer cgit):

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9c560cf23996271ee26dfc4a1d8484b85173cd12;hp=6bcbf80c6e2bd8a60d88bbcac3d70ffb67f4888f

So indeed, the breakage was detected upstream and worked around, as I said.

[Bug libgcc/114158] New: Wrong FDPIC special-casing in crtstuff produces invalid pointer in init_array

2024-02-28 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114158

            Bug ID: 114158
           Summary: Wrong FDPIC special-casing in crtstuff produces
                    invalid pointer in init_array
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Commit 11189793b6ef60645d5d1126d0bd9d0dd83e6583 introduced wrong special-casing
of FDPIC to __do_global_dtors_aux handling in crtstuff.c. For some reason, it
was assumed that, on FDPIC targets, init/fini arrays would contain instruction
addresses rather than function addresses (which are addresses of descriptors,
on FDPIC targets). This is NOT the case. The gABI contract of the init/fini
arrays is that they contain ABI-callable function pointers, and in fact GCC
correctly emits FUNCDESC-type relocations referencing them when translating
ctors/dtors, on ARM as well as sh.
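To illustrate that contract: on FDPIC targets a function pointer is the address of a descriptor holding the entry point and the callee's GOT pointer, so any caller, including the code that walks init_array, has to use both. The sketch below models this in portable C; struct funcdesc, the fake_got_* objects, model_init_array and call_via_descriptor are hypothetical, and real FDPIC code loads the GOT value into a dedicated register rather than passing it as an argument.

```c
#include <stddef.h>

/* Hypothetical model of an FDPIC function descriptor. */
struct funcdesc {
	void (*entry)(void *got); /* instruction address of the function */
	void *got;                /* GOT pointer the function expects */
};

static int ctor_runs;
static int fake_got_a, fake_got_b; /* stand-ins for two modules' GOTs */

/* Each constructor only "works" if it receives its own GOT pointer. */
static void ctor_a(void *got) { if (got == &fake_got_a) ctor_runs++; }
static void ctor_b(void *got) { if (got == &fake_got_b) ctor_runs++; }

static const struct funcdesc desc_a = { ctor_a, &fake_got_a };
static const struct funcdesc desc_b = { ctor_b, &fake_got_b };

/* Per the gABI, each entry is an ABI-callable function pointer, which on
   FDPIC means a descriptor address, not a bare instruction address. */
static const struct funcdesc *const model_init_array[] = { &desc_a, &desc_b };

/* Calling through a descriptor: take the entry point and hand the callee
   its own GOT pointer. */
static void call_via_descriptor(const struct funcdesc *fd)
{
	fd->entry(fd->got);
}

int run_init_array(void)
{
	size_t n = sizeof model_init_array / sizeof model_init_array[0];
	for (size_t i = 0; i < n; i++)
		call_via_descriptor(model_init_array[i]);
	return ctor_runs;
}
```

Treating an entry of this array as an instruction address instead would jump into the descriptor's data, which is the kind of invalid pointer the report describes.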

It seems to have been realized that this was not working, as
6bcbf80c6e2bd8a60d88bbcac3d70ffb67f4888f disabled initfini arrays on ARM/FDPIC,
but didn't identify the root cause.

Commit 11189793b6ef60645d5d1126d0bd9d0dd83e6583 should be reverted ASAP, and
backported to all maintained versions, as it's actively breaking other targets
by putting an invalid function pointer in the init_array.

Commit 6bcbf80c6e2bd8a60d88bbcac3d70ffb67f4888f should also be reverted in
theory, but may need coordination with uclibc if they want to work around
binaries built with broken versions.

Further discussion of the issue can be found on the musl mailing list, in this
thread where the author of the in-progress xtensa/fdpic port and I were trying
to figure out what's going on here:

https://www.openwall.com/lists/musl/2024/02/28/12

[Bug target/114060] asm constraints getting GOT address for ARM/FDPIC look wrong

2024-02-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114060

--- Comment #2 from Rich Felker  ---
How could there be such a contract? In order to call any other function, the
GOT address of the callee needs to be loaded, replacing the caller's value,
which must be spilled and reloaded if it's needed again -- but if it's not
needed again, it makes sense to just discard it.

On SH (and AFAIK FRV, the original FDPIC), GCC happily discards the FDPIC/GOT
register when it won't be used again.

Maybe as an implementation detail GCC is not doing that on ARM right now, but
if not, that's probably a big missed optimization and not something libgcc
unwinder code should be relying on.

[Bug libgcc/114060] New: asm constraints getting GOT address for ARM/FDPIC look wrong

2024-02-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114060

            Bug ID: 114060
           Summary: asm constraints getting GOT address for ARM/FDPIC look
                    wrong
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Reading the code added for unwind-pe.h for FDPIC, I came across the ARM
implementation that uses FDPIC_REGNUM as an input constraint to __asm to get
the GOT register value. As I understand it, this is not correct, as there is no
contract that this register permanently hold the GOT address for the executing
code; it's just a hidden argument register for making function calls, which the
callee can throw away if it does not need to access the GOT or any global data,
or spill and reload.

To reliably get the GOT register, I think you need to make an actual external
call to an asm function that movs the GOT register to the return-value register
and returns.

[Bug c/113653] Failure to diagnose use of (non-constant-expr) const objects in static initializers

2024-01-29 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653

--- Comment #6 from Rich Felker  ---
I'm aware of the allowance to accept "other forms". It's unfortunately
underspecified (does the implementation need to be specific in what forms?
document them per the normal rules for implementation-defined behavior? etc.)
but indeed it exists.

Regardless, at least -pedantic should diagnose this, because it's a big footgun
for writing code that is not valid C, that only works with certain compilers
that implement C++-like behavior in C. I would also be happy with a separate
warning option controlling it, named something like
-Wextended-constant-expressions.

[Bug c/113653] Failure to diagnose use of (non-constant-expr) const objects in static initializers

2024-01-29 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653

Rich Felker  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|DUPLICATE                   |---
             Status|RESOLVED                    |UNCONFIRMED

--- Comment #4 from Rich Felker  ---
This is NOT a duplicate of the marked bug - that bug was complaining that
invalid code didn't compile.

This bug is that GCC accepts invalid code, even with -pedantic, with no
diagnostic, making it impossible to catch invalid C. This bug bit me in the
wild - I accepted code that should have been rejected as a constraint
violation, and thereby made the project impossible to compile with other
compilers for a couple releases.

In standards-conforming and/or pedantic mode, the code should be rejected.

[Bug c/113653] Failure to diagnose use of (non-constant-expr) const objects in static initializers

2024-01-29 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653

--- Comment #1 from Rich Felker  ---
FWIW -pedantic also does not help.

[Bug c/113653] New: Failure to diagnose use of (non-constant-expr) const objects in static initializers

2024-01-29 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113653

            Bug ID: 113653
           Summary: Failure to diagnose use of (non-constant-expr) const
                    objects in static initializers
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

The following is a constraint violation:

int foo()
{
static const int x = 1;
static const int y = x; // not a constant expression
return y;
}

However, gcc does not diagnose it as such, even with -Wall -Wextra.

This appears to have been a regression somewhere between the gcc 4 era and now.

I'm not sure what component this should be assigned to. I chose "c" because
it's C-specific that this is not a constant expression; in C++ it would be one.
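One portable way to express the intended code in valid C, sketched with a hypothetical enumeration constant X_INIT: an enumeration constant is an integer constant expression, whereas the value of a const-qualified object is not, so both static initializers below are valid C.

```c
/* Hypothetical name; an enumeration constant is an integer constant
   expression in C, unlike the value of a const-qualified object. */
enum { X_INIT = 1 };

int foo(void)
{
	static const int x = X_INIT;
	static const int y = X_INIT; /* valid: initializer is a constant expression */
	(void)x;
	return y;
}
```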

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #62 from Rich Felker  ---
The process described there would have to end at least N bits before the end of
the destination buffer. The point was that it would destroy information
internal to the buffer at each step along the way, before it got to the end.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #60 from Rich Felker  ---
Nobody said anything about writing past end of buffer. Obviously you can't do
that.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2024-01-04 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #57 from Rich Felker  ---
I think one could reasonably envision an implementation that does some sort of
vector loads/stores where, due to some performance constraint or avoiding
special casing for possible page boundary past the end of the copy, it only
wants to load N bits at a time, but the efficient store instruction always
stores a full vector of 2N bits. Of course, one could also argue quite
reasonably that this is a weird enough thing to do that the implementation
should then just check for src==dest and early-out.

I'm far less concerned about whether such mechanical breakage exists, and more
concerned about the consequences of LTO/whole-program-analysis where something
in the translation process can see the violated restrict qualifier, infer UB,
and blow everything up.

The change being requested here is really one of removing the restrict
qualification from the arguments and making a custom weaker condition. This may
in turn have consequences on what types of transformations are possible.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #44 from Rich Felker  ---
My naive expectation is that "if ((uintptr_t)src == 0x400400)" is and should be
UB, but I may be misremembering the details of the formalism by which the spec
for restrict is implemented.

If so, that's kinda a help, but I still think you would want to remove restrict
from the arguments and apply it later, so that the fast-path head/tail copies
can avoid any branch, and the check for equality can be deferred until it's
known that there's a "body remainder" to copy. That's the part where you really
want the benefits of restrict anyway -- without restrict it's not vectorizable
because the compiler has to assume there might be nonexact overlap, in which
case reordering the loads and stores in any way could change the result.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #42 from Rich Felker  ---
> I'm not saying that such an implementation will be a good idea, but just a 
> remark: You could, in fact, keep restrict for the arguments in this case, 
> because the object pointed to by src and dest is not accessed at all when 
> src==dest. So this is correct code according to the standard. (The exact 
> semantics of restrict are a bit involved...)

Nope, UB is invoked as soon as you evaluate src==dest, even with no
dereferencing. The semantics of restrict are such that the behavior of the code
must be unchanged if the pointer were replaced with a pointer to a relocated
copy of the pointed-to object. Since this would alter the result of the ==
operator, that constraint is not satisfied and thereby the behavior is undefined.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #37 from Rich Felker  ---
Also: has anyone even looked at what happens if a C memcpy with proper use of
restrict gets LTO-inlined into a caller with a GCC-generated memcpy call where
src==dest? That sounds like a case likely to blow up...

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #36 from Rich Felker  ---
> the assembly generated by the current implementations already supports that 
> case.

Our memcpy is not written in asm but in C, and it has the restrict qualifier on
src and dest. This entitles a compiler to emit asm equivalent to if (src==dest)
system("rm -rf /") if it likes. I don't know how you can write a valid C
implementation of memcpy that "doesn't care" about 100% overlap without giving
up restrict (and the benefits it entails) entirely.

If you're happy with a branch, you could probably take restrict off the
arguments and do something like:

if (src==dest) return;
const char *restrict src2 = src;
char *restrict dest2 = dest;
...

but that's shoving the branch into memcpy where it's a cost on every caller
making dynamic memcpys with potentially tiny size (like qsort, etc.) and
obeying the contract not to call with overlapping src/dest, rather than just
imposing it on bad callers.
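Expanded into a complete function, the branch-first approach above might look like the following. This is a minimal sketch, not musl's actual memcpy: the name sketch_memcpy is hypothetical, and the byte loop stands in for whatever optimized copy a real implementation would use. restrict is dropped from the parameters so the equality test is well-defined, then reintroduced on local copies of the pointers for the copy itself.

```c
#include <stddef.h>

void *sketch_memcpy(void *dest, const void *src, size_t n)
{
	/* Comparing unqualified parameters is fine; per the argument in this
	   thread it would be UB if the parameters themselves were restrict. */
	if (dest == src)
		return dest;

	/* From here on the regions are assumed not to overlap at all, so
	   restrict-qualified aliases are valid and let the compiler reorder
	   or vectorize the loads and stores. */
	unsigned char *restrict d = dest;
	const unsigned char *restrict s = src;
	for (size_t i = 0; i < n; i++)
		d[i] = s[i];
	return dest;
}
```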

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #28 from Rich Felker  ---
> No, that is not a reasonable fix, because it severely pessimizes common code 
> for a theoretical only problem.

It pessimizes it far less than a call to memmove does (memmove necessarily has
something comparable to that branch, plus other unnecessary ones).

I also disagree that it's severe. On basically any machine with branch
prediction, the branch will be predicted correctly all the time and has
basically zero cost. On the other hand, the branches in memmove could go
different ways depending on the caller, so it's much more
machine-capability-dependent whether they can be predicted.

In some sense the optimal thing to do is "nothing", just assuming it would be
hard to write a memcpy that fails on src==dest. However, at the very least this
precludes hardened memcpy trapping on src==dest, which might be a useful
hardening feature (or rather on a range test for overlapping, which would
happen to also catch exact overlap). So it would be nice if it were fixed.

FWIW, I don't think single branches are relevant to overall performance in
cases where the compiler is doing something reasonable by emitting a call to
memcpy to implement assignment. If the object is small enough that the branch
is relevant, the call overhead is even more of a big deal, and it should be
inlining loads/stores to perform the assignment.

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-22 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #26 from Rich Felker  ---
> The only reasonable fix on the compiler side is to never emit memcpy but 
> always use memmove.

No, it can literally just emit (the equivalent, at whatever intermediate form, of):

cmp src,dst
je 1f
call memcpy
1:

in place of memcpy.

It can even optimize out that in the case where it's provable that they're not
equal, e.g. presence of restrict or one of the two objects not having had its
address taken/leaked.
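At the C level, that sequence corresponds roughly to the guard below. The struct big type and the helper assign_via_memcpy are hypothetical; the point is that the caller, which may be performing a self-assignment such as *p = *q with p == q, skips the call when the pointers are equal, so memcpy is never invoked with exactly overlapping arguments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object large enough that its assignment becomes a call. */
struct big { char data[256]; };

/* What the compiler could emit for *dst = *src when it lowers the
   assignment to a library call: skip the call on self-assignment. */
void assign_via_memcpy(struct big *dst, const struct big *src)
{
	if (dst != src)
		memcpy(dst, src, sizeof *dst);
}
```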

[Bug middle-end/32667] block copy with exact overlap is expanded as memcpy

2023-11-21 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

Rich Felker  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx

--- Comment #24 from Rich Felker  ---
If the copy is such that gcc is happy to emit an external call to memcpy for
it, there is no significant size or performance cost to emitting a branch
checking for equality before making the call, and performing this branch would
greatly optimize the (maybe rare in the caller, maybe not) case of
self-assignment!

On the other hand, expecting the libc memcpy to make this check greatly
pessimizes every reasonable small use of memcpy with a gratuitous branch for
what is undefined behavior and should never appear in any valid program.

Fix it on the compiler side please.

[Bug middle-end/111849] GCC replaces volatile struct assignments with memcpy calls

2023-10-18 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111849

--- Comment #2 from Rich Felker  ---
I agree that volatile isn't the best way to handle memcpy suppression for other
purposes - it was just one of the methods I experimented with that led to me
discovering this issue, which I found surprising and reported.

With regards to impact of this bug, in discussion within the musl libc
community where it was found, I did encounter one potentially affected user who
is using volatile struct stores to write entire bitfields at once on mmio
registers instead of (possibly invalid, at least inefficient) read-modify-write
cycles on each bitfield member. I believe their use was unaffected, probably
because the whole struct is small enough that it gets emitted as direct
load/store rather than a memcpy call.

[Bug target/111849] New: GCC replaces volatile struct assignments with memcpy calls

2023-10-17 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111849

            Bug ID: 111849
           Summary: GCC replaces volatile struct assignments with memcpy
                    calls
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

On at least some targets where GCC compiles struct assignments to memcpy calls,
this pattern is also used when the struct objects involved are
volatile-qualified. This is invalid; the memcpy function has no contract to
work on volatile objects, and making it compatible with volatile objects would
impose extreme implementation constraints that would limit its performance. For
example, memcpy may copy the same byte more than once to avoid branches, may
use special store instructions with particular cache semantics or data transfer
sizes that aren't compatible with various volatile objects like memory-mapped
registers, etc.

I don't think the C standard is very clear on what is supposed to happen for
volatile struct assignments, but they should at least be done in a way that's
known to be compatible with any memory-mapped interfaces supported on the
target architecture, and the safe behavior is probably implementing them as
member-by-member assignment with some fixup for padding.
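A minimal hand-written sketch of the member-by-member approach, assuming a hypothetical register block struct dev_regs with no padding (so no padding fixup is needed): each member is stored through a volatile-qualified lvalue, so each store is an individual access to that member and no memcpy call is involved.

```c
#include <stdint.h>

/* Hypothetical memory-mapped register block (no padding on common ABIs). */
struct dev_regs {
	uint32_t ctrl;
	uint32_t status;
	uint32_t data;
};

/* Equivalent of *regs = *val, done one member at a time so that each
   store is a separate volatile access rather than part of a memcpy. */
void write_regs(volatile struct dev_regs *regs, const struct dev_regs *val)
{
	regs->ctrl = val->ctrl;
	regs->status = val->status;
	regs->data = val->data;
}
```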

I found this while looking at ways to suppress generation of external calls to
memcpy when compiling very restrictive TUs that aren't allowed to make any
external calls, and being surprised that "just add volatile" was not one of the
ways.

I'm filing this as target component because I think the transformation is
taking place at the target backend layer on affected targets rather than
earlier, but I'm not certain. This should be reviewed and possibly reclassified
if that's wrong.

[Bug tree-optimization/107107] [10/11/12/13 Regression] Wrong codegen from TBAA when stores to distinct same-mode types are collapsed?

2022-10-01 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107

--- Comment #7 from Rich Felker  ---
Second one filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115

[Bug middle-end/107115] New: Wrong codegen from TBAA under stores that change effective type?

2022-10-01 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115

            Bug ID: 107115
           Summary: Wrong codegen from TBAA under stores that change
                    effective type?
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Created attachment 53648
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53648&action=edit
original test case by supercat

The attached test case is from user supercat on Stack Overflow (original
source:
https://stackoverflow.com/questions/42178179/will-casting-around-sockaddr-storage-and-sockaddr-in-break-strict-aliasing/42178347?noredirect=1#comment130510083_42178347,
https://godbolt.org/z/jfv1Ge6v4) and demonstrates what appears to be wrong TBAA
optimization on an object with allocated storage whose effective type changes
under stores. It was first presented as another example of this kind of problem
alongside the example that became
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107, but it seems likely that
the root cause is distinct.

Reportedly clang/LLVM also transforms this example wrong.

On 64-bit targets, the test program outputs 2/1 with optimization levels that
enable -fstrict-aliasing. The expected output is 2/2. Using
-fno-strict-aliasing fixes it.

[Bug middle-end/107107] Wrong codegen from TBAA when stores to distinct same-mode types are collapsed?

2022-09-30 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107

--- Comment #1 from Rich Felker  ---
There's also a potentially related test case at https://godbolt.org/z/jfv1Ge6v4
- I'm not yet clear on whether it's likely to have the same root cause.

[Bug middle-end/107107] New: Wrong codegen from TBAA when stores to distinct same-mode types are collapsed?

2022-09-30 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107

            Bug ID: 107107
           Summary: Wrong codegen from TBAA when stores to distinct
                    same-mode types are collapsed?
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Created attachment 53646
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53646&action=edit
original test case by supercat

The attached test case is from user supercat on Stack Overflow (original
source:
https://stackoverflow.com/questions/42178179/will-casting-around-sockaddr-storage-and-sockaddr-in-break-strict-aliasing/42178347?noredirect=1#comment130509588_42178347,
https://godbolt.org/z/83v4ssrn4) and demonstrates wrong TBAA apparently
assuming an object of type long long was not modified after the code path
modifying it was collapsed with a different code path performing the
modification via an lvalue of type long.

On 64-bit targets, the test program outputs 1/2 with optimization levels that
enable -fstrict-aliasing. The expected output is 2/2. Using
-fno-strict-aliasing fixes it.

I have not checked this myself, but according to others who have looked at the
test case, the regression came between GCC 4.7 and 4.8.

[Bug ipa/95558] [9/10/11/12 Regression] Invalid IPA optimizations based on weak definition

2022-01-17 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #11 from Rich Felker  ---
Are you sure? If pure/const discovery is no longer applied to weak definitions,
it shouldn't be able to propagate to a non-inlined caller. Of course the fix
may be incomplete or not working, which I guess we could tell from whether it
happened prior to or after comment 5. :)

[Bug ipa/95558] [9/10/11/12 Regression] Invalid IPA optimizations based on weak definition

2022-01-17 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #9 from Rich Felker  ---
Can you provide a link to the commit that might have fixed it? I imagine it's
simple enough to backport, in which case I'd like to do so.

[Bug ipa/95558] [9/10/11/12 Regression] Invalid IPA optimizations based on weak definition

2022-01-17 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #7 from Rich Felker  ---
> Do weak aliases fall under some implicit ODR here?

The whole definition of "weak" is that it entitles you to make a definition
that will be exempt from ODR, where a non-weak definition, if any, replaces it.
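A minimal illustration of that rule, using the GCC/Clang weak attribute; the names hook and call_hook_twice are hypothetical. The weak body is only a default, and a non-weak definition of hook in another translation unit replaces it at link time, which is why optimizing callers must not rely on the weak body's contents.

```c
/* Weak default: another translation unit may provide a strong definition
   of hook() that replaces this one at link time. */
__attribute__((weak)) int hook(void)
{
	return 0;
}

int call_hook_twice(void)
{
	/* The body above might not be the one that runs, so it cannot be used
	   to conclude that hook() is pure/const or to fold these calls. */
	return hook() + hook();
}
```

Linked on its own, as here, the weak default is used; linking in a strong hook changes the result without recompiling this file.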

[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp

2021-06-23 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189

--- Comment #30 from Rich Felker  ---
This is a critical codegen issue. Is it really still not fixed in 9.4.0?

[Bug rtl-optimization/98555] Functions optimized to zero length break function pointer inequality

2021-03-16 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98555

--- Comment #5 from Rich Felker  ---
Ping. Could this be solved without the need for target-specific logic by, in
some earlier layer, transforming entirely empty function bodies to
__builtin_trap()? (And thereby relying on the target's implementation thereof,
which defaults to a call to abort() if the target doesn't provide one.)
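A source-level analogue of that suggestion, as a sketch with hypothetical functions foo and bar: giving an otherwise empty body an explicit __builtin_trap() should keep the function from being emitted with zero length, since the trap expands to a trap instruction or a call to abort(), and foo then keeps an address distinct from the function that follows it.

```c
/* Written as { __builtin_unreachable(); } this could be emitted with zero
   length; the trap guarantees at least one instruction in the body. */
void foo(void)
{
	__builtin_trap();
}

void bar(void)
{
}
```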

[Bug target/99491] New: [mips64] over-strict refusal to emit tail calls

2021-03-09 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99491

            Bug ID: 99491
           Summary: [mips64] over-strict refusal to emit tail calls
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

mips_function_ok_for_sibcall refuses to generate sibcalls (except locally) on
mips64 due to %gp being call-saved and the possibility that the callee is a
lazy resolver stub. This is presumably correct-ish on dynamic-linked platforms
with lazy resolver, due to the resolver using the caller's value of %gp, but
completely gratuitous on platforms that are static linked or don't use lazy
resolver, such as musl libc.

Moreover, the problem could be fixed even for lazy-resolver targets by
generating an indirect function call reference that forcibly loads the address
and can't go through a lazy resolver, rather than a PLT-like reference that
might.

[Bug libstdc++/66146] call_once not C++11-compliant on ppc64le

2021-02-15 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146

--- Comment #46 from Rich Felker  ---
It's a standard and completely reasonable assumption that, if you statically
linked libstdc++ into your shared library, the copy there is for *internal use
only* and cannot share objects of the standard library's types across
boundaries with other libraries or the main application. The problem only comes
when the library's implementation (via templates or inline code in headers)
imposes the same requirement on normal dynamic linking, where it's a
nonstandard and unreasonable one.

[Bug libstdc++/66146] call_once not C++11-compliant on ppc64le

2021-02-15 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146

--- Comment #44 from Rich Felker  ---
Uhg. I don't know what kind of retroactive fix for that is possible, if any,
but going forward this kind of thing (assumptions that impose ABI boundaries)
should not be inlined by the template. It should just expand to an external
call so that the implementation details can be kept as implementation details
and changed as needed.

[Bug libstdc++/66146] call_once not C++11-compliant on ppc64le

2021-02-15 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146

--- Comment #42 from Rich Felker  ---
I'm confused why this is an ABI boundary at all. Was the old implementation of
std::call_once being inlined into callers? Otherwise all code operating on the
same once object should be using a common implementation, either the old one or
the new one, from libstdc++.

[Bug rtl-optimization/98555] Functions optimized to zero length break function pointer inequality

2021-01-06 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98555

--- Comment #3 from Rich Felker  ---
> Due to "undefined behavior" of course means this isn't unexpected

That would only be the case if undefined behavior were reached during
execution, but it's not. This bug affects programs that do not and cannot call
the zero-length function.

[Bug middle-end/98555] New: Functions optimized to zero length break function pointer inequality

2021-01-05 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98555

            Bug ID: 98555
           Summary: Functions optimized to zero length break function
                    pointer inequality
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Given a function such as

void foo() { __builtin_unreachable(); }

or one that is optimized to such a body because reaching it unconditionally
invokes undefined behavior, GCC emits a zero-length function. This causes the
address of foo to be equal to the address of whatever function happens to
follow foo, breaking the language requirement that distinct functions'
addresses compare not-equal.

As far as I can tell, all versions back to 4.x or earlier are affected.

[Bug target/97431] [SH] Python crashes with 'Segmentation fault with -finline-small-functions

2020-10-14 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97431

--- Comment #1 from Rich Felker  ---
Do you have a complete disassembly of the function it crashed in and register
dump at the point of crash? That would help.

[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp

2020-10-08 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189

--- Comment #26 from Rich Felker  ---
Is that complete, or is it unclear whether there are code paths other than
builtin memcmp by which this is hit? Am I correct in assuming that with builtin
memcmp expansion returning NULL_RTX, GCC always expands it to a function call?

[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp

2020-10-08 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189

--- Comment #24 from Rich Felker  ---
The fixes do not seem trivial to backport; lots of conflicts. It would be
really helpful to have versions of the patch that are minified and applicable
to all affected versions that might be shipping in distros (looks like 9.2,
9.3, 10.1, and 10.2) since this is a critical codegen regression.

[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp

2020-10-07 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #20 from Rich Felker  ---
For what it's worth, -fno-builtin is a workaround for this entire class of bug.

[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed

2020-09-29 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

--- Comment #5 from Rich Felker  ---
The whole point of __has_builtin is to let you avoid the configure-time checks
on compilers that support __has_builtin. If __has_builtin doesn't actually
work, it's pointless that it even exists and indeed everyone should just
pretend it doesn't exist and keep using configure-time checks for everything.
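For reference, here is a sketch of the kind of probe the comment has in mind. HAVE_BUILTIN_THREAD_POINTER is a hypothetical macro name. The nested #if keeps compilers that lack __has_builtin from ever parsing the function-like use of it. The complaint in this bug is that, with GCC, the inner test's answer does not reliably tell you whether a call to the builtin will actually compile on the current target. That is exactly the question a configure-time check would otherwise answer.

```c
/* Probe for __builtin_thread_pointer at preprocessing time. The nested #if
   avoids a parse error on compilers that do not define __has_builtin. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_thread_pointer)
#    define HAVE_BUILTIN_THREAD_POINTER 1
#  endif
#endif
#ifndef HAVE_BUILTIN_THREAD_POINTER
#  define HAVE_BUILTIN_THREAD_POINTER 0
#endif

/* Expose the probe result; this sketch never calls the builtin itself. */
int have_builtin_thread_pointer(void)
{
    return HAVE_BUILTIN_THREAD_POINTER;
}
```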

[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible

2020-09-24 Thread bugdal at aerifal dot cx via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #7 from Rich Felker  ---
Indeed, the direct clock_gettime syscall stuff is just unnecessary on any
modern system, certainly any time64 one. I read the patch briefly and I don't
see anywhere it would break anything, but it also wouldn't produce a useful
Y2038-ready configuration, so I don't think it makes sense. Configure or
source-level assertions should just ensure that, if time_t is larger than long
and there's a distinct time64 syscall, the direct syscall is never used.

[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible

2020-09-23 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #4 from Rich Felker  ---
Actually I didn't see it, I just saw Florian added to CC and it reminded me of
the issue, which reminded me I needed to check this for riscv32 issues with the
riscv32 port pending merge. :-)

[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible

2020-09-23 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #2 from Rich Felker  ---
Rather than #if defined(SYS_futex_time64), I think it should be made:

#if defined(SYS_futex_time64) && SYS_futex_time64 != SYS_futex

This is in consideration of support for riscv32 and future archs without legacy
syscalls. It's my intent in musl to accept the riscv32 port with SYS_futex
defined to be equal to SYS_futex_time64; otherwise all software making use of
SYS_futex gratuitously breaks.
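Below is a sketch of the guard proposed above, written as a helper that selects the syscall number. It assumes Linux headers, and futex_syscall_number is a hypothetical name. The !defined(SYS_futex) clause only makes explicit what the original condition already does when SYS_futex is undefined, since SYS_futex then evaluates to 0 in #if. The helper selects the number only. A real implementation would also have to pass a timespec layout matching the chosen syscall, which this sketch does not attempt.

```c
#include <sys/syscall.h>

/* Use the time64 futex syscall only when it exists and is a distinct number
   from SYS_futex. On archs where SYS_futex is defined equal to
   SYS_futex_time64 (as planned for musl's riscv32 port), there is no
   separate legacy syscall, and the plain name is used. */
long futex_syscall_number(void)
{
#if defined(SYS_futex_time64) && (!defined(SYS_futex) || SYS_futex_time64 != SYS_futex)
    return SYS_futex_time64;
#else
    return SYS_futex;
#endif
}
```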

[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed

2020-09-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #3 from Rich Felker  ---
This answer does not seem satisfactory. Whether it will be optimized is not the
question. Just whether it's semantically defined. That should either be
universally true on GCC versions that offer the builtin (via a libgcc function
if nothing else is available) or target-specific (which is known at
preprocessing time).

[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt

2020-07-01 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #4 from Rich Felker  ---
The related issue I meant to link to is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93681 which is for x87, but the
equivalent happens on m68k due to FLT_EVAL_METHOD being 2 here as well.

[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt

2020-06-27 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #3 from Rich Felker  ---
Yes, I'm aware m68k has FLT_EVAL_METHOD=2. That's not a license for *functions* to
return excess precision. The language specification is very clear about where
excess precision is and isn't kept, and here it must not be. All results are
deterministic even with excess precision. Moreover if there's excess precision
where gcc's middle end didn't expect it, it will turn into cascadingly wrong
optimization, possibly even making pure integer results wrong.

[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt

2020-06-26 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #1 from Rich Felker  ---
I wonder if the fact that GCC thinks the output of the insn is already double
suggests other similar bugs in the m68k backend, though... If extended
precision were working correctly, I'd think it would at least expect the result
to have extended precision and be trying to drop the excess precision
separately. But it's not; it's just returning. Here's my test case:

double my_sqrt(double x)
{
    return __builtin_sqrt(x);
}

with -O2 -std=c11 -fno-math-errno -fomit-frame-pointer

The last 2 options are non-critical (GCC still uses the inline insn even with
-fmath-errno and branches only for the exceptional case) but clean up the
output so it's more clear what's going on.

[Bug target/95921] New: [m68k] invalid codegen for __builtin_sqrt

2020-06-26 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

Bug ID: 95921
   Summary: [m68k] invalid codegen for __builtin_sqrt
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

On ISA levels below 68040, __builtin_sqrt expands to code that performs an
extended-precision sqrt operation rather than a double-precision one. Not only
does this give the wrong result; it enables further cascadingly-wrong
optimization ala #93806 and related bugs, because the compiler thinks the value
in the output register is a double, but it's not.

I think the right fix is making the rtl in m68k.md only allow long double
operands unless ISA level is at least 68040, in which case the
correctly-rounding instruction can be used. Then the standard function will be
used instead of a builtin definition, and it can patch up the result
accordingly.

[Bug ipa/95558] Invalid IPA optimizations based on weak definition

2020-06-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #3 from Rich Felker  ---
In addition to a fix, this is going to need a workaround as well. Do you have
ideas for a clean one? A dummy asm in the dummy function to kill pureness is
certainly a big hammer that would work, but it precludes LTO optimization if
the weak definition doesn't actually get replaced, so I don't like that.

One idea I think would work, but not sure: make an external __weak_dummy_tail
function that all the weak dummies tail call to. This should only take a few
bytes more than just returning, and precludes pureness analysis in the TU it's
in, while still allowing DCE at LTO time when the definition of
__weak_dummy_tail becomes available.

Is my reasoning correct here?
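Here is a sketch of the workaround proposed above, with a hypothetical reclaim_hook standing in for one of the weak dummies. In the real scheme, __weak_dummy_tail would be defined in its own translation unit. While compiling the dummy, GCC then cannot conclude that it is pure, yet at LTO time the empty definition becomes visible and the call can still be removed. It is defined at the bottom here only so the example links on its own.

```c
/* External no-op that every weak dummy tail-calls. Its body should not be
   visible in the translation units that contain the dummies. */
void __weak_dummy_tail(void);

/* Hypothetical weak placeholder; a strong definition of reclaim_hook
   elsewhere replaces it at link time. Instead of simply returning, it
   tail-calls the external no-op, which blocks pureness inference. */
__attribute__((__weak__)) void reclaim_hook(void)
{
    __weak_dummy_tail();
}

/* Defined here only so this example links standalone; in the real scheme
   this lives in a separate .c file. */
void __weak_dummy_tail(void)
{
}
```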

[Bug ipa/95558] Invalid IPA optimizations based on weak definition

2020-06-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #2 from Rich Felker  ---
Wow. It's interesting that we've never seen this lead to incorrect codegen
before, though. All weak dummies should be affected, but only in some cases
does the pure get used to optimize out the external call.

This suggests there's a major missed optimization around pure functions too, in
addition to the wrong application of pure (transferring it from the weak
definition to the external declaration) that's the bug.

[Bug middle-end/95558] New: Invalid IPA optimizations based on weak definition

2020-06-05 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

Bug ID: 95558
   Summary: Invalid IPA optimizations based on weak definition
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Created attachment 48689
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48689&action=edit
test case

Here is a case that came up in WIP code on musl libc, where I wanted to provide
a weak dummy definition for functionality that would optionally be replaced by
a strong definition elsewhere at ld time. I've been looking for some plausible
explanation aside from an IPA bug, like interaction with UB, but I can't find
any.

In the near-minimal test case here, the function reclaim() still has all of the
logic it should, but reclaim_gaps gets optimized down to a nop.

What seems to be happening is that the dummy weak definition does not leak into
its direct caller via IPA optimizations, but does leak to the caller's caller.

[Bug middle-end/95249] Stack protector runtime has to waste one byte on null terminator

2020-05-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249

--- Comment #2 from Rich Felker  ---
Indeed, using an extra zero pad byte could bump the stack frame size by 4 or 8
or 16 bytes, or could leave it unchanged, depending on alignment prior to
adding the byte and the alignment requirements of the target.

[Bug middle-end/95249] New: Stack protector runtime has to waste one byte on null terminator

2020-05-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249

Bug ID: 95249
   Summary: Stack protector runtime has to waste one byte on null
terminator
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

At least glibc presently stores a null byte in the first byte of the stack
protector canary value, so that string-based read overflows can't leak the
canary value. On 32-bit targets, this wastes a significant portion of the
randomness, making it possible that massive-scale attacks (e.g. against
millions of mobile or IoT devices) will have a decent chance of some success
bypassing stack protector. musl presently does not zero the first byte, but I
received a suggestion that we should do so, and got to thinking about the
tradeoffs involved.

If GCC would skip one byte below the canary, the full range of values could be
used by the stack protector runtime without the risk of string-read-based
disclosure. Storing a single 0 byte on the stack should be inexpensive in both
space and time.
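The following sketch illustrates the tradeoff. It assumes a little-endian target, where the "first" byte of the canary in memory is its least significant byte. The function names are hypothetical and are not glibc's or musl's. Zeroing that byte keeps a string over-read from reaching the rest of the canary, but it leaves only 24 random bits on a 32-bit target.

```c
#include <stdint.h>

/* Zero the lowest-addressed byte of the canary (assuming little-endian),
   so a string-based over-read stops at it. */
uintptr_t canary_with_nul_byte(uintptr_t random_bits)
{
    return random_bits & ~(uintptr_t)0xff;
}

/* Random bits remaining after zeroing one byte: 24 on 32-bit targets,
   56 on 64-bit targets. */
unsigned canary_entropy_bits(void)
{
    return (unsigned)(sizeof(uintptr_t) * 8 - 8);
}
```

With 24 random bits, a single blind guess succeeds with probability 2^-24, about 1 in 16.8 million, versus 2^-32, about 1 in 4.3 billion, when all 32 bits are random.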

[Bug tree-optimization/95097] New: Missed optimization with bitfield value ranges

2020-05-12 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95097

Bug ID: 95097
   Summary: Missed optimization with bitfield value ranges
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

#include <stdint.h>
struct foo {
    uint32_t x:20;
};
int bar(struct foo f)
{
    if (f.x) {
        uint32_t y = (uint32_t)f.x*4096;
        if (y<200) return 1;
        else return 2;
    }
    return 3;
}

Here, truth of the condition f.x implies y>=4096, but GCC does not DCE the
y<200 test and return 1 codepath.

I actually had this come up in real world code, where I was considering use of
an inline function with nontrivial low size cases when a "page count" bitfield
is zero. I expected these nontrivial cases to be optimized out because I had
already tested that the page count was nonzero, but GCC was unable to do it.
LLVM/clang does it.
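Here is a sketch that makes the missed fold concrete. Inside the if, f.x is in [1, 2^20 - 1], so y = f.x*4096 is in [4096, 2^32 - 4096] with no wraparound, and y<200 can never be true. bar_folded, a hypothetical name, is what value-range propagation could reduce bar to. The helper checks the equivalence for every 20-bit value.

```c
#include <stdint.h>

struct foo {
    uint32_t x:20;
};

/* The function from the report. */
int bar(struct foo f)
{
    if (f.x) {
        uint32_t y = (uint32_t)f.x * 4096;
        if (y < 200) return 1;
        else return 2;
    }
    return 3;
}

/* The reduced form range analysis could derive: the "return 1" path is dead. */
int bar_folded(struct foo f)
{
    return f.x ? 2 : 3;
}

/* Exhaustively compare both versions over all 2^20 field values. */
int bar_equivalent(void)
{
    for (uint32_t v = 0; v < (1u << 20); v++) {
        struct foo f = { .x = v };
        if (bar(f) != bar_folded(f))
            return 0;
    }
    return 1;
}
```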

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2020-04-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #12 from Rich Felker  ---
There's some awful hand-written asm in libgcc/config/arm/ieee754-df.S replacing
the standard libgcc2.c versions; that's the problem. But in order to use the
latter it would need to be compiled with -mfloat-abi=softfp since the
__aeabi_l2d function (and all the __aeabi_* apparently) use the standard
soft-float EABI even on EABIHF targets.

I'm not sure why you want a library function to be called for this on hardfloat
targets anyway. Inlining the hi*0x1p32+lo is almost surely smaller than the
function call, counting spills and conversion of the result back from GP
registers to an FP register. It seems like GCC should be able to inline this
idiom at a high level for *all* targets that lack a floatdidf operation but
have floatsidf.

Of course a high level fix is going to be hell to backport, and this really
needs a backportable fix or workaround (maintained in mcm not upstream gcc)
from musl perspective. Maybe the easiest way to do that is just to hack the
right preprocessor conditions for a hardfloat implementation into
ieee754-df.S...
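Here is a sketch of the hi*0x1p32+lo idiom in C, using the hypothetical name ll_to_double. (double)hi * 0x1p32 is exact, because hi has at most 32 significant bits and scaling by a power of two is exact. (double)lo is exact too. The final addition is therefore the only rounding step, and it honors whatever rounding mode is in effect. The sketch assumes the compiler implements >> on a negative long long as an arithmetic shift, which GCC does.

```c
#include <stdint.h>

/* Convert a 64-bit integer to double using only 32-bit int-to-double
   conversions plus one multiply by a power of two and one addition. */
double ll_to_double(long long x)
{
    int32_t hi = (int32_t)(x >> 32);   /* signed high half; arithmetic shift assumed */
    uint32_t lo = (uint32_t)x;         /* unsigned low half, always nonnegative */
    return (double)hi * 0x1p32 + (double)lo;
}
```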

[Bug target/94646] New: [arm] invalid codegen for conversion from 64-bit int to double hardfloat

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94646

Bug ID: 94646
   Summary: [arm] invalid codegen for conversion from 64-bit int
to double hardfloat
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

GCC emits a call to __aeabi_l2d to convert from long long to double. This is
invalid for hardfloat ABI because it does not honor rounding modes or raise
exception flags. That in turn causes the implementation of fma in musl libc to
produce wrong results for non-default rounding modes.

[Bug target/94643] New: [x86_64] gratuitous sign extension of nonnegative value from 32 to 64 bits

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94643

Bug ID: 94643
   Summary: [x86_64] gratuitous sign extension of nonnegative
value from 32 to 64 bits
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Test case:

#include <stdint.h>
uint16_t a[];
uint64_t f(int i)
{
    return a[i]*16;
}

Produces:

    movslq  %edi, %rdi
    movzwl  a(%rdi,%rdi), %eax
    sall    $4, %eax
    cltq
    ret

The value is necessarily in the range [0,1M) (in particular, nonnegative) and
operation on eax has already cleared the upper bits of rax, so cltq is
completely gratuitous. I've observed the same in nontrivial examples where
movslq gets used.
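The following is a small exhaustive check of the range argument above; the function name is hypothetical. After integer promotion, a[i]*16 is an int in [0, 1048560]. Sign-extending it to 64 bits, which is what cltq does, therefore yields the same value as the zero extension that the 32-bit sall already performed on rax.

```c
#include <stdint.h>

/* For every possible uint16_t element, check that the promoted product is
   nonnegative and below 1<<20, so sign and zero extension agree. */
int sign_extension_is_redundant(void)
{
    for (uint32_t v = 0; v <= UINT16_MAX; v++) {
        int product = (uint16_t)v * 16;   /* promoted to int, as in f() */
        uint64_t sign_ext = (uint64_t)(int64_t)product;
        uint64_t zero_ext = (uint64_t)(uint32_t)product;
        if (product < 0 || product >= (1 << 20) || sign_ext != zero_ext)
            return 0;
    }
    return 1;
}
```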

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #8 from Rich Felker  ---
OK, I think it's in 6.3.1.1 Boolean, characters, and integers, ¶2, but somewhat
poorly worded:

"The following may be used in an expression wherever an int or unsigned int may
be used: 

- An object or expression with an integer type (other than int or unsigned int)
whose integer conversion rank is less than or equal to the rank of int and
unsigned int.
- A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the
width, for a bit-field), the value is converted to an int; otherwise, it is
converted to an unsigned int. These are called the integer promotions."

The first sentence with second bullet point suggests it should behave as
unsigned int, but the "as restricted by the width, for a bit-field" in the
paragraph after the bulleted list seems to confirm your interpretation.

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #7 from Rich Felker  ---
Can you provide a citation for that?

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #5 from Rich Felker  ---
No, GCC's treatment also seems to mess up bitfields smaller than int and fully
governed by the standard (no implementation-defined use of non-int types):

struct foo {
    unsigned x:31;
};

struct foo bar = {0};

bar.x-1 should yield UINT_MAX but yields -1 (same representation but different
type) because it behaves as a promotion from a phantom type unsigned:31 to int
rather than as having type unsigned to begin with.

This can of course be observed by comparing it against 0. It's subtle and
dangerous because it may also trigger optimization around UB of signed overflow
when the correct behavior would be well-defined modular arithmetic.
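The sketch below illustrates the reading of 6.3.1.1 ¶2 that comment #8 above ends up accepting. Because int can represent every value of a 31-bit unsigned bit-field, bar.x is promoted to int, so bar.x-1 is -1 rather than UINT_MAX. Converting to unsigned first is one portable way to get the modular result. The helper names are hypothetical.

```c
#include <limits.h>

struct foo {
    unsigned x:31;
};

struct foo bar = {0};

/* bar.x promotes to int (all 31-bit values fit), so this is 0 - 1 == -1. */
int bitfield_minus_one_is_negative(void)
{
    return bar.x - 1 < 0;
}

/* Converting to unsigned before subtracting gives wraparound to UINT_MAX. */
unsigned bitfield_minus_one_unsigned(void)
{
    return (unsigned)bar.x - 1;
}
```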

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-16 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #2 from Rich Felker  ---
So basically the outcome of DR120 was allowing the GCC behavior? It still seems
like a bad thing, not required, and likely to produce exploitable bugs (due to
truncation of arithmetic) as well as very poor-performance code (due to
constant masking).

[Bug c/94631] New: Wrong codegen for arithmetic on bitfields

2020-04-16 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

Bug ID: 94631
   Summary: Wrong codegen for arithmetic on bitfields
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Test case:

struct foo {
    unsigned long long low:12, hi:52;
};
unsigned long long bar(struct foo *p)
{
    return p->hi*4096;
}

Should generate only a mask off of the low bits, but gcc generates code to mask
off the low 12 bits and the high 12 bits (reducing the result to 52 bits).
Presumably GCC is interpreting the expression p->hi as having a phantom type
that's only 52 bits wide, rather than having type unsigned long long.

clang/LLVM compiles it correctly.

I don't believe there's any language in the standard supporting what GCC is
doing here.
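Here is a sketch of a source-level workaround, using the hypothetical name bar_cast. Converting the 52-bit field to unsigned long long before multiplying gives the expression the full 64-bit type. The product then wraps modulo 2^64, and no 52-bit truncation applies. This sketch does not check whether GCC then emits only the single low-bit mask the report expects.

```c
struct foo {
    unsigned long long low:12, hi:52;
};

/* Multiply in unsigned long long rather than in the bit-field's width. */
unsigned long long bar_cast(struct foo *p)
{
    return (unsigned long long)p->hi * 4096;
}
```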

[Bug tree-optimization/14441] [tree-ssa] missed sib calling when types change

2020-04-16 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14441

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #11 from Rich Felker  ---
I've hit what seems to be this same issue on x86_64 with minimal test case:

long g(void);
int f(void)
{
    return g();
}

It's actually really annoying because it causes all of the intended tail-call
handling of syscall returns in musl to be non-tail calls since __syscall_ret
returns long (needed for a few syscalls) but most thin syscall-wrapper
functions return int.

If the x86_64 version is not this same issue but something separate I can open
a new bug for it.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-15 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #35 from Rich Felker  ---
> Oh, your real code is different, and $10 doesn't work for that?  I see.

No, the real code is exactly that. What you're missing is that the kernel,
entered through syscall, has a jump back to the addu after it's clobbered all
the registers in the clobberlist if the syscall is interrupted and needs to be
restarted.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #33 from Rich Felker  ---
> An asm clobber just means "may be an output", and no operand will be assigned
> a register mentioned in a clobber.  There is no magic.

This, plus the compiler cannot assume the value in any of the clobbered
registers is preserved across the asm statement.

> This is inlined just fine?

It produces *wrong code* so it doesn't matter if it inlines fine. $10 is
modified by the kernel in the event the syscall is restarted, so the wrong
value will be loaded on restart.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #30 from Rich Felker  ---
> You need to make $r10 not a clobber but an inout, of course.  And not

That's not a correct constraint, because it's clobbered by the kernel between
the first syscall instruction's execution and the second execution of the addu
instruction after the kernel returns to restart it. $10 absolutely needs to be
a clobber because the kernel clobbers it. The asm block can't use any registers
the kernel clobbers.

> allowing the "i" just costs one more register move, not so bad imo.
> So you do have a workaround now.  Of course we should see if this can
> actually be fixed instead ;-)

I don't follow. As long as the "i" gets chosen, the asm inlines nicely. If not,
it forces a gratuitous stack frame to spill a non-clobberlisted register to use
as the input.

The code has been working for the past 8 years with the "0"(r2) input
constraint added, and would clearly be valid if r2 were pre-initialized with
something.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #28 from Rich Felker  ---
And it looks like I actually hit this exact bug back in 2012 but misattributed
it:

https://git.musl-libc.org/cgit/musl/commit/?id=4221f154ff29ab0d6be1e7beaa5ea2d1731bc58e

I assumed things went haywire from using two separate "r" constraints, rather
than "r" and "0", to bind the same register, but it seems the real problem was
that the "=&r"(r2) was not binding at all, and the "0"(r2) served to fix that.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #27 from Rich Felker  ---
Also just realized:

> Rich, forcing "n" to be in "$r10" seems to do the trick?  Is that a reasonable
solution for you?

It doesn't even work, because the syscall clobbers basically all call-clobbered
registers. Current kernels are preserving at least $25 (t9) and $28 (gp) and
the syscall argument registers, so $25 may be usable, but it was deemed not
clear in 2012. I'm looking back through musl git history, and this is actually
why the "i" alternative was wanted -- in basically all uses, "i" is
satisfiable, and avoids needing to setup a stack frame and spill a call-saved
register to the stack in order to use it to hold the syscall number to reload
on restart.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #26 from Rich Felker  ---
Indeed, I just confirmed that binding the n input to a particular register
prevents the "i" part of the "ir" alternative from working.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #24 from Rich Felker  ---
The reasons I was hesitant to force n to a particular register through an extra
register __asm__ temp var was that I was unsure how it would interact with the
"i" constraint (maybe prevent it from being used?) and that this is code that
needs to be inlined all over the place, and adding more specific-register
constraints usually hurts register allocation in all functions where it's used.

If the "0"(r2) input constraint seems unsafe to rely on with r2 being
uninitialized (is this a real concern I should have?) just writing 0 or n to r2
before the asm would only waste one instruction and shouldn't really hurt.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #22 from Rich Felker  ---
What should I call the new bug? The description sounds the same as this one,
and it's fixed in gcc 9.x, just not earlier versions, so it seems to be the
same bug.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #19 from Rich Felker  ---
> This looks like bad inline asm.  You seem to be using $2, $8, $9 and $sp 
> explicitly and not letting the compiler know you are using them.

$2, $8, and $9 are all explicitly outputs. All changes to $sp are reversed
before the asm ends and there are no memory operands which could be sp-based
and thereby invalidated by temp changes to it.

> I think you want to change those to %0, %2 and %3 and adding one for $sp?

All that does is make the code harder to read and more fragile against changes
to the order in which the constraints are written.

> ...and "n" is an argument register, so why use "ir" for n's constraint? 
> Shouldn't that just be "r"?  Maybe that is confusing IRA/LRA/reload?

The code was reduced to a standalone example that still reproduces the
bug, from a static inline function that was inlined into a function with
exactly the same signature. The static inline has a constant n after constant
propagation in almost all places it gets inlined, so its "ir" constraint makes
sense there. However, removing the "i" does not make the problem go away
anyway.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #16 from Rich Felker  ---
> I didn't say this very well...  The only issue is using the same hard
> register for two different operands.  You don't need to do this for
> syscalls (and you do not *need* that *ever*, of course).

I hit the bug without using the same hard register for two operands. At least
I'm pretty sure it's the same bug because the behavior matches and it's present
in 6.3.0 but not 9.2.0.

> Can you post some code that fails?  If you think this is a GCC bug (in
> some older branch?) that we should fix, please open a new PR for it.

Here's the relevant code extracted out of musl:

#define SYSCALL_CLOBBERLIST \
    "$1", "$3", "$11", "$12", "$13", \
    "$14", "$15", "$24", "$25", "hi", "lo", "memory"

long syscall6(long n, long a, long b, long c, long d, long e, long f)
{
    register long r4 __asm__("$4") = a;
    register long r5 __asm__("$5") = b;
    register long r6 __asm__("$6") = c;
    register long r7 __asm__("$7") = d;
    register long r8 __asm__("$8") = e;
    register long r9 __asm__("$9") = f;
    register long r2 __asm__("$2");
    __asm__ __volatile__ (
        "subu $sp,$sp,32 ; sw $8,16($sp) ; sw $9,20($sp) ; "
        "addu $2,$0,%4 ; syscall ;"
        "addu $sp,$sp,32"
        : "=&r"(r2), "+r"(r7), "+r"(r8), "+r"(r9)
        : "ir"(n), "r"(r4), "r"(r5), "r"(r6)
        : SYSCALL_CLOBBERLIST, "$10");
    return r7 && r2>0 ? -r2 : r2;
}

Built with gcc 6.3.0, %4 ends up expanding to $2, violating the earlyclobber,
and %0 gets bound to $16 rather than $2 (which is why the violation is allowed,
it seems).

With "0"(r2) added to input constraints, the bug goes away.

I don't particularly think this bug is something that needs to be fixed in
older branches, especially if doing so is hard, but I do think it's something
we need a solid reliable workaround for.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #12 from Rich Felker  ---
> You can work around it on older GCC by simply not using a register var
> for more than one asm operand, I think?

Nope. Making a syscall inherently requires binding specific registers for all
of the inputs/outputs, unless you want to spill everything to an explicit
structure in memory and load them all explicitly in the asm block. So it really
is a big deal.

In particular, all mips variants need an earlyclobber constraint for the output
register $2 because the old Linux kernel syscall contract was that, when
restartable syscalls were interrupted, the syscall number passed in through $2
was lost, and the kernel returned to $pc-8 and expected a userspace instruction
to reload $2 with the syscall number from an immediate or another register. If
the input to load into $2 were itself passed in $2 (possible without
earlyclobber), the reloading would be ineffective and restarting syscalls would
execute the wrong syscall.

The original mips port of musl had undocumented and seemingly useless "0"(r2)
input constraints that were suppressing this bug, using the input to bind the
register where the earlyclobber output failed to do so. After some recent
changes broke compatibility with older kernels requiring the above contract, I
manually reverted them (due to intervening conflicting diffs) and omitted the
seemingly useless constraint, and it broke horribly. Eventually I found this
bug searching the tracker. My plan for now is just to add back the "0"(r2)
constraint, but since r2 is uninitialized, it's not clear that having it as an
input constraint is even well-defined. Is this the best thing to do?

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-13 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #10 from Rich Felker  ---
This is a rather huge bug to have been fixed silently. Could someone who knows
the commit that fixed it and information on what versions are affected attach
the info to the tracker here? And ideally some information on working around it
for older GCCs?

From what I can tell experimenting so far, adding a dummy "0"(r0) constraint,
or using + instead of =, makes the problem go away, but potentially has other
ill effects from use of an uninitialized object..?

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-25 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #32 from Rich Felker  ---
> A slightly modified version of the example, showing the issue with GCC 5 to 7 
> (as the noipa attribute directive has been added in GCC 8):

Note that __attribute__((__weak__)) necessarily applies noipa and works in
basically all GCC versions, so you can use it where you want this kind of
example for older GCC.

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #14 from Rich Felker  ---
Indeed, without Annex F, division by zero is UB, so it's fine to do anything if
the program performs division by zero. So we need examples without division by
zero.

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #12 from Rich Felker  ---
To me the meaning of internal consistency is very clear: that the semantics of
the C language specification are honored and that the only valid
transformations are those that follow the "as-if rule". Since C without Annex F
allows arbitrarily awful floating point results, your example in comment 11 is
fine. Each instance of 1/a can evaluate to a different value. They could even
evaluate to random values. However, if you had written:

  int f(double a)
  {
    int b = 1/a == 1/0.;
    int c = b;
    return b == c;
  }

then the function must necessarily return 1, because the single instance of
1/a==1/0. in the abstract machine has a single value, either 0 or 1, and in the
abstract machine that value is stored to b, then copied to c, and b and c
necessarily have the same value. While I don't think it's likely that GCC would
mess up this specific example, it seems that it currently _can_ make
transformations such that a more elaborate version of the same idea would be
broken.

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #10 from Rich Felker  ---
I don't think it's at all clear that -fno-signed-zeros is supposed to mean the
programmer is promising that their code has behavior independent of the sign of
zeros, and that any construct which would be influenced by the sign of a zero
has undefined behavior. I've always read it as a license to optimize in ways
that disregard the sign of a zero or change the sign of a zero, but with
internal consistency of the program preserved.

If -fno-signed-zeros is really supposed to be an option that vastly expands the
scope of what's undefined behavior, rather than just removing part of Annex F
and allowing the unspecified quality of floating point results that C otherwise
allows in the absence of Annex F, it really needs a much much bigger warning in
its documentation!
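To make "constructs influenced by the sign of a zero" concrete, here is a small hypothetical example. It assumes default flags (no -fno-signed-zeros or -ffast-math) on an IEEE 754 target: -0.0 compares equal to 0.0, yet signbit() and division still tell them apart. Whether such code merely gets unspecified results under -fno-signed-zeros, or is treated as undefined, is the question raised above.

```c
#include <math.h>

/* Hypothetical example: returns 1 when the sign of a zero is observable
   even though -0.0 == 0.0. signbit() sees the sign, and 1 / -0.0 is
   -infinity under IEEE 754 semantics. */
int negative_zero_is_observable(void)
{
    double nz = -0.0;
    return nz == 0.0 && signbit(nz) != 0 && 1 / nz < 0;
}
```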

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-11 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #25 from Rich Felker  ---
I think standards-conforming excess precision should be forced on, and added to
C++; there are just too many dangerous ways things can break as it is now. If
you really think this is a platform of dwindling relevance (though I question
that; due to the way patent lifetimes work, the first viable open-hardware x86
clones will almost surely lack SSE, no?) then we should not have dangerous
hacks for the sake of marginal performance gains, with too few people spending
the time to deal with their fallout.

I'd be fine with an option to change the behavior of constants, and have it set
by default for -std=gnu* as long as the unsafe behavior is removed from
-std=gnu*.

[Bug tree-optimization/93682] Wrong optimization: on x87 -fexcess-precision=standard is incompatible with -mpc64

2020-02-11 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93682

--- Comment #2 from Rich Felker  ---
I think the underlying issue here is just that -mpc64 (along with -mpc32) is
just hopelessly broken and should be documented as such. It could probably be
made to work, but there are all sorts of issues like float.h being wrong, math
library code breaking, etc.

On a more fundamental level (but seemingly unrelated to the mechanism of
breakage here), the underlying x87 precision control modes are also hopelessly
broken. They're not actually single/double precision modes, but single/double
mantissa with ld80 exponent. So I don't think it's possible to make the
optimizer aware of them without making it aware of two new floating point
formats that it doesn't presently know about. If you just pretended they were
single/double, the same sort of issue would arise again as soon as someone uses
small or large values that should be denormal/underflow/overflow but which
retain their full-precision values by virtue of the excess exponent precision.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #19 from Rich Felker  ---
Test case provided by Szabolcs Nagy showing that GCC does seem to spill right
if it can't assume there's no excess precision to begin with:

double h();
double ff(double x, double y)
{
  return x+y+h();
}

In theory this doesn't force a spill, but GCC seems to choose to do one, I
guess to avoid having to preserve two incoming values (although they're already
in stack slots that would be naturally preserved).

Here, GCC 9.2 with -fexcess-precision=standard -O3 emits fstpt/fldt.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #18 from Rich Felker  ---
It was just pointed out to me that this might be an invalid test since GCC
assumes (correctly or not) that the return value of a function does not have
excess precision. I'll see if I can make a better test.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #17 from Rich Felker  ---
And indeed you're right that GCC does it wrong. This can be seen from a minimal
example:

double g(),h();
double f()
{
  return g()+h();
}

where gcc emits fstpl/fldl around the second call rather than fstpt/fldt.

So this is all even more broken than I thought. It looks like the only way to
get deterministic behavior from GCC right now is to get the wrong deterministic
behavior via -ffloat-store.

Note that libfirm/cparser gets the right result, emitting fstpt/fldt.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #16 from Rich Felker  ---
> And GCC does not do spills in this format, as see in bug 323.

In my experience it seems to (assuming -fexcess-precision=standard), though I
have not done extensive testing. I'll check and follow up.

> This is conforming as there is no requirement to keep intermediate results in 
> excess precision and range.

Such behavior absolutely is non-conforming. The standard reads (5.2.4.2.2 ¶9):

"Except for assignment and cast (which remove all extra range and precision),
the values yielded by operators with floating operands and values subject to
the usual arithmetic conversions and of floating constants are evaluated to a
format whose range and precision may be greater than required by the type"

Note "are evaluated", not "may be evaluated depending on what spills the
compiler chooses to perform".

[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call

2020-02-08 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318

--- Comment #9 from Rich Felker  ---
Indeed, I don't think the ABI says anything about this; a bug against the psABI
should probably be opened.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-08 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #14 from Rich Felker  ---
> No problems: FLT_EVAL_METHOD==2 means "evaluate all operations and constants 
> to the range and precision of the long double type", which is what really 
> occurs. The consequence is indeed double rounding when storing in memory, but 
> this can happen at *any* time even without -ffloat-store (due to spilling), 
> because you are never sure that registers are still available; see some 
> reports in bug 323.

It sounds like you misunderstand the standard's requirements on, and GCC's
implementation of, FLT_EVAL_METHOD==2/excess-precision. The availability of
registers does not in any way affect the result, because when expressions are
evaluated with excess precision, any spills must take place in the format of
float_t or double_t (long double) and are thereby transparent to the
application. The buggy behavior prior to -fexcess-precision=standard (and now
produced with -fexcess-precision=fast which is default in "gnu" modes) spills
in the nominal type, producing nondeterministic results that depend on the
compiler's transformations and that lead to situations like this bug (where the
optimizer has been lied to that two expressions are equal, but they're not).

> Double rounding can be a problem with some codes, but this just means that 
> the code is not compatible with FLT_EVAL_METHOD==2. For some floating-point 
> algorithms, double rounding is not a problem at all, while keeping a result 
> in extended precision will make them fail.

With standards-conforming behavior, the rounding of an operation and of storage
to an object of float/double type are discrete roundings and you can observe
and handle the intermediate value between them. With -ffloat-store, every
operation inherently has a double-rounding attached to it. This behavior is
non-conforming but at least deterministic, and is what I was referring to in my
previous comment. But I think this is largely a distraction from the issue at
hand; I was only pointing out that -ffloat-store is a workaround, but one with
its own (often severe) problems.
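A hypothetical sketch of the "discrete roundings" point, assuming conforming excess-precision semantics (-fexcess-precision=standard on x87, or any target where FLT_EVAL_METHOD is 0): the product is rounded once to double_t, and assignment to a double is a second, separate rounding whose effect the program can observe. Under -fexcess-precision=fast the assignment is not guaranteed to round, which is the problem being discussed.

```c
#include <math.h>

/* Hypothetical helper. The product is rounded to double_t (the evaluation
   format); the assignment to a double performs a second, separate
   rounding. Keeping the product in a double_t object lets the program
   inspect the value between the two roundings. Where double_t is double
   (FLT_EVAL_METHOD == 0), the two values are always equal. */
int storage_rounding_changed(double x, double y)
{
    double_t wide = (double_t)x * y;  /* first rounding: to double_t */
    double narrow = wide;             /* second rounding: to double */
    return wide != narrow;
}
```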

[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call

2020-02-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318

--- Comment #7 from Rich Felker  ---
I'll inquire about it. Note that F.6 already requires this for C functions; the
loophole is just that the implementation itself does not inherently have to
consist of C functions.

If it's determined that C won't require the library functions not bound to IEEE
operations to return values representable in their nominal type, then GCC needs
to be aware of whether the target libc can be expected to do so, and if not, it
needs to, as a special case, assume there might be excess precision in the
return value, so that (double)retval==retval can't be assumed to be true in the
optimizer.

Note that such an option would be nice to have anyway, for arbitrary functions,
since it's necessary for being able to call code that was compiled with
-fexcess-precision=fast from code that can't accept the
non-conforming/optimizer-unsafe behavior and safely use the return value. It
should probably be an attribute, with a flag to set the global default. For
example, __attribute__((__returns_excess_precision__)).
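The attribute above is only a proposal. In the meantime a caller can defensively drop possible excess precision itself. A sketch of the volatile-object workaround, with a hypothetical helper name; it relies on the assumption that a store to a volatile double object really takes place in double format, so the reloaded value has no extra range or precision.

```c
/* Hypothetical helper: forces a value that may carry excess precision
   (e.g. a return value from a libm routine written in x87 asm, or from
   code compiled with -fexcess-precision=fast) to be rounded to double
   before the caller relies on it. */
static double round_to_double(double v)
{
    volatile double tmp = v;  /* store in the nominal type cannot be elided */
    return tmp;               /* reload: value is representable as double */
}
```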

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #12 from Rich Felker  ---
Note that -fexcess-precision=standard is not available in C++ mode to fix this.

However, -ffloat-store should also ensure consistency for the optimizer
(necessary to prevent this bug, and other variants of it, from happening), and
-ffloat-store is available in C++ mode. It comes at the expense of extreme
performance and code size costs, and it makes the floating point results even
more semantically incorrect (double rounding all over the place, mismatching
FLT_EVAL_METHOD==2). Despite all these nasty effects, it may be a suitable
workaround, and at least it avoids letting the optimizer prove 0==1, thereby
effectively treating any affected code as if it contained UB.

Note that in code written to be excess-precision-aware, making use of float_t
and double_t for intermediate operands and only using float and double for
in-memory storage, -ffloat-store should yield behavior equivalent to
-fexcess-precision=standard.
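A minimal sketch of that coding style (dot is a hypothetical example function): intermediates are accumulated in double_t, and the single conversion to double happens at the return statement, which C specifies as a conversion "as if by assignment". With FLT_EVAL_METHOD == 2 on x87, double_t is long double, so the accumulation never passes through an implicit narrowing to double.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical example of excess-precision-aware code: double_t for
   intermediates, double only for storage and the final result. */
double dot(const double *a, const double *b, size_t n)
{
    double_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (double_t)a[i] * b[i];
    return acc;  /* converted to double as if by assignment */
}
```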

[Bug middle-end/323] optimized code gives strange floating point results

2020-02-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

--- Comment #214 from Rich Felker  ---
I'm not particular in terms of the path it takes as long as this gets back to a
status where it's on the radar for fixing.

[Bug middle-end/323] optimized code gives strange floating point results

2020-02-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

--- Comment #211 from Rich Felker  ---
If new reports are going to be marked as duplicates of this, then can it please
be moved from SUSPENDED status to REOPENED? The situation is far worse than
what seems to have been realized the last time this was worked on, as evidenced by pr
85957. These issues just came up again breaking real-world software in
https://github.com/OSGeo/PROJ/issues/1906

[Bug c++/93620] New: Floating point is broken in C++ on targets with excess precision

2020-02-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93620

Bug ID: 93620
   Summary: Floating point is broken in C++ on targets with excess
precision
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Attempting to use -fexcess-precision=standard with g++ produces:

cc1plus: sorry, unimplemented: '-fexcess-precision=standard' for C++

In light of eldritch horrors like pr 85957 this means floating point is
essentially catastrophically broken on i386 and m68k.

This came to my attention while analyzing
https://github.com/OSGeo/PROJ/issues/1906. Most of the problems are g++
incorrectly handling excess precision, and they're having to put awful hacks
with volatile objects in place to work around it.

[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call

2020-02-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #5 from Rich Felker  ---
My understanding is that C2x is fixing this underspecification and will require
the library functions to drop excess precision as if they used a return
statement. So this really should be fixed in glibc if it's still an issue; if
they accept fixing that I don't think GCC needs any action on this. I just
fixed it in musl.

[Bug target/65249] unable to find a register to spill in class 'R0_REGS' when compiling protobuf on sh4

2020-01-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65249

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #27 from Rich Felker  ---
We've hit what seems like almost the exact same issue on gcc 8.3.0 with this
minimized testcase:

void fg(int *);
int get_response(int a)
{
  int b;
  if (a) fg(&b);
  return 0;
}

compiled with -O -c -fstack-protector-strong for sh2eb-linux-muslfdpic. With
gcc 9.2.0 it compiles successfully. I looked for a record of such a fix having
been made, but couldn't find one. Was it a known issue that was fixed silently,
or might it be a lurking bug that's just no longer being hit?

[Bug middle-end/93509] New: Stack protector should offer trap-only handling

2020-01-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93509

Bug ID: 93509
   Summary: Stack protector should offer trap-only handling
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Presently stack protector functionality depends on making a call to
__stack_chk_fail (possibly via __stack_chk_fail_local to avoid PLT-call-ABI
constraint in the caller). This is less secure than it could be, since it
depends on the ability to make function calls (and possibly operate on global
data and make syscalls in the callee) in a process whose state is compromised.
For example the GOT entries used by PLT could be clobbered or %gs:0x10 (i386
syscall vector) could be clobbered by the same stack-based overflow that caused
the stack protector event in the first place.

In https://gcc.gnu.org/ml/gcc/2020-01/msg00483.html where the topic is being
discussed for other reasons (contract between gcc and libc for where these
symbols are provided), I proposed that GCC should offer an option to emit a
trapping instruction directly, instead of making a function call, analogous to
-fsanitize-undefined-trap-on-error for UBSan. This would work well on all
targets where __builtin_trap is defined, but would regress (requiring PLT call)
on targets where it uses the default abort() definition (are there any relevant
ones?). Segher Boessenkool then requested I file this here on the GCC tracker.

Note: I'm filing this for middle-end because that was my best guess of where
GCC handles it, but it's possible all this logic is repeated in each target or
takes place somewhere else entirely; if so please reassign to appropriate
component.
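To illustrate the difference, here is a rough C rendering of the epilogue check. This is only an illustration: the real check is emitted by the compiler, and the real guard value comes from a target-specific location (for example TLS or __stack_chk_guard). demo_guard, demo_canary_check and demo_protected are hypothetical names. The point is that on mismatch the proposed mode would execute a trapping instruction inline (as __builtin_trap does on most targets) instead of calling __stack_chk_fail through possibly clobbered PLT/GOT state.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-process guard value. */
static const uintptr_t demo_guard = (uintptr_t)0x5a5aa5a5u;

/* Sketch of the epilogue check in the proposed trap-only mode: no call,
   no global data beyond the guard, no syscall. On targets without a trap
   instruction, __builtin_trap falls back to calling abort(). */
static inline void demo_canary_check(uintptr_t saved)
{
    if (saved != demo_guard)
        __builtin_trap();
}

int demo_protected(int x)
{
    volatile uintptr_t canary = demo_guard;  /* "prologue": store canary */
    int result = x * 2;                      /* function body */
    demo_canary_check(canary);               /* "epilogue": verify canary */
    return result;
}
```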

[Bug libstdc++/93421] New: futex.cc use of futex syscall is not time64-compatible

2020-01-24 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

Bug ID: 93421
   Summary: futex.cc use of futex syscall is not time64-compatible
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Created attachment 47704
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47704=edit
simple fix, not necessarily right for upstream

This code directly passes a userspace timespec struct to the SYS_futex syscall,
which does not work if the userspace type is 64-bit but the syscall expects
legacy 32-bit timespec.

I'm attaching the patch we're using in musl-cross-make to fix this. It does not
attempt to use the SYS_futex_time64 syscall, since that would require fallback
logic with cost tradeoffs for which to try first, and since the timeout is
relative and therefore doesn't even need to be 64-bit. Instead it just uses the
existence of SYS_futex_time64 to infer that the plain SYS_futex uses a pair of
longs, and converts the relative timestamp into that. This assumes that any
system where the libc timespec type has been changed for time64 will also have
had its headers updated to define SYS_futex_time64.

Error handling for extreme out-of-bound values should probably be added.
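A sketch of the conversion step described above, not the attached patch itself. struct legacy_timespec and to_legacy_timespec are hypothetical names, and the clamping policy for huge timeouts is an assumption. On a target where SYS_futex_time64 is defined, the converted struct would be passed to syscall(SYS_futex, ...) in place of the libc timespec.

```c
#include <errno.h>
#include <limits.h>
#include <time.h>

/* Hypothetical layout of the legacy timespec the plain SYS_futex expects
   on 32-bit targets whose libc has moved to 64-bit time_t. */
struct legacy_timespec {
    long tv_sec;
    long tv_nsec;
};

/* Convert a relative timeout to the legacy layout. Since the timeout is
   relative, seconds beyond LONG_MAX (about 68 years with 32-bit long)
   are clamped rather than rejected; this is an assumed policy.
   Returns 0 on success or -EINVAL for a malformed timespec. */
static int to_legacy_timespec(const struct timespec *ts,
                              struct legacy_timespec *out)
{
    if (ts->tv_sec < 0 || ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L)
        return -EINVAL;
    out->tv_sec = ts->tv_sec > LONG_MAX ? LONG_MAX : (long)ts->tv_sec;
    out->tv_nsec = (long)ts->tv_nsec;
    return 0;
}
```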

[Bug libstdc++/93325] New: libstdc++ wrongly uses direct clock_gettime syscall on non-glibc, breaks time64

2020-01-19 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93325

Bug ID: 93325
   Summary: libstdc++ wrongly uses direct clock_gettime syscall on
non-glibc, breaks time64
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

The configure logic for libstdc++ is choosing to make direct clock_gettime
syscalls (via syscall()) rather than using the clock_gettime function except on
glibc 2.17 or later (when it was moved from librt to libc). This is
incompatible with time64 (because struct timespec mismatches the form the old
clock_gettime syscall uses) and also undesirable because it can't take
advantage of vdso.

The hard-coded glibc version dependency is a configure anti-pattern and should
be removed; the right way to test this would be just probing for the
clock_gettime function without adding any libs (like -lrt).
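A sketch of what such a probe could compile and link, with no -lrt on the link line. probe_clock_gettime is a hypothetical name, and this is not libstdc++'s actual configure code. If it links, clock_gettime is provided by libc itself and can be called directly, inheriting the libc's time64 and vDSO handling instead of going through syscall().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical link-test body: succeeds only if clock_gettime resolves
   without extra libraries. Returns 0 when the call itself succeeds. */
int probe_clock_gettime(void)
{
    struct timespec ts;
    return clock_gettime(CLOCK_MONOTONIC, &ts);
}
```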

[Bug c/61579] -Wwrite-strings does not behave as a warning option

2019-12-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61579

--- Comment #6 from Rich Felker  ---
Ping.
