[Bug tree-optimization/32648] missed-optimization: bit-manipulation via bool's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32648 --- Comment #4 from Andrew Pinski --- For ARM64 we get:

f1:
        ubfx    x1, x0, 5, 1
        ubfx    x0, x0, 3, 1
        eor     w0, w1, w0
        ret
f2:
        eor     w0, w0, w0, lsl 2
        ubfx    x0, x0, 5, 1
        ret

These might be effectively the same, provided the lsl is not split away from the eor.
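The comment does not quote the test case, so the following is only a sketch reconstructed from the ARM64 output: the bodies of f1 and f2 are assumptions, and forms_agree is a hypothetical helper added for checking. f1 extracts bits 5 and 3 separately and XORs them; f2 XORs the value with itself shifted left by 2 and then extracts bit 5.

```c
#include <assert.h>

/* Assumed shape of f1: two single-bit extracts (ubfx) followed by eor. */
unsigned f1(unsigned long x)
{
    unsigned b5 = (unsigned)((x >> 5) & 1u);
    unsigned b3 = (unsigned)((x >> 3) & 1u);
    return b5 ^ b3;
}

/* Assumed shape of f2: eor with a shifted operand, then a single ubfx. */
unsigned f2(unsigned long x)
{
    unsigned w = (unsigned)x;
    w ^= w << 2;            /* bit 5 of w is now bit5(x) ^ bit3(x) */
    return (w >> 5) & 1u;
}

/* Hypothetical helper: returns 1 if f1 and f2 agree for every x in [0, limit). */
int forms_agree(unsigned long limit)
{
    for (unsigned long x = 0; x < limit; x++)
        if (f1(x) != f2(x))
            return 0;
    return 1;
}
```

Both forms compute the same bit, which is why the remaining difference comes down to instruction count and whether the shifted operand stays folded into the eor.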
[Bug tree-optimization/52345] Missed optimization dealing with bools
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52345 --- Comment #4 from Andrew Pinski --- The generic rule is (original: 3 expressions):

(((int)b) | i) == 0 -> (b == 0) & (i == 0) -> ~b & (i == 0)
    (4 expressions; maybe 3 if b is a comparison and single use)
(((int)b) | i) != 0 -> (b != 0) | (i != 0) -> b | (i != 0)
    (still 3 expressions)
(((int)b) & i) == 0 -> (b == 0) | ((i&1) == 0) -> ~b | ((i&1) == 0)
    (4 expressions; maybe 3 if b is a comparison and single use)
(((int)b) & i) != 0 -> (b != 0) & ((i&1) != 0) -> b & ((i&1) != 0)
    (still 3 expressions)

Where b is a "boolean" type variable.
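The four rewrites can be checked by brute force. This is a small sketch, not GCC code: the function names are hypothetical, and C's !b stands in for ~b on a 1-bit boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Each function returns 1 when the original form and the rewritten form
   give the same truth value for the given inputs. */
int rule_or_eq0(bool b, int i)
{
    return ((((int)b) | i) == 0) == (!b & (i == 0));
}

int rule_or_ne0(bool b, int i)
{
    return ((((int)b) | i) != 0) == (b | (i != 0));
}

int rule_and_eq0(bool b, int i)
{
    return ((((int)b) & i) == 0) == (!b | ((i & 1) == 0));
}

int rule_and_ne0(bool b, int i)
{
    return ((((int)b) & i) != 0) == (b & ((i & 1) != 0));
}

/* Returns 1 if all four rules hold for both values of b and every i in [lo, hi]. */
int all_rules_hold(int lo, int hi)
{
    for (int bi = 0; bi <= 1; bi++) {
        bool b = (bi != 0);
        for (int i = lo; i <= hi; i++) {
            if (!rule_or_eq0(b, i) || !rule_or_ne0(b, i) ||
                !rule_and_eq0(b, i) || !rule_and_ne0(b, i))
                return 0;
        }
    }
    return 1;
}
```

The (i&1) term in the two AND rules appears because (int)b is 0 or 1, so only the low bit of i can survive the mask.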
[Bug tree-optimization/71636] Missed optimization in variable alignment test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71636

Andrew Pinski changed:

What             |Removed |Added
Status           |NEW     |RESOLVED
Resolution       |---     |FIXED
Target Milestone |---     |7.0

--- Comment #6 from Andrew Pinski --- Fixed as mentioned.
[Bug rtl-optimization/48696] Horrible bitfield code generation on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48696 --- Comment #17 from Andrew Pinski --- I think some of this is due to SLOW_BYTE_ACCESS being set to 0.
[Bug fortran/67076] [6/7/8 Regression] [F08] Critical inside a module procedure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67076

Thomas Koenig changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #7 from Thomas Koenig --- Works on gcc-6, gcc-7 and gcc-8. Closing as fixed.
[Bug target/85485] Remove -mcet
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85485

H.J. Lu changed:

What             |Removed     |Added
Status           |UNCONFIRMED |NEW
Last reconfirmed |            |2018-04-21
Ever confirmed   |0           |1

--- Comment #2 from H.J. Lu --- (In reply to Jakub Jelinek from comment #1)
> I think it is better to keep it as alias, I think -mcet is much more
> familiar to people than -mshstk.

-mcet won't get any shadow stack protection. -mcet/-mshstk are used to enable SHSTK intrinsics to IMPLEMENT shadow stack, not to USE shadow stack. To enable shadow stack protection, you need to use -fcf-protection=return. -mcet will only lead to user confusion.
[Bug target/85485] Remove -mcet
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85485

Jakub Jelinek changed:

What |Removed |Added
CC   |        |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek --- I think it is better to keep it as alias, I think -mcet is much more familiar to people than -mshstk.
[Bug c++/58372] internal compiler error: ix86_compute_frame_layout
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

Bitterblue changed:

What |Removed |Added
CC   |        |cantabile.desu at gmail dot com

--- Comment #12 from Bitterblue --- Hi. This bug still exists in GCC 7.3.0. It comes up when cross-compiling Qt 5.10.1 from 64 bit Linux for 32 bit Windows. Well, I assume it's the same bug because of comments seen online.

https://github.com/mxe/mxe/issues/2011
https://bugreports.qt.io/browse/QTBUG-64707

The offending function can be seen here:
https://github.com/qt/qtbase/blob/6c6ace9d23f90845fd424e474d38fe30f070775e/src/corelib/global/qrandom.cpp#L104

% i686-w64-mingw32-gcc -v
Using built-in specs.
COLLECT_GCC=i686-w64-mingw32-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-w64-mingw32/7.3.0/lto-wrapper
Target: i686-w64-mingw32
Configured with: /build/mingw-w64-gcc/src/gcc/configure --prefix=/usr --libexecdir=/usr/lib --target=i686-w64-mingw32 --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-threads=posix --enable-fully-dynamic-string --enable-libstdcxx-time=yes --with-system-zlib --enable-cloog-backend=isl --enable-lto --disable-dw2-exceptions --enable-libgomp --disable-multilib --enable-checking=release
Thread model: posix
gcc version 7.3.0 (GCC)

No problem with x86_64-w64-mingw32, of course.
[Bug target/85220] [meta-bug, nvptx] Run trunk with og7 openacc testcases and analyze execution failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85220

Tom de Vries changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |FIXED

--- Comment #4 from Tom de Vries --- (In reply to Tom de Vries from comment #3)
> Marking resolved-fixed.
[Bug fortran/68933] ICE when mixing "-fprofile-arcs -ftest-coverage" and "-fcoarray=lib" on gcc-6 only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68933 --- Comment #6 from Zaak --- Thanks, I'll check it out. On Sat, Apr 21, 2018 at 8:20 AM dominiq at lps dot ens.fr < gcc-bugzi...@gcc.gnu.org> wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68933 > > --- Comment #5 from Dominique d'Humieres --- > It seems to work with 7.3.0.
[Bug c/65892] gcc fails to implement N685 aliasing of union members
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892 --- Comment #37 from David Brown --- (In reply to Martin Sebor from comment #35)
> Here are the proposed changes:
>
> Pointer Provenance:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2219.htm#proposed-technical-corrigendum
>
> Trap Representations:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2220.htm#proposed-technical-corrigendum
>
> Unspecified Values:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2221.htm#proposed-technical-corrigendum

I am a little unsure of the suggestions for unspecified values here. Can I give some examples, to see if my interpretation is correct? Let's use a new gcc builtin "__builtin_unspecified()" that returns an unspecified value of int type, with no possible traps (no gcc target has trap representations for int types, AFAIK).

    int x = __builtin_unspecified();
    int y = __builtin_unspecified();

    if (x == y) doThis();
    // The compiler can skip doThis()

    if (x != y) doThat();
    // The compiler can skip doThat() too

    if (x == y) doThis(); else doThat();
    // The compiler can choose to doThis() or doThat(),
    // but must do one or the other

    if (x == x) doThis();
    // The compiler must doThis()

    if (x != x) doThat();
    // The compiler cannot doThat()

    if (x == 3) doThis();
    // The compiler can choose to doThis()
    // if the compiler does choose to doThis() then it fixes the value of x as 3

    if (x & 0x01) doThis(); else doThat();
    // The compiler can choose to doThis() or doThat(),
    // but that choice fixes the LSB of x

This could allow for a range of possible optimisations, especially if there is a nice way to make unspecified values like __builtin_unspecified(). (Unspecified values of other types could be made by casts.)

For example:

    struct opt_int {
        bool valid;
        int value;
    };

    struct opt_int safe_sqrt(struct opt_int x)
    {
        struct opt_int y;
        if (!x.valid || x.value < 0) {
            y.valid = false;
            y.value = __builtin_unspecified();
        } else {
            y.valid = true;
            y.value = unsafe_sqrt(x.value);
        }
        return y;
    }

This kind of structure would mean minimal effort when you only need part of a struct to contain specified values.
[Bug c/65892] gcc fails to implement N685 aliasing of union members
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892 --- Comment #36 from David Brown --- (In reply to Martin Sebor from comment #34)
> I think in the use case below:
>
>    struct { int i; char buf[4]; } s, r;
>    *(float *)s.buf = 1.;
>    r = s;
>
> the aggregate copy has to be viewed as a recursive copy of each of its
> members and copying buf[4] must be viewed as a memcpy, Char is definitely
> special (it can access anything with impunity, even indeterminate values).
> That said, I don't think the rules allow char arrays to be treated as
> allocated storage so while the store to s.buf via float* may be valid it
> doesn't change the effective type of s.buf and so the only way to read the
> float value stored in it is to copy it byte-by-byte (i.e., copy the float
> representation) to an object whose effective type is float. Some of the
> papers that deal with the effective type rules might touch on this (e.g., DR
> 236, Clark's N1520

In bare metal embedded development, it is common to need a way to treat statically declared storage (like a char[] array) as a pool for dynamic storage. Often you don't want to use standard library malloc() because of requirements on deterministic timing, etc. What you are saying here is that this is not possible, meaning there is no way to write such a malloc replacement in normal C code. (It is possible, I think, to use gcc extensions such as the "may_alias" type attribute and the "malloc" function attribute. And -fno-strict-aliasing is always a safe resort.)

It would be /very/ nice if there were a way to declare statically allocated pools of memory that could be doled out by user-made functions and, like malloc'ed memory, take their effective type when used. It would be even better if there were a standard way to say that the initial value of such memory is "unspecified". The compiler and linker could give such memory a static allocation (essential for small embedded systems with limited memory, so that you can be sure of your memory usage) but there would be no need for zeroing the memory at startup.
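A minimal sketch of the kind of static pool described above, written as a bump allocator. The names POOL_SIZE, pool, pool_alloc and pool_reset are hypothetical, and the sketch does not settle the effective-type question: whether objects stored through the returned pointers are well defined in strictly conforming C is exactly what this thread is discussing, so it is assumed here that the code is built with -fno-strict-aliasing. Note also that this static array is zero-initialized at startup, which is the cost the comment would like to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024u   /* hypothetical pool size in bytes */

/* Statically allocated backing store, aligned for any object type. */
static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Bump allocator with deterministic timing: no search, no free list.
   align must be a power of two. Returns NULL when the pool is exhausted. */
void *pool_alloc(size_t size, size_t align)
{
    size_t start = (pool_used + (align - 1)) & ~(align - 1);
    if (start > POOL_SIZE || size > POOL_SIZE - start)
        return NULL;
    pool_used = start + size;
    return &pool[start];
}

/* Releases everything at once; individual frees are not supported. */
void pool_reset(void)
{
    pool_used = 0;
}
```

Because the pool size is fixed at link time, total memory usage is known up front, which is the property small embedded targets care about.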
[Bug target/85491] [8 Regression] scimark LU Decomposition test 15% slower than GCC 7, 30% slower than peak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85491

Richard Biener changed:

What             |Removed |Added
Keywords         |        |missed-optimization, needs-bisection
Target           |        |x86_64-*-*
Blocks           |        |79703, 53947
Target Milestone |---     |8.0

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79703
[Bug 79703] [meta-bug] SciMark 2.0 performance issues
[Bug target/85491] New: [8 Regression] scimark LU Decomposition test 15% slower than GCC 7, 30% slower than peak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85491

Bug ID: 85491
Summary: [8 Regression] scimark LU Decomposition test 15% slower than GCC 7, 30% slower than peak
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

https://gcc.opensuse.org/gcc-old/c++bench-czerny/nbench/ shows a LU decomposition slowdown between r257710 and r257760 after a previous improvement between r253986 and r254023. These are numbers on Haswell with -O3 -ffast-math -funroll-loops -march=native.

The improvement is likely r253993/r254012, x86 vectorization cost changes. The regression might be r257734.
[Bug fortran/67076] [6/7/8 Regression] [F08] Critical inside a module procedure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67076

Dominique d'Humieres changed:

What |Removed                   |Added
CC   |dominiq at lps dot ens.fr |

--- Comment #6 from Dominique d'Humieres --- It seems to have been fixed by revision r231649.
[Bug fortran/68933] ICE when mixing "-fprofile-arcs -ftest-coverage" and "-fcoarray=lib" on gcc-6 only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68933 --- Comment #5 from Dominique d'Humieres --- It seems to work with 7.3.0.
[Bug bootstrap/85490] New: Missing STAGE4_CFLAGS in bootstrap-cet.mk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85490

Bug ID: 85490
Summary: Missing STAGE4_CFLAGS in bootstrap-cet.mk
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: bootstrap
Assignee: unassigned at gcc dot gnu.org
Reporter: hjl.tools at gmail dot com
CC: igor.v.tsimbalist at intel dot com
Blocks: 81652
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

Since profiledbootstrap uses

STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use

bootstrap-cet.mk should define STAGE4_CFLAGS to support profiledbootstrap with CET.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81652
[Bug 81652] [meta-bug] -fcf-protection=full bugs
[Bug rtl-optimization/85423] [8 Regression] ICE in code_motion_process_successors, at sel-sched.c:6403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423

Andrey Belevantsev changed:

What     |Removed                       |Added
Assignee |unassigned at gcc dot gnu.org |abel at gcc dot gnu.org

--- Comment #4 from Andrey Belevantsev --- Sigh, I've put the condition that was too broad in the previous patch and also allowed some legitimate dependencies between debug insns (so Alex was kind of right when he expressed his concern). I'm testing the following after checking that all of the previous PRs and this one passes:

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index ee970522890..85ff5bd3eb4 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3308,7 +3308,7 @@ has_dependence_note_dep (insn_t pro, ds_t ds ATTRIBUTE_UNUSED)
      that a bookkeeping copy should be movable as the original insn.
      Detect that here and allow that movement if we allowed it before
      in the first place. */
-  if (DEBUG_INSN_P (real_con)
+  if (DEBUG_INSN_P (real_con) && !DEBUG_INSN_P (real_pro)
       && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con))
     return;
[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985 --- Comment #8 from Alexander Monakov --- Unfortunately the above doesn't fully address the issue, as schedulers and other passes still have no idea that DF makes those assumptions and will allow reordering of asms:

register int r asm("ebx");

int f(int x, int y)
{
  int t = x/y/r;
  asm("#asm" );
  return t-x;
}

_Z1fii:
#APP
        #asm
#NO_APP
        movl    %edi, %eax
        cltd
        idivl   %esi
        cltd
        idivl   %ebx
        subl    %edi, %eax
        ret

See how the asm is first, even though from DF point of view it should remain after the read of %ebx for division by r; here cprop_hardreg makes the offending propagation.

So currently GCC has a rather split personality when it comes to deps w.r.t global reg vars in asm statements. The documentation should spell out the intended behavior. My suggestion is to require that references are exposed to the compiler via constraints, allowing to remove the ad-hoc treatment in DF. I intend to do that early in stage 1.
[Bug target/85381] [og7, nvptx, openacc] parallel-loop-1.c fails with default vector length 128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85381

Tom de Vries changed:

What       |Removed     |Added
Keywords   |            |openacc
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |FIXED

--- Comment #10 from Tom de Vries --- I've committed this workaround: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b957d7c71a0a8984a88f5bfccf5a4b6fd080c47d :
...
[nvptx, openacc] Don't emit barriers for empty loops

2018-04-21  Tom de Vries

        PR target/85381
        * config/nvptx/nvptx.c (nvptx_process_pars): Don't emit barriers for
        empty loops.

        * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/pr85381-5.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/pr85381.c: New test.
...

Submitted here: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01023.html .

This fixes the hang that I observed, so I'm closing this PR.

[ If nvidia comes back with a clear workaround description, I'll implement that, but there's no point in keeping the PR open. ]
Wrong snprintf optimization
Hello,

#include <stdio.h>

int main(void)
{
    char buf[10];
    return snprintf(buf, 0, "string");
}

GCC simplifies it to

main:
        mov     eax, 6
        ret

but 0 is correct I think.
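For reference, C99 and later define the return value of snprintf as the number of characters that would have been written had the buffer been large enough, not counting the terminating null, so a result of 6 is what the standard specifies for this call. A small sketch that checks this at run time; the function names measured_length and truncated_length are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Zero-size buffer: nothing is written, but the would-be length is returned. */
int measured_length(void)
{
    char buf[10];
    return snprintf(buf, 0, "string");
}

/* Truncating call: at most size-1 characters plus the terminator are stored,
   yet the return value is still the full, untruncated length. */
int truncated_length(char *dst, size_t size)
{
    return snprintf(dst, size, "string");
}
```

This is also why calling snprintf with a size of 0 is a common idiom for computing how large a buffer to allocate.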
[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466 --- Comment #22 from Daniel Elliott --- (In reply to Marc Glisse from comment #21) > (In reply to Daniel Elliott from comment #20) > > still clang is 1.64x faster. had a look at the assembly. My limited > > understanding makes me think that the ucomiss is not fully vectorized and > > the clang one is (clangs ucomiss %xmm0,%xmm1 vs gcc's ucomiss > > 0x218b4(%rip),%xmm0). Feel free to correct me if I am wrong. > > Nothing gets vectorized (likely because of the "dontoptimize" code). The > ucomiss difference is that llvm keeps the constant .5f in a register, while > gcc reloads it every time. I don't know if the speed difference comes from > that, or from some subtle tuning arrangement of the operations (I didn't try > to understand why llvm has 4 mov where gcc has only 2). Right I thought because it was an xmm0 that means vector register. I'm going to go and read up on assembly!
[Bug fortran/67076] [6/7/8 Regression] [F08] Critical inside a module procedure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67076 --- Comment #5 from janus at gcc dot gnu.org --- I also see it working with gfortran 7.2 and OpenCoarrays 1.9.1 (as packaged in Ubuntu 17.10).
[Bug c++/81837] Internal compiler error (cp/typeck2.c:1264)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81837 --- Comment #10 from Paolo Carlini --- In my opinion we should *always* set it when closing bugs: many, many users complain that it isn't always clear which is the first release where a bug is fixed. Actually, we should also spend more time on keeping the "Known to work" and "Known to fail" fields up to date.
[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466 --- Comment #21 from Marc Glisse --- (In reply to Daniel Elliott from comment #20) > still clang is 1.64x faster. had a look at the assembly. My limited > understanding makes me think that the ucomiss is not fully vectorized and > the clang one is (clangs ucomiss %xmm0,%xmm1 vs gcc's ucomiss > 0x218b4(%rip),%xmm0). Feel free to correct me if I am wrong. Nothing gets vectorized (likely because of the "dontoptimize" code). The ucomiss difference is that llvm keeps the constant .5f in a register, while gcc reloads it every time. I don't know if the speed difference comes from that, or from some subtle tuning arrangement of the operations (I didn't try to understand why llvm has 4 mov where gcc has only 2).
[Bug fortran/67076] [6/7/8 Regression] [F08] Critical inside a module procedure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67076

Jürgen Reuter changed:

What |Removed |Added
CC   |        |dominiq at lps dot ens.fr,
     |        |janus at gcc dot gnu.org,
     |        |juergen.reuter at desy dot de,
     |        |tkoenig at gcc dot gnu.org

--- Comment #4 from Jürgen Reuter --- This example works for me with the 8.0.1 trunk. As version 5 of the compiler is no longer supported, I wouldn't see much sense in keeping this open. Of course, one should check versions 6 and 7. I had to link libcaf_mpi dynamically, btw.