[Bug tree-optimization/111595] New: detection of MIN/MAX with truncation and sign change for the result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111595

            Bug ID: 111595
           Summary: detection of MIN/MAX with truncation and sign change for the result
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
unsigned short f(long a, long b)
{
  short as = a;
  short bs = b;
  unsigned short asu = a;
  unsigned short bsu = b;
  if (as < bs)
    return asu;
  return bsu;
}

unsigned short f0(long a, long b)
{
  short as = a;
  short bs = b;
  unsigned short asu = a;
  unsigned short bsu = b;
  if (as < bs)
    return as;
  return bs;
}

unsigned short f1(long a, long b)
{
  short as = a;
  short bs = b;
  unsigned short asu = a;
  unsigned short bsu = b;
  signed short t;
  if (as < bs)
    t = as;
  else
    t = bs;
  return t;
}
```
Currently only f1 detects MIN here. They all should produce the same IR in the end.
[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

--- Comment #4 from Andrew Pinski ---
(In reply to JuzheZhong from comment #3)
> (In reply to Andrew Pinski from comment #1)
> > The SVE one was added with r12-4402-g62b505a4d5fc89:
> > ```
> > /* Detect simplication for a conditional reduction where
> >
> >    a = mask1 ? b : 0
> >    c = mask2 ? d + a : d
> >
> >    is turned into
> >
> >    c = mask1 && mask2 ? d + b : d.  */
> > (simplify
> >   (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
> >    (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
> > ```
> > Most likely should do the similar thing for IFN_COND_LEN_ADD too.
>
> Hi, I saw ARM SVE failed to fold VEC_COND + COND_ADD into COND_ADD on
> float vector since it can't satisfy integer_zerop.
>
> Is it reasonable the same optimization should also work for float vector ?

I suspect it would only be valid if `!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)` is true. So it could use (match on) zerop instead but would need to check the above conditional too.
[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

--- Comment #3 from JuzheZhong ---
(In reply to Andrew Pinski from comment #1)
> The SVE one was added with r12-4402-g62b505a4d5fc89:
> ```
> /* Detect simplication for a conditional reduction where
>
>    a = mask1 ? b : 0
>    c = mask2 ? d + a : d
>
>    is turned into
>
>    c = mask1 && mask2 ? d + b : d.  */
> (simplify
>   (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
>    (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
> ```
> Most likely should do the similar thing for IFN_COND_LEN_ADD too.

Hi, I saw that ARM SVE fails to fold VEC_COND + COND_ADD into COND_ADD on float vectors, since a float zero can't satisfy integer_zerop.

Is it reasonable for the same optimization to also work for float vectors?
[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594 --- Comment #2 from JuzheZhong --- Oh, I see. Thanks a lot! I will have a try.
[Bug middle-end/111594] RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-09-26

--- Comment #1 from Andrew Pinski ---
The SVE one was added with r12-4402-g62b505a4d5fc89:
```
/* Detect simplication for a conditional reduction where

   a = mask1 ? b : 0
   c = mask2 ? d + a : d

   is turned into

   c = mask1 && mask2 ? d + b : d.  */
(simplify
  (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
   (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
```
Most likely should do the similar thing for IFN_COND_LEN_ADD too.
[Bug c/111594] New: RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111594

            Bug ID: 111594
           Summary: RISC-V: Failed to fold VEC_COND_EXPR and COND_LEN_ADD
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Consider the following case:

#include <stdint.h>

void single_loop_with_if_condition(uint64_t * restrict a, uint64_t * restrict b,
                                   int loop_size)
{
  uint64_t result = 0;

  for (int i = 0; i < loop_size; i++) {
    if (b[i] <= a[i]) {
      result += a[i];
    }
  }

  a[0] = result;
}

In ARM SVE:

  vect__ifc__33.15_48 = VEC_COND_EXPR ;
  vect__34.16_49 = .COND_ADD (loop_mask_41, vect_result_19.7_38, vect__ifc__33.15_48, vect_result_19.7_38);

will be folded into:

  vect__34.16_49 = .COND_ADD (_50, vect_result_19.7_38, vect__7.13_45, vect_result_19.7_38);

However, RVV fails to fold VEC_COND_EXPR + COND_LEN_ADD:

  vect__ifc__44.30_96 = VEC_COND_EXPR ;
  vect__45.31_97 = .COND_LEN_ADD ({ -1, ... }, vect_result_35.22_78, vect__ifc__44.30_96, vect_result_35.22_78, _104, 0);

Where should this optimization be done?
[Bug target/111533] [14 Regression] ICE: RTL check: expected code 'reg', have 'const_int' in rhs_regno, at rtl.h:1934
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111533 --- Comment #3 from xuli1 at eswincomputing dot com --- The problem has been reproduced, thank you.
[Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148 --- Comment #7 from cuilili --- (In reply to Martin Jambor from comment #6) > I believe this has been fixed? Yes.
[Bug target/111545] [14 Regression] RISC-V gfortran.dg/host_assoc_function_7.f09 Illegal instruction error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111545

--- Comment #4 from JuzheZhong ---
I can confirm that this is a latent bug in the VSETVL pass that has existed for a long time. Lehua is working on refactoring Phase 1 and Phase 2 of the VSETVL pass, which will fix all potential issues in the pass.
[Bug middle-end/94267] Missed folding of _MEM_REF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94267 --- Comment #4 from Andrew Pinski --- (In reply to Andrew Pinski from comment #3) > Right now we depend on not doing the folding, PR 110702. Well rather we depend on not folding *(_MEM_REF) ...
[Bug middle-end/94267] Missed folding of _MEM_REF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94267

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110702

--- Comment #3 from Andrew Pinski ---
Right now we depend on not doing the folding, PR 110702.
[Bug middle-end/111497] [11/12/13/14 Regression] ICE building mariadb on i686 since r8-470
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111497

--- Comment #5 from CVS Commits ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:3c23defed384cf17518ad6c817d94463a445d21b

commit r14-4256-g3c23defed384cf17518ad6c817d94463a445d21b
Author: Vladimir N. Makarov
Date:   Mon Sep 25 16:19:50 2023 -0400

    [PR111497][LRA]: Copy substituted equivalence

    When we substitute the equivalence and it becomes shared, we can fail
    to correctly update reg info used by LRA.  This can result in wrong
    code generation, e.g. because of incorrect live analysis.  It can also
    result in compiler crash as the pseudo survives RA.  This is what
    exactly happened for the PR.  This patch solves this problem by
    unsharing substituted equivalences.

    gcc/ChangeLog:

            PR middle-end/111497
            * lra-constraints.cc (lra_constraints): Copy substituted
            equivalence.
            * lra.cc (lra): Change comment for calling unshare_all_rtl_again.

    gcc/testsuite/ChangeLog:

            PR middle-end/111497
            * g++.target/i386/pr111497.C: new test.
[Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588 --- Comment #2 from Jonathan Wakely --- This needs numbers, not opinions.
[Bug target/111593] New: wrong code for 128-bit multiplication on MIPS64R6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111593

            Bug ID: 111593
           Summary: wrong code for 128-bit multiplication on MIPS64R6
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mikulas at artax dot karlin.mff.cuni.cz
  Target Milestone: ---

MIPS64R6 has new instructions for multiplication and division. GCC uses them, but it miscompiles 128-bit multiplication. When you compile and run this program with -O1 or -O2 on mips64r6, you get the incorrect result 9F172AF9AEE4FDB2FD12E7537CC82A0F. The correct result is 60E3DC5DAC542B19FD12E7537CC82A0F.

#include <stdio.h>

__attribute__((noinline,noclone))
static unsigned __int128 power(unsigned __int128 a, unsigned __int128 b)
{
        unsigned __int128 c = 1;
        while (b) {
                if (b & 1)
                        c *= a;
                a *= a;
                b >>= 1;
        }
        return c;
}

int main(void)
{
        int i;
        unsigned __int128 a = 0x1234567890abcdefULL;
        unsigned __int128 b = 0x1234567890abcdefULL;
        unsigned __int128 c = power(a, b);
        for (i = 124; i >= 0; i -= 4) {
                printf("%X", (unsigned)(c >> i) & 0xf);
        }
        printf("\n");
        return 0;
}

How to reproduce:

On Debian SID, install the packages gcc-13-mipsisa64r6-linux-gnuabi64, libc6-dev-mips64r6-cross and qemu-user.

Run:

mipsisa64r6-linux-gnuabi64-gcc-13 -O2 power.c && /usr/bin/qemu-mips64 -L /usr/mipsisa64r6-linux-gnuabi64/ a.out

The bug happens with gcc-10, gcc-11, gcc-12 and gcc-13 (I didn't try older releases).
[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

--- Comment #10 from Michael Duggan ---
To sum up what I have figured out, C++ transforms the coroutine "function" into a trio of functions: a ramp function, an actor function, and a destruction function. The ramp function acts as the actual function (by name). The actor function contains the original body of the written function (with some transformations), and thus contains the code associated with most of the lines that need coverage information.

Since the actor function is generated artificially, it is marked as artificial. The gcov program explicitly ignores functions that are marked as artificial. Also, even if that were not the case, it looks to me like the line coverage information for the actor function only includes the initial line of the function. This seems to be due to the way the artificial function gets inserted into the list of functions of the program.

In order to solve this problem, we would need to do at least the following:

- Find a way to not ignore the actor function. This would involve either not marking it as artificial or marking it in some other way that would be recognized by gcov.
- Ensure that the actor function properly includes the line number information from the original coroutine body.

Most of this work would probably need to be done in the C++ code (where the coroutine transformation happens) rather than in the coverage code. Should this be reassigned to the c++ component?
[Bug middle-end/109967] [11/12/13/14 Regression] Wrong code at -O2 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967

Xi Ruoyao changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294
                 CC|                            |rguenther at suse dot de

--- Comment #9 from Xi Ruoyao ---
Bisect shows r14-4089 (the fix for PR111294) either fixes or "covers up" the issue.
[Bug fortran/59298] ICE when initialising PARAMETER array of derived-type (containing an array) using array constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59298 anlauf at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|WAITING |RESOLVED Keywords||ice-on-valid-code Known to work||10.5.0 Known to fail||7.5.0, 8.5.0, 9.5.0 Target Milestone|--- |10.5 --- Comment #16 from anlauf at gcc dot gnu.org --- Fixed in gcc-10.
[Bug fortran/84693] scalar DT not broadcast across an array in an initialization expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84693 Bug 84693 depends on bug 59298, which changed state. Bug 59298 Summary: ICE when initialising PARAMETER array of derived-type (containing an array) using array constructor https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59298 What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED
[Bug target/111570] -march=generic prints error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111570

--- Comment #2 from Brjd ---
Thank you; I have also read this guide. My point is that a generic arch might be possible in theory. If GCC builds for the oldest x86_64 CPU, will that code run on all modern CPUs, since their instruction sets include those of their predecessors? How about making -march default to that generic, or baseline, build for the oldest CPU?

If I may ask more questions: the guide doesn't mention anything about a specific arch. With -march=cpu, which is better: -mtune=cpu, where cpu is the same as for -march, or -mtune=generic, so that the code is tuned for all CPUs of the family? If -mtune is not given, is the default generic or native? The default for -march is not clear either.

One more question: I am not able to find a guide to building GCC, or any information on whether GCC can be built target by target like LLVM and clang. For example, is it possible to first build only LLVM, then stop and resume with clang, and so on; or, for GCC, first build only the C module and its submodules, then stop and resume with the g++ module and its submodules, then libgcc, libstdc++, and so on? This would be great, especially for long bootstraps and stage 2, but I can find only make all-gcc, target-libgcc, which build almost all of the compiler.
[Bug target/109166] Built-in __atomic_test_and_set does not seem to be atomic on ARMv4T
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109166 --- Comment #9 from Hans-Peter Nilsson --- (In reply to Richard Earnshaw from comment #8) > I'm going to close this as WONTFIX. I guess I'll have to find another PR to lean on, for fixing the underlying cause for the nonatomic code.
[Bug target/104831] RISCV libatomic LR.aq/SC.rl pair insufficient for SEQ_CST
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831 Patrick O'Neill changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #11 from Patrick O'Neill --- This has been resolved on trunk: https://inbox.sourceware.org/gcc-patches/20230427162301.1151333-1-patr...@rivosinc.com/ The cover letter there contains a lot more context about why the mappings are wrong and why we implemented a strengthened version of Table A.6. These mappings are included in the RISC-V PSABI doc: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378 And this series has been backported to be included in GCC 13.3 (along with a bugfix): https://inbox.sourceware.org/gcc-patches/20230725180206.284777-1-patr...@rivosinc.com/
[Bug target/111533] [14 Regression] ICE: RTL check: expected code 'reg', have 'const_int' in rhs_regno, at rtl.h:1934
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111533

--- Comment #2 from Patrick O'Neill ---
Hi,

I believe the issue is that you're using rv64gc, not rv64gcv.

I haven't tried building with multilib, so my commands are:

../configure --with-arch=rv64gcv --with-abi=lp64d --enable-gcc-checking=rtl
make linux -j32
[Bug target/111546] [14 Regression] ICE: gfortran.dg/overload_5.f90:53:2: internal compiler error: in emit_move_insn, at expr.cc:4219 since r14-4163-gbea89f78f2f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111546 Patrick O'Neill changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Patrick O'Neill --- gfortran.dg/overload_5.f90 failures have been resolved!
[Bug c++/111592] [11/12/13/14 Regression] ICE on expanding argument pack into variadic constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111592

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|ICE on expanding argument   |[11/12/13/14 Regression]
                   |pack into variadic          |ICE on expanding argument
                   |constructor                 |pack into variadic
                   |                            |constructor
   Last reconfirmed|                            |2023-09-25
   Target Milestone|---                         |11.5
      Known to work|                            |5.1.0, 5.5.0
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
      Known to fail|                            |6.1.0, 6.2.0
           Keywords|                            |ice-on-valid-code

--- Comment #1 from Andrew Pinski ---
Confirmed.
[Bug libstdc++/111588] Provide opt-out of shared_ptr single-threaded optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588

--- Comment #1 from Andrew Pinski ---
> for programs that know they are effectively always multithreaded they pay for
> a runtime branch and .text segment bloat for an optimization that never
> applies.

The bloat is not much, and the overhead of a branch compared to atomics is still not going to make a dent. I suspect you are really looking in the wrong place for optimizations.
[Bug middle-end/109967] [11/12/13/14 Regression] Wrong code at -O2 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967

Xi Ruoyao changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #8 from Xi Ruoyao ---
(In reply to Shaohua Li from comment #7)
> This test case does not reproduce anymore on the current trunk. Maybe one of
> the recent fixes fixed the underlying issue as well.

But we still need to ensure the fix is backported to 11/12/13. And there is still a chance that the issue is only covered up by an unrelated change.
[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

Mathieu Malaterre changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |11.4.0

--- Comment #5 from Mathieu Malaterre ---
(In reply to Mathieu Malaterre from comment #3)
> I can make the upstream code fail using the g++-11 / g++-12 versions
> (Debian/sid).

Never mind, it seems g++ 11.4.0 can handle the original test case.
[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591 Mathieu Malaterre changed: What|Removed |Added Known to work||10.5.0 --- Comment #4 from Mathieu Malaterre --- g++-10 seems to handle -O3/-mstrict-align
[Bug middle-end/109967] [11/12/13/14 Regression] Wrong code at -O2 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967 --- Comment #7 from Shaohua Li --- This test case does not reproduce anymore on the current trunk. Maybe one of the recent fixes fixed the underlying issue as well.
[Bug modula2/111530] Unable to build GM2 standard library on BSD due to a `getopt_long_only' GNU extension dependency
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111530 Gaius Mulley changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Last reconfirmed||2023-09-25 --- Comment #1 from Gaius Mulley --- Many thanks for the bug report and hints on how to fix it.
[Bug c++/111592] New: ICE on expanding argument pack into variadic constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111592

            Bug ID: 111592
           Summary: ICE on expanding argument pack into variadic constructor
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yankel-pro at scialom dot org
  Target Milestone: ---

GCC raises an Internal Compiler Error in c_common_parse_file() when (indirectly, see source) expanding an argument pack into a variadic constructor.

$ g++ --version
g++ (Compiler-Explorer-Build-gcc-1eb80f78f114f6582c349f75e08b361a0a582091-binutils-2.40) 14.0.0 20230925 (experimental)

$ cat source
struct ignore { ignore(...) {} };

template <typename... Args>
void InternalCompilerError(Args... args)
{ ignore{ ignore(args) ... }; }

int main()
{ InternalCompilerError(0, 0); }

$ g++ -c source
: In instantiation of 'void InternalCompilerError(Args ...) [with Args = {int, int}]':
:11:24:   required from here
:7:3: internal compiler error: in finish_expr_stmt, at cp/semantics.cc:910
    7 | { ignore{ ignore(args) ... }; }
      |   ^~
0x251c8ee internal_error(char const*, ...)
        ???:0
0xae8dda fancy_abort(char const*, int, char const*)
        ???:0
0xcfa8f8 instantiate_decl(tree_node*, bool, bool)
        ???:0
0xd2dcbb instantiate_pending_templates(int)
        ???:0
0xbded50 c_parse_final_cleanups()
        ???:0
0xe149d8 c_common_parse_file()
        ???:0

Found on Compiler Explorer <https://godbolt.org/z/M788xE44z>.
[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #3 from Mathieu Malaterre ---
I can make the upstream code fail using the g++-11 / g++-12 versions (Debian/sid).
[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591 Richard Biener changed: What|Removed |Added Keywords||needs-bisection --- Comment #2 from Richard Biener --- does it work with older GCC?
[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #1 from Mathieu Malaterre ---
Created attachment 55989
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55989&action=edit
cvise reduced test case

% g++ -std=c++11 -o works -DHWY_COMPILE_ONLY_EMU128 -DHWY_BROKEN_EMU128=0 -maltivec -mcpu=power8 -g -O3 alt.cc -Wall -Wextra -Werror -Wfatal-errors
% g++ -std=c++11 -o fails -DHWY_COMPILE_ONLY_EMU128 -DHWY_BROKEN_EMU128=0 -maltivec -mcpu=power8 -mstrict-align -g -O3 alt.cc -Wall -Wextra -Werror -Wfatal-errors

should give:

% ./works
-> success

but:

% ./fails
fails: alt.cc:395: void hwy::detail::AssertArrayEqual(const TypeInfo&, const void*, const void*, size_t, const char*, const char*, int): Assertion `memcmp(a, b, c * ti.sizeof_t) == 0' failed.
zsh: abort ./fails
[Bug target/111591] New: ppc64be: miscompilation with -mstrict-align / -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

            Bug ID: 111591
           Summary: ppc64be: miscompilation with -mstrict-align / -O3
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: malat at debian dot org
  Target Milestone: ---

I am seeing a regression in a highway unit test on ppc64be when using -mstrict-align / -O3:

454/530 Test #454: HwyWidenMulTestGroup/HwyWidenMulTest.TestAllSatWidenMulPairwiseAdd/EMU128 # GetParam() = 2305843009213693952 .Subprocess aborted***Exception: 0.00 sec
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyWidenMulTestGroup/HwyWidenMulTest.TestAllSatWidenMulPairwiseAdd/EMU128
[==] Running 1 test from 1 test suite.
[--] Global test environment set-up.
[--] 1 test from HwyWidenMulTestGroup/HwyWidenMulTest
[ RUN ] HwyWidenMulTestGroup/HwyWidenMulTest.TestAllSatWidenMulPairwiseAdd/EMU128
i16x4 expect [0+ ->]: 0x7FFF,0x7FFF,0x7FFF,0x7FFF,
i16x4 actual [0+ ->]: 0x7FFF,0x01A5,0x7FFF,0x7FFF,
Abort at ./hwy/tests/widen_mul_test.cc:205: EMU128, i16x4 lane 1 mismatch: expected '0x7FFF', got '0x01A5'.

ref:
https://buildd.debian.org/status/fetch.php?pkg=highway&arch=ppc64&ver=1.0.8%7Egit20230918.1e3a3d7-4&stamp=1695113957&raw=0
[Bug c/111590] New: RISC-V: Multiple ICE in gfortran regression with 'V' Extension enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111590

            Bug ID: 111590
           Summary: RISC-V: Multiple ICE in gfortran regression with 'V' Extension enabled
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

FAIL: gfortran.dg/assumed_rank_24.f90 -O2 (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/assumed_rank_24.f90 -O2 (test for excess errors)
FAIL: gfortran.dg/assumed_rank_24.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/assumed_rank_24.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: gfortran.dg/assumed_rank_24.f90 -O3 -g (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/assumed_rank_24.f90 -O3 -g (test for excess errors)
FAIL: gfortran.dg/class_to_type_1.f03 -O2 (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/class_to_type_1.f03 -O2 (test for excess errors)
FAIL: gfortran.dg/class_to_type_1.f03 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/class_to_type_1.f03 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: gfortran.dg/class_to_type_1.f03 -O3 -g (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/class_to_type_1.f03 -O3 -g (test for excess errors)
FAIL: gfortran.dg/class_array_4.f03 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gfortran.dg/cshift_bounds_4.f90 -O2 (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/cshift_bounds_4.f90 -O2 (test for excess errors)
FAIL: gfortran.dg/cshift_bounds_4.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/cshift_bounds_4.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: gfortran.dg/cshift_bounds_4.f90 -O3 -g (internal compiler error: in smallest_mode_for_size, at stor-layout.cc:356)
FAIL: gfortran.dg/cshift_bounds_4.f90 -O3 -g (test for excess errors)

One of the cases:

program main
  integer, dimension(:,:), allocatable :: a, b
  integer, dimension(:), allocatable :: sh
  allocate (a(2,2))
  allocate (b(2,2))
  allocate (sh(3))
  a = 1
  b = cshift(a,sh)
end program main
[Bug target/109166] Built-in __atomic_test_and_set does not seem to be atomic on ARMv4T
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109166

Richard Earnshaw changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|NEW                         |RESOLVED

--- Comment #8 from Richard Earnshaw ---
I'm going to close this as WONTFIX. There are several reasons for this.

There's no SWPH operation, so it's impossible to generalize atomic operations for all basic data types. It's not possible to synthesize a 16-bit atomic type with either SWP or SWPB.

There's no support in Thumb state for SWP[B].

The instruction was removed in later versions of the architecture, which makes code non-portable.

Finally, Armv4, which dates to around 1995, is essentially in maintenance-only mode and this is really a new feature request. In fact, I don't think we'd really want to add new features for anything before Armv7 these days (even that is more than 10 years old).
[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500 Luke changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |DUPLICATE --- Comment #8 from Luke --- *** This bug has been marked as a duplicate of bug 104773 ***
[Bug rtl-optimization/104773] compare with 1 not merged with subtract 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773 Luke changed: What|Removed |Added CC||cptarse-luke at yahoo dot com --- Comment #3 from Luke --- *** Bug 111500 has been marked as a duplicate of this bug. ***
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #10 from Mathieu Malaterre --- for reference: % c++ --verbose -O2 -flto base2.cc && ./a.out Using built-in specs. COLLECT_GCC=c++ COLLECT_LTO_WRAPPER=/usr/libexec/gcc/powerpc64le-linux-gnu/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: powerpc64le-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 13.2.0-4' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=powerpc64le-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --with-libphobos-druntime-only=yes --enable-objc-gc=auto --enable-secureplt --enable-targets=powerpcle-linux --disable-multilib --enable-multiarch --disable-werror --with-long-double-128 --enable-offload-targets=nvptx-none=/build/reproducible-path/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=powerpc64le-linux-gnu --host=powerpc64le-linux-gnu --target=powerpc64le-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=4 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.0 (Debian 13.2.0-4) COLLECT_GCC_OPTIONS='-v' '-O2' '-flto' '-shared-libgcc' '-dumpdir' 'a-' /usr/libexec/gcc/powerpc64le-linux-gnu/13/cc1plus -quiet -v -imultiarch powerpc64le-linux-gnu -D_GNU_SOURCE base2.cc -msecure-plt -quiet -dumpdir a- -dumpbase base2.cc -dumpbase-ext .cc -O2 -version -flto -fasynchronous-unwind-tables 
-o /tmp/cc1cimSD.s GNU C++17 (Debian 13.2.0-4) version 13.2.0 (powerpc64le-linux-gnu) compiled by GNU C version 13.2.0, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version isl-0.26-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 ignoring nonexistent directory "/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../include/powerpc64-linux-gnu/c++/13" ignoring nonexistent directory "/usr/local/include/powerpc64le-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/powerpc64le-linux-gnu/13/include-fixed/powerpc64le-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/powerpc64le-linux-gnu/13/include-fixed" ignoring nonexistent directory "/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../powerpc64le-linux-gnu/include" #include "..." search starts here: #include <...> search starts here: /usr/include/c++/13 /usr/include/powerpc64le-linux-gnu/c++/13 /usr/include/c++/13/backward /usr/lib/gcc/powerpc64le-linux-gnu/13/include /usr/local/include /usr/include/powerpc64le-linux-gnu /usr/include End of search list. Compiler executable checksum: 403ce0768541423839c6b7d8fd9dfeff COLLECT_GCC_OPTIONS='-v' '-O2' '-flto' '-shared-libgcc' '-dumpdir' 'a-' as -v -a64 -mpower8 -many -mlittle -o /tmp/ccFzBgtQ.o /tmp/cc1cimSD.s GNU assembler version 2.41 (powerpc64le-linux-gnu) using BFD version (GNU Binutils for Debian) 2.41 COMPILER_PATH=/usr/libexec/gcc/powerpc64le-linux-gnu/13/:/usr/libexec/gcc/powerpc64le-linux-gnu/13/:/usr/libexec/gcc/powerpc64le-linux-gnu/:/usr/lib/gcc/powerpc64le-linux-gnu/13/:/usr/lib/gcc/powerpc64le-linux-gnu/ LIBRARY_PATH=/usr/lib/gcc/powerpc64le-linux-gnu/13/:/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu/:/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../lib/:/lib/powerpc64le-linux-gnu/:/lib/../lib/:/usr/lib/powerpc64le-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../:/lib/:/usr/lib/ COLLECT_GCC_OPTIONS='-v' '-O2' '-flto' '-shared-libgcc' '-dumpdir' 'a.' 
/usr/libexec/gcc/powerpc64le-linux-gnu/13/collect2 -plugin /usr/libexec/gcc/powerpc64le-linux-gnu/13/liblto_plugin.so -plugin-opt=/usr/libexec/gcc/powerpc64le-linux-gnu/13/lto-wrapper -plugin-opt=-fresolution=/tmp/ccSvdAAw.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -flto --build-id --eh-frame-hdr -V -m elf64lppc --hash-style=gnu --as-needed -dynamic-linker /lib64/ld64.so.2 -pie /usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu/Scrt1.o /usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu/crti.o /usr/lib/gcc/powerpc64le-linux-gnu/13/crtbeginS.o -L/usr/lib/gcc/powerpc64le-linux-gnu/13 -L/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../powerpc64le-linux-gnu -L/usr/lib/gcc/powerpc64le-linux-gnu/13/../../../../lib -L/lib/powerpc64le-linux-gnu -L/lib/../lib -L/usr/lib/powerpc64le-linux-gnu -L/usr/lib/../lib
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #9 from Mathieu Malaterre --- If you download pr111522.cc from comment #8, you should be able to reproduce exactly the original upstream issue. Steps: % c++ -O2 -flto pr111522.cc && ./a.out vs % c++ -O2 pr111522.cc && ./a.out
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #8 from Mathieu Malaterre --- Created attachment 55988 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55988&action=edit gcc -E -P
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #7 from Mathieu Malaterre --- Created attachment 55987 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55987&action=edit gcc -E -P
[Bug target/104611] memcmp/strcmp/strncmp can be optimized when the result is tested for [in]equality with 0 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104611 Mathias Stearn changed:
What | Removed | Added
CC | | redbeard0531 at gmail dot com
--- Comment #4 from Mathias Stearn --- clang has already been using the optimized memcmp code since v16, even at -O1: https://www.godbolt.org/z/qEd768TKr. Older versions (at least since v9) were still branch-free, but via a less optimal sequence of instructions. GCC's code gets even more ridiculous at 32 bytes, because it does a branch after every 8-byte compare, while the clang code is fully branch-free (not that branch-free is always better, but it seems clearly so in this case). Judging by the codegen, there seem to be three deficiencies in GCC: 1) an inability to take advantage of the load-pair instructions to load 16 bytes at a time, 2) an inability to use ccmp to combine comparisons, and 3) using branching rather than cset to fill the output register. Ideally these could all be done in the general case by the low-level instruction optimizer, but even getting them special-cased for memcmp (and friends) would be an improvement.
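As a source-level illustration of the branch-free idea discussed in this comment, here is a minimal sketch of a 16-byte equality test. The helper name `equal16` is hypothetical, and this is not the code either compiler emits; it only shows how two 8-byte comparisons can be merged into one test against zero instead of branching after each word.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true when the 16 bytes at a and b are equal.
inline bool equal16(const void* a, const void* b) {
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    std::uint64_t a0, a1, b0, b1;
    // memcpy is the portable way to do unaligned 8-byte loads.
    std::memcpy(&a0, pa, 8);
    std::memcpy(&a1, pa + 8, 8);
    std::memcpy(&b0, pb, 8);
    std::memcpy(&b1, pb + 8, 8);
    // XOR is zero for equal words; OR merges both halves so a single
    // comparison decides the result, with no branch between the halves.
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

This only covers the "result tested for equality with 0" case from the bug title; an ordering result would need more work.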
[Bug tree-optimization/111563] Missed optimization of LICM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563 --- Comment #5 from Yi <652023330028 at smail dot nju.edu.cn> --- (In reply to Andrew Pinski from comment #3) > So this is again reassociation with LIM, the same issue as PR 111560. For this similar code, GCC works as expected: https://godbolt.org/z/3TaqfeTqb
```c++
extern int var_24;
int t;
void test(int var_2, int var_3, int var_8, int var_10, int var_14) {
    for (int i_2 = -3247424; i_2 < 19; i_2 += var_3 + 1056714155) {
        var_24 += (-(200 / var_10)) + (-var_8);
        var_24 += var_14 + var_2;
        i_2+=i_2/3;
    }
}
```
So it seems that this bug and PR 111560 may not have the same cause, because this case does not seem to be related to the statement, "Our re-association only produces a canonical order within a single expression." Meanwhile, in Example 2, 'if(var_3)' is actually moved out of the loop by Loop Unswitch. So maybe the rest of the loop should be optimized as expected, as it is for this similar code?
[Bug libstdc++/111589] New: Use relaxed atomic increment (but not decrement!) in shared_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111589 Bug ID: 111589 Summary: Use relaxed atomic increment (but not decrement!) in shared_ptr Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: redbeard0531 at gmail dot com Target Milestone: --- The atomic increment when copying a shared_ptr can be relaxed because it is never actually used as a synchronization operation. The current thread must already have sufficient synchronization to access the memory because it can already deref the pointer. All synchronization is done either via whatever program-provided code makes the shared_ptr object available to the thread, or in the atomic decrement (where the decrements to non-zero are releases that ensure all uses of the object happen before the final decrement to zero acquires and destroys the object). As an argument from authority, libc++ already uses relaxed for increments and acq/rel for decrements: https://github.com/llvm/llvm-project/blob/c649fd34e928ad01951cbff298c5c44853dd41dd/libcxx/include/__memory/shared_ptr.h#L101-L121 This will have no impact on x86 where all atomic RMWs are effectively sequentially consistent, but it will enable the use of ldadd rather than ldaddal on aarch64, and similar optimizations on other weaker architectures.
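The memory-ordering split proposed here can be sketched with a tiny reference counter. The class `RefCount` and its member names are hypothetical and are not libstdc++'s control block; the sketch only assumes the scheme described in the report: relaxed increment, acquire/release decrement.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal reference counter, for illustration only.
class RefCount {
public:
    // Copying a handle: the caller already owns a reference, so the
    // increment does not need to synchronize with anything.
    void acquire() noexcept {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Dropping a handle: release publishes this thread's uses of the
    // object; acquire on the final decrement makes them visible to the
    // thread that destroys it. Returns true for the last reference.
    bool release() noexcept {
        return count_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    long use_count() const noexcept {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<long> count_{1};
};
```

A caller would destroy the managed object only when `release()` returns true.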
[Bug libstdc++/111588] New: Provide opt-out of shared_ptr single-threaded optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111588 Bug ID: 111588 Summary: Provide opt-out of shared_ptr single-threaded optimization Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: redbeard0531 at gmail dot com Target Milestone: --- Right now there is a fast-path for single-threaded programs to avoid the overhead of atomics in shared_ptr, but there is no equivalent for a program that knows it is multi-threaded to remove the check and branch. If __GTHREADS is not defined then no atomic code is emitted. There are two issues with this: 1) programs that know they are effectively always multithreaded pay for a runtime branch and .text segment bloat for an optimization that never applies. This may have knock-on effects of making functions that use shared_ptr less likely to be inlined by pushing them slightly over the complexity threshold. 2) It invalidates single-threaded microbenchmarks of code that uses shared_ptr because the performance of the code may be very different from when run in the real multithreaded program. I understand the value of making a fast mode for single-threaded code, and I can even accept having the runtime branch by default, rather than as an opt-in, when it is unknown if the program will be run with multiple threads. But an opt-out would be nice to have. If it had to be a gcc-build time option rather than a #define, that would be acceptable for us since we always use our own build of gcc, but it seems like a worse option for other users. FWIW, neither llvm libc++ (https://github.com/llvm/llvm-project/blob/0bfaed8c612705cfa8c5382d26d8089a0a26386b/libcxx/include/__memory/shared_ptr.h#L103-L110) nor MS-STL (https://github.com/microsoft/STL/blob/main/stl/inc/memory#L1171-L1173) ever use runtime branching to detect multithreading.
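To make the request concrete, here is a rough sketch of the kind of dispatch being described and of what an opt-out could look like. Everything named here is an assumption for illustration: `g_single_threaded` stands in for whatever runtime check the library performs, and `ASSUME_MULTITHREADED` is a made-up macro, not an existing libstdc++ option. The atomic path uses the GCC/Clang `__atomic_fetch_add` builtin.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime "is the process single-threaded?" query.
static bool g_single_threaded = true;

// Hypothetical opt-out: defining this macro would compile the check away.
// #define ASSUME_MULTITHREADED 1

inline void add_ref(long& count) noexcept {
#ifndef ASSUME_MULTITHREADED
    // Fast path: plain increment when no other thread can exist.
    if (g_single_threaded) {
        ++count;
        return;
    }
#endif
    // Atomic path: always taken when the opt-out macro is defined.
    __atomic_fetch_add(&count, 1L, __ATOMIC_ACQ_REL);
}
```

With the macro defined, the branch and the non-atomic path disappear from every inlined copy, which is the code-size and inlining effect the report is concerned with.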
[Bug target/104831] RISCV libatomic LR.aq/SC.rl pair insufficient for SEQ_CST
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831 palmer at gcc dot gnu.org changed:
What | Removed | Added
Assignee | unassigned at gcc dot gnu.org | patrick at rivosinc dot com
Status | UNCONFIRMED | ASSIGNED
Ever confirmed | 0 | 1
Last reconfirmed | | 2023-09-25
--- Comment #10 from palmer at gcc dot gnu.org --- This should be fixed, looks like we just forgot to close the bug. I've assigned it to Patrick to make sure everything's finished.
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #6 from Mathieu Malaterre --- (In reply to Andrew Pinski from comment #5) > (In reply to Mathieu Malaterre from comment #4) > > > So the original > > > (upstream) code is somewhat buggy as it relies on lazy init for global var. > > > > Those global vars are in different namespace, I actually fail to understand > > why the definition with ",cpu=power10" gets pulled in... > > Because `#pragma GCC target targets_str` is global state and unrelated to > namespace ... Forgot to mention that each per-namespace `#pragma GCC target` is inside `#pragma GCC push_options` / `#pragma GCC pop_options`. This implements "per namespace" target-specific options AFAIK.
[Bug ipa/59948] Optimize std::function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59948 --- Comment #8 from Jan Hubicka --- Trunk optimized stuff return 0, but fails to optimize out functions which becomes unused after indirect inlining. With -fno-early-inlining we end up with: int m () { void * D.48296; int __args#0; struct function h; int _12; bool (*) (union _Any_data & {ref-all}, const union _Any_data & {ref-all}, _Manager_operation) _24; bool (*) (union _Any_data & {ref-all}, const union _Any_data & {ref-all}, _Manager_operation) _27; long unsigned int _29; long unsigned int _35; vector(2) long unsigned int _37; void * _42; [local count: 1073741824]: _29 = (long unsigned int) _M_invoke; _35 = (long unsigned int) _M_manager; _37 = {_35, _29}; h ={v} {CLOBBER}; MEM [(struct _Function_base *) + 8B] = {}; MEM[(int (*) (int) *)] = f; MEM [(void *) + 16B] = _37; __args#0 = 1; _12 = std::_Function_handler::_M_invoke (_M_functor, &__args#0); [local count: 1073312329]: __args#0 ={v} {CLOBBER(eol)}; _24 = MEM[(struct _Function_base *)]._M_manager; if (_24 != 0B) goto ; [70.00%] else goto ; [30.00%] [local count: 751318634]: _24 ([(struct _Function_base *)]._M_functor, [(struct _Function_base *)]._M_functor, 3); [local count: 1073312329]: h ={v} {CLOBBER}; h ={v} {CLOBBER(eol)}; return _12; [count: 0]: : _27 = MEM[(struct _Function_base *)]._M_manager; if (_27 != 0B) goto ; [0.00%] else goto ; [0.00%] [count: 0]: _27 ([(struct _Function_base *)]._M_functor, [(struct _Function_base *)]._M_functor, 3); [count: 0]: h ={v} {CLOBBER}; _42 = __builtin_eh_pointer (2); __builtin_unwind_resume (_42); } ipa-prop fails to track the pointer passed around: IPA function summary for int m()/288 inlinable global time: 41.256800 self size: 16 global size: 41 min size: 38 self stack: 32 global stack:32 size:19.00, time:8.66 size:3.00, time:2.00, executed if:(not inlined) calls: std::function::~function()/286 inlined freq:0.00 Stack frame offset 32, callee self size 0 std::_Function_base::~_Function_base()/71 inlined 
freq:0.00 Stack frame offset 32, callee self size 0 indirect call loop depth: 0 freq:0.00 size: 6 time: 18 std::function::~function()/404 inlined freq:1.00 Stack frame offset 32, callee self size 0 std::_Function_base::~_Function_base()/405 inlined freq:1.00 Stack frame offset 32, callee self size 0 indirect call loop depth: 0 freq:0.70 size: 6 time: 18 _Res std::function<_Res(_ArgTypes ...)>::operator()(_ArgTypes ...) const [with _Res = int; _ArgTypes = {int}]/304 inlined freq:1.00 Stack frame offset 32, callee self size 0 void std::__throw_bad_function_call()/374 function body not available freq:0.00 loop depth: 0 size: 1 time: 10 _M_empty.isra/384 inlined freq:1.00 Stack frame offset 32, callee self size 0 indirect call loop depth: 0 freq:1.00 size: 6 time: 18 std::function<_Res(_ArgTypes ...)>::function(_Functor&&) [with _Functor = int (&)(int); _Constraints = void; _Res = int; _ArgTypes = {int}]/302 inlined freq:1.00 Stack frame offset 32, callee self size 0 std::function<_Res(_ArgTypes ...)>::function(_Functor&&) [with _Functor = int (&)(int); _Constraints = void; _Res = int; _ArgTypes = {int}]/375 inlined freq:0.33 Stack frame offset 32, callee self size 0 static void std::_Function_base::_Base_manager<_Functor>::_M_init_functor(std::_Any_data&, _Fn&&) [with _Fn = int (&)(int); _Functor = int (*)(int)]/310 inlined freq:0.33 Stack frame offset 32, callee self size 0 _M_create.isra/383 inlined freq:0.33 Stack frame offset 32, callee self size 0 void* std::_Any_data::_M_access()/388 inlined freq:0.33 Stack frame offset 32, callee self size 0 operator new.isra/386 inlined freq:0.33 Stack frame offset 32, callee self size 0 static bool std::_Function_base::_Base_manager<_Functor>::_M_not_empty_function(_Tp*) [with _Tp = int(int); _Functor = int (*)(int)]/308 inlined freq:1.00 Stack frame offset 32, callee self size 0 constexpr std::_Function_base::_Function_base()/299 inlined freq:1.00 Stack frame offset 32, callee self size 0
[Bug c++/111512] GCC's __builtin_memcpy can trigger ADL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111512 --- Comment #3 from Jonathan Wakely --- The library has a workaround, but the front end still does unwanted ADL for __builtin_memcpy (and probably other built-ins).
[Bug libstdc++/111511] Incorrect ADL in std::to_array in GCC 11/12/13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111511 --- Comment #6 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:77cf3773021b0a20d89623e09d620747a05588ec commit r14-4252-g77cf3773021b0a20d89623e09d620747a05588ec Author: Jonathan Wakely Date: Thu Sep 21 09:14:57 2023 +0100 libstdc++: Prevent unwanted ADL in std::to_array [PR111512] As noted in PR c++/111512, GCC does ADL for __builtin_memcpy if it is unqualified, which can cause errors for template argument types which cannot be completed. Casting the memcpy arguments to void* prevents ADL from considering the problem type. libstdc++-v3/ChangeLog: PR libstdc++/111511 PR c++/111512 * include/std/array (to_array): Cast memcpy arguments to void*. * testsuite/23_containers/array/creation/111512.cc: New test.
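The effect of the void* casts can be illustrated without any built-ins. In this sketch all names (`user::Widget`, `copy_bytes`, the two callers) are hypothetical, and the situation is simpler than the one in the commit: it shows ADL selecting a different overload, not ADL failing on an incomplete type. The mechanism is the same, though: an unqualified call with class-typed pointer arguments makes the compiler look into the class's namespace, while void* arguments have no associated namespaces.

```cpp
#include <cassert>
#include <cstddef>

namespace user {
struct Widget { int value; };
// Reachable from outside this namespace only through ADL.
inline int copy_bytes(Widget*, const Widget*, std::size_t) { return 1; }
}  // namespace user

// Stand-in for the function the caller actually intends to use.
inline int copy_bytes(void*, const void*, std::size_t) { return 0; }

// Unqualified call with Widget pointers: ADL adds user::copy_bytes to the
// overload set, and it wins as the exact match.
inline int call_unqualified(user::Widget* dst, const user::Widget* src) {
    return copy_bytes(dst, src, sizeof *dst);
}

// Same call with the arguments cast to void*: no associated namespaces,
// so only the global overload is considered.
inline int call_with_void_casts(user::Widget* dst, const user::Widget* src) {
    return copy_bytes(static_cast<void*>(dst),
                      static_cast<const void*>(src), sizeof *dst);
}
```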
[Bug c++/111512] GCC's __builtin_memcpy can trigger ADL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111512 --- Comment #2 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:77cf3773021b0a20d89623e09d620747a05588ec commit r14-4252-g77cf3773021b0a20d89623e09d620747a05588ec Author: Jonathan Wakely Date: Thu Sep 21 09:14:57 2023 +0100 libstdc++: Prevent unwanted ADL in std::to_array [PR111512] As noted in PR c++/111512, GCC does ADL for __builtin_memcpy if it is unqualified, which can cause errors for template argument types which cannot be completed. Casting the memcpy arguments to void* prevents ADL from considering the problem type. libstdc++-v3/ChangeLog: PR libstdc++/111511 PR c++/111512 * include/std/array (to_array): Cast memcpy arguments to void*. * testsuite/23_containers/array/creation/111512.cc: New test.
[Bug tree-optimization/110982] (unsigned)(signed_char) != (unsigned)-1 is never changed back into signed_char != -1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110982 Andrew Pinski changed:
What | Removed | Added
Status | UNCONFIRMED | NEW
Ever confirmed | 0 | 1
Last reconfirmed | | 2023-09-25
--- Comment #2 from Andrew Pinski --- Another case where the use of int_fits_type_p causes a missed optimization is:
```
unsigned f(int a) {
    unsigned t = a;
    if (a == -1)
        return t;
    return 0;
}
```
This should be caught in phiopt2 but currently is not due to the int_fits_type_p usage. I noticed this in PR 110131.
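The equivalences involved can be checked at the source level. The sketch below is only a sanity check of the arithmetic, not a statement about what phiopt2 produces; the function names are hypothetical, and `f_expected` is an assumption about the simplified form one would hope for.

```cpp
#include <cassert>
#include <climits>

// The function from the comment, unchanged in behavior.
inline unsigned f_original(int a) {
    unsigned t = a;
    if (a == -1)
        return t;
    return 0;
}

// Assumed simplified form: on the a == -1 path, t is always (unsigned)-1.
inline unsigned f_expected(int a) {
    return a == -1 ? UINT_MAX : 0u;
}

// The identity from the bug title: comparing after widening to unsigned
// gives the same answer as comparing the signed char directly.
inline bool cmp_widened(signed char c) { return (unsigned)c != (unsigned)-1; }
inline bool cmp_narrow(signed char c) { return c != -1; }

// Exhaustive check over every signed char value.
inline bool all_signed_chars_agree() {
    for (int i = SCHAR_MIN; i <= SCHAR_MAX; ++i) {
        signed char c = static_cast<signed char>(i);
        if (cmp_widened(c) != cmp_narrow(c))
            return false;
    }
    return true;
}
```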
[Bug target/111570] -march=generic prints error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111570 Jonathan Wakely changed:
What | Removed | Added
Last reconfirmed | | 2023-09-25
Status | UNCONFIRMED | NEW
Ever confirmed | 0 | 1
--- Comment #1 from Jonathan Wakely --- https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html is very clear: "There is no -march=generic option because -march indicates the instruction set the compiler can use, and there is no generic instruction set applicable to all processors. In contrast, -mtune indicates the processor (or, in this case, collection of processors) for which the code is optimized." This is just a bug in the list of valid arguments printed.
[Bug target/111584] [aarch64] Redundant movprfx with ptrue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111584 --- Comment #1 from Andrew Pinski --- Created attachment 55986 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55986&action=edit Full testcase `-march=armv8.2-a+sve -O2 -msve-vector-bits=256`
[Bug tree-optimization/111583] [13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r13-3281-g6cc3394507
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111583 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2023-09-25 --- Comment #1 from Andrew Pinski --- Confirmed. The problem is a latent bug in ldist. It turns: ``` [local count: 955630224]: a_5 = a_4 + 1; a.4_6 = (char *) a_4; *a.4_6 = 0; [local count: 1073741824]: # a_4 = PHI # j_7 = PHI j_8 = j_7 + 18446744073709551615; if (j_7 != 0) goto ; [89.00%] else goto ; [11.00%] ``` Into: ``` a_2 = (long int) k_1(D); j_3 = (long unsigned int) k_1(D); _23 = (sizetype) k_1(D); _25 = (char *) a_2; __builtin_memset (_25, 0, _23); ``` Which then basically says k!=0 as _25 can't be a null pointer.
[Bug c/111584] New: [aarch64] Redundant movprfx with ptrue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111584 Bug ID: 111584 Summary: [aarch64] Redundant movprfx with ptrue Product: gcc Version: 13.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: zhongyunde at huawei dot com Target Milestone: --- * test: https://gcc.godbolt.org/z/E6Eez81jh
```
#include <arm_sve.h>
typedef svfloat32_t fvec32 __attribute__((arm_sve_vector_bits(256)));
typedef svfloat32_t __m256_;
__m256_ _mm256_mul_ps2_z(__m256_ a, __m256_ b) {
    __m256_ res;
    res = svmul_f32_z(svptrue_b32(), a, b);
    return res;
}
```
* LLVM produces the same output for _mm256_mul_ps2_x and _mm256_mul_ps2_z, while GCC does not produce equally efficient output for _mm256_mul_ps2_z
[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500 --- Comment #7 from Luke --- (In reply to Andrew Pinski from comment #6) > (In reply to Andrew Pinski from comment #5) > > This is most likely a dup of bug 104773. > > Or of bug 3507. i concur... but i do not know which one to choose... they both look the same to me... somehow...
[Bug tree-optimization/111583] [13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r13-3281-g6cc3394507
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111583 Andrew Pinski changed:
What | Removed | Added
Target Milestone | --- | 13.3
[Bug tree-optimization/111583] New: [13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r13-3281-g6cc3394507
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111583 Bug ID: 111583 Summary: [13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r13-3281-g6cc3394507 Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: shaohua.li at inf dot ethz.ch CC: amacleod at redhat dot com Target Milestone: --- gcc at -Os produced the wrong code. Bisected to r13-3281-g6cc3394507 Compiler explorer: https://godbolt.org/z/8GM9YvMKb $ cat a.c int printf(const char *, ...); int b, c, d; char e; short f; const unsigned short **g; char h(char k) { if (k) return '0'; return 0; } int l() { b = 0; return 1; } static short m(unsigned k) { const unsigned short *n[65]; g = [4]; k || l(); long a = k; char i = 0; unsigned long j = k; while (j--) *(char *)a++ = i; c = h(d); f = k; return 0; } int main() { long o = (e < 0) << 5; m(o); printf("%d\n", f); } $ $ gcc -O0 -fsanitize=address,undefined a.c && ./a.out 0 $ gcc -Os a.c && ./a.out 32 $
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #5 from Andrew Pinski --- (In reply to Mathieu Malaterre from comment #4) > > So the original > > (upstream) code is somewhat buggy as it relies on lazy init for global var. > > Those global vars are in different namespace, I actually fail to understand > why the definition with ",cpu=power10" gets pulled in... Because `#pragma GCC target targets_str` is global state and unrelated to namespace ...
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #4 from Mathieu Malaterre --- > So the original > (upstream) code is somewhat buggy as it relies on lazy init for global var. Those global vars are in different namespaces; I actually fail to understand why the definition with ",cpu=power10" gets pulled in...
[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500 Bug 111500 depends on bug 111581, which changed state. Bug 111581 Summary: [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581
What | Removed | Added
Status | WAITING | RESOLVED
Resolution | --- | DUPLICATE
[Bug rtl-optimization/60749] combine is overly cautious when operating on volatile memory references
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60749 Luke changed:
What | Removed | Added
CC | | cptarse-luke at yahoo dot com
--- Comment #3 from Luke --- *** Bug 111581 has been marked as a duplicate of this bug. ***
[Bug target/111581] [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581 Luke changed:
What | Removed | Added
Status | WAITING | RESOLVED
Resolution | --- | DUPLICATE
--- Comment #3 from Luke --- nope... i didn't know bug 60749... and it does not happen, when i omit the "volatile"... *** This bug has been marked as a duplicate of bug 60749 ***
[Bug tree-optimization/110386] [11/12/13 Regression] ICE with ABSU in backprop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110386 Andrew Pinski changed:
What | Removed | Added
Summary | [11/12/13/14 Regression] ICE with ABSU in backprop | [11/12/13 Regression] ICE with ABSU in backprop
Known to work | | 14.0
--- Comment #9 from Andrew Pinski --- Fixed on the trunk so far.
[Bug tree-optimization/110386] [11/12/13/14 Regression] ICE with ABSU in backprop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110386 --- Comment #8 from CVS Commits --- The trunk branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f commit r14-4249-g2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f Author: Andrew Pinski Date: Sat Sep 23 21:53:09 2023 -0700 Fix PR 110386: backprop vs ABSU_EXPR The issue here is that when backprop tries to go and strip sign ops, it skips over ABSU_EXPR but ABSU_EXPR not only does an ABS, it also changes the type to unsigned. Since strip_sign_op_1 is only supposed to strip off sign changing operands and not ones that change types, removing ABSU_EXPR here is correct. We don't handle nop conversions so this does not cause any missed optimizations either. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/110386 gcc/ChangeLog: * gimple-ssa-backprop.cc (strip_sign_op_1): Remove ABSU_EXPR. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr110386-1.c: New test. * gcc.c-torture/compile/pr110386-2.c: New test.
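The point that ABSU_EXPR both takes the absolute value and changes the type can be modelled in source code. The function `absu` below is a hypothetical stand-in for the tree code, written only to show why it is not a pure sign operation: the result type is unsigned, so the most negative input has a representable result.

```cpp
#include <cassert>
#include <climits>

// Hypothetical source-level model of ABSU_EXPR on int: absolute value
// computed in, and returned as, the corresponding unsigned type.
inline unsigned absu(int x) {
    unsigned u = static_cast<unsigned>(x);
    // Unsigned negation wraps, so this is well defined even for INT_MIN.
    return x < 0 ? 0u - u : u;
}
```

For comparison, a plain signed absolute value of `INT_MIN` would overflow, which is why stripping this operation as if it only changed the sign is not safe.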
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #3 from Mathieu Malaterre --- For reference: * https://github.com/google/highway/commit/fea3dba9cfec3a74ddcd8ecac3a5d4d8429191e4
[Bug target/111522] Different code path for static initialization with flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522 --- Comment #2 from Mathieu Malaterre --- (In reply to Andrew Pinski from comment #1) > I think this is just broken code. > > It does: > #define HWY_BEFORE_NAMESPACE() > \ > HWY_PUSH_ATTRIBUTES("altivec,vsx,power8-vector" > \ > ",cpu=power10") > > But does not do a pop before the main function. > > And then you are testing on power8 which obvious does not have all of the > instructions as power10 ... > Why it works without -flto is just pure accident not using the instructions > that are not in power8. > > Anyways I suspect this is too much reduced testcase. So you might need to > provide the original one. I reported this one after reading #111380. Honestly there is no "wrong-code" here. The LTO case is simply an eager init of a global variable, while the non-LTO case is a lazy loading of the global var. So the original (upstream) code is somewhat buggy as it relies on lazy init for global var. Could someone please confirm that eager init of global vars is expected in the LTO case? Then we could just close this one.
[Bug ada/111578] GNAT ada.textio.setline gives incorrect result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111578 Eric Botcazou changed:
What | Removed | Added
Resolution | --- | INVALID
CC | | ebotcazou at gcc dot gnu.org
Status | UNCONFIRMED | RESOLVED
--- Comment #1 from Eric Botcazou --- No, see the A.10.5 clause of the Ada Reference Manual.
[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500 --- Comment #6 from Andrew Pinski --- (In reply to Andrew Pinski from comment #5) > This is most likely a dup of bug 104773. Or of bug 3507.
[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500 --- Comment #5 from Andrew Pinski --- (In reply to Luke from comment #4) > the a.i file for example #1a is: > # 1 "a.c" > # 1 "/tmp//" > # 1 "" > # 1 "" > # 1 "a.c" > void artiSUBS() { > for (int i=100; i>0; i--) > *(volatile int*)0xE000E014 = i; > } > > the command-line was: > > arm-none-eabi-gcc -save-temps -S a.c -O3 -g -mcpu=cortex-m0plus -mthumb > > -Wall --specs=nosys.specs -nostdlib -fdata-sections -ffunction-sections > > -ffreestanding -Winline > > and the resulting a.s file contains that subs/cmp sequence... This is most likely a dup of bug 104773.
[Bug target/111582] [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582 --- Comment #2 from Andrew Pinski --- (In reply to Andrew Pinski from comment #1) > Fixed in GCC 10.
artiSP:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movs r2, #14
        @ sp needed
        ldr r3, .L3
        ldrb r0, [r3]
        bics r0, r2
        subs r2, r2, #10
        orrs r0, r2
        strb r0, [r3]
        lsls r0, r0, #16
        bx lr
.L4:
        .align 2
.L3:
        .word -536870742
[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500 Bug 111500 depends on bug 111582, which changed state. Bug 111582 Summary: [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582
What | Removed | Added
Status | UNCONFIRMED | RESOLVED
Resolution | --- | FIXED
[Bug target/111582] [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582 Andrew Pinski changed:
What | Removed | Added
Target Milestone | --- | 10.0
Resolution | --- | FIXED
Status | UNCONFIRMED | RESOLVED
Known to work | | 10.4.0, 14.0
--- Comment #1 from Andrew Pinski --- Fixed in GCC 10.
[Bug target/111581] [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581 Andrew Pinski changed:
What | Removed | Added
Status | UNCONFIRMED | WAITING
Last reconfirmed | | 2023-09-25
Ever confirmed | 0 | 1
See Also | | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60749
--- Comment #2 from Andrew Pinski --- Is there a testcase without pointers to a volatile location? If not then this is a dup of bug 60749.
[Bug ada/111579] gnatpp error at start
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111579 Eric Botcazou changed:
What | Removed | Added
Status | UNCONFIRMED | RESOLVED
Resolution | --- | INVALID
CC | | ebotcazou at gcc dot gnu.org
--- Comment #1 from Eric Botcazou --- gnatpp is not part of the GNU Compiler Collection.
[Bug target/40499] [missed optimization] branch to return not threaded on thumb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40499 Andrew Pinski changed:
What | Removed | Added
CC | | cptarse-luke at yahoo dot com
--- Comment #7 from Andrew Pinski --- *** Bug 111580 has been marked as a duplicate of this bug. ***
[Bug target/111580] [arm-none-eabi-gcc] / suboptimal optimization / b.n to bx lr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111580 Andrew Pinski changed:
What | Removed | Added
Status | UNCONFIRMED | RESOLVED
Resolution | --- | DUPLICATE
--- Comment #1 from Andrew Pinski --- Dup of bug 40499. *** This bug has been marked as a duplicate of bug 40499 ***
[Bug target/111500] [arm-none-eabi-gcc] / suboptimal optimization / subs followed by cmp (et alii)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111500 Bug 111500 depends on bug 111580, which changed state. Bug 111580 Summary: [arm-none-eabi-gcc] / suboptimal optimization / b.n to bx lr https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111580
What | Removed | Added
Status | UNCONFIRMED | RESOLVED
Resolution | --- | DUPLICATE
[Bug target/111581] [arm-none-eabi-gcc] / suboptimal optimization / uxth/sxth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111581 --- Comment #1 from Luke --- in the unsigned case: furthermore the ldrh already cleared the high half-word, so that a uxth would be superfluous, even if there would be a subsequent str...
[Bug target/111582] New: [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111582 Bug ID: 111582 Summary: [arm-none-eabi-gcc] / suboptimal optimization / bitfield / superfluous stack write Product: gcc Version: 9.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: cptarse-luke at yahoo dot com Target Milestone: --- When I try to use a struct with a bitfield, then it happens, that GCC writes to the stack without ever reading it: > arm-none-eabi-gcc -v Using built-in specs. COLLECT_GCC=arm-none-eabi-gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-none-eabi/9.3.0/lto-wrapper Target: arm-none-eabi Configured with: ../configure --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-libstdc__-v3 --disable-nls --disable-shared --disable-threads --disable-tls --disable-werror --enable-__cxa_atexit --enable-c99 --enable-gnu-indirect-function --enable-interwork --enable-languages=c,c++ --enable-long-long --enable-multilib --enable-plugins --host= --libdir=/usr/lib --libexecdir=/usr/lib --prefix=/usr --target=arm-none-eabi --with-gmp --with-gnu-as --with-gnu-ld --with-headers=/usr/arm-none-eabi/include --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-isl --with-libelf --with-mpc --with-mpfr --with-multilib-list=rmprofile --with-native-system-header-dir=/include --with-newlib --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/usr/arm-none-eabi --with-system-zlib Thread model: single gcc version 9.3.0 (GCC) # arm-none-eabi-gcc -save-temps -S a.c -O3 -g -mcpu=cortex-m0plus -mthumb -Wall --specs=nosys.specs -nostdlib -fdata-sections -ffunction-sections -ffreestanding -Winline > cat a.i # 1 "a.c" # 1 "/tmp//" # 1 "" # 1 "" # 1 "a.c" typedef unsigned char u8; typedef unsigned int u32; extern int fatal(); __attribute__((always_inline)) inline u32 lsb(const u8 l) { return (1U<> (i*8); if (R.rs || msk==~R.msk) return 
(((volatile u8*)R.a)[i] = v) << (i*8); else if (R.v==~R.msk) return (((volatile u8*)R.a)[i] |= v) << (i*8); return (((volatile u8*)R.a)[i] = (((volatile u8*)R.a)[i] & (R.msk>>(i*8))) | v) << (i*8); } return 0; } __attribute__((always_inline)) inline Reg GU(Reg R, u32 A, u32 N, u8 o, u8 w, u32 v) { const u32 msk=~(lsb(w)< cat a.s artiSP: sub sp, sp, #16 mov r2, sp movs r3, #2 strb r3, [r2, #12] ... add sp, sp, #16 bx lr I compile it on an Intel(R) Pentium(R) Silver J5040 CPU @ 2.00GHz running Void Linux (kernel: 6.3.13_1) for an STM32G030.
[Bug middle-end/111548] RISC-V Vector: ICE in validate_change_or_fail (vsetvl pass)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111548 --- Comment #1 from CVS Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:9d5f20fc4a6b3254d2d379309193da4be2747987 commit r14-4248-g9d5f20fc4a6b3254d2d379309193da4be2747987 Author: Juzhe-Zhong Date: Sun Sep 24 11:17:01 2023 +0800 RISC-V: Fix AVL/VL bug of VSETVL PASS[PR111548] This patch fixes an incorrect fetch of the AVL/VL reg in the VSETVL pass. C/C++ regression passed, but the gfortran tests have not been run yet. I am still finding a way to run them and will commit once the Fortran regression passes. PR target/111548 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (earliest_pred_can_be_fused_p): Bugfix gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr111548.c: New test.