[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #10 from Jakub Jelinek --- On x86_64 the #c6 testcase with -O3 -fno-vect-cost-model started to ICE with r14-5603-g2b59e2b4dff42118fe3a505f07b9a6aa4cf53bdf For aarch64 same testcase, my bet is r12-1551-g3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c though I've only verified r12-1529 works and r12-1573 ICEs and there are no IL differences before slp2 which newly ICEs.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #14 from Andreas Krebbel --- If my analysis from comment #1 is correct, combine does superfluous steps here. Getting rid of this should not cause any harm, but should be beneficial for other targets as well. I agree that the patch I've proposed is kind of a hack. Do you think this could be turned into a proper fix?
[Bug target/114233] Newly-introduced pr113617.C test fails on Darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114233 Francois-Xavier Coudert changed: What|Removed |Added Target||x86_64-apple-darwin23 Status|UNCONFIRMED |NEW Last reconfirmed||2024-03-05 CC||iains at gcc dot gnu.org Ever confirmed|0 |1
[Bug target/114233] New: Newly-introduced pr113617.C test fails on Darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114233 Bug ID: 114233 Summary: Newly-introduced pr113617.C test fails on Darwin Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: fxcoudert at gcc dot gnu.org Target Milestone: --- FAIL: g++.dg/other/pr113617.C -std=gnu++14 (test for excess errors) Excess errors: ld: Undefined symbols: R::Y>::operator->(), referenced from: A::foo(long long, long long) in cci8MVgO.o R::Y>::operator->(), referenced from: A::foo(long long, long long) in cci8MVgO.o N1::N2::N3::AB::bleh(), referenced from: A::foo(long long, long long) in cci8MVgO.o N1::N2::N3::AC::m1(R::S), referenced from: void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous namespace)::D, false>&) in cci8MVgO.o void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous namespace)::D, false>&) in ccjwgqSE.o N1::N2::N3::AC::AC(int), referenced from: void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous namespace)::D, false>&) in cci8MVgO.o void N1::N2::N3::X<1>::boo, false>>(long long, long long, long long, N1::N2::N3::C<(anonymous namespace)::D, false>&) in ccjwgqSE.o _main, referenced from:
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #3 from Sam James --- I am reducing it now but it's slow going.
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #2 from Sam James --- Created attachment 57610 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57610=edit TraceStream.cc.ii.xz
[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #1 from Sam James --- Created attachment 57609 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57609=edit Task.cc.ii.xz
[Bug target/114232] New: [14 regression] ICE when building rr-5.7.0 with LTO on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 Bug ID: 114232 Summary: [14 regression] ICE when building rr-5.7.0 with LTO on x86 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: sjames at gcc dot gnu.org Target Milestone: --- Created attachment 57608 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57608=edit RecordSession.cc.ii.xz Hit this when building rr-5.7.0 with LTO on x86. ``` $ cat list.txt RecordSession.cc.ii Task.cc.ii TraceStream.cc.ii $ g++ -O3 -pipe -march=i686 -mfpmath=sse -msse -msse2 -fno-vect-cost-model -rdynamic -flto=auto @list.txt /var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc: In member function ‘close’: /var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc:1467:1: error: unrecognizable insn: 1467 | } | ^ (insn 160 159 161 26 (parallel [ (set (reg:V2QI 250 [ vect_patt_207.470_183 ]) (minus:V2QI (reg:V2QI 251) (reg:V2QI 249 [ vect__4.468_451 ]))) (clobber (reg:CC 17 flags)) ]) "/var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc":254:16 -1 (nil)) during RTL pass: vregs /var/tmp/portage/dev-util/rr-5.7.0/work/rr-5.7.0/src/TraceStream.cc:1467:1: internal compiler error: in extract_insn, at recog.cc:2812 0x5799263a _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/rtl-error.cc:108 0x579927e8 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/rtl-error.cc:116 0x56eadade extract_insn(rtx_insn*) /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/recog.cc:2812 0x58ac1379 instantiate_virtual_regs_in_insn /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/function.cc:1611 0x58ac1379 instantiate_virtual_regs /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/function.cc:1994 0x58ac1379 execute /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/function.cc:2041 Please submit a 
full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://bugs.gentoo.org/> for instructions. make: *** [/tmp/ccCI1g9e.mk:17: /tmp/ccZVEvZf.ltrans5.ltrans.o] Error 1 lto-wrapper: fatal error: make returned 2 exit status compilation terminated. /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../i686-pc-linux-gnu/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status ``` ``` Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-pc-linux-gnu/14/lto-wrapper Target: i686-pc-linux-gnu Configured with: /var/tmp/portage/sys-devel/gcc-14.0./work/gcc-14.0./configure --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/14 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/14/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/14 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/14/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/14/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/14/include/g++-v14 --disable-silent-rules --disable-dependency-tracking --with-python-dir=/share/gcc-data/i686-pc-linux-gnu/14/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=yes,extra,rtl,df --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 14.0. 
p, commit c8305c9bdf09abe3e2f89783fe62f2e4049468fa' --with-gcc-major-version-only --enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-fixed-point --with-arch=i686 --enable-targets=all --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --enable-valgrind-annotations --disable-vtable-verify --disable-libvtv --with-zstd --without-isl --enable-default-pie --enable-host-pie --disable-host-bind-now --enable-default-ssp --disable-fixincludes --with-build-config='bootstrap-O3 bootstrap-lto' Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 14.0.1 20240304 (experimental) a89c5df317d1de74871e2a05c36aed9cbbb21f42 (Gentoo 14.0. p, commit c8305c9bdf09abe3e2f89783fe62f2e4049468fa) ```
[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 --- Comment #9 from Andrew Pinski --- The testcases are kind of flaky because the following fails only on the trunk for aarch64: ``` void f(long*); int ff[2]; long tt[4]; unsigned long ttt; void k(long x, long y) { long t = x >> ff[0]; long t1 = ff[1]; long t2 = y >> ff[0]; tt[0] = t1; tt[1] = t+t2; tt[2] = t2; } ```
[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 --- Comment #8 from Andrew Pinski --- (In reply to Andrew Pinski from comment #7) > So my reduced testcase fails for aarch64 since GCC 12 but for x86_64 only on > the trunk. I suspect the commit that it will bisect to on x86_64 is just > enabling the pattern for x86_64. > > So if anyone does a bisect, please try on aarch64. Note also use `-O3 -fno-vect-cost-model` for the options since -O2 might catch when the vectorizer is turned on for -O2.
[Bug tree-optimization/114231] [12/13/14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 Andrew Pinski changed: What|Removed |Added Target Milestone|14.0|12.4 Keywords||needs-bisection Known to work||11.1.0 Summary|[14 regression] ICE when|[12/13/14 regression] ICE |building libjxl |when building libjxl Target||aarch64 x86_64 --- Comment #7 from Andrew Pinski --- So my reduced testcase fails for aarch64 since GCC 12 but for x86_64 only on the trunk. I suspect the commit that it will bisect to on x86_64 is just enabling the pattern for x86_64. So if anyone does a bisect, please try on aarch64.
[Bug tree-optimization/114231] [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 --- Comment #6 from Andrew Pinski --- A little better reduced (this time only 1 BB even): ``` void f(long*); int ff[2]; void f2(long, long, unsigned long); void k(unsigned long x, unsigned long y) { long t = x >> ff[0]; long t1 = ff[1]; unsigned long t2 = y >> ff[0]; f2(t1, t+t2, t2); } ```
[Bug tree-optimization/114231] [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 --- Comment #5 from Andrew Pinski --- Little more reduced: ``` void f(long*); int ff[2]; void f1(long, long); void k(unsigned long x, unsigned long y) { long t = x >> ff[0]; long t1 = ff[1]; unsigned long t2 = y >> ff[0]; long t3 = t+t2 ? t2 : 0; f1(t1, t3); } ``` /app/example.cpp:9:14: missed: unusable type for last operand in vector/vector shift/rotate. Note if you change the type of ff to long, this works.
[Bug tree-optimization/114231] [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-03-05 --- Comment #4 from Andrew Pinski --- Further reduced: ``` static inline long ClampedSize(long begin, unsigned long size_max) { return begin + size_max ? size_max : 0; } void f(long*); int ff[2]; void k(unsigned long x, unsigned long y) { long t = x >> ff[0]; long t1 = ff[1]; long t2 = y >> ff[0]; long t3 = ClampedSize(t, t2); long t4[2]; t4[0] = t1; t4[1] = t3; f(t4); } ```
[Bug tree-optimization/114231] [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 --- Comment #3 from Andrew Pinski --- vectorizable_shift
[Bug tree-optimization/114231] [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 --- Comment #2 from Sam James --- (In reply to Sam James from comment #1) > Created attachment 57607 [details] > reduced.ii > > `g++ -c reduced.ii -march=sapphirerapids -O2 -fno-vect-cost-model` is enough > for the reduced version. in fact, g++ -c reduced.ii -O2 -fno-vect-cost-model is enough
[Bug tree-optimization/114231] [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 Andrew Pinski changed: What|Removed |Added Keywords||ice-on-valid-code Target Milestone|--- |14.0 CC||pinskia at gcc dot gnu.org
[Bug tree-optimization/114231] [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 --- Comment #1 from Sam James --- Created attachment 57607 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57607=edit reduced.ii `g++ -c reduced.ii -march=sapphirerapids -O2 -fno-vect-cost-model` is enough for the reduced version.
[Bug tree-optimization/114231] New: [14 regression] ICE when building libjxl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114231 Bug ID: 114231 Summary: [14 regression] ICE when building libjxl Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: sjames at gcc dot gnu.org Target Milestone: --- Created attachment 57606 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57606=edit enc_modular.ii.xz Originally reported downstream in Gentoo by tdr. ``` $ g++ -c enc_modular.ii -mrtm -mshstk -march=sapphirerapids -O3 -fno-vect-cost-model during GIMPLE pass: slp /var/tmp/portage/media-libs/libjxl-0.9.1-r1/work/libjxl-0.9.1/lib/jxl/enc_modular.cc: In member function ‘jxl::Status jxl::ModularFrameEncoder::PrepareStreamParams(const jxl::Rect&, const jxl::CompressParams&, int, int, const jxl::ModularStreamId&, bool)’: /var/tmp/portage/media-libs/libjxl-0.9.1-r1/work/libjxl-0.9.1/lib/jxl/enc_modular.cc:1294:8: internal compiler error: in vect_transform_stmt, at tree-vect-stmts.cc:13361 1294 | Status ModularFrameEncoder::PrepareStreamParams(const Rect& rect, |^~~ 0x55bdf7edf2b0 vect_transform_stmt(vec_info*, _stmt_vec_info*, gimple_stmt_iterator*, _slp_tree*, _slp_instance*) /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-stmts.cc:13361 0x55bdf99cd64d vect_schedule_slp_node /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9410 0x55bdf99cd64d vect_schedule_slp_node /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9203 0x55bdf99ccaa2 vect_schedule_scc /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9645 0x55bdf96e6382 vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap, vl_ptr> const&) /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:9790 0x55bdf9468b27 vect_slp_region /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:7911 0x55bdf94648c3 vect_slp_bbs /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:8011 0x55bdf94626be 
vect_slp_function(function*) /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vect-slp.cc:8127 0x55bdf9461e1c execute /usr/src/debug/sys-devel/gcc-14.0./gcc-14.0./gcc/tree-vectorizer.cc:1533 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://bugs.gentoo.org/> for instructions. ``` ``` Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/14/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /var/tmp/portage/sys-devel/gcc-14.0./work/gcc-14.0./configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/14 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/14/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14 --disable-silent-rules --disable-dependency-tracking --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/14/python --enable-languages=c,c++,fortran,rust --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=yes,extra,rtl --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo Hardened 14.0. 
p, commit c8305c9bdf09abe3e2f89783fe62f2e4049468fa' --with-gcc-major-version-only --enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libssp --disable-libada --disable-cet --disable-systemtap --enable-valgrind-annotations --disable-vtable-verify --disable-libvtv --with-zstd --with-isl --disable-isl-version-check --enable-default-pie --enable-host-pie --enable-host-bind-now --enable-default-ssp --disable-fixincludes --with-build-config='bootstrap-O3 bootstrap-lto' Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 14.0.1 20240304 (experimental) eae6b63b5b5426f943f58b5ae0bf0a6068ca8ad6 (Gentoo Hardened 14.0. p, commit c8305c9bdf09abe3e2f89783fe62f2e4049468fa) ```
[Bug c/8960] invalid error mode `SI' applied to inappropriate type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8960 --- Comment #14 from Andrew Pinski --- I am not 100% sure if this is actually valid. The question becomes: does the attribute in this case apply to the return type or to the type of the function? The manual is not clear here either.
[Bug libfortran/93550] Implement control of leading zero in formatted numeric output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93550 --- Comment #4 from Jerry DeLisle --- The LEADING_ZERO specifiers are now included in the 2023 standard, so away we go! In support of lazy programmers, let's make the compiler do it. ;)
[Bug target/114194] ICE when using std::unique_ptr with xtheadvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194 --- Comment #6 from Bruce Hoult --- The ICE also happens with bzero(). The ICE does NOT happen with a constant length of 16 or greater, in which case a function call is made instead of expanding inline. With rv64gv or rv64gcv a series of N `sb` are generated (N < 16). With rv64gc_xtheadvector, N >= 6, and -Os a tail call to memset is generated, no ICE. With N < 6 ... ICE. So the problem is only trying to expand memset() or bzero() inline. Does it try to use a vectorised memset? That doesn't happen with rv64gcv. memcpy() does not ICE for any N. I assume the originally reported C++ code is generating a memset() to initialise one of the classes/structs.
[Bug tree-optimization/114230] Missed optimization of loop deletion: `a!=0`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114230 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Summary|Missed optimization of loop |Missed optimization of loop |deletion: a=0||a|deletion: `a!=0` CC||pinskia at gcc dot gnu.org Keywords||missed-optimization Severity|normal |enhancement Last reconfirmed||2024-03-05 --- Comment #1 from Andrew Pinski --- Confirmed. we have: ``` [local count: 1063004408]: # i_11 = PHI # a_lsm.4_13 = PHI <_3(5), a_lsm.4_5(2)> _2 = a_lsm.4_13 != 0; _3 = (int) _2; i_8 = i_11 + 1; if (i_8 != 10) goto ; [98.99%] else goto ; [1.01%] [local count: 1052266995]: goto ; [100.00%] ``` Which sccp does not handle `(int)a != 0` currently. It does handle `a|=b;`, `a^=b;`, and `a&=b;` though.
[Bug tree-optimization/114230] New: Missed optimization of loop deletion: a=0||a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114230 Bug ID: 114230 Summary: Missed optimization of loop deletion: a=0||a Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: 652023330028 at smail dot nju.edu.cn Target Milestone: --- Hello, we noticed that in the code below, looping is not necessary (the value of 0||a doesn't change), but gcc seems to have missed this optimization. https://godbolt.org/z/bx9jEfb63 int a; void func(){ for(int i=0;i<10;i++){ a=0||a; } } GCC -O3: func(): mov edx, DWORD PTR a[rip] mov eax, 10 .L2: test edx, edx setne dl movzx edx, dl sub eax, 1 jne .L2 mov DWORD PTR a[rip], edx ret Expected code (Clang): func(): # @func() xor eax, eax cmp dword ptr [rip + a], 0 setne al mov dword ptr [rip + a], eax ret Thank you very much for your time and effort! We look forward to hearing from you.
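To illustrate why the loop in this report is removable: `0||a` evaluates to `a != 0`, and applying that normalisation a second time changes nothing, so ten iterations give the same result as one. The sketch below is not part of the report; it replaces the global `a` with a parameter, and the names `loop_version` and `folded_version` are made up for the illustration.

```c
/* The loop from the report, with the global replaced by a parameter
   so the two variants can be compared directly. */
int loop_version(int a)
{
    for (int i = 0; i < 10; i++)
        a = 0 || a;   /* evaluates to 1 if a is nonzero, else 0 */
    return a;
}

/* What the loop collapses to: a single normalisation to 0 or 1. */
int folded_version(int a)
{
    return a != 0;
}
```

Both functions return 0 for an input of 0 and 1 for any other input, which matches the single `setne` sequence shown as the expected code.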
[Bug target/113859] popcount HI can be vectorized for non-SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113859 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Last reconfirmed||2024-03-05 --- Comment #2 from Andrew Pinski --- Mine.
[Bug other/79469] Feature request: provide `__builtin_assume` builtin function to allow more aggressive optimizations and to match clang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79469 Andrew Pinski changed: What|Removed |Added Resolution|--- |WONTFIX Status|NEW |RESOLVED --- Comment #6 from Andrew Pinski --- So even though clang/LLVM has not implemented the attribute yet (https://github.com/llvm/llvm-project/pull/81014), adding another extension is not a good idea for GCC, so closing as won't fix. The reason this got standardized is so that it can be implemented in a cross-compiler way. Also, the builtin has an odd definition when it comes to the whole no side effects rule (though the attribute has that too, it is not directly part of a function call).
[Bug target/54284] -mabi=ieeelongdouble problems
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54284 Peter Bergner changed: What|Removed |Added CC|bergner at vnet dot ibm.com, |bergner at gcc dot gnu.org, |dje.gcc at gmail dot com |dje at gcc dot gnu.org Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #1 from Peter Bergner --- I'm pretty sure this has been long ago fixed, so I'm going to close this as FIXED.
[Bug target/50329] [PowerPC] Unnecessary stack frame set up
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50329 Peter Bergner changed: What|Removed |Added CC||bergner at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #3 from Peter Bergner --- (In reply to Segher Boessenkool from comment #2) > Current trunk (to be GCC 6) optimises "c" perfectly. Not the other > two, alas. Current trunk (to be GCC 14) optimizes all of them now. Marking as FIXED. a: li 9,-1 rldicr 9,9,0,0 std 9,0(3) blr b: li 9,-1 rldicr 9,9,0,0 std 9,0(3) blr c: li 9,0 li 10,-1 rldimi 9,10,63,0 std 9,0(3) blr
[Bug target/36557] -m32 -mpowerpc64 produces better code than -m64 for a!=0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36557 Peter Bergner changed: What|Removed |Added Status|NEW |RESOLVED CC||bergner at gcc dot gnu.org Resolution|--- |FIXED --- Comment #5 from Peter Bergner --- (In reply to Segher Boessenkool from comment #4) > We now do > > cntlzw 3,3 > srwi 3,3,5 > xori 3,3,0x1 > blr > > which is still not optimal (and not what -m32 / -m32 -mpowerpc64 do). My GCC 10 and later compiles show we now generate: addic 9,3,-1 subfe 3,9,3 blr Marking as FIXED.
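For readers unfamiliar with the carry trick, the two-instruction sequence above can be modelled in C. This is only a sketch of the arithmetic, assuming 64-bit registers and the usual PowerPC definitions (`addic` sets CA to the carry out of the addition, `subfe RT,RA,RB` computes `~RA + RB + CA`); the function name `ne0_via_carry` is made up.

```c
#include <stdint.h>

/* Models `addic 9,3,-1; subfe 3,9,3` on 64-bit registers. */
uint64_t ne0_via_carry(uint64_t r3)
{
    /* addic 9,3,-1: r9 = r3 + 0xFFFF...FFFF; CA is the carry out. */
    uint64_t r9 = r3 + UINT64_MAX;
    uint64_t ca = (r9 < r3) ? 1 : 0;   /* the add wrapped iff r3 != 0 */

    /* subfe 3,9,3: r3 = ~r9 + r3 + CA. */
    return ~r9 + r3 + ca;
}
```

For `r3 == 0` there is no carry and the result is `~(-1) + 0 + 0 = 0`; for any other value the carry is set and `~(r3 - 1) + r3 + 1` wraps around to 1, so the function computes `r3 != 0` without a branch.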
[Bug target/33236] -mminimal-toc register should be psedu-register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33236 Peter Bergner changed: What|Removed |Added Resolution|--- |WONTFIX Status|NEW |RESOLVED CC||bergner at gcc dot gnu.org --- Comment #5 from Peter Bergner --- (In reply to Segher Boessenkool from comment #4) > Still happens. I'm marking this as WONTFIX since -mminimal-toc is basically never used now that -mcmodel=medium has been introduced; -mcmodel=medium is the default and results in ideal code for this testcase.
[Bug target/31557] return 0x80000000UL code gen can be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31557 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |13.0
[Bug target/31557] return 0x80000000UL code gen can be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31557 Peter Bergner changed: What|Removed |Added Resolution|--- |FIXED CC||bergner at gcc dot gnu.org Status|REOPENED|RESOLVED --- Comment #7 from Peter Bergner --- (In reply to Segher Boessenkool from comment #6) > Actually, huh, *not* fixed on trunk yet. This was fixed in GCC 13. Marking it as FIXED.
[Bug target/113001] [14 Regression] RISCV Zicond ICE: in extract_insn, at recog.cc:2812 with -O2 rv64gcv_zicond
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113001 --- Comment #3 from Jeffrey A. Law --- *** Bug 112871 has been marked as a duplicate of this bug. ***
[Bug target/112871] [14 Regression] RISCV ICE: in extract_insn, at recog.cc:2804 (unrecognizable insn) with -01 rv32gc_zicond
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112871 Jeffrey A. Law changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #3 from Jeffrey A. Law --- Same path through the conditional move expansion code. *** This bug has been marked as a duplicate of bug 113001 ***
[Bug target/114194] ICE when using std::unique_ptr with xtheadvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194 --- Comment #5 from Bruce Hoult --- oops .. 379 lines .. I grep'd wrong. Anyway... gcc/config/riscv/riscv-vector-switch.def -ENTRY (RVVMF2QI, true, LMUL_F2, 16) -ENTRY (RVVMF4QI, true, LMUL_F4, 32) -ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32, LMUL_F8, 64) +ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16) +ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32) +ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32 && !TARGET_XTHEADVECTOR, LMUL_F8, 64) Fractional LMUL (including RVVMF8QI) is removed. Correct, 0.7.1 doesn't have it. But something still tries to use it.
[Bug c++/98356] [11/12/13/14 Regression] ICE in cp_parser_dot_deref_incomplete, at cp/parser.c:7899 since r9-4841-g2139fd74f31449c0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98356 Marek Polacek changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org Keywords||patch --- Comment #7 from Marek Polacek --- This has a patch now: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647157.html
[Bug target/114194] ICE when using std::unique_ptr with xtheadvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194 --- Comment #4 from Bruce Hoult --- I've bisected this and the problem is introduced in 2d7205eb2c3 "RISC-V: Handle differences between XTheadvector and Vector" Fortunately this commit touches only 136 lines of code, unlike the later two xtheadvector commits which are 1119 and 204 touched lines.
[Bug target/114224] popcount RTL cost seems wrong with cssc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114224 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Last reconfirmed||2024-03-04 Ever confirmed|0 |1 --- Comment #2 from Andrew Pinski --- Interesting: ``` int h1(unsigned a) { return __builtin_popcountg(a) == 1; } ``` works. Anyways I will be adding POPCOUNT's rtl cost here. We don't even handle POPCOUNT for vector modes either ...
[Bug middle-end/106727] Missed fold / canonicalization for checking if a number is a power of 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106727 --- Comment #3 from Andrew Pinski --- (In reply to Richard Biener from comment #1) > Confirmed. Expanding __builtin_popcount (n) <= 1 as (n & (n - 1)) == 0 > might be already done. The canonicalization could be applied if .POPCOUNT > is available. No, it is not already done. Expanding `__builtin_popcount (n) == 1` is done (including the case where n is known not to include 0, which is expanded as `n & (n - 1) == 0`).
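As a sanity check on the identities discussed here, the sketch below brute-forces them over all 16-bit values using GCC's `__builtin_popcount`. It is an illustration only, not code from the bug; `count_mismatches` is a made-up name. Note that in C source the comparison needs the extra parentheses, `(n & (n - 1)) == 0`, because `==` binds tighter than `&`.

```c
/* Returns how many checks fail for n in [0, 65535]; 0 means both
   identities hold on that range. */
int count_mismatches(void)
{
    int bad = 0;
    for (unsigned n = 0; n <= 0xFFFFu; n++) {
        /* Clearing the lowest set bit leaves nothing behind. */
        int clears_to_zero = (n & (n - 1)) == 0;

        /* popcount(n) <= 1  <=>  (n & (n - 1)) == 0 (includes n == 0) */
        if ((__builtin_popcount(n) <= 1) != clears_to_zero)
            bad++;

        /* popcount(n) == 1  <=>  n != 0 && (n & (n - 1)) == 0 */
        if ((__builtin_popcount(n) == 1) != (n != 0 && clears_to_zero))
            bad++;
    }
    return bad;
}
```

The second check shows why the `== 1` form can drop the `n != 0` test only when n is already known to be nonzero.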
[Bug middle-end/106727] Missed fold / canonicalization for checking if a number is a power of 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106727 Andrew Pinski changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #2 from Andrew Pinski --- Mine.
[Bug libstdc++/97759] Could std::has_single_bit be faster?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97759 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Last reconfirmed||2024-03-04 Status|UNCONFIRMED |ASSIGNED --- Comment #15 from Andrew Pinski --- >popcount (x) == 1 || x == 0 That could be optimized to just `popcount (x) <= 1`. I am going to look to see what is left in GCC 15.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 Andrew Pinski changed: What|Removed |Added Assignee|wilco at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #14 from Andrew Pinski --- > __builtin_popcount (x) == 1 into x == (x & -x) Actually that should be `__builtin_popcount (x) <= 1` Anyways I am going to implement the rest here due to PR 94787 .
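The correction above (`<= 1` rather than `== 1`) comes down to the zero case: `x & -x` isolates the lowest set bit, and for `x == 0` both sides are 0, so the comparison is also true when no bit is set. A small brute-force sketch over 16-bit values, with the made-up name `count_lowbit_mismatches`:

```c
/* Returns how many x in [0, 65535] violate
   popcount(x) <= 1  <=>  x == (x & -x). */
int count_lowbit_mismatches(void)
{
    int bad = 0;
    for (unsigned x = 0; x <= 0xFFFFu; x++) {
        /* Unsigned negation wraps, so x & -x is the lowest set bit
           (or 0 when x is 0). */
        int is_own_low_bit = (x == (x & -x));
        if ((__builtin_popcount(x) <= 1) != is_own_low_bit)
            bad++;
    }
    return bad;
}
```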
[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 Richard Sandiford changed: What|Removed |Added Attachment #57602|0 |1 is obsolete|| --- Comment #42 from Richard Sandiford --- Created attachment 57605 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57605=edit proof-of-concept patch to suppress peeling for gaps How about the attached? It records whether all accesses that require peeling for gaps could instead have used gathers, and only retries when that's true. It means that we retry for only 0.034% of calls to vect_analyze_loop_1 in a build of SPEC2017 with -mcpu=neoverse-v1 -Ofast -fomit-frame-pointer. The figures exclude wrf, which failed for me with: module_mp_gsfcgce.fppized.f90:852:23: 852 |REAL FUNCTION ggamma(X) | ^ Error: definition in block 18 does not dominate use in block 13 for SSA_NAME: stmp_pf_6.5657_140 in statement: pf_81 = PHI PHI argument stmp_pf_6.5657_140 for PHI node pf_81 = PHI during GIMPLE pass: vect module_mp_gsfcgce.fppized.f90:852:23: internal compiler error: verify_ssa failed Will look at that tomorrow.
[Bug middle-end/114198] [14] RISC-V fixed-length vector -flto ICE: in vectorizable_load, at tree-vect-stmts.cc:10570
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114198 --- Comment #2 from Patrick O'Neill --- (In reply to Richard Biener from comment #1) > Probably also with -fwhole-program instead of -flto Thanks! Updated args (--param=riscv-autovec-preference=fixed-vlmax was recently removed): -march=rv64gcv -fwhole-program -O3 -mrvv-vector-bits=zvl or -march=rv64gcv -flto -O3 -mrvv-vector-bits=zvl Updated godbolt: https://godbolt.org/z/qb9bK61xM
[Bug target/114083] Possible word play on conditional/unconditional
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114083 --- Comment #6 from Roland Illig --- (In reply to Maciej W. Rozycki from comment #4) > The flag enables the use of the conditional-move operations even with > hardware that has no support for such operations, hence unconditionally. Thank you for your explanation, that made the intention much clearer to me. There's a problem with the wording though. On a platform that doesn't support conditional-move operations, it's not possible to _use_ conditional-move operations. Period. It's only possible to _emulate_ the behavior of these operations. I'm not sure how consistently the words 'operation' and 'instruction' are used in the GCC code base and documentation, but I mixed them up in my mind when I tried to translate this option. > if someone has > a better proposal, then please feel free to submit a patch. Or would: > > Enable conditional-move operations unconditionally. > > be preferable? No. Above, you wrote that the branchless instructions would be selected _if_ they are cheaper than the equivalent branch instructions. This is a condition, thus the word 'unconditionally' doesn't fit. What about this? > Prefer branchless move instructions where cheaper.
[Bug modula2/114227] InstallTerminationProcedure does not work with -fiso
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114227 Gaius Mulley changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Gaius Mulley --- Closing now that the patch has been applied.
[Bug sanitizer/114217] -fsanitize=alignment false positive with intended unaligned struct member access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #14 from Fangrui Song --- I agree with Jakub and Andrew. The relevant rules: C11 6.3.2.3 says > An integer may be converted to any pointer type. Except as previously > specified, the result is implementation-defined, might not be correctly > aligned, might not point to an entity of the referenced type, and might be a > trap representation. > > A pointer to an object type may be converted to a pointer to a different > object type. If the resulting pointer is not correctly aligned for the > referenced type, the behavior is undefined. ... C++ [expr.static.cast]p14 says of conversions from a misaligned pointer: > A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type > “pointer to cv2 T”, where T is an object type and cv2 is the same > cv-qualification as, or greater cv-qualification than, cv1. If the original > pointer value represents the address A of a byte in memory and A does not > satisfy the alignment requirement of T, then the resulting pointer value is > unspecified. ... Which is allowed to be an invalid pointer value, which the compiler is then permitted to give whatever semantics we like, such as disallowing it being passed to memcpy. --- memcpy is preferred for expressing an unaligned access. typedef struct dir_entry dir_entry_u __attribute__((aligned(1))); // In C++, there is an alternative: using dir_entry_u __attribute__((aligned(1))) = dir_entry; u64 gu(dir_entry_u *entry) { return entry->offset; }
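A minimal sketch of the memcpy approach mentioned above, for a plain 64-bit load. The helper name `load_u64_unaligned` is hypothetical and not taken from the bug report; the point is only that the pointer is never converted to a `uint64_t *`, so no alignment requirement is attached to it.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 64-bit value from an address that need not be 8-byte
   aligned.  The source stays a byte pointer; memcpy copies the
   object representation into a properly aligned local. */
uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For example, storing a value at `buf + 1` inside a byte array and reading it back through this helper returns the same value, regardless of how `buf` itself is aligned.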
[Bug modula2/114227] InstallTerminationProcedure does not work with -fiso
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114227 --- Comment #3 from GCC Commits --- The master branch has been updated by Gaius Mulley : https://gcc.gnu.org/g:d646db0e35ad9d235635b204349f5d960072f9fe commit r14-9308-gd646db0e35ad9d235635b204349f5d960072f9fe Author: Gaius Mulley Date: Mon Mar 4 21:46:32 2024 + PR modula2/114227 InstallTerminationProcedure does not work with -fiso This patch moves the initial/termination user procedure functionality in pim and iso versions of M2RTS into M2Dependent. This ensures that finalization/initialization procedures will always be invoked for both -fiso and -fpim. Prior to this patch M2Dependent called M2RTS for termination procedure cleanup and always invoked the pim M2RTS. gcc/m2/ChangeLog: PR modula2/114227 * gm2-libs-iso/M2RTS.mod (ProcedureChain): Remove. (ProcedureList): Remove. (ExecuteReverse): Remove. (ExecuteTerminationProcedures): Rewrite. (ExecuteInitialProcedures): Rewrite. (AppendProc): Remove. (InstallTerminationProcedure): Rewrite. (InstallInitialProcedure): Rewrite. (InitProcList): Remove. * gm2-libs/M2Dependent.def (InstallTerminationProcedure): New procedure. (ExecuteTerminationProcedures): New procedure. (InstallInitialProcedure): New procedure. (ExecuteInitialProcedures): New procedure. * gm2-libs/M2Dependent.mod (ProcedureChain): New type. (ProcedureList): New type. (ExecuteReverse): New procedure. (ExecuteTerminationProcedures): New procedure. (ExecuteInitialProcedures): New procedure. (AppendProc): New procedure. (InstallTerminationProcedure): New procedure. (InstallInitialProcedure): New procedure. (InitProcList): New procedure. * gm2-libs/M2RTS.mod (ProcedureChain): Remove. (ProcedureList): Remove. (ExecuteReverse): Remove. (ExecuteTerminationProcedures): Rewrite. (ExecuteInitialProcedures): Rewrite. (AppendProc): Remove. (InstallTerminationProcedure): Rewrite. (InstallInitialProcedure): Rewrite. (InitProcList): Remove. Signed-off-by: Gaius Mulley
[Bug modula2/114227] InstallTerminationProcedure does not work with -fiso
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114227 --- Comment #2 from Gaius Mulley --- Created attachment 57604 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57604&action=edit Proposed fix Here is the proposed patch which moves the initial/termination user procedure functionality in pim and iso versions of M2RTS into M2Dependent. This ensures that finalization/initialization procedures will always be invoked for both -fiso and -fpim. Prior to this patch M2Dependent called M2RTS for termination procedure cleanup and always invoked the pim M2RTS.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #13 from Segher Boessenkool --- (In reply to Sarah Julia Kriesch from comment #12) > I expect also, that this bug is a bigger case. A bigger case of what? What do you mean?
[Bug c++/114183] [11/12/13/14 Regression] Lambda constexpr works in msvc but not in gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114183 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #2 from Andrew Pinski --- Invalid for the same reason as the clang issue is invalid.
[Bug libstdc++/114147] [11/12/13/14 Regression] tuple allocator-extended constructor requires non-explicit default constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114147 --- Comment #8 from GCC Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:0a545ac7000501844670add0b3560ebdbcb123c6 commit r14-9307-g0a545ac7000501844670add0b3560ebdbcb123c6 Author: Jonathan Wakely Date: Fri Mar 1 11:16:58 2024 + libstdc++: Add missing std::tuple constructor [PR114147] I caused a regression with commit r10-908 by adding a constraint to the non-explicit allocator-extended default constructor, but seemingly forgot to add an explicit overload with the corresponding constraint. libstdc++-v3/ChangeLog: PR libstdc++/114147 * include/std/tuple (tuple::tuple(allocator_arg_t, const Alloc&)): Add missing overload of allocator-extended default constructor. (tuple::tuple(allocator_arg_t, const Alloc&)): Likewise. * testsuite/20_util/tuple/cons/114147.cc: New test.
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #18 from Jakub Jelinek --- I was looking at the sysdeps/ieee754/ldbl-128/ version, i.e. what is used for hypotf128.
[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211 Jakub Jelinek changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #6 from Jakub Jelinek --- Created attachment 57603 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57603&action=edit gcc14-pr114211.patch Untested fix.
[Bug target/114194] ICE when using std::unique_ptr with xtheadvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194 --- Comment #3 from Bruce Hoult --- Simpler example, found independently.

void *memset();
void a(void *b){ memset(b, 0, 1lu); }

There might be a lot of code that triggers this. Fortunately the source file this happened in didn't actually use RVV (others did) so I was able to simply use rv64gc for it.
[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211 --- Comment #5 from Jakub Jelinek --- Anyway, the actual bug is in the r9-4082-g38e601118ca88adf0a472750b0da83f0ef1798a7 PR87507 change. Either we need to punt if the rotate input and output overlaps, or handle that case correctly.
[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211 Jakub Jelinek changed: What|Removed |Added Summary|[13/14 Regression] wrong|[13/14 Regression] wrong |code with -O|code with -O |-fno-tree-coalesce-vars |-fno-tree-coalesce-vars ||since r13-1907 CC||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek --- Started with r13-1907-g525a1a73a5a563c829a5f76858fe122c9b39f254
[Bug target/113010] [RISCV] sign-extension lost in comparison with constant embedded in comma-op expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113010 --- Comment #12 from GCC Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:901e7bdab70e2275723ac31dacbbce0b6f68f4f4 commit r14-9304-g901e7bdab70e2275723ac31dacbbce0b6f68f4f4 Author: Jakub Jelinek Date: Mon Mar 4 19:23:02 2024 +0100 combine: Fix recent WORD_REGISTER_OPERATIONS check [PR113010] On Mon, Mar 04, 2024 at 05:18:39PM +0100, Rainer Orth wrote: > unfortunately, the patch broke Solaris/SPARC bootstrap > (sparc-sun-solaris2.11): > > .../gcc/combine.cc: In function 'rtx_code simplify_comparison(rtx_code, rtx_def**, rtx_def**)': > .../gcc/combine.cc:12101:25: error: '*(unsigned int*)((char*)_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))' may be used uninitialized [-Werror=maybe-uninitialized] > 12101 | scalar_int_mode mode, inner_mode, tmode; > | ^~ I don't see how it could ever work properly, inner_mode in that spot is just uninitialized. I think we shouldn't worry about paradoxical subregs of non-scalar_int_mode REGs/MEMs and for the scalar_int_mode ones should initialize inner_mode before we use it. Another option would be to use maybe_lt (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))), BITS_PER_WORD) and load_extend_op (GET_MODE (SUBREG_REG (op0))) == ZERO_EXTEND, or set machine_mode smode = GET_MODE (SUBREG_REG (op0)); and use it in those two spots. 2024-03-04 Jakub Jelinek PR rtl-optimization/113010 * combine.cc (simplify_comparison): Guard the WORD_REGISTER_OPERATIONS check on scalar_int_mode of SUBREG_REG and initialize inner_mode.
[Bug c++/114114] [11/12/13/14 Regression] Internal compiler error on function-local conditional noexcept
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114114 Marek Polacek changed: What|Removed |Added Priority|P3 |P2 Assignee|unassigned at gcc dot gnu.org |mpolacek at gcc dot gnu.org Status|NEW |ASSIGNED CC||mpolacek at gcc dot gnu.org
[Bug c++/114183] [11/12/13/14 Regression] Lambda constexpr works in msvc but not in gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114183 Marek Polacek changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org --- Comment #1 from Marek Polacek --- https://github.com/llvm/llvm-project/issues/83569 was closed so this is not a bug?
[Bug tree-optimization/114206] [11/12/13/14 Regression] recursive function call vs local variable addresses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114206 Andrew Pinski changed: What|Removed |Added Known to work||4.5.3 Summary|recursive function call vs |[11/12/13/14 Regression] |local variable addresses|recursive function call vs ||local variable addresses Target Milestone|--- |11.5 Known to fail||4.6.3, 4.7.3, 5.1.0
[Bug c++/110031] [11/12/13/14 Regression] ICE with deprecated attribute and NTTP and diagnostic for deprecated printed out so much
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110031 Marek Polacek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |mpolacek at gcc dot gnu.org Status|NEW |ASSIGNED
[Bug libstdc++/77776] C++17 std::hypot implementation is poor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776 --- Comment #17 from Matthias Kretz (Vir) --- hypotf(a, b) is implemented using double precision and hypot(a, b) uses 80-bit long double on i386 and x86_64 hypot does what you describe, right?

std::experimental::simd benchmarks of hypot(a, b), where simd_abi::scalar uses the implementation (i.e. glibc):

-march=skylake-avx512 -ffast-math -O3 -lmvec:

TYPE                              Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
float, simd_abi::scalar           37.5           1            11.5           1
float,                            37.6           0.999        10.2           1.13
float, simd_abi::__sse            34             4.42         6.46           7.15
float, simd_abi::__avx            34.1           8.79         6.56           14.1
float, simd_abi::_Avx512<32>      34.3           8.76         6.01           15.4
float, simd_abi::_Avx512<64>      44.1           13.6         12             15.4
float, [[gnu::vector_size(16)]]   58.3           2.57         47.5           0.974
float, [[gnu::vector_size(32)]]   132            2.27         104            0.892
float, [[gnu::vector_size(64)]]   240            2.5          222            0.832
--
TYPE                              Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
double, simd_abi::scalar          81             1            21.5           1
double,                           80.1           1.01         21.3           1.01
double, simd_abi::__sse           39.9           4.06         6.47           6.64
double, simd_abi::__avx           40.2           8.05         12             7.14
double, simd_abi::_Avx512<32>     40.3           8.04         12             7.14
double, simd_abi::_Avx512<64>     56.2           11.5         24             7.14
double, [[gnu::vector_size(16)]]  89.3           1.81         42.5           1.01
double, [[gnu::vector_size(32)]]  150            2.16         110            0.777
double, [[gnu::vector_size(64)]]  297            2.18         242            0.71
--

-march=skylake-avx512 -O3 -lmvec:

TYPE                              Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
float, simd_abi::scalar           37.6           1            10.4           1
float,                            37.7           0.998        10.2           1.02
float, simd_abi::__sse            37.6           4            8.83           4.71
float, simd_abi::__avx            37.5           8.01         9.42           8.82
float, simd_abi::_Avx512<64>      47.8           12.6         12             13.8
float, [[gnu::vector_size(16)]]   98.7           1.52         57.2           0.727
float, [[gnu::vector_size(32)]]   151            2            114            0.728
float, [[gnu::vector_size(64)]]   260            2.31         230            0.722
--
TYPE                              Latency        Speedup      Throughput     Speedup
                                  [cycles/call]  [per value]  [cycles/call]  [per value]
double, simd_abi::scalar          79.7           1            21.7           1
double,                           80.1           0.995        21.6           1
double, simd_abi::__sse           44.2           3.6          9.99           4.33
double, simd_abi::__avx           43.6           7.32         12             7.21
double, simd_abi::_Avx512<64>     59.9           10.6         24             7.21
double, [[gnu::vector_size(16)]]  88.3           1.8          44.2           0.98
double, [[gnu::vector_size(32)]]  163            1.96         115            0.75
double, [[gnu::vector_size(64)]]  302            2.11         233            0.742
--

I have never ported my SIMD implementation back to scalar and benchmarked it against glibc.
[Bug c++/110031] [11/12/13/14 Regression] ICE with deprecated attribute and NTTP and diagnostic for deprecated printed out so much
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110031 Marek Polacek changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org --- Comment #5 from Marek Polacek --- Started with r8-4678-g6296cf8e099aae: commit 6296cf8e099aae43c86a773f93d83a19df85d7e7 Author: Jason Merrill Date: Thu Nov 16 15:13:48 2017 -0500 PR c++/79092 - non-type args of different types are different
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #12 from Sarah Julia Kriesch --- Raise your hand if you need anything new from my side. We have got enough use cases in our build system and upstream open source projects gave warnings to remove the s390x support because of long building time and the required resources. I expect also, that this bug is a bigger case.
[Bug c++/114229] [modules] duplicate symbols when including stl in submodule
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114229 --- Comment #1 from Nick Begg --- gcc (GCC) 14.0.1 20240301 (experimental)
[Bug c++/103497] [11/12/13/14 Regression] ICE when decltype(auto)... as parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103497 Marek Polacek changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Marek Polacek --- Fixed in GCC 14.
[Bug c++/114229] New: [modules] duplicate symbols when including stl in submodule
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114229 Bug ID: 114229 Summary: [modules] duplicate symbols when including stl in submodule Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: nickbegg at gmail dot com Target Milestone: --- using the same test src code as PR113930 - // submod.mpp module; #include export module modA:submod; // modA.mpp module; export module modA; export import :submod; // main.cpp #include import modA; std::string test_func() { return ""; } Note that this test code causes #113930 to check in a GCC debug build. With a GCC release build, at link time numerous STL symbols become duplicated - % /home/nick/inst/gcc-trunk-release/bin/g++ -freport-bug -g CMakeFiles/moduleMin.dir/main.cpp.o CMakeFiles/moduleMin.dir/submod.mpp.o CMakeFiles/moduleMin.dir/modA.mpp.o -o moduleMin /usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0x40): multiple definition of `vtable for std::basic_ios >'; CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x950): first defined here /usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0x60): multiple definition of `vtable for std::basic_ostream >'; CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x8f0): first defined here /usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0xb0): multiple definition of `VTT for std::basic_ostream >'; CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x940): first defined here /usr/bin/ld: CMakeFiles/moduleMin.dir/modA.mpp.o:(.rodata+0xc0): multiple definition of `vtable for std::basic_istream >'; CMakeFiles/moduleMin.dir/main.cpp.o:(.rodata+0x740): first defined here [snip] Note that #including in both places (rather than string in main.cpp) resolves the issue - Is the include guard mechanism failing?
[Bug middle-end/94787] Failure to detect single bit popcount pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94787 --- Comment #7 from Andrew Pinski --- And add: ``` int h(int a) { if (a == 0) return 0; return __builtin_popcount(a) == 1; } int h1(int a) { if (a == 0) return 1; return __builtin_popcount(a) == 1; } ``` h should be just `__builtin_popcount(a) == 1`. While h1 should be just `__builtin_popcount(a) <= 1`.
[Bug tree-optimization/90693] Missing popcount simplifications
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #13 from Andrew Pinski --- (In reply to Andrew Pinski from comment #12) > (In reply to Piotr Siupa from comment #11) > > However, I've noticed that: > > bool foo(unsigned x) > > { > > if (x == 0) > > return true; > > else > > return std::has_single_bit(x); > > } > > > Oh that is because expand does not use flow sensitive ranges/non-zero bits > there. There is talk about adding the ability for that but nothing has been > done yet. Well that also should be transformed into `__builtin_popcount(a) <= 1` which then gets expanded into `(v & (v - 1)) == 0`. I will be handling both of those via PR 94787 .
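The equivalence between the popcount comparisons and the `(v & (v - 1))` form mentioned above can be checked directly. The sketch below is only an illustration of the identity, not the compiler's expansion code; the helper names at_most_one_bit, exactly_one_bit and bit_tricks_agree are hypothetical.

```c
/* popcount(v) <= 1: v is zero or a power of two.  v & (v - 1) clears the
   lowest set bit, so the result is zero exactly when at most one bit
   was set (for v == 0 the AND with the wrapped-around v - 1 is still 0). */
int at_most_one_bit(unsigned v)
{
    return (v & (v - 1)) == 0;
}

/* popcount(v) == 1: the same test, with zero excluded. */
int exactly_one_bit(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Compare both forms against __builtin_popcount for v in [0, limit). */
int bit_tricks_agree(unsigned limit)
{
    for (unsigned v = 0; v < limit; v++) {
        if (at_most_one_bit(v) != (__builtin_popcount(v) <= 1))
            return 0;
        if (exactly_one_bit(v) != (__builtin_popcount(v) == 1))
            return 0;
    }
    return 1;
}
```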
[Bug middle-end/94787] Failure to detect single bit popcount pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94787 --- Comment #6 from Andrew Pinski --- (In reply to Andrew Pinski from comment #5) > Note the expansion part is handled by r14-5612, r14-5613, and r14-6940 . > > So now we just need the match part which I will handle for 15. Actually the expansion part is not fully complete. ``` int f(int a) { return __builtin_popcount(a) <= 1; } int f1(int a) { return __builtin_popcount(a) == 1; } ``` f1 is handled but f is not. f should expand to `!(v & (v - 1))`. The other match patterns needed: ``` int g(int a) { if (a == 0) return 0; return __builtin_popcount(a) <= 1; } int g1(int a) { if (a == 0) return 1; return __builtin_popcount(a) <= 1; } ``` g should be transformed into just `__builtin_popcount(a) == 1` and g1 should be transformed into just `__builtin_popcount(a) <= 1`. Both during phi-opt.
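The two phi-opt transformations described for g and g1 can be sanity-checked by comparing each function with the form it should fold to. This is a brute-force sketch over a small input range, not a proof and not the match.pd pattern itself; g_folded, g1_folded and folds_agree are hypothetical names.

```c
/* g and g1 as written in the comment. */
int g(int a)  { if (a == 0) return 0; return __builtin_popcount(a) <= 1; }
int g1(int a) { if (a == 0) return 1; return __builtin_popcount(a) <= 1; }

/* The forms they are expected to fold to. */
int g_folded(int a)  { return __builtin_popcount(a) == 1; }
int g1_folded(int a) { return __builtin_popcount(a) <= 1; }

/* Return 1 if the original and folded forms agree for every a in
   [lo, hi]; hi must be below INT_MAX so the loop counter cannot overflow. */
int folds_agree(int lo, int hi)
{
    for (int a = lo; a <= hi; a++) {
        if (g(a) != g_folded(a) || g1(a) != g1_folded(a))
            return 0;
    }
    return 1;
}
```

The reasoning behind the fold is short: for nonzero a the popcount is at least 1, so `<= 1` and `== 1` coincide, and the explicit zero case only decides which of the two comparisons matches at a == 0.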
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #11 from Segher Boessenkool --- Okay, so it is a function with a huge BB, so this is not a regression at all, there will have been incredibly many combination attempts since the day combine has existed.
[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #41 from Richard Sandiford --- (In reply to Richard Biener from comment #40) > So I wonder if we can use "local costing" to decide a gather is always OK > compared to the alternative with peeling for gaps. On x86 gather tends > to be slow compared to open-coding it. Yeah, on SVE gathers are generally “enabling” instructions rather than something to use for their own sake. I suppose one problem is that we currently only try to use gathers for single-element groups. If we make a local decision to use gathers while keeping that restriction, we could end up using gathers “unnecessarily” while still needing to peel for gaps for (say) a two-element group. That is, it's only better to use gathers than contiguous loads if by doing that we avoid all need to peel for gaps (and if the cost of peeling for gaps was high enough to justify the cost of using gathers over consecutive loads). One of the things on the list to do (once everything is SLP!) is to support loads with gaps directly via predication, so that we never load elements that aren't needed. E.g. on SVE, a 64-bit predicate (PTRUE .D) can be used with a 32-bit load (LD1W .S) to load only even-indexed elements. So a single-element group with a group size of 2 could be done cheaply with just consecutive loads, without peeling for gaps.
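The idea of loading with gaps via predication can be modelled in scalar C. The following is a rough sketch only: it is not SVE code, and predicated_load and even_lane_predicate are hypothetical helpers that imitate a load in which inactive lanes are never read from memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a predicated vector load: lane i is read from memory
   only when pred[i] is nonzero.  Inactive lanes are set to zero and
   src[i] is never accessed for them. */
void predicated_load(int32_t *dst, const int32_t *src,
                     const unsigned char *pred, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = pred[i] ? src[i] : 0;
}

/* Predicate with only the even-indexed lanes active, i.e. a single-element
   group with a group size of 2. */
void even_lane_predicate(unsigned char *pred, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        pred[i] = (unsigned char)(i % 2 == 0);
}
```

With an 8-lane load over a 7-element array, the last (odd) lane is inactive, so the model never reads past the end of the array. That is the property that would remove the need to peel for gaps.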
[Bug c++/106207] [11/12/13/14 Regression] ICE in apply_fixit, at edit-context.cc:769
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106207 Marek Polacek changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|mpolacek at gcc dot gnu.org|unassigned at gcc dot gnu.org
[Bug c++/103994] Module ICE in write_var_def with global variable in global module fragment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103994 Patrick Palka changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ppalka at gcc dot gnu.org
[Bug analyzer/106390] Support gsl::owner and/or [[gnu::owner]] attribute in -fanalyzer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106390 --- Comment #6 from Jonathan Wakely --- Related work: http://thradams.com/cake/ownership.html
[Bug rtl-optimization/114208] RTL DSE deletes a store that is not dead
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208 --- Comment #5 from Georg-Johann Lay --- (In reply to Richard Biener from comment #4) > Did it ever work? No. I allowed -mfuse-add=3 to reproduce this PR because there seems to be a problem with DSE, and for the case that someone is going to fix it before it bites an important target. The mfuse-add optimization tries to avoid the broken parts of DSE and works around it; documented are only -mfuse-add=0...2 It was added Feb 2024 as PR114100. > I suppose 'st Y+,r20 is' post-inc so maybe DSE mishandles this somehow. That post-inc is only generated after .dse2: .split2 splits some move insns: These cores don't have reg+offset addressing, so the backend must pretend to support it. Then .split2 generates pointer-adjust + mem-access + undo-pointer-adjust. The address adjustments are plain additions of the address register (frame pointer in this case) and have according REG_CFA_ADJUST_CFA notes. Then .dse2 removes some non-dead stores. The 'st Y+,r20' you mentioned is only generated by .avr-fuse-add which runs after .dse2. I'd guess that GCC is not ready for targets with such tight addressing modes? (without reg+offset addressing; stack-pointer cannot be used either, the only SP accesses are PUSH and POP). ad "needs-bisection": -mfuse-add is a new target optimization added as PR114100 in Feb 2024, so bi-secting won't work because -mfuse-add is not recognized prior to that date.
[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #40 from Richard Biener --- So I wonder if we can use "local costing" to decide a gather is always OK compared to the alternative with peeling for gaps. On x86 gather tends to be slow compared to open-coding it. In the future we might want to explore whether we can re-do costing for alternatives without re-running all of the analysis at least for decisions we know have only "local" effect.
[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #39 from Richard Sandiford --- (In reply to Richard Sandiford from comment #38) > (In reply to Richard Biener from comment #37) > > Even more iteration looks bad. I do wonder why when gather can avoid > > peeling for GAPs using load-lanes cannot? > Like you say, we don't realise that all the loads from array3[i] form a > single group. Oops, sorry, I shouldn't have gone off memory. So yeah, it's array1[] where that happens, not array3[]. The reason we don't use load-lanes is that we don't have load-lane instructions for smaller elements in larger containers, so we're forced to use load-and-permute instead.
[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211 Uroš Bizjak changed: What|Removed |Added Component|target |rtl-optimization Keywords|needs-bisection | --- Comment #3 from Uroš Bizjak --- (In reply to Richard Biener from comment #2) > Possibly target independent rtl-optimization issue. It is _subreg1 pass that converts: (insn 10 7 11 2 (set (reg/v:TI 106 [ h ]) (rotate:TI (reg/v:TI 106 [ h ]) (const_int 64 [0x40]))) "pr114211.c":9:5 1042 {rotl64ti2_doubleword} (nil)) to: (insn 39 7 40 2 (set (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ])) "pr114211.c":9:5 84 {*movdi_internal} (nil)) (insn 40 39 11 2 (set (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ])) "pr114211.c":9:5 84 {*movdi_internal} (nil)) Well... this won't swap. Either parallel should be emitted, or a temporary should be used. Adding -fno-split-wide-types fixes the testcase. Re-confirmed as rtl-optimization problem.
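The "this won't swap" observation can be modelled in plain C. This is a minimal sketch, assuming a hypothetical struct u128 whose two 64-bit halves stand in for the two DImode registers; it mirrors the two emitted moves and is not the lower-subreg code itself.

```c
#include <stdint.h>

/* A 128-bit value held as two 64-bit halves. */
struct u128 { uint64_t lo, hi; };

/* What the two emitted moves effectively do: the second move reads the
   half that the first move has already overwritten, so both halves end
   up holding the old low half. */
struct u128 rotate64_broken(struct u128 h)
{
    h.hi = h.lo;
    h.lo = h.hi;   /* h.hi already holds the old h.lo here */
    return h;
}

/* Rotating by 64 bits is a swap of the halves; a temporary keeps the old
   high half alive across the first move. */
struct u128 rotate64_with_temp(struct u128 h)
{
    uint64_t tmp = h.hi;
    h.hi = h.lo;
    h.lo = tmp;
    return h;
}
```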
[Bug tree-optimization/113632] Range info for a^CSTP2-1 could be improved in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113632 Andrew Macleod changed: What|Removed |Added CC||amacleod at redhat dot com
--- Comment #1 from Andrew Macleod ---
(In reply to Andrew Pinski from comment #0)
> Take:
> ```
> void dummy();
> _Bool f(unsigned long a)
> {
> _Bool cmp = a > 8192;
> if (cmp) goto then; else goto e;
> then:
> unsigned long t = __builtin_clzl(a); // [0,50]
> t^=63; // [13,63]
> return t >= 13;
> e:
> dummy();
> return 0;
> }
> ```
>
> Currently after the t^=63; we get:
> ```
> # RANGE [irange] int [1, 63] MASK 0x3f VALUE 0x0
> _7 = _1 ^ 63;
> ```
>
> But this could/should be improved to [13,63].
>
> If we change to using minus instead:
> ```
> t = 63 - t;
> ```
>
> We get the better range and the comparison (t >= 13) is optimized away.
> ```
> Folding statement: t_10 = 63 - t_9;
> Global Exported: t_10 = [irange] long unsigned int [13, 63] MASK 0x3f VALUE
> 0x0
> Not folded
> ```
>
> Yes this should up in real code, see the LLVM issue for more information on
> that.

I think the current implementation of "operator_bitwise_xor::wi_fold ()" in range-op.cc was simply ported from the original version we used in the old VRP code, so it is neither multi-range aware nor has it been enhanced. If you put a breakpoint there, you'll see it's getting:

(gdb) p lh_lb.dump()
[0], precision = 32
$1 = void
(gdb) p lh_ub.dump()
[0x32], precision = 32
$2 = void
(gdb) p rh_ub.dump()
[0x3f], precision = 32
$3 = void
(gdb) p rh_lb.dump()
[0x3f], precision = 32
$4 = void

One could conceivably do something much better than the general masking stuff that goes on if rh_lb == rh_ub. I suspect we could probably do a better job in general, but have never looked at it. It also looks like we make some minor attempts with signed values in wi_optimize_signed_bitwise_op (), but again, I do not think anyone has tried to make this code do anything new yet.
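The claim that `t ^ 63` deserves the same range as `63 - t` can be checked by brute force. This is only a sketch of the arithmetic, not of anything in range-op.cc; struct range, xor63_equals_minus and xor_range are hypothetical names.

```c
/* For t in [0, 63], t ^ 63 flips the low six bits, which is the same
   value as 63 - t. */
int xor63_equals_minus(void)
{
    for (unsigned long t = 0; t <= 63; t++) {
        if ((t ^ 63) != 63 - t)
            return 0;
    }
    return 1;
}

struct range { unsigned long lo, hi; };

/* Exact result range of t ^ mask for t in [lo, hi], found by enumeration.
   Assumes lo <= hi and hi < ULONG_MAX so the loop terminates. */
struct range xor_range(unsigned long lo, unsigned long hi, unsigned long mask)
{
    struct range r = { lo ^ mask, lo ^ mask };
    for (unsigned long t = lo; t <= hi; t++) {
        unsigned long v = t ^ mask;
        if (v < r.lo) r.lo = v;
        if (v > r.hi) r.hi = v;
    }
    return r;
}
```

For the testcase, `xor_range(0, 50, 63)` gives the [13, 63] range that the report asks for, and the singleton right-hand side (rh_lb == rh_ub) is what makes the exact answer easy to compute.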
[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #38 from Richard Sandiford --- (In reply to Richard Biener from comment #37) > Even more iteration looks bad. I do wonder why when gather can avoid > peeling for GAPs using load-lanes cannot? Like you say, we don't realise that all the loads from array3[i] form a single group. Note that we're not using load-lanes in either case, since the group size (8) is too big for that. But load-lanes and load-and-permute have the same restriction about when peeling for gaps is required. In contrast, gather loads only ever load data that they actually need. > Also for the stores we seem to use elementwise stores rather than store-lanes. What configuration are you trying? The original report was about SVE, so I was trying that. There we use a scatter store. > To me the most obvious thing to try optimizing in this testcase is DR > analysis. With -march=armv8.3-a I still see > > t.c:26:22: note: === vect_analyze_data_ref_accesses === > t.c:26:22: note: Detected single element interleaving array1[0][_8] step 4 > t.c:26:22: note: Detected single element interleaving array1[1][_8] step 4 > t.c:26:22: note: Detected single element interleaving array1[2][_8] step 4 > t.c:26:22: note: Detected single element interleaving array1[3][_8] step 4 > t.c:26:22: note: Detected single element interleaving array1[0][_1] step 4 > t.c:26:22: note: Detected single element interleaving array1[1][_1] step 4 > t.c:26:22: note: Detected single element interleaving array1[2][_1] step 4 > t.c:26:22: note: Detected single element interleaving array1[3][_1] step 4 > t.c:26:22: missed: not consecutive access array2[_4][_8] = _69; > t.c:26:22: note: using strided accesses > t.c:26:22: missed: not consecutive access array2[_4][_1] = _67; > t.c:26:22: note: using strided accesses > > so we don't figure > > Creating dr for array1[0][_1] > base_address: > offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2) > constant offset from base 
address: 0 > step: 4 > base alignment: 16 > base misalignment: 0 > offset alignment: 4 > step alignment: 4 > base_object: array1 > Access function 0: {m_111 * 2, +, 2}_4 > Access function 1: 0 > Creating dr for array1[0][_8] > analyze_innermost: success. > base_address: > offset from base address: (ssizetype) ((sizetype) (m_111 * 2 + 1) * > 2) > constant offset from base address: 0 > step: 4 > base alignment: 16 > base misalignment: 0 > offset alignment: 2 > step alignment: 4 > base_object: array1 > Access function 0: {m_111 * 2 + 1, +, 2}_4 > Access function 1: 0 > > belong to the same group (but the access functions tell us it worked out). > Above we fail to split the + 1 to the constant offset. OK, but this is moving the question on to how we should optimise the testcase for Advanced SIMD rather than SVE, and how we should optimise the testcase in general, rather than simply recover what we could do before. (SVE is only enabled for -march=armv9-a and above, in case armv8.3-a was intended to enable SVE too.)
[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #37 from Richard Biener --- (In reply to Richard Sandiford from comment #36) > Created attachment 57602 [details] > proof-of-concept patch to suppress peeling for gaps > > This patch does what I suggested in the previous comment: if the loop needs > peeling for gaps, try again without that, and pick the better loop. It > seems to restore the original style of code for SVE. > > A more polished version would be a bit smarter about when to retry. E.g. > it's pointless if the main loop already operates on full vectors (i.e. if > peeling 1 iteration is natural in any case). Perhaps the condition should > be that either (a) the number of epilogue iterations is known to be equal to > the VF of the main loop or (b) the target is known to support partial > vectors for the loop's vector_mode. > > Any thoughts? Even more iteration looks bad. I do wonder why when gather can avoid peeling for GAPs using load-lanes cannot? Also for the stores we seem to use elementwise stores rather than store-lanes. To me the most obvious thing to try optimizing in this testcase is DR analysis. 
With -march=armv8.3-a I still see t.c:26:22: note: === vect_analyze_data_ref_accesses === t.c:26:22: note: Detected single element interleaving array1[0][_8] step 4 t.c:26:22: note: Detected single element interleaving array1[1][_8] step 4 t.c:26:22: note: Detected single element interleaving array1[2][_8] step 4 t.c:26:22: note: Detected single element interleaving array1[3][_8] step 4 t.c:26:22: note: Detected single element interleaving array1[0][_1] step 4 t.c:26:22: note: Detected single element interleaving array1[1][_1] step 4 t.c:26:22: note: Detected single element interleaving array1[2][_1] step 4 t.c:26:22: note: Detected single element interleaving array1[3][_1] step 4 t.c:26:22: missed: not consecutive access array2[_4][_8] = _69; t.c:26:22: note: using strided accesses t.c:26:22: missed: not consecutive access array2[_4][_1] = _67; t.c:26:22: note: using strided accesses so we don't figure Creating dr for array1[0][_1] base_address: offset from base address: (ssizetype) ((sizetype) (m_111 * 2) * 2) constant offset from base address: 0 step: 4 base alignment: 16 base misalignment: 0 offset alignment: 4 step alignment: 4 base_object: array1 Access function 0: {m_111 * 2, +, 2}_4 Access function 1: 0 Creating dr for array1[0][_8] analyze_innermost: success. base_address: offset from base address: (ssizetype) ((sizetype) (m_111 * 2 + 1) * 2) constant offset from base address: 0 step: 4 base alignment: 16 base misalignment: 0 offset alignment: 2 step alignment: 4 base_object: array1 Access function 0: {m_111 * 2 + 1, +, 2}_4 Access function 1: 0 belong to the same group (but the access functions tell us it worked out). Above we fail to split the + 1 to the constant offset. 
See my hint to use int32_t m instead of uint32_t yielding t.c:26:22: note: Detected interleaving load of size 2 t.c:26:22: note:_2 = array1[0][_1]; t.c:26:22: note:_9 = array1[0][_8]; t.c:26:22: note: Detected interleaving load of size 2 t.c:26:22: note:_18 = array1[1][_1]; t.c:26:22: note:_23 = array1[1][_8]; t.c:26:22: note: Detected interleaving load of size 2 t.c:26:22: note:_32 = array1[2][_1]; t.c:26:22: note:_37 = array1[2][_8]; t.c:26:22: note: Detected interleaving load of size 2 t.c:26:22: note:_46 = array1[3][_1]; t.c:26:22: note:_51 = array1[3][_8]; t.c:26:22: note: Detected interleaving store of size 2 t.c:26:22: note:array2[_4][_1] = _67; t.c:26:22: note:array2[_4][_8] = _69; (and SLP being thrown away because we can use load/store lanes)
[Bug c++/107688] [C++23] P2615 - Meaningful exports
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107688 Nathaniel Shead changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |nshead at gcc dot gnu.org CC||nshead at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #1 from Nathaniel Shead --- Proposed patch here: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647120.html
[Bug c/114226] ICE on valid vanilla code when RVV xtheadvector enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114226 Andrew Pinski changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |DUPLICATE --- Comment #4 from Andrew Pinski --- Dup. *** This bug has been marked as a duplicate of bug 114194 ***
[Bug target/114194] ICE when using std::unique_ptr with xtheadvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194 Andrew Pinski changed: What|Removed |Added CC||bruce at hoult dot org --- Comment #2 from Andrew Pinski --- *** Bug 114226 has been marked as a duplicate of this bug. ***
[Bug testsuite/114221] gcc.c-torture/execute/20101011-1.c fails for H8/300
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114221 Jeffrey A. Law changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #1 from Jeffrey A. Law --- Fixed on the trunk.
[Bug middle-end/114157] during GIMPLE pass: bitintlower ICE: in lower_stmt, at gimple-lower-bitint.cc:5577 with -O with _BitInt(256) / vector memmove
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114157

--- Comment #1 from Jakub Jelinek ---
Ah, we need to handle BIT_FIELD_REF from some SSA_NAME to large/huge _BitInt:

void foo (vector(8) long int s)
{
  _BitInt(256) _2;

  [local count: 1073741824]:
  _2 = BIT_FIELD_REF ;
  MEM <_BitInt(256)> [(char * {ref-all})] = _2;

maybe also BIT_FIELD_REF from large/huge _BitInt to non-bitint and maybe also from/to large/huge _BitInt. Though, I really can't reproduce those cases right now, so it would be purely theoretical.
[Bug middle-end/114197] [14 regression] ICE in verify_dominators
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114197

--- Comment #6 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:8fdac08b4d5f65973164a476bd255533ed97a766

commit r14-9296-g8fdac08b4d5f65973164a476bd255533ed97a766
Author: Richard Biener
Date:   Mon Mar 4 13:28:34 2024 +0100

    tree-optimization/114197 - unexpected if-conversion for vectorization

    The following avoids lowering a volatile bitfield access and in case
    the if-converted and original loops end up in different outer loops
    because of simplifications enabled scrap the result since that is
    not how the vectorizer expects the loops to be laid out.

            PR tree-optimization/114197
            * tree-if-conv.cc (bitfields_to_lower_p): Do not lower if
            there are volatile bitfield accesses.
            (pass_if_conversion::execute): Throw away result if the
            if-converted and original loops are not nested as expected.

            * gcc.dg/torture/pr114197.c: New testcase.
[Bug middle-end/114197] [14 regression] ICE in verify_dominators
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114197 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #7 from Richard Biener --- Fixed both issues.
[Bug tree-optimization/114228] [14 Regression] memset/memcpy loop not always recognised with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114228 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #2 from Andrew Pinski --- Looks like this was on purpose, see PR 111583 for more analysis. Basically, if either buff or input were NULL, this would have been an invalid transformation. So invalid.
[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #6 from Robin Dapp --- Honestly, I don't know how to analyze/debug this without a zen4, in particular as it only seems to happen with PGO. I tried locally but of course the execution time doesn't change (same as with zen3 according to the database). Is there a way to obtain the binaries in order to tell a difference?
[Bug debug/92387] [11/12/13 Regression] gcc generates wrong debug information at -O1 since r10-1907-ga20f263ba1a76a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92387 --- Comment #5 from Jan Hubicka --- The revision changes inlining decisions, so it would probably be possible to reproduce the problem without that change by using the right always_inline and noinline attributes.
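As a rough illustration of the attributes mentioned in the comment, the sketch below pins inlining decisions by hand so that they no longer depend on the inliner's heuristics. The function names are hypothetical and are not taken from the PR's testcase; which functions would need which attribute to reproduce the debug-info problem is not stated in the thread.

```c
/* Hypothetical helper: always_inline asks GCC to inline this function
   into its callers regardless of the usual inlining heuristics. */
static inline __attribute__((always_inline)) int add_one(int x)
{
    return x + 1;
}

/* Hypothetical helper: noinline keeps this function out of line,
   so a real call remains in the caller. */
__attribute__((noinline)) int times_two(int x)
{
    return x * 2;
}

/* Caller whose shape after inlining is now fixed by the attributes. */
int combined(int x)
{
    return times_two(add_one(x));
}
```

With the decisions fixed this way, two compiler revisions that differ only in inlining heuristics should produce the same inlining for `combined`, which is what makes such a testcase useful for bisection.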
[Bug tree-optimization/114228] [14 Regression] memset/memcpy loop not always recognised with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114228

Andrew Pinski changed:

           What    |Removed                     |Added
     Ever confirmed|0                           |1
           Keywords|                            |needs-bisection
   Last reconfirmed|                            |2024-03-04
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |14.0
            Summary|memset/memcpy loop not      |[14 Regression]
                   |always recognised with -Os  |memset/memcpy loop not
                   |                            |always recognised with -Os

--- Comment #1 from Andrew Pinski ---
Confirmed.
The IR is the same before coming into ldist.
ldist in 13.2.0 had:
```
ldist creates useful parallel partition: 0, 1, 2, 3, 4
Applying pattern match.pd:365, generic-match.cc:23462
distribute loop <1> into partitions:
```

But the trunk:
```
ldist asked to generate code for vertex 3
ldist creates useful parallel partition: 0, 1, 2, 3, 4
Loop 1 not distributed.
```

But the dump gives no reason why.
[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #36 from Richard Sandiford --- Created attachment 57602 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57602&action=edit proof-of-concept patch to suppress peeling for gaps This patch does what I suggested in the previous comment: if the loop needs peeling for gaps, try again without that, and pick the better loop. It seems to restore the original style of code for SVE. A more polished version would be a bit smarter about when to retry. E.g. it's pointless if the main loop already operates on full vectors (i.e. if peeling 1 iteration is natural in any case). Perhaps the condition should be that either (a) the number of epilogue iterations is known to be equal to the VF of the main loop or (b) the target is known to support partial vectors for the loop's vector_mode. Any thoughts?
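The retry condition floated in the comment can be written down as a small predicate. This is only a sketch of conditions (a) and (b) as worded above; the struct, its fields, and the function name are all hypothetical and do not correspond to anything in the vectorizer or in the attached patch.

```c
#include <stdbool.h>

/* Hypothetical summary of one vectorization attempt.
   None of these names come from GCC. */
struct loop_attempt {
    bool needs_peeling_for_gaps;          /* first attempt required peeling for gaps */
    bool epilogue_iters_known;            /* epilogue iteration count is known */
    unsigned epilogue_iters;              /* that count, valid only if known */
    unsigned vf;                          /* vectorization factor of the main loop */
    bool target_supports_partial_vectors; /* for the loop's vector mode */
};

/* Returns true when a second attempt without peeling for gaps
   looks worthwhile under conditions (a) or (b). */
bool worth_retrying_without_peeling(const struct loop_attempt *a)
{
    if (!a->needs_peeling_for_gaps)
        return false;   /* nothing to suppress */

    /* (a) the epilogue would run exactly VF iterations */
    bool cond_a = a->epilogue_iters_known && a->epilogue_iters == a->vf;

    /* (b) the target can use partial vectors for this mode */
    bool cond_b = a->target_supports_partial_vectors;

    return cond_a || cond_b;
}
```

Choosing between the two resulting loops ("pick the better loop") would still need a cost comparison, which this sketch leaves out.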
[Bug middle-end/114197] [14 regression] ICE in verify_dominators
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114197 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug target/114187] [14 regression] bizarre register dance on x86_64 for pass-by-value struct since r14-2526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114187 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from Richard Biener --- Fixed I assume.
[Bug rtl-optimization/114190] [14 regression] Wrong code with -O2 -fno-dce -fharden-compares -mvpclmulqdq --param=max-rtl-if-conversion-unpredictable-cost=136
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114190 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug rtl-optimization/114228] New: memset/memcpy loop not always recognised with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114228

            Bug ID: 114228
           Summary: memset/memcpy loop not always recognised with -Os
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: denis.campredon at gmail dot com
  Target Milestone: ---

typedef __SIZE_TYPE__ size_t;
void baz(char *);

void foo( char *__restrict buff, const char*__restrict input)
{
    size_t max = __builtin_strlen (input);
    for(size_t i = 0 ; i < max; ++i)
        buff[i] = 0;
    baz(buff);
}

void bar( char *__restrict buff, const char*__restrict input)
{
    size_t max = __builtin_strlen (input);
    for(size_t i = 0 ; i < max; ++i)
        buff[i] = input[i];
    baz(buff);
}
--

With the code above compiled with -Os, the current trunk fails to convert the two loops into memset/memcpy calls. GCC 13.2 is able to convert the loops into a call.
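For reference, here is a hand-written sketch of the loop forms next to the library-call forms the reporter expects. The function names are hypothetical, the call to baz is dropped to keep the sketch self-contained, and the zero-length guard is an assumption of this sketch rather than something the thread says GCC emits; Comment #2 only notes that the transformation would be invalid if either pointer were NULL.

```c
#include <stddef.h>
#include <string.h>

/* Loop form, as in foo() from the report. */
void zero_loop(char *restrict buff, const char *restrict input)
{
    size_t max = strlen(input);
    for (size_t i = 0; i < max; ++i)
        buff[i] = 0;
}

/* Hand-written call form. The guard (an assumption of this sketch)
   skips the library call when there is nothing to write. */
void zero_call(char *restrict buff, const char *restrict input)
{
    size_t max = strlen(input);
    if (max != 0)
        memset(buff, 0, max);
}

/* Loop form, as in bar() from the report. */
void copy_loop(char *restrict buff, const char *restrict input)
{
    size_t max = strlen(input);
    for (size_t i = 0; i < max; ++i)
        buff[i] = input[i];
}

/* Hand-written call form with the same guard. */
void copy_call(char *restrict buff, const char *restrict input)
{
    size_t max = strlen(input);
    if (max != 0)
        memcpy(buff, input, max);
}
```

Each pair writes the same bytes to `buff` for any valid, non-overlapping arguments; the difference the report is about is only whether the compiler performs this rewrite itself at -Os.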
[Bug tree-optimization/114108] [14 regression] ICE when building opencv-4.8.1 (error: type mismatch in binary expression) since r14-1833
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114108 Richard Biener changed: What|Removed |Added Priority|P3 |P1