[Bug target/110435] New: ICE in in convert_move, at expr.cc:297 on Aarch64 with -Ofast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110435 Bug ID: 110435 Summary: ICE in in convert_move, at expr.cc:297 on Aarch64 with -Ofast Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jamborm at gcc dot gnu.org Target Milestone: --- Host: x86_64-linux Target: aarch64-linux

Using a cross compiler (revision r14-2079-g9326a49c9e9d63) configured with

/home/worker/buildworker/tiber-gcc-trunk-aarch64/build/configure --enable-languages=c,c++,fortran,rust,m2 --disable-bootstrap --disable-libsanitizer --disable-multilib --enable-checking=release --prefix=/home/worker/cross --target=aarch64-linux-gnu --with-as=/usr/bin/aarch64-suse-linux-as

to compile our own testcase gcc/testsuite/gfortran.dg/pr68251.f90 with -Ofast results in an ICE:

$ ~/cross/bin/aarch64-linux-gnu-gfortran gcc/testsuite/gfortran.dg/pr68251.f90 -Ofast -o /tmp/aaa.out
during RTL pass: expand
gcc/testsuite/gfortran.dg/pr68251.f90:1043:57:
 1043 | kbc((mc-1)*15+mb) = kbc((mc-1)*15+mb) - ks_bc
      | ^
internal compiler error: in convert_move, at expr.cc:297
0x74393a convert_move(rtx_def*, rtx_def*, int)
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/expr.cc:297
0xa5ffa5 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/expr.cc:9368
0x961290 expand_gimple_stmt_1
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:3983
0x961290 expand_gimple_stmt
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:4044
0x9661c7 expand_gimple_basic_block
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:6096
0x967e2e execute
        /home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/cfgexpand.cc:6831
Please submit a full bug report, with preprocessed source (by using -freport-bug).
[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432 --- Comment #5 from Iain Sandoe --- (In reply to Sascha Scandella from comment #4) > I found also this issue regarding init_priority: > https://github.com/llvm/llvm-project/issues/15363 So that is the intentional behaviour (upstream clang definitely used to reject it); as noted, it actually works fine with LTO too (or within one module if not). I was investigating whether we could do the work in collect2, but that gets quite complex when considering the interactions between LTO and non-LTO objects. For now, IMO, we should adopt a fix of the nature Jonathan suggests, and then it will "just work" if/when we get init priority on Darwin. In slower time, we might consider following clang's behaviour for Darwin (possibly with a warning that it does not work between TUs).
[Bug tree-optimization/110420] [12/13/14 Regression] internal compiler error: in gimple_redirect_edge_and_branch due to simple_dce_from_worklist removing `asm goto`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 --- Comment #7 from Jan-Benedict Glaw --- Confirmed: This patch fixes the issue for me with the Linux PPC builds.
[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432 --- Comment #4 from Sascha Scandella --- I also found this issue regarding init_priority: https://github.com/llvm/llvm-project/issues/15363
[Bug tree-optimization/110434] New: tree-nrv introduces incorrect CLOBBER(eol)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110434 Bug ID: 110434 Summary: tree-nrv introduces incorrect CLOBBER(eol) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: ---

The tree-nrv pass may introduce an incorrect CLOBBER(eol) of the form

  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

One example of this can be seen by compiling gcc.c-torture/execute/921204-1.c for x86 using the flags "-O -m32", where it changes the IR

  union bu o;
  ...
  o = i;
  MEM[(union *)&o].b18 = _11;
  MEM[(union *)&o].b20 = _11;
  <retval> = o;
  o ={v} {CLOBBER(eol)};
  return <retval>;

to just use <retval> instead of o

  union bu o [value-expr: <retval>];
  ...
  <retval> = i;
  MEM[(union *)&<retval>].b18 = _11;
  MEM[(union *)&<retval>].b20 = _11;
  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

so the CLOBBER(eol) now refers to <retval>.
[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432 --- Comment #3 from Iain Sandoe --- Interesting: Apple clang does seem to accept __attribute__((init_priority)), but it still does not actually work **between TUs** unless LTO is engaged. Actually, GCC for Darwin could adopt a similar scheme (perhaps we should, to be *** compatible). The issue is not whether GCC can do it; it is whether the linker (ld64) honours the ordering information and can generate a new global initialiser (which it still seems not to do). AFAIR upstream clang rejects the attribute for Darwin. @Jonathan, is there a patch for that proposed solution?
[Bug tree-optimization/110428] missed CSE with VLA vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428 --- Comment #4 from JuzheZhong --- (In reply to JuzheZhong from comment #3) > Hi, I think for VLS vectors, we should be able the enhance CSE for this > following case: > > #include > > void __attribute__((noinline,noclone)) > foo (int *out, int *res, unsigned int n) > { > int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 }; > int i; > for (i = 0; i < n+16; ++i) > { > if (mask[i]) > out[i] = i; > } > int o0 = out[0]; > int o7 = out[7]; > int o14 = out[14]; > int o15 = out[15]; > res[0] = o0; > res[2] = o7; > res[4] = o14; > res[6] = o15; > } > > since n is unsigned int number, i < n + 16, ARM SVE fail to CSE. > Is it right? Maybe this case is too complicated, so I tried the following case: void __attribute__((noinline,noclone)) foo (int *out, int *res, unsigned int n) { int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 }; int i; for (i = 0; i < 16; ++i) { if (mask[i]) out[i] = i; } for (i = 16; i < n + 16; ++i) { if (mask[i]) out[i] = i; } int o0 = out[0]; int o7 = out[7]; int o14 = out[14]; int o15 = out[15]; res[0] = o0; res[2] = o7; res[4] = o14; res[6] = o15; } This case is simpler; shouldn't it be CSEd? I tried it on SVE and GCC failed to CSE.
[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432 --- Comment #2 from Sascha Scandella --- > Still libstdc++ ;-) True that ;-) > Patrick, we talked about this and IIRC your suggestion was to move the > __has_attribute check into configure, so that it depends on GCC, not on > whichever compiler happens to include later. I think this would also be a solution. Would this then be included in a future GCC 13.2? It took quite a while to figure out the reason for the segfault. Just for completeness' sake: I also posted it on one of the brew repositories for GCC. This could probably also be patched on macOS for GCC 13.1. https://github.com/iains/gcc-13-branch/issues/6
[Bug libstdc++/110432] macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432 Jonathan Wakely changed: What|Removed |Added Last reconfirmed||2023-06-27 Status|UNCONFIRMED |NEW Keywords||ABI CC||ppalka at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Jonathan Wakely --- (In reply to Sascha Scandella from comment #0) > This leads to a segmentation fault with a simple sample application when > using clang-16 in combination with stdlibc++. It's libstdc++ > Would it be possible to change the #if statement such that it would also > work on macOS when using clang in combination with the stdlibc++? Still libstdc++ ;-) > #if !__has_attribute(__init_priority__) > static ios_base::Init __ioinit; > #elif defined(_GLIBCXX_SYMVER_GNU) > __extension__ __asm (".globl _ZSt21ios_base_library_initv"); > #endif Patrick, we talked about this and IIRC your suggestion was to move the __has_attribute check into configure, so that it depends on GCC, not on whichever compiler happens to include later.
[Bug tree-optimization/110428] missed CSE with VLA vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428 JuzheZhong changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #3 from JuzheZhong --- Hi, I think for VLS vectors we should be able to enhance CSE for the following case: #include void __attribute__((noinline,noclone)) foo (int *out, int *res, unsigned int n) { int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 }; int i; for (i = 0; i < n+16; ++i) { if (mask[i]) out[i] = i; } int o0 = out[0]; int o7 = out[7]; int o14 = out[14]; int o15 = out[15]; res[0] = o0; res[2] = o7; res[4] = o14; res[6] = o15; } Since n is an unsigned int and the loop condition is i < n + 16, ARM SVE fails to CSE. Is that right?
[Bug middle-end/110431] Incorrect disambiguation of wide accesess from store-merging or SLP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110431 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-06-27 CC||rguenth at gcc dot gnu.org Status|UNCONFIRMED |NEW --- Comment #1 from Richard Biener --- I guess the provenance people would say this violates some rules since pa + 1 is used to access 'b'. GCC itself is prone to introduce such issues when propagating equivalences though.
[Bug c/110430] Fail to CSE for LEN_MASK_STORE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110430 --- Comment #1 from Richard Biener --- *** Bug 110428 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/110428] missed CSE with VLA vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from Richard Biener --- . *** This bug has been marked as a duplicate of bug 110430 ***
[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #22 from rguenther at suse dot de --- On Tue, 27 Jun 2023, amonakov at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 > > --- Comment #21 from Alexander Monakov --- > (In reply to rguent...@suse.de from comment #19) > > But the size argument doesn't have anything to do with TBAA (and > > may_alias is about TBAA). I don't think we have any way to circumvent > > C object access rules. That is, for example, with -fno-strict-aliasing > > the following isn't going to work. > > > > int a; > > int b; > > > > int main() > > { > > a = 1; > > b = 2; > > if (&a + 1 == &b) // equality compare of unrelated pointers OK > > { > > long x = *(long *)&a; // access outside of 'a' not OK > > if (x != 0x00010002) > > abort (); > > } > > } > > > > there's no command-line flag or attribute to form a pointer > > to an object composing 'a' and 'b' besides changing how the > > storage is declared. > > But store-merging and SLP can introduce a wide long-sized access where on > source level you had two adjacent loads or even memcpy's, so we really seem to > have a problem here and might need to be able to annotate types or individual > accesses as "may-alias-with-oob-ok" in the IR: PR 110431. But above, 'a' and 'b' are not known to be adjacent; that is only verified at runtime. The only thing we do IIRC is use wider loads to access properly aligned storage, as we know the load wouldn't trap. That can lead us to the case you pointed out originally: we load stuff we will ignore, but that might cause alias disambiguation to disambiguate against a store of the original non-widened size.
[Bug analyzer/110433] New: ASAN reports mismatching new/delete when compiling analyzer testcases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110433 Bug ID: 110433 Summary: ASAN reports mismatching new/delete when compiling analyzer testcases Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: analyzer Assignee: dmalcolm at gcc dot gnu.org Reporter: jamborm at gcc dot gnu.org CC: dmalcolm at gcc dot gnu.org Blocks: 86656 Target Milestone: --- Host: x86_64-linux Target: x86_64-linux With a bootstrapped compiler configured with --with-build-config=bootstrap-asan I get errors about new/delete mismatching types when compiling testcases: - gcc.dg/analyzer/out-of-bounds-diagram-13.c - gcc.dg/analyzer/out-of-bounds-diagram-15.c - gcc.dg/analyzer/out-of-bounds-diagram-4.c - gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c - gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c and - gcc.dg/analyzer/out-of-bounds-diagram-7.c The errors all look like: Executing on host: /home/worker/buildworker/tiber-gcc-asan/objdir/gcc/xgcc -B/home/worker/buildworker/tiber-gcc-asan/objdir/gcc/ /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c -fdiagnostics-plain-output -fanalyzer -Wanalyzer-too-complex -fanalyzer-call-summaries -fdiagnostics-text-art-charset=unicode -S -o out-of-bounds-diagram-13.s(timeout = 300) spawn -ignore SIGHUP /home/worker/buildworker/tiber-gcc-asan/objdir/gcc/xgcc -B/home/worker/buildworker/tiber-gcc-asan/objdir/gcc/ /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c -fdiagnostics-plain-output -fanalyzer -Wanalyzer-too-complex -fanalyzer-call-summaries -fdiagnostics-text-art-charset=unicode -S -o out-of-bounds-diagram-13.s /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c: In function 'test_non_ascii': /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3: warning: stack-based buffer overflow [CWE-121] 
[-Wanalyzer-out-of-bounds] /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:8:8: note: (1) capacity: 9 bytes /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3: note: (2) out-of-bounds write at byte 9 but 'buf' ends at byte 9 /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3: note: write of 1 byte to beyond the end of 'buf' /home/worker/buildworker/tiber-gcc-asan/build/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c:9:3: note: valid subscripts for 'buf' are '[0]' to '[8]' = ==58507==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x50d00a00 in thread T0: object passed to delete has wrong type: size of the allocated type: 136 bytes; size of the deallocated type: 104 bytes. #0 0x83eba8 in operator delete(void*, unsigned long) /home/worker/buildworker/tiber-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:164 #1 0x51e6e45 in std::default_delete::operator()(ana::svalue_spatial_item*) const /home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:99 #2 0x51e6e45 in std::unique_ptr >::~unique_ptr() /home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:404 #3 0x51e6e45 in ana::access_diagram_impl::~access_diagram_impl() /home/worker/buildworker/tiber-gcc-asan/build/gcc/analyzer/access-diagram.cc:1728 #4 0x51e703c in ana::access_diagram_impl::~access_diagram_impl() /home/worker/buildworker/tiber-gcc-asan/build/gcc/analyzer/access-diagram.cc:1728 #5 0x4e97142 in std::default_delete::operator()(text_art::widget*) const /home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:99 #6 0x4e97142 in std::unique_ptr >::~unique_ptr() 
/home/worker/buildworker/tiber-gcc-asan/objdir/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:404 -- ==58507==HINT: if you don't care about these errors you may set ASAN_OPTIONS=new_delete_type_mismatch=0 ==58507==ABORTING compiler exited with status 1 PASS: gcc.dg/analyzer/out-of-bounds-diagram-13.c (test for warnings, line 9) FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c at line 10 (test for warnings, line 9) FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c expected multiline pattern lines 17-42 FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c 2 blank line(s) in output FAIL: gcc.dg/analyzer/out-of-bounds-diagram-13.c (test for excess errors) Excess errors: = ==58507==ERROR: AddressSanitizer: new-delete-type-mismatch
[Bug libstdc++/110432] New: macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110432 Bug ID: 110432 Summary: macOS: Segmentation fault when using stdlibc++ from gcc 13.1 in combination with clang-16 Product: gcc Version: 13.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: sascha.scandella at dentsplysirona dot com Target Milestone: --- As you probably all know, GCC 13.1 changed the way the global iostream objects are created. This is described in the official GCC 13 release notes: "For C++, construction of the global iostream objects std::cout, std::cin, etc. is now done inside the standard library, instead of in every source file that includes the header. This change improves the start-up performance of C++ programs, but it means that code compiled with GCC 13.1 will crash if the correct version of libstdc++.so is not used at runtime. See the documentation about using the right libstdc++.so at runtime. Future GCC releases will mitigate the problem so that the program cannot be run at all with an older libstdc++.so." More details can also be found here: https://developers.redhat.com/articles/2023/04/03/leaner-libstdc-gcc-13 On macOS, SUPPORTS_INIT_PRIORITY is set to 0 within GCC. This means that the library does not initialize the global iostream objects itself, and the fallback is taken instead (i.e. static initialization of the iostream object in each file that includes the header). The problem is that when the iostream header is included by clang-16, the expression __has_attribute(__init_priority__) is true, because clang-16 supports __init_priority__, so the static initialization is skipped as well. See here: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/iostream#L78 This leads to a segmentation fault with a simple sample application when using clang-16 in combination with stdlibc++.
Sample application:

#include <iostream>

int main()
{
  std::cout << "Hello" << std::endl;
}

$HOMEBREW_PREFIX/opt/llvm@16/bin/clang++ \
  -v \
  -stdlib=libstdc++ \
  -stdlib++-isystem $HOMEBREW_PREFIX/opt/gcc@13/include/c++/13 \
  -cxx-isystem $HOMEBREW_PREFIX/opt/gcc@13/include/c++/13/x86_64-apple-darwin22 \
  -L $HOMEBREW_PREFIX/opt/gcc@13/lib/gcc/13/ \
  -L $HOMEBREW_PREFIX/opt/llvm/lib \
  -o test main.cpp

Execute test -> segfault.

➜ ~ ./test
[1] 7965 segmentation fault ./test

Would it be possible to change the #if statement such that it would also work on macOS when using clang in combination with the stdlibc++?

#if !__has_attribute(__init_priority__)
static ios_base::Init __ioinit;
#elif defined(_GLIBCXX_SYMVER_GNU)
__extension__ __asm (".globl _ZSt21ios_base_library_initv");
#endif

Remarks: When compiling with gcc, everything works as expected, since the iostream object gets initialized properly through the fallback.

gcc -v
Using built-in specs.
COLLECT_GCC=gcc-13
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/13.1.0/bin/../libexec/gcc/x86_64-apple-darwin22/13/lto-wrapper
Target: x86_64-apple-darwin22
Configured with: ../configure --prefix=/usr/local/opt/gcc --libdir=/usr/local/opt/gcc/lib/gcc/current --disable-nls --enable-checking=release --with-gcc-major-version-only --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-13 --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-isl=/usr/local/opt/isl --with-zstd=/usr/local/opt/zstd --with-pkgversion='Homebrew GCC 13.1.0' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --with-system-zlib --build=x86_64-apple-darwin22 --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Homebrew GCC 13.1.0)

OS: macOS Ventura 13.4 (Intel)
[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #21 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #19) > But the size argument doesn't have anything to do with TBAA (and > may_alias is about TBAA). I don't think we have any way to circumvent > C object access rules. That is, for example, with -fno-strict-aliasing > the following isn't going to work. > > int a; > int b; > > int main() > { > a = 1; > b = 2; > if (&a + 1 == &b) // equality compare of unrelated pointers OK > { > long x = *(long *)&a; // access outside of 'a' not OK > if (x != 0x00010002) > abort (); > } > } > > there's no command-line flag or attribute to form a pointer > to an object composing 'a' and 'b' besides changing how the > storage is declared. But store-merging and SLP can introduce a wide long-sized access where on the source level you had two adjacent loads or even memcpys, so we really seem to have a problem here and might need to be able to annotate types or individual accesses as "may-alias-with-oob-ok" in the IR: PR 110431.
[Bug middle-end/110431] New: Incorrect disambiguation of wide accesess from store-merging or SLP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110431 Bug ID: 110431 Summary: Incorrect disambiguation of wide accesess from store-merging or SLP Product: gcc Version: 12.3.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Inspired by bug 110237 comment 19:

int b, a;
int main()
{
  int *pa = &a, *pb = &b;
  asm("" : "+r"(pa));
  asm("" : "+r"(pb));
  if (pa + 1 == pb) {
    a = 1, b = 2;
    long x;
    __builtin_memcpy(&x, pa, 4);
    __builtin_memcpy(4 + (char *)&x, pa+1, 4);
    return (x - 0x00020001) * 131 >> 32;
  }
}

https://godbolt.org/z/b67zxMv54

On GIMPLE, both store-merging and SLP vectorization can introduce a merged long-sized access in place of the individual int-sized memcpys. That access is then disambiguated against the initial stores at the RTL level, leading to a miscompilation.
[Bug middle-end/110379] Unnecessary copies after early opts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110379 --- Comment #3 from Jan Hubicka --- I thought that an ADDR_EXPR of a reference is just a fancy way to represent NOP_EXPR or POINTER_PLUS in today's GIMPLE. How does that affect builtin_object_size? :) However, I think ipa-sra will eventually also need to handle ADDR_EXPRs that correspond to a non-zero offset.
[Bug tree-optimization/110414] [14 Regression] Dead Code Elimination Regression since r14-1127-g9e2017ae6ac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110414 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2023-06-27 --- Comment #1 from Richard Biener --- DOM3 removes the call in GCC 13 and the cause looks pretty similar to PR110413, we lose a __builtin_unreachable () early.
[Bug tree-optimization/110413] [14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r14-1880-g827e208fa64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110413 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Known to work||11.4.0, 12.3.0 Last reconfirmed||2023-06-27 --- Comment #1 from Richard Biener --- GCC 13 eliminates the call in DOM3. The differences into that pass are quite big with trunk having performed many more optimizations, in particular we have elided the __builtin_unreachable () call already in DOM2 where the first differences appears. We now manage to simplify the _24 != 0 branch. @@ -79,22 +80,15 @@ _22 = _2 <= _23 = _2 != 0B; _24 = _23 & _22; - if (_24 != 0) -goto ; [100.00%] - else -goto ; [0.00%] + goto ; [100.00%] [local count: 850510901]: k = _2; _3 = 1; _1 = 1; _15 = 1; - goto ; [100.00%] - - [count: 0]: - __builtin_unreachable (); - [local count: 955630225]: + [local count: 955630225]: So we manage to optimize [local count: 105119324]: k = _2; _22 = _2 <= _23 = _2 != 0B; _24 = _23 & _22; if (_24 != 0) goto ; [100.00%] else goto ; [0.00%] [local count: 850510901]: k = _2; _3 = _2 <= _1 = _2 != 0B; _15 = _1 & _3; if (_15 != 0) goto ; [100.00%] else goto ; [0.00%] [count: 0]: __builtin_unreachable (); [local count: 955630225]: # h.4_25 = PHI _4 = h.4_25 + -1; h = _4; h.4_5 = h; if (h.4_5 != 0) goto ; [89.00%] It seems we're now doing this based on the exported global range table since we correctly first arrive at Optimizing block #3 Optimizing statement k = _2; LKUP STMT k = _2 with .MEM_11 LKUP STMT _2 = k with .MEM_11 LKUP STMT _2 = k with .MEM_21 2>>> STMT _2 = k with .MEM_21 Optimizing statement _22 = _2 <= LKUP STMT _22 = _2 le_expr 2>>> STMT _22 = _2 le_expr LKUP STMT _2 ge_expr Optimizing statement _23 = _2 != 0B; LKUP STMT _23 = _2 ne_expr 0B 2>>> STMT _23 = _2 ne_expr 0B Optimizing statement _24 = _23 & _22; LKUP STMT _24 = _23 bit_and_expr _22 2>>> STMT _24 = _23 bit_and_expr _22 Optimizing statement if (_24 != 0) Visiting conditional with predicate: if (_24 != 0) With known ranges _24: [irange] _Bool 
VARYING Predicate evaluates to: DON'T KNOW but then we register global ranges from the __builtin_unreachable () CFG: # RANGE [irange] _Bool [1, 1] _24 = _23 & _22; and CFG cleanup scheduled by DOM does static bool cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi) { ... case GIMPLE_COND: { gimple_match_op res_op; if (gimple_simplify (stmt, _op, NULL, no_follow_ssa_edges, no_follow_ssa_edges) && res_op.code == INTEGER_CST) val = res_op.ops[0]; which now picks this up and elides the branch. For some reason the "dead" stmts [local count: 118111600]: # PT = nonlocal null _2 = j (); i = _2; k = _2; # RANGE [irange] _Bool [1, 1] _22 = _2 <= # RANGE [irange] _Bool [1, 1] _23 = _2 != 0B; # RANGE [irange] _Bool [1, 1] _24 = _23 & _22; allow us to optimize k = _2; # PT = nonlocal escaped null k.5_6 = k; _16 = k.5_6 == _17 = k.5_6 != 0B; _18 = _17 | _16; if (_18 != 0) via Optimizing statement _16 = k.5_6 == Replaced 'k.5_6' with variable '_2' LKUP STMT _16 = _2 eq_expr 2>>> STMT _16 = _2 eq_expr Optimizing statement _17 = k.5_6 != 0B; Replaced 'k.5_6' with variable '_2' LKUP STMT _17 = _2 ne_expr 0B FIND: _23 Replaced redundant expr '_2 != 0B' with '_23' ASGN _17 = _23 Optimizing statement _18 = _17 | _16; Replaced '_17' with variable '_23' Folded to: _18 = _16 | _23; LKUP STMT _18 = _16 bit_ior_expr _23 2>>> STMT _18 = _16 bit_ior_expr _23 Optimizing statement if (_18 != 0) Replaced '_18' with constant '1' so it's kind of a missed optimization in the first DOM that elides the stmts and the inability of ranger to capture the relations in the global ranges.
[Bug target/110406] d: Wrong code-gen returning POD structs by value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110406 --- Comment #12 from Iain Sandoe --- OTOH there was a second issue with zero-sized objects which was fixed thus: diff --git a/gcc/d/types.cc b/gcc/d/types.cc index a1f69bb02b7..020cc7de83f 100644 --- a/gcc/d/types.cc +++ b/gcc/d/types.cc @@ -581,6 +581,11 @@ finish_aggregate_mode (tree type) { for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) { + /* Fields of type `typeof(*null)' have no size, so let them force the +record type mode to be computed as BLKmode. */ + if (TYPE_MAIN_VARIANT (TREE_TYPE (field)) == noreturn_type_node) + break; + if (DECL_SIZE (field) == NULL_TREE) return; }
[Bug c/110430] New: Fail to CSE for LEN_MASK_STORE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110430 Bug ID: 110430 Summary: Fail to CSE for LEN_MASK_STORE Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: juzhe.zhong at rivai dot ai Target Milestone: --- Consider the following case:

void __attribute__((noinline,noclone))
foo (int *out, int *res)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 16; ++i)
    {
      if (mask[i])
        out[i] = i;
    }
  int o0 = out[0];
  int o7 = out[7];
  int o14 = out[14];
  int o15 = out[15];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

-O3 -march=rv64gcv_zvl512b --param riscv-autovec-preference=fixed-vlmax

Current RVV auto-vectorization codegen:

foo:
    lui a5,%hi(.LANCHOR0)
    vsetivli zero,16,e32,m1,ta,ma
    addi a5,a5,%lo(.LANCHOR0)
    vid.v v1
    vlm.v v0,0(a5)
    vsetvli a5,zero,e32,m1,ta,ma
    vse32.v v1,0(a0),v0.t
    lw a2,0(a0)
    lw a3,28(a0)
    lw a4,56(a0)
    lw a5,60(a0)
    sw a2,0(a1)
    sw a3,8(a1)
    sw a4,16(a1)
    sw a5,24(a1)
    ret

However, with this patch: https://patchwork.sourceware.org/project/gcc/patch/20230627064737.16257-1-juzhe.zh...@rivai.ai/ we end up with better codegen thanks to CSE:

foo:
    lui a5,%hi(.LANCHOR0)
    vsetivli zero,16,e32,m1,ta,ma
    addi a5,a5,%lo(.LANCHOR0)
    vid.v v1
    vlm.v v0,0(a5)
    vsetvli a5,zero,e32,m1,ta,ma
    vse32.v v1,0(a0),v0.t
    lw a4,0(a0)
    lw a5,56(a0)
    sw a4,0(a1)
    sw a5,16(a1)
    li a4,7
    li a5,15
    sw a4,8(a1)
    sw a5,24(a1)
    ret

Two "lw" instructions (the loads of out[7] and out[15]) are CSEd into two "li" instructions. GIMPLE IR:

  .LEN_MASK_STORE (out_10(D), 32B, 16, { 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1 }, { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }, 0);
  o0_11 = *out_10(D);
  o14_13 = MEM[(int *)out_10(D) + 56B];
  *res_15(D) = o0_11;
  MEM[(int *)res_15(D) + 8B] = 7;
  MEM[(int *)res_15(D) + 16B] = o14_13;
  MEM[(int *)res_15(D) + 24B] = 15;
  mask ={v} {CLOBBER(eol)};

After discussion with Richi, the current candidate fix can only handle VLS (fixed-length) vectors, not VLA (variable-length) vectors. It is hard for us to create a C testcase that produces a CSE opportunity for VLA vectors, so I am opening a bug for now so that I won't forget this issue. I will enhance LEN_MASK_STORE handling in CSE after I have finished all RVV auto-vectorization support.
[Bug target/110429] New: Redundant vector extract instruction on P9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110429 Bug ID: 110429 Summary: Redundant vector extract instruction on P9 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: guihaoc at gcc dot gnu.org Target Milestone: ---

//test.c
#include <altivec.h>

void extract_int_2 (int *p, vector int a)
{
  *p = vec_extract (a, 2);
}

On P9 LE, it generates

  xxextractuw 34,34,4
  stxsiwx 34,0,3

The xxextractuw is unnecessary, as the extracted int is already in word[1].
[Bug middle-end/106081] missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081 --- Comment #7 from rsandifo at gcc dot gnu.org --- I don't think the splat creates a new layout, but instead a splat should be allowed to change its layout at zero cost.
[Bug target/110406] d: Wrong code-gen returning POD structs by value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110406 Iain Sandoe changed: What|Removed |Added CC||iains at gcc dot gnu.org --- Comment #11 from Iain Sandoe --- If I remember correctly, the underlying issue is that D always has a vtable pointer for a "class", whereas C++ only adds one if needed (i.e. when there are actual virtual methods). So for classes without virtual methods that need to interoperate with C++, we really need to use 'struct' in D. I think D will then lay them out without the vtable pointer. We had a fix for this for Darwin, which does not seem to have made it upstream just yet. Retesting (there's an unrelated bootstrap regression to work around).
[Bug testsuite/110419] [14 regression] new test case gfortran.dg/value_9.f90 in r14-2050-gd130ae8499e0c6 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110419 Richard Biener changed: What|Removed |Added Keywords||testsuite-fail Component|other |testsuite Target Milestone|--- |14.0
[Bug tree-optimization/110428] missed CSE with VLA vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428 Richard Biener changed: What|Removed |Added Target||aarch64 Keywords||missed-optimization --- Comment #1 from Richard Biener --- On x86_64 for example with -march=znver4 we can perform the required CSE.
[Bug tree-optimization/110428] New: missed CSE with VLA vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110428 Bug ID: 110428 Summary: missed CSE with VLA vectors Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: ---

#include <stdint.h>

void __attribute__((noinline,noclone))
foo (uint16_t *out, uint16_t *res)
{
  int mask[] = { 0, 1, 1, 1, 1, 1, 1, 1 };
  int i;
  for (i = 0; i < 8; ++i)
    {
      if (mask[i])
        out[i] = 33;
    }
  uint16_t o0 = out[0];
  uint16_t o7 = out[3];
  uint16_t o14 = out[6];
  uint16_t o15 = out[7];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

With -march=armv9.3-a -O3 -g0 -fno-vect-cost-model we fail to CSE the out[] loads after vectorization.
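To make the missed CSE concrete, here is a hand-written sketch of the result the optimizers could reach (foo_cse and check_equivalent are hypothetical names, not anything GCC emits): because mask[] is constant, the loads of out[3], out[6] and out[7] can be replaced by the stored value 33, and only out[0], which the mask leaves untouched, still has to be read.

```c
#include <assert.h>
#include <stdint.h>

/* The testcase from the report, without the noinline attributes.  */
void foo (uint16_t *out, uint16_t *res)
{
  int mask[] = { 0, 1, 1, 1, 1, 1, 1, 1 };
  for (int i = 0; i < 8; ++i)
    if (mask[i])
      out[i] = 33;
  res[0] = out[0];
  res[2] = out[3];
  res[4] = out[6];
  res[6] = out[7];
}

/* Hand-CSEd form: stored constants are forwarded to the later loads.  */
void foo_cse (uint16_t *out, uint16_t *res)
{
  for (int i = 1; i < 8; ++i)
    out[i] = 33;
  res[0] = out[0];
  res[2] = 33;
  res[4] = 33;
  res[6] = 33;
}

/* Both versions leave identical out[] and res[] for the same input.  */
static void check_equivalent (uint16_t first)
{
  uint16_t out1[8] = { first }, out2[8] = { first };
  uint16_t res1[8] = { 0 }, res2[8] = { 0 };
  foo (out1, res1);
  foo_cse (out2, res2);
  for (int i = 0; i < 8; ++i)
    {
      assert (out1[i] == out2[i]);
      assert (res1[i] == res2[i]);
    }
}
```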
[Bug c/110427] a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110427 Andreas Schwab changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #2 from Andreas Schwab --- The behaviour is undefined.
[Bug middle-end/106081] missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081 Richard Biener changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #6 from Richard Biener --- So what's interesting is that we now get as of r14-2117-gdd86a5a69cbda4 the following. The odd thing is that we fail to eliminate the load permutation { 3 2 1 0 } even though this is a reduction group. I _suppose_ the reason is the { 0 0 0 0 } load permutation (the "splat") which we don't "support". In vect_optimize_slp_pass::start_choosing_layouts there's

      if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
        {
          /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
             unpermuted, record a layout that reverses this permutation.
             We would need more work to cope with loads that are internally
             permuted and also have inputs (such as masks for
             IFN_MASK_LOADs).  */
          gcc_assert (partition.layout == 0 && !m_slpg->vertices[node_i].succ);
          if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
            continue;

which means we'll keep the permute there (well, that's OK - any permute of the permute will retain it ...). I suspect this prevents the optimization here. Massaging start_choosing_layouts to allow a splat on element zero for a non-grouped access breaks things as we try to move that permute. So I guess this needs a new kind of layout constraint? The permute can absorb any permute but we cannot "move" it. Richard?
t.c:14:18: note: === scheduling SLP instances ===
t.c:14:18: note: Vectorizing SLP tree:
t.c:14:18: note: node 0x4304170 (max_nunits=16, refcnt=2) vector(4) double
t.c:14:18: note: op template: _21 = _20 + results$d_60;
t.c:14:18: note:   stmt 0 _21 = _20 + results$d_60;
t.c:14:18: note:   stmt 1 _17 = _16 + results$c_58;
t.c:14:18: note:   stmt 2 _13 = _12 + results$b_56;
t.c:14:18: note:   stmt 3 _9 = _8 + results$a_54;
t.c:14:18: note:   children 0x43041f8 0x4304418
t.c:14:18: note: node 0x43041f8 (max_nunits=16, refcnt=1) vector(4) double
t.c:14:18: note: op template: _20 = _1 * _19;
t.c:14:18: note:   stmt 0 _20 = _1 * _19;
t.c:14:18: note:   stmt 1 _16 = _1 * _15;
t.c:14:18: note:   stmt 2 _12 = _1 * _11;
t.c:14:18: note:   stmt 3 _8 = _1 * _7;
t.c:14:18: note:   children 0x4304280 0x4304308
t.c:14:18: note: node 0x4304280 (max_nunits=4, refcnt=1) vector(4) double
t.c:14:18: note: op template: _1 = *k_50;
t.c:14:18: note:   stmt 0 _1 = *k_50;
t.c:14:18: note:   stmt 1 _1 = *k_50;
t.c:14:18: note:   stmt 2 _1 = *k_50;
t.c:14:18: note:   stmt 3 _1 = *k_50;
t.c:14:18: note:   load permutation { 0 0 0 0 }
t.c:14:18: note: node 0x4304308 (max_nunits=16, refcnt=1) vector(4) double
t.c:14:18: note: op template: _19 = (double) _18;
t.c:14:18: note:   stmt 0 _19 = (double) _18;
t.c:14:18: note:   stmt 1 _15 = (double) _14;
t.c:14:18: note:   stmt 2 _11 = (double) _10;
t.c:14:18: note:   stmt 3 _7 = (double) _6;
t.c:14:18: note:   children 0x4304390
t.c:14:18: note: node 0x4304390 (max_nunits=16, refcnt=1) vector(16) short int
t.c:14:18: note: op template: _18 = _5->d;
t.c:14:18: note:   stmt 0 _18 = _5->d;
t.c:14:18: note:   stmt 1 _14 = _5->c;
t.c:14:18: note:   stmt 2 _10 = _5->b;
t.c:14:18: note:   stmt 3 _6 = _5->a;
t.c:14:18: note:   load permutation { 3 2 1 0 }
t.c:14:18: note: node 0x4304418 (max_nunits=4, refcnt=1) vector(4) double
t.c:14:18: note: op template: results$d_60 = PHI <_21(5), 0.0(6)>
t.c:14:18: note:   stmt 0 results$d_60 = PHI <_21(5), 0.0(6)>
t.c:14:18: note:   stmt 1 results$c_58 = PHI <_17(5), 0.0(6)>
t.c:14:18: note:   stmt 2 results$b_56 = PHI <_13(5), 0.0(6)>
t.c:14:18: note:   stmt 3 results$a_54 = PHI <_9(5), 0.0(6)>
t.c:14:18: note:   children 0x4304170 (nil)
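A small scalar model of why a splat can absorb any permute, as suggested in comment #7: when every lane holds the same value, reordering the lanes is a no-op, so a { 3 2 1 0 } permute applied to the result of a { 0 0 0 0 } load changes nothing. This is only a sketch of the lane arithmetic, assuming four-lane vectors; the function names are hypothetical and it does not model GCC's layout-selection code.

```c
#include <assert.h>
#include <stddef.h>

#define NLANES 4

/* Apply a lane permutation: lane i of DST takes lane PERM[i] of SRC,
   as in the "load permutation { ... }" lines of the dump.  */
static void permute (double *dst, const double *src, const unsigned *perm)
{
  for (size_t i = 0; i < NLANES; ++i)
    {
      assert (perm[i] < NLANES);
      dst[i] = src[perm[i]];
    }
}

/* Nonzero if permuting the lanes of V by PERM leaves V unchanged.  */
static int permute_is_identity_on (const double *v, const unsigned *perm)
{
  double tmp[NLANES];
  permute (tmp, v, perm);
  for (size_t i = 0; i < NLANES; ++i)
    if (tmp[i] != v[i])
      return 0;
  return 1;
}
```

Loading with { 0 0 0 0 } produces a splat, on which any further permute, including { 3 2 1 0 }, is the identity; on a non-splat vector the same reversal is not.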
[Bug c/110427] a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110427 Arsen Arsenović changed: What|Removed |Added CC||arsen at gcc dot gnu.org --- Comment #1 from Arsen Arsenović ---

: In function 'main':
:4:18: warning: operation on 'a' may be undefined [-Wsequence-point]
    4 | if (a < a--) {
      |         ~^~

the result is simply undefined (is the first `a' pre- or post-decrement?)
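Spelling out the two most obvious evaluation orders shows why the printed value is meaningless. The helpers below are hypothetical and written with explicit sequencing; because the original expression is undefined, the standard does not even promise that one of these two outcomes occurs.

```c
#include <assert.h>

/* Left operand read before the decrement takes effect.  */
static int left_read_first (void)
{
  int a = 0;
  int lhs = a;    /* 0 */
  int rhs = a;    /* a-- yields the old value, 0 */
  a = a - 1;      /* side effect of a-- */
  if (lhs < rhs)  /* 0 < 0 is false */
    a = 1;
  return a;       /* -1 */
}

/* Decrement takes effect before the left operand is read.  */
static int decrement_first (void)
{
  int a = 0;
  int rhs = a;    /* a-- yields the old value, 0 */
  a = a - 1;      /* -1 */
  int lhs = a;    /* -1 */
  if (lhs < rhs)  /* -1 < 0 is true */
    a = 1;
  return a;       /* 1 */
}

/* The two orders disagree, so the program's output cannot be relied on.  */
static void check_orders_disagree (void)
{
  assert (left_read_first () != decrement_first ());
}
```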
[Bug c/110427] New: a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110427 Bug ID: 110427 Summary: a

#include <stdio.h>
int main() {
    int a = 0;
    if (a < a--) {
        a = 1;
    }
    printf("%d\n", a);
    return 0;
}
[Bug middle-end/106081] missed vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081 Bug 106081 depends on bug 96208, which changed state. Bug 96208 Summary: non-grouped load can be SLP vectorized for 2-element vectors case https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 96208, which changed state. Bug 96208 Summary: non-grouped load can be SLP vectorized for 2-element vectors case https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #5 from Richard Biener --- Fixed.
[Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208 --- Comment #4 from CVS Commits --- The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:dd86a5a69cbda40cf76388a65d3317c91cb2b501

commit r14-2117-gdd86a5a69cbda40cf76388a65d3317c91cb2b501
Author: Richard Biener
Date: Thu Jun 22 11:40:46 2023 +0200

tree-optimization/96208 - SLP of non-grouped loads

The following extends SLP discovery to handle non-grouped loads in loop vectorization in the case the same load appears in all lanes.

Code generation is adjusted to mimic what we do for the case of single element interleaving (when the load is not unit-stride), which is already handled by SLP. There are some limits we run into because peeling for gaps cannot cover all cases and we choose VMAT_CONTIGUOUS. The patch does not try to address these issues yet.

The main obstacle is that these loads are not STMT_VINFO_GROUPED_ACCESS and that's a new thing with SLP. I know from the past that it's not a good idea to make them grouped. Instead the following massages places to deal with SLP loads that are not STMT_VINFO_GROUPED_ACCESS.

There's already a testcase testing for the case the PR is after, just XFAILed; the following adjusts that instead of adding another.

I do expect to have missed some, so I don't plan to push this on a Friday. Still, there may be feedback, so I'm posting this now.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

PR tree-optimization/96208
* tree-vect-slp.cc (vect_build_slp_tree_1): Allow a non-grouped load if it is the same for all lanes.
(vect_build_slp_tree_2): Handle not grouped loads.
(vect_optimize_slp_pass::remove_redundant_permutations): Likewise.
(vect_transform_slp_perm_load_1): Likewise.
* tree-vect-stmts.cc (vect_model_load_cost): Likewise.
(get_group_load_store_type): Likewise. Handle invariant accesses.
(vectorizable_load): Likewise.
* gcc.dg/vect/slp-46.c: Adjust for new vectorizations.
* gcc.dg/vect/bb-slp-pr65935.c: Adjust.
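For illustration, a hypothetical loop of the shape this change targets (it is not the gcc.dg/vect/slp-46.c testcase): k[i] is a single, non-grouped load whose value feeds both lanes of a two-lane SLP group, so SLP discovery sees the same load in all lanes.

```c
#include <assert.h>

/* Hypothetical example: out[2*i] and out[2*i+1] form a two-lane SLP
   group, and both lanes use the same non-grouped load k[i].  */
void scale_pairs (double *restrict out, const double *restrict k,
                  const double *restrict x, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; ++i)
    {
      double kv = k[i];               /* same load in every lane */
      out[2 * i] = kv * x[2 * i];
      out[2 * i + 1] = kv * x[2 * i + 1];
    }
}
```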
[Bug middle-end/110377] Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377 Jan Hubicka changed: What|Removed |Added Last reconfirmed||2023-06-27 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #5 from Jan Hubicka --- OK, I think we want to use ranger in the analysis stage then. I am testing the following.

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 704fe01b02c..4bc142e1471 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2339,7 +2339,8 @@ ipa_set_jfunc_vr (ipa_jump_func *jf, value_range *tmp)

 static void
 ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
-                                     struct cgraph_edge *cs)
+                                     struct cgraph_edge *cs,
+                                     gimple_ranger *ranger)
 {
   ipa_node_params *info = ipa_node_params_sum->get (cs->caller);
   ipa_edge_args *args = ipa_edge_args_sum->get_create (cs);
@@ -2384,7 +2385,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,

          if (TREE_CODE (arg) == SSA_NAME
              && param_type
-             && get_range_query (cfun)->range_of_expr (vr, arg)
+             && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
              && vr.nonzero_p ())
            addr_nonzero = true;
          else if (tree_single_nonzero_warnv_p (arg, _overflow))
@@ -2407,7 +2408,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
              integers and pointers.  */
              && irange::supports_p (TREE_TYPE (arg))
              && irange::supports_p (param_type)
-             && get_range_query (cfun)->range_of_expr (vr, arg)
+             && ranger->range_of_expr (vr, arg, cs->call_stmt)
              && !vr.undefined_p ())
            {
              value_range resvr = vr;
@@ -2516,7 +2517,8 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
    from BB.  */

 static void
-ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block bb)
+ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block bb,
+                                   gimple_ranger *ranger)
 {
   struct ipa_bb_info *bi = ipa_get_bb_info (fbi, bb);
   int i;
@@ -2535,7 +2537,7 @@ ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block b
                  && !gimple_call_fnspec (cs->call_stmt).known_p ())
                continue;
            }
-         ipa_compute_jump_functions_for_edge (fbi, cs);
+         ipa_compute_jump_functions_for_edge (fbi, cs, ranger);
        }
     }

@@ -3109,19 +3111,27 @@ class analysis_dom_walker : public dom_walker
 {
 public:
   analysis_dom_walker (struct ipa_func_body_info *fbi)
-    : dom_walker (CDI_DOMINATORS), m_fbi (fbi) {}
+    : dom_walker (CDI_DOMINATORS), m_fbi (fbi)
+  {
+    m_ranger = enable_ranger (cfun, false);
+  }
+  ~analysis_dom_walker ()
+  {
+    disable_ranger (cfun);
+  }

   edge before_dom_children (basic_block) final override;

 private:
   struct ipa_func_body_info *m_fbi;
+  gimple_ranger *m_ranger;
 };

 edge
 analysis_dom_walker::before_dom_children (basic_block bb)
 {
   ipa_analyze_params_uses_in_bb (m_fbi, bb);
-  ipa_compute_jump_functions_for_bb (m_fbi, bb);
+  ipa_compute_jump_functions_for_bb (m_fbi, bb, m_ranger);
   return NULL;
 }

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
new file mode 100644
index 000..d770f8babba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
@@ -0,0 +1,17 @@
+/* { dg-do compile */
+/* { dg-options "-O2 -fdump-ipa-fnsummary" } */
+int test3(int);
+__attribute__ ((noinline))
+void test2(int a)
+{
+  test3(a);
+}
+void
+test(int n)
+{
+  if (n > 5)
+    __builtin_unreachable ();
+  test2(n);
+}
+/* { dg-final { scan-tree-dump "-INF, 5-INF" "fnsummary" } } */
[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #20 from CVS Commits --- The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:dbf8ab449417aa24669f6ccf50be8c17f8c1278e

commit r14-2116-gdbf8ab449417aa24669f6ccf50be8c17f8c1278e
Author: liuhongt
Date: Mon Jun 26 21:07:09 2023 +0800

Refine maskstore patterns with UNSPEC_MASKMOV.

Similar to r14-2070-gc79476da46728e.

If mem_addr points to a memory region with less than a whole vector's worth of accessible bytes, and k is a mask that would prevent reading the inaccessible bytes from mem_addr, add UNSPEC_MASKMOV to prevent it from being transformed into any other whole-memory-access instruction.

gcc/ChangeLog:

PR rtl-optimization/110237
* config/i386/sse.md (_store_mask): Refine with UNSPEC_MASKMOV.
(maskstore_store_mask): New define_insn, it's renamed from original _store_mask.
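A scalar model of the masked-store semantics this commit preserves, assuming 32-bit lanes and a 16-bit mask as for a 512-bit vector (the function name is hypothetical): masked-out lanes are never touched, so a buffer shorter than a full vector is safe, whereas rewriting the operation as a plain full-width store would write past its end. AVX-512 masked stores likewise suppress faults for masked-out elements, which is why the RTL needs a form, here UNSPEC_MASKMOV, that later passes cannot turn into an ordinary store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: lane i is written only when bit i of MASK
   is set; masked-out lanes of DST are not accessed at all.  */
static void masked_store_i32 (int32_t *dst, const int32_t *src,
                              uint16_t mask, size_t nlanes)
{
  assert (nlanes <= 16);
  for (size_t i = 0; i < nlanes; ++i)
    if (mask & (1u << i))
      dst[i] = src[i];
}
```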
[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #19 from rguenther at suse dot de --- On Mon, 26 Jun 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
>
> --- Comment #18 from Alexander Monakov ---
> (In reply to rguent...@suse.de from comment #17)
> > Yes, we do the same to loads.  I hope that's not a common technique
> > though but I have to admit the vectorizer itself assesses whether it's
> > safe to access "gaps" by looking at alignment so its code generation
> > is prone to this same "mistake".
> >
> > Now, is "alignment to 16 is ensured externally" good enough here?
> > If we consider
> >
> > static int a[2];
> >
> > and code doing
> >
> >  if (is_aligned (a))
> >    {
> >      __v4si v = (__attribute__((may_alias)) __v4si *)
> >    }
> >
> > then we cannot even use a DECL_ALIGN that's insufficient for decls
> > that bind locally.
>
> I agree. I went with the 'extern' example because there it should be more
> obvious the construction ought to work.
>
> >
> > Note we have similar arguments with aggregate type sizes (and TBAA)
> > where when we infer a dynamic type from one access we check if
> > the other access would fit.  Wouldn't the above then extend to that
> > as well given we could also do aggregate copies of "padding" and
> > ignore the bits if we'd have ensured the larger access wouldn't trap?
>
> I think a read via a may_alias type just tells you that N bytes are accessible
> for reading, not necessarily for writing. So I don't see a problem, but maybe I
> didn't quite catch what you are saying.

I wasn't sure how to phrase it; what I was saying is that we have this "the access is too large for the object in consideration, so it cannot alias it" logic in places where we just work with types within the TBAA framework. So I wondered if one can construct a similar case to support that we should not do this (tree-ssa-alias.cc: aliasing_component_refs_p).

> > So supporting the above might be a bit of a stretch (though I think
> > we have to fix the vectorizer here).
>
> What would the solution be? Using a may_alias type for such accesses?

But the size argument doesn't have anything to do with TBAA (and may_alias is about TBAA). I don't think we have any way to circumvent C object access rules. That is, for example, with -fno-strict-aliasing the following isn't going to work.

int a;
int b;

int main()
{
  a = 1;
  b = 2;
  if ( + 1 == ) // equality compare of unrelated pointers OK
    {
      long x = *(long *) // access outside of 'a' not OK
      if (x != 0x00010002)
        abort ();
    }
}

There's no command-line flag or attribute to form a pointer to an object composing 'a' and 'b' besides changing how the storage is declared. I don't think we should make an exception for "padding" after an object, and I don't see any sensible way to constrain the size of the supported "padding" either. Pad to the largest possible alignment of the object? That would be MAX_OFILE_ALIGNMENT ...

> > > > If the v4si store is masked we cannot do this anymore, but the IL
> > > > we seed the alias oracle with doesn't know the store is partial.
> > > > The only way to "fix" it is to take away all of the information from it.
> > >
> > > But that won't fix the trapping issue? I think we need a distinct RTX for
> > > memory accesses where hardware does fault suppression for masked-out
> > > elements.
> >
> > Yes, it doesn't fix that part.  The idea of using BLKmode instead of
> > a vector mode for the MEMs would, I guess, together with specifying
> > MEM_SIZE as not known.
>
> Unfortunate if that works for the trapping side, but not for the
> aliasing side.

It should work for both I think, but MEM_EXPR would need changing as well - we do have a perfectly working representation there, it would just be the first CALL_EXPR in such context ...
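The hardware-level reasoning the thread questions can be sketched as page arithmetic: an access of SIZE bytes at an address aligned to SIZE, where SIZE divides the page size, never crosses a page boundary, so it cannot fault if any byte of it is accessible. A minimal sketch assuming 4096-byte pages (the names are hypothetical). Note that the point argued above is that C object-access rules, and GCC's alias analysis, do not accept this argument regardless of whether the hardware would trap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages.  */
#define PAGE_SIZE 4096u

/* Nonzero if the SIZE-byte access starting at ADDR spans two pages.  */
static int crosses_page (uintptr_t addr, uintptr_t size)
{
  assert (size != 0);
  return (addr / PAGE_SIZE) != ((addr + size - 1) / PAGE_SIZE);
}
```

A 16-byte access at 4080 stays within the first page, while the unaligned one at 4088 spills into the next; every 16-byte-aligned 16-byte access stays within one page.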
[Bug ada/110398] internal error on call with parameter of predicated subtype
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110398 Eric Botcazou changed: What|Removed |Added Status|WAITING |NEW Summary|Program_Error sem_eval.adb:4635 explicit raise |internal error on call with parameter of predicated subtype --- Comment #4 from Eric Botcazou --- Thanks. Confirmed on the mainline with an assertion failure:

eric@fomalhaut:~/build/gcc/native> gcc/gnat1 -quiet example.adb
+===GNAT BUG DETECTED==+
| 14.0.0 20230626 (experimental) [master r14-2083-g068eba260fa] (x86_64-suse-linux) |
| Assert_Failure sem.adb:650 |
| Error detected at example.adb:3:42 |
| Compiling example.adb
[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #12 from rguenther at suse dot de --- On Mon, 26 Jun 2023, hubicka at ucw dot cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334
>
> --- Comment #11 from Jan Hubicka ---
> Hi,
> what about this.  It should make at least quite basic inlining to happen
> to always_inline.  I do not think many critical always_inlines have
> indirect calls in them.  The test for lto is quite bad and I can
> work on solving this incrementally (it would be nice to have this
> tested and possibly backport it).
>
> diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
> index efc8df7d4e0..dcec07e49e1 100644
> --- a/gcc/ipa-inline.cc
> +++ b/gcc/ipa-inline.cc
> @@ -702,6 +702,38 @@ can_early_inline_edge_p (struct cgraph_edge *e)
>    if (!can_inline_edge_p (e, true, true)
>        || !can_inline_edge_by_limits_p (e, true, false, true))
>      return false;
> +  /* When inlining regular functions into always-inline functions
> +     during early inlining watch for possible inline cycles.  */
> +  if (DECL_DISREGARD_INLINE_LIMITS (caller->decl)
> +      && lookup_attribute ("always_inline", DECL_ATTRIBUTES (caller->decl))
> +      && (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> +          || !lookup_attribute ("always_inline", DECL_ATTRIBUTES
> (callee->decl
> +    {
> +      /* If there are indirect calls, inlining may produce direct call.
> +         TODO: We may lift this restriction if we avoid errors on formely
> +         indirect calls to always_inline functions.  Taking address
> +         of always_inline function is generally bad idea and should
> +         have been declared as undefined, but sadly we allow this.  */
> +      if (caller->indirect_calls || e->callee->indirect_calls)

Why disallow caller->indirect_calls?

> +        return false;
> +      for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee)

I don't think this flies - it looks quadratic. Can we compute this in the inline summary once instead?

As for indirect calls, can we maybe mark initial direct GIMPLE call stmts as "always-inline" and only look at that marking, so that an indirect call will never become "always-inline"? Iff cgraph edges prevail during all early inlining, we could mark call edges for this purpose?

> +        {
> +          struct cgraph_node *callee2 = e2->callee->ultimate_alias_target ();
> +          /* As early inliner runs in RPO order, we will see uninlined
> +             always_inline calls only in the case of cyclic graphs.  */
> +          if (DECL_DISREGARD_INLINE_LIMITS (callee2->decl)
> +              || lookup_attribute ("always_inline", callee2->decl))
> +            return false;
> +          /* With LTO watch for case where function is later replaced
> +             by always_inline definition.
> +             TODO: We may either stop treating noninlined cross-module always
> +             inlines as errors, or we can extend decl merging to produce
> +             syntacic alias and honor always inline only in units it has
> +             been declared as such.  */
> +          if (flag_lto && callee2->externally_visible)
> +            return false;
> +        }
> +    }
>    return true;
>  }
>
> @@ -3034,18 +3066,7 @@ early_inliner (function *fun)
>
>    if (!optimize
>        || flag_no_inline
> -      || !flag_early_inlining
> -      /* Never inline regular functions into always-inline functions
> -         during incremental inlining.  This sucks as functions calling
> -         always inline functions will get less optimized, but at the
> -         same time inlining of functions calling always inline
> -         function into an always inline function might introduce
> -         cycles of edges to be always inlined in the callgraph.
> -
> -         We might want to be smarter and just avoid this type of inlining.  */
> -      || (DECL_DISREGARD_INLINE_LIMITS (node->decl)
> -          && lookup_attribute ("always_inline",
> -                               DECL_ATTRIBUTES (node->decl
> +      || !flag_early_inlining)
>      ;
>    else if (lookup_attribute ("flatten",
>                               DECL_ATTRIBUTES (node->decl)) != NULL)
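One way to avoid the quadratic scan, sketched on a hypothetical, simplified call-graph structure rather than GCC's cgraph_node and inline summaries: compute a per-node "calls an always_inline function" bit once, in O(nodes + edges), so each candidate edge can be answered in O(1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified call-graph node.  */
struct node
{
  bool always_inline;
  size_t ncallees;
  const size_t *callees;        /* indices into the node array */
  bool calls_always_inline;     /* summary bit, filled once */
};

/* For every node, record whether any direct callee is always_inline.
   Each edge is visited at most once, so the whole pass is linear.  */
static void compute_summaries (struct node *nodes, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      nodes[i].calls_always_inline = false;
      for (size_t j = 0; j < nodes[i].ncallees; ++j)
        {
          size_t c = nodes[i].callees[j];
          assert (c < n);
          if (nodes[c].always_inline)
            {
              nodes[i].calls_always_inline = true;
              break;
            }
        }
    }
}
```

In a real implementation the bit would also have to be kept up to date as early inlining rewrites callee lists; the sketch ignores that.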
[Bug tree-optimization/110414] [14 Regression] Dead Code Elimination Regression since r14-1127-g9e2017ae6ac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110414 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0 Blocks||109849 Keywords||missed-optimization Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 [Bug 109849] suboptimal code for vector walking loop
[Bug tree-optimization/110413] [14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r14-1880-g827e208fa64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110413 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Target Milestone|--- |14.0 Blocks||110269 Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110269 [Bug 110269] [13 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r13-4607-g2dc5d6b1e7e
[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735 --- Comment #21 from CVS Commits --- The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:a90f558bbb87c0b5d2b1e07d55bd585b2285cf3d

commit r14-2114-ga90f558bbb87c0b5d2b1e07d55bd585b2285cf3d
Author: liuhongt
Date: Mon Jun 26 13:59:29 2023 +0800

Don't issue vzeroupper for vzeroupper call_insn.

gcc/ChangeLog:

PR target/82735
* config/i386/i386.cc (ix86_avx_u127_mode_needed): Don't emit vzeroupper for vzeroupper call_insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-vzeroupper-30.c: New test.
[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215 --- Comment #6 from Hongyu Wang --- Thanks for the fix; now, for the attached test, the main loop no longer has any loads. There is a remaining issue: the loop epilogue still contains loads from the stack and the constant pool:

.L9:
        movslq  %edx, %rax
        movss   72(%rsp), %xmm5
        salq    $2, %rax
        leaq    (%rbx,%rax), %rcx
        movaps  %xmm5, %xmm1
        subss   (%rcx), %xmm1
        andps   .LC4(%rip), %xmm1
        movss   %xmm1, (%rcx)
        leal    1(%rdx), %ecx
        addss   %xmm1, %xmm0
        cmpl    %ecx, %r12d
        jle     .L8

The IRA dump shows that the pseudos do not conflict, but they still fail to be allocated to registers. This issue does not exist on aarch64.
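For reference, a C reading of what the .L9 epilogue computes. It is reconstructed from the assembly only, with simplified loop bounds, and assumes .LC4 holds the sign-clearing mask so that andps implements fabsf (the function and parameter names are hypothetical). The value reloaded from 72(%rsp) on every iteration is loop-invariant, which is the remaining inefficiency being reported.

```c
#include <assert.h>
#include <math.h>

/* 'inv' stands for the loop-invariant value reloaded from 72(%rsp).  */
static float epilogue (float *a, int start, int end, float inv, float sum)
{
  assert (start <= end);
  for (int i = start; i < end; ++i)
    {
      float d = fabsf (inv - a[i]);  /* movaps, subss, andps */
      a[i] = d;                      /* movss %xmm1, (%rcx) */
      sum += d;                      /* addss %xmm1, %xmm0 */
    }
  return sum;
}
```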