[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #38 from Kewen Lin ---
Created attachment 53428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53428=edit
untested patch

An untested patch which can make it pass.
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #37 from Kewen Lin ---
(In reply to Andrew Pinski from comment #36)
> You might need to do -O2 -fPIE -pie to reproduce the issue as debian is
> configured with --enable-default-pie

Thanks for the hint! I can reproduce this, but it needs one more explicit cpu type like -mcpu=power4/5/6.

The problem comes from slp1, so -fno-tree-slp-vectorize can make it pass. It seems to expose one latent issue, for the code in vect_recog_mulhs_pattern:

  vect_pattern_detected ("vect_recog_mulhs_pattern", last_stmt);

  /* Check for target support. */
  tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
  if (!new_vectype
      || !direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED))
    return NULL;

At this time, the new_vectype is

  (gdb) pge new_vectype
  vector(2) short unsigned int

The current target doesn't support the umul_highpart optab for V2HImode at all, but the check doesn't fail, since in the function direct_optab_supported_p

  static bool
  direct_optab_supported_p (direct_optab optab, tree_pair types,
                            optimization_type opt_type)
  {
    machine_mode mode = TYPE_MODE (types.first);
    gcc_checking_assert (mode == TYPE_MODE (types.second));
    return direct_optab_handler (optab, mode, opt_type) != CODE_FOR_nothing;
  }

  (gdb) pge types.first
  vector(2) short unsigned int
  (gdb) p mode
  $12 = E_SImode

The current target does support the umul_highpart optab for SImode, so the check doesn't fail. But we expected the query to use the vector mode for the given type; it is functionally wrong to use a scalar insn for a vector operation here, so this result is unexpected.
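The mismatch described in comment #37 can be illustrated with a small stand-alone model. This is only a sketch and not GCC code: the `Mode` enum, the `Target` struct and every function name below are hypothetical stand-ins, and the guard in `supported_guarded` is just one possible way to reject the scalar fallback, not the fix that was eventually chosen.

```cpp
#include <cassert>

// Toy stand-ins for GCC machine modes; NOT the real machine_mode enum.
enum class Mode { SI, V2HI };

// Hypothetical target description.
struct Target {
  bool has_v2hi;   // does a 2 x 16-bit vector mode exist on this target?
  bool mulh_si;    // is a scalar 32-bit multiply-high handler available?
  bool mulh_v2hi;  // is a vector multiply-high handler available?
};

bool is_vector_mode(Mode m) { return m == Mode::V2HI; }

// Models the behaviour seen in the gdb session: a "vector(2) short unsigned
// int" type reports a same-sized integer mode when no vector mode exists.
Mode type_mode_of_v2hi_type(const Target &t) {
  return t.has_v2hi ? Mode::V2HI : Mode::SI;
}

bool handler_exists(const Target &t, Mode m) {
  return m == Mode::SI ? t.mulh_si : t.mulh_v2hi;
}

// Same shape as the quoted check: only the mode of the type is queried.
bool supported_unguarded(const Target &t) {
  return handler_exists(t, type_mode_of_v2hi_type(t));
}

// Assumed guard: a vector type must actually have a vector mode.
bool supported_guarded(const Target &t) {
  const Mode m = type_mode_of_v2hi_type(t);
  return is_vector_mode(m) && handler_exists(t, m);
}

// A target resembling the report: no V2HI mode, scalar multiply-high only.
Target scalar_only_target() { return Target{false, true, false}; }

// A target that really has the vector mode and the vector handler.
Target vector_target() { return Target{true, true, true}; }
```

With `scalar_only_target()`, `supported_unguarded` answers true (the false positive from the report) while `supported_guarded` answers false; with `vector_target()` both agree.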
[Bug bootstrap/106472] No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106472

--- Comment #14 from Petr Sumbera ---
Sorry for the late response. Unfortunately the above patch doesn't make any difference. The problem is still there.
[Bug ipa/101839] [10/11/12/13 Regression] Hang in C++ code with -fdevirtualize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839

--- Comment #8 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
The relationship is:

  A     A::type
  | | |
  BA    BA::type    CA    CA::type
  |
  CBA   CBA::type

Classes CA and CBA are final, and functions CA::type and BA::type are final as well. In function possible_polymorphic_call_targets, for "target" BA::type, the "DECL_FINAL_P (target)" check is not accurate enough, as there may be classes like CBA that are derived from BA and have instances, which requires continuing the recursive walk in possible_polymorphic_call_targets_1 to record_target_from_binfo.

  if (target)
    {
      /* In the case we get complete method, we don't need to walk derivations. */
      if (DECL_FINAL_P (target))
        context.maybe_derived_type = false;
    }

So fix this with the change below: only stop walking derivations when the target is final and its class outer_type->type is also final?

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 412ca14f66b..77f9b268e86 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -3188,7 +3188,9 @@ possible_polymorphic_call_targets (tree otr_type,
       /* In the case we get complete method, we don't need to walk derivations. */
-      if (target && DECL_FINAL_P (target))
+      if (target && TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+         && RECORD_OR_UNION_TYPE_P (outer_type->type)
+         && TYPE_FINAL_P (outer_type->type))
        context.speculative_maybe_derived_type = false;
       if (type_possibly_instantiated_p (speculative_outer_type->type))
        maybe_record_node (nodes, target, , can_refer, _complete);
@@ -3233,7 +3235,9 @@ possible_polymorphic_call_targets (tree otr_type,
          {
            /* In the case we get complete method, we don't need to walk derivations. */
-           if (DECL_FINAL_P (target))
+           if (TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+               && RECORD_OR_UNION_TYPE_P (outer_type->type)
+               && TYPE_FINAL_P (outer_type->type))
              context.maybe_derived_type = false;
          }
[Bug other/106575] New: new test case gcc.dg/fold-eqandshift-4.c fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106575

            Bug ID: 106575
           Summary: new test case gcc.dg/fold-eqandshift-4.c fails
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:6fc14f1963dfefead588a4cd8902d641ed69255c, r13-2005-g6fc14f1963dfef

make -k check-gcc RUNTESTFLAGS="dg.exp=gcc.dg/fold-eqandshift-4.c"
FAIL: gcc.dg/fold-eqandshift-4.c scan-tree-dump-times optimized "return [01]" 14
FAIL: gcc.dg/fold-eqandshift-4.c scan-tree-dump-times optimized "x_[0-9]\\(D\\)" 18
# of expected passes            6
# of unexpected failures        2

commit 6fc14f1963dfefead588a4cd8902d641ed69255c (HEAD, refs/bisect/bad)
Author: Roger Sayle
Date: Tue Aug 9 18:54:43 2022 +0100

    middle-end: Optimize ((X >> C1) & C2) != C3 for more cases.
[Bug target/106338] RISC-V static-chain register may be clobbered by PLT stubs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106338

--- Comment #6 from Kito Cheng ---
My understanding is that the static chain is a sort of compiler-internal implementation detail: any register could be picked as long as it is not used for passing arguments, so I would also prefer to keep that out of the psABI spec for now.

And just to record this for myself: the x86-64 ABI documents the function's static chain pointer:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/low-level-sys-info.tex#L701
[Bug analyzer/106573] Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573

David Malcolm changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from David Malcolm ---
Should be fixed by the above patch.
[Bug analyzer/106573] Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573 --- Comment #2 from CVS Commits --- The master branch has been updated by David Malcolm : https://gcc.gnu.org/g:bddd8d86e3036e480158ba9219ee3f290ba652ce commit r13-2007-gbddd8d86e3036e480158ba9219ee3f290ba652ce Author: David Malcolm Date: Tue Aug 9 19:58:54 2022 -0400 analyzer: fix missing -Wanalyzer-use-of-uninitialized-value on special-cased functions [PR106573] We were missing checks for uninitialized params on calls to functions that the analyzer has hardcoded knowledge of - both for those that are handled just by state machines, and for those that are handled in region-model-impl-calls.cc (for those arguments for which the svalue wasn't accessed in handling the call). Fixed thusly. gcc/analyzer/ChangeLog: PR analyzer/106573 * region-model.cc (region_model::on_call_pre): Ensure that we call get_arg_svalue on all arguments. gcc/testsuite/ChangeLog: PR analyzer/106573 * gcc.dg/analyzer/error-uninit.c: New test. * gcc.dg/analyzer/fd-uninit-1.c: New test. * gcc.dg/analyzer/file-uninit-1.c: New test. Signed-off-by: David Malcolm
[Bug target/106574] gcc 12 with O3 leads to failures in glibc's y1f128 tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574 --- Comment #3 from Michael Hudson-Doyle --- Certainly this could be "handled" by bumping the tolerance I guess. Not sure how to tell if that is appropriate though...
[Bug target/106574] gcc 12 with O3 leads to failures in glibc's y1f128 tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574

--- Comment #2 from Andrew Pinski ---
This is just a 2 ulp difference ...
This could be a constant folding difference between GCC and what is done for _Float128 in software, which could mean this is not a bug.
[Bug target/106574] gcc 12 with O3 leads to failures in glibc's y1f128 tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574 --- Comment #1 from Michael Hudson-Doyle --- oops forgot the link to my glibc bug https://sourceware.org/bugzilla/show_bug.cgi?id=29463
[Bug c/106574] New: gcc 12 with O3 leads to failures in glibc's y1f128 tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106574

            Bug ID: 106574
           Summary: gcc 12 with O3 leads to failures in glibc's y1f128 tests
           Product: gcc
           Version: 12.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: michael.hudson at canonical dot com
  Target Milestone: ---

Initially reported here, but more likely to be a gcc issue: if I build glibc with gcc 12 and -O3 (as is the default in Debian/Ubuntu) I get this failure:

(kinetic-amd64)root@anduril:/build/glibc-EA2Jch/glibc-2.36/build-tree/amd64-libc# ./elf/ld-linux-x86-64.so.2 --library-path .:./elf:./math ./math/test-float128-y1
testing _Float128 (without inline functions)
Failure: Test: y1_downward (0x1.c1badep+0)
Result:
 is:          -2.49850711930108135145795303826944004e-01  -0x1.ffb1bae4fa20118544b142160f5fp-3
 should be:   -2.49850711930108135145795303826943836e-01  -0x1.ffb1bae4fa20118544b142160f58p-3
 difference:   1.68518870133883137142398069976181140e-34   0x1.c000p-113
 ulp       :  7.
 max.ulp   :  5.
Maximal error of `y1_downward'
 is      : 7 ulp
 accepted: 5 ulp

Test suite completed:
  216 test cases plus 212 tests for exception flags and
    212 tests for errno executed.
  2 errors occurred.

Building the e_j1f128.os object with -O2 or with gcc-11 fixes the failure.

Not sure how to reduce this to a smaller test case, but I'm happy to try things.
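The "7 ulp" figure in the report can be checked by hand from the printed hex values. The sketch below assumes the usual definition of one ulp for a normal binary128 value (113-bit significand, so 112 stored fraction bits); the helper names are made up for this illustration and are not taken from glibc's test harness. The result has binary exponent -3 (`...p-3`) and the reported difference is `0x1.c000p-113`, that is 1.75 * 2^-113.

```cpp
#include <cassert>
#include <cmath>

// binary128 (_Float128) stores 112 fraction bits plus one implicit bit.
constexpr int kFractionBits = 112;

// One ulp for a normal value m * 2^exp2 with 1 <= |m| < 2.
double ulp_for_exponent(int exp2) {
  return std::ldexp(1.0, exp2 - kFractionBits);
}

// Error in ulps. Both operands are small multiples of powers of two,
// so plain double arithmetic is exact for this particular check.
double error_in_ulps(double difference, int exp2) {
  return difference / ulp_for_exponent(exp2);
}

// The difference printed in the report: 0x1.c000p-113 = 1.75 * 2^-113.
double reported_difference() { return std::ldexp(1.75, -113); }
```

Here one ulp is 2^-115, so 1.75 * 2^-113 / 2^-115 = 7. This matches the last hex digits of the two results (`...0f5f` against `...0f58`), and is 2 ulp above the accepted maximum of 5.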
[Bug target/106338] RISC-V static-chain register may be clobbered by PLT stubs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106338

Andrew Waterman changed:

           What    |Removed                     |Added
                 CC|                            |andrew at sifive dot com

--- Comment #5 from Andrew Waterman ---
(I don't want to make the static chain register part of the RISC-V ABI; the status quo seems fine.)
[Bug analyzer/106573] Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573

David Malcolm changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-08-09

--- Comment #1 from David Malcolm ---
I'm working on a fix for this.
[Bug c++/106207] [11/12/13 Regression] ICE in apply_fixit, at edit-context.cc:769
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106207

--- Comment #2 from Marek Polacek ---
Reduced:

#define FOO(no) \
  void f_##no() \
  { \
    int gen_##no(); \
  }

#define GEN_FOO \
  FOO(f##1) \
  FOO(f##2)

GEN_FOO
[Bug analyzer/106573] New: Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106573

            Bug ID: 106573
           Summary: Missing -Wanalyzer-use-of-uninitialized-value on calls handled by state machines
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: analyzer
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
                CC: mir at gcc dot gnu.org
  Target Milestone: ---

Consider:

int dup (int old_fd);
int not_dup (int old_fd);

int
test_1 ()
{
  int m;
  return dup (m);
}

int
test_2 ()
{
  int m;
  return not_dup (m);
}

where in each function uninitialized local "m" is passed to an externally-defined function.

-fanalyzer currently emits:

t.c: In function ‘test_1’:
t.c:8:10: warning: ‘dup’ on possibly invalid file descriptor ‘m’ [-Wanalyzer-fd-use-without-check]
    8 |   return dup (m);
      |          ^~~
  ‘test_1’: event 1
    |
    |    8 |   return dup (m);
    |      |          ^~~
    |      |          |
    |      |          (1) ‘m’ could be invalid
    |
t.c: In function ‘test_2’:
t.c:15:10: warning: use of uninitialized value ‘m’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
   15 |   return not_dup (m);
      |          ^~~
  ‘test_2’: events 1-2
    |
    |   14 |   int m;
    |      |       ^
    |      |       |
    |      |       (1) region created on stack here
    |   15 |   return not_dup (m);
    |      |          ~~~
    |      |          |
    |      |          (2) use of uninitialized value ‘m’ here
    |

where it only complains about uninit m being passed to not_dup.

Looks like we're missing a check for poisoned svalues as params for the case where one of the state machines recognizes the function in question.
[Bug target/106338] RISC-V static-chain register may be clobbered by PLT stubs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106338

Andrew Pinski changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #4 from Andrew Pinski ---
The only reason why aarch64 changed their static chain register was because it was used for TLS on darwin (and IIRC on VXWorks). The static chain is not part of the ABI (unless the RISC-V folks want to do that). And there are no PLTs between the function calls.
[Bug d/102765] [11 Regression] GDC11 stopped inlining library functions and lambdas used by a binary search one-liner code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765 --- Comment #6 from Iain Buclaw --- r13-2002 (and r12-8673) is a start that sows the seeds to make the codegen option -fno-weak-templates the default. Should just be a case of extending the forced emission to all instantiations too.
[Bug d/104317] D language: rt.config module doesn't work as expected in GDC 9/10 (multiple definition linker error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104317 --- Comment #3 from Iain Buclaw --- (In reply to Siarhei Siamashka from comment #2) > I first tried to toggle "flag_weak_templates" in "gcc/d/lang.opt" from 1 to > 0 in GDC11 instead of reverting PR99914, but the resulting toolchain was > unable to compile and link even the most simple applications due to missing > symbols from Phobos. > r13-2002 (and r12-8673) is a start that sows the seeds to make the codegen option -fno-weak-templates the default. Should just be a case of extending the forced emission to all instantiations too.
[Bug c++/101421] ICE: in lookup_template_class_1, at cp/pt.c:10005
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101421

Marek Polacek changed:

           What    |Removed                     |Added
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #3 from Marek Polacek ---
Fixed by r13-1390-g07ac550393d00f.
[Bug c/77876] -Wbool-operation rejects useful code involving '~'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77876

Marek Polacek changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |NEW
           Assignee|mpolacek at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #2 from Marek Polacek ---
Clearly I never got to this PR. clang also issues the warning, and I think it'd be better to simply use '!' rather than '~'. I have no plans to change the warning, sorry.
[Bug c/106569] enhancement: use STL algorithm instead of a raw loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569 --- Comment #4 from David Binderman --- (In reply to Martin Liška from comment #3) > > My best guess is that if gcc trunk is written in some recent version of C++, > > then all that recent version can be used. > > We are written in C++11, is std::find_if available in the given standard? Yes, there is *a* version of std::find_if available. Some additional work might be needed to verify exact match.
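For reference, the kind of rewrite being discussed looks roughly like the following. This is a generic sketch restricted to C++11 features (lambda, `auto`, the iterator-pair overload of `std::find_if`); the function names and the "first negative element" predicate are invented for illustration and do not come from the GCC sources or from the report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Raw-loop version: index of the first negative element, or -1 if none.
int first_negative_loop(const std::vector<int> &v) {
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] < 0)
      return static_cast<int>(i);
  return -1;
}

// Same search expressed with the C++11-compatible std::find_if overload.
int first_negative_find_if(const std::vector<int> &v) {
  auto it = std::find_if(v.begin(), v.end(), [](int x) { return x < 0; });
  return it == v.end() ? -1 : static_cast<int>(it - v.begin());
}

// Small fixed input used to compare the two versions.
std::vector<int> sample_values() {
  std::vector<int> v;
  v.push_back(3);
  v.push_back(1);
  v.push_back(-4);
  v.push_back(1);
  v.push_back(-5);
  return v;
}
```

Both functions return 2 for `sample_values()` and -1 for an empty vector. Whether a given raw loop in GCC maps onto `std::find_if` this directly still needs the case-by-case verification mentioned in the comment.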
[Bug target/65372] -mprofile-kernel undocumented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65372

Nick Desaulniers changed:

           What    |Removed                     |Added
                 CC|                            |ndesaulniers at google dot com,
                   |                            |nemanja.i.ibm at gmail dot com

--- Comment #1 from Nick Desaulniers ---
I filed a feature request to get this implemented in Clang, since the Linux kernel uses it for the ppc port. The immediate request was for documentation about the change.
https://github.com/llvm/llvm-project/issues/57031
[Bug fortran/106566] [OpenMP] declare simd fails with with bogus "already been host associated" for module procedures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106566

Tobias Burnus changed:

           What    |Removed                     |Added
           Keywords|                            |accepts-invalid
            Summary|[OpenMP]                    |[OpenMP] declare simd fails
                   |                            |with with bogus "already
                   |                            |been host associated" for
                   |                            |module procedures

--- Comment #1 from Tobias Burnus ---
Additionally, the following is not diagnosed, at least not for this example.

"For Fortran, a declarative directive must appear after any USE, IMPORT, and IMPLICIT statements in a declarative context."

(The original example shows this issue. This is reported by other compilers and is being fixed on the OpenMP examples side.)

Example - *FAILS* ("has already been host associated") but is *VALID*

module m
  integer, parameter :: NN = 1023
  integer :: a(NN)
contains
  subroutine add_one2(p)
    implicit none  ! valid - must be before declare
    !$omp declare simd(add_one2) linear(p: ref) simdlen(8)
    integer :: p
    p = p + 1
  end subroutine
end module

The following is *COMPILING* - as there is no MODULE:

subroutine add_one2(p)
  !$omp declare simd(add_one2) linear(p: ref) simdlen(8)
  implicit none  ! invalid because after declare.
  integer :: p
  p = p + 1
end subroutine

Note: This example is invalid on purpose, as 'implicit none' has been moved after 'omp declare'. Otherwise, it would be valid.
[Bug c/106569] enhancement: use STL algorithm instead of a raw loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569 --- Comment #3 from Martin Liška --- > My best guess is that if gcc trunk is written in some recent version of C++, > then all that recent version can be used. We are written in C++11, is std::find_if available in the given standard?
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #9 from Quanhua Liu ---
Hi Richard,

It seems that I cannot add a comment online to the ticket. I tried

  gfortran -o z -O3 -march=native test_matrixCal.f90 -fexternal-blas -lblas -fdump-tree-optimized
  time a.out 1
and
  time a.out 2

Both are very slow (6 s in comparison to the previous 0.8 s using method 2). I don't know which BLAS is on my machine.

On your machine, can you help to test

  BB = transpose(B)
  C = matmul(A,BB)

using

  gfortran -O3 test_matrixCal.f90
  time a.out 2

against the test

  C = matmul(A, transpose(B) )

using any option or BLAS timing?

The timing depends on the machine. It would be a great help if you could provide the timing for the two methods from your site.

Thank you!

Quanhua Liu

On 8/9/2022 1:53 PM, sgk at troutmask dot apl.washington.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
>
> --- Comment #7 from Steve Kargl ---
> On Tue, Aug 09, 2022 at 05:17:57PM +, quanhua.liu at noaa dot gov wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
>>
>> --- Comment #5 from Quanhua Liu ---
>> Hi Richard,
>>
>> Using -fexternal-blas for gfortran v10.3.0 is much slower than
>> the method 2:
>> BB = transpose(B)
>> C = matmul(A, BB)
>>
>> How about on your machine?
>>
>>> If you are doing a problem of this size or larger, you want to use the
>>> -fexternal-blas option and link in OpenBLAS.
>
> I wrote "and link in OpenBLAS".
>
>>> I added timing code and replicated the loop to both in one go.
>>>
>>> % gfcx -o z -O3 -march=native a.f90 && ./z
>>> 1.16500998 1615.08594
>>> 5.32258606 1615.08020
>
>>> % gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z
>>> 2.44668889 1615.08301
>>> 1.99379802 1615.08301
> Method 1 is faster with OpenBLAS.
>
[Bug c++/81159] New warning idea: -Wself-move
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81159

Marek Polacek changed:

           What    |Removed                     |Added
           Keywords|                            |patch

--- Comment #8 from Marek Polacek ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599503.html
[Bug tree-optimization/98954] ((X << CST0) & CST1) == 0 is not optimized to 0 == (X & (CST1 >> CST0))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98954 --- Comment #6 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c Author: Roger Sayle Date: Tue Aug 9 18:54:43 2022 +0100 middle-end: Optimize ((X >> C1) & C2) != C3 for more cases. Following my middle-end patch for PR tree-optimization/94026, I'd promised Jeff Law that I'd clean up the dead-code in fold-const.cc now that these optimizations are handled in match.pd. Alas, I discovered things aren't quite that simple, as the transformations I'd added avoided cases where C2 overlapped with the new bits introduced by the shift, but the original code handled any value of C2 provided that it had a single-bit set (under the condition that C3 was always zero). This patch upgrades the transformations supported by match.pd to cover any values of C2 and C3, provided that C1 is a valid bit shift constant, for all three shift types (logical right, arithmetic right and left). This then makes the code in fold-const.cc fully redundant, and adds support for some new (corner) cases not previously handled. If the constant C1 is valid for the type's precision, the shift is now always eliminated (with C2 and C3 possibly updated to test the sign bit). Interestingly, the fold-const.cc code that I'm now deleting was originally added by me back in 2006 to resolve PR middle-end/21137. I've confirmed that those testcase(s) remain resolved with this patch (and I'll close 21137 in Bugzilla). This patch also implements most (but not all) of the examples mentioned in PR tree-optimization/98954, for which I have some follow-up patches. 2022-08-09 Roger Sayle Richard Biener gcc/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * fold-const.cc (fold_binary_loc): Remove optimizations to optimize ((X >> C1) & C2) ==/!= 0. 
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz check, and handle all values of INTEGER_CSTs @2 and @3. (cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz checks, and handle all values of INTEGER_CSTs @2 and @3. gcc/testsuite/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * gcc.dg/fold-eqandshift-4.c: New test case.
[Bug tree-optimization/21137] Convert (a >> 2) & 1 != 0 into a & 4 != 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21137 --- Comment #13 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c Author: Roger Sayle Date: Tue Aug 9 18:54:43 2022 +0100 middle-end: Optimize ((X >> C1) & C2) != C3 for more cases. Following my middle-end patch for PR tree-optimization/94026, I'd promised Jeff Law that I'd clean up the dead-code in fold-const.cc now that these optimizations are handled in match.pd. Alas, I discovered things aren't quite that simple, as the transformations I'd added avoided cases where C2 overlapped with the new bits introduced by the shift, but the original code handled any value of C2 provided that it had a single-bit set (under the condition that C3 was always zero). This patch upgrades the transformations supported by match.pd to cover any values of C2 and C3, provided that C1 is a valid bit shift constant, for all three shift types (logical right, arithmetic right and left). This then makes the code in fold-const.cc fully redundant, and adds support for some new (corner) cases not previously handled. If the constant C1 is valid for the type's precision, the shift is now always eliminated (with C2 and C3 possibly updated to test the sign bit). Interestingly, the fold-const.cc code that I'm now deleting was originally added by me back in 2006 to resolve PR middle-end/21137. I've confirmed that those testcase(s) remain resolved with this patch (and I'll close 21137 in Bugzilla). This patch also implements most (but not all) of the examples mentioned in PR tree-optimization/98954, for which I have some follow-up patches. 2022-08-09 Roger Sayle Richard Biener gcc/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * fold-const.cc (fold_binary_loc): Remove optimizations to optimize ((X >> C1) & C2) ==/!= 0. 
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz check, and handle all values of INTEGER_CSTs @2 and @3. (cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz checks, and handle all values of INTEGER_CSTs @2 and @3. gcc/testsuite/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * gcc.dg/fold-eqandshift-4.c: New test case.
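The simplest case covered by this commit is the one in the bug title: `((a >> 2) & 1) != 0` can be tested as `(a & 4) != 0`. The sketch below is not the match.pd pattern; it is an assumed, reduced model that handles only logical right shifts on 8-bit unsigned values and only constants C3 for which `C3 << C1` still fits in 8 bits. The commit itself also covers arithmetic right shifts, left shifts and the sign-bit cases, which are not modeled here. All function names are hypothetical.

```cpp
#include <cassert>

// Original form: shift first, then mask and compare.
bool shifted_form(unsigned x, unsigned c1, unsigned c2, unsigned c3) {
  return ((x >> c1) & c2) != c3;
}

// Shift-free form for 8-bit x: the shift is moved onto the constants.
// Precondition (checked by the caller): (c3 << c1) fits in 8 bits.
bool shift_free_form(unsigned x, unsigned c1, unsigned c2, unsigned c3) {
  const unsigned mask = (c2 << c1) & 0xFFu;  // bits shifted out never matter
  return (x & mask) != (c3 << c1);
}

// Exhaustively compare both forms over all 8-bit x, all shift counts 0..7,
// all masks, and a small range of comparison constants.
long count_mismatches() {
  long mismatches = 0;
  for (unsigned c1 = 0; c1 < 8; ++c1)
    for (unsigned c2 = 0; c2 < 256; ++c2)
      for (unsigned c3 = 0; c3 < 16; ++c3) {
        if ((c3 << c1) > 0xFFu)
          continue;  // outside the modeled precondition
        for (unsigned x = 0; x < 256; ++x)
          if (shifted_form(x, c1, c2, c3) != shift_free_form(x, c1, c2, c3))
            ++mismatches;
      }
  return mismatches;
}
```

Under these assumptions the two forms never disagree, so `count_mismatches()` returns 0. The argument is that `x & (c2 << c1)` equals `((x >> c1) & c2) << c1`, and shifting both sides of the comparison left by `c1` preserves (in)equality as long as no bits are lost.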
[Bug tree-optimization/94026] combine missed opportunity to simplify comparisons with zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026 --- Comment #14 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:6fc14f1963dfefead588a4cd8902d641ed69255c commit r13-2005-g6fc14f1963dfefead588a4cd8902d641ed69255c Author: Roger Sayle Date: Tue Aug 9 18:54:43 2022 +0100 middle-end: Optimize ((X >> C1) & C2) != C3 for more cases. Following my middle-end patch for PR tree-optimization/94026, I'd promised Jeff Law that I'd clean up the dead-code in fold-const.cc now that these optimizations are handled in match.pd. Alas, I discovered things aren't quite that simple, as the transformations I'd added avoided cases where C2 overlapped with the new bits introduced by the shift, but the original code handled any value of C2 provided that it had a single-bit set (under the condition that C3 was always zero). This patch upgrades the transformations supported by match.pd to cover any values of C2 and C3, provided that C1 is a valid bit shift constant, for all three shift types (logical right, arithmetic right and left). This then makes the code in fold-const.cc fully redundant, and adds support for some new (corner) cases not previously handled. If the constant C1 is valid for the type's precision, the shift is now always eliminated (with C2 and C3 possibly updated to test the sign bit). Interestingly, the fold-const.cc code that I'm now deleting was originally added by me back in 2006 to resolve PR middle-end/21137. I've confirmed that those testcase(s) remain resolved with this patch (and I'll close 21137 in Bugzilla). This patch also implements most (but not all) of the examples mentioned in PR tree-optimization/98954, for which I have some follow-up patches. 2022-08-09 Roger Sayle Richard Biener gcc/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * fold-const.cc (fold_binary_loc): Remove optimizations to optimize ((X >> C1) & C2) ==/!= 0. 
* match.pd (cmp (bit_and (lshift @0 @1) @2) @3): Remove wi::ctz check, and handle all values of INTEGER_CSTs @2 and @3. (cmp (bit_and (rshift @0 @1) @2) @3): Likewise, remove wi::clz checks, and handle all values of INTEGER_CSTs @2 and @3. gcc/testsuite/ChangeLog PR middle-end/21137 PR tree-optimization/98954 * gcc.dg/fold-eqandshift-4.c: New test case.
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 --- Comment #8 from Steve Kargl --- On Tue, Aug 09, 2022 at 05:51:51PM +, sgk at troutmask dot apl.washington.edu wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 > > --- Comment #6 from Steve Kargl --- > On Tue, Aug 09, 2022 at 05:14:16PM +, quanhua.liu at noaa dot gov wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 > > > > --- Comment #4 from Quanhua Liu --- > > Using > > gfortran -O3 -fexternal-blas -L/. -lblas testmatrixCal.f90 > > Which BLAS are you using? If you are using BLAS from > Netlib, then of course you'll likely get poor results > as the Netlib BLAS is not tuned. > Even netlib blas is ok. gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lblas -fdump-tree-optimized && ./z 1.41149306 1615.08020 1.50036991 1615.08020
[Bug c/106560] ICE after conflicting types of redeclaration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106560

--- Comment #7 from Andrew Pinski ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Pinski from comment #3)
> > Here is the simple fix, I will submit it this weekend.
> > [apinski@xeond2 gcc]$ git diff
> > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > index f0fbdb48012..d9ada8e0f9e 100644
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -6012,6 +6012,11 @@ gimplify_modify_expr (tree *expr_p, gimple_seq
> > *pre_p, gimple_seq *post_p,
> >    gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
> >               || TREE_CODE (*expr_p) == INIT_EXPR);
> >
> > +  if (TREE_TYPE (*from_p) == error_mark_node)
>
> if (error_operand_p (*from_p))

Oh OK. There were a few places which check directly against error_mark_node, gimplify_decl_expr and gimplify_save_expr for example. I will submit a patch to fix those too.
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 --- Comment #7 from Steve Kargl --- On Tue, Aug 09, 2022 at 05:17:57PM +, quanhua.liu at noaa dot gov wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 > > --- Comment #5 from Quanhua Liu --- > Hi Richard, > > Using -fexternal-blas for gfortran v10.3.0 is much slower than > the method 2: > BB = transpose(B) > C = matmul(A, BB) > > How about on your machine? > > > > > If you are doing a problem of this size or larger, you want to use the > > -fexternal-blas option and link in OpenBLAS. I wrote "and link in OpenBLAS". > > I added timing code and replicated the loop to both in one go. > > > > % gfcx -o z -O3 -march=native a.f90 && ./z > > 1.16500998 1615.08594 > > 5.32258606 1615.08020 > > % gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z > > 2.44668889 1615.08301 > > 1.99379802 1615.08301 Method 1 is faster with OpenBLAS.
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565

--- Comment #6 from Steve Kargl ---
On Tue, Aug 09, 2022 at 05:14:16PM +, quanhua.liu at noaa dot gov wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565
>
> --- Comment #4 from Quanhua Liu ---
> Using
> gfortran -O3 -fexternal-blas -L/. -lblas testmatrixCal.f90

Which BLAS are you using? If you are using BLAS from Netlib, then of course you'll likely get poor results, as the Netlib BLAS is not tuned. I specifically wrote to use OpenBLAS. OpenBLAS is likely tuned for whatever hardware you have.

% gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas \
    -fdump-tree-optimized && ./z
   2.44969702 1615.08301
   2.00995278 1615.08301

The use of matmul(..., transpose()) is the fastest on an AMD FX(tm)-8350.

% grep gemm z-a.f90.252t.optimized
sgemm (&"N"[1]{lb: 1 sz: 1}, &"N"[1]{lb: 1 sz: 1}, , , , , , , , , , , , 1, 1);
sgemm (&"N"[1]{lb: 1 sz: 1}, &"T"[1]{lb: 1 sz: 1}, , , , , , , , , , , , 1, 1);
[Bug c/106571] Implement -Wsection diag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571 --- Comment #3 from Andrew Pinski --- (In reply to Boris from comment #2) > How can you check a mismatch if only the definition has the section > attribute? You don't need to. > > Here's the kernel commit which fixes this for clang: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/ > ?id=db886979683a8360ced9b24ab1125ad0c4d2cf76 > > there's the same example in the commit message. Oh I see the section here has more semantics than the normal section attribute does. There should be an enhancement request for a new attribute which does more than the current section attribute instead.
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 --- Comment #5 from Quanhua Liu --- Hi Richard, Using -fexternal-blas for gfortran v10.3.0 is much slower than the method 2: BB = transpose(B) C = matmul(A, BB) How about on your machine? Thanks, Quanhua Liu On 8/9/2022 11:07 AM, kargl at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 > > kargl at gcc dot gnu.org changed: > > What|Removed |Added > > CC||kargl at gcc dot gnu.org > > --- Comment #3 from kargl at gcc dot gnu.org --- > >>INTEGER, PARAMETER :: m = 200, n = 300, nn = 150 >>REAL :: A(m,n), B(nn,n), C(m,nn), BB(n,nn) >>INTEGER :: i, j, k, L > > If you are doing a problem of this size or larger, you want to use the > -fexternal-blas option and link in OpenBLAS. > > I added timing code and replicated the loop to both in one go. > > % gfcx -o z -O3 -march=native a.f90 && ./z > 1.16500998 1615.08594 > 5.32258606 1615.08020 > % gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z > 2.44668889 1615.08301 > 1.99379802 1615.08301 >
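The two formulations being timed in this thread compute the same product, C = A * transpose(B); they differ only in whether the transpose is materialized first. The sketch below shows that equivalence in C++ with naive loops. It is an illustration only: the `Matrix` alias and function names are invented here, nothing is taken from gfortran's matmul implementation or from BLAS, and because these arrays are row-major while Fortran arrays are column-major, any timing of this code would not carry over to the numbers quoted in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // row-major: M[row][col]

// Method 1: C = A * transpose(B), reading B with swapped indices.
// Preconditions: A and B are non-empty and have the same number of columns.
Matrix matmul_transposed(const Matrix &A, const Matrix &B) {
  assert(!A.empty() && !B.empty() && A[0].size() == B[0].size());
  const std::size_t m = A.size(), n = A[0].size(), nn = B.size();
  Matrix C(m, std::vector<float>(nn, 0.0f));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < nn; ++j)
      for (std::size_t k = 0; k < n; ++k)
        C[i][j] += A[i][k] * B[j][k];
  return C;
}

// Materialize BB = transpose(B).
Matrix transpose(const Matrix &B) {
  assert(!B.empty());
  const std::size_t rows = B.size(), cols = B[0].size();
  Matrix BB(cols, std::vector<float>(rows, 0.0f));
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      BB[j][i] = B[i][j];
  return BB;
}

// Method 2: plain product C = A * BB on the already transposed matrix.
Matrix matmul(const Matrix &A, const Matrix &BB) {
  assert(!A.empty() && !BB.empty() && A[0].size() == BB.size());
  const std::size_t m = A.size(), n = BB.size(), nn = BB[0].size();
  Matrix C(m, std::vector<float>(nn, 0.0f));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < nn; ++j)
      for (std::size_t k = 0; k < n; ++k)
        C[i][j] += A[i][k] * BB[k][j];
  return C;
}

// Compare both methods on a tiny 2x3 by (4x3)^T example.
bool methods_agree() {
  const Matrix A = {{1.0f, 2.0f, 3.0f}, {4.0f, 5.0f, 6.0f}};
  const Matrix B = {{1.0f, 0.0f, 2.0f},
                    {0.0f, 1.0f, 1.0f},
                    {2.0f, 2.0f, 0.0f},
                    {1.0f, 1.0f, 1.0f}};
  return matmul_transposed(A, B) == matmul(A, transpose(B));
}
```

Both methods accumulate the same products in the same order, so the results compare equal element by element; only the memory access pattern differs, which is what the timings in the thread are about.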
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 --- Comment #4 from Quanhua Liu --- Using gfortran -O3 -fexternal-blas -L/. -lblas testmatrixCal.f90 time a.out 1 real: 6.14 (s) time a.out 2 real: 5.41 It is 6 times slower than BB = transpose(B) C = matmul(A, BB) ifort doesn't have the problem.
[Bug c/106571] Implement -Wsection diag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571 --- Comment #2 from Boris --- How can you check a mismatch if only the definition has the section attribute? Here's the kernel commit which fixes this for clang: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=db886979683a8360ced9b24ab1125ad0c4d2cf76 there's the same example in the commit message.
[Bug c/106571] Implement -Wsection diag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2022-08-09 Ever confirmed|0 |1 Severity|normal |enhancement Status|UNCONFIRMED |WAITING Component|other |c --- Comment #1 from Andrew Pinski --- This example does not seem to be correct, as the section attribute is not needed on the declaration, only on the definition. If this is the example, I think the warning is incorrect and should not be implemented in GCC.
[Bug ipa/105360] Inlined lazy parameters / delegate literals, still emitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #5 from Andrew Pinski --- Dup of bug 89139. *** This bug has been marked as a duplicate of bug 89139 ***
[Bug ipa/89139] GCC emits code for static functions that aren't used by the optimized code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89139 Andrew Pinski changed: What|Removed |Added CC||witold.baryluk+gcc at gmail dot co ||m --- Comment #8 from Andrew Pinski --- *** Bug 105360 has been marked as a duplicate of this bug. ***
[Bug ipa/105360] Inlined lazy parameters / delegate literals, still emitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360 Andrew Pinski changed: What|Removed |Added Assignee|ibuclaw at gdcproject dot org |unassigned at gcc dot gnu.org CC||marxin at gcc dot gnu.org See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=94818 Severity|normal |enhancement Component|d |ipa --- Comment #4 from Andrew Pinski --- Or PR 94818.
[Bug d/105360] Inlined lazy parameters / delegate literals, still emitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360 Iain Buclaw changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=80680, ||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=99373 --- Comment #3 from Iain Buclaw --- Possibly a duplicate of pr80680 or pr99373.
[Bug d/105360] Inlined lazy parameters / delegate literals, still emitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105360 --- Comment #2 from Iain Buclaw --- Looks like it's a middle-end missed-optimization, not a D front-end one. https://godbolt.org/z/5WWYEG4jW Perhaps we need an extra DCE pass?
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #36 from Andrew Pinski --- You might need to do -O2 -fPIE -pie to reproduce the issue as debian is configured with --enable-default-pie
[Bug c++/106572] A programmatic list of all possible compiler warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572 --- Comment #6 from Andrew Pinski --- (In reply to Andrew Pinski from comment #5) > >which blows up the command line for the compilation. > > You can use a response file and that won't blow up the command line at all. > > That is: > g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' | tr '\n' ' ' > > cxxflags.opt > > g++ @cxxflags.opt Oh, and you don't need the tr either; any whitespace in a response file is treated as a separator. So just: g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' > cxxflags.opt
[Bug c++/106572] A programmatic list of all possible compiler warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572 --- Comment #5 from Andrew Pinski --- >which blows up the command line for the compilation. You can use a response file and that won't blow up the command line at all. That is: g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' | tr '\n' ' ' > cxxflags.opt g++ @cxxflags.opt
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 kargl at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P4
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 kargl at gcc dot gnu.org changed: What|Removed |Added CC||kargl at gcc dot gnu.org --- Comment #3 from kargl at gcc dot gnu.org --- > INTEGER, PARAMETER :: m = 200, n = 300, nn = 150 > REAL :: A(m,n), B(nn,n), C(m,nn), BB(n,nn) > INTEGER :: i, j, k, L If you are doing a problem of this size or larger, you want to use the -fexternal-blas option and link in OpenBLAS. I added timing code and replicated the loop to both in one go. % gfcx -o z -O3 -march=native a.f90 && ./z 1.16500998 1615.08594 5.32258606 1615.08020 % gfcx -o z -O3 -march=native a.f90 -fexternal-blas -lopenblas && ./z 2.44668889 1615.08301 1.99379802 1615.08301
[Bug c++/106572] A programmatic list of all possible compiler warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572 --- Comment #4 from Jayesh Badwaik --- I don't think any of the previous bug reports address the requirements that this bug report does. This is not about production runs, this is about development workflow. Unless the position is that users should not use any warnings apart from `-Wall -Wextra` ever, the user has to look at what warnings the compiler offers. The current method is a very manual method where I have to browse through the whole GCC page and get the list of warnings and then manually put them into my command line to see if any of the code in my repository triggers those warnings. It will save everyone's time and effort if there was a switch to do that. It is therefore, actually very useful.
[Bug c++/106572] A programmatic list of all possible compiler warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572 --- Comment #3 from Jayesh Badwaik --- I don't think any of the previous bug reports address the requirements that this bug report does. This is not about production runs, this is about development workflow. Unless the position is that users should not use any warnings apart from `-Wall -Wextra` ever, the user has to look at what warnings the compiler offers. The current method is a very manual method where I have to browse through the whole GCC page and get the list of warnings and then manually put them into my command line to see if any of the code in my repository triggers those warnings. It will save everyone's time and effort if there was a switch to do that.
[Bug target/106554] -fstack-usage result too low for variadic function on Arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106554 Eric Botcazou changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2022-08-09 --- Comment #2 from Eric Botcazou --- > IIRC there's target support code eventually missing for some targets. The > -fstack-usage documentation isn't clear how exact the result is supposed to > be. It must always be conservatively correct, so it's certainly a bug.
[Bug c++/106572] A programmatic list of all possible compiler warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=31573 --- Comment #2 from Andrew Pinski --- -Weverything is useless and was decided years ago gcc was not going to add it. See PR 31573.
[Bug c++/106572] A programmatic list of all possible compiler warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572 Marek Polacek changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org --- Comment #1 from Marek Polacek --- Probably a dup of bug 31573.
[Bug c++/106572] New: A programmatic list of all possible compiler warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106572 Bug ID: 106572 Summary: A programmatic list of all possible compiler warnings Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: j.badw...@fz-juelich.de Target Milestone: --- It would be an excellent workflow to run your code with `-Weverything` once in a while just to check which new warnings are triggered by your code. Then, depending on whether they are useful or not, one can incorporate those warnings in their normal workflow. The alternative is to go through the release notes of every GCC release each time and then try to see if there is a warning which interests you. While this is doable, it requires manual effort, with a possibility that you find no warning which is useful. Also, depending on the code, it might not be possible to play with all compiler warnings as soon as the compiler is released, since you might want to wait for your code to be able to compile with the compiler before you go there. All of this makes for a very clumsy workflow with a lot of manual reminders about what needs to be done. A `-Weverything` flag would allow someone to schedule, say, a monthly CI job which automatically runs the build with `-Weverything -Werror`. Any new compilers added and any new warning which affects the current code will automatically be detected. The user can then make a decision on whether the warning makes enough sense for them to be used in their production runs. Currently, the way to get a list of all warnings is very cumbersome. One has to do: > g++ -Q --help=warnings | tail -n +2 | awk '{print $1}' | tr '\n' ' ' which blows up the command line for the compilation. The request would be to provide either a `-Weverything` flag like clang does or a `g++ --list-every-warning` to list all warnings in a format which can then be passed to the compiler.
[Bug modula2/106443] Many 32-bit tests FAIL to link on Solaris/sparcv9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106443 --- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #1 from Gaius Mulley --- > I've pushed a fix to devel/modula2 to fix multilib install (seen on amd64). > It > now builds and installs multilib. Prior to this fix the 32 bit libraries were > installed over the 64 bit libraries when multilib was enabled. Curious as to > whether this fixes the linking bugs on Solaris. It did indeed: I've tried both sparcv9-sun-solaris2.11 and i386-pc-solaris2.11 builds and the results are fine (roughly 15 to 20 failures per multilib on both sparc and x86). However, I still needed the gcc.cc patch from https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598822.html to allow the 32-bit-default build to succeed. Thanks. Rainer
[Bug tree-optimization/106457] array_at_struct_end_p returns TRUE for a two-dimension array which is not inside any structure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106457 --- Comment #9 from qinzhao at gcc dot gnu.org --- one more testing case failed with the current array_at_struct_end_p is:gcc/testsuite/gcc.dg/torture/pr50067-2.c: 1 /* { dg-do run } */ 2 3 /* Make sure data-dependence analysis does not compute a bogus 4 distance vector for the different sized accesses. */ 5 6 extern int memcmp(const void *, const void *, __SIZE_TYPE__); 7 extern void abort (void); 8 short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; 9 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, }; 10 int main() 11 { 12 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ 13 int i; 14 if (sizeof (short) == 2) 15 { 16 for (i = 0; i < 32; ++i) 17 { 18 a[i] = (*((char(*)[32])[0]))[i+8]; 19 } 20 if (memcmp (, , sizeof (a)) != 0) 21 abort (); 22 } 23 #endif 24 return 0; 25 } In the above, at line 18: (*((char(*)[32])[0]))[i+8] was identified as TRUE: Breakpoint 1, array_at_struct_end_p (ref=0xf57a2b18) at ../../latest_gcc/gcc/tree.cc:12690 12690 if (TREE_CODE (ref) == ARRAY_REF (gdb) call debug_tree(ref) unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf57d03f0 precision:8 min max pointer_to_this > arg:0 BLK size unit-size align:8 warn_if_not_align:0 symtab:0 alias-set 0 canonical-type 0xf59950b8 domain pointer_to_this > arg:0 constant arg:0 > arg:1 /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:12 start: /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:11 finish: /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:33> arg:1 unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf57d05e8 precision:32 min max pointer_to_this > visited def_stmt _1 = i_5 + 8; version:1> /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:34 start: 
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:11 finish: /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-2.c:18:38> ... (gdb) n 12801 return true;
[Bug tree-optimization/106457] array_at_struct_end_p returns TRUE for a two-dimension array which is not inside any structure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106457 --- Comment #8 from qinzhao at gcc dot gnu.org --- another testing case failed with the current array_at_struct_end_p is: gcc/testsuite/gcc.dg/torture/pr50067-1.c: 1 /* { dg-do run } */ 2 3 /* Make sure data-dependence analysis does not compute a bogus 4distance vector for the different sized accesses. */ 5 6 extern int memcmp(const void *, const void *, __SIZE_TYPE__); 7 extern void abort (void); 8 short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; 9 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, }; 10 int main() 11 { 12 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ 13 int i; 14 if (sizeof (short) == 2) 15 { 16 for (i = 0; i < 32; ++i) 17 (*((unsigned short(*)[32])[0]))[i] = (*((char(*)[32])[0]))[i+8]; 18 if (memcmp (, , sizeof (a)) != 0) 19 abort (); 20 } 21 #endif 22 return 0; 23 } In the above, the array ref at line 17: (*((char(*)[32])[0]))[i+8] was identified as TRUE by the current array_at_struct_end_p: 12690 if (TREE_CODE (ref) == ARRAY_REF (gdb) call debug_tree(ref) unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf57d03f0 precision:8 min max pointer_to_this > arg:0 BLK size unit-size align:8 warn_if_not_align:0 symtab:0 alias-set 0 canonical-type 0xf5994d70 domain pointer_to_this > arg:0 constant arg:0 > arg:1 /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:42 start: /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:41 finish: /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:63> arg:1 unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0xf57d05e8 precision:32 min max pointer_to_this > visited def_stmt _1 = i_13 + 8; version:1 ptr-info 0xf59fee20> /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:64 start: 
/home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:41 finish: /home/opc/Work/GCC/latest_gcc/gcc/testsuite/gcc.dg/torture/pr50067-1.c:17:68> (gdb) n 12801 return true; (gdb)
[Bug c/106569] enhancement: use STL algorithm instead of a raw loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569 --- Comment #2 from David Binderman --- (In reply to Richard Biener from comment #1) > I find those less obvious, for example does std::any_of guarantee some > evaluation order? I also find any_of less obvious, but that's because my working knowledge of C++ stopped about 20 years ago. According to https://cplusplus.com/reference/algorithm/any_of/ there is no guarantee of evaluation order. My best guess is that if gcc trunk is written in some recent version of C++, then all of that recent version can be used.
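For readers following the any_of question, here is a minimal sketch of the kind of rewrite being discussed. It is not taken from the GCC sources; the function names `has_negative_loop` and `has_negative_any_of` and the "is negative" predicate are made up for illustration. The C++ standard bounds how many times std::any_of may apply the predicate (at most last - first), and the evaluation-order concern raised above only matters when the predicate has side effects, so the sketch deliberately uses a side-effect-free predicate, for which both forms return the same answer.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Raw loop: returns true as soon as a negative element is found.
// (Hypothetical example, not code from GCC.)
bool has_negative_loop(const std::vector<int>& v) {
    for (int x : v) {
        if (x < 0)
            return true;
    }
    return false;
}

// The same check written with std::any_of. The predicate has no side
// effects, so the result does not depend on the order in which the
// elements are examined.
bool has_negative_any_of(const std::vector<int>& v) {
    return std::any_of(v.begin(), v.end(), [](int x) { return x < 0; });
}
```

Both functions return false for an empty vector. Whether one form reads better than the other is the judgment call being debated in this PR.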
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #35 from Mathieu Malaterre --- (In reply to Mathieu Malaterre from comment #33) > (In reply to Kewen Lin from comment #32) > > (In reply to Mathieu Malaterre from comment #30) > > > (In reply to Martin Liška from comment #29) > > > > (In reply to Kewen Lin from comment #28) > > > > > Sorry for the breakage, I'll have a look tomorrow. > > > > > > > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well? > > > > > > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets. > > > > > > I could see unit-test failures of highway on most 32bits arch, as well as > > > mips64el and ppc64be. > > > > Thanks to both guys! I'll try with ppc64 32bit first. > > Watch out that I've reduced the original test case on my local x86/32bits > arch. > > It appears that I've lifted way too much code to reproduce the issue on > ppc32/be. Is is ok for you to use instead, reproducer from previous comment: > > * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16 Nevermind; I was using gcc-11. I can reproduce the issue on ppc32/be using the (somewhat) reduced example: * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c19 For reference: % g++ -O2 -fno-tree-vectorize *.cc && ./a.out && echo "ok" ok But: % g++ --verbose -O2 *.cc && ./a.out && echo "ok" Using built-in specs. 
COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc-linux-gnu/12/lto-wrapper Target: powerpc-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 12.1.0-7' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=powerpc-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --with-libphobos-druntime-only=yes --enable-objc-gc=auto --enable-secureplt --disable-softfloat --with-cpu=default32 --disable-softfloat --enable-targets=powerpc-linux,powerpc64-linux --enable-multiarch --disable-werror --with-long-double-128 --enable-multilib --enable-checking=release --build=powerpc-linux-gnu --host=powerpc-linux-gnu --target=powerpc-linux-gnu Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.1.0 (Debian 12.1.0-7) COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-' /usr/lib/gcc/powerpc-linux-gnu/12/cc1plus -quiet -v -imultiarch powerpc-linux-gnu -D_GNU_SOURCE bytes.cc -msecure-plt -quiet -dumpdir a- -dumpbase bytes.cc -dumpbase-ext .cc -O2 -version -o /tmp/ccXa9nGd.s GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu) compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 ignoring duplicate directory "/usr/include/powerpc-linux-gnu/c++/12" ignoring nonexistent directory "/usr/local/include/powerpc-linux-gnu" 
ignoring nonexistent directory "/usr/lib/gcc/powerpc-linux-gnu/12/include-fixed" ignoring nonexistent directory "/usr/lib/gcc/powerpc-linux-gnu/12/../../../../powerpc-linux-gnu/include" #include "..." search starts here: #include <...> search starts here: /usr/include/c++/12 /usr/include/powerpc-linux-gnu/c++/12 /usr/include/c++/12/backward /usr/lib/gcc/powerpc-linux-gnu/12/include /usr/local/include /usr/include/powerpc-linux-gnu /usr/include End of search list. GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu) compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 Compiler executable checksum: 56cdbc606649bdc6108da73e5dd1af6f COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-' as -v -a32 -K PIC -mppc -many -mbig -o /tmp/ccKx6rlb.o /tmp/ccXa9nGd.s GNU assembler version 2.38.90 (powerpc-linux-gnu) using BFD version (GNU Binutils for Debian) 2.38.90.20220713 COLLECT_GCC_OPTIONS='-v' '-O2' '-shared-libgcc' '-dumpdir' 'a-' /usr/lib/gcc/powerpc-linux-gnu/12/cc1plus -quiet -v -imultiarch powerpc-linux-gnu -D_GNU_SOURCE demo.cc -msecure-plt -quiet -dumpdir a- -dumpbase demo.cc -dumpbase-ext .cc -O2 -version -o /tmp/ccXa9nGd.s GNU C++17 (Debian 12.1.0-7) version 12.1.0 (powerpc-linux-gnu) compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 ignoring duplicate
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 --- Comment #2 from Quanhua Liu ---
I modified the application code (see below) and use "method" as a control variable from the command line. I use the same code for both gfortran 10.3.0 and ifort 19.0.5.281:

gfortran -O3 matrixCal.f90
time a.out 1
time a.out 2
ifort -O3 matrixCal.f90
time a.out 1
time a.out 2

where
method 1: C = matmul(A, transpose(B))
method 2: BB = transpose(B), C = matmul(A, BB)

The timing is given in the table below. As you can see, using gfortran, method '2' is 6 times faster than method '1'. Using ifort, method '2' is very similar to method '1'. '1' is slightly faster because '2' may copy B to BB.

Timing
compiler   gfortran        ifort
method     1       2       1       2
real (s)   6.28    0.79    0.80    0.83
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #34 from Mathieu Malaterre --- (In reply to Mathieu Malaterre from comment #33) > (In reply to Kewen Lin from comment #32) > > (In reply to Mathieu Malaterre from comment #30) > > > (In reply to Martin Liška from comment #29) > > > > (In reply to Kewen Lin from comment #28) > > > > > Sorry for the breakage, I'll have a look tomorrow. > > > > > > > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well? > > > > > > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets. > > > > > > I could see unit-test failures of highway on most 32bits arch, as well as > > > mips64el and ppc64be. > > > > Thanks to both guys! I'll try with ppc64 32bit first. > > Watch out that I've reduced the original test case on my local x86/32bits > arch. > > It appears that I've lifted way too much code to reproduce the issue on > ppc32/be. Is is ok for you to use instead, reproducer from previous comment: > > * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16 It appears this one is also way too much lifted for proper repro on ppc32/be.
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #33 from Mathieu Malaterre --- (In reply to Kewen Lin from comment #32) > (In reply to Mathieu Malaterre from comment #30) > > (In reply to Martin Liška from comment #29) > > > (In reply to Kewen Lin from comment #28) > > > > Sorry for the breakage, I'll have a look tomorrow. > > > > > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well? > > > > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets. > > > > I could see unit-test failures of highway on most 32bits arch, as well as > > mips64el and ppc64be. > > Thanks to both guys! I'll try with ppc64 32bit first. Watch out that I've reduced the original test case on my local x86/32bits arch. It appears that I've lifted way too much code to reproduce the issue on ppc32/be. Is is ok for you to use instead, reproducer from previous comment: * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322#c16
[Bug c/106569] enhancement: use STL algorithm instead of a raw loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569 --- Comment #1 from Richard Biener --- I find those less obvious, for example does std::any_of guarantee some evaluation order?
[Bug tree-optimization/106570] [12/13 Regression] DCE sometimes fails with depending if statements since r12-2305-g398572c1544d8b75
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570 Richard Biener changed: What|Removed |Added Target Milestone|--- |12.2
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Target Milestone|--- |12.2 Keywords||wrong-code
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #32 from Kewen Lin --- (In reply to Mathieu Malaterre from comment #30) > (In reply to Martin Liška from comment #29) > > (In reply to Kewen Lin from comment #28) > > > Sorry for the breakage, I'll have a look tomorrow. > > > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well? > > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets. > > I could see unit-test failures of highway on most 32bits arch, as well as > mips64el and ppc64be. Thanks to both guys! I'll try with ppc64 32bit first.
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #31 from Mathieu Malaterre --- (In reply to Mathieu Malaterre from comment #30) > (In reply to Martin Liška from comment #29) > > (In reply to Kewen Lin from comment #28) > > > Sorry for the breakage, I'll have a look tomorrow. > > > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well? > > > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets. > > I could see unit-test failures of highway on most 32bits arch, as well as > mips64el and ppc64be. For reference complete list is: * armel * i386 * mips64el * mipsel * powerpc * ppc64 See: * https://buildd.debian.org/status/logs.php?pkg=highway=1.0.1%7Egit20220802.5810c58-3=experimental (riscv64 is unrelated IMHO).
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #30 from Mathieu Malaterre --- (In reply to Martin Liška from comment #29) > (In reply to Kewen Lin from comment #28) > > Sorry for the breakage, I'll have a look tomorrow. > > > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well? > > No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets. I could see unit-test failures of highway on most 32bits arch, as well as mips64el and ppc64be.
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #29 from Martin Liška --- (In reply to Kewen Lin from comment #28) > Sorry for the breakage, I'll have a look tomorrow. > > btw, is it able to reproduce the issue on ppc64 (or ppc64le) as well? No for gcc112 machine (ppc64le). Seems to be related to 32-bit targets.
[Bug tree-optimization/106570] [12/13 Regression] DCE sometimes fails with depending if statements since r12-2305-g398572c1544d8b75
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570 --- Comment #2 from Andrew Macleod --- I think this is a duplicate of PR106379. At the VRP2 stage I see: [local count: 1073741824]: if (c_6(D) == s_7(D)) goto ; [34.00%] else goto ; [66.00%] [local count: 365072224]: _1 = ~c_6(D); _2 = _1 & s_7(D); if (_2 != 0) goto ; [75.00%] else goto ; [25.00%] [local count: 628138969]: DCEMarker0_ (); [local count: 1073741824]: return; Which is basically the identical sequence; it just took longer to get to it :-) We aren't removing this yet with ranger, as I still need to integrate ranger's relation oracle with the simplifier so that it will see that _2 = ~s_7 & s_7.
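To make the observation above concrete, here is a rough C++ sketch of the source shape that could lead to such a dump. This is an assumption reconstructed from the GIMPLE fragment, not the reporter's test case: the `unsigned` parameter types, the function name `f`, and the counting body of `DCEMarker0_` are all hypothetical. The point is that on the `c == s` path, `~c & s` is the same value as `~s & s`, which is always zero, so the marker call is dead.

```cpp
#include <cassert>

// Hypothetical stand-in for the DCEMarker0_ call seen in the dump; it only
// counts how often it is reached so the claim can be checked at run time.
int marker_calls = 0;
void DCEMarker0_() { ++marker_calls; }

// Assumed source shape (types and name are guesses, not from the PR).
void f(unsigned c, unsigned s) {
    if (c == s) {
        // Here c == s, so ~c & s == ~s & s == 0 for every input.
        unsigned t = ~c & s;
        if (t != 0)
            DCEMarker0_();  // unreachable in practice
    }
}
```

Calling `f` with any pair of arguments leaves `marker_calls` at zero. Proving that at compile time requires knowing the relation c == s while simplifying `~c & s`, which appears to be the missing piece described in the comment.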
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 Kewen Lin changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org --- Comment #28 from Kewen Lin --- Sorry for the breakage, I'll have a look tomorrow. By the way, is it possible to reproduce the issue on ppc64 (or ppc64le) as well?
[Bug tree-optimization/106570] [12/13 Regression] DCE sometimes fails with depending if statements since r12-2305-g398572c1544d8b75
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570 Martin Liška changed: What|Removed |Added Last reconfirmed||2022-08-09 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Summary|DCE sometimes fails with|[12/13 Regression] DCE |depending if statements |sometimes fails with ||depending if statements ||since ||r12-2305-g398572c1544d8b75 CC||aldyh at gcc dot gnu.org, ||amacleod at redhat dot com, ||marxin at gcc dot gnu.org --- Comment #1 from Martin Liška --- Started with r12-2305-g398572c1544d8b75.
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #27 from Martin Liška --- Crashes also w/ -fno-strict-aliasing.
[Bug tree-optimization/106322] [12/13 Regression] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working) since r12-2404-ga1d27560770818c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 Martin Liška changed: What|Removed |Added CC||linkw at gcc dot gnu.org Summary|tree-vectorize: Wrong code |[12/13 Regression] |at O2 level |tree-vectorize: Wrong code |(-fno-tree-vectorize is |at O2 level |working)|(-fno-tree-vectorize is ||working) since ||r12-2404-ga1d27560770818c5 Status|WAITING |NEW --- Comment #26 from Martin Liška --- Cool! I can reproduce it now with: $ g++ *.cc -O3 -m32 -mtune=generic -march=i686 && ./a.out Aborted (core dumped) and it started with r12-2404-ga1d27560770818c5.
[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #25 from Mathieu Malaterre --- (In reply to Martin Liška from comment #24) > > sid64 % g++ *.cc -O2 -m32 && ./a.out > > Please provide output with --verbose. % g++ --verbose *.cc -O2 -m32 && ./a.out Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 12.1.0-7' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-aYRw0H/gcc-12-12.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-aYRw0H/gcc-12-12.1.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.1.0 (Debian 12.1.0-7) COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686' '-dumpdir' 'a-' /usr/lib/gcc/x86_64-linux-gnu/12/cc1plus -quiet -v -imultilib 32 -imultiarch i386-linux-gnu -D_GNU_SOURCE bytes.cc -quiet -dumpdir 
a- -dumpbase bytes.cc -dumpbase-ext .cc -m32 -mtune=generic -march=i686 -O2 -version -fasynchronous-unwind-tables -o /tmp/cccQJh1u.s GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu) compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/i386-linux-gnu/c++/12" ignoring nonexistent directory "/usr/local/include/i386-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed" ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" ignoring nonexistent directory "/usr/include/i386-linux-gnu" #include "..." search starts here: #include <...> search starts here: /usr/include/c++/12 /usr/include/x86_64-linux-gnu/c++/12/32 /usr/include/c++/12/backward /usr/lib/gcc/x86_64-linux-gnu/12/include /usr/local/include /usr/include End of search list. 
GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu) compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 Compiler executable checksum: 8a56007e6299a53b3d2bb12e46ecf480 COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686' '-dumpdir' 'a-' as -v --32 -o /tmp/ccG1Wx1X.o /tmp/cccQJh1u.s GNU assembler version 2.38.90 (x86_64-linux-gnu) using BFD version (GNU Binutils for Debian) 2.38.90.20220713 COLLECT_GCC_OPTIONS='-v' '-O2' '-m32' '-shared-libgcc' '-mtune=generic' '-march=i686' '-dumpdir' 'a-' /usr/lib/gcc/x86_64-linux-gnu/12/cc1plus -quiet -v -imultilib 32 -imultiarch i386-linux-gnu -D_GNU_SOURCE demo.cc -quiet -dumpdir a- -dumpbase demo.cc -dumpbase-ext .cc -m32 -mtune=generic -march=i686 -O2 -version -fasynchronous-unwind-tables -o /tmp/cccQJh1u.s GNU C++17 (Debian 12.1.0-7) version 12.1.0 (x86_64-linux-gnu) compiled by GNU C version 12.1.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.25-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/i386-linux-gnu/c++/12" ignoring nonexistent directory "/usr/local/include/i386-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/include-fixed" ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include" ignoring nonexistent directory "/usr/include/i386-linux-gnu" #include "..." search starts here: #include <...> search starts here: /usr/include/c++/12 /usr/include/x86_64-linux-gnu/c++/12/32 /usr/include/c++/12/backward /usr/lib/gcc/x86_64-linux-gnu/12/include /usr/local/include /usr/include End of search list. GNU C++17 (Debian 12.1.0-7)
[Bug target/106524] [12/13 Regression] ICE in extract_insn, at recog.cc:2791 (error: unrecognizable insn) since r12-4349-ge36206c9940d22.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106524
Martin Liška changed:
What            |Removed                                                                                |Added
Summary         |[12/13 Regression] ICE in extract_insn, at recog.cc:2791 (error: unrecognizable insn)  |[12/13 Regression] ICE in extract_insn, at recog.cc:2791 (error: unrecognizable insn) since r12-4349-ge36206c9940d22.
Ever confirmed  |0                                                                                      |1
CC              |                                                                                       |marxin at gcc dot gnu.org, tnfchris at gcc dot gnu.org
Status          |UNCONFIRMED                                                                            |NEW
Last reconfirmed|                                                                                       |2022-08-09
--- Comment #1 from Martin Liška --- Started with r12-4349-ge36206c9940d22.
[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #24 from Martin Liška --- > sid64 % g++ *.cc -O2 -m32 && ./a.out Please provide output with --verbose.
[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #23 from Mathieu Malaterre --- Never mind; I can reproduce the issue with a sid/amd64 chroot: stable64 % schroot -c sid64 sid64 % g++ --version g++ (Debian 12.1.0-7) 12.1.0 Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. sid64 % g++ *.cc -O2 -m32 && ./a.out zsh: IOT instruction ./a.out I'll report this against the Debian bug tracker for now.
[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #22 from Uroš Bizjak --- (In reply to Martin Liška from comment #20) > Hmm, can't reproduce with x86_64 compiler with -m32: > > $ g++ --version > g++ (SUSE Linux) 12.1.1 20220721 [revision > 4f15d2234608e82159d030dadb17af678cfad626 > ... > $ g++ *.cc -O2 -m32 && ./a.out && echo Ok > Ok Do you need -msse2 to actually enable vectorization?
[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #21 from Mathieu Malaterre --- (In reply to Martin Liška from comment #20) > Hmm, can't reproduce with x86_64 compiler with -m32: > > $ g++ --version > g++ (SUSE Linux) 12.1.1 20220721 [revision > 4f15d2234608e82159d030dadb17af678cfad626 > ... > $ g++ *.cc -O2 -m32 && ./a.out && echo Ok > Ok I can confirm that behavior here as well. However, my x86 binary produces the expected 'abort' from my multi-arch amd64. There is no point in attaching *.o here, right? A quick check seems to indicate that the issue is: schroot-32 $ g++ -O2 -c -o demo.o demo.cc schroot-32 $ amd64 $ g++ -O2 -m32 -c -o bytes.o bytes.cc amd64 $ g++ -O2 -m32 -o demo demo.o bytes.o amd64 $ ./demo zsh: abort ./demo
[Bug d/106563] [12/13 Regression] d: undefined reference to pragma(inline) symbol
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106563 Iain Buclaw changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from Iain Buclaw --- Fix committed.
[Bug d/106563] [12/13 Regression] d: undefined reference to pragma(inline) symbol
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106563 --- Comment #3 from CVS Commits --- The releases/gcc-12 branch has been updated by Iain Buclaw : https://gcc.gnu.org/g:79a86a608691621659b3ce3a24a72aeea4054668 commit r12-8673-g79a86a608691621659b3ce3a24a72aeea4054668 Author: Iain Buclaw Date: Tue Aug 9 12:48:14 2022 +0200 d: Fix undefined reference to pragma(inline) symbol (PR106563) Functions that are declared `pragma(inline)' should be treated as if they are defined in every translation unit they are referenced from, regardless of visibility protection. Ensure they always get DECL_ONE_ONLY linkage, and start emitting them into other modules that import them. PR d/106563 gcc/d/ChangeLog: * decl.cc (DeclVisitor::visit (FuncDeclaration *)): Set semanticRun before generating its symbol. (function_defined_in_root_p): New function. (function_needs_inline_definition_p): New function. (maybe_build_decl_tree): New function. (get_symbol_decl): Call maybe_build_decl_tree before returning symbol. (start_function): Use function_defined_in_root_p instead of inline test for locally defined symbols. (set_linkage_for_decl): Check for inline functions before private or protected symbols. gcc/testsuite/ChangeLog: * gdc.dg/torture/torture.exp (srcdir): New proc. * gdc.dg/torture/imports/pr106563math.d: New test. * gdc.dg/torture/imports/pr106563regex.d: New test. * gdc.dg/torture/imports/pr106563uni.d: New test. * gdc.dg/torture/pr106563.d: New test. (cherry picked from commit 04284176d549ff2565406406a6d53ab4ba8e507d)
[Bug d/106563] [12/13 Regression] d: undefined reference to pragma(inline) symbol
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106563 --- Comment #2 from CVS Commits --- The master branch has been updated by Iain Buclaw : https://gcc.gnu.org/g:04284176d549ff2565406406a6d53ab4ba8e507d commit r13-2002-g04284176d549ff2565406406a6d53ab4ba8e507d Author: Iain Buclaw Date: Tue Aug 9 12:48:14 2022 +0200 d: Fix undefined reference to pragma(inline) symbol (PR106563) Functions that are declared `pragma(inline)' should be treated as if they are defined in every translation unit they are referenced from, regardless of visibility protection. Ensure they always get DECL_ONE_ONLY linkage, and start emitting them into other modules that import them. PR d/106563 gcc/d/ChangeLog: * decl.cc (DeclVisitor::visit (FuncDeclaration *)): Set semanticRun before generating its symbol. (function_defined_in_root_p): New function. (function_needs_inline_definition_p): New function. (maybe_build_decl_tree): New function. (get_symbol_decl): Call maybe_build_decl_tree before returning symbol. (start_function): Use function_defined_in_root_p instead of inline test for locally defined symbols. (set_linkage_for_decl): Check for inline functions before private or protected symbols. gcc/testsuite/ChangeLog: * gdc.dg/torture/torture.exp (srcdir): New proc. * gdc.dg/torture/imports/pr106563math.d: New test. * gdc.dg/torture/imports/pr106563regex.d: New test. * gdc.dg/torture/imports/pr106563uni.d: New test. * gdc.dg/torture/pr106563.d: New test.
[Bug tree-optimization/106523] [10/11/12/13 Regression] forwprop miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106523 Martin Liška changed: What|Removed |Added CC||marxin at gcc dot gnu.org --- Comment #2 from Martin Liška --- Started with 4.9.0.
[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #20 from Martin Liška --- Hmm, can't reproduce with x86_64 compiler with -m32: $ g++ --version g++ (SUSE Linux) 12.1.1 20220721 [revision 4f15d2234608e82159d030dadb17af678cfad626 ... $ g++ *.cc -O2 -m32 && ./a.out && echo Ok Ok
[Bug sanitizer/106558] ASan failed to detect a global-buffer-overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106558 Martin Liška changed: What|Removed |Added Last reconfirmed||2022-08-09 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #2 from Martin Liška --- Might be related to PR 82501.
[Bug preprocessor/106426] UTF-8 character literals do not have unsigned type in the preprocessor in -fchar8_t mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106426 --- Comment #3 from Tom Honermann --- I believe this issue can be resolved as fixed via commit 053876cdbe8057210e6f4da4eec2df58f92ccd4c for the gcc 13 release.
[Bug other/106571] New: Implement -Wsection diag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106571 Bug ID: 106571 Summary: Implement -Wsection diag Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: bp at alien8 dot de Target Milestone: --- Hi, clang has this -Wsection diag which does: https://clang.llvm.org/docs/DiagnosticsReference.html#wsection It would be good to have it in gcc too so that declarations like extern u64 x86_spec_ctrl_current; for variable definitions which belong to a specific section: __attribute__((section(".data..percpu" ""))) __typeof__(u64) x86_spec_ctrl_current; get caught: arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection] Thx.
[Bug tree-optimization/106570] New: DCE sometimes fails with depending if statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106570 Bug ID: 106570 Summary: DCE sometimes fails with depending if statements Product: gcc Version: 12.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: tmayerl at student dot ethz.ch Target Milestone: --- Sometimes, DCE fails when multiple if statements are used. For example, GCC detects that the following if statements always evaluate to false and thus removes the dead code: #include #include void DCEMarker0_(); void f(bool s, bool c) { if (!c == !s) { if (s && !c) { DCEMarker0_(); } } } In the next snippet, the if statements are used to set a variable. This variable is then used in the next if statement. However, GCC now fails to detect and eliminate the dead code: #include #include void DCEMarker0_(); void f(bool s, bool c) { int intermediate_result = 0; if (!c == !s) { if (s && !c) { intermediate_result = 1; } } if (((!c == !s) && (s && !c)) || intermediate_result) { DCEMarker0_(); } } This is actually a regression: It works fine until GCC 11.3. This can also be seen via the following Compiler Explorer link: https://godbolt.org/z/n9dKMfqsd
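To make the claim in the report above concrete, here is a minimal sketch that checks all four (s, c) combinations and confirms that the guard around DCEMarker0_() in the second snippet can never be true. The function names marker_reachable, count_reachable_inputs and check_dead are made up for this illustration and are not part of the original test case; the marker call itself is replaced by a boolean return value.

```cpp
#include <cassert>

// Mirrors the logic of the second snippet in the report. Returns true if
// the call to DCEMarker0_() would be reached for the inputs (s, c).
// (Hypothetical helper, introduced only for this illustration.)
bool marker_reachable(bool s, bool c) {
    int intermediate_result = 0;
    if (!c == !s) {
        if (s && !c) {
            intermediate_result = 1;
        }
    }
    return ((!c == !s) && (s && !c)) || intermediate_result;
}

// Counts how many of the four possible (s, c) inputs reach the marker.
int count_reachable_inputs() {
    int count = 0;
    for (int s = 0; s <= 1; ++s)
        for (int c = 0; c <= 1; ++c)
            if (marker_reachable(s != 0, c != 0))
                ++count;
    return count;
}

// The marker is dead code exactly when no input reaches it.
void check_dead() {
    assert(count_reachable_inputs() == 0);
}
```

The inner condition requires s to be true and c to be false, while the outer one requires s and c to be equal, so the two can never hold together. That is the fact the optimizer would need to derive for the second snippet as well.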
[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514 --- Comment #7 from Richard Biener --- For the testcase m_imports is so big because we have ... [local count: 1073741824]: # c_1198 = PHI _599 = MEM[(unsigned int *)b_1201(D) + 2792B]; d_2401 = _599 + d_2399; if (d_2399 > d_2401) goto ; [50.00%] else goto ; [50.00%] [local count: 536870913]: c_2402 = c_1198 + 1; [local count: 1073741824]: # c_1199 = PHI _600 = MEM[(unsigned int *)b_1201(D) + 2796B]; d_2403 = _600 + d_2401; if (d_2401 > d_2403) goto ; [50.00%] else goto ; [50.00%] so when back_threader::find_paths does ->compute_imports (.., bb 1200) we walk up the whole d_2403 definition chain unbound (for PHIs we restrict to edges on the path which is empty). I realize that there's no good way to pick up extra imports on the fly cheaply - we could handle it when we prune local defs from the imports at which point we could add operands but it's not clear to me that will be a good trade-off. In fact pruning imports looks suspicious as the final path-range query will be limited there? Likewise for any import we add via PHI-translation we fail to add local def operands - we're only getting those from the initial import compute which basically picks those from blocks dominating the exit but no others. I will experiment with re-wiring this.
[Bug target/103498] Spec 2017 imagick_r is 2.62% slower on Power10 with pc-relative addressing compared to not using pc-relative addressing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103498 --- Comment #2 from Segher Boessenkool --- Mike, do you still see this?
[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514 --- Comment #6 from Richard Biener ---
So one now needs to bump the limit to 60 to get enough samples for perf. Then we now see

Samples: 55K of event 'cycles:u', Event count (approx.): 49013411833
Overhead  Samples  Command  Shared Object  Symbol
  51.19%    28195  cc1      cc1            [.] path_range_query::compute_ranges_in_block
  11.67%     6427  cc1      cc1            [.] path_range_query::adjust_for_non_null_uses
   9.20%     5069  cc1      cc1            [.] path_range_query::range_defined_in_block
   3.39%     1869  cc1      cc1            [.] bitmap_set_bit
   1.95%     1072  cc1      cc1            [.] back_threader::find_paths_to_names
   1.93%     1066  cc1      cc1            [.] bitmap_bit_p

the compute_ranges_in_block is also top with 30 but adjust_for_non_null_uses pops up newly with 60. The compute_ranges_in_block slowness is attributed to

  // ...and then the rest of the imports.
  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
    {
      tree name = ssa_name (i);
      Value_Range r (TREE_TYPE (name));
      if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
          && range_defined_in_block (r, name, bb))

plus

  gori_compute &g = m_ranger->gori ();
  bitmap exports = g.exports (bb);
  EXECUTE_IF_AND_IN_BITMAP (m_imports, exports, 0, i, bi)
    {
      tree name = ssa_name (i);
      Value_Range r (TREE_TYPE (name));
      if (g.outgoing_edge_range_p (r, e, name, *this))

for this testcase there seem to be a lot of imports but not many exports so range_defined_in_block is called very many times compared to outgoing_edge_range_p but the latter is comparatively more expensive.

For the path query I wonder why we are interested in computing (aka updating the cache) for any but the exports? When we compute the exports, why is the cache not lazily computed just for the interesting names? AFAICS we invalidate all local defs (but even then, why? we get to see a def exactly once, why do we have to even think about clearing something we should not have seen?)

That is, in path_range_query::compute_ranges

  while (1)
    {
      basic_block bb = curr_bb ();
      compute_ranges_in_block (bb);
      adjust_for_non_null_uses (bb);
      if (at_exit ())
        break;
      move_next ();
    }

I'd expect only a small portion of the actual compute_ranges_in_block work to be done for all blocks and the real resolving work only for the block ending the path? Maybe the backwards threader is just using the wrong (expensive) API here? It does

  m_solver->compute_ranges (path, m_imports);
  m_solver->range_of_stmt (r, cond);

--

Btw, I wondered if path-range-query can handle parts of the path being a "black box", aka, skip to the immediate dominator instead of one of the predecessor edges? I _think_ analysis wise this would be quite straightforward but of course we'd have to represent this somehow in the path. Maybe it works by simply leaving out the intermediate blocks? Thus,

  B |\ A / \ C D \ / E \

the path would be from B to E but we don't care whether we go the C or D way, and when duplicating the path we'd simply duplicate the whole diamond instead of duplicating only one branch, say A->D, and keeping the edge A->C to the original block C, defeating the threading of E to its successor if we happen to go that way.
[Bug tree-optimization/106514] [12/13 Regression] ranger slowness in path query
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106514 --- Comment #5 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:409978d58dafa689c5b3f85013e2786526160f2c commit r13-1998-g409978d58dafa689c5b3f85013e2786526160f2c Author: Richard Biener Date: Mon Aug 8 12:20:04 2022 +0200 tree-optimization/106514 - add --param max-jump-thread-paths The following adds a limit for the exponential greedy search of the backwards jump threader. The idea is to limit the search space in a way that the paths considered are the same if the search were in BFS order rather than DFS. In particular it stops considering incoming edges into a block if the product of the in-degrees of blocks on the path exceeds the specified limit. When considering the low stmt copying limit of 7 (or 1 in the size optimize case) this means the degenerate case with maximum search space is a sequence of conditions with no actual code B1 |\ | empty |/ B2 |\ ... Bn |\ GIMPLE_CONDs are costed 2, an equivalent GIMPLE_SWITCH already 4, so we reach 7 already with 3 middle conditions (B1 and Bn do not count). The search space would be 2^4 == 16 to reach this. The FSM threads historically allowed for a thread length of 10 but is really looking for a single multiway branch threaded across the backedge. I've chosen the default of the new parameter to 64 which effectively limits the outdegree of the switch statement (the cases reaching the backedge) to that number (divided by 2 until I add some special pruning for FSM threads due to the loop header indegree). The testcase ssa-dom-thread-7.c requires 56 at the moment (as said, some special FSM thread pruning of considered edges would bring it down to half of that), but we now get one more threading and quite some more in later threadfull. This testcase seems to be difficult to check for expected transforms. The new testcases add the degenerate case we currently thread (without deciding whether that's a good idea ...) 
plus one with an appropriate limit that should prevent the threading. This obsoletes the mentioned --param max-fsm-thread-length but I am not removing it as part of this patch. When the search space is limited the thread stmt size limit effectively provides max-fsm-thread-length. The param with its default does not help PR106514 enough to unleash path searching with the higher FSM stmt count limit. PR tree-optimization/106514 * params.opt (max-jump-thread-paths): New. * doc/invoke.texi (max-jump-thread-paths): Document. * tree-ssa-threadbackward.cc (back_threader::find_paths_to_names): Honor max-jump-thread-paths, take overall_path argument. (back_threader::find_paths): Pass 1 as initial overall_path. * gcc.dg/tree-ssa/ssa-thread-16.c: New testcase. * gcc.dg/tree-ssa/ssa-thread-17.c: Likewise. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
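As an illustration of the pruning rule described in the commit message above, the following is a toy model, not the code from tree-ssa-threadbackward.cc. It walks a small predecessor graph backwards and stops following incoming edges once the product of the in-degrees along the path exceeds a limit. The types and names (Cfg, count_paths, make_condition_chain) are assumptions made for this sketch, and the graph is assumed to be acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG representation: preds[b] lists the predecessors of block b.
using Cfg = std::vector<std::vector<int>>;

// Walks backwards from `block` and counts the paths explored. `overall_paths`
// is the product of the in-degrees of the blocks already on the path. Once
// multiplying in this block's in-degree exceeds `limit`, its incoming edges
// are no longer followed and the path ends here.
std::size_t count_paths(const Cfg &preds, int block,
                        std::size_t overall_paths, std::size_t limit) {
    const std::size_t in_degree = preds[block].size();
    if (in_degree == 0)
        return 1;  // reached the entry block
    const std::size_t product = overall_paths * in_degree;
    if (product > limit)
        return 1;  // pruned: do not consider incoming edges
    std::size_t paths = 0;
    for (int pred : preds[block])
        paths += count_paths(preds, pred, product, limit);
    return paths;
}

// Builds the degenerate shape from the commit message: n conditions in a
// row, each with one empty block on the side. Block 2*i is condition B_i,
// block 2*i - 1 is the empty block between B_(i-1) and B_i.
Cfg make_condition_chain(int n) {
    assert(n >= 0);
    Cfg preds(2 * n + 1);
    for (int i = 1; i <= n; ++i) {
        const int prev = 2 * (i - 1);
        const int empty = 2 * i - 1;
        const int cur = 2 * i;
        preds[empty] = {prev};
        preds[cur] = {prev, empty};
    }
    return preds;
}
```

With ten chained conditions, walking back from block 20 explores 1024 paths when the limit is large enough never to trigger, and 64 paths with a limit of 64, which is the default the commit message gives for max-jump-thread-paths. In this model the explored path count is capped by the limit rather than growing exponentially with the chain length.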
[Bug c++/106567] [13 Regression] An array with a dependent type and initializer-deduced bound is treated as an array of unknown bound when captured in a lambda
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106567 m.cencora at gmail dot com changed: What|Removed |Added CC||m.cencora at gmail dot com --- Comment #4 from m.cencora at gmail dot com --- Seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93259
[Bug rtl-optimization/106568] -freorder-blocks-algorithm appears to causes a crash in stable code, no way to disable it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106568 --- Comment #21 from Richard Biener --- Try -fsanitize=unreachable - when reordering BBs makes crashes appear/disappear the most likely culprit is we run into a path deemed unreachable which means we fall through to random code. You can also try looking at the -fdump-tree-optimized dump and find the function that's not catching what it is supposed to catch to see if there's any __builtin_unreachable () calls around.
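For readers unfamiliar with the failure mode described above, here is a minimal sketch of code containing a path the compiler is told is unreachable. The enum Kind and the function weight are hypothetical and unrelated to the reporter's program. If such a function were ever called with a value outside the handled cases, an optimized build may fall through into whatever code follows, whereas a build with -fsanitize=unreachable reports a runtime error at that point instead.

```cpp
#include <cassert>

// Hypothetical example type and function, for illustration only.
enum class Kind { A, B };

int weight(Kind k) {
    switch (k) {
    case Kind::A:
        return 1;
    case Kind::B:
        return 2;
    }
    // The compiler may assume control never gets here. Reaching this point
    // (for example via a corrupted Kind value) is undefined behavior.
    __builtin_unreachable();
}

// Only valid enumerators are used here, so the unreachable path is not hit.
void check_valid_inputs() {
    assert(weight(Kind::A) == 1);
    assert(weight(Kind::B) == 2);
}
```

Calls like this one are what to look for in the -fdump-tree-optimized output when block reordering makes a crash appear or disappear.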
[Bug tree-optimization/106322] tree-vectorize: Wrong code at O2 level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322 --- Comment #19 from Mathieu Malaterre --- Without hwy dependency: % more Makefile bytes.cc demo.cc :: Makefile :: CXXFLAGS := -O2 demo: demo.o bytes.o $(CXX) $(CXXFLAGS) -o $@ $^ clean: rm -f bytes.o demo.o :: bytes.cc :: #include bool BytesEqual(const void *bytes1, const void *bytes2, const size_t size) { return memcmp(bytes1, bytes2, size) == 0; } :: demo.cc :: #include #include #include #include #include #include #define HWY_ALIGNMENT 64 constexpr size_t kAlignment = HWY_ALIGNMENT; constexpr size_t kAlias = kAlignment * 4; bool BytesEqual(const void *p1, const void *p2, const size_t size); namespace hwy { namespace N_EMU128 { template struct Vec128 { T raw[16 / sizeof(T)] = {}; }; } // namespace N_EMU128 } // namespace hwy template static void Store(const hwy::N_EMU128::Vec128 v, T *__restrict__ aligned) { __builtin_memcpy(aligned, v.raw, sizeof(T) * N); } template static hwy::N_EMU128::Vec128 Load(const T *__restrict__ aligned) { hwy::N_EMU128::Vec128 v; __builtin_memcpy(v.raw, aligned, sizeof(T) * N); return v; } template static hwy::N_EMU128::Vec128 MulHigh(hwy::N_EMU128::Vec128 a, const hwy::N_EMU128::Vec128 b) { for (size_t i = 0; i < N; ++i) { // Cast to uint32_t first to prevent overflow. Otherwise the result of // uint16_t * uint16_t is in "int" which may overflow. In practice the // result is the same but this way it is also defined. 
a.raw[i] = static_cast( (static_cast(a.raw[i]) * static_cast(b.raw[i])) >> 16); } return a; } #define HWY_ASSERT(condition) assert((condition)) #define HWY_ASSUME_ALIGNED(ptr, align) __builtin_assume_aligned((ptr), (align)) #pragma pack(push, 1) struct AllocationHeader { void *allocated; size_t payload_size; }; #pragma pack(pop) static void FreeAlignedBytes(const void *aligned_pointer) { HWY_ASSERT(aligned_pointer != nullptr); if (aligned_pointer == nullptr) return; const uintptr_t payload = reinterpret_cast(aligned_pointer); HWY_ASSERT(payload % kAlignment == 0); const AllocationHeader *header = reinterpret_cast(payload) - 1; free(header->allocated); } class AlignedFreer { public: template void operator()(T *aligned_pointer) const { FreeAlignedBytes(aligned_pointer); } }; template using AlignedFreeUniquePtr = std::unique_ptr; static inline constexpr size_t ShiftCount(size_t n) { return (n <= 1) ? 0 : 1 + ShiftCount(n / 2); } namespace { static size_t NextAlignedOffset() { static std::atomic next{0}; constexpr uint32_t kGroups = kAlias / kAlignment; const uint32_t group = next.fetch_add(1, std::memory_order_relaxed) % kGroups; const size_t offset = kAlignment * group; HWY_ASSERT((offset % kAlignment == 0) && offset <= kAlias); // std::cerr << "O: " << offset << std::endl; return offset; } } // namespace static void *AllocateAlignedBytes(const size_t payload_size) { HWY_ASSERT(payload_size != 0); // likely a bug in caller if (payload_size >= std::numeric_limits::max() / 2) { HWY_ASSERT(false && "payload_size too large"); return nullptr; } size_t offset = NextAlignedOffset(); // What: | misalign | unused | AllocationHeader |payload // Size: |<= kAlias | offset|payload_size // ^allocated.^aligned.^header^payload // The header must immediately precede payload, which must remain aligned. // To avoid wasting space, the header resides at the end of `unused`, // which therefore cannot be empty (offset == 0). 
if (offset == 0) { offset = kAlignment; // = RoundUpTo(sizeof(AllocationHeader), kAlignment) static_assert(sizeof(AllocationHeader) <= kAlignment, "Else: round up"); } const size_t allocated_size = kAlias + offset + payload_size; void *allocated = malloc(allocated_size); HWY_ASSERT(allocated != nullptr); if (allocated == nullptr) return nullptr; // Always round up even if already aligned - we already asked for kAlias // extra bytes and there's no way to give them back. uintptr_t aligned = reinterpret_cast(allocated) + kAlias; static_assert((kAlias & (kAlias - 1)) == 0, "kAlias must be a power of 2"); static_assert(kAlias >= kAlignment, "Cannot align to more than kAlias"); aligned &= ~(kAlias - 1); const uintptr_t payload = aligned + offset; // still aligned // Stash `allocated` and payload_size inside header for FreeAlignedBytes(). // The allocated_size can be reconstructed from the payload_size. AllocationHeader *header = reinterpret_cast(payload) - 1; header->allocated = allocated; header->payload_size = payload_size; //printf("%d-byte aligned addr: %p\n", kAlignment, reinterpret_cast(payload)); return HWY_ASSUME_ALIGNED(reinterpret_cast(payload), kAlignment); } template static T *AllocateAlignedItems(size_t items) { constexpr size_t size =
[Bug fortran/106565] Using a transposed matrix in matmul (GCC-10.3.0) is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106565 Richard Biener changed: What|Removed |Added Last reconfirmed||2022-08-09 Known to fail||12.1.0 Status|UNCONFIRMED |NEW Keywords||missed-optimization Ever confirmed|0 |1 Version|unknown |10.3.0 --- Comment #1 from Richard Biener --- Confirmed also with gfortran 12. The issue is that with the combined matmul+transpose we invoke matmul with an array descriptor representing the transpose operation which results in suboptimal memory access patterns. Can you check whether ifort does the transpose separately or whether its matmul library routine simply special-cases the situation?
[Bug analyzer/106551] [13 Regression] dup2 causes -fanalyzer ICE in valid_to_unchecked_state, at analyzer/sm-fd.cc:751
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106551 --- Comment #2 from Immad Mir --- Sergei Trofimovich: Thanks for bringing the issue to our attention. Dave: I've sent a patch via gcc-patches.
[Bug c/106569] New: enhancement: use STL algorithm instead of a raw loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106569 Bug ID: 106569 Summary: enhancement: use STL algorithm instead of a raw loop Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: dcb314 at hotmail dot com Target Milestone: --- Static analyser cppcheck can produce these style messages for gcc trunk source code: $ fgrep useStlAlgorithm cppcheck.20220809.out trunk.git/gcc/analyzer/call-string.cc:169:9: style: Consider using std::count_if algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/analyzer/constraint-manager.cc:2454:0: style: Consider using std::find_if algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/analyzer/region-model-manager.cc:1230:0: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/analyzer/region.cc:1245:0: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/cp/constexpr.cc:348:0: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/cp/constexpr.cc:5965:8: style: Consider using std::find_if algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/cp/constexpr.cc:8991:0: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/rtl-ssa/change-utils.h:28:0: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/rtl-ssa/blocks.cc:347:0: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/rtl-ssa/accesses.cc:1507:7: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/gcc/rtl-ssa/member-fns.inl:854:0: style: Consider using std::any_of algorithm instead of a raw loop. [useStlAlgorithm] trunk.git/libsanitizer/hwasan/hwasan_thread_list.h:120:20: style: Consider using std::find_if algorithm instead of a raw loop. 
[useStlAlgorithm] trunk.git/libsanitizer/hwasan/hwasan_report.cpp:293:0: style: Consider using std::find_if algorithm instead of a raw loop. [useStlAlgorithm] $ None, some or all of these might be worth fixing. I suspect it would not be worthwhile to implement this style warning in gcc.
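For context, the kind of rewrite these cppcheck messages suggest looks roughly like the sketch below. The two functions are invented for illustration and do not correspond to any of the GCC source locations listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A raw loop of the shape that cppcheck flags with [useStlAlgorithm].
bool has_negative_raw(const std::vector<int> &values) {
    for (int v : values) {
        if (v < 0)
            return true;
    }
    return false;
}

// The same predicate expressed with std::any_of.
bool has_negative_stl(const std::vector<int> &values) {
    return std::any_of(values.begin(), values.end(),
                       [](int v) { return v < 0; });
}

// Both forms are expected to agree on every input.
void check_equivalence(const std::vector<int> &values) {
    assert(has_negative_raw(values) == has_negative_stl(values));
}
```

Both versions stop at the first match, so the change is a matter of style rather than behavior, which fits the reporter's own assessment that only some of these may be worth fixing.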