[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975 --- Comment #5 from Andrew Pinski --- Note C and C++ differ here. C says it becomes undefined only if the return value is used, while in C++ it is undefined at the point of return.
[Bug target/98977] [x86] Failure to optimize consecutive sub flags usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98977

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug lto/96591] [8/9/10/11 Regression] ICE with -flto=auto and -O1: tree code ‘typename_type’ is not supported in LTO streams
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96591

Jason Merrill changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |lto
           Assignee|jason at gcc dot gnu.org    |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

--- Comment #5 from Jason Merrill ---
(In reply to Arseny Solokha from comment #4)
> Is it somehow related to PR83997? Maybe even a duplicate?

No, not a duplicate. Reduced a bit more:

struct builtin_simd
{
  using type [[gnu::vector_size(sizeof(scalar_t) * length)]] = scalar_t;
};

struct simd_traits
{
  using scalar_type = int;

  template
  using rebind = typename X::type;
};

template
constexpr simd_t fill(typename simd_traits::scalar_type const scalar)
{
  return simd_t{scalar};
}

using score_type = typename builtin_simd::type;

// Uncommenting this makes it work:
// const simd_traits::scalar_type n = 8;

score_type data[1]{fill(8)};

The difference from uncommenting that line seems to be that then free_lang_data_in_type is called for simd_traits::scalar_type. So the problem seems to be that find_decls_types isn't finding scalar_type in the vector in the array.

So changing component to LTO and unassigning myself. Feel free to change it back if it seems appropriate.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

--- Comment #3 from Jim Wilson ---
I suppose cost model problems could explain why combine didn't do the optimization. I didn't have a chance to look at that. I still think there is a fundamental problem with how we represent SImode operations, but again cost model problems could explain why my experiments to fix that didn't work as expected. I probably didn't look at that when I was experimenting with riscv.md changes.

Your patch does look useful, but setting cost to 1 for MULT is wrong, and would be just as wrong for DIV. That is OK for PLUS, MINUS, and NEG though. I think a better option is to set *total = 0 and return false. That gives no extra cost to the sign extend, and recurs to get the proper cost for the operation underneath. That would work for MUL and DIV. I found code in the rs6000 port that does this.
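
To make the difference between the two hook behaviours concrete, here is a toy model of a cost hook. It is only a sketch: `Expr`, `Policy`, `toy_costs`, `total_cost` and the cost numbers are all hypothetical and are not GCC's data structures or RISC-V's real costs. The only thing it borrows from the discussion is the contract that returning true makes `*total` final, while returning false lets the caller add the operand costs.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy stand-ins; none of these names are GCC's real data structures.
enum class Code { Reg, Plus, Mult, SignExtend };

struct Expr {
    Code code;
    std::unique_ptr<Expr> op0, op1;  // operands; null when unused
};

constexpr int kInsn = 4;  // plays the role of COSTS_N_INSNS (1)

enum class Policy { FixedOneInsn, ZeroAndRecurse };

// Contract modelled here: return true if *total is the complete cost of the
// expression, false if the caller should still add the operand costs.
bool toy_costs(const Expr& x, Policy policy, int* total) {
    switch (x.code) {
    case Code::Reg:
        *total = 0;
        return true;
    case Code::Plus:
        *total = kInsn;
        return false;
    case Code::Mult:
        *total = 4 * kInsn;  // assumed: a multiply is slower than an add
        return false;
    case Code::SignExtend:
        if (x.op0 && x.op0->code != Code::Reg) {
            if (policy == Policy::FixedOneInsn) {
                *total = kInsn;  // flat cost, operand is never looked at
                return true;
            }
            *total = 0;          // the extension itself is free ...
            return false;        // ... and the operand is costed normally
        }
        *total = kInsn;          // a standalone sign extension
        return false;
    }
    return false;
}

// Caller side: sum operand costs unless the hook said the cost is final.
int total_cost(const Expr& x, Policy policy) {
    int total = 0;
    if (toy_costs(x, policy, &total)) return total;
    if (x.op0) total += total_cost(*x.op0, policy);
    if (x.op1) total += total_cost(*x.op1, policy);
    return total;
}

std::unique_ptr<Expr> reg() {
    return std::unique_ptr<Expr>(new Expr{Code::Reg, nullptr, nullptr});
}

std::unique_ptr<Expr> binary(Code c, std::unique_ptr<Expr> a, std::unique_ptr<Expr> b) {
    return std::unique_ptr<Expr>(new Expr{c, std::move(a), std::move(b)});
}

std::unique_ptr<Expr> sext(std::unique_ptr<Expr> a) {
    return std::unique_ptr<Expr>(new Expr{Code::SignExtend, std::move(a), nullptr});
}
```

In this model a sign-extended multiply is priced like a single instruction under `FixedOneInsn`, but keeps the multiply's own cost under `ZeroAndRecurse`; a sign-extended add costs the same either way.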
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

Kito Cheng changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org

--- Comment #2 from Kito Cheng ---
Here is a quick patch that fixes part of this issue. The cause seems to be that our cost model is imprecise, but I still need to run benchmarks to make sure performance and code size do not regress.

find_max_i32:
        lui     a4,%hi(.LANCHOR0)
        addi    a4,a4,%lo(.LANCHOR0)
        addi    a3,a4,1024
        addi    a6,a4,400
        li      a0,0
.L3:
        lw      a5,0(a4)
        lw      a2,0(a3)
        addi    a4,a4,4
        addi    a3,a3,4
        addw    a1,a5,a2
        addw    a5,a5,a2
        bge     a1,a0,.L2
        mv      a5,a0
.L2:
        sext.w  a0,a5
        bne     a4,a6,.L3
        ret

Patch:

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index d489717b2a5..b8c9f7200ce 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -1879,6 +1879,15 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
        }
       /* Fall through.  */
     case SIGN_EXTEND:
+      if (TARGET_64BIT && !REG_P (XEXP (x, 0)))
+       {
+         int code = GET_CODE (XEXP (x, 0));
+         if (code == PLUS || code == MINUS || code == NEG || code == MULT)
+           {
+             *total = COSTS_N_INSNS (1);
+             return true;
+           }
+       }
       *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
       return false;
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

Jim Wilson changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson ---
The extra move instruction is a side effect of how the riscv64 toolchain handles 32-bit arithmetic. We lie to the compiler and tell it that we have instructions that produce 32-bit results. In fact, we only have instructions that produce 64-bit sign-extended 32-bit results. The lie means that the RTL has some insns with SImode output and some instructions with DImode outputs, and sometimes we end up with nop moves to convert between the modes.

In this case, it is peephole2 after regalloc that notices a SImode add followed by a sign-extend, and converts it to a sign-extending 32-bit add followed by a move, but can't eliminate the move because we already did register allocation. This same problem is also why we get the unnecessary sext after the label, as peephole can't fix that.

This problem has been on my todo list for a few years, and I have ideas of how to fix it, but I have no idea when I will have time to try to fix it. I did document it for the RISC-V International Code Speed Optimization task group.

https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/gcc-optimizations.adoc

This one is the first one in the list.
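
As a rough illustration of "64-bit sign-extended 32-bit results", the following scalar model treats every register as a 64-bit value. The function names `addw` and `sext_w` simply mirror the instruction mnemonics; this is a sketch of the arithmetic, not of how GCC represents it. The narrowing casts to `std::int32_t` are modular on GCC and in C++20, which this sketch assumes.

```cpp
#include <cassert>
#include <cstdint>

// addw: add the low 32 bits of both registers, then sign-extend the 32-bit
// sum to fill the 64-bit destination register.
std::int64_t addw(std::int64_t a, std::int64_t b) {
    const std::uint32_t sum =
        static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
    return static_cast<std::int64_t>(static_cast<std::int32_t>(sum));
}

// sext.w: sign-extend the low 32 bits of a register.
std::int64_t sext_w(std::int64_t a) {
    return static_cast<std::int64_t>(
        static_cast<std::int32_t>(static_cast<std::uint32_t>(a)));
}

// The result of addw is already in sign-extended form, so applying sext.w
// to it changes nothing.
bool sext_after_addw_is_redundant(std::int64_t a, std::int64_t b) {
    assert(sext_w(addw(a, b)) == addw(a, b));
    return true;
}
```

In this model a `sext.w` applied directly to an `addw` result never changes the value, which is why such an extension is redundant whenever the compiler can see that its input came from an `addw`.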
[Bug target/98981] New: gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

            Bug ID: 98981
           Summary: gcc-10.2 for RISC-V has extraneous register moves
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: brian.grayson at sifive dot com
  Target Milestone: ---

gcc is inserting an unnecessary register-register move for a simple max-style operation:

#include <stdint.h>

int a[256], b[256];

int32_t find_max_i32() {
  int32_t xme = 0, sc = 0;
  for (int32_t i = 0; i < 100; i++) {
    if ((sc = a[i] + b[i]) > xme) xme = sc;
  }
  return xme;
}

This is from the SPECint2006 benchmark HMMER, in P7Viterbi(), hence the variable names sc and xme from the original source.

Under these flags: -march=rv64imafdc -mcmodel=medany -mabi=lp64d -O3

I get this disassembly for the loop:

.L5:
        lw      a5,0(a4)
        lw      a2,0(a3)
        addi    a4,a4,4
        addi    a3,a3,4
        addw    a2,a5,a2
        mv      a5,a2     <--- unnecessary move
        bge     a2,a0,.L4
        mv      a5,a0
.L4:
        sext.w  a0,a5
        bne     a4,a1,.L5

If the addw targets a5, and the bge compares a5 to a0, the mv could be removed. In fact, if the variable types are changed to int64_t, that's exactly what happens:

.L13:
        ld      a5,0(a4)
        ld      a2,0(a3)
        addi    a4,a4,8
        addi    a3,a3,8
        add     a5,a5,a2
        bgeu    a0,a5,.L12
        mv      a0,a5
.L12:
        bne     a4,a1,.L13
[Bug c++/82952] Hang compiling with g++ -fsanitize=undefined -Wduplicated-branches
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82952

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sshannin at gmail dot com

--- Comment #8 from Marek Polacek ---
*** Bug 98980 has been marked as a duplicate of this bug. ***
[Bug c++/98980] Very slow compilation with -Wduplicated-branches and ubsan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98980

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
                 CC|                            |mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek ---
Dup.

*** This bug has been marked as a duplicate of bug 82952 ***
[Bug c++/98980] New: Very slow compilation with -Wduplicated-branches and ubsan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98980

            Bug ID: 98980
           Summary: Very slow compilation with -Wduplicated-branches and ubsan
           Product: gcc
           Version: 9.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sshannin at gmail dot com
  Target Milestone: ---

Created attachment 50136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50136&action=edit
the code

I have attached a heavily reduced example which encounters excessively slow compilation. I was not able to remove the stringstream include without the degenerate behavior disappearing, unfortunately; I hope that's ok.

It seems to be mostly exponential in the number of operator<< invocations, with a couple interesting behaviors
- s/long/int/g on the code allows it to compile almost instantly
- removing any of the longs being streamed seems to halve the time, but so does replacing the variable with a literal
- removing any of the string literals also halves the time.

Compiled with flags:
/toolchain14/bin/g++ -std=c++2a -Wduplicated-branches -c -fsanitize=undefined -o dup.o dup.cpp

/toolchain14/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/toolchain14/bin/g++
COLLECT_LTO_WRAPPER=/toolchain14/libexec/gcc/x86_64-pc-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc_9_1_0/configure --prefix=/toolchain14 --enable-languages=c,c++,fortran --enable-lto --disable-plugin --program-suffix=-9.1.0 --disable-multilib
Thread model: posix
gcc version 9.1.0 (GCC)
[Bug fortran/98979] New: [11 regression] ICE in several tests cases after r11-7112
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98979

            Bug ID: 98979
           Summary: [11 regression] ICE in several tests cases after r11-7112
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:9a4d32f85ccebc0ee4b24e6d9d7a4f11c04d7146, r11-7112

previous run: g:f743fe231663e32d52db987650d0ec3381a777af, r11-7111: 75 failures
this run: g:9a4d32f85ccebc0ee4b24e6d9d7a4f11c04d7146, r11-7112: 89 failures

FAIL: gfortran.dg/goacc/array-with-dt-2.f90 -O (internal compiler error)
FAIL: gfortran.dg/goacc/array-with-dt-2.f90 -O (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (test for excess errors)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (internal compiler error)
FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors)

/home/seurer/gcc/git/gcc-test/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90:8:34: internal compiler error: Segmentation fault
0x10c21d1b crash_signal
        /home/seurer/gcc/git/gcc-test/gcc/toplev.c:327
0x10404f18 gfc_conv_scalarized_array_ref
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-array.c:3570
0x10406913 gfc_conv_array_ref(gfc_se*, gfc_array_ref*, gfc_expr*, locus*)
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-array.c:3721
0x1045ad07 gfc_conv_variable
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-expr.c:2998
0x10453f6b gfc_conv_expr(gfc_se*, gfc_expr*)
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-expr.c:8886
0x104614bb gfc_conv_expr_reference(gfc_se*, gfc_expr*, bool)
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-expr.c:8986
0x104a01d7 gfc_trans_omp_array_section
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:2157
0x104aaf1f gfc_trans_omp_clauses
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:3151
0x104bace7 gfc_trans_oacc_executable_directive
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:3984
0x104bace7 gfc_trans_oacc_directive(gfc_code*)
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-openmp.c:6124
0x103fab57 trans_code
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans.c:2216
0x1043d48f gfc_generate_function_code(gfc_namespace*)
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans-decl.c:6880
0x103fb56b gfc_generate_code(gfc_namespace*)
        /home/seurer/gcc/git/gcc-test/gcc/fortran/trans.c:2272
0x1037df87 translate_all_program_units
        /home/seurer/gcc/git/gcc-test/gcc/fortran/parse.c:6351
0x1037df87 gfc_parse_file()
        /home/seurer/gcc/git/gcc-test/gcc/fortran/parse.c:6620
0x103efa1f gfc_be_parse_file
        /home/seurer/gcc/git/gcc-test/gcc/fortran/f95-lang.c:212
[Bug c++/95888] [9/10/11 Regression] Regression in 9.3. GCC freezes when compiling code using boost::poly_collection::segment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95888 --- Comment #5 from Marek Polacek --- Simplified test: template class A { A(int, int); template friend class A; friend T; }; template struct B { template struct C { A begin() { return {1, 0}; } }; template C fn(); }; int main () { B b; b.fn().begin(); }
[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

--- Comment #3 from andysem at mail dot ru ---
(In reply to Jonathan Wakely from comment #1)
> This would be an ABI break, and so not going to happen.

Is there no way to improve standard components implementation? I'd imagine you could provide the new implementation in the new version inline namespace and still support the old ABI for backward compatibility.

(In reply to Jonathan Wakely from comment #2)
> If we were going to do this, we could also make std::optional occupy a
> single byte, using one bit for the value and one for the engaged flag.

This would be more problematic as emplace(), value() and operator*() need to return T&, which would not be possible.
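
The objection about returning T& can be seen in a small sketch of a one-byte optional for bool. `PackedOptionalBool` is a hypothetical class written for this illustration, not libstdc++ code: once the value lives in a single bit there is no bool object to refer to, so the accessors can only return by value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-byte optional for bool; not libstdc++ code.
class PackedOptionalBool {
    std::uint8_t bits_ = 0;  // bit 0: engaged flag, bit 1: stored value

public:
    bool has_value() const noexcept { return (bits_ & 1u) != 0; }

    // Returns nothing: a real emplace() would have to return bool&.
    void emplace(bool v) noexcept {
        bits_ = static_cast<std::uint8_t>(1u | (v ? 2u : 0u));
    }

    void reset() noexcept { bits_ = 0; }

    // Returned by value: there is no bool object inside to bind a bool& to.
    bool value() const {
        assert(has_value());
        return (bits_ & 2u) != 0;
    }
};

static_assert(sizeof(PackedOptionalBool) == 1, "both bits fit in one byte");
```

The size goal is met, but the interface differs from std::optional, whose emplace(), value() and operator*() return references.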
[Bug fortran/95682] [9/10/11 Regression] Default assignment fails with allocatable array of deferred-length strings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95682

--- Comment #2 from anlauf at gcc dot gnu.org ---
Adding some printout after initializing the t1%x(:),

  do i = 1, size(t1%x)
     print *, len_trim (t1%x(i)), t1%x(i)
  end do

I get for gcc-8:

 5 three
 5 three
 5 three

and for 9,10,11:

 3 one
 3 two
 5 three

That's not a typical regression, but rather wrong code replaced by other wrong code.
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #15 from Jakub Jelinek ---
The needed permutations for this boil down to

typedef int V __attribute__((vector_size (16)));
typedef int W __attribute__((vector_size (32)));

#ifdef __clang__
V f1 (V x) { return __builtin_shufflevector (x, x, 1, 1, 3, 3); }
V f2 (V x, V y) { return __builtin_shufflevector (x, y, 1, 5, 3, 7); }
V f3 (V x, V y) { return __builtin_shufflevector (x, y, 0, 5, 2, 7); }
#ifdef __AVX2__
W f4 (W x, W y) { return __builtin_shufflevector (x, y, 1, 9, 3, 11, 5, 13, 7, 15); }
W f5 (W x, W y) { return __builtin_shufflevector (x, y, 0, 9, 2, 11, 4, 13, 6, 15); }
W f6 (W x) { return __builtin_shufflevector (x, x, 1, 1, 3, 3, 5, 5, 7, 7); }
#endif
V f7 (V x) { return __builtin_shufflevector (x, x, 1, 3, 2, 3); }
V f8 (V x) { return __builtin_shufflevector (x, x, 0, 2, 2, 3); }
V f9 (V x, V y) { return __builtin_shufflevector (x, y, 0, 4, 1, 5); }
#else
V f1 (V x) { return __builtin_shuffle (x, (V) { 1, 1, 3, 3 }); }
V f2 (V x, V y) { return __builtin_shuffle (x, y, (V) { 1, 5, 3, 7 }); }
V f3 (V x, V y) { return __builtin_shuffle (x, y, (V) { 0, 5, 2, 7 }); }
#ifdef __AVX2__
W f4 (W x, W y) { return __builtin_shuffle (x, y, (W) { 1, 9, 3, 11, 5, 13, 7, 15 }); }
W f5 (W x, W y) { return __builtin_shuffle (x, y, (W) { 0, 9, 2, 11, 4, 13, 6, 15 }); }
W f6 (W x, W y) { return __builtin_shuffle (x, (W) { 1, 1, 3, 3, 5, 5, 7, 7 }); }
#endif
V f7 (V x) { return __builtin_shuffle (x, (V) { 1, 3, 2, 3 }); }
V f8 (V x) { return __builtin_shuffle (x, (V) { 0, 2, 2, 3 }); }
V f9 (V x, V y) { return __builtin_shuffle (x, y, (V) { 0, 4, 1, 5 }); }
#endif

With -msse2, LLVM emits 2 x pshufd $237 + punpckldq for f2 and pshufd $237 + pshufd $232 + punpckldq, we give up or emit very large code.

With -msse4, we handle everything, and f1/f3 are the same/comparable, but for f2 we emit 2 x pshufb (with memory operands) + por while LLVM emits pshufd $245 + pblendw $204.

With -mavx2, the f2 inefficiency remains, and for f4 we emit 2x vpshufb with memory operands + vpor while LLVM emits vpermilps $245 + vblendps $170.

f6-f9 are all insns that we handle through a single insn and that plus f3 are the roadblocks to build the f2 and f4 permutations more efficiently.
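
For readers unfamiliar with the index convention in these permutations, here is a scalar sketch of a two-operand shuffle. `shuffle2` is a hypothetical helper, not a compiler builtin; it assumes the documented `__builtin_shuffle` numbering, where elements of the first operand are numbered from 0 and those of the second from N.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a two-operand constant permutation: selector value i < N
// picks x[i], selector value i >= N picks y[i - N]. Selector values are
// assumed non-negative and are reduced modulo 2 * N.
template <std::size_t N>
std::array<int, N> shuffle2(const std::array<int, N>& x,
                            const std::array<int, N>& y,
                            const std::array<int, N>& sel) {
    std::array<int, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        assert(sel[i] >= 0);
        const std::size_t idx = static_cast<std::size_t>(sel[i]) % (2 * N);
        out[i] = idx < N ? x[idx] : y[idx - N];
    }
    return out;
}
```

With x = {0, 1, 2, 3} and y = {10, 11, 12, 13}, the f2 selector {1, 5, 3, 7} yields {1, 11, 3, 13}, that is, the odd elements of both inputs interleaved.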
[Bug c++/82235] Copy ctor is not found for copying array of an object when it's marked explicit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82235

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mpolacek at gcc dot gnu.org

--- Comment #4 from Marek Polacek ---
Some debugging notes.

We're synthesizing the Bar::Bar(const Bar&) constructor in do_build_copy_constructor, which creates a list of Bar's fields along with their initializers. Here we have

  m  D.2398->m

because we're initializing m from m of a copy. We pass this list down to finish_mem_initializers, which passes each pair to perform_member_init. perform_member_init sees that we're initializing an array so creates a VEC_INIT_EXPR via build_vec_init_expr.

VEC_INIT_EXPRs are expanded in cp_gimplify_expr, so we call build_vec_init to do so. We're initializing an array from another array and so do

  else if (type_build_ctor_call (type))
    elt_init = build_aggr_init (to, from, 0, complain)

where to = *D.2445 and from = *D.2447, and build_aggr_init has

  if (init && init != void_type_node
      && TREE_CODE (init) != TREE_LIST
      && !(TREE_CODE (init) == TARGET_EXPR
           && TARGET_EXPR_DIRECT_INIT_P (init))
      && !DIRECT_LIST_INIT_P (init))
    flags |= LOOKUP_ONLYCONVERTING;

and init is an indirect_ref so we set LOOKUP_ONLYCONVERTING (L_O below), never realizing that in this case we don't want to set L_O.

I suppose we could introduce VEC_INIT_EXPR_DIRECT_INIT_P, set it in perform_member_init, and then use it in cp_gimplify_expr to let build_aggr_init know not to set L_O, because cp_gimplify_expr can't know in what context the VEC_INIT_EXPR was created.
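
The distinction that matters here, as far as I understand it, is the one between direct-initialization and copy-initialization: an explicit constructor is usable for the former but not for the latter. The sketch below shows that with a hypothetical `Foo`; it does not reproduce the array-member case from the bug, which is the part GCC gets wrong.

```cpp
#include <type_traits>

// Hypothetical type with an explicit copy constructor.
struct Foo {
    Foo() = default;
    explicit Foo(const Foo&) {}
};

// Direct-initialization, Foo b(a), may use an explicit constructor.
static_assert(std::is_constructible<Foo, const Foo&>::value,
              "direct-initialization is allowed");

// Copy-initialization, Foo b = a, only considers converting
// (non-explicit) constructors, so it is rejected.
static_assert(!std::is_convertible<const Foo&, Foo>::value,
              "copy-initialization is rejected");

bool direct_init_compiles() {
    Foo a;
    Foo b(a);  // direct-initialization: fine despite 'explicit'
    (void)b;
    return true;
}
```

A synthesized copy constructor initializes each member by direct-initialization, so an array member of such a type should be copyable; restricting the lookup to converting constructors applies the copy-initialization rule instead.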
[Bug c++/98232] [9 Regression] ICE when compiling libreoffice
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232

--- Comment #7 from Hussam Al-Tayeb ---
The patch in bug 95719 fixes the ICE. Can you please backport it to the gcc-9 branch?

Also, we need some methodology for follow-up patches so that they are marked as candidates for stable branches as well. In this case https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ddb93ce77374004 is the initial patch, which was applied to the 10 and 9 branches. A follow-up patch https://gcc.gnu.org/g:554eb7d2e1ef5660d6a8e1c12ee1d751a70bbf31 was applied only to the gcc-10 branch, not to the gcc-9 branch.
[Bug lto/83997] ICE with alias template and attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83997

Jason Merrill changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |jason at gcc dot gnu.org
             Status|NEW                           |ASSIGNED
[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978 --- Comment #2 from Jonathan Wakely --- If we were going to do this, we could also make std::optional occupy a single byte, using one bit for the value and one for the engaged flag.
[Bug libstdc++/98978] Consider packing _M_Engaged in the tail padding of T in optional<>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

Jonathan Wakely changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ABI

--- Comment #1 from Jonathan Wakely ---
This would be an ABI break, and so not going to happen.
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #14 from Jakub Jelinek ---
WIP that implements that. Except that we need some permutation expansion improvements, both for the SSE2 V4SImode permutation cases and for AVX2 V8SImode permutation cases.

--- gcc/config/i386/sse.md.jj 2021-02-05 14:32:44.175463716 +0100
+++ gcc/config/i386/sse.md 2021-02-05 18:49:29.621590903 +0100
@@ -12458,7 +12458,7 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "")])

-(define_insn "ashr3"
+(define_insn "ashr3"
   [(set (match_operand:VI248_AVX512BW_AVX512VL 0 "register_operand" "=v,v")
         (ashiftrt:VI248_AVX512BW_AVX512VL
           (match_operand:VI248_AVX512BW_AVX512VL 1 "nonimmediate_operand" "v,vm")
@@ -12472,6 +12472,125 @@
            (const_string "0")))
    (set_attr "mode" "")])

+(define_expand "ashr3"
+  [(set (match_operand:VI248_AVX512BW 0 "register_operand")
+        (ashiftrt:VI248_AVX512BW
+          (match_operand:VI248_AVX512BW 1 "nonimmediate_operand")
+          (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "ashrv4di3"
+  [(set (match_operand:V4DI 0 "register_operand")
+        (ashiftrt:V4DI
+          (match_operand:V4DI 1 "nonimmediate_operand")
+          (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_AVX2"
+{
+  if (!TARGET_AVX512VL)
+    {
+      if (CONST_INT_P (operands[2]) && UINTVAL (operands[2]) >= 63)
+        {
+          rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode));
+          emit_insn (gen_avx2_gtv4di3 (operands[0], zero, operands[1]));
+          DONE;
+        }
+      if (operands[2] == const0_rtx)
+        {
+          emit_move_insn (operands[0], operands[1]);
+          DONE;
+        }
+      if (CONST_INT_P (operands[2]))
+        {
+          vec_perm_builder sel (8, 8, 1);
+          sel.quick_grow (8);
+          rtx arg0, arg1;
+          rtx op1 = lowpart_subreg (V8SImode, operands[1], V4DImode);
+          rtx target = gen_reg_rtx (V8SImode);
+          if (INTVAL (operands[2]) > 32)
+            {
+              arg0 = gen_reg_rtx (V8SImode);
+              arg1 = gen_reg_rtx (V8SImode);
+              emit_insn (gen_ashrv8si3 (arg1, op1, GEN_INT (31)));
+              emit_insn (gen_ashrv8si3 (arg0, op1,
+                                        GEN_INT (INTVAL (operands[2]) - 32)));
+              sel[0] = 1;
+              sel[1] = 9;
+              sel[2] = 3;
+              sel[3] = 11;
+              sel[4] = 5;
+              sel[5] = 13;
+              sel[6] = 7;
+              sel[7] = 15;
+            }
+          else if (INTVAL (operands[2]) == 32)
+            {
+              arg0 = op1;
+              arg1 = gen_reg_rtx (V8SImode);
+              emit_insn (gen_ashrv8si3 (arg1, op1, GEN_INT (31)));
+              sel[0] = 1;
+              sel[1] = 9;
+              sel[2] = 3;
+              sel[3] = 11;
+              sel[4] = 5;
+              sel[5] = 13;
+              sel[6] = 7;
+              sel[7] = 15;
+            }
+          else
+            {
+              arg0 = gen_reg_rtx (V2DImode);
+              arg1 = gen_reg_rtx (V4SImode);
+              emit_insn (gen_lshrv2di3 (arg0, operands[1], operands[2]));
+              emit_insn (gen_ashrv4si3 (arg1, op1, operands[2]));
+              arg0 = lowpart_subreg (V4SImode, arg0, V2DImode);
+              sel[0] = 0;
+              sel[1] = 9;
+              sel[2] = 2;
+              sel[3] = 11;
+              sel[4] = 4;
+              sel[5] = 13;
+              sel[6] = 6;
+              sel[7] = 15;
+            }
+          vec_perm_indices indices (sel, 2, 8);
+          bool ok = targetm.vectorize.vec_perm_const (V8SImode, target,
+                                                      arg0, arg1, indices);
+          gcc_assert (ok);
+          emit_move_insn (operands[0],
+                          lowpart_subreg (V4DImode, target, V8SImode));
+          DONE;
+        }
+
+      rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode));
+      rtx zero_or_all_ones = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_gtv4di3 (zero_or_all_ones, zero, operands[1]));
+      rtx lshr_res = gen_reg_rtx (V4DImode);
+      emit_insn (gen_lshrv4di3 (lshr_res, operands[1], operands[2]));
+      rtx ashl_res = gen_reg_rtx (V4DImode);
+      rtx amount;
+      if (TARGET_64BIT)
+        {
+          amount = gen_reg_rtx (DImode);
+          emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)),
+                                 operands[2]));
+        }
+      else
+        {
+          rtx temp = gen_reg_rtx (SImode);
+          emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)),
+                                 lowpart_subreg (SImode, operands[2],
+                                                 DImode)));
+          amount = gen_reg_rtx (V4SImode);
+          emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode),
+                                        temp));
+        }
+      amount = lowpart_subreg (DImode, amount, GET_MODE (amount));
+      emit_insn (gen_ashlv4di
[Bug libstdc++/98978] New: Consider packing _M_Engaged in the tail padding of T in optional<>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98978

            Bug ID: 98978
           Summary: Consider packing _M_Engaged in the tail padding of T in optional<>
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andysem at mail dot ru
  Target Milestone: ---

Using std::optional with some types may considerably increase object sizes since it adds alignof(T) bytes worth of overhead. Sometimes it is possible to avoid this overhead if the flag indicating presence of the stored value (_M_Engaged in libstdc++ sources) is placed in the tail padding of the T object. This can be done if std::optional constructs an object of a type that derives from T, which has an additional bool data member that is initialized to true upon construction. The below code roughly illustrates the idea:

template< typename T >
struct _Optional_payload_base
{
    struct _PresentT : T
    {
        const bool _M_Engaged = true;

        // Forwarding ctors and other members
    };

    static constexpr size_t engaged_offset = offsetof(_PresentT, _M_Engaged);

    struct _AbsentT
    {
        unsigned char _M_Offset[engaged_offset];
        const bool _M_Engaged = false;
    };

    union _Storage
    {
        _AbsentT _M_Empty;
        _PresentT _M_Value;

        _Storage() : _M_Empty() {}

        // Forwarding ctors and other members
    };

    _Storage _M_payload;

    bool is_engaged() const noexcept
    {
        return *reinterpret_cast< const bool* >(reinterpret_cast< const unsigned char* >(&_M_payload) + engaged_offset);
    }
};

The above relies on some implementation details, such as:

- offsetof works for the type T. It does for many types in gcc, beyond what is required by the C++ standard. Maybe there is a way to avoid offsetof, I just didn't immediately see it.
- The location of _M_Engaged in both _PresentT and _AbsentT is the same. This is a property of the target ABI, and AFAICS it should be true at least on x86 psABI and I think Microsoft ABI.

The above will only work for non-final class types. For other types, and where the above requirements don't hold true, the current code with a separate _M_Engaged flag would still be used.
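
To put numbers on the overhead being discussed, the sketch below measures a hypothetical `Sample` type and a simplified "value plus separate flag" layout. `SeparateFlag` is an assumption standing in for the current approach; it is not the libstdc++ definition, and whether a derived class may actually reuse the tail padding depends on the target ABI, which this sketch does not test.

```cpp
#include <cstddef>

// Hypothetical sample type: a double followed by one char, so the compiler
// pads the end of the object up to a multiple of alignof(Sample).
struct Sample {
    double d;
    char c;
};

// Simplified model of a layout that keeps the flag as a separate member
// after the value (an assumption, not the libstdc++ definition).
template <typename T>
struct SeparateFlag {
    T value;
    bool engaged;
};

// Bytes of tail padding after the last member of Sample.
constexpr std::size_t sample_tail_padding() {
    return sizeof(Sample) - (offsetof(Sample, c) + sizeof(char));
}

// Extra bytes the separate flag costs on top of T itself.
template <typename T>
constexpr std::size_t flag_overhead() {
    return sizeof(SeparateFlag<T>) - sizeof(T);
}
```

For `Sample` the separate flag costs a full `alignof(Sample)` bytes even though unused padding bytes already exist at the end of the object, which is the waste the proposal tries to recover.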
[Bug c++/93788] Segfault caused by infinite loop in cc1plus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93788

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zhan3299 at purdue dot edu

--- Comment #3 from Marek Polacek ---
*** Bug 98972 has been marked as a duplicate of this bug. ***
[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |mpolacek at gcc dot gnu.org
         Resolution|---                         |DUPLICATE

--- Comment #4 from Marek Polacek ---
Looks like a dup.

*** This bug has been marked as a duplicate of bug 93788 ***
[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972 --- Comment #3 from Zhuo Zhang --- I reduced the test-case, and the simplest test-case should be: --- crash1.cc starts --- constexpr p([](register const signed struct s; --- crash1.cc ends --- The bug is also reproduced on the commit 8d0737d8f4b10bffe0411507ad2dc21ba7679883. Hope it can help. Thanks.
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #13 from Jakub Jelinek ---
Looking at what other compilers emit for this, ICC seems to be completely broken, it emits logical right shifts instead of arithmetic right shift, and LLVM trunk emits for >> 63 what this patch emits, for >> 17 it emits

        vpsrad  $17, %xmm0, %xmm1
        vpsrlq  $17, %xmm0, %xmm0
        vpblendd        $10, %xmm1, %xmm0, %xmm0

instead of

        vpxor   %xmm1, %xmm1, %xmm1
        vpcmpgtq        %xmm0, %xmm1, %xmm1
        vpsrlq  $17, %xmm0, %xmm0
        vpsllq  $47, %xmm1, %xmm1
        vpor    %xmm1, %xmm0, %xmm0

the patch emits. For >> 47 it emits:

        vpsrad  $31, %xmm0, %xmm1
        vpsrad  $15, %xmm0, %xmm0
        vpshufd $245, %xmm0, %xmm0
        vpblendd        $10, %xmm1, %xmm0, %xmm0

etc.

So, in summary, for >> 63 with SSE4.2 I think what the patch does looks best, for >> 63 and SSE2 we can emit psrad $31 instead and permute the odd elements into even ones (i.e. __builtin_shuffle ((v4si) x >> 31, { 1, 1, 3, 3 })).
For >> cst where cst < 32, do a psrad and psrlq by that cst and permute such that we get the even SI elts from the psrlq result and odd from psrad result.
For >> 32, do a psrad $31 and permute to get the even SI elts from odd elts of the source and odd SI elts from odd results of psrad $31.
For >> cst where cst > 32, do psrad $31 and psrad $(cst-32) and permute such that even SI elts come from odd elts of the latter and odd elts come from odd elts of the former.
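
The case analysis in the summary can be checked on a single 64-bit lane with plain integer arithmetic. This is a scalar sketch only: `ashr64_via_32` is a hypothetical helper, "even" and "odd" denote the low and high 32-bit halves of the lane, and the shifts and narrowing casts on signed values rely on the arithmetic, modular behaviour that GCC provides and C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// One 64-bit lane viewed as two 32-bit elements: "even" is the low half,
// "odd" is the high half. Builds x >> cst (arithmetic) from 32-bit
// arithmetic shifts and a 64-bit logical shift only.
std::int64_t ashr64_via_32(std::int64_t x, unsigned cst) {
    assert(cst < 64);
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    const std::int32_t odd =
        static_cast<std::int32_t>(static_cast<std::uint32_t>(ux >> 32));
    std::uint32_t even_out, odd_out;
    if (cst < 32) {
        // even element from the psrlq result, odd element from psrad
        even_out = static_cast<std::uint32_t>(ux >> cst);
        odd_out = static_cast<std::uint32_t>(odd >> cst);
    } else if (cst == 32) {
        // even element is the source's odd element, odd is psrad $31
        even_out = static_cast<std::uint32_t>(odd);
        odd_out = static_cast<std::uint32_t>(odd >> 31);
    } else {
        // even element from psrad $(cst-32), odd element from psrad $31
        even_out = static_cast<std::uint32_t>(odd >> (cst - 32));
        odd_out = static_cast<std::uint32_t>(odd >> 31);
    }
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(odd_out) << 32) | even_out;
    return static_cast<std::int64_t>(bits);
}
```

Comparing the result against `x >> cst` for every shift count from 0 to 63 is a cheap way to convince oneself that the three cases cover the whole range.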
[Bug c++/98947] [10 Regression] Incorrect warning when using a ternary operator to select one of two volatile variables to write to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98947

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[10/11 Regression]          |[10 Regression] Incorrect
                   |Incorrect warning when      |warning when using a
                   |using a ternary operator to |ternary operator to select
                   |select one of two volatile  |one of two volatile
                   |variables to write to       |variables to write to

--- Comment #4 from Marek Polacek ---
Fixed on trunk so far.
[Bug c++/98947] [10/11 Regression] Incorrect warning when using a ternary operator to select one of two volatile variables to write to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98947

--- Comment #3 from CVS Commits ---
The master branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:7a18bc4ae62081021f4fd90d591a588cac931f77

commit r11-7126-g7a18bc4ae62081021f4fd90d591a588cac931f77
Author: Marek Polacek
Date: Wed Feb 3 17:57:22 2021 -0500

    c++: Fix bogus -Wvolatile warning in C++20 [PR98947]

    Since most of volatile is deprecated in C++20, we are required to warn
    for compound assignments to volatile variables and so on. But here we
    have

      volatile int x, y, z;
      (b ? x : y) = 1;

    and we shouldn't warn, because simple assignments like x = 24; should
    not provoke the warning when they are a discarded-value expression.

    We warn here because when ?: is used as an lvalue, we transform it in
    cp_build_modify_expr/COND_EXPR from

      (a ? b : c) = rhs

    to

      (a ? (b = rhs) : (c = rhs))

    and build_conditional_expr then calls mark_lvalue_use for the new
    artificial assignments, which then evokes the warning. The calls
    to mark_lvalue_use were added in r160289 to suppress warnings in
    Wunused-var-10.c, but looks like they're no longer needed.

    To warn on

      (b ? (x = 2) : y) = 1;
      (b ? x : (y = 5)) = 1;

    I've tweaked a check in mark_use/MODIFY_EXPR.

    I'd argue this is a regression because GCC 9 doesn't warn.

    gcc/cp/ChangeLog:

            PR c++/98947
            * call.c (build_conditional_expr_1): Don't call mark_lvalue_use
            on arg2/arg3.
            * expr.c (mark_use) : Don't check read_p when
            issuing the -Wvolatile warning. Only set TREE_THIS_VOLATILE if
            a warning was emitted.

    gcc/testsuite/ChangeLog:

            PR c++/98947
            * g++.dg/cpp2a/volatile5.C: New test.
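
A compilable version of the pattern that should stay warning-free might look like the sketch below. The names `vx`, `vy` and `store_selected` are made up for the illustration; the statement itself follows the example in the commit message, a simple assignment through ?: whose result is discarded.

```cpp
// Hypothetical volatile objects standing in for x and y above.
volatile int vx = 0;
volatile int vy = 0;

// Simple assignment through ?: used as an lvalue. The value of the
// assignment expression is discarded, so it should not be diagnosed.
void store_selected(bool b) {
    (b ? vx : vy) = 1;
}
```

Compiling this with -std=c++20 is a quick check that a given compiler build does not issue the bogus warning for it.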
[Bug c++/96462] [10 Regression] ICE in tree check: expected identifier_node, have bit_not_expr in find_namespace_slot, at cp/name-lookup.c:97
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96462

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
            Summary|[10/11 Regression] ICE in   |[10 Regression] ICE in
                   |tree check: expected        |tree check: expected
                   |identifier_node, have       |identifier_node, have
                   |bit_not_expr in             |bit_not_expr in
                   |find_namespace_slot, at     |find_namespace_slot, at
                   |cp/name-lookup.c:97         |cp/name-lookup.c:97

--- Comment #6 from Marek Polacek ---
Fixed in GCC 11.
[Bug c++/96462] [10/11 Regression] ICE in tree check: expected identifier_node, have bit_not_expr in find_namespace_slot, at cp/name-lookup.c:97
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96462

--- Comment #5 from CVS Commits ---
The master branch has been updated by Marek Polacek:

https://gcc.gnu.org/g:1cbc10d894494c34987d1f42f955e7843457ee38

commit r11-7125-g1cbc10d894494c34987d1f42f955e7843457ee38
Author: Marek Polacek
Date: Thu Feb 4 12:53:59 2021 -0500

    c++: Fix ICE with invalid using enum [PR96462]

    Here we ICE in finish_nonmember_using_decl -> lookup_using_decl ->
    ... -> find_namespace_slot because "name" is not an IDENTIFIER_NODE.
    It is a BIT_NOT_EXPR because this broken test uses

      using E::~E; // SCOPE::NAME

    A using-decl can't refer to a destructor, and lookup_using_decl already
    checks that in the class member case. But in C++17, we do the "enum
    scope is the enclosing scope" block, and so scope gets set to ::, and
    we go into the NAMESPACE_DECL block. In C++20 we don't do it, we go
    to the ENUMERAL_TYPE block.

    I resorted to hoisting the check along with a diagnostic tweak: we
    don't want to print "~E names destructor".

    gcc/cp/ChangeLog:

            PR c++/96462
            * name-lookup.c (lookup_using_decl): Hoist the destructor check.

    gcc/testsuite/ChangeLog:

            PR c++/96462
            * g++.dg/cpp2a/using-enum-8.C: New test.
[Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98931 --- Comment #12 from akrl at gcc dot gnu.org --- Right LE is 4 bytes, good catch thanks
[Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98931 --- Comment #11 from Jakub Jelinek --- Isn't the normal length of short le lr, 1b 4 bytes rather than 2?
[Bug tree-optimization/97236] [8 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97236

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
      Known to work|                            |9.3.1
                 CC|                            |ktkachov at gcc dot gnu.org
            Summary|[8/9 Regression]            |[8 Regression]
                   |g:e93428a8b056aed83a7678    |g:e93428a8b056aed83a7678
                   |triggers vlc miscompile     |triggers vlc miscompile

--- Comment #14 from ktkachov at gcc dot gnu.org ---
Fixed for GCC 9.4
[Bug tree-optimization/97236] [8/9 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97236 --- Comment #13 from ktkachov at gcc dot gnu.org --- *** Bug 98949 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/98949] gcc-9.3 aarch64 -ftree-vectorize generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98949

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Dup. The patch fixing PR 97236 has been backported to the GCC 9 branch for GCC 9.4

*** This bug has been marked as a duplicate of bug 97236 ***
[Bug tree-optimization/97236] [8/9 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97236

--- Comment #12 from CVS Commits ---
The releases/gcc-9 branch has been updated by Kyrylo Tkachov:

https://gcc.gnu.org/g:97b668f9a8c6ec565c278a60e7d1492a6932e409

commit r9-9224-g97b668f9a8c6ec565c278a60e7d1492a6932e409
Author: Matthias Klose
Date: Tue Oct 6 13:41:37 2020 +0200

    Backport fix for PR/tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS

    This avoids using VMAT_CONTIGUOUS with single-element interleaving
    when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but
    continue to avoid load-lanes and gathers.

    2020-10-01  Richard Biener

            PR tree-optimization/97236
            * tree-vect-stmts.c (get_group_load_store_type): Keep
            VMAT_ELEMENTWISE for single-element vectors.

            * gcc.dg/vect/pr97236.c: New testcase.

    (cherry picked from commit 1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82)
Re: [Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012
Following suggestions, I'm testing the attached patch, which emits the following for long branches that LE cannot cover:

        subs lr, #1
        bmi .L2

From 0cd38cb29829b48f96e8e060e7a875f49236b67b Mon Sep 17 00:00:00 2001
From: Andrea Corallo
Date: Wed, 3 Feb 2021 15:21:54 +0100
Subject: [PATCH] arm: Add low overhead loop address range check [PR98931]

2021-02-05  Andrea Corallo

        * config/arm/thumb2.md: Generate alternative sequence for long
        range branches.
---
 gcc/config/arm/thumb2.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index bd53bf320de..a8327066bfe 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1719,7 +1719,18 @@
    (set (reg:SI LR_REGNUM)
         (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
   "TARGET_32BIT && TARGET_HAVE_LOB"
-  "le\t%|lr, %l0")
+  "*
+  if (get_attr_length (insn) == 2)
+    return \"le\\t%|lr, %l0\";
+  else
+    return \"subs\\t%|lr, #1\;bmi\\t%l0\";
+  "
+  [(set (attr "length")
+        (if_then_else
+            (lt (minus (pc) (match_dup 0)) (const_int 1024))
+            (const_int 2)
+            (const_int 6)))
+   (set_attr "type" "branch")])

 (define_expand "doloop_begin"
   [(match_operand 0 "" "")
--
2.20.1
[Bug target/98931] [11 Regression] arm: Assembly fails with "branch out of range or not a multiple of 2" since r11-2012
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98931

--- Comment #10 from Andrea Corallo ---
Following suggestions, I'm testing the attached patch, which emits the following for long branches that LE cannot cover:

        subs lr, #1
        bmi .L2
[Bug c++/95719] [10/11 Regression] ICE in lookup_vfn_in_binfo at gcc/cp/class.c:2459 since r11-954-g0ddb93ce77374004
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95719

Hussam Al-Tayeb changed:

           What    |Removed                     |Added
                 CC|                            |ht990332 at gmx dot com

--- Comment #6 from Hussam Al-Tayeb ---
gcc-9 branch also has a backport of https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0ddb93ce77374004 which caused the regression. Can the fix for this bug be backported to the gcc-9 branch please?
[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920 --- Comment #10 from Jakub Jelinek --- Ugh, that is quite misdesigned then...
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #12 from Jakub Jelinek --- V4DImode arithmetic right shifts would be (untested): --- gcc/config/i386/sse.md.jj 2021-02-05 14:32:44.175463716 +0100 +++ gcc/config/i386/sse.md 2021-02-05 15:24:37.942026401 +0100 @@ -12458,7 +12458,7 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_insn "ashr3" +(define_insn "ashr3" [(set (match_operand:VI248_AVX512BW_AVX512VL 0 "register_operand" "=v,v") (ashiftrt:VI248_AVX512BW_AVX512VL (match_operand:VI248_AVX512BW_AVX512VL 1 "nonimmediate_operand" "v,vm") @@ -12472,6 +12472,67 @@ (const_string "0"))) (set_attr "mode" "")]) +(define_expand "ashr3" + [(set (match_operand:VI248_AVX512BW 0 "register_operand") + (ashiftrt:VI248_AVX512BW + (match_operand:VI248_AVX512BW 1 "nonimmediate_operand") + (match_operand:DI 2 "nonmemory_operand")))] + "TARGET_AVX512F") + +(define_expand "ashrv4di3" + [(set (match_operand:V4DI 0 "register_operand") + (ashiftrt:V4DI + (match_operand:V4DI 1 "nonimmediate_operand") + (match_operand:DI 2 "nonmemory_operand")))] + "TARGET_AVX2" +{ + if (!TARGET_AVX512VL) +{ + if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 63) + { + rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode)); + emit_insn (gen_avx2_gtv4di3 (operands[0], zero, operands[1])); + DONE; + } + if (operands[2] == const0_rtx) + { + emit_move_insn (operands[0], operands[1]); + DONE; + } + + rtx zero = force_reg (V4DImode, CONST0_RTX (V4DImode)); + rtx zero_or_all_ones = gen_reg_rtx (V4DImode); + emit_insn (gen_avx2_gtv4di3 (zero_or_all_ones, zero, operands[1])); + rtx lshr_res = gen_reg_rtx (V4DImode); + emit_insn (gen_lshrv4di3 (lshr_res, operands[1], operands[2])); + rtx ashl_res = gen_reg_rtx (V4DImode); + rtx amount; + if (CONST_INT_P (operands[2])) + amount = GEN_INT (64 - INTVAL (operands[2])); + else if (TARGET_64BIT) + { + amount = gen_reg_rtx (DImode); + emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)), +operands[2])); + } + else + { + rtx 
temp = gen_reg_rtx (SImode); + emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)), +lowpart_subreg (SImode, operands[2], +DImode))); + amount = gen_reg_rtx (V4SImode); + emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode), + temp)); + } + if (!CONST_INT_P (operands[2])) + amount = lowpart_subreg (DImode, amount, GET_MODE (amount)); + emit_insn (gen_ashlv4di3 (ashl_res, zero_or_all_ones, amount)); + emit_insn (gen_iorv4di3 (operands[0], lshr_res, ashl_res)); + DONE; +} +}) + (define_insn "3" [(set (match_operand:VI248_AVX512BW_2 0 "register_operand" "=v,v") (any_lshift:VI248_AVX512BW_2 Trying 3 different routines, one returning >> 63 of a V4DImode vector, another one >> 17 and another one >> var, the differences with -mavx2 are: - vextracti128$0x1, %ymm0, %xmm1 - vmovq %xmm0, %rax - vpextrq $1, %xmm0, %rcx - cqto - vmovq %xmm1, %rax - sarq$63, %rcx - sarq$63, %rax - vmovq %rdx, %xmm3 - movq%rax, %rsi - vpextrq $1, %xmm1, %rax - vpinsrq $1, %rcx, %xmm3, %xmm0 - sarq$63, %rax - vmovq %rsi, %xmm2 - vpinsrq $1, %rax, %xmm2, %xmm1 - vinserti128 $0x1, %xmm1, %ymm0, %ymm0 + vmovdqa %ymm0, %ymm1 + vpxor %xmm0, %xmm0, %xmm0 + vpcmpgtq%ymm1, %ymm0, %ymm0 - vmovq %xmm0, %rax - vextracti128$0x1, %ymm0, %xmm1 - vpextrq $1, %xmm0, %rcx - sarq$17, %rax - sarq$17, %rcx - movq%rax, %rdx - vmovq %xmm1, %rax - sarq$17, %rax - vmovq %rdx, %xmm3 - movq%rax, %rsi - vpextrq $1, %xmm1, %rax - vpinsrq $1, %rcx, %xmm3, %xmm0 - sarq$17, %rax - vmovq %rsi, %xmm2 - vpinsrq $1, %rax, %xmm2, %xmm1 - vinserti128 $0x1, %xmm1, %ymm0, %ymm0 + vpxor %xmm1, %xmm1, %xmm1 + vpcmpgtq%ymm0, %ymm1, %ymm1 + vpsrlq $17, %ymm0, %ymm0 + vpsllq $47, %ymm1, %ymm1 + vpor%ymm1, %ymm0, %ymm0 and - movl%edi, %ecx - vmovq %xmm0, %rax - vextracti128$0x1, %ymm0, %xmm1 - sarq%cl, %rax - vpextrq $1, %xmm0, %rsi - movq%rax, %rdx - vmovq %xmm1, %rax - sarq%cl, %rsi - sarq%cl, %rax - vmovq %rdx, %xmm3 - movq%rax, %rdi - vpextrq $1, %xmm1, %rax - vpinsrq $1, %rsi, %xmm3, %xmm0 - sarq%cl, %rax + vpxor %xmm1, 
%xmm1, %xmm1 + movslq %edi,
[Bug c++/98232] [9 Regression] ICE when compiling libreoffice
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232

--- Comment #6 from Martin Liška ---
(In reply to Hussam Al-Tayeb from comment #4)
> (In reply to Martin Liška from comment #3)
> > Run build system with in a verbose mode (V=1 or VERBOSE=1), or so.
> > And then for the problematic TU do -E, which will save pre-processed source
> > file instead of the object file.
>
> Can you please tell me what to type for -E? And what is a TU?

You need to display the full command line with which vcl/workben/vcldemo.cxx is compiled. To do that, run:

make V=1 VERBOSE=1

then take the command line and append '-E'. Then attach the pre-processed source file, which will be in the '-o .o' file.
[Bug c++/98232] [9 Regression] ICE when compiling libreoffice
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232 --- Comment #5 from Hussam Al-Tayeb --- I also found this https://bugzilla.redhat.com/show_bug.cgi?id=1858036
[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

--- Comment #9 from Florian Weimer ---
(In reply to Jakub Jelinek from comment #8)
> Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc
> doesn't support that symbol looks wrong.

I think all the interceptors use unversioned symbols, so this doesn't matter. Yes, it's quite broken, but when fixing this, you might as well go with the flow …
[Bug c++/98232] [9 Regression] ICE when compiling libreoffice
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98232

--- Comment #4 from Hussam Al-Tayeb ---
(In reply to Martin Liška from comment #3)
> Run build system with in a verbose mode (V=1 or VERBOSE=1), or so.
> And then for the problematic TU do -E, which will save pre-processed source
> file instead of the object file.

Can you please tell me what to type for -E? And what is a TU?
[Bug target/98977] New: [x86] Failure to optimize consecutive sub flags usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98977

            Bug ID: 98977
           Summary: [x86] Failure to optimize consecutive sub flags usage
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <cstdint>

extern bool z, c;

uint8_t f(uint8_t dest, uint8_t src)
{
    uint8_t res = dest - src;
    z = !res;
    c = src > dest;
    return res;
}

With -O3, LLVM outputs this:

f(unsigned char, unsigned char):
  mov eax, edi
  sub al, sil
  sete byte ptr [rip + z]
  setb byte ptr [rip + c]
  ret

GCC outputs this:

f(unsigned char, unsigned char):
  mov eax, edi
  sub al, sil
  sete BYTE PTR z[rip]
  cmp dil, sil
  setb BYTE PTR c[rip]
  ret

It seems desirable to eliminate the `cmp`, unless there's some weird flag stall thing I'm not aware of.
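The reason the `cmp` in the report looks redundant is that an 8-bit `sub dest, src` already sets the carry flag exactly when `src > dest` (unsigned) and the zero flag exactly when the result is zero. A minimal sketch of that equivalence follows; it models the flags in portable C++ rather than touching real x86 flags, and `SubResult`, `sub_with_flags` and `check_all_inputs` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

struct SubResult {
    std::uint8_t res;  // dest - src, truncated to 8 bits
    bool zero;         // what SETE would store after the SUB
    bool carry;        // what SETB would store after the SUB (borrow)
};

// Hypothetical model of an 8-bit SUB and the two flags it sets.
SubResult sub_with_flags(std::uint8_t dest, std::uint8_t src) {
    // Subtract in a wider unsigned type so that a borrow is visible
    // as a wrapped-around value above 0xFF.
    unsigned wide = static_cast<unsigned>(dest) - static_cast<unsigned>(src);
    SubResult r;
    r.res = static_cast<std::uint8_t>(wide);
    r.zero = (r.res == 0);
    r.carry = (wide > 0xFFu);
    return r;
}

// Exhaustively confirm that the flags match the two source-level tests.
void check_all_inputs() {
    for (unsigned d = 0; d < 256; ++d) {
        for (unsigned s = 0; s < 256; ++s) {
            SubResult r = sub_with_flags(static_cast<std::uint8_t>(d),
                                         static_cast<std::uint8_t>(s));
            assert(r.carry == (s > d));  // c = src > dest
            assert(r.zero == (d == s));  // z = !res
        }
    }
}
```

Under this model both stores in the reported function can be fed from the single subtraction, which is what the LLVM output does.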
[Bug analyzer/98969] [11 Regression] ICE: Segmentation fault (in print_mem_ref)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98969

David Malcolm changed:

           What    |Removed                     |Added
           Assignee|msebor at gcc dot gnu.org   |dmalcolm at gcc dot gnu.org

--- Comment #6 from David Malcolm ---
Mine; the analyzer shouldn't ICE by constructing malformed trees.

Also, the leak diagnostic is arguably a false positive in that (struct TYPE_14__ *) _round_2_cb_n_0 is still effectively reachable by the caller after the function returns.
[Bug driver/98943] [11 Regression] gcc driver does not fail on unknown files: tricks configure scripts to recognize /W4 and -diag-disable 1,2,3,4 options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98943

Nathan Sidwell changed:

           What    |Removed                     |Added
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #10 from Nathan Sidwell ---
6606b852bfa 2021-02-04 | driver: error for nonexistent linker inputs [PR 98943]

I think that's sufficient, but please reopen if it is not.
[Bug driver/98943] [11 Regression] gcc driver does not fail on unknown files: tricks configure scripts to recognize /W4 and -diag-disable 1,2,3,4 options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98943

--- Comment #9 from CVS Commits ---
The master branch has been updated by Nathan Sidwell:

https://gcc.gnu.org/g:6606b852bfa866c19375a7c5e9cb94776a28bd94

commit r11-7124-g6606b852bfa866c19375a7c5e9cb94776a28bd94
Author: Nathan Sidwell
Date: Thu Feb 4 08:16:17 2021 -0800

    driver: error for nonexistent linker inputs [PR 98943]

    We used to check all unknown input files, even when passing them to a
    compiler. But that caused problems. However, not erroring out on
    non-existent would-be-linker inputs confuses configure machinery that
    probes the compiler to see if it accepts various inputs. This
    restores the access check for things that are thought to be linker
    input files, when we're not linking. (If we are linking, we presume
    the linker will error out on its own accord.)

            PR driver/98943
            gcc/
            * gcc.c (driver::maybe_run_linker): Check for input file
            accessibility if not linking.
            gcc/testsuite/
            * c-c++-common/pr98943.c: New.
[Bug analyzer/98969] [11 Regression] ICE: Segmentation fault (in print_mem_ref)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98969 --- Comment #5 from Jakub Jelinek --- Yeah, seems the analyzer looked through the cast, so either it shouldn't, or it needs to readd the cast in there. As for print_mem_ref, if we wanted to protect it from bogus MEM_REF creation (not sure about if we want to), the right change IMHO would be to set access_type to NULL_TREE if TREE_TYPE (arg) doesn't have POINTER_TYPE_P, and in the spots that use access_type treat access_type NULL as unknown access type, e.g. access_cast should be true if access_type is NULL, and char_cast should be true too.
[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

--- Comment #4 from Jakub Jelinek ---
The only thing that should be fixed is whatever code invokes the UB. There is no bug on the compiler side; you essentially end up with __builtin_unreachable (); in place of the loop. You can use -fsanitize=unreachable to get a runtime diagnostic instead if the UB is turned into __builtin_unreachable ().
[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972

Martin Liška changed:

           What    |Removed                     |Added
             Status|WAITING                     |NEW
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 Jakub Jelinek changed: What|Removed |Added CC||uros at gcc dot gnu.org --- Comment #11 from Jakub Jelinek --- For V2DImode arithmetic right shift, I think it would be something like: --- gcc/config/i386/sse.md.jj 2021-01-27 11:50:09.168981297 +0100 +++ gcc/config/i386/sse.md 2021-02-05 14:32:44.175463716 +0100 @@ -20313,10 +20313,55 @@ (define_expand "ashrv2di3" (ashiftrt:V2DI (match_operand:V2DI 1 "register_operand") (match_operand:DI 2 "nonmemory_operand")))] - "TARGET_XOP || TARGET_AVX512VL" + "TARGET_SSE4_2" { if (!TARGET_AVX512VL) { + if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 63) + { + rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode)); + emit_insn (gen_sse4_2_gtv2di3 (operands[0], zero, operands[1])); + DONE; + } + if (operands[2] == const0_rtx) + { + emit_move_insn (operands[0], operands[1]); + DONE; + } + if (!TARGET_XOP) + { + rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode)); + rtx zero_or_all_ones = gen_reg_rtx (V2DImode); + emit_insn (gen_sse4_2_gtv2di3 (zero_or_all_ones, zero, operands[1])); + rtx lshr_res = gen_reg_rtx (V2DImode); + emit_insn (gen_lshrv2di3 (lshr_res, operands[1], operands[2])); + rtx ashl_res = gen_reg_rtx (V2DImode); + rtx amount; + if (CONST_INT_P (operands[2])) + amount = GEN_INT (64 - INTVAL (operands[2])); + else if (TARGET_64BIT) + { + amount = gen_reg_rtx (DImode); + emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)), +operands[2])); + } + else + { + rtx temp = gen_reg_rtx (SImode); + emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)), +lowpart_subreg (SImode, operands[2], +DImode))); + amount = gen_reg_rtx (V4SImode); + emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode), + temp)); + } + if (!CONST_INT_P (operands[2])) + amount = lowpart_subreg (DImode, amount, GET_MODE (amount)); + emit_insn (gen_ashlv2di3 (ashl_res, zero_or_all_ones, amount)); + emit_insn (gen_iorv2di3 (operands[0], lshr_res, ashl_res)); + DONE; + } + 
rtx reg = gen_reg_rtx (V2DImode); rtx par; bool negate = false; plus adjusting the cost computation to hint that at least the non-63 arithmetic right V2DImode shifts are more expensive. Even if in the end the V2DImode arithmetic right shifts turn to be more expensive than scalar code (though, it surprises me at least for the >> 63 case), I think V4DImode for TARGET_AVX2 should be beneficial always (haven't tried to adjust the expander for that yet).
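The non-XOP path of the expander above builds an arithmetic right shift out of a logical right shift, a sign mask and a left shift. A scalar sketch of the same idea for a single 64-bit lane is given below. The function name `ashr_emulated` is made up for this illustration, and one detail differs from the vector code: a C++ shift by 64 is undefined, whereas psllq yields zero for counts above 63, so the zero-count case is handled explicitly here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of the expansion:
//   result = (v >>logical n) | ((0 > v ? all-ones : 0) << (64 - n))
std::int64_t ashr_emulated(std::int64_t v, unsigned n) {
    assert(n < 64);
    std::uint64_t u = static_cast<std::uint64_t>(v);
    // Corresponds to pcmpgtq (0 > v): all ones for a negative lane.
    std::uint64_t sign_mask = (v < 0) ? ~std::uint64_t{0} : std::uint64_t{0};
    // Corresponds to psrlq: zero-filling right shift.
    std::uint64_t lshr = u >> n;
    // Corresponds to psllq by (64 - n). For n == 0 the hardware count is 64
    // and the result is zero; in C++ that shift must be avoided.
    std::uint64_t high = (n == 0) ? std::uint64_t{0} : (sign_mask << (64 - n));
    // Corresponds to the final por.
    return static_cast<std::int64_t>(lshr | high);
}
```

For a shift count of 63 the result collapses to the sign mask itself, which is why the expander special-cases that count and emits only the comparison.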
[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

--- Comment #3 from Emil Meissner ---
(In reply to Jakub Jelinek from comment #2)
> And the bug is? The code always invokes undefined behavior, so anything can
> happen.

Whilst that is true, shouldn't it still be fixed, given (possible) security implications?
[Bug lto/98971] LTO removes __patchable_function_entries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #5 from Gabriel F. T. Gomes ---
(In reply to Martin Liška from comment #4)
>
> Well, the intermediate object contains just LTO bytecode, that's why you
> can't see the section. You can use -ffat-lto-objects in order to generate
> both assembly and LTO bytecode.

Indeed. Thanks again! :)
[Bug c++/98976] New: [coroutines] co_return in a switch statement doesn't make a generic lambda non-constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98976

            Bug ID: 98976
           Summary: [coroutines] co_return in a switch statement doesn't make a generic lambda non-constexpr
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mail+gnu at tzik dot jp
  Target Milestone: ---

In a repro case below, the lambda is wrongly handled as a constexpr and its co_return causes a compile error, as a coroutine can not be constexpr.
https://wandbox.org/permlink/y4pEMCNki1ndzJYI

The gcc here is the trunk version as of today, and the command was:
$ g++ -c -std=c++20 failcase.cc

--- failcase.cc
#include
struct future {
  struct promise_type {
    std::suspend_always initial_suspend() noexcept { return {};}
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception() {}
    future get_return_object() { return {}; }
    void return_void() {}
  };
};

void failcase() {
  auto foo = [](auto&&) -> future {
    switch (42) {
      case 42:
        co_return;
    }
  };
  foo(1);
}

The error message was:
prog.cc: In instantiation of 'failcase():: [with auto:1 = int]':
prog.cc:20:9: required from here
prog.cc:17:9: error: 'co_return' cannot be used in a 'constexpr' function
   17 | co_return;
      | ^
[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

Jakub Jelinek changed:

           What    |Removed                     |Added
         Resolution|---                         |INVALID
                 CC|                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Jakub Jelinek ---
And the bug is? The code always invokes undefined behavior, so anything can happen.
[Bug c++/98975] Infinite loop produces no assembly (including returning) with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

--- Comment #1 from Emil Meissner ---
The code in the attachment, compiled with `g++ file.cpp -o bug -O3 -std=c++20`, produces no assembly for either the `main` or the `bsort` function (i.e. not even a `ret` instruction), ultimately resulting in a segmentation fault when run.

The code has an intentional bug in it: instead of comparing `j < std::size(arr)`, we compare `i < std::size(arr)`.

I couldn't simplify the example further. Compiling with -O2 and -O1 produces the expected infinite loop.

I suspect this may be exploitable.
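The attachment itself is not reproduced in this thread, so the following is only a hypothetical reconstruction of what a corrected `bsort` might look like; the signature and the use of `std::array` are assumptions. It marks the spot where, according to the report, the inner loop tested `i` instead of `j`, which lets `j` run past the end of the array and makes the program's behaviour undefined.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical reconstruction; the real attachment may look different.
template <std::size_t N>
void bsort(std::array<int, N>& arr) {
    for (std::size_t i = 0; i < N; ++i) {
        // The inner condition must bound j. Per the report, the buggy
        // version tested i here, so j was never bounded and the accesses
        // below went out of range (undefined behaviour).
        for (std::size_t j = 0; j + 1 < N; ++j) {
            if (arr[j] > arr[j + 1])
                std::swap(arr[j], arr[j + 1]);
        }
    }
    // Postcondition of the corrected loop.
    assert(std::is_sorted(arr.begin(), arr.end()));
}
```

With the inner loop bounded by `j`, every access stays in range, so the compiler has no undefined behaviour to reason from and the function keeps its ordinary code.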
[Bug c++/98972] internal compiler error: Segmentation fault signal terminated program cc1plus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98972

--- Comment #2 from Zhuo Zhang ---
(In reply to Martin Liška from comment #1)
> Thank you for the report. Actually, it's an invalid code and we do have a
> lot of error recovery ICEs.
> Or do you have an original test-case that is a valid C++ code?

Hi, thanks for your prompt reply. I don't think I have valid C++ code, as this test case was generated by a fuzzer.
[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920 --- Comment #8 from Jakub Jelinek --- Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc doesn't support that symbol looks wrong.
[Bug c++/98975] New: Infinite loop produces no assembly (including returning) with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98975

            Bug ID: 98975
           Summary: Infinite loop produces no assembly (including returning) with -O3
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: e.meissner at seznam dot cz
  Target Milestone: ---

Created attachment 50134
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50134&action=edit
Code producing the bug
[Bug lto/98971] LTO removes __patchable_function_entries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #4 from Martin Liška ---
> The only difference now is that the intermediate object doesn't have a
> __patchable_function_entries section, but that's OK as far as I can tell.

Well, the intermediate object contains just LTO bytecode, that's why you can't see the section. You can use -ffat-lto-objects in order to generate both assembly and LTO bytecode.
[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920

Florian Weimer changed:

           What    |Removed                     |Added
                 CC|                            |fw at gcc dot gnu.org

--- Comment #7 from Florian Weimer ---
I think libsanitizer falls back to a version-less lookup if the version cannot be found. Therefore, if the glibc baseline is after 2.3.4, the version-less lookup will find the unversioned symbol, which has the right behavior.

I don't see any architecture that has two regexec symbols, but does not use GLIBC_2.3.4 for the most recent symbol, based on this command in the glibc source tree:

git grep -c ' regexec F' | grep :2$ | cut -d: -f1 | xargs grep ' regexec F'

A comment in the interceptor might make sense, though.
[Bug lto/98971] LTO removes __patchable_function_entries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #3 from Gabriel F. T. Gomes ---
(In reply to Martin Liška from comment #2)
>
> @Gabriel: Is it intended behavior?

That's what I expected, yes! Thank you.

The only difference now is that the intermediate object doesn't have a __patchable_function_entries section, but that's OK as far as I can tell.

With:

$ gcc libtesta.c -fPIC -fpatchable-function-entry=4,2 -flto -c -o libtesta.o
$ gcc libtesta.o -flto -shared -o libtesta.so

now I get:

$ readelf --sections libtesta.o | grep __patchable
$ readelf --sections libtesta.so | grep __patchable
  [22] __patchable_[...] PROGBITS 4020 3020

Cheers!
[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920 --- Comment #6 from Jakub Jelinek --- Well, it is not about what arches you care about, but what arches we support in libsanitizer/configure.tgt (from *-linux*). So, riscv64, aarch64, mips, arm, s390*, sparc*, powerpc*, x86. So it is desirable to get it right for all these. Thus, I think you want to use GLIBC_2.3.4 for Linux except on arm, riscv*, x86_64 -mx32, powerpc64le and aarch64. So you need to look up SANITIZE* macros for all of these...
[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

Richard Biener changed:

           What    |Removed                     |Added
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #10 from Richard Biener ---
Fixed.
[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

--- Comment #9 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:63538886d1f7fc7cbf066b4c2d6d7fd4da537259

commit r11-7123-g63538886d1f7fc7cbf066b4c2d6d7fd4da537259
Author: Richard Biener
Date: Fri Feb 5 09:54:00 2021 +0100

    tree-optimization/98855 - redo BB vectorization costing

    The following attempts to account for the fact that BB vectorization
    regions now can span multiple loop levels and that an unprofitable
    inner loop vectorization shouldn't be offset by a profitable
    outer loop vectorization to make it overall profitable.

    For now I've implemented a heuristic based on the premise that
    vectorization should be profitable even if loops may not be entered
    or if they iterate any number of times. Especially the first
    assumption then requires that stmts directly belonging to loop A
    need to be costed separately from stmts belonging to another loop
    which also simplifies the implementation.

    On x86 the added testcase has in the outer loop

    t.c:38:20: note: Cost model analysis for part in loop 1:
      Vector cost: 56
      Scalar cost: 192

    and the inner loop

    t.c:38:20: note: Cost model analysis for part in loop 2:
      Vector cost: 132
      Scalar cost: 48

    and thus the vectorization is considered not profitable
    (note the same would happen in case the 2nd cost were for
    a loop outer to the 1st costing).

    Future enhancements may consider static knowledge of whether
    a loop is always entered which would allow some inefficiency
    in the vectorization of its loop header. Likewise stmts only
    reachable from a loop exit can be treated this way.

    2021-02-05  Richard Biener

            PR tree-optimization/98855
            * tree-vectorizer.h (add_stmt_cost): New overload.
            * tree-vect-slp.c (li_cost_vec_cmp): New.
            (vect_bb_slp_scalar_cost): Cost individual loop regions
            separately. Account for the scalar instance root stmt.

            * g++.dg/vect/slp-pr98855.cc: New testcase.
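The effect of costing each loop's part separately can be shown with the numbers quoted in the commit message. This is a sketch of the heuristic only; `LoopCost`, `summed_profitable` and `per_loop_profitable` are hypothetical names, GCC's real data structures are different, and treating a tie as unprofitable is an assumption made here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-loop cost record; not GCC's actual representation.
struct LoopCost {
    unsigned vector_cost;
    unsigned scalar_cost;
};

// Old-style view: add everything up and compare the totals, so a cheap
// outer loop can hide an expensive inner loop.
bool summed_profitable(const std::vector<LoopCost>& parts) {
    unsigned vec = 0, scalar = 0;
    for (const LoopCost& p : parts) {
        vec += p.vector_cost;
        scalar += p.scalar_cost;
    }
    return vec < scalar;
}

// Heuristic from the commit message: every loop's part has to be
// profitable on its own, because any loop may run zero or many times.
// Assumption: a tie counts as unprofitable.
bool per_loop_profitable(const std::vector<LoopCost>& parts) {
    assert(!parts.empty());
    for (const LoopCost& p : parts) {
        if (p.vector_cost >= p.scalar_cost)
            return false;
    }
    return true;
}
```

With the quoted costs (56 versus 192 for loop 1, 132 versus 48 for loop 2), the summed comparison is 188 versus 240 and would accept the vectorization, while the per-loop check rejects it because of loop 2.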
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #10 from Richard Biener --- (In reply to Jakub Jelinek from comment #9) > For arithmetic >> (element_precision - 1) one can just use > {,v}pxor + {,v}pcmpgtq, as in instead of return vec >> 63; do return vec < 0; > (in C++-ish way), aka VEC_COND_EXPR vec < 0, { all ones }, { 0 } > For other arithmetic shifts by scalar constant, perhaps one can replace > return vec >> 17; with return (vectype) ((uvectype) vec >> 17) | ((vec < 0) > << (64 - 17)); > - it will actually work even for non-constant scalar shift amounts because > {,v}psllq treats shift counts > 63 as 0. OK, so that yields poly_double_le2: .LFB0: .cfi_startproc vmovdqu (%rsi), %xmm0 vpxor %xmm1, %xmm1, %xmm1 vpalignr$8, %xmm0, %xmm0, %xmm2 vpcmpgtq%xmm2, %xmm1, %xmm1 vpand .LC0(%rip), %xmm1, %xmm1 vpsllq $1, %xmm0, %xmm0 vpxor %xmm1, %xmm0, %xmm0 vmovdqu %xmm0, (%rdi) ret when I feed the following to SLP2 directly: void __GIMPLE (ssa,guessed_local(1073741824),startwith("slp")) poly_double_le2 (unsigned char * out, const unsigned char * in) { long unsigned int carry; long unsigned int _1; long unsigned int _2; long unsigned int _3; long unsigned int _4; long unsigned int _5; long unsigned int _6; __int128 unsigned _9; long unsigned int _14; long unsigned int _15; long int _18; long int _19; long unsigned int _20; __BB(2,guessed_local(1073741824)): _9 = __MEM <__int128 unsigned, 8> ((char *)in_8(D)); _14 = __BIT_FIELD_REF (_9, 64u, 64u); _18 = (long int) _14; _1 = _18 < 0l ? _Literal (unsigned long) -1ul : 0ul; carry_10 = _1 & 135ul; _2 = _14 << 1; _15 = __BIT_FIELD_REF (_9, 64u, 0u); _19 = (long int) _15; _20 = _19 < 0l ? 
_Literal (unsigned long) -1ul : 0ul;
  _3 = _20 & 1ul;
  _4 = _2 ^ _3;
  _5 = _15 << 1;
  _6 = _5 ^ carry_10;
  __MEM ((char *)out_11(D)) = _6;
  __MEM ((char *)out_11(D) + _Literal (char *) 8) = _4;
  return;
}

with

[local count: 1073741824]:
  _9 = MEM <__int128 unsigned> [(char *)in_8(D)];
  _12 = VIEW_CONVERT_EXPR(_9);
  _7 = VEC_PERM_EXPR <_12, _12, { 1, 0 }>;
  vect__18.1_25 = VIEW_CONVERT_EXPR(_7);
  vect_carry_10.3_28 = .VCOND (vect__18.1_25, { 0, 0 }, { 135, 1 }, { 0, 0 }, 108);
  vect__5.0_13 = _12 << 1;
  vect__6.4_29 = vect__5.0_13 ^ vect_carry_10.3_28;
  MEM [(char *)out_11(D)] = vect__6.4_29;
  return;

in .optimized

The latency of the data is at least 7 instructions that way, compared to 4 in the not vectorized code (guess I could try Intel iaca on it).

So if that's indeed the best we can do then it's not profitable (btw, with the above the vectorizer's conclusion is not profitable but due to excessive costing of constants for the condition vectorization).

Simple asm replacement of the kernel results in

AES-128/XTS 292740 key schedule/sec; 0.00 ms/op 11571 cycles/op (2 ops in 0 ms)
AES-128/XTS encrypt buffer size 1024 bytes: 765.571 MiB/sec 4.62 cycles/byte (382.79 MiB in 500.00 ms)
AES-128/XTS decrypt buffer size 1024 bytes: 767.064 MiB/sec 4.61 cycles/byte (382.79 MiB in 499.03 ms)

compared to

AES-128/XTS 283527 key schedule/sec; 0.00 ms/op 11932 cycles/op (2 ops in 0 ms)
AES-128/XTS encrypt buffer size 1024 bytes: 768.446 MiB/sec 4.60 cycles/byte (384.22 MiB in 500.00 ms)
AES-128/XTS decrypt buffer size 1024 bytes: 769.292 MiB/sec 4.60 cycles/byte (384.22 MiB in 499.45 ms)

so that's indeed no improvement. Bigger block sizes also contain vector code but that's not exercised by the botan speed measurement.
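For readers who find the GIMPLE hard to follow, the scalar computation it encodes can be written out as ordinary C++. This is a sketch reconstructed from the GIMPLE in this comment rather than botan's actual source; it keeps the name `poly_double_le2` from the dump, and the local names `lo`, `hi`, `new_lo` and `new_hi` are made up. It assumes the two 64-bit halves are loaded in host byte order, as on a little-endian x86 target.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sketch of the scalar kernel: double a 128-bit value held as two 64-bit
// words, folding the bit shifted out of the top back in as 135.
void poly_double_le2(unsigned char* out, const unsigned char* in) {
    assert(out != nullptr && in != nullptr);
    std::uint64_t lo, hi;
    std::memcpy(&lo, in, 8);       // _15: bits 0..63
    std::memcpy(&hi, in + 8, 8);   // _14: bits 64..127

    // _1 / carry_10: all-ones mask from the sign of the high word, & 135.
    std::uint64_t carry = (hi >> 63) ? 135u : 0u;
    // _2, _3, _4: shift the high word and bring in the top bit of the low word.
    std::uint64_t new_hi = (hi << 1) ^ (lo >> 63);
    // _5, _6: shift the low word and fold in the carry.
    std::uint64_t new_lo = (lo << 1) ^ carry;

    std::memcpy(out, &new_lo, 8);
    std::memcpy(out + 8, &new_hi, 8);
}
```

The two sign tests are the `>> 63` arithmetic shifts discussed earlier in this PR, and the vectorized form has to swap the two halves first because each output word depends on the sign of the other input word.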
[Bug sanitizer/98920] [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920 --- Comment #5 from Martin Liška --- (In reply to Jakub Jelinek from comment #3) > I'm not sure if your patch is correct. > glibc has the system of earliest symbol versions, and so on certain > architectures > GLIBC_2.3.4 symver will not appear at all. > Given: > ./sysdeps/mach/hurd/shlib-versions:DEFAULTGLIBC_2.2.6 > ./sysdeps/unix/sysv/linux/csky/shlib-versions:DEFAULT > GLIBC_2.29 > ./sysdeps/unix/sysv/linux/microblaze/shlib-versions:DEFAULT > GLIBC_2.18 > ./sysdeps/unix/sysv/linux/arc/shlib-versions:DEFAULT > GLIBC_2.32 > ./sysdeps/unix/sysv/linux/m68k/coldfire/shlib-versions:DEFAULT > GLIBC_2.4 > ./sysdeps/unix/sysv/linux/arm/shlib-versions:DEFAULT > GLIBC_2.4 > ./sysdeps/unix/sysv/linux/s390/s390-64/shlib-versions:DEFAULT > GLIBC_2.2 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT > GLIBC_2.27 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT > GLIBC_2.27 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT > GLIBC_2.33 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT > GLIBC_2.33 > ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:# DEFAULT > Earliest > symbol set > ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:DEFAULT > GLIBC_2.16 > ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:# DEFAULT > Earliest > symbol set > ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:DEFAULT > GLIBC_2.2.5 > ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT > GLIBC_2.17 > ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT > GLIBC_2.3 > ./sysdeps/unix/sysv/linux/nios2/shlib-versions:DEFAULT > GLIBC_2.21 > ./sysdeps/unix/sysv/linux/aarch64/shlib-versions:DEFAULT > GLIBC_2.17 > and the limited list of arches supported by libsanitizer, I'd say at least > riscv*, powerpc64le and aarch64 (and maybe x86-64 -mx32 if supported) are > affected. Thank you for this. You are right, my patch is not correct. 
So for the archs we care about we should do:

x86_64 - require GLIBC_2.3.4
ppc64 - require GLIBC_2.3.4
aarch64, ppc64le, x32 and riscv* are newer than 2.3.4, so a default
non-versioned symbol should be fine.

Am I right?
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #8 from Richard Biener --- (In reply to Richard Biener from comment #6) > Btw, -fgcse-sm is nowhere enabled by default (same applies to -fgcse-las), > we should consider removing these optimizations (though -fgcse-las at least > sounds > useful and I wonder why it is not enabled). GCSE store-motion should be > re-implemented on GIMPLE, replacing the sink pass (there were previous > attempts in implementing SSU-PRE). > > A comment in store-motion.c claims > > /* This pass implements downward store motion. >As of May 1, 2009, the pass is not enabled by default on any target, >but bootstrap completes on ia64 and x86_64 with the pass enabled. */ > > I'm trying if enabling it by default still bootstraps & tests OK on x86-64 > (also enabling gcse-las at the same time..) It does. Extra FAILs are FAIL: c-c++-common/guality/Og-dce-2.c -Og line 17 ptr->a == 1 FAIL: c-c++-common/guality/Og-dce-2.c -Og -flto line 17 ptr->a == 1
[Bug debug/98656] [9/10 Regression] switchlower_O0 drops line number of switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |11.0
            Summary|[9/10/11 Regression]        |[9/10 Regression]
                   |switchlower_O0 drops line   |switchlower_O0 drops line
                   |number of switch            |number of switch

--- Comment #6 from Martin Liška ---
Fixed on master so far.
[Bug debug/98656] [9/10/11 Regression] switchlower_O0 drops line number of switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

--- Comment #5 from CVS Commits ---
The master branch has been updated by Martin Liska:

https://gcc.gnu.org/g:4ede02a5f2af1205434f0e05aaaeff762b24e329

commit r11-7122-g4ede02a5f2af1205434f0e05aaaeff762b24e329
Author: Tom de Vries
Date:   Fri Feb 5 10:36:38 2021 +0100

    debug: fix switch lowering debug info

    gcc/ChangeLog:

            PR debug/98656
            * tree-switch-conversion.c (jump_table_cluster::emit): Add loc
            argument.
            (bit_test_cluster::emit): Reuse location_t for newly created
            gswitch statement.
            (switch_decision_tree::try_switch_expansion): Preserve
            location_t.
            * tree-switch-conversion.h: Change function signatures.
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek ---
Started with r11-4122-g06729598b0dc10dbe60545f21c2214ad66a5a3db
[Bug lto/98971] LTO removes __patchable_function_entries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98971

--- Comment #2 from Martin Liška ---
Created attachment 50133
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50133&action=edit
Tentative patch

As seen, the flag -fpatchable-function-entry is properly marked as
Optimization. However, its argument is parsed early and stored into the
following tuple:

; How many NOP insns to place at each function entry by default
Variable
HOST_WIDE_INT function_entry_patch_area_size

; And how far the real asm entry point is into this area
Variable
HOST_WIDE_INT function_entry_patch_area_start

That does not work with set_current_function, where per-function arguments
are restored. My tentative patch fixes that.

The following example works now:

$ cat pr98971.c
int
testa7(void)
{
  return 7;
}

int
__attribute__((patchable_function_entry(10,5)))
testa77(void)
{
  return 77;
}

#pragma GCC optimize("patchable-function-entry=0,0")

int
testa_no(void)
{
  return 1234;
}

$ cat pr98971-2.c
int
testa8(void)
{
  return 8;
}

$ gcc pr98971.c -fPIC -fpatchable-function-entry=4,1 -flto -c
$ gcc pr98971-2.c -fPIC -flto -c
$ gcc pr98971.o pr98971-2.o -flto -shared -o x.so
$ objdump -d x.so
...
0650 :
 650:   f3 0f 1e fa             endbr64
 654:   e9 77 ff ff ff          jmp    5d0
 659:   90                      nop

065a :
 65a:   90                      nop
 65b:   90                      nop
 65c:   90                      nop
 65d:   55                      push   %rbp
 65e:   48 89 e5                mov    %rsp,%rbp
 661:   b8 07 00 00 00          mov    $0x7,%eax
 666:   5d                      pop    %rbp
 667:   c3                      ret
 668:   90                      nop
 669:   90                      nop
 66a:   90                      nop
 66b:   90                      nop
 66c:   90                      nop

066d :
 66d:   90                      nop
 66e:   90                      nop
 66f:   90                      nop
 670:   90                      nop
 671:   90                      nop
 672:   55                      push   %rbp
 673:   48 89 e5                mov    %rsp,%rbp
 676:   b8 4d 00 00 00          mov    $0x4d,%eax
 67b:   5d                      pop    %rbp
 67c:   c3                      ret

067d :
 67d:   55                      push   %rbp
 67e:   48 89 e5                mov    %rsp,%rbp
 681:   b8 d2 04 00 00          mov    $0x4d2,%eax
 686:   5d                      pop    %rbp
 687:   c3                      ret

0688 :
 688:   55                      push   %rbp
 689:   48 89 e5                mov    %rsp,%rbp
 68c:   b8 08 00 00 00          mov    $0x8,%eax
 691:   5d                      pop    %rbp
 692:   c3                      ret

@Gabriel: Is it intended behavior?
[Bug middle-end/98974] [11 Regression] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

--- Comment #3 from Richard Biener ---
(In reply to avieira from comment #1)
> The testcase above issues a warning, around do j=jts,enddo
>
> To use it as a testcase in my patch I'd like to get rid of it so if someone
> proficient in Fortran knows a way to get rid of it that'd be great!

The following still reproduces the issue for me and is more valid.

module module_foobar
  integer,parameter :: fp_kind = selected_real_kind(15)
contains
  subroutine foobar( foo, ix ,jx ,kx,iy,ky)
    real, dimension( ix, kx, jx ) :: foo
    real(fp_kind), dimension( iy, ky, 3 ) :: bar, baz
    do k=1,ky
      do i=1,iy
        if ( baz(i,k,1) > 0. ) then
          bar(i,k,1) = 0
        endif
        foo(i,nk,j) = baz0 * bar(i,k,1)
      enddo
    enddo
  end
end
[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
            Summary|IRA artificially creating   |[11 Regression] Bad
                   |spills due to BB            |interaction between IPA
                   |frequencies                 |frequences and IRA
                   |                            |resulting in spills due to
                   |                            |changes in BB frequencies

--- Comment #3 from Tamar Christina ---
Hi,

Since we are in stage-4 I'd like to put all our ducks in a row and see what
the options are at this point.

IRA as you can imagine is huge and quite complex; the more I investigate the
problem the more I realize that there isn't a spot fix for this issue. It will
require a lot more work in IRA and understanding parts of it that I don't
fully understand yet.

But one thing is clear: there is a severe interaction between IPA predictions
and IRA under conditions where there is high register pressure *and* a
function call.

The problem is that the changes introduced in
g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 make local changes, i.e. they
affect only some BBs and not others. The problem is that any spot fix in IRA
would be globally scoped.

I was investigating whether the issue could be solved by having IRA treat the
recursive inlined function in exchange2 as one region instead of going live
range splitting. And yes, using -fira-region=one does make a difference, but
only a small difference of about 33% of the regression. However, doing this
has a disadvantage in that regions that before would not count in the live
range of the call are now counted, so you regress spilling in those cases.
This is why this flag can only recover 33% of the regression: it introduces
some of its own.

The second alternative I tried as a spot fix is to be able to specify a weight
for the CALL_FREQ for use during situations of high reg pressure and call live
ranges.
The "hack" looks like this:

index 4fe019b2367..674e6ca7a48 100644
--- a/gcc/caller-save.c
+++ b/gcc/caller-save.c
@@ -425,6 +425,7 @@ setup_save_areas (void)
              || find_reg_note (insn, REG_NORETURN, NULL))
            continue;
          freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
+         freq = freq * (param_ira_call_freq_weight / 100.f);
          REG_SET_TO_HARD_REG_SET (hard_regs_to_save,
                                   &chain->live_throughout);
          used_regs = insn_callee_abi (insn).full_reg_clobbers ();

diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c
index 4ba29dcadf4..6e2699e5a7d 100644
--- a/gcc/ira-lives.c
+++ b/gcc/ira-lives.c
@@ -1392,7 +1392,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
                       it was saved on previous call in the same basic
                       block and the hard register was not mentioned
                       between the two calls.  */
-                   ALLOCNO_CALL_FREQ (a) += freq / 3;
+                   ALLOCNO_CALL_FREQ (a) += (freq * (param_ira_call_freq_weight / 100.0f));

diff --git a/gcc/params.opt b/gcc/params.opt
index cfed980a4d2..39d5cae9f31 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -321,6 +321,11 @@ Max size of conflict table in MB.
 Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization
 Max loops number for regional RA.

+-param=ira-call-freq-weight=
+Common Joined UInteger Var(param_ira_call_freq_weight) Init(100) Param Optimization
+Scale to be applied to the weighting of the frequencies of allocations live across
+a call.
+
 -param=iv-always-prune-cand-set-bound=
 Common Joined UInteger Var(param_iv_always_prune_cand_set_bound) Init(10) Param Optimization
 If number of candidates in the set is smaller, we always try to remove unused ivs during its optimization.

And if we look at the changes in the frequency between the good and bad case,
the prediction changes approx 40%. So using the value of
--param ira-call-freq-weight=40 recovers about 60% of the regression.

The issue this global change introduces is however that IRA seems to start
preferring callee-saves.
Which is in itself not an issue, but at the boundary of a region it will then emit moves from temp to callee-saves to carry live values to the next region. This is completely unneeded, enabling the late register renaming pass (-frename-registers) removes these superfluous moves and we recover 66% of the regression. But this is just a big hack. The obvious disadvantage here, since again it's a global change is that it pushes all caller saves to be spilled before the function call. And indeed, before the recursive call there now is a massive amount of spilling happening. But it is something that would be "safe" to do at this point in the GCC development cycle. The last and preferred approach, if you a
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #6 from Richard Biener --- Btw, -fgcse-sm is nowhere enabled by default (same applies to -fgcse-las), we should consider removing these optimizations (though -fgcse-las at least sounds useful and I wonder why it is not enabled). GCSE store-motion should be re-implemented on GIMPLE, replacing the sink pass (there were previous attempts in implementing SSU-PRE). A comment in store-motion.c claims /* This pass implements downward store motion. As of May 1, 2009, the pass is not enabled by default on any target, but bootstrap completes on ia64 and x86_64 with the pass enabled. */ I'm trying if enabling it by default still bootstraps & tests OK on x86-64 (also enabling gcse-las at the same time..)
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #9 from Jakub Jelinek --- For arithmetic >> (element_precision - 1) one can just use {,v}pxor + {,v}pcmpgtq, as in instead of return vec >> 63; do return vec < 0; (in C++-ish way), aka VEC_COND_EXPR vec < 0, { all ones }, { 0 } For other arithmetic shifts by scalar constant, perhaps one can replace return vec >> 17; with return (vectype) ((uvectype) vec >> 17) | ((vec < 0) << (64 - 17)); - it will actually work even for non-constant scalar shift amounts because {,v}psllq treats shift counts > 63 as 0.
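
For illustration only, the two identities from the comment above can be checked on scalar 64-bit values. This is a minimal C sketch, not GCC code: the helper names sign_mask and sar_emulated are made up for the example, and the scalar mask stands in for the all-ones lanes that a vector compare "vec < 0" would produce. Unlike {,v}psllq, a C shift by 64 is undefined, so the sketch assumes a shift amount of 1..63.

```c
#include <stdint.h>

/* All ones if x is negative, zero otherwise: the scalar analogue of the
   vector compare "vec < 0" (equivalent to an arithmetic x >> 63).  */
static uint64_t sign_mask(int64_t x)
{
    return x < 0 ? ~UINT64_C(0) : 0;
}

/* Arithmetic right shift built from a logical shift plus the sign mask
   shifted into the vacated high bits.  Hypothetical helper; n must be
   in 1..63 because shifting a 64-bit value by 64 is undefined in C.  */
static int64_t sar_emulated(int64_t x, unsigned n)
{
    uint64_t logical = (uint64_t)x >> n;          /* zero-filled shift      */
    uint64_t fill = sign_mask(x) << (64 - n);     /* top n bits if negative */
    return (int64_t)(logical | fill);
}
```

For example, sar_emulated(-1024, 3) gives -128, the same as an arithmetic shift, and sar_emulated(x, 63) is -1 for any negative x and 0 otherwise.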
[Bug tree-optimization/98932] Wrong output with -O3 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98932 --- Comment #7 from Kristian --- Thanks! Yes, I agree. We are however bound to earlier versions due to CUDA-dependency on the NVIDIA Jetson-platforms: https://github.com/OE4T/meta-tegra/wiki/Compatibility-notes Hopefully NVIDIA will update their CUDA-libraries in due time. Best, Kristian
[Bug fortran/98890] ICE on reference to module function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98890

Tobias Burnus changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu.org
           Keywords|                            |ice-on-invalid-code

--- Comment #3 from Tobias Burnus ---
Likewise for the following, which uses an assignment:

implicit none
contains
  real function bar(x)
    real :: x(2,2)
    bar = bar                    ! OK
    bar = baz                    ! ERROR: function name not reference
    bar = get_funptr()           ! ERROR: proc-pointer returning function
    bar = bar * x(1,1)           ! OK
    bar = baz * x(1,1)           ! error - as above but as operator
    bar = get_funptr() * x(1,1)  ! likewise
  end function bar
  function get_funptr() result(ptr)
    procedure(bar), pointer :: ptr
    ptr => bar
  end
  real function baz(x) result(bazr)
    real :: x(2,2)
    bazr=x(1,1)
  end function baz
end module foo

 * * *

I am not sure whether the problem is that expr_type == EXPR_VARIABLE instead
of expr_type == EXPR_FUNCTION, or whether the proper fix is to add, inside
both resolve_ordinary_assign() and resolve_operator(), a check like:

  symbol_attribute rhs_attr = gfc_expr_attr (rhs);
  if (rhs_attr.function && ...)
    {
      gfc_error ("Unexpected function name at %L", &rhs->where);
      return false;
    }
  if (rhs_attr.proc_pointer)
    {
      gfc_error ("Unexpected procedure pointer at %L", &rhs->where);
      return false;
    }

where "..." detects that the rhs may be used as result name in this context.
This check always confuses me. And a quick try failed: I tried rhs_attr, but
it is identical for 'bar' and 'baz'; and also 'sym->result = sym' is the same
(if changing 'baz' to use no result variable). I also thought about the
namespace, but thanks to BLOCK and contained procedures (which may access
their parent's result variable) it is not that simple.

 * * *

I have not checked but, e.g., for 'call foo(baz)' a similar issue may pop up.

Probably not occurring, but worth checking: proc_pointer_comp (should be
resolved already at parse time?) and derived-type procedures returning proc
pointers (same check as for other functions).
[Bug tree-optimization/98932] Wrong output with -O3 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98932 --- Comment #6 from Martin Liška --- (In reply to Kristian from comment #5) > Thanks for such a swift response! Looking forward to testing the patches for > 8.x. You're welcome. Just a note, please try to use the latest GCC release (version 10). GCC 8 will go out of support quite soon and you will more likely receive backports for serious bugs.
[Bug tree-optimization/98932] Wrong output with -O3 on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98932 --- Comment #5 from Kristian --- Thanks for such a swift response! Looking forward to testing the patches for 8.x.
[Bug testsuite/98325] [11 regression] gcc.dg/pr25376.c fails after r11-5027
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98325

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Jakub Jelinek ---
Fixed, verified in gcc-testresults archive too.
[Bug middle-end/98974] [11 Regression] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-02-05
      Known to fail|                            |11.0
             Status|UNCONFIRMED                 |NEW
                 CC|                            |ktkachov at gcc dot gnu.org
            Summary|ICE in                      |[11 Regression] ICE in
                   |vectorizable_condition      |vectorizable_condition
                   |after STMT_VINFO_VEC_STMTS  |after STMT_VINFO_VEC_STMTS
           Priority|P3                          |P1
   Target Milestone|---                         |11.0
     Ever confirmed|0                           |1
             Target|                            |aarch64
      Known to work|                            |10.2.1

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed. This affects building 521.wrf_r from SPEC2017 with LTO
[Bug tree-optimization/98949] gcc-9.3 aarch64 -ftree-vectorize generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98949

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #4 from ktkachov at gcc dot gnu.org ---
I can confirm that the commit
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82
from PR97236 fixes the abort here.
[Bug middle-end/98974] ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974 --- Comment #1 from avieira at gcc dot gnu.org --- The testcase above issues a warning, around do j=jts,enddo To use it as a testcase in my patch I'd like to get rid of it so if someone proficient in Fortran knows a way to get rid of it that'd be great!
[Bug middle-end/98974] New: ICE in vectorizable_condition after STMT_VINFO_VEC_STMTS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98974

            Bug ID: 98974
           Summary: ICE in vectorizable_condition after
                    STMT_VINFO_VEC_STMTS
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

Hi,

After
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b05d5563f4be13b4a0d0951375a82adf483973c0
we found vectorizable_condition to ICE when autovectorizing for SVE. The
reduced fortran testcase is an example of this:

$ cat foo.F90
module module_foobar
  integer,parameter :: fp_kind = selected_real_kind(15)
contains
  subroutine foobar( foo, ix ,jx ,kx,iy,ky)
    real, dimension( ix, kx, jx ) :: foo
    real(fp_kind), dimension( iy, ky, 3 ) :: bar, baz
    j_loop: do j=jts,enddo
      do k=0,ky
        do i=0,iy
          if ( baz(i,k,1) > 0. ) then
            bar(i,k,1) = 0
          endif
          foo(i,nk,j) = baz0 * bar(i,k,1)
        enddo
      enddo
    enddo j_loop
  end
end

And the following command will cause it to ICE:

$ gfortran -Ofast -mcpu=neoverse-v1 foo.F90 -S

I have debugged this and I believe the issue is that before Richi's change
vectorizable_condition used to set vec_oprnds0 to vec_cond_lhs for each copy.
Now it is collected for all copies at the same time. However, when calling
vect_get_loop_mask we pass vec_num * ncopies as the nvectors parameter, where
vec_num has been set to the length of vec_oprnds0. I believe that because we
are now doing all ncopies at the same time we no longer need to multiply it by
ncopies.

I'll be posting a patch for this soon.
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #8 from Richard Biener ---
exploring more options I noticed there's no arithmetic vector V2DI right
shift, so vectorizing

    uint64_t carry = (uint64_t)(((int64_t)W[1]) >> 63) & (uint64_t)135;
    W[1] = (W[1] << 1) ^ ((uint64_t)(((int64_t)W[0]) >> 63) & (uint64_t)1);
    W[0] = (W[0] << 1) ^ carry;

didn't work out. But V2DI >> CST with CST > 31 can be implemented with VPSRAD
and then doing PMOVSXDQ after shuffling the high shifted part into low
position. Maybe there's sth more clever for the special case of >> 63 even.

As said, just trying if "optimal" vectorization of the kernel would solve the
issue. But I guess pipelines are wide enough so the original scalar code
effectively executes "vectorized".
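
As a rough sketch of what this kernel computes, here it is as two standalone C functions: one in the shift form quoted above and one in the compare form discussed elsewhere in this PR. The function names poly_double_shift and poly_double_cmp are made up for the example and are not botan's or GCC's. The shift form relies on right shifts of negative signed values being arithmetic, which is implementation-defined in C (GCC documents it as arithmetic).

```c
#include <stdint.h>

/* Shift form, as quoted in the comment: an arithmetic >> 63 smears the
   sign bit across the whole word, and the mask picks the constant to XOR.  */
static void poly_double_shift(uint64_t W[2])
{
    uint64_t carry = (uint64_t)(((int64_t)W[1]) >> 63) & (uint64_t)135;
    W[1] = (W[1] << 1) ^ ((uint64_t)(((int64_t)W[0]) >> 63) & (uint64_t)1);
    W[0] = (W[0] << 1) ^ carry;
}

/* Compare form: the same result using "x < 0 ? all ones : 0" in place of
   the arithmetic shift, which is what a vector compare would produce.  */
static void poly_double_cmp(uint64_t W[2])
{
    uint64_t hi_mask = (int64_t)W[1] < 0 ? ~UINT64_C(0) : 0;
    uint64_t lo_mask = (int64_t)W[0] < 0 ? ~UINT64_C(0) : 0;
    uint64_t carry = hi_mask & 135;
    W[1] = (W[1] << 1) ^ (lo_mask & 1);
    W[0] = (W[0] << 1) ^ carry;
}
```

With W = { 0, 0x8000000000000000 } either form leaves W[0] == 135 and W[1] == 0: the bit shifted out of the high word is folded back into the low word as the constant 135.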
[Bug middle-end/98465] [11 Regression] Bogus -Wstringop-overread with -std=gnu++20 -O2 and std::string::insert
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98465

--- Comment #28 from Jakub Jelinek ---
Actually tested version. The above testcase with [2, INF] range doesn't make
really much sense, but adjusted testcase where n has [0, 2] range doesn't warn
anymore like the one with constant 2.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index b57ff339990..69336a32bc6 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -731,6 +731,10 @@ namespace std
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #endif

+#if _GLIBCXX_HAS_BUILTIN(__builtin_object_size)
+# define _GLIBCXX_HAVE_BUILTIN_OBJECT_SIZE 1
+#endif
+
 #undef _GLIBCXX_HAS_BUILTIN

 #if _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED && __cplusplus >= 201402L
diff --git a/libstdc++-v3/include/bits/basic_string.tcc b/libstdc++-v3/include/bits/basic_string.tcc
index 5beda8b829b..bc6e0b98186 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -477,7 +477,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
            {
              if (__s + __len2 <= __p + __len1)
                this->_S_move(__p, __s, __len2);
+#if defined(_GLIBCXX_HAVE_BUILTIN_OBJECT_SIZE) && defined(__OPTIMIZE__)
+             /* Help the optimizers rule out impossible cases and
+                get rid of false positive warnings at the same time.
+                If we know the maximum size of the __s object and
+                it is shorter than 2 * __len2 - __len1, then
+                __s >= __p + __len1 case is impossible.  */
+             else if (!(__builtin_constant_p(__builtin_object_size(__s, 0)
+                                             < ((2 * __len2 - __len1)
+                                                * sizeof(_CharT)))
+                        && (__builtin_object_size(__s, 0)
+                            < (2 * __len2 - __len1) * sizeof(_CharT)))
+                      && __s >= __p + __len1)
+#else
              else if (__s >= __p + __len1)
+#endif
                this->_S_copy(__p, __s + __len2 - __len1, __len2);
              else
                {
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #5 from Andreas Krebbel --- Created attachment 50132 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50132&action=edit RTL dump from store motion pass
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #4 from Andreas Krebbel --- The update of global variable c is moved out of the loop. Due to that c stays at 8 although it should be counted down to 2.
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #3 from Andreas Krebbel --- Created attachment 50131 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50131&action=edit RTL GCSE dump without -fgcse-sm
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #2 from Andreas Krebbel --- Created attachment 50130 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50130&action=edit RTL GCSE dump with -fgcse-sm
[Bug debug/98656] [9/10/11 Regression] switchlower_O0 drops line number of switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

Martin Liška changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #4 from Martin Liška ---
I'm going to test the patch and install it.
[Bug target/98957] [11 Regression] [x86] Odd code generation for 8-bit right shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98957

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Jakub Jelinek ---
Fixed.
[Bug c++/98967] warning to spot recursive include graph
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98967

Eric Gallager changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96842
                 CC|                            |egallager at gcc dot gnu.org

--- Comment #1 from Eric Gallager ---
Fixing bug 96842 would also help here (not exactly the same thing, but serves
a similar purpose)
[Bug target/98957] [11 Regression] [x86] Odd code generation for 8-bit right shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98957

--- Comment #6 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:37876976b0511ec96741f638f160874f2added0e

commit r11-7121-g37876976b0511ec96741f638f160874f2added0e
Author: Jakub Jelinek
Date:   Fri Feb 5 10:39:03 2021 +0100

    i386: Fix up TARGET_QIMODE_MATH for many AMD CPU tunings [PR98957]

    As written in the PR, TARGET_QIMODE_MATH was meant to be set for all
    tunings and it was the case for GCC <= 7, but as the number of
    PROCESSOR_* enumerators grew, some AMD tunings (which are at the end
    of the list) over time got enumerators with values >= 32 and
    TARGET_QIMODE_MATH became disabled for them, in GCC 8 for 2 tunings,
    in GCC 9 for 7 tunings, in GCC 10 for 8 tunings, and on the trunk for
    11 tunings.

    The following patch fixes it by using uhwis rather than uints and gives
    them also symbolic names.

    2021-02-05  Jakub Jelinek

            PR target/98957
            * config/i386/i386-options.c (m_NONE, m_ALL): Define.
            * config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS,
            X86_TUNE_PROMOTE_QI_REGS): Use m_NONE instead of 0U.
            (X86_TUNE_QIMODE_MATH): Use m_ALL instead of ~0U.
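
The failure mode described in the commit message can be reproduced in isolation. The following is a minimal C sketch, not the actual i386 backend code: MASK_ALL_32, MASK_ALL_64 and tuning_enabled are hypothetical stand-ins for a "~0U" style mask, a 64-bit all-ones mask, and a per-processor bit test against a 64-bit feature mask.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two ways of writing "all tunings".  */
#define MASK_ALL_32 (~0U)            /* 32-bit all ones, zero-extends to 64 bits */
#define MASK_ALL_64 (~UINT64_C(0))   /* 64-bit all ones                          */

/* Returns nonzero if the bit for the given processor index (assumed to be
   below 64) is set in the 64-bit feature mask.  */
static int tuning_enabled(uint64_t feature_mask, unsigned processor)
{
    return (feature_mask & (UINT64_C(1) << processor)) != 0;
}
```

Because ~0U is a 32-bit value, it widens to 0x00000000ffffffff, so tuning_enabled(MASK_ALL_32, 31) is true while tuning_enabled(MASK_ALL_32, 32) is false; with MASK_ALL_64 every index below 64 is covered.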
[Bug c/60759] improve -Wlogical-op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60759 --- Comment #7 from Vincent Lefèvre --- (In reply to Manuel López-Ibáñez from comment #6) > I believe this is on purpose to avoid too much noise. The warning in GCC > needs to be smarter about types and macros and avoid early folding. Well, for the constant-logical-operand case, the warning on X || Y should be on "true" constants X and Y (which is stricter than what __builtin_constant_p regards as constants). I don't think that there would be much noise in this case, or this could be a separate option like clang's -Wconstant-logical-operand, which can thus easily be enabled/disabled.
[Bug c++/97878] [8/9/10 Regression] ICE in cxx_eval_outermost_constant_expr, at cp/constexpr.c:6825
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97878

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[8/9/10/11 Regression] ICE  |[8/9/10 Regression] ICE in
                   |in                          |cxx_eval_outermost_constant
                   |cxx_eval_outermost_constant |_expr, at
                   |_expr, at                   |cp/constexpr.c:6825
                   |cp/constexpr.c:6825         |

--- Comment #6 from Jakub Jelinek ---
Fixed on the trunk so far.
[Bug c++/97878] [8/9/10/11 Regression] ICE in cxx_eval_outermost_constant_expr, at cp/constexpr.c:6825
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97878

--- Comment #5 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:b229baa75ce4627d1bd38f2d3dcd91af1a7071db

commit r11-7120-gb229baa75ce4627d1bd38f2d3dcd91af1a7071db
Author: Jakub Jelinek
Date:   Fri Feb 5 10:22:07 2021 +0100

    c++: Fix ICE with structured binding initialized to incomplete array [PR97878]

    We ICE on the following testcase, for incomplete array a on auto [b] { a };
    without giving any kind of diagnostics, with auto [c] = a; during
    error-recovery. The problem is that we get too far through
    check_initializer and e.g. store_init_value -> constexpr stuff can't deal
    with incomplete array types.

    As the type of the structured binding artificial variable is always
    deduced, I think it is easiest to diagnose this early, even if they have
    array types we'll need their deduced type to be complete rather than just
    its element type.

    2021-02-05  Jakub Jelinek

            PR c++/97878
            * decl.c (check_array_initializer): For structured bindings,
            require the array type to be complete.

            * g++.dg/cpp1z/decomp54.C: New test.