[Bug tree-optimization/107891] Redudant "double" permutation from SLP vectorization (PR97832)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107891 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 CC||rguenth at gcc dot gnu.org, ||rsandifo at gcc dot gnu.org Blocks||53947 Component|middle-end |tree-optimization Summary|Redudant "double" |Redudant "double" |permutation from PR97832|permutation from SLP ||vectorization (PR97832) Last reconfirmed||2022-11-28 Status|UNCONFIRMED |NEW Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 [Bug 53947] [meta-bug] vectorizer missed-optimizations
[Bug tree-optimization/107888] [12/13 Regression] Missed min/max transformation in phiopt due to VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107888 Richard Biener changed: What|Removed |Added Target Milestone|--- |12.3 --- Comment #1 from Richard Biener --- which means we fail to optimize a > b ? 1 : b as well, no?
[Bug c/107890] UB on integer overflow impacts code flow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107890 Martin Uecker changed: What|Removed |Added CC||muecker at gwdg dot de --- Comment #3 from Martin Uecker --- Of course, instead of using the standard as an excuse, we could also try to make the compiler less of a footgun. Even if this is standard conforming, it is still a severe usability issue with safety implications and I do not think we should simply close such bugs.
[Bug demangler/107884] H8/300: cp-demangle.c fix warning related demangle.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107884 Richard Biener changed: What|Removed |Added CC||jsm28 at gcc dot gnu.org Target||h8 --- Comment #3 from Richard Biener --- Since there are, I think, no ABI constraints here, simply using the apparently unused bits to get them to fit into 16 bits looks possible? Supposedly C defines literal suffixes for int32_t? Otherwise using (1L << 17) might work as well here.
[Bug rtl-optimization/107892] New: Unnecessary move between ymm registers in loop using AVX2 intrinsic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107892 Bug ID: 107892 Summary: Unnecessary move between ymm registers in loop using AVX2 intrinsic Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ebiggers3 at gmail dot com Target Milestone: ---

To reproduce with the latest trunk, compile the following .c file on x86_64 at -O2:

#include <immintrin.h>

int __attribute__((target("avx2")))
sum_ints(const __m256i *p, size_t n)
{
    __m256i a = _mm256_setzero_si256();
    __m128i b;

    do {
        a = _mm256_add_epi32(a, *p++);
    } while (--n);

    b = _mm_add_epi32(_mm256_extracti128_si256(a, 0),
                      _mm256_extracti128_si256(a, 1));
    b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x31));
    b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x02));
    return _mm_cvtsi128_si32(b);
}

The assembly that gcc generates is:

   0:  c5 f1 ef c9        vpxor  %xmm1,%xmm1,%xmm1
   4:  0f 1f 40 00        nopl   0x0(%rax)
   8:  c5 f5 fe 07        vpaddd (%rdi),%ymm1,%ymm0
   c:  48 83 c7 20        add    $0x20,%rdi
  10:  c5 fd 6f c8        vmovdqa %ymm0,%ymm1
  14:  48 83 ee 01        sub    $0x1,%rsi
  18:  75 ee              jne    8
  1a:  c4 e3 7d 39 c1 01  vextracti128 $0x1,%ymm0,%xmm1
  20:  c5 f9 fe c1        vpaddd %xmm1,%xmm0,%xmm0
  24:  c5 f9 70 c8 31     vpshufd $0x31,%xmm0,%xmm1
  29:  c5 f1 fe c8        vpaddd %xmm0,%xmm1,%xmm1
  2d:  c5 f9 70 c1 02     vpshufd $0x2,%xmm1,%xmm0
  32:  c5 f9 fe c1        vpaddd %xmm1,%xmm0,%xmm0
  36:  c5 f9 7e c0        vmovd  %xmm0,%eax
  3a:  c5 f8 77           vzeroupper
  3d:  c3                 ret

The bug is that the inner loop contains an unnecessary vmovdqa:

   8:  vpaddd (%rdi),%ymm1,%ymm0
       add    $0x20,%rdi
       vmovdqa %ymm0,%ymm1
       sub    $0x1,%rsi
       jne    8

It should look like the following instead:

   8:  vpaddd (%rdi),%ymm0,%ymm0
       add    $0x20,%rdi
       sub    $0x1,%rsi
       jne    8

Strangely, the bug goes away if the __v8si type is used instead of __m256i and the addition is done using "+=" instead of _mm256_add_epi32():

int __attribute__((target("avx2")))
sum_ints_good(const __v8si *p, size_t n)
{
    __v8si a = {};
    __m128i b;

    do {
        a += *p++;
    } while (--n);

    b = _mm_add_epi32(_mm256_extracti128_si256((__m256i)a, 0),
                      _mm256_extracti128_si256((__m256i)a, 1));
    b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x31));
    b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x02));
    return _mm_cvtsi128_si32(b);
}

In the bad version, I noticed that the RTL initially has two separate insns for 'a += *p': one to do the addition and write the result to a new pseudo register, and one to convert the value from mode V8SI to V4DI and assign it to the original pseudo register. These two separate insns never get combined. (That sort of explains why the bug isn't seen with the __v8si and += method; gcc doesn't do a type conversion with that method.)

So, I'm wondering if the bug is in the instruction combining pass. Or perhaps the RTL should never have had two separate insns in the first place?
[Bug analyzer/107882] [13 Regression] ICE in get_last_bit_offset, at analyzer/store.h:255 since 13-2582-g0ea5e3f4542832b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107882 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.0
[Bug tree-optimization/107879] [13 Regression] ffmpeg-4 test suite fails on FPU arithmetics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107879 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug tree-optimization/107876] [13 Regression] ICE in verify_dominators, at dominance.cc:1184 (error: dominator of 4 should be 14, not 16) since r13-3749-g7314b98b1bcd382c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107876 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #3 from Richard Biener --- Mine.
[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863 --- Comment #9 from Hongtao.liu ---
expand_expr_real_1 generates (const_int 255) without considering the target mode. I guess it's on purpose, so I'll leave that alone and only change the expander in the backend. After applying convert_modes to (const_int 255), it's transformed to (const_int -1), which should fix the issue.

---cut from expand_expr_real_1--
    case INTEGER_CST:
      {
        /* Given that TYPE_PRECISION (type) is not always equal to
           GET_MODE_PRECISION (TYPE_MODE (type)), we need to extend from
           the former to the latter according to the signedness of the
           type.  */
        scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (type);
        temp = immed_wide_int_const
          (wi::to_wide (exp, GET_MODE_PRECISION (int_mode)), int_mode);
        return temp;
      }
---cut ends

Proposed patch:

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0373c3614a4..c639ee3a9f7 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -12475,7 +12475,7 @@ ix86_expand_vec_set_builtin (tree exp)
   op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
   elt = get_element_number (TREE_TYPE (arg0), arg2);

-  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
+  if (GET_MODE (op1) != mode1)
     op1 = convert_modes (mode1, GET_MODE (op1), op1, true);

   op0 = force_reg (tmode, op0);
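The step from (const_int 255) to (const_int -1) can be illustrated outside the compiler. The following is only a sketch, assuming the constant ends up in an 8-bit mode and that constants are kept sign-extended to the width of their mode; `sign_extend_to_width` is a hypothetical helper written for this note, not a GCC function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): keep the low WIDTH bits of
   VALUE and sign-extend them, mimicking how a constant is canonicalized
   for a WIDTH-bit integer mode.  WIDTH must be in [1, 63].  */
static int64_t
sign_extend_to_width (int64_t value, unsigned width)
{
  uint64_t mask = (UINT64_C (1) << width) - 1;
  uint64_t sign = UINT64_C (1) << (width - 1);
  uint64_t low = (uint64_t) value & mask;

  /* XOR with the sign bit, then subtract it: a portable sign extension.  */
  return (int64_t) (low ^ sign) - (int64_t) sign;
}

/* 255 does not fit a signed 8-bit value, so it canonicalizes to -1;
   values that already fit are unchanged.  */
static void
check_qimode_example (void)
{
  assert (sign_extend_to_width (255, 8) == -1);
  assert (sign_extend_to_width (127, 8) == 127);
  assert (sign_extend_to_width (128, 8) == -128);
}
```

Under those assumptions, 255 with `-funsigned-char` and -1 describe the same 8-bit pattern, which is presumably why the instruction pattern only matches once the constant is converted to the operand's mode.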
[Bug middle-end/107891] Redudant "double" permutation from PR97832
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107891 --- Comment #1 from Hongtao.liu --- From comment 25 of PR97832: "I guess that's possible but the SLP vectorizer has a permute optimization phase (and SLP discovery itself), it would be nice to see why the former doesn't elide the permutes here."
[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #26 from Hongtao.liu --- > I guess that's possible but the SLP vectorizer has a permute optimization > phase (and SLP discovery itself), it would be nice to see why the former > doesn't elide the permutes here. I've opened PR107891 for it.
[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #25 from rguenther at suse dot de ---
On Mon, 28 Nov 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
>
> --- Comment #24 from Hongtao.liu ---
> _233 = {f_im_36, f_re_35, f_re_35, f_re_35};
> _217 = {f_re_35, f_im_36, f_im_36, f_im_36};
> ...
> vect_x_re_55.15_227 = VEC_PERM_EXPR { 0, 5, 6, 7 }>;
> vect_x_re_55.23_211 = VEC_PERM_EXPR vect_x_im_61.14_228, { 0, 5, 6, 7 }>;
> ...
> vect_y_re_69.17_224 = .FNMA (vect_x_re_55.15_227, _233, vect_y_re_63.9_237);
> vect_y_re_69.25_208 = .FNMA (vect_x_re_55.23_211, _217,
> vect_y_re_69.17_224);
>
> is equal to
>
> _233 = {f_im_36,f_im_36, f_im_36, f_im_36}
> _217 = {f_re_35, f_re_35, f_re_35, f_re_35};
> ...
> vect_y_re_69.17_224 = .FNMA (vect_x_im_61.14_228, _233, vect_y_re_63.9_237)
> vect_y_re_69.25_208 = .FNMA (vect_x_im_61.13_230, _217, vect_y_re_69.17_224)
>
> A simplification in match.pd?

I guess that's possible but the SLP vectorizer has a permute optimization phase (and SLP discovery itself), it would be nice to see why the former doesn't elide the permutes here.
[Bug c++/107889] Incorrect parsing of qualified friend function returning decltype(auto)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107889 Martin Liška changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2022-11-28 CC||marxin at gcc dot gnu.org --- Comment #1 from Martin Liška --- Clang accepts the code.
[Bug fortran/107872] ICE on recursive DT with DTIO since r7-4096-gbf9f15ee55f5b291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107872 Martin Liška changed: What|Removed |Added CC||marxin at gcc dot gnu.org, ||pault at gcc dot gnu.org Summary|ICE on recursive DT with|ICE on recursive DT with |DTIO|DTIO since ||r7-4096-gbf9f15ee55f5b291 --- Comment #2 from Martin Liška --- Started likely with r7-4096-gbf9f15ee55f5b291.
[Bug analyzer/107882] [13 Regression] ICE in get_last_bit_offset, at analyzer/store.h:255
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107882 Martin Liška changed: What|Removed |Added CC||marxin at gcc dot gnu.org, ||tlange at gcc dot gnu.org Status|UNCONFIRMED |NEW Last reconfirmed||2022-11-28 Ever confirmed|0 |1 --- Comment #1 from Martin Liška --- Started with r13-2582-g0ea5e3f4542832b8.
[Bug tree-optimization/107876] [13 Regression] ICE in verify_dominators, at dominance.cc:1184 (error: dominator of 4 should be 14, not 16) since r13-3749-g7314b98b1bcd382c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107876 Martin Liška changed: What|Removed |Added Summary|[13 Regression] ICE in |[13 Regression] ICE in |verify_dominators, at |verify_dominators, at |dominance.cc:1184 (error: |dominance.cc:1184 (error: |dominator of 4 should be|dominator of 4 should be |14, not 16) |14, not 16) since ||r13-3749-g7314b98b1bcd382c CC||marxin at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #2 from Martin Liška --- Started with r13-3749-g7314b98b1bcd382c.
[Bug middle-end/107891] New: Redudant "double" permutation from PR97832
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107891 Bug ID: 107891 Summary: Redudant "double" permutation from PR97832 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: ---

#include <stddef.h>

void foo1x1(double* restrict y, const double* restrict x, int clen)
{
  int xi = clen & 2;
  double f_re = x[0+xi+0];
  double f_im = x[4+xi+0];
  ptrdiff_t clen2 = (clen+xi) * 2;
  //#pragma GCC unroll 0
  for (ptrdiff_t c = 0; c < clen2; c += 8) {
    // y[c] = y[c] - x[c]*conj(f);
    //#pragma GCC unroll 4
    for (ptrdiff_t k = 0; k < 4; ++k) {
      double x_re = x[c+0+k];
      double x_im = x[c+4+k];
      double y_re = y[c+0+k];
      double y_im = y[c+4+k];
      y_re = y_re - x_re * f_re - x_im * f_im;
      y_im = y_im + x_re * f_im - x_im * f_re;
      y[c+0+k] = y_re;
      y[c+4+k] = y_im;
    }
  }
}

-Ofast -mavx2 -mfma generates an extra blendpd compared to -O3 -mavx2 -mfma, and the blendpd is redundant since there are "double" permutations for the multiplication operands in the FMA.

They're computing the same thing since we also do the same "permutation" for the invariants f_re and f_im. Can we eliminate that in the vectorizer?

  _232 = {f_im_36, f_im_36, f_im_36, f_im_36};
  _231 = {f_im_36, f_re_35, f_re_35, f_re_35}; --- here
  _216 = {f_re_35, f_re_35, f_re_35, f_re_35};
  _215 = {f_re_35, f_im_36, f_im_36, f_im_36}; -- and here.
  ivtmp.36_221 = (unsigned long) y_41(D);
  ivtmp.38_61 = (unsigned long) x_33(D);

  [local count: 214748368]:
  # ivtmp.32_66 = PHI
  # ivtmp.36_64 = PHI
  # ivtmp.38_220 = PHI
  # DEBUG c => NULL
  # DEBUG k => 0
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG D#78 => D#79 * 8
  # DEBUG D#77 => x_33(D) + D#78
  _62 = (void *) ivtmp.38_220;
  vect_x_im_61.13_228 = MEM [(const double *)_62];
  vect_x_im_61.14_226 = MEM [(const double *)_62 + 32B];
  vect_x_re_55.15_225 = VEC_PERM_EXPR ; - here.
  vect_x_re_55.23_209 = VEC_PERM_EXPR ; - here
  # DEBUG D#76 => *D#77
  # DEBUG x_re => D#76
  # DEBUG BEGIN_STMT
  # DEBUG D#74 => (long unsigned int) D#75
  # DEBUG D#73 => D#74 * 8
  # DEBUG D#72 => x_33(D) + D#73
  # DEBUG D#71 => *D#72
  # DEBUG x_im => D#71
  # DEBUG BEGIN_STMT
  # DEBUG D#70 => y_41(D) + D#78
  _59 = (void *) ivtmp.36_64;
  vect_y_re_63.9_235 = MEM [(double *)_59];
  vect_y_re_63.10_233 = MEM [(double *)_59 + 32B];
  vect__42.18_219 = .FMA (vect_x_im_61.13_228, _232, vect_y_re_63.10_233);
  vect_y_re_69.17_222 = .FNMA (vect_x_re_55.15_225, _231, vect_y_re_63.9_235);
  vect_y_re_69.25_206 = .FNMA (vect_x_re_55.23_209, _215, vect_y_re_69.17_222);
  vect_y_re_69.25_205 = .FNMA (_216, vect_x_im_61.14_226, vect__42.18_219);

and

  _233 = {f_im_36, f_re_35, f_re_35, f_re_35};
  _217 = {f_re_35, f_im_36, f_im_36, f_im_36};
  ...
  vect_x_re_55.15_227 = VEC_PERM_EXPR ;
  vect_x_re_55.23_211 = VEC_PERM_EXPR ;
  ...
  vect_y_re_69.17_224 = .FNMA (vect_x_re_55.15_227, _233, vect_y_re_63.9_237);
  vect_y_re_69.25_208 = .FNMA (vect_x_re_55.23_211, _217, vect_y_re_69.17_224);

is equal to

  _233 = {f_im_36,f_im_36, f_im_36, f_im_36}
  _217 = {f_re_35, f_re_35, f_re_35, f_re_35};
  ...
  vect_y_re_69.17_224 = .FNMA (vect_x_im_61.14_228, _233, vect_y_re_63.9_237)
  vect_y_re_69.25_208 = .FNMA (vect_x_im_61.13_230, _217, vect_y_re_69.17_224)

A simplification in match.pd?
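The claimed equivalence can be checked with a small scalar model. This is only a sketch: the permute operands are missing from the dumps above, so the operand order passed to `blend_0567` is an assumption, and all function and variable names are invented for the illustration. The two forms subtract the products in a different order in lanes 1..3, so they are only guaranteed to agree bit-for-bit when reassociation is allowed (as with -Ofast) or, as here, when every intermediate value is exactly representable.

```c
#include <assert.h>

#define LANES 4

/* Models VEC_PERM_EXPR <a, b, { 0, 5, 6, 7 }>: lane 0 comes from A,
   lanes 1..3 come from B.  */
static void
blend_0567 (const double *a, const double *b, double *out)
{
  out[0] = a[0];
  for (int i = 1; i < LANES; i++)
    out[i] = b[i];
}

/* Models .FNMA (a, b, c), i.e. c - a * b in every lane.  */
static void
fnma (const double *a, const double *b, const double *c, double *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = c[i] - a[i] * b[i];
}

/* Compares the permuted form with the splat-only form on small integer
   inputs, where every step is exact.  */
static void
check_permutes_redundant (void)
{
  const double x13[LANES] = { 1, 2, 3, 4 };   /* first load from x */
  const double x14[LANES] = { 5, 6, 7, 8 };   /* second load from x */
  const double y[LANES] = { 100, 200, 300, 400 };
  const double f_re = 3, f_im = 2;

  /* Form 1: blended invariants multiplied by blended loads.  */
  const double c233[LANES] = { f_im, f_re, f_re, f_re };
  const double c217[LANES] = { f_re, f_im, f_im, f_im };
  double p227[LANES], p211[LANES], t1[LANES], r1[LANES];
  blend_0567 (x14, x13, p227);   /* assumed operand order */
  blend_0567 (x13, x14, p211);
  fnma (p227, c233, y, t1);
  fnma (p211, c217, t1, r1);

  /* Form 2: splat invariants, no permutes at all.  */
  const double s_im[LANES] = { f_im, f_im, f_im, f_im };
  const double s_re[LANES] = { f_re, f_re, f_re, f_re };
  double t2[LANES], r2[LANES];
  fnma (x14, s_im, y, t2);
  fnma (x13, s_re, t2, r2);

  for (int i = 0; i < LANES; i++)
    assert (r1[i] == r2[i]);
}
```

Each lane multiplies the same pairs of values in both forms; the blend only moves a pair from one FNMA to the other, which is why it can be dropped when the invariants are not blended either.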
[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #24 from Hongtao.liu ---
  _233 = {f_im_36, f_re_35, f_re_35, f_re_35};
  _217 = {f_re_35, f_im_36, f_im_36, f_im_36};
  ...
  vect_x_re_55.15_227 = VEC_PERM_EXPR ;
  vect_x_re_55.23_211 = VEC_PERM_EXPR ;
  ...
  vect_y_re_69.17_224 = .FNMA (vect_x_re_55.15_227, _233, vect_y_re_63.9_237);
  vect_y_re_69.25_208 = .FNMA (vect_x_re_55.23_211, _217, vect_y_re_69.17_224);

is equal to

  _233 = {f_im_36,f_im_36, f_im_36, f_im_36}
  _217 = {f_re_35, f_re_35, f_re_35, f_re_35};
  ...
  vect_y_re_69.17_224 = .FNMA (vect_x_im_61.14_228, _233, vect_y_re_63.9_237)
  vect_y_re_69.25_208 = .FNMA (vect_x_im_61.13_230, _217, vect_y_re_69.17_224)

A simplification in match.pd?
[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #23 from Hongtao.liu ---
> the blends do not look like no-ops so I wonder if this is really computing
> the same thing ... (it swaps lane 0 from the two loads from x but not the
> stores)

They're computing the same thing since we also do the same "permutation" for the invariants f_re and f_im. Can we eliminate that in the vectorizer?

  _232 = {f_im_36, f_im_36, f_im_36, f_im_36};
  _231 = {f_im_36, f_re_35, f_re_35, f_re_35}; --- here
  _216 = {f_re_35, f_re_35, f_re_35, f_re_35};
  _215 = {f_re_35, f_im_36, f_im_36, f_im_36}; -- and here.
  ivtmp.36_221 = (unsigned long) y_41(D);
  ivtmp.38_61 = (unsigned long) x_33(D);

  [local count: 214748368]:
  # ivtmp.32_66 = PHI
  # ivtmp.36_64 = PHI
  # ivtmp.38_220 = PHI
  # DEBUG c => NULL
  # DEBUG k => 0
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG D#78 => D#79 * 8
  # DEBUG D#77 => x_33(D) + D#78
  _62 = (void *) ivtmp.38_220;
  vect_x_im_61.13_228 = MEM [(const double *)_62];
  vect_x_im_61.14_226 = MEM [(const double *)_62 + 32B];
  vect_x_re_55.15_225 = VEC_PERM_EXPR ;
  vect_x_re_55.23_209 = VEC_PERM_EXPR ;
  # DEBUG D#76 => *D#77
  # DEBUG x_re => D#76
  # DEBUG BEGIN_STMT
  # DEBUG D#74 => (long unsigned int) D#75
  # DEBUG D#73 => D#74 * 8
  # DEBUG D#72 => x_33(D) + D#73
  # DEBUG D#71 => *D#72
  # DEBUG x_im => D#71
  # DEBUG BEGIN_STMT
  # DEBUG D#70 => y_41(D) + D#78
  _59 = (void *) ivtmp.36_64;
  vect_y_re_63.9_235 = MEM [(double *)_59];
  vect_y_re_63.10_233 = MEM [(double *)_59 + 32B];
  vect__42.18_219 = .FMA (vect_x_im_61.13_228, _232, vect_y_re_63.10_233);
  vect_y_re_69.17_222 = .FNMA (vect_x_re_55.15_225, _231, vect_y_re_63.9_235);
  vect_y_re_69.25_206 = .FNMA (vect_x_re_55.23_209, _215, vect_y_re_69.17_222);
  vect_y_re_69.25_205 = .FNMA (_216, vect_x_im_61.14_226, vect__42.18_219);
[Bug target/104271] [12/13 Regression] 538.imagick_r run-time at -Ofast -march=native regressed by 26% on Intel Cascade Lake server CPU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271 --- Comment #12 from cuilili --- This regression was caused by a store-forwarding issue. Inlining eliminates the two redundant load/store pairs that suffer from it. The regression has been fixed by https://gcc.gnu.org/g:1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a
[Bug debug/105145] dropped DWARF location information at -O1/-O2/-O3 upon ftree-dse
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105145 --- Comment #2 from Andrew Pinski --- *** Bug 105248 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/105248] gimple level DSE does not add DEBUG statement when deleting store to ADDRESSABLE local decl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105248 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from Andrew Pinski --- The problem is exactly the same as PR 105145 so closing as a dup. *** This bug has been marked as a duplicate of bug 105145 ***
[Bug middle-end/107494] -ffinite-loops does not show it is enabled with --help by default for C++11+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107494 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Summary|-ffinite-loops is not |-ffinite-loops does not |enable by default |show it is enabled with ||--help by default for ||C++11+ Last reconfirmed||2022-11-28 Status|UNCONFIRMED |NEW --- Comment #2 from Andrew Pinski ---
Confirmed.

  /* Exit early if we can (e.g. -help).  */
  if (!exit_after_options)
    {
      /* Just in case lang_hooks.post_options ends up calling a debug_hook.
         This can happen with incorrect pre-processed input. */
      debug_hooks = &do_nothing_debug_hooks;
      /* Allow the front end to perform consistency checks and do further
         initialization based on the command line options.  This hook also
         sets the original filename if appropriate (e.g. foo.i -> foo.c)
         so we can correctly initialize debug output.  */
      bool no_backend = lang_hooks.post_options (&main_input_filename);

So the language hook that does the SET_OPTION_IF_UNSET is not called at all.
[Bug tree-optimization/91882] boolean XOR tautology missed optimisation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91882 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |13.0
[Bug tree-optimization/107887] (bool0 > bool1) | bool1 is not optimized to bool0 | bool1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107887 Andrew Pinski changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|pinskia at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #2 from Andrew Pinski --- Hmm, the code of reassociation is somewhat hard to follow. So I am not going to work on this.
[Bug tree-optimization/107881] (a <= b) == (b >= a) should be optimized to (a == b)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107881 Andrew Pinski changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|pinskia at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #8 from Andrew Pinski --- Hmm, the code of reassociation is somewhat hard to follow. So I am not going to work on this.
[Bug tree-optimization/107881] (a <= b) == (b >= a) should be optimized to (a == b)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107881 Andrew Pinski changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #7 from Andrew Pinski --- Mine.
[Bug tree-optimization/107887] (bool0 > bool1) | bool1 is not optimized to bool0 | bool1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107887 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Last reconfirmed||2022-11-28 Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=107881 --- Comment #1 from Andrew Pinski --- There is some discussion in bug 107881 comment #6 on how to implement this inside reassociation. I am going to try to figure out how to handle this there.
[Bug tree-optimization/107881] (a <= b) == (b >= a) should be optimized to (a == b)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107881 --- Comment #6 from Andrew Pinski --- I was thinking about having reassociation change bool == bool, bool < bool, and bool <= bool into ~(bool ^ bool), !bool & bool, and !bool | bool respectively ("linearizing" them), so that reassociation can handle the rest (with the xor patch still? or we change ^ to how we expand xor like it is done in the patch), and then, when finalizing, simplify back to ==, <, and <= (and ^).
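The three rewrites proposed in this comment can be verified exhaustively, since each operand has only two values. The following is a minimal sketch with invented function names; `!(a ^ b)` stands in for the one-bit `~(a ^ b)`, because `~` applied to a promoted `int` in C would flip all bits.

```c
#include <assert.h>
#include <stdbool.h>

/* "Linearized" forms of the boolean comparisons.  */
static bool eq_linear (bool a, bool b) { return !(a ^ b); }   /* a == b */
static bool lt_linear (bool a, bool b) { return !a & b; }     /* a <  b */
static bool le_linear (bool a, bool b) { return !a | b; }     /* a <= b */

/* Checks all four input combinations against the plain comparisons.  */
static void
check_linearization (void)
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      {
        bool a = i, b = j;
        assert (eq_linear (a, b) == (a == b));
        assert (lt_linear (a, b) == (a < b));
        assert (le_linear (a, b) == (a <= b));
      }
}
```

Once every comparison is expressed with `^`, `&`, `|` and `!`, a pass that already reasons about those operators has a uniform set of operations to work with, which seems to be the point of the linearization step.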
[Bug target/107748] [13 Regression] Isn't _mm_cvtsbh_ss incorrect?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748 --- Comment #11 from Hongtao.liu --- Fixed in GCC13.
[Bug target/107748] [13 Regression] Isn't _mm_cvtsbh_ss incorrect?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748 --- Comment #10 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:a1ecc5600464f6a62faab246d522b6328badda90

commit r13-4314-ga1ecc5600464f6a62faab246d522b6328badda90
Author: liuhongt
Date: Wed Nov 23 21:58:09 2022 +0800

    Fix incorrect _mm_cvtsbh_ss.

    After supporting real __bf16, the implementation of _mm_cvtsbh_ss went
    wrong. The patch adds a builtin to generate pslld for the intrinsic;
    also, extendbfsf2 is supported with pslld when !HONOR_NANS (BFmode).

    truncsfbf2 is supported with vcvtneps2bf16 when
    !HONOR_NANS (BFmode) && flag_unsafe_math_optimizations.

    gcc/ChangeLog:

            PR target/107748
            * config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Refined.
            * config/i386/i386-builtin-types.def (FLOAT_FTYPE_BFLOAT16):
            New function type.
            * config/i386/i386-builtin.def (BDESC): New builtin.
            * config/i386/i386-expand.cc (ix86_expand_args_builtin):
            Handle the builtin.
            * config/i386/i386.md (extendbfsf2): New expander.
            (extendbfsf2_1): New define_insn.
            (truncsfbf2): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Scan pslld.
            * gcc.target/i386/extendbfsf.c: New test.
[Bug c/107890] UB on integer overflow impacts code flow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107890 --- Comment #2 from Jonathan Wakely --- You should read https://blog.regehr.org/archives/213
[Bug c/107890] UB on integer overflow impacts code flow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107890 Andrew Pinski changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #1 from Andrew Pinski ---
> I was under the impression that this kind of undefined behavior essentially
> meant that the value of that integer could become unreliable.

Your impression is incorrect. Once undefined behavior happens, anything can happen. This is why things like -fsanitize=undefined exist now.
[Bug c/107890] New: UB on integer overflow impacts code flow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107890 Bug ID: 107890 Summary: UB on integer overflow impacts code flow Product: gcc Version: 12.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: gcc at pkh dot me Target Milestone: ---

The following code is sensitive to signed integer overflow. I was under the impression that this kind of undefined behavior essentially meant that the value of that integer could become unreliable. But apparently this is not limited to the value of said integer; it can also dramatically impact the code flow.

Here is the pathological code:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint8_t tab[0x1ff + 1];

uint8_t f(int32_t x)
{
    if (x < 0)
        return 0;
    int32_t i = x * 0x1ff / 0x;
    if (i >= 0 && i < sizeof(tab)) {
        printf("tab[%d] looks safe because %d is between [0;%d[\n", i, i, (int)sizeof(tab));
        return tab[i];
    }
    return 0;
}

int main(int ac, char **av)
{
    return f(atoi(av[1]));
}

Triggering an overflow actually enters the printf/dereference scope, violating the protective condition and thus causing a crash:

% cc -Wall -O2 overflow.c -o overflow && ./overflow 5000
tab[62183] looks safe because 62183 is between [0;512[
zsh: segmentation fault (core dumped)  ./overflow 5000

I feel extremely uncomfortable about an integer overflow actually impacting something other than the integer itself. Is this expected, or is it a bug?
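One way to keep the range check meaningful is to detect the overflow before it happens, so that no undefined behavior occurs at all. The following is only a sketch, not a fix proposed in the report: `scaled_index` and `f_checked` are hypothetical names, and the divisor is truncated in the text above, so the value of `DIVISOR` is an assumption made purely for illustration. `__builtin_mul_overflow` is a GCC and Clang builtin that returns true when the multiplication does not fit the result type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCALE   0x1ff
#define DIVISOR 0xffff   /* assumption: the report's divisor is truncated */

static uint8_t tab[SCALE + 1];

/* Stores x * SCALE / DIVISOR in *INDEX and returns true only when the
   product does not overflow and the result is a valid index into tab.  */
static bool
scaled_index (int32_t x, size_t *index)
{
  int32_t product;

  if (x < 0)
    return false;
  if (__builtin_mul_overflow (x, (int32_t) SCALE, &product))
    return false;               /* would have been undefined behavior */
  int32_t i = product / DIVISOR;
  if ((size_t) i >= sizeof (tab))
    return false;
  *index = (size_t) i;
  return true;
}

/* Overflow-safe counterpart of f() from the report.  */
static uint8_t
f_checked (int32_t x)
{
  size_t i;
  return scaled_index (x, &i) ? tab[i] : 0;
}

static void
check_scaled_index (void)
{
  size_t i = 0;
  assert (scaled_index (65535, &i) && i == 511);
  assert (!scaled_index (131070, &i));      /* index 1022 is out of range */
  assert (!scaled_index (INT32_MAX, &i));   /* product overflows */
  assert (f_checked (INT32_MAX) == 0);
}
```

Because the overflowing multiplication is never executed, the compiler has no undefined behavior to reason from and the bounds check cannot be optimized away on that basis. Doing the arithmetic in `int64_t` would be another option under the same assumptions.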
[Bug analyzer/107807] gcc.dg/analyzer/errno-1.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107807 --- Comment #7 from ro at CeBiTec dot Uni-Bielefeld.DE ---
> --- Comment #6 from Rainer Orth ---
> It did in last night's Solaris bootstraps (sparc and x86). macOS bootstraps are
> super-slow, so I'll wait for tomorrow night's weekly bootstraps there and report
> back when they are finished.

The Mac OS X 10.7 bootstraps have finished now and, as expected, the failures are gone.
[Bug fortran/107874] merge not using all its arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107874 --- Comment #5 from Steve Kargl ---
On Sun, Nov 27, 2022 at 08:00:35PM +, anlauf at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107874
>
> --- Comment #3 from anlauf at gcc dot gnu.org ---
> (In reply to kargl from comment #2)
> > Harald, you are likely right the patch can be moved down. I'll programmed
> > up the example from the Fortran 2018 standard, which works as expected. So,
> > there is definitely something about a scalar mask choosing the actual
> > argument before both are evaluated.
> >
> >    program foo
>
> Steve,
>
> this example from the standard seems to be working down to 7.5 for me.
> Am I missing something? Do we need this in the testsuite?

You are not missing anything. I wanted an example that works with or without the patch John included, so that we don't accidentally introduce a regression.

> I'd say it's rather the following two lines replacing the loop in the
> reproducer in comment#0:
>
>   print *, merge(tstuff(),fstuff(),.true.)
>   print *, merge(tstuff(),fstuff(),.false.)
>
> This is mis-simplified in simplify.cc:4909

Good find! This may indeed be a source of the issue.
[Bug libstdc++/107815] 20_util/to_chars/float128_c++23.cc FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107815 --- Comment #16 from dave.anglin at bell dot net --- This is what the test prints: 6.47518e-4966 6e-4966 xxx.cc:79: void test(std::chars_format): Assertion 'ec4 == std::errc() && ptr4 == ptr1' failed. ABORT instruction (core dumped)
[Bug fortran/107874] merge not using all its arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107874 --- Comment #4 from anlauf at gcc dot gnu.org ---
The following patch fixes comment#3:

diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index 9c2fea8c5f2..2f69c4369ab 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -4913,6 +4914,11 @@ gfc_simplify_merge (gfc_expr *tsource, gfc_expr *fsource, gfc_expr *mask)

   if (mask->expr_type == EXPR_CONSTANT)
     {
+      /* The standard requires evaluation of all function arguments.
+         Simplify only when TSOURCE, FSOURCE are constant expressions.  */
+      if (!gfc_is_constant_expr (tsource) || !gfc_is_constant_expr (fsource))
+        return NULL;
+
       result = gfc_copy_expr (mask->value.logical ? tsource : fsource);
       /* Parenthesis is needed to get lower bounds of 1.  */
       result = gfc_get_parentheses (result);

This leads to a "regression" for gfortran.dg/merge_init_expr_2.f90, which is due to the pattern matching the old, faulty simplification result. That's trivial to fix, though.
[Bug libstdc++/107815] 20_util/to_chars/float128_c++23.cc FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107815 --- Comment #15 from Jakub Jelinek ---
(In reply to dave.anglin from comment #14)
> /home/dave/gnu/gcc/gcc/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc:77: void test(std::chars_format): Assertion 'ec4 == std::errc() && ptr4 == ptr1' failed.
> FAIL: 20_util/to_chars/float128_c++23.cc execution test

Can you provide more info? E.g. try to run the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107815#c5 program and attach here what it prints, uncomment the
//std::cout << u << ' ' << std::string_view (str1, ptr1) << '\n';
line at least to see which test it is (if also the max() or some other one)? Thanks.
[Bug libstdc++/107815] 20_util/to_chars/float128_c++23.cc FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107815 --- Comment #14 from dave.anglin at bell dot net ---
/home/dave/gnu/gcc/gcc/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc:77: void test(std::chars_format): Assertion 'ec4 == std::errc() && ptr4 == ptr1' failed.
FAIL: 20_util/to_chars/float128_c++23.cc execution test
[Bug fortran/107819] ICE in gfc_check_argument_var_dependency, at fortran/dependency.cc:978
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107819 anlauf at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org --- Comment #13 from anlauf at gcc dot gnu.org --- Submitted: https://gcc.gnu.org/pipermail/fortran/2022-November/058556.html
[Bug libstdc++/107886] Problem witch std::latch, std::binary_semaphores in C++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |WAITING Ever confirmed|0 |1 Last reconfirmed||2022-11-27 --- Comment #7 from Jonathan Wakely --- (In reply to Jamaika from comment #0) > https://github.com/meganz/mingw-std-threads/issues/67 Please read https://gcc.gnu.org/bugs/ and provide the requested info, not just a URL.
[Bug fortran/107874] merge not using all its arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107874 --- Comment #3 from anlauf at gcc dot gnu.org ---
(In reply to kargl from comment #2)
> Harald, you are likely right the patch can be moved down. I'll programmed
> up the example from the Fortran 2018 standard, which works as expected. So,
> there is definitely something about a scalar mask choosing the actual
> argument before both are evaluated.
>
>    program foo

Steve,

this example from the standard seems to be working down to 7.5 for me. Am I missing something? Do we need this in the testsuite?

I'd say it's rather the following two lines replacing the loop in the reproducer in comment#0:

  print *, merge(tstuff(),fstuff(),.true.)
  print *, merge(tstuff(),fstuff(),.false.)

This is mis-simplified in simplify.cc:4909

gfc_expr *
gfc_simplify_merge (gfc_expr *tsource, gfc_expr *fsource, gfc_expr *mask)
{
  gfc_expr * result;
  gfc_constructor *tsource_ctor, *fsource_ctor, *mask_ctor;

  if (mask->expr_type == EXPR_CONSTANT)
    {
      result = gfc_copy_expr (mask->value.logical ? tsource : fsource);
      /* Parenthesis is needed to get lower bounds of 1.  */
      result = gfc_get_parentheses (result);
      gfc_simplify_expr (result, 1);
      return result;
    }

So unless tsource and fsource are both constant, we have to give up here.
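Why picking one source early is observable can be shown with an analogy in C rather than Fortran. This is only a sketch: `tstuff` and `fstuff` borrow their names from the reproducer, but their bodies, the call counters, and `merge_int` are invented here. A function call evaluates all of its arguments, whereas the simplified form behaves like a conditional expression and evaluates only one of them.

```c
#include <assert.h>
#include <stdbool.h>

static int tcalls, fcalls;   /* how often each source function ran */

static int tstuff (void) { tcalls++; return 1; }
static int fstuff (void) { fcalls++; return 2; }

/* Stand-in for MERGE: both sources are evaluated by the caller before
   the mask selects one of them.  */
static int
merge_int (int tsource, int fsource, bool mask)
{
  return mask ? tsource : fsource;
}

/* What the call is supposed to do.  */
static int call_merge (bool mask) { return merge_int (tstuff (), fstuff (), mask); }

/* What the early simplification effectively turns it into.  */
static int simplified (bool mask) { return mask ? tstuff () : fstuff (); }

static void
check_side_effects (void)
{
  tcalls = fcalls = 0;
  assert (call_merge (true) == 1);
  assert (tcalls == 1 && fcalls == 1);   /* both functions ran */

  tcalls = fcalls = 0;
  assert (simplified (true) == 1);
  assert (tcalls == 1 && fcalls == 0);   /* fstuff was skipped */
}
```

Both forms return the same value; they differ only in side effects, which is why the simplification is harmless for constant sources and wrong otherwise.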
[Bug c++/107889] New: Incorrect parsing of qualified friend function returning decltype(auto)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107889 Bug ID: 107889 Summary: Incorrect parsing of qualified friend function returning decltype(auto) Product: gcc Version: 12.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: gcc at nospam dot scs.stanford.edu Target Milestone: ---

G++ 12.2.0 rejects a valid friend declaration for a fully-qualified function returning `decltype(auto)`. To reproduce the problem, you can try to compile the following code with `g++ -std=c++20 -c bug.cc`:

decltype(auto)
f()
{
}

struct S {
  friend decltype(auto) ::f();
};

This results in the following error:

$ c++ -std=c++20 -c bug.cc
bug.cc:7:27: error: 'decltype(auto)' is not a class type
    7 |   friend decltype(auto) ::f();
      |                           ^
bug.cc:7:27: error: 'decltype(auto)' is not a class type
bug.cc:7:27: error: 'decltype(auto)' is not a class type
bug.cc:7:29: error: 'decltype(auto)' is not a class type
    7 |   friend decltype(auto) ::f();
      |                             ^
bug.cc:7:10: error: ISO C++ forbids declaration of 'f' with no type [-fpermissive]
    7 |   friend decltype(auto) ::f();
      |          ^~
bug.cc:7:29: error: invalid use of 'decltype(auto)'
    7 |   friend decltype(auto) ::f();

A similar problem was reported in bug #59766 for friend functions returning auto. It seems to have been mostly fixed, but the combination of decltype(auto) and the function name being qualified (::f) is still a problem.
[Bug tree-optimization/107888] New: [12/13 Regression] Missed min/max transformation in phiopt due to VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107888

            Bug ID: 107888
           Summary: [12/13 Regression] Missed min/max transformation in
                    phiopt due to VRP
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
#define bool _Bool
int maxbool(bool ab, bool bb)
{
  int a = ab;
  int b = bb;
  int c;
  if (a > b)
    c = a;
  else
    c = b;
  return c;
}
```
We miss that c is the max of a and b because VRP decides to change the phi.
We get out of VRP:
```
  if (a_3 > b_5)
    goto ; [INV]
  else
    goto ; [INV]

  :

  :
  # c_1 = PHI <1(2), b_5(3)>
```
What VRP is doing is correct; it is just harder to optimize to a max (and
then to an `|`).

In the above case we could optimize `bool0 ? 1 : bool1` to `bool0 | bool1`.
But then we end up with PR 107887 too.

You can also end up with the above issue where you know the only overlap
between the two arguments is [5,6]:
```
int max(int ab, int bb)
{
  if (ab < 5) __builtin_trap();
  if (bb > 6) __builtin_trap();
  int a = ab;
  int b = bb;
  int c;
  if (a >= b)
    c = a;
  else
    c = b;
  return c;
}
```
Which we cannot optimize based on zero/one any more.
(Note this version of max has been an issue since at least GCC 4.1; I suspect
since VRP was added.)
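As a sanity check of the equivalence discussed in the report, the sketch below compares three forms over all four boolean inputs: the source-level max, the `(a > b) ? 1 : b` form that corresponds to the PHI left by VRP, and the plain `|`. The function names are made up for this illustration and do not come from GCC or its testsuite.

```c
#include <stdbool.h>

/* Source-level form from the report: max of two bools widened to int. */
static int max_via_branch(bool ab, bool bb)
{
    int a = ab;
    int b = bb;
    return (a > b) ? a : b;
}

/* Form matching the PHI after VRP: a > b can only hold for a == 1 and
   b == 0, so the "then" value is the constant 1. */
static int max_after_vrp(bool ab, bool bb)
{
    int a = ab;
    int b = bb;
    return (a > b) ? 1 : b;
}

/* The form the report hopes to reach. */
static int max_via_or(bool ab, bool bb)
{
    return ab | bb;
}

/* Returns the number of inputs on which the three forms disagree. */
int count_bool_max_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            int ref = max_via_branch(a, b);
            if (ref != max_after_vrp(a, b) || ref != max_via_or(a, b))
                mismatches++;
        }
    return mismatches;
}
```

`count_bool_max_mismatches()` returning 0 only confirms that the rewrite is value-preserving for booleans; it says nothing about which pass should perform it.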
[Bug tree-optimization/107887] (bool0 > bool1) | bool1 is not optimized to bool0 | bool1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107887

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug tree-optimization/107887] New: (bool0 > bool1) | bool1 is not optimized to bool0 | bool1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107887

            Bug ID: 107887
           Summary: (bool0 > bool1) | bool1 is not optimized to bool0 |
                    bool1
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
_Bool max(_Bool aa, _Bool bb)
{
  _Bool t = aa > bb;
  return t | bb;
}
```

This should be optimized to just `return aa | bb;`

I accidentally found this while working on PR 101805.
The original testcase in which I found it:
```
int ii(_Bool aa, _Bool bb)
{
  int c;
  int a = aa;
  int b = bb;
  if (a > b)
    c = a;
  else
    c = b;
  if (c)
    return 100;
  return c;
}
```
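The claimed identity `(aa > bb) | bb == aa | bb` is small enough to check exhaustively. A minimal sketch follows; the helper names are illustrative only and are not part of any GCC testcase.

```c
#include <stdbool.h>

/* The expression as written in the report. */
static bool or_with_compare(bool aa, bool bb)
{
    bool t = aa > bb;
    return t | bb;
}

/* The expected simplified form. */
static bool plain_or(bool aa, bool bb)
{
    return aa | bb;
}

/* Returns the number of boolean input pairs on which the two forms differ. */
int count_or_mismatches(void)
{
    int mismatches = 0;
    for (int aa = 0; aa <= 1; aa++)
        for (int bb = 0; bb <= 1; bb++)
            if (or_with_compare(aa, bb) != plain_or(aa, bb))
                mismatches++;
    return mismatches;
}
```

The only input where `aa > bb` is true is `aa = 1, bb = 0`, which is also the only input where `aa | bb` is true while `bb` is false, so the two forms agree everywhere.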
[Bug libstdc++/107886] Problem witch std::latch, std::binary_semaphores in C++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886

--- Comment #6 from Jamaika ---
I don't understand something. Why does _GLIBCXX_HAS_GTHREADS work for
std::jthread but not for std::latch?
```
#if defined _GLIBCXX_HAS_GTHREADS || defined _GLIBCXX_HAVE_LINUX_FUTEX
# define __cpp_lib_atomic_wait 201907L
# if __cpp_aligned_new
# define __cpp_lib_barrier 201907L
# endif
#endif
```
[Bug libstdc++/107886] Problem witch std::latch, std::binary_semaphores in C++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886 --- Comment #5 from Jamaika --- I test gcc 13.0.0. No change
[Bug libstdc++/107886] Problem witch std::latch, std::binary_semaphores in C++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886 --- Comment #4 from Jamaika --- I test gcc 13.0.0. No change
[Bug libstdc++/107886] Problem witch std::latch, std::binary_semaphores in C++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886

--- Comment #3 from Jamaika ---
(In reply to Andrew Pinski from comment #2)
> Also it might be the case mingw work is needed to support
> __cpp_lib_atomic_wait and all.

I test gcc 13.0.0. No change.
http://msystem.waw.pl/x265/mingw-gcc1300-20221124.7z
[Bug libstdc++/107886] Problem witch std::latch, std::binary_semaphores in C++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886 --- Comment #2 from Andrew Pinski --- Also it might be the case mingw work is needed to support __cpp_lib_atomic_wait and all.
[Bug libstdc++/107886] Problem witch std::latch, std::binary_semaphores in C++20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886

--- Comment #1 from Andrew Pinski ---
Have you tried GCC 12? C++20 support was barely there in GCC 11.
For example, r12-10-gb52aef3a8cbcc8 improved latch support.
[Bug c++/107886] New: Problem witch std::latch, std::binary_semaphores in C++2a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107886

            Bug ID: 107886
           Summary: Problem witch std::latch, std::binary_semaphores in
                    C++2a
           Product: gcc
           Version: 11.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lukaszcz18 at wp dot pl
  Target Milestone: ---

https://github.com/meganz/mingw-std-threads/issues/67
[Bug c++/99576] [coroutines] destructor of a temporary called too early within co_await expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99576

--- Comment #11 from Adrian Perl ---
Yeah, my mistake. My IDE failed to look up the function and a short search on
the internet revealed only builtin_trap
(https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html).

You just saved me hours with the -j hint! I assumed it was not applicable as
it is not used in the guide (https://gcc.gnu.org/contribute.html).

Thanks
[Bug c++/99576] [coroutines] destructor of a temporary called too early within co_await expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99576

--- Comment #10 from Iain Sandoe ---
(In reply to Adrian Perl from comment #9)
> Thanks for the advice.
>
> I hope you meant __builtin_trap() as I can't find a __builtin_abort()
> function.

hmm .. I meant __builtin_abort () ... it is widely used in the testsuite for
the reasons mentioned (try grepping for it in gcc/testsuite/gcc.dg to see some
examples).

> I have now written test applications for all relevant bug reports (99576,
> 100611, 101976, 101367).

great!

> I also verified that it fixes 107288, but did not add a test as it requires
> boost asio.

The way to deal with cases like that is to take the .ii file (so that the
dependencies on external headers are removed) and then reduce it to something
usable as a test. Such reductions vary in difficulty (using tools like c-vise
or creduce can help, sometimes it's possible to do it manually too). [I'm not
asking you to do this right now, but mentioning that this is the approach used
in such cases].

> Unfortunately I was wrong that the patch will fix 102217 and 101244. They
> use similar examples but also the ternary operator, which still leads to an
> invalid statement error when used in co_awaits.

Yes, this is a different problem for which I have some work in progress, but
not ready for publication just yet.

> I will send the patch together with the testfiles as soon as the testsuite
> has finished. Is it normal that it takes more than 6 hours to complete?

depends on your hardware .. my fastest box takes about 2 hours, my slowest
nearly a week :) .. so long as you are using "-jN" on the make line where N ≈
the number of threads your hardware will accommodate, that's about the best
you can do.

If you plan on working more with GCC there is also the option to get an
account on the "compile farm" which gives you access to more platform versions
and some quite powerful hardware.
[Bug c++/99576] [coroutines] destructor of a temporary called too early within co_await expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99576

--- Comment #9 from Adrian Perl ---
Thanks for the advice.

I hope you meant __builtin_trap() as I can't find a __builtin_abort()
function.

I have now written test applications for all relevant bug reports (99576,
100611, 101976, 101367).
I also verified that it fixes 107288, but did not add a test as it requires
boost asio.

Unfortunately I was wrong that the patch will fix 102217 and 101244. They use
similar examples but also the ternary operator, which still leads to an
invalid statement error when used in co_awaits.

I will send the patch together with the testfiles as soon as the testsuite has
finished. Is it normal that it takes more than 6 hours to complete?
[Bug demangler/107884] H8/300: cp-demangle.c fix warning related demangle.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107884

--- Comment #2 from SASANO Takayoshi ---
Hello, can you tell me in more detail what to do?

I think "some better way" would be one of the following:

1) change "#if __INT_WIDTH__ > 16 ~ #else ~ #endif" to
   "#if defined(__INT_WIDTH__) && (__INT_WIDTH__ <= 16) ~ #else ~ #endif"
   as a safer choice.
2) remove "#define DMGL_OPT_BIT(x)" so that all "#define DMGL_..." use
   (1 << x).
3) abandon remapping the bit positions for int=16bit architectures, and
   modify the code so that it can pass 32bit-value options.
4) others (I have no idea...)

Please tell me.
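To make option 1 concrete, here is a rough sketch of the guard shape. It does not reproduce demangle.h: `EXAMPLE_OPT_BIT`, `EXAMPLE_FLAG_HIGH` and `example_flag_high_value` are hypothetical names, and building the mask in `long` on narrow-int targets is an assumption for illustration, relying only on the C guarantee that `long` has at least 32 bits.

```c
/* Hypothetical macros for illustration; the real definitions in
   demangle.h may look different. */
#if defined(__INT_WIDTH__) && (__INT_WIDTH__ <= 16)
/* int is known to be narrow: build the mask in long, which the C standard
   guarantees to be at least 32 bits wide. */
# define EXAMPLE_OPT_BIT(x) (1L << (x))
#else
/* Either __INT_WIDTH__ is wider than 16 bits, or the compiler does not
   define it at all and the default path is taken.  Caveat: on a 16-bit-int
   compiler that lacks __INT_WIDTH__, this branch would still overflow. */
# define EXAMPLE_OPT_BIT(x) (1 << (x))
#endif

/* A flag that does not fit into a 16-bit int. */
#define EXAMPLE_FLAG_HIGH EXAMPLE_OPT_BIT(17)

/* Returns the flag value as a long so callers see the same number on
   either branch. */
long example_flag_high_value(void)
{
    return EXAMPLE_FLAG_HIGH;
}
```

The point of the `defined(...)` test is that a compiler without `__INT_WIDTH__` falls through to the default branch instead of silently taking the 16-bit one, since an undefined macro evaluates to 0 in `#if`.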