[Bug target/112532] [14 Regression] ICE: in extract_insn, at recog.cc:2804 (unrecognizable insn: vec_duplicate:V4HI) with -O -msse4 since r14-5388-g2794d510b979be

2023-11-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112532 --- Comment #3 from Hongtao.liu --- mine.

[Bug tree-optimization/112104] loop of ^1 should just be reduced to ^(n&1)

2023-11-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 --- Comment #5 from Hongtao.liu --- (In reply to Andrew Pinski from comment #4) > Fixed via r14-5428-gfd1596f9962569afff6c9298a7c79686c6950bef . Note: my patch only handles a constant trip count for XOR, but does not do the transformation when

[Bug target/112374] [14 Regression] `--with-arch=skylake-avx512 --with-cpu=skylake-avx512` causes a comparison failure

2023-11-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #12 from Hongtao.liu --- > So the testsuite without bootstrap is really unchanged? We still have a Yes, no extra regressions were observed in the gcc testsuite (both w/ and w/o --with-arch=skylake-avx512 --with-cpu=skylake-avx512 in

[Bug target/112374] [14 Regression] `--with-arch=skylake-avx512 --with-cpu=skylake-avx512` causes a comparison failure

2023-11-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #10 from Hongtao.liu --- The patch below passes bootstrap with --with-arch=skylake-avx512 --with-cpu=skylake-avx512, but I didn't observe an obvious typo/bug in the pattern. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index

[Bug fortran/106402] half preicision is not supported by gfortran(real*2).

2023-11-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106402 --- Comment #3 from Hongtao.liu --- (In reply to Thomas Koenig from comment #2) > It would make sense to have it, I guess. If somebody has access > to the relevant hardware, it could also be tested :-) x86 supports _Float16 operations with

[Bug libfortran/110966] should matmul_c8_avx512f be updated with matmul_c8_x86-64-v4.

2023-11-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966 --- Comment #6 from Hongtao.liu --- (In reply to Thomas Koenig from comment #5) > (In reply to Hongtao.liu from comment #4) > > (In reply to anlauf from comment #3) > > > (In reply to Hongtao.liu from comment #2) > > > > (In reply to Richard

[Bug tree-optimization/112496] [13/14 Regression] ICE: in vectorizable_nonlinear_induction, at tree-vect-loop.cc with bit fields

2023-11-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112496 --- Comment #3 from Hongtao.liu --- (In reply to Richard Biener from comment #2) > if (TREE_CODE (init_expr) == INTEGER_CST) > init_expr = fold_convert (TREE_TYPE (vectype), init_expr); > else > gcc_assert (tree_nop_conversion_p

[Bug target/112374] [14 Regression] `--with-arch=skylake-avx512 --with-cpu=skylake-avx512` causes a comparison failure

2023-11-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #9 from Hongtao.liu --- When I remove all cond_ patterns, it passes bootstrap. I'll continue to root-cause the exact pattern that causes the bootstrap failure.

[Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl

2023-11-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443 --- Comment #7 from Hongtao.liu --- Should be fixed in GCC14/GCC13/GCC12.

[Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl

2023-11-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443 --- Comment #1 from Hongtao.liu --- The change below fixes it; there's a typo in 2 splitters. @@ -17082,7 +17082,7 @@ (define_insn_and_split "*avx2_pcmp3_4" (match_dup 4))] UNSPEC_BLENDV))] { - if (INTVAL

[Bug bootstrap/112441] Comparing stages 2 and 3 Bootstrap comparison failure!

2023-11-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112441 Hongtao.liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/112374] [14 Regression] `--with-arch=skylake-avx512 --with-cpu=skylake-avx512` causes a comparison failure

2023-11-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #7

[Bug bootstrap/112441] New: Comparing stages 2 and 3 Bootstrap comparison failure!

2023-11-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112441 Bug ID: 112441 Summary: Comparing stages 2 and 3 Bootstrap comparison failure! Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug target/112393] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1208 with -mavx5124fmaps -Wuninitialized

2023-11-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112393 --- Comment #5 from Hongtao.liu --- Fixed.

[Bug target/112393] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1208 with -mavx5124fmaps -Wuninitialized

2023-11-05 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112393 --- Comment #3 from Hongtao.liu --- Yes, it should return true if d->testing_p instead of generating RTL code.

[Bug rtl-optimization/108707] suboptimal allocation with same memory op for many different instructions.

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707 Hongtao.liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/102383] Missing optimization for PRE after enable O2 vectorization

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102383 --- Comment #5 from Hongtao.liu --- It's fixed in GCC12.1

[Bug target/105034] [11/12/13/14 regression]Suboptimal codegen for min/max with -Os

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 101956, which changed state. Bug 101956 Summary: Miss vectorization from v4hi to v4df https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101956 What|Removed |Added

[Bug tree-optimization/101956] Miss vectorization from v4hi to v4df

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101956 Hongtao.liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015 --- Comment #4 from Hongtao.liu --- > So here we have a reduction for MAX_EXPR, but there are 2 MAX_EXPRs which can > be merged together with MAX_EXPR > > Created PR112324.

[Bug middle-end/112324] New: phiopt fail to recog if (b < 0) max = MAX(-b, max); else max = MAX (b, max) into max = MAX (ABS(b), max)

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112324 Bug ID: 112324 Summary: phiopt fail to recog if (b < 0) max = MAX(-b, max); else max = MAX (b, max) into max = MAX (ABS(b), max) Product: gcc Version: 14.0 Status:
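
The rewrite requested in PR112324 is valid because both arms compute the maximum of |b| and max. A minimal C sketch of the two forms (function names are hypothetical; INT_MIN is excluded from the domain because negating it overflows):

```c
#include <stdlib.h> /* abs */

/* Branchy form from the bug title (hypothetical name). */
static int max_branchy(int b, int max)
{
    if (b < 0)
        max = (-b > max) ? -b : max;
    else
        max = (b > max) ? b : max;
    return max;
}

/* Form phiopt could produce: a single MAX of ABS(b) (hypothetical name). */
static int max_abs(int b, int max)
{
    int a = abs(b);
    return (a > max) ? a : max;
}
```

With a single MAX_EXPR per iteration, the loop is back to a plain MAX reduction that the vectorizer already handles.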

[Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16

2023-10-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015 --- Comment #3 from Hongtao.liu --- 169test.c:85:23: note: vect_is_simple_use: operand max_38 = PHI , type of def: unknown 170test.c:85:23: missed: Unsupported pattern. 171test.c:62:24: missed: not vectorized: unsupported use in stmt.

[Bug target/112276] [14 Regression] wrong code with -O2 -msse4.2 since r14-4964-g7eed861e8ca3f5

2023-10-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112276 --- Comment #8 from Hongtao.liu --- Fixed.

[Bug tree-optimization/112104] loop of ^1 should just be reduced to ^(n&1)

2023-10-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 --- Comment #3 from Hongtao.liu --- We already have analyze_and_compute_bitop_with_inv_effect, but it only works when inv is an SSA_NAME; it should be extended to constants.
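
Each pair of XORs with the same loop-invariant value cancels, so XOR-ing x with c a total of n times leaves x ^ (n & 1 ? c : 0); with c == 1 this is the ^(n&1) from the bug title. A minimal C sketch of the equivalence (function names are hypothetical):

```c
/* The loop as written: XOR x with the invariant c, n times (hypothetical name). */
static unsigned xor_loop(unsigned x, unsigned c, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        x ^= c;
    return x;
}

/* Closed form: only the parity of n matters (hypothetical name). */
static unsigned xor_closed(unsigned x, unsigned c, unsigned n)
{
    return (n & 1) ? x ^ c : x;
}
```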

[Bug target/112276] [14 Regression] wrong code with -O2 -msse4.2 since r14-4964-g7eed861e8ca3f5

2023-10-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112276 --- Comment #4 from Hongtao.liu --- -(define_split - [(set (match_operand:V2HI 0 "register_operand") -(eq:V2HI - (eq:V2HI -(us_minus:V2HI - (match_operand:V2HI 1 "register_operand") -
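
The removed splitter nests `eq` around `us_minus` (unsigned saturating subtraction); the rest of its operands are cut off in the snippet. As background only, and not a description of the removed pattern or its fix, the scalar identity SSE code commonly relies on is that a saturating a - b is zero exactly when a <= b (unsigned), which is how psubusw followed by pcmpeqw can emulate an unsigned compare. A per-lane C sketch, assuming both comparisons are against zero (all names are hypothetical):

```c
#include <stdint.h>

/* One 16-bit lane of psubusw: a - b, clamped at 0 (hypothetical name). */
static uint16_t us_minus16(uint16_t a, uint16_t b)
{
    return a > b ? (uint16_t)(a - b) : 0;
}

/* One lane of pcmpeqw: all-ones if equal, else 0 (hypothetical name). */
static uint16_t eq16(uint16_t a, uint16_t b)
{
    return a == b ? 0xFFFF : 0;
}

/* Unsigned a <= b as a lane mask: saturating subtract, compare with 0. */
static uint16_t ule_mask16(uint16_t a, uint16_t b)
{
    return eq16(us_minus16(a, b), 0);
}

/* Comparing that mask with 0 again inverts it: unsigned a > b. */
static uint16_t ugt_mask16(uint16_t a, uint16_t b)
{
    return eq16(ule_mask16(a, b), 0);
}
```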

[Bug target/112276] [14 Regression] wrong code with -O2 -msse4.2 since r14-4964-g7eed861e8ca3f5

2023-10-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112276 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #3

[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;

2023-10-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #7 from Hongtao.liu --- (In reply to Andrew Pinski from comment #3) > First off does this even make sense to vectorize but rather do some kind of > scalar reduction with respect to j = j^1 here . Filed PR 112104 for that. > >
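
Treating the statement as j = j ^ 1 is only valid while j is 0 or 1, so the compiler has to establish that range (for example from the initial value) first. A small C sketch (function names are hypothetical):

```c
/* Statement from the bug title: a = (j != 1); j = (long)a (hypothetical name). */
static long step_ne(long j)
{
    return (long)(j != 1);
}

/* The j = j ^ 1 form discussed in the thread (hypothetical name). */
static long step_xor(long j)
{
    return j ^ 1;
}
```

The two agree for j == 0 and j == 1 but not for other values (j == 5 gives 1 versus 4).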

[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;

2023-10-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #6 from Hongtao.liu --- (In reply to Andrew Pinski from comment #5) > Oh this is the original code: > https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c > Yes, it's from unixbench.

[Bug tree-optimization/111833] [13/14 Regression] GCC: 14: hangs on a simple for loop

2023-10-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833 --- Comment #5 from Hongtao.liu --- It's the same issue as PR111820, so it should be fixed as well.

[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`

2023-10-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #15 from Hongtao.liu --- (In reply to Richard Biener from comment #13) > (In reply to Hongtao.liu from comment #12) > > Fixed in GCC14, not sure if we want to backport the patch. > > If so, the patch needs to be adjusted since GCC13

[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;

2023-10-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 Hongtao.liu changed: What|Removed |Added CC||pinskia at gcc dot gnu.org

[Bug middle-end/111972] New: [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;

2023-10-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 Bug ID: 111972 Summary: [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a; Product: gcc Version: 14.0 Status: UNCONFIRMED Severity:

[Bug target/111874] Missed mask_fold_left_plus with AVX512

2023-10-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111874 --- Comment #3 from Hongtao.liu --- > For the case of conditional (or loop masked) fold-left reductions the scalar > fallback isn't implemented. But AVX512 has vpcompress that could be used > to implement a more efficient sequence for a masked
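
A masked fold-left reduction must add the active elements in their original order. Packing the active lanes to the front (what vpcompress does) and then summing them in order gives exactly the same sequence of additions. A scalar C sketch of both formulations, assuming at most 32 lanes (function names are hypothetical):

```c
#include <stddef.h>

/* In-order reduction over active lanes only; inactive lanes are skipped
   entirely. Requires n <= 32 (hypothetical name). */
static double masked_fold_left(double init, const double *v,
                               unsigned mask, size_t n)
{
    double acc = init;
    for (size_t i = 0; i < n; i++)
        if (mask & (1u << i))
            acc += v[i];
    return acc;
}

/* Same reduction as "compress, then unmasked in-order sum".
   Requires n <= 32 (size of packed[]) (hypothetical name). */
static double compress_then_fold(double init, const double *v,
                                 unsigned mask, size_t n)
{
    double packed[32];
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask & (1u << i))
            packed[k++] = v[i];
    double acc = init;
    for (size_t i = 0; i < k; i++)
        acc += packed[i];
    return acc;
}
```

Order matters here: with elements {1e16, 1.0, -1e16, 1.0} the in-order sum is 1.0, not 2.0, because 1e16 + 1.0 rounds back to 1e16.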

[Bug target/111889] [14 Regression] 128/256 intrins could not be used with only specifying "no-evex512, avx512vl" in function attribute

2023-10-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #4

[Bug tree-optimization/111820] [13/14 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`

2023-10-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #12 from Hongtao.liu --- Fixed in GCC14, not sure if we want to backport the patch. If so, the patch needs to be adjusted since GCC13 doesn't support auto_mpz.

[Bug target/111874] Missed mask_fold_left_plus with AVX512

2023-10-19 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111874 --- Comment #1 from Hongtao.liu --- For integers, we have _mm512_mask_reduce_add_epi32 defined as extern __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_mask_reduce_add_epi32 (__mmask16 __U, __m512i __A)
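
For integers the order of additions does not matter (the sum wraps modulo 2^32), so a masked integer reduction can zero the inactive lanes and reduce the whole vector. A scalar reference model of the value the intrinsic returns (the function name is hypothetical; this is not the header's implementation):

```c
#include <stdint.h>

/* Masked add-reduction over 16 int32 lanes: lanes with a clear mask bit
   contribute 0; the sum wraps modulo 2^32 (hypothetical name). */
static int32_t mask_reduce_add_epi32_ref(uint16_t mask, const int32_t v[16])
{
    uint32_t sum = 0;
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i))
            sum += (uint32_t)v[i];
    return (int32_t)sum;
}
```

A floating-point masked fold-left reduction has no such freedom, since its additions must stay in order.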

[Bug tree-optimization/111859] 521.wrf_r build failure with -O2 -march=cascadelake --param vect-partial-vector-usage=2

2023-10-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111859 --- Comment #1 from Hongtao.liu --- It can be reproduced with: tar zxvf 521.tar.gz cd 521 gfortran module_advect_em.fppizedi.f90 -S -O2 -march=cascadelake --param vect-partial-vector-usage=2 -std=legacy -fconvert=big-endian

[Bug tree-optimization/111859] New: 521.wrf_r build failure with -O2 -march=cascadelake --param vect-partial-vector-usage=2

2023-10-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111859 Bug ID: 111859 Summary: 521.wrf_r build failure with -O2 -march=cascadelake --param vect-partial-vector-usage=2 Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug tree-optimization/111820] [13/14 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`

2023-10-17 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #9 from Hongtao.liu --- > But we end up here with niters_skip being INTEGER_CST and .. > > > 1421 || (!vect_use_loop_mask_for_alignment_p (loop_vinfo) > > possibly vect_use_loop_mask_for_alignment_p. Note >

[Bug tree-optimization/111820] [13/14 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`

2023-10-17 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #7 from Hongtao.liu --- (In reply to rguent...@suse.de from comment #6) > On Mon, 16 Oct 2023, crazylht at gmail dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 > > > > --- Comment #5 from Hongtao.liu ---

[Bug target/111829] Redudant register moves inside the loop

2023-10-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829 --- Comment #4 from Hongtao.liu --- (In reply to Richard Biener from comment #2) > You sink the conversion, so it would be PRE on the reverse graph. The > transform doesn't really fit a particular pass I think. The conversions also need to be

[Bug target/111829] Redudant register moves inside the loop

2023-10-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829 --- Comment #3 from Hongtao.liu --- (In reply to Richard Biener from comment #2) > You sink the conversion, so it would be PRE on the reverse graph. The > transform doesn't really fit a particular pass I think. > > Why does the problem

[Bug tree-optimization/111820] [13/14 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`

2023-10-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #5 from Hongtao.liu --- (In reply to Richard Biener from comment #3) > for (unsigned i = 0; i != skipn - 1; i++) > begin = wi::mul (begin, wi::to_wide (step_expr)); > > (gdb) p skipn > $5 = 4294967292 > > niters

[Bug tree-optimization/111820] [13/14 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`

2023-10-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #4 from Hongtao.liu --- > niters is 4294967292 in vect_update_ivs_after_vectorizer. Maybe the loop > should terminate when begin is zero. But I wonder why we pass in 'niters' > and then name it 'skip_niters' ... > It's coming
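
The loop quoted in comment #5 performs skipn - 1 = 4294967291 multiplications. Two cheaper options follow from the thread: compute begin * step^(skipn - 1) by square-and-multiply in O(log skipn) steps, and stop once begin becomes zero. A C sketch assuming a fixed 64-bit precision (uint64_t wraparound stands in for a wide_int of that precision); it is not the fix that went into GCC14, which per comment #12 relies on auto_mpz (function names are hypothetical):

```c
#include <stdint.h>

/* Mirrors the reported loop: count multiplications (hypothetical name). */
static uint64_t mul_pow_naive(uint64_t begin, uint64_t step, uint64_t count)
{
    for (uint64_t i = 0; i != count; i++)
        begin *= step;
    return begin;
}

/* begin * step^count modulo 2^64 with O(log count) multiplications;
   exits early once begin is 0, since it stays 0 (hypothetical name). */
static uint64_t mul_pow_fast(uint64_t begin, uint64_t step, uint64_t count)
{
    while (count != 0 && begin != 0) {
        if (count & 1)
            begin *= step;
        step *= step;
        count >>= 1;
    }
    return begin;
}
```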

[Bug target/111829] Redudant register moves inside the loop

2023-10-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829 --- Comment #1 from Hongtao.liu --- ivtmp.23_31 = (unsigned long) b_24(D); ivtmp.24_46 = (unsigned long) pa_26(D); _50 = ivtmp.23_31 + 40; [local count: 1063004408]: # vsum_35 = PHI # ivtmp.23_14 = PHI # ivtmp.24_30 = PHI

[Bug target/111829] New: Redudant register moves inside the loop

2023-10-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111829 Bug ID: 111829 Summary: Redudant register moves inside the loop Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #10 from Hongtao.liu --- > indeed (but I believe it did happen with Alder Lake already, by accident, > with AVX512 on P-cores but not on E-cores). AVX512 is physically fused off on Alder Lake P-cores; P-cores and E-cores share the

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #4 from Hongtao.liu --- I checked Alder Lake's L1 cache size and it is indeed 48 KB, while the L1 cache size in alderlake_cost is set to 32 KB. But then again, we have a lot of different platforms that share the same cost and they may have

[Bug target/111745] [14 Regression] ICE: in extract_insn, at recog.cc:2791 (unrecognizable insn) with -ffloat-store -mavx512fp16 -mavx512vl

2023-10-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111745 --- Comment #3 from Hongtao.liu --- Fixed.

[Bug target/104610] memcmp () == 0 can be optimized better for avx512f

2023-10-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610 --- Comment #22 from Hongtao.liu --- For 64-byte memory comparison int compare (const char* s1, const char* s2) { return __builtin_memcmp (s1, s2, 64) == 0; } We're generating vmovdqu (%rsi), %ymm0 vpxorq (%rdi),
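
Because `== 0` only needs equality, not ordering, the 64-byte comparison reduces to OR-ing the XOR of corresponding chunks and testing the accumulator for zero, which is what the vpxorq-based sequence does with vector registers. A scalar C sketch with 8-byte chunks (the function name is hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Equivalent of memcmp(s1, s2, 64) == 0: any differing bit makes acc
   nonzero. memcpy keeps the 8-byte loads free of alignment and
   aliasing problems (hypothetical name). */
static int equal64(const char *s1, const char *s2)
{
    uint64_t acc = 0;
    for (int i = 0; i < 64; i += 8) {
        uint64_t a, b;
        memcpy(&a, s1 + i, 8);
        memcpy(&b, s2 + i, 8);
        acc |= a ^ b;
    }
    return acc == 0;
}
```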

[Bug target/111745] [14 Regression] ICE: in extract_insn, at recog.cc:2791 (unrecognizable insn) with -ffloat-store -mavx512fp16 -mavx512vl

2023-10-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111745 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug libgcc/111731] [13/14 regression] gcc_assert is hit at libgcc/unwind-dw2-fde.c#L291

2023-10-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731 --- Comment #2 from Hongtao.liu --- The original project is too complex for me to come up with a reproducer; I can help with gdb if additional information is needed.

[Bug libgcc/111731] [13/14 regression] gcc_assert is hit at libgcc/unwind-dw2-fde.c#L291

2023-10-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731 --- Comment #1 from Hongtao.liu --- GCC11.3 is ok, GCC13.2 and later have the issue, I didn't verify GCC12.

[Bug libgcc/111731] New: [13/14 regression] gcc_assert is hit at libgcc/unwind-dw2-fde.c#L291

2023-10-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731 Bug ID: 111731 Summary: [13/14 regression] gcc_assert is hit at libgcc/unwind-dw2-fde.c#L291 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/111402] Loop distribution fail to optimize memmove for multiple consecutive moves within a loop

2023-09-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111402 --- Comment #2 from Hongtao.liu --- After adjusting the code in foo1 to use < n instead of != n, the issue remains. void foo1 (v4di* __restrict a, v4di *b, int n) { for (int i = 0; i < n; i+=2) { a[i] = b[i]; a[i+1] = b[i+1]; } }
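
foo1 copies one contiguous block, so loop distribution could in principle replace it with a single library call. One subtlety is that with i += 2 and i < n, an odd n also copies element n, so the element count rounds n up to even. A sketch of the equivalent call, assuming v4di is the usual GCC vector typedef (the snippet does not show it; foo1_memcpy is a hypothetical name):

```c
#include <string.h>

/* Assumed definition; not shown in the report. */
typedef long long v4di __attribute__ ((vector_size (32)));

/* foo1 from the report. */
void foo1 (v4di *__restrict a, v4di *b, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      a[i] = b[i];
      a[i + 1] = b[i + 1];
    }
}

/* Hypothetical single-call equivalent. __restrict rules out overlap,
   so memcpy is enough; an odd n copies n + 1 elements. */
void foo1_memcpy (v4di *__restrict a, v4di *b, int n)
{
  if (n > 0)
    memcpy (a, b, (size_t) ((n + 1) / 2) * 2 * sizeof (v4di));
}
```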

[Bug middle-end/111402] New: Loop distribution fail to optimize memmove for multiple consecutive moves within a loop

2023-09-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111402 Bug ID: 111402 Summary: Loop distribution fail to optimize memmove for multiple consecutive moves within a loop Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug target/111354] [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases.

2023-09-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #5

[Bug target/111306] [12,13] macro-fusion makes error on conjugate complex multiplication fp16

2023-09-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111306 --- Comment #8 from Hongtao.liu --- Fixed in GCC14.1 GCC13.3 GCC12.4

[Bug target/111335] fmaddpch seems not commutative for operands[1] and operands[2] due to precision loss

2023-09-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111335 Hongtao.liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/111306] [12,13] macro-fusion makes error on conjugate complex multiplication fp16

2023-09-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111306 --- Comment #4 from Hongtao.liu --- A related bug, PR111335, concerns fmaddcph; it is similar but not the same. PR111335 is due to a precision difference in complex _Float16 FMA: fmaddcph a, b, c is not equal to fmaddcph b, a, c.

[Bug target/111335] New: fmaddpch seems not commutative for operands[1] and operands[2] due to precision loss

2023-09-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111335 Bug ID: 111335 Summary: fmaddpch seems not commutative for operands[1] and operands[2] due to precision loss Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug target/111306] [12,13] macro-fusion makes error on conjugate complex multiplication fp16

2023-09-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111306 --- Comment #3 from Hongtao.liu --- A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629650.html

[Bug target/111333] Runtime failure for fcmulcph instrinsic

2023-09-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111333 --- Comment #2 from Hongtao.liu --- The test has failed since GCC12, when the pattern was added.

[Bug target/111333] Runtime failure for fcmulcph instrinsic

2023-09-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111333 --- Comment #1 from Hongtao.liu --- fmulcph/fmaddcph are commutative in operands[1] and operands[2], but fcmulcph/fcmaddcph are not, since they are complex conjugate operations. The change below fixes the issue. diff --git a/gcc/config/i386/sse.md
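
Why conjugate multiplication cannot be treated as commutative: swapping the operands of x * conj(y) conjugates the result, so the two orders agree only when the product is real. A C sketch using double complex instead of _Float16. Which operand the instruction conjugates is an assumption here; the argument holds either way (cmul_conj is a hypothetical name):

```c
#include <complex.h>

/* x * conj(y); note cmul_conj(y, x) == conj(cmul_conj(x, y))
   (hypothetical name). */
static double complex cmul_conj(double complex x, double complex y)
{
    return x * conj(y);
}
```

For a = 1+2i and b = 3+4i, a * conj(b) is 11+2i while b * conj(a) is 11-2i.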

[Bug target/111333] New: Runtime failure for fcmulcph instrinsic

2023-09-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111333 Bug ID: 111333 Summary: Runtime failure for fcmulcph instrinsic Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/111225] ICE in curr_insn_transform, unable to generate reloads for xor, since r14-2447-g13c556d6ae84be

2023-08-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225 --- Comment #2 from Hongtao.liu --- (In reply to Hongtao.liu from comment #1) > So reload thought CT_SPECIAL_MEMORY always wins for spilled_pseudo_p, but > here Br should be a vec_dup:mem, which doesn't match spilled_pseudo_p. > >

[Bug target/111225] ICE in curr_insn_transform, unable to generate reloads for xor, since r14-2447-g13c556d6ae84be

2023-08-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225 --- Comment #1 from Hongtao.liu --- So reload thought CT_SPECIAL_MEMORY always wins for spilled_pseudo_p, but here Br should be a vec_dup:mem, which doesn't match spilled_pseudo_p. case CT_SPECIAL_MEMORY:

[Bug target/111064] 5-10% regression of parest on icelake between g:d073e2d75d9ed492de9a8dc6970e5b69fae20e5a (Aug 15 2023) and g:9ade70bb86c8744f4416a48bb69cf4705f00905a (Aug 16)

2023-08-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111064 --- Comment #6 from Hongtao.liu --- > > [liuhongt@intel gather_emulation]$ ./gather.out > ;./nogather_xmm.out;./nogather_ymm.out > elapsed time: 1.75997 seconds for gather with 3000 iterations > elapsed time: 2.42473 seconds for

[Bug target/111119] maskload and maskstore for integer modes are oddly conditional on AVX2

2023-08-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111119 --- Comment #5 from Hongtao.liu --- Fixed in GCC14.

[Bug middle-end/111152] ~7-9% performance regression on 510.parest_r SPEC 2017 benchmark

2023-08-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111152 --- Comment #2 from Hongtao.liu --- > With Zen3 -O2 generic lto pgo the regression is less noticeable (only 4%) > https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=694.457.0 Not sure about this part

[Bug middle-end/111152] ~7-9% performance regression on 510.parest_r SPEC 2017 benchmark

2023-08-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111152 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/111064] 5-10% regression of parest on icelake between g:d073e2d75d9ed492de9a8dc6970e5b69fae20e5a (Aug 15 2023) and g:9ade70bb86c8744f4416a48bb69cf4705f00905a (Aug 16)

2023-08-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111064 --- Comment #4 from Hongtao.liu --- The loop looks like double foo (double* a, unsigned* b, double* c, int n) { double sum = 0; for (int i = 0; i != n; i++) { sum += a[i] * c[b[i]]; } return sum; } After
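
The gather vs. nogather timings in comment #6 concern how c[b[i]] is loaded once the loop is vectorized. A hand-written C picture of 4-lane "gather emulation": load four indices, do four scalar loads to build the vector of c values, then multiply-add lane-wise. Vectorizing the sum this way reassociates the floating-point additions (GCC only does that for FP reductions when -fassociative-math is in effect, e.g. under -Ofast), so results can differ in the last bits. foo is the loop from the report; foo_gather4 is hypothetical:

```c
/* The loop from the report. */
double foo (double *a, unsigned *b, double *c, int n)
{
  double sum = 0;
  for (int i = 0; i != n; i++)
    sum += a[i] * c[b[i]];
  return sum;
}

/* Hypothetical 4-lane version with an emulated gather. */
double foo_gather4 (double *a, unsigned *b, double *c, int n)
{
  double lane[4] = { 0, 0, 0, 0 };
  int i = 0;
  for (; i + 4 <= n; i += 4)
    {
      double g[4];
      for (int k = 0; k < 4; k++)   /* emulated gather: scalar loads */
        g[k] = c[b[i + k]];
      for (int k = 0; k < 4; k++)   /* lane-wise multiply-add */
        lane[k] += a[i + k] * g[k];
    }
  double sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
  for (; i < n; i++)                /* scalar epilogue */
    sum += a[i] * c[b[i]];
  return sum;
}
```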

[Bug target/111119] maskload and maskstore for integer modes are oddly conditional on AVX2

2023-08-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111119 --- Comment #3 from Hongtao.liu --- > I see, we can add an alternative like "noavx2,avx2" to generate > vmaskmovps/pd when avx2 is not available for integer. It's better to change the assembly output as: if (TARGET_AVX2) return
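
vmaskmovps can stand in for integer modes without AVX2 because a masked load only moves bits: the result is the same whether the lanes are viewed as float or int32. A scalar model of a 4 x 32-bit masked load (the function name is hypothetical):

```c
#include <stdint.h>

/* Lane i is loaded when the most significant bit of mask[i] is set and
   zeroed otherwise; masked-off lanes are never read (hypothetical name). */
static void maskload4_ref(uint32_t dst[4], const uint32_t *src,
                          const uint32_t mask[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (mask[i] & 0x80000000u) ? src[i] : 0;
}
```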

[Bug target/111119] maskload and maskstore for integer modes are oddly conditional on AVX2

2023-08-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111119 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #2

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 --- Comment #8 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #7) > (In reply to Hongtao.liu from comment #6) > > > So, the compiler still expects vec_concat/vec_select patterns to be > > > present. > > > > v2df foo_v2df (v2df x)

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 --- Comment #6 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #4) > (In reply to Hongtao.liu from comment #3) > > In the x86 backend's expand_vec_perm_1, we always try vec_merge first for > > !one_operand_p, expand_vselect_vconcat is

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #3
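
Inserting 0 into lane 1 and the movq-style shuffle below describe the same value: keep the low 64-bit lane and zero the high one, which is what movq xmm, xmm produces. A sketch using GCC vector extensions (function names are hypothetical):

```c
typedef long long v2di __attribute__ ((vector_size (16)));

/* pinsrq $1 with a zero source: lane 1 becomes 0 (hypothetical name). */
static v2di insert_zero_hi (v2di x)
{
  x[1] = 0;
  return x;
}

/* Keep lane 0, take lane 1 from a zero vector; selector index 2 is
   lane 0 of the second operand (hypothetical name). */
static v2di movq_style (v2di x)
{
  v2di zero = { 0, 0 };
  return __builtin_shuffle (x, zero, (v2di) { 0, 2 });
}
```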

[Bug target/111064] 5-10% regression of parest on icelake between g:d073e2d75d9ed492de9a8dc6970e5b69fae20e5a (Aug 15 2023) and g:9ade70bb86c8744f4416a48bb69cf4705f00905a (Aug 16)

2023-08-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111064 --- Comment #3 from Hongtao.liu --- I didn't find any regression when testing the patch. I guess it's because my tester does a full-copy run and the options are -march=native -Ofast -flto -funroll-loops. Let me verify it.

[Bug target/111062] ICE: in final_scan_insn_1, at final.cc:2808 could not split insn {*andndi_1} with -O -mavx10.1-256 -mavx512bw -mno-avx512f

2023-08-20 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111062 --- Comment #1 from Hongtao.liu --- (In reply to Zdenek Sojka from comment #0) > Created attachment 55755 [details] > reduced testcase > > Compiler output: > $ x86_64-pc-linux-gnu-gcc -O -mavx10.1-256 -mavx512bw -mno-avx512f testcase.c > cc1:

[Bug libfortran/110966] should matmul_c8_avx512f be updated with matmul_c8_x86-64-v4.

2023-08-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966 --- Comment #4 from Hongtao.liu --- (In reply to anlauf from comment #3) > (In reply to Hongtao.liu from comment #2) > > (In reply to Richard Biener from comment #1) > > > I think matmul is fine with avx512f or avx, so requiring/using only the

[Bug target/110979] New: Miss-optimization for O2 fully masked loop on floating point reduction.

2023-08-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979 Bug ID: 110979 Summary: Miss-optimization for O2 fully masked loop on floating point reduction. Product: gcc Version: 14.0 Status: UNCONFIRMED Severity:

[Bug libfortran/110966] should matmul_c8_avx512f be updated with matmul_c8_x86-64-v4.

2023-08-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966 --- Comment #2 from Hongtao.liu --- (In reply to Richard Biener from comment #1) > I think matmul is fine with avx512f or avx, so requiring/using only the base > ISA level sounds fine to me. It could be a potential missed optimization.

[Bug libfortran/110966] New: should matmul_c8_avx512f be updated with matmul_c8_x86-64-v4.

2023-08-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110966 Bug ID: 110966 Summary: should matmul_c8_avx512f be updated with matmul_c8_x86-64-v4. Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 --- Comment #10 from Hongtao.liu --- Fixed in GCC14.

[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction

2023-08-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 --- Comment #11 from Hongtao.liu --- (In reply to 罗勇刚(Yonggang Luo) from comment #10) > (In reply to Hongtao.liu from comment #9) > > > > Without `-mbmi` option, gcc can not compile and all other three compiler > > > can compile. > > > > As

[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction

2023-08-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 --- Comment #9 from Hongtao.liu --- > There is a redundant xor instrunction, There's false dependence issue on some specific processors. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 > Without `-mbmi` option, gcc can not compile and all

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 --- Comment #8 from Hongtao.liu --- (In reply to Alexander Monakov from comment #7) > Thanks for identifying the problem. Please don't rename the argument to > 'op_mask' though: the parameter itself is not a mask, it's an eight-bit > control

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 --- Comment #6 from Hongtao.liu --- (In reply to Hongtao.liu from comment #5)
> I'm working on a patch.

 int
-vpternlog_redundant_operand_mask (rtx *operands)
+vpternlog_redundant_operand_mask (rtx op_mask)
 {
   int mask = 0;
-  int imm8 =
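Background on the eight-bit control discussed in this thread: VPTERNLOG treats its immediate as a truth table. For every bit position, the three input bits (first operand as the most significant index bit, then second, then third) form an index from 0 to 7, and the result bit is the immediate's bit at that index. The C sketch below is a reference model written for illustration only. `ternlog_bit` and `ternlog_unused_operands` are hypothetical helpers, not GCC's `vpternlog_redundant_operand_mask`, and the operand ordering is assumed from the usual description of the instruction. An operand is redundant when flipping it never changes the selected table bit, which reduces to comparing two halves of the immediate.

```c
#include <stdint.h>

/* Reference model of one VPTERNLOG result bit (hypothetical helper):
   a, b, c are the bits taken from the first, second and third operand
   at the same position; they index the 8-bit truth table imm8. */
static unsigned ternlog_bit (uint8_t imm8, unsigned a, unsigned b, unsigned c)
{
  return (imm8 >> ((a << 2) | (b << 1) | c)) & 1u;
}

/* Hypothetical helper, NOT GCC's implementation: returns a mask with
   bit 0 / bit 1 / bit 2 set when operand a / b / c cannot affect the
   result, i.e. the table entries with that input at 0 equal the
   entries with that input at 1. */
static int ternlog_unused_operands (uint8_t imm8)
{
  int mask = 0;

  /* a is index bit 2: entries 0-3 (a = 0) vs entries 4-7 (a = 1).  */
  if ((imm8 & 0x0F) == (imm8 >> 4))
    mask |= 1;

  /* b is index bit 1: entries 0,1,4,5 (b = 0) vs 2,3,6,7 (b = 1).  */
  if ((imm8 & 0x33) == ((imm8 >> 2) & 0x33))
    mask |= 2;

  /* c is index bit 0: even entries (c = 0) vs odd entries (c = 1).  */
  if ((imm8 & 0x55) == ((imm8 >> 1) & 0x55))
    mask |= 4;

  return mask;
}
```

For example, 0xF0 selects the first operand unchanged, so the other two are reported unused (mask 6), while 0x96 is a three-way XOR and depends on all three inputs (mask 0).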

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 --- Comment #5 from Hongtao.liu --- I'm working on a patch.

[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 --- Comment #7 from Hongtao.liu --- (In reply to 罗勇刚(Yonggang Luo) from comment #6) > MSVC also added, clang seems have optimization issue, but MSVC doesn't have > that No, I think what clang does is correct, f(int, int):

[Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #8

[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 --- Comment #5 from Hongtao.liu --- Maybe source code can be changed as

    int f(int a, int b)
    {
    #ifdef __BMI__
      return _tzcnt_u32 (a);
    #else
      return _bit_scan_forward (a);
    #endif
    }

But looks like clang/MSVC doesn't

[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 --- Comment #4 from Hongtao.liu --- (In reply to Hongtao.liu from comment #3) > But there's difference between TZCNT and BSF > > The key difference between TZCNT and BSF instruction is that TZCNT provides > operand size as output when source
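To make the TZCNT/BSF difference concrete: TZCNT returns the operand size (32 for a 32-bit operand) when the source is zero, whereas BSF sets ZF and leaves its destination undefined, and GCC's `__builtin_ctz` is likewise undefined for 0. A minimal portable sketch of TZCNT semantics, assuming GCC or a compatible compiler that provides `__builtin_ctz`, handles the zero case explicitly. The function name `tzcnt32_portable` is hypothetical and not part of any intrinsics header.

```c
#include <stdint.h>

/* Hypothetical helper: counts trailing zero bits like TZCNT does,
   including the defined result of 32 for a zero input.
   __builtin_ctz alone is undefined for 0 (as BSF's result is),
   so zero is special-cased before calling it. */
static unsigned tzcnt32_portable (uint32_t x)
{
  if (x == 0)
    return 32;
  return (unsigned) __builtin_ctz (x);
}
```

The explicit zero check is what separates this from a plain BSF-style count. Whether the compiler folds the branch into a single TZCNT when `-mbmi` is enabled depends on the compiler and version, so check the generated code rather than assuming it.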

[Bug target/110921] Relax _tzcnt_u32 support x86, all x86 arch support for this instrunction

2023-08-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110921 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #3

[Bug target/110762] [11/12/13 Regression] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #23 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #22) > It looks to me that partial vector half-float instructions have the same > issue. Yes, I'll take a look.

[Bug target/81904] FMA and addsub instructions

2023-07-31 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904 --- Comment #7 from Hongtao.liu --- > > to .VEC_ADDSUB possibly loses exceptions (the vectorizer now directly > creates .VEC_ADDSUB when possible). Let's put it under -fno-trapping-math.

[Bug target/81904] FMA and addsub instructions

2023-07-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904 --- Comment #5 from Hongtao.liu --- (In reply to Richard Biener from comment #1)
> Hmm, I think the issue is we see
>
> f (__m128d x, __m128d y, __m128d z)
> {
>   vector(2) double _4;
>   vector(2) double _6;
>
>    [100.00%]:
>   _4 = x_2(D)

[Bug target/81904] FMA and addsub instructions

2023-07-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904 --- Comment #4 from Hongtao.liu --- (In reply to Richard Biener from comment #2)
> __m128d h(__m128d x, __m128d y, __m128d z){
>   __m128d tem = _mm_mul_pd (x,y);
>   __m128d tem2 = tem + z;
>   __m128d tem3 = tem - z;
>   return
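For reference, the lane pattern this thread wants to match is the x86 "addsub" shape: ADDSUBPD subtracts in the even (low) lane and adds in the odd (high) lane, and VFMADDSUB applies the same alternation to x*y +/- z. The scalar C model below is a sketch written for illustration. `addsub_of_products_ref` is a hypothetical name, and it uses separate multiply and add/subtract steps like the quoted testcase. A fused VFMADDSUB rounds only once, so results can differ in the last bit unless contraction is allowed, and exceptions can differ as well, as noted in the thread.

```c
#include <stddef.h>

/* Hypothetical scalar reference for the addsub-of-products pattern:
   even-indexed lanes compute x*y - z, odd-indexed lanes x*y + z,
   matching the lane order of ADDSUBPD / VFMADDSUB. */
static void addsub_of_products_ref (const double *x, const double *y,
                                    const double *z, double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      double prod = x[i] * y[i];          /* like _mm_mul_pd (x, y)   */
      out[i] = (i % 2 == 0) ? prod - z[i] /* even lane: tem - z       */
                            : prod + z[i];/* odd lane:  tem + z       */
    }
}
```

With x = {1, 2, 3, 4}, y = {3, 4, 5, 6} and z = {0.5, 0.5, 1, 1}, the products are 3, 8, 15, 24 and the output is {2.5, 8.5, 14, 25}. These values are exact in double precision, so fused and unfused evaluation agree for them.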

[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

2023-07-30 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #9
