[Bug target/114148] gcc.target/i386/pr106010-7b.c FAILs

2024-05-24 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114148 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/114148] gcc.target/i386/pr106010-7b.c FAILs

2024-05-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114148 --- Comment #4 from Hongtao Liu --- (In reply to r...@cebitec.uni-bielefeld.de from comment #3) > To investigate further, I've added comparison functions to a reduced > version of pr106010-7b.c, with > > void > cmp_epi8 (_Complex unsigned

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #16 from Hongtao Liu --- > > That said, this change really won't help the backend which supposedly should > have the same behavior regardless of -fno-trapping-math, because in that > case it is the value > of the result (which is

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #11 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #10) > Any of the floating point to integer intrinsics if they have out of range > value (haven't checked whether floating point to unsigned intrinsic is a > problem

[Bug target/114427] [x86] vec_pack_truncv8si/v4si can be optimized with pblendw instead of pand for AVX2 target

2024-05-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114427 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog

2024-05-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021 --- Comment #4 from Hongtao Liu --- (In reply to Hu Lin from comment #3) > I found compiler allocates mem to the third source register of vpternlog in > IRA after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e. And it cause the > generate code

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #16 from Hongtao Liu --- > Should we also run a SPEC on with -O2 -mtune=generic -march=x86-64-v3 to see > if there is any surprise? Sure, I guess no.

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #14 from Hongtao Liu --- (In reply to Uroš Bizjak from comment #13) > (In reply to Haochen Jiang from comment #12) > > (In reply to Hongtao Liu from comment #11) > > > (In reply to Haochen Jiang from comment #10) > > > > A patch

[Bug target/115146] [15 Regression] Incorrect 8-byte vectorization: psrlw/psraw confusion

2024-05-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115146 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment
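The snippet above is cut off before the testcase, but the title names the two instructions being confused. Here is a minimal sketch in plain C, with one 16-bit lane per function and hypothetical function names. It shows why mixing them up is a wrong-code bug rather than a missed optimization: the two shifts differ on any lane that has the sign bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift of one 16-bit lane: zeros are shifted in (psrlw). */
static uint16_t srl16(uint16_t x, unsigned n)
{
    return (uint16_t)(x >> n);
}

/* Arithmetic right shift of one 16-bit lane: the sign bit is replicated (psraw).
   GCC defines >> on negative signed values as an arithmetic shift. */
static int16_t sra16(int16_t x, unsigned n)
{
    return (int16_t)(x >> n);
}
```

For a non-negative lane such as 4, both shifts agree. For 0xFFFE (-2), the logical shift by 1 gives 0x7FFF while the arithmetic shift gives -1.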

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #11 from Hongtao Liu --- (In reply to Haochen Jiang from comment #10) > A patch like Comment 8 could definitely solve the problem. But I need to > test more benchmarks to see if there is surprise. > > But, yes, as Uros said in

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069 --- Comment #5 from Hongtao Liu --- (In reply to Krzysztof Kanas from comment #4) > I bisected the issue and it seems that commit > 0368fc54bc11f15bfa0ed9913fd0017815dfaa5d introduces regression. I guess the real guilty commit is commit

[Bug target/115116] New: [x86] rtx_cost is overestimated for big size memory.

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115116 Bug ID: 115116 Summary: [x86] rtx_cost is overestimated for big size memory. Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity:

[Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/115115] [12/13/14/15 Regression] highway-1.0.7 wrong _mm_cvttps_epi32() constant fold

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115115 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment
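Some background for the constant-folding report. In C, converting a NaN or out-of-range float to int is undefined. The CVTTPS2DQ instruction behind `_mm_cvttps_epi32` does define it: it returns 0x80000000, the "integer indefinite" value. A fold of the intrinsic therefore has to reproduce that hardware result. The function below is a hypothetical per-lane model written for illustration, not GCC's folding code. `math.h` is included only so callers can pass `NAN`.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of _mm_cvttps_epi32 (CVTTPS2DQ):
   truncate toward zero; NaN or out-of-range inputs yield INT32_MIN
   (0x80000000) instead of being undefined as in a plain C cast. */
static int32_t cvttps_lane_model(float f)
{
    /* -2^31 and 2^31 are powers of two, so both bounds are exact floats. */
    if (f >= -2147483648.0f && f < 2147483648.0f)
        return (int32_t)f;   /* in range: C truncation is well defined */
    return INT32_MIN;        /* NaN fails both comparisons and ends up here */
}
```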

[Bug middle-end/115101] New: [wrong code] with -O1 -floop-nest-optimize for gcc.dg/graphite/interchange-8.c

2024-05-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115101 Bug ID: 115101 Summary: [wrong code] with -O1 -floop-nest-optimize for gcc.dg/graphite/interchange-8.c Product: gcc Version: 15.0 Status: UNCONFIRMED

[Bug target/101017] ICE: Segmentation fault, convert_memory_address_addr_space_1 with vector_size(32) and target_clone arch=core-avx2/default

2024-05-13 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101017 Hongtao Liu changed: What|Removed |Added CC||haochen.jiang at intel dot com ---

[Bug target/114987] [14/15 Regression] floating point vector regression, x86, between gcc 14 and gcc-13 using -O3 and target clones on skylake platforms

2024-05-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987 --- Comment #6 from Hongtao Liu --- > I tried to move "vmovdqa %xmm1,0xd0(%rsp)" before "vmovdqa %xmm0,0xe0(%rsp)" > and rebuilt the binary and it will save half the regression. 57.93 │200: vaddps 0xc0(%rsp),%ymm3,%ymm5

[Bug rtl-optimization/115021] New: [14/15 regression] unnecessary spill for vpternlog

2024-05-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021 Bug ID: 115021 Summary: [14/15 regression] unnecessary spill for vpternlog Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org
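For context, `_mm_load_sd` takes a `double const *`. The sanitizer complaint in the title arises when that pointer is not suitably aligned for a `double`. In user code, a common portable way to read such data is to go through `memcpy`, which compilers typically lower to a single unaligned load. This is not necessarily how the PR was resolved. Here is a minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/* Read a double from an address with no alignment guarantee.
   memcpy avoids dereferencing a misaligned double*, which
   -fsanitize=alignment would report. */
static double load_double_unaligned(const void *p)
{
    double d;
    memcpy(&d, p, sizeof d);
    return d;
}
```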

[Bug target/113090] Suboptimal vector permuation for 64-bit vector.

2024-05-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113090 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/113079] [x86] Fails to generate dot_prod instructions for 64-bit vector.

2024-05-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---
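For readers unfamiliar with the pattern: a dot_prod reduction widens narrow elements, multiplies them, and accumulates the products into a wider sum. The loop below is a hypothetical scalar example of that shape. With four `int16_t` elements, the inputs fill exactly a 64-bit vector, which is the vector width this PR is about. On x86, such reductions are typically vectorized with `pmaddwd`.

```c
#include <assert.h>
#include <stdint.h>

/* Dot product of 16-bit inputs accumulated in 32 bits:
   each element is widened before the multiply. */
static int32_t dot_s16(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}
```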

[Bug target/114943] X86 AVX2: inefficient code generated to convert SIMD Vectors

2024-05-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114943 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug libgcc/114907] __trunchfbf2 should be renamed to __extendhfbf2

2024-05-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114907 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #10 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #9) > Created attachment 58073 [details] > gcc14-pr114883.patch > > Full untested patch. This will fix 521.wrf_r ICE, and pass runtime validation.

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #5 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index a6cf0a5546c..ae6abe00f3e 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #4 from Hongtao Liu --- diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index a6cf0a5546c..ae6abe00f3e 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -8505,7 +8505,8 @@ vect_transform_reduction

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #3 from Hongtao Liu --- Created attachment 58066 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58066&action=edit reproduced testcase gfortran -O2 -march=x86-64-v4 -fvect-cost-model=cheap.

[Bug tree-optimization/114883] [14/15 Regression] 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 --- Comment #2 from Hongtao Liu --- (In reply to Andrew Pinski from comment #1) > Can you reduce the fortran code down for the ICE? It should not be hard, you > can use delta even. Let me try.

[Bug tree-optimization/114883] New: 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap

2024-04-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114883 Bug ID: 114883 Summary: 521.wrf_r ICE with -O2 -march=sapphirerapids -fvect-cost-model=cheap Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug target/110621] x86_64: Test gcc.target/i386/pr105354-2.c fails with -fstack-protector

2024-04-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110621 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/85048] [missed optimization] vector conversions

2024-04-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048 --- Comment #16 from Hongtao Liu --- (In reply to Matthias Kretz (Vir) from comment #15) > So it seems that if at least one of the vector builtins involved in the > expression is 512 bits GCC needs to locally increase prefer-vector-width to >

[Bug target/85048] [missed optimization] vector conversions

2024-04-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731 --- Comment #7 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > (In reply to Hongtao Liu from comment #3) > > Looks like ix86_vect_estimate_reg_pressure doesn't work here, taking a look. > > Oh, ix86_vect_estimate_reg_pressure

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731 --- Comment #4 from Hongtao Liu --- (In reply to Hongtao Liu from comment #3) > Looks like ix86_vect_estimate_reg_pressure doesn't work here, taking a look. Oh, ix86_vect_estimate_reg_pressure is only for loop, BB vectorizer only use

[Bug target/82731] _mm256_set_epi8(array[offset[0]], array[offset[1]], ...) byte gather makes slow code, trying to zero-extend all the uint16_t offsets first and spilling them.

2024-04-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82731 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #16 from Hongtao Liu --- > > /* See if a MEM has already been loaded with a widening operation; > if it has, we can use a subreg of that. Many CISC machines > also have such operations, but

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #15 from Hongtao Liu --- > I don't see this as problematic. IIRC, there was a discussion in the past > that a couple (two?) memory accesses from the same location close to each > other can be faster (so, -O2, not -Os) than

[Bug middle-end/110027] [11/12/13/14 regression] Stack objects with extended alignments (vectors etc) misaligned on detect_stack_use_after_return

2024-04-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #19 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #17) > Both of the posted patches are incorrect, this needs to be fixed in > asan_emit_stack_protection, account for the different offsets[0] which > happens when a

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #12 from Hongtao Liu --- short a; short c; short d; void foo (short b, short f) { c = b + a; d = f + a; } foo(short, short): addw a(%rip), %di addw a(%rip), %si movw %di, c(%rip) movw

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #11 from Hongtao Liu --- unsigned v; long long v2; char foo () { v2 = v; return v; } This is related to *movqi_internal, and codegen has been worse since gcc8.1 foo: movl v(%rip), %eax movq %rax,

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #9 from Hongtao Liu --- > > It looks that different modes of memory read confuse LRA to not CSE the read. > > IMO, if the preloaded value is later accessed in different modes, LRA should > leave it. Alternatively, LRA should CSE

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 --- Comment #5 from Hongtao Liu --- > My experience is memory cost for the operand with rm or separate r, m is > different which impacts RA decision. > > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595573.html Change operands[1]

[Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12

2024-04-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/66862] OpenMP SIMD does not work (use SIMD instructions) on conditional code

2024-04-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66862 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/113288] [i386] Missing #define for -mavx10.1-256 and -mavx10.1-512

2024-04-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113288 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/114544] [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114544 --- Comment #3 from Hongtao Liu --- <__umodti3>: ... 37 58: 66 48 0f 6e c7 movq %rdi,%xmm0 38 5d: 66 48 0f 6e d6 movq %rsi,%xmm2 39 62: 66 0f 6c c2 punpcklqdq %xmm2,%xmm0 40 66:

[Bug target/114570] New: GCC doesn't perform good loop invariant code motion for very long vector operations.

2024-04-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114570 Bug ID: 114570 Summary: GCC doesn't perform good loop invariant code motion for very long vector operations. Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug rtl-optimization/114556] New: weird loop unrolling when there's attribute aligned in side the loop

2024-04-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114556 Bug ID: 114556 Summary: weird loop unrolling when there's attribute aligned in side the loop Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug target/114544] [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114544 --- Comment #2 from Hongtao Liu --- Also for void foo2 (v128_t* a, v128_t* b) { c = (*a & *b)+ *b; } (insn 9 8 10 2 (set (reg:V1TI 108 [ _3 ]) (and:V1TI (reg:V1TI 99 [ _2 ]) (mem:V1TI (reg:DI 113) [1 *a_6(D)+0 S16

[Bug target/114544] [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114544 --- Comment #1 from Hongtao Liu --- ;; Turn SImode or DImode extraction from arbitrary SSE/AVX/AVX512F ;; vector modes into vec_extract*. (define_split [(set (match_operand:SWI48x 0 "nonimmediate_operand")

[Bug target/114544] New: [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1))

2024-04-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114544 Bug ID: 114544 Summary: [x86] stv should transform (subreg DI (V1TI) 8) as (vec_select:DI (V2DI) (const_int 1)) Product: gcc Version: 14.0 Status: UNCONFIRMED
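This small illustration of the equivalence in the title assumes a little-endian target such as x86-64 and uses GCC's `unsigned __int128` extension. Byte offset 8 of a 128-bit value is its upper 64 bits. Those same bytes are element 1 when the 16 bytes are viewed as two 64-bit lanes, which is the `vec_select` form. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* (subreg:DI (V1TI) 8) on a little-endian target: the upper 64 bits. */
static uint64_t high_half_shift(unsigned __int128 x)
{
    return (uint64_t)(x >> 64);
}

/* (vec_select:DI (V2DI) (const_int 1)): element 1 of the two-lane view. */
static uint64_t high_half_lane1(unsigned __int128 x)
{
    uint64_t lanes[2];                 /* V2DI view of the same 16 bytes */
    memcpy(lanes, &x, sizeof lanes);   /* little-endian: lanes[1] holds bytes 8..15 */
    return lanes[1];
}
```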

[Bug target/114514] v16qi >> 7 can be optimized with vpcmpgtb

2024-03-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514 --- Comment #3 from Hongtao Liu --- (In reply to Andrew Pinski from comment #1) > Confirmed. > > Note non sign bit can be improved too: > ``` I assume you're talking about broadcast from imm or directly from constant pool. GCC chooses the

[Bug target/114514] New: v16qi >> 7 can be optimized with vpcmpgtb

2024-03-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114514 Bug ID: 114514 Summary: v16qi >> 7 can be optimized with vpcmpgtb Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component:
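Here is a sketch of the identity behind this report, written with GCC's `vector_size` extension rather than intrinsics. Function names are hypothetical. An arithmetic right shift of a signed byte by 7 gives -1 for negative lanes and 0 otherwise. That is exactly the mask produced by a signed greater-than compare of zero against the input. SSE2 has no byte-wise arithmetic shift, which is why the single compare is attractive.

```c
#include <assert.h>

typedef signed char v16qi __attribute__((vector_size(16)));

/* Per-lane arithmetic shift by 7: -1 for negative bytes, 0 otherwise. */
static v16qi sra7(v16qi x)
{
    return x >> 7;
}

/* The same mask from one compare against zero (pcmpgtb/vpcmpgtb):
   a vector comparison yields -1 where true and 0 where false. */
static v16qi neg_mask(v16qi x)
{
    v16qi zero = {0};
    return (v16qi)(zero > x);
}

/* Exhaustively compare both forms over all 256 byte values. */
static int forms_agree(void)
{
    for (int base = -128; base < 128; base += 16) {
        v16qi x = {0};
        for (int i = 0; i < 16; i++)
            x[i] = (signed char)(base + i);
        v16qi a = sra7(x), b = neg_mask(x);
        for (int i = 0; i < 16; i++)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```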

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471 --- Comment #6 from Hongtao Liu --- (In reply to Hongtao Liu from comment #5) > Maybe we should always use kmask under AVX512, currently only >= 128-bits > vector of vector _Float16 use kmask, < 128 bits vector still use vector mask. > and we

[Bug tree-optimization/114471] [14 regression] ICE when building liblc3-1.0.4 with -fno-vect-cost-model -march=x86-64-v4

2024-03-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114471 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/114429] [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/114429] [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429 --- Comment #2 from Hongtao Liu --- (In reply to Hongtao Liu from comment #1) > when x is INT_MIN, I assume -x is UD, so compiler can do anything. > otherwise, (-x) >> 31 is just x > 0. > From rtl view. neg of INT_MIN is assumed to 0 after it's
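The reasoning in comment #2 can be checked in plain C, using hypothetical function names. In GCC, `>>` on a negative `int` is an arithmetic shift. So `(-x) >> 31` is -1 when `x > 0` and 0 otherwise, which is the -1/0 mask form of `x > 0` that a vector compare produces. `x == INT_MIN` is left out because `-x` overflows there.

```c
#include <assert.h>
#include <limits.h>

/* (-x) >> 31 on a 32-bit int; valid for x != INT_MIN. */
static int neg_sra31(int x)
{
    return (-x) >> 31;
}

/* x > 0 as a -1/0 mask, like one lane of pcmpgtd against zero. */
static int gt_zero_mask(int x)
{
    return -(x > 0);
}
```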

[Bug target/114429] [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429 Hongtao Liu changed: What|Removed |Added Target||x86_64-*-* i?86-*-* --- Comment #1 from

[Bug target/114429] New: [x86] (neg a) ashifrt>> 31 can be optimized to a > 0.

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429 Bug ID: 114429 Summary: [x86] (neg a) ashifrt>> 31 can be optimized to a > 0. Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3

[Bug target/114428] New: [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114428 Bug ID: 114428 Summary: [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld Product: gcc Version: 14.0 Status: UNCONFIRMED
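The equivalence in this title can be checked lane by lane, assuming 32-bit lanes. An arithmetic shift right by 16 keeps the original upper 16 bits in the low half and fills the upper half with sign bits. Masking with 0xffff then clears those sign bits, so the result matches a logical shift by 16. Here is a minimal scalar sketch with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* One lane of: psrad $16, then pand with 0x0000ffff. */
static uint32_t sra16_then_mask(uint32_t x)
{
    return (uint32_t)((int32_t)x >> 16) & 0xffffu;
}

/* One lane of: psrld $16. */
static uint32_t srl16(uint32_t x)
{
    return x >> 16;
}
```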

[Bug target/114427] New: [x86] vec_pack_truncv8si/v4si can be optimized with pblendw instead of pand for AVX2 target

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114427 Bug ID: 114427 Summary: [x86] vec_pack_truncv8si/v4si can be optimized with pblendw instead of pand for AVX2 target Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #20 from Hongtao Liu --- (In reply to JuzheZhong from comment #19) > I think it's better to add pr114396.c into vect testsuite instead of x86 > target test since it's the bug not only happens on x86. Sure, there's no target

[Bug rtl-optimization/92080] Missed CSE of _mm512_set1_epi8(c) with _mm256_set1_epi8(c)

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080 --- Comment #9 from Hongtao Liu --- > If we were to expose that vpxor before postreload we'd likely CSE but > we have > > 5: xmm0:V4SI=const_vector > REG_EQUIV const_vector > 6: [`b']=xmm0:V4SI > 7: xmm0:V8HI=const_vector >

[Bug rtl-optimization/92080] Missed CSE of _mm512_set1_epi8(c) with _mm256_set1_epi8(c)

2024-03-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #17 from Hongtao Liu --- > > > > The to_mpz args look like they could be mixing signs as well: > > I tries below, looks like mixing signs works well. debug show step_expr is -5 and signed. short a = 0xF; short b[16]; unsigned

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 Hongtao Liu changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #16 from Hongtao

[Bug tree-optimization/114396] [13/14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv since r13-7988-g82919cf4cb2321

2024-03-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #15 from Hongtao Liu --- (In reply to Richard Biener from comment #9) > (In reply to Robin Dapp from comment #8) > > No fallout on x86 or aarch64. > > > > Of course using false instead of TYPE_SIGN (utype) is also possible and > >

[Bug tree-optimization/67683] Missed vectorization: shifts of an induction variable

2024-03-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67683 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug middle-end/114347] wrong constant folding when casting __bf16 to int

2024-03-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114347 --- Comment #9 from Hongtao Liu --- (In reply to Richard Biener from comment #7) > (In reply to Jakub Jelinek from comment #6) > > You can use -fexcess-precision=16 if you don't want treating _Float16 and > > __bf16 as having excess precision.

[Bug target/114334] [14 Regression] ICE: in extract_insn, at recog.cc:2812 (unrecognizable insn and:HF?) with lroundf16() and -ffast-math -mavx512fp16

2024-03-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114334 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/66862] OpenMP SIMD does not work (use SIMD instructions) on conditional code

2024-03-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66862 --- Comment #5 from Hongtao Liu --- > Now, it seems AVX512BW (and AVX512VL in some cases) has the needed > instructions, > in particular VMOVDQU{8,16}, but it is not reflected in maskload and > maskstore expanders. CCing Kyrill and Uros on
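To make the discussion concrete, here is a hypothetical loop of the kind the PR title describes. The store to `a[i]` may only happen when the condition holds. Vectorizing the loop therefore needs masked loads and stores (or an equivalent blend), which is where the maskload/maskstore expanders come in. Without `-fopenmp` or `-fopenmp-simd`, the pragma is ignored and the loop still runs correctly as scalar code.

```c
#include <assert.h>

/* Conditional update: a[i] is written only where b[i] > 0. */
void cond_add(int *restrict a, const int *restrict b, int n)
{
#pragma omp simd
    for (int i = 0; i < n; i++)
        if (b[i] > 0)
            a[i] += b[i];
}
```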

[Bug target/114334] [14 Regression] ICE: in extract_insn, at recog.cc:2812 (unrecognizable insn and:HF?) with lroundf16() and -ffast-math -mavx512fp16

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114334 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed|

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #15 from Hongtao Liu --- A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647604.html

[Bug target/111822] [12/13/14 Regression] during RTL pass: lr_shrinkage ICE: in operator[], at vec.h:910 with -O2 -m32 -flive-range-shrinkage -fno-dce -fnon-call-exceptions since r12-5301-g04520645038

2024-03-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111822 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #14 from Hongtao Liu --- diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index 0de299c62e3..92062378d8e 100644 --- a/gcc/cfgexpand.cc +++ b/gcc/cfgexpand.cc @@ -1214,7 +1214,7 @@ expand_stack_vars (bool (*pred) (size_t), class

[Bug libgcc/111731] [13/14 regression] gcc_assert is hit at libgcc/unwind-dw2-fde.c#L291

2024-03-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111731 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #13 from Hongtao Liu --- So the stack is like --- stack top -32 - (offset -32) -64 (32 bytes redzone) - (offset -64) -128 (64 bytes __m512) (offset -128) (32-bytes redzone) ---(offset

[Bug target/110027] [11/12/13/14 regression] Misaligned vector store on detect_stack_use_after_return

2024-03-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110027 --- Comment #12 from Hongtao Liu --- (In reply to Sam James from comment #11) > Calling it a 11..14 regression as we know 14 is bad and 7.5 is OK, but I > can't test 11/12 on an avx512 machine right now. I can't reproduce that with 11/12, but

[Bug target/111822] [12/13/14 Regression] during RTL pass: lr_shrinkage ICE: in operator[], at vec.h:910 with -O2 -m32 -flive-range-shrinkage -fno-dce -fnon-call-exceptions since r12-5301-g04520645038

2024-03-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111822 --- Comment #16 from Hongtao Liu --- (In reply to Uroš Bizjak from comment #11) > (In reply to Richard Biener from comment #10) > > The easiest fix would be to refuse applying STV to a insn that > > can_throw_internal () (that's an insn that

[Bug d/114171] [13/14 Regression] gdc -O2 -mavx generates misaligned vmovdqa instruction

2024-02-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114171 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org Last

[Bug tree-optimization/114164] simdclone vectorization creates unsupported IL

2024-02-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #16 from Hongtao Liu --- > I'm all for removing the 1/3 for innermost loop handling (in cunroll > the unrolled loop is then innermost). I'm more concerned about > unrolling more than one level which is exactly what's required for

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #14 from Hongtao Liu --- (In reply to rguent...@suse.de from comment #13) > On Tue, 27 Feb 2024, liuhongt at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 > > > > --- Comment #11 from Hongtao Liu

[Bug target/114125] Support vcond_mask_qiqi and friends.

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114125 Hongtao Liu changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug target/114125] New: Support vcond_mask_qiqi and friends.

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114125 Bug ID: 114125 Summary: Support vcond_mask_qiqi and friends. Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #11 from Hongtao Liu --- >Loop body is likely going to simplify further, this is difficult >to guess, we just decrease the result by 1/3. */ > This is introduced by r0-68074-g91a01f21abfe19 /* Estimate number of insns
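The heuristic quoted in comment #11 assumes that a completely unrolled body will simplify further, so it scales the estimated size down by one third. The following is a minimal C sketch of that arithmetic under a simplified cost model. The function and parameter names (`estimate_unrolled_size`, `overall`, `eliminated`, `nunroll`) are hypothetical, and this is not GCC's actual code:

```c
/* Hypothetical sketch of a size estimate for a completely unrolled loop.
   overall:    estimated insns in one copy of the loop body
   eliminated: insns expected to fold away once the IV becomes a constant
   nunroll:    number of body copies produced by unrolling */
long estimate_unrolled_size(long overall, long eliminated, long nunroll)
{
    long insns = nunroll * (overall - eliminated);

    /* Guess that the unrolled body simplifies further and decrease the
       estimate by 1/3, i.e. keep 2/3 of it (integer division). */
    insns = insns * 2 / 3;

    /* Never report an empty result. */
    return insns > 0 ? insns : 1;
}
```

For example, a 10-insn body with 1 insn folded per copy, unrolled 3 times, gives 27 insns before scaling and 18 after. The discount therefore makes full unrolling look cheaper than a plain copy count would.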

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #10 from Hongtao Liu --- (In reply to Hongtao Liu from comment #9) > The original case is a little different from the one in PR. But the issue is similar, after cunrolli, GCC failed to vectorize the outer loop. The interesting

[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

2024-02-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 --- Comment #9 from Hongtao Liu --- The original case is a little different from the one in PR. It comes from ggml #include #include typedef uint16_t ggml_fp16_t; static float table_f32_f16[1 << 16]; inline static float

[Bug target/114107] poor vectorization at -O3 when dealing with arrays of different multiplicity, good with -O2

2024-02-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107 --- Comment #11 from Hongtao Liu --- (In reply to N Schaeffer from comment #9) > In addition, optimizing for size with -Os leads to a non-vectorized > double-loop (51 bytes) while the vectorized loop with vbroadcastsd (produced > by clang -Os)

[Bug target/114107] poor vectorization at -O3 when dealing with arrays of different multiplicity, good with -O2

2024-02-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107 --- Comment #8 from Hongtao Liu --- (In reply to Hongtao Liu from comment #7) > perm_cost is very low in x86 backend, and it maybe ok for 128-bit vectors, > pshufb/shufps are available for most cases. > But for 256/512-bit vectors, when the

[Bug target/114107] poor vectorization at -O3 when dealing with arrays of different multiplicity, good with -O2

2024-02-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/109885] gcc does not generate movmskps and testps instructions (clang does)

2024-02-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109885 --- Comment #4 from Hongtao Liu --- int sum() { int ret = 0; for (int i=0; i<8; ++i) ret +=(0==v[i]); return ret; } int sum2() { int ret = 0; auto m = v==0; for (int i=0; i<8; ++i) ret += m[i]; return ret; } For sum, gcc
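The two loops quoted in comment #4 differ in one respect. One compares lane by lane, and the other does a single vector compare and then sums the mask. The following C sketch shows both shapes using GCC's vector extensions. The original is C++ with a global `v`, and the `v8si` typedef and function names here are hypothetical. A pointer argument avoids the ABI warning for passing 256-bit vectors without AVX. Whether a given compiler lowers either form to movmskps plus popcnt depends on its version and target flags:

```c
/* GCC/Clang vector extension: eight 32-bit ints in one 256-bit vector. */
typedef int v8si __attribute__((vector_size(32)));

/* Scalar shape: compare each lane with 0 and count the matches. */
int count_zero_scalar(const v8si *pv)
{
    int ret = 0;
    for (int i = 0; i < 8; ++i)
        ret += ((*pv)[i] == 0);
    return ret;
}

/* Mask shape: a vector comparison yields -1 in true lanes and 0
   elsewhere, so subtracting every lane of the mask counts the matches. */
int count_zero_mask(const v8si *pv)
{
    v8si zero = {0};
    v8si m = *pv == zero;
    int ret = 0;
    for (int i = 0; i < 8; ++i)
        ret -= m[i];
    return ret;
}
```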

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #57 from Hongtao Liu --- > For dg-do run testcases I really think we should avoid those -march= > options, because it means a lot of other stuff, BMI, LZCNT, ... Makes sense.

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #45 from Hongtao Liu --- > > There's do_store_flag to fixup for uses not in branches and > > do_compare_and_jump for conditional jumps. > > reasonable enough for me. I mean we only handle it at consumers where upper bits matters.

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #44 from Hongtao Liu --- > > Note the AND is removed by combine if I add it: > > Successfully matched this instruction: > (set (reg:CCZ 17 flags) > (compare:CCZ (and:HI (not:HI (subreg:HI (reg:QI 102 [ tem_3 ]) 0)) >

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #43 from Hongtao Liu --- > Well, yes, the discussion in this bug was whether to do this at consumers > (that's sth new) or with all mask operations (that's how we handle > bit-precision integer operations, so it might be relatively

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #39 from Hongtao Liu --- > > the question is whether that matches the semantics of GIMPLE (the padding > > is inverted, too), whether it invokes undefined behavior (don't do it - it > > seems for people using intrinsics that's what

[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

2024-02-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #38 from Hongtao Liu --- > I think we should also mask off the upper bits of variable mask? > > notl%esi > orl %esi, %edi > notl%edi > andl$15, %edi > je .L3 with
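Comments #38 and #45 concern masks that have fewer lanes than the register holding them has bits. A NOT also sets the padding bits, so a consumer that tests the whole register must clear them first, as the `andl $15` before `je` does in the quoted assembly. The following is a minimal C sketch of that point. It assumes a hypothetical 4-lane mask held in a `uint8_t`, and the helper names are illustrative only:

```c
#include <stdint.h>

/* Hypothetical 4-lane mask held in an 8-bit value: bits 0..3 are lanes,
   bits 4..7 are padding with no defined meaning. */
#define LANES_MASK 0x0Fu

/* Wrong: ~m also flips the padding bits, so an all-true mask (0x0F)
   becomes 0xF0, which is nonzero, and the test fails. */
int all_true_unmasked(uint8_t m)
{
    return (uint8_t)~m == 0;
}

/* Right: clear the padding after the NOT and before testing for zero,
   so only the four real lanes are inspected. */
int all_true_masked(uint8_t m)
{
    return ((uint8_t)~m & LANES_MASK) == 0;
}
```

With padding set to garbage (for example `0xFF`), the masked test still gives the right answer. This matches the idea in comment #45 of fixing up upper bits only at consumers where they matter.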
