[Bug c/105363] -ftree-slp-vectorize decreases performance significantly (x64)

2022-04-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105363 --- Comment #2 from Hongtao.liu --- Looks like neither ICC nor LLVM vectorized the loop https://godbolt.org/z/sbheqbE6Y

[Bug c/105363] -ftree-slp-vectorize decreases performance significantly (x64)

2022-04-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105363 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/105354] __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled

2022-04-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354 --- Comment #2 from Hongtao.liu --- (In reply to Hongtao.liu from comment #1) > Yes, and I think it's only available for simd128u8, not for > simd128u16/u32/u64. No, under sse2 the optimization is also available for simd128u16, directly generate

[Bug target/105354] __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled

2022-04-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/105339] [x86] missing AVX-512F scalef functions when optimization is disabled

2022-04-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105339 --- Comment #2 from Hongtao.liu --- We need to add macros for the _mm_{mask,maskz}_scalef_round_{sd,ss} intrinsics for gcc-9/10/11/12
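
Why a macro form matters when optimization is disabled can be sketched as follows. GCC's x86 intrinsic headers generally define intrinsics that take an immediate (such as a rounding mode) as inline functions when __OPTIMIZE__ is defined and as macros otherwise, because at -O0 an inline function's argument does not reach the underlying builtin as a compile-time constant. The names demo_builtin_scale and demo_scale_round below are hypothetical stand-ins, not part of any GCC header, and the arithmetic is a placeholder rather than real scalef semantics.

```c
#include <assert.h>

/* Hypothetical stand-in for a builtin that needs a constant rounding
   argument. A real builtin would reject a non-constant immediate at
   compile time; this placeholder just multiplies x by 2^n. */
static inline double demo_builtin_scale(double x, int n, int rounding)
{
    assert(rounding >= 0);
    double r = x;
    for (int i = 0; i < n; ++i)
        r *= 2.0;
    return r;
}

#ifdef __OPTIMIZE__
/* With optimization, inlining lets the constant reach the builtin. */
static inline double demo_scale_round(double x, int n, int rounding)
{
    return demo_builtin_scale(x, n, rounding);
}
#else
/* Without optimization, a macro substitutes the literal directly,
   so the immediate stays a constant expression at the call site. */
#define demo_scale_round(x, n, rounding) \
    demo_builtin_scale((x), (n), (rounding))
#endif
```

Either branch gives the caller the same spelling, demo_scale_round(1.5, 3, 8); a header that provides only the inline-function branch leaves the name undefined at -O0, which matches the symptom in the bug title.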

[Bug target/105339] [x86] missing AVX-512F scalef functions when optimization is disabled

2022-04-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105339 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/105288] AVX/AVX512 casts should use the "v" constraint

2022-04-17 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105288 --- Comment #2 from Hongtao.liu --- I think HJ means avx__ can be extended to EVEX SSE registers by changing "x" to "v" when AVX512VL is available. For avx512f__, it should be "=Yv,m" " vm,v" since operands[0] could be allocated as an EVEX register

[Bug middle-end/105253] __popcountdi2 calls generated in kernel code with gcc12

2022-04-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105253 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #16

[Bug tree-optimization/105216] [12 regression] 8% regression for m-queens compared to gcc11 O2 on CLX.

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #11 from Hongtao.liu --- There's a post-loop pass_thread_jumps; adding a copy of pass_pre doesn't help.

[Bug tree-optimization/105216] [12 regression] 8% regression for m-queens compared to gcc11 O2 on CLX.

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #10 from Hongtao.liu --- Probably related to the 3 cancelled jump threads below, which affect PRE. Threading through latch before loop opts would create non-empty latch: Cancelling jump thread: (15, 16) incoming edge; (16, 43)

[Bug tree-optimization/105216] [12 regression] 8% regression for m-queens compared to gcc11 O2 on CLX.

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #9 from Hongtao.liu --- (In reply to Hongtao.liu from comment #8) > (In reply to rguent...@suse.de from comment #7) > > On Mon, 11 Apr 2022, crazylht at gmail dot com wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105

[Bug tree-optimization/105216] [12 regression] 8% regression for m-queens compared to gcc11 O2 on CLX.

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #8 from Hongtao.liu --- (In reply to rguent...@suse.de from comment #7) > On Mon, 11 Apr 2022, crazylht at gmail dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 > > > > --- Comment #5 from Hongtao.liu ---

[Bug tree-optimization/105216] [12 regression] 8% regression for m-queens compared to gcc11 O2 on CLX.

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #6 from Hongtao.liu --- (In reply to Hongtao.liu from comment #5) > My bisect shows it's caused by > r12-3876-g4a960d548b7d7d942f316c5295f6d849b74214f5 pre dump before vs after -goto ; [11.00%] - - [local count: 105119324]: -

[Bug tree-optimization/105216] [12 regression] 8% regression for m-queens compared to gcc11 O2 on CLX.

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #5 from Hongtao.liu --- My bisect shows it's caused by r12-3876-g4a960d548b7d7d942f316c5295f6d849b74214f5

[Bug tree-optimization/105216] [12 regression] 8% regression for m-queens compared to gcc11 O2

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 --- Comment #3 from Hongtao.liu --- (In reply to Andrew Pinski from comment #1) > Does -fno-tree-vectorizer help? There is definitely another bug recording > the fact pre had to turn something off while vectorization is turned on. No, not relat

[Bug tree-optimization/105216] New: [12 regression] 8% regression for m-queens compared to gcc11 O2

2022-04-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216 Bug ID: 105216 Summary: [12 regression] 8% regression for m-queens compared to gcc11 O2 Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal

[Bug target/105033] Suboptimal for vec_concat lower halves of two vectors.

2022-04-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033 --- Comment #1 from Hongtao.liu --- Created attachment 52776 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52776&action=edit Patch pending for GCC13

[Bug tree-optimization/102583] [x86] Failure to optimize 32-byte integer vector conversion to 16-byte float vector properly when converting upper part with -mavx2

2022-04-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102583 --- Comment #4 from Hongtao.liu --- Created attachment 52771 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52771&action=edit Pending patch for GCC13.

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #46 from Hongtao.liu --- Another issue is splitting the vector load into halves or elements: the latter requires scratch registers, which may not be available; the former doesn't require an extra register but may still trigger STLF stalls. For

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #44 from Hongtao.liu --- (In reply to Hongtao.liu from comment #43) > One thing I found by experiments: > Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between each other, > just emulate for pipeline) before stalled load, stl

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #43 from Hongtao.liu --- One thing I found by experiment: after inserting 64 vaddps %xmm18, %xmm19, %xmm20 instructions (no dependence between them, just to keep the pipeline busy) before the stalled load, the STLF stall case is as fast as the no-stall case on CLX.
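
The access pattern usually associated with a store-to-load forwarding (STLF) stall can be sketched in portable C: narrow stores to a location followed by one wider load that covers them. This is only an illustration under stated assumptions. The type and function names are hypothetical, and whether a compiler emits a 16-byte vector load here, or whether that load actually stalls, depends on the compiler, its flags, and the microarchitecture.

```c
#include <assert.h>
#include <string.h>

typedef struct { double x, y; } vec2;                 /* hypothetical 16-byte struct */
typedef struct { unsigned char bytes[16]; } wide16;   /* stands in for one wide load */

_Static_assert(sizeof(vec2) == sizeof(wide16), "sketch assumes a 16-byte struct");

/* Two 8-byte stores followed by one 16-byte read of the same memory.
   A vectorizer may turn the read into a single vector load, which
   cannot be forwarded from either narrow store alone. */
static wide16 narrow_stores_then_wide_load(vec2 *slot, double x, double y)
{
    wide16 out;
    assert(slot != NULL);
    slot->x = x;                      /* narrow store, low half  */
    slot->y = y;                      /* narrow store, high half */
    memcpy(&out, slot, sizeof out);   /* wide read spanning both stores */
    return out;
}

/* Read one 8-byte half back out of the wide value. */
static double half_at(wide16 w, size_t offset)
{
    double d;
    assert(offset == 0 || offset == 8);
    memcpy(&d, w.bytes + offset, sizeof d);
    return d;
}
```

The result is the same whichever instructions are chosen; only the timing differs, which is why the comments in this thread rely on microbenchmarks rather than on inspecting output values.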

[Bug target/104915] Miss optimization for vec_setv8hi_0

2022-03-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915 --- Comment #2 from Hongtao.liu --- Created attachment 52705 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52705&action=edit Patch pending for GCC13

[Bug target/105072] Miss optimization for pmovzxbq.

2022-03-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072 --- Comment #1 from Hongtao.liu --- Created attachment 52699 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52699&action=edit Patch pending for GCC13

[Bug target/105066] GCC thinks pinsrw xmm, mem, 0 requires SSE4.1, not SSE2? _mm_loadu_si16 bounces through integer reg

2022-03-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105066 --- Comment #4 from Hongtao.liu --- Fixed in GCC12.

[Bug target/104610] memcmp () == 0 can be optimized better for avx512f

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610 Hongtao.liu changed: What|Removed |Added Attachment #52495|0 |1 is obsolete|

[Bug target/104610] memcmp () == 0 can be optimized better for avx512f

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610 --- Comment #15 from Hongtao.liu --- Could someone help mark this as blocking PR105073? The patch is ready and waiting for GCC13.

[Bug target/105073] New: [meta bug]Patch pending for GCC13.

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073 Bug ID: 105073 Summary: [meta bug]Patch pending for GCC13. Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/105066] GCC thinks pinsrw xmm, mem, 0 requires SSE4.1, not SSE2? _mm_loadu_si16 bounces through integer reg

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105066 --- Comment #2 from Hongtao.liu --- > That may be a separate bug, IDK > Open PR105072 for it.

[Bug target/105072] New: Miss optimization for pmovzxbq.

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072 Bug ID: 105072 Summary: Miss optimization for pmovzxbq. Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754 --- Comment #9 from Hongtao.liu --- (In reply to Hongtao.liu from comment #7) > > > > But that's unrelated to correctness; this bug can be closed unless we're > > keeping it open until it's fixed in the GCC11 current stable series. > > Let me d

[Bug target/105066] GCC thinks pinsrw xmm, mem, 0 requires SSE4.1, not SSE2? _mm_loadu_si16 bounces through integer reg

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105066 --- Comment #1 from Hongtao.liu --- pinsrw is available under SSE2 for both register and memory operands, unlike pextrw, which requires SSE4.1 for memory operands. (define_insn "vec_set_0" [(set (match_operand:V8_128 0 "register_operand"

[Bug target/104915] Miss optimization for vec_setv8hi_0

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915 --- Comment #1 from Hongtao.liu --- As described in PR105066, pinsrw mem should be better than movzx + vmovd.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #15

[Bug target/99754] [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754 --- Comment #7 from Hongtao.liu --- > > But that's unrelated to correctness; this bug can be closed unless we're > keeping it open until it's fixed in the GCC11 current stable series. Let me do the backporting.

[Bug target/105058] Incorrect register constraint in KL patterns

2022-03-25 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105058 --- Comment #1 from Hongtao.liu --- cat test.c #include unsigned int ctrl; __m128i k1, k2, k3; void test_keylocker_11 (void) { register __m128i k4 __asm ("xmm16") = k2; asm volatile ("" : "+v" (k4)); _mm_loadiwkey (ctrl, k1, k4,

[Bug target/104976] [avx512fp16] lowpart_subreg return NULL_RTX cause ICE

2022-03-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976 --- Comment #4 from Hongtao.liu --- ICE is fixed in GCC12, and I'd like to keep this PR open for refining validate_subreg in GCC13.

[Bug target/105034] New: [10/11/12 regression]Suboptimal codegen for min/max with -Os

2022-03-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034 Bug ID: 105034 Summary: [10/11/12 regression]Suboptimal codegen for min/max with -Os Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal

[Bug target/105033] New: Suboptimal for vec_concat lower halves of two vectors.

2022-03-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033 Bug ID: 105033 Summary: Suboptimal for vec_concat lower halves of two vectors. Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Comp

[Bug middle-end/105032] Compiling inline ASM x86 causing GCC stuck in an endless loop with 100% CPU usage

2022-03-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #2

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/105000] __attribute__((target("general-regs-only"))) doesn't disable AVX/SSE ISAs in ix86_isa_flags2

2022-03-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105000 --- Comment #4 from Hongtao.liu --- (In reply to Martin Liška from comment #3) > Can we close it as fixed? I think so.

[Bug target/104982] [12 Regression] FAIL: gcc.target/i386/bt-5.c by r12-7687

2022-03-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982 --- Comment #7 from Hongtao.liu --- Fixed in GCC12.

[Bug target/104982] [12 Regression] FAIL: gcc.target/i386/bt-5.c by r12-7687

2022-03-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982 --- Comment #4 from Hongtao.liu --- I'm testing diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 02f298c2846..c74edd1aaef 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -14182,12 +14182,12 @@ (define_i

[Bug target/104982] [12 Regression] FAIL: gcc.target/i386/bt-5.c by r12-7687

2022-03-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982 --- Comment #3 from Hongtao.liu --- (In reply to Hongtao.liu from comment #2) > Failed to match this instruction: > (set (reg/v:SI 88 [ z ]) > (if_then_else:SI (eq (zero_extract:SI (reg:SI 92) > (const_int 1 [0x1]

[Bug target/104982] [12 Regression] FAIL: gcc.target/i386/bt-5.c by r12-7687

2022-03-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982 --- Comment #2 from Hongtao.liu --- Failed to match this instruction: (set (reg/v:SI 88 [ z ]) (if_then_else:SI (eq (zero_extract:SI (reg:SI 92) (const_int 1 [0x1]) (zero_extend:SI (subreg:QI (r

[Bug target/104977] [avx512fp16] wrong code for vfmaddcsh when -masm=intel.

2022-03-21 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104977 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978 --- Comment #3 from Hongtao.liu --- (In reply to Hongtao.liu from comment #2) > (In reply to Hongtao.liu from comment #0) > > #include > > __m128h > > foo (__m128h a, __m128h b, __m128h c, __mmask8 m) > > { > > return _mm_mask_fcmadd_round_

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978 --- Comment #2 from Hongtao.liu --- (In reply to Hongtao.liu from comment #0) > #include > __m128h > foo (__m128h a, __m128h b, __m128h c, __mmask8 m) > { > return _mm_mask_fcmadd_round_sch (a, m, b, c, 8); > } > > > _Z3fooDv8_DF16_S_S_h:

[Bug target/104978] [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978 --- Comment #1 from Hongtao.liu --- Similar for _mm_mask_fmadd_round_sch

[Bug target/104977] [avx512fp16] wrong code for vfmaddcsh when -masm=intel.

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104977 --- Comment #1 from Hongtao.liu --- Similar for _mm_fmadd_round_sch

[Bug target/104978] New: [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978 Bug ID: 104978 Summary: [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Compo

[Bug target/104977] New: [avx512fp16] wrong code for vfmaddcsh when -masm=intel.

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104977 Bug ID: 104977 Summary: [avx512fp16] wrong code for vfmaddcsh when -masm=intel. Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Prior

[Bug target/104974] [avx512fp16] Error: operand type mismatch for `vmovw'

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104974 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/104963] GCC11/12 -march=sapphirerapids misses some ISAs.

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104963 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/104976] [avx512fp16] lowpart_subreg return NULL_RTX cause ICE

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976 --- Comment #2 from Hongtao.liu --- (In reply to Hongtao.liu from comment #1) > The workaround in the backend is to force_reg operand[1] before lowpart_subreg > to avoid NULL_RTX. It would be nice if we extend validate_subreg to avoid weird situati

[Bug target/104976] [avx512fp16] lowpart_subreg return NULL_RTX cause ICE

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976 Hongtao.liu changed: What|Removed |Added Target||x86_64-*-* i?86-*-* Keywords|

[Bug target/104976] New: [avx512fp16] lowpart_subreg return NULL_RTX cause ICE

2022-03-18 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976 Bug ID: 104976 Summary: [avx512fp16] lowpart_subreg return NULL_RTX cause ICE Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Compo

[Bug target/104974] [avx512fp16] Error: operand type mismatch for `vmovw'

2022-03-17 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104974 Hongtao.liu changed: What|Removed |Added Target||x86_64-*-* i?86-*-* Keywords|

[Bug target/104974] New: [avx512fp16] Error: operand type mismatch for `vmovw'

2022-03-17 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104974 Bug ID: 104974 Summary: [avx512fp16] Error: operand type mismatch for `vmovw' Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Comp

[Bug target/104963] New: GCC11/12 -march=sapphirerapids miss some isa.

2022-03-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104963 Bug ID: 104963 Summary: GCC11/12 -march=sapphirerapids miss some isa. Product: gcc Version: 11.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component:

[Bug rtl-optimization/104950] GCC does not emit branchless code for load next to each other

2022-03-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950 --- Comment #6 from Hongtao.liu --- (In reply to Andrew Pinski from comment #5) > (In reply to Hongtao.liu from comment #4) > > (In reply to Richard Biener from comment #3) > > > Ah, on aarch64 we get > > > > > > cmp w0, 0 > > >

[Bug rtl-optimization/104950] GCC does not emit branchless code for load next to each other

2022-03-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #4

[Bug target/104946] [12 regression] Suboptimal gimple foding for blendvpd under sse4.1

2022-03-16 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946 Hongtao.liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/104946] [12 regression] Suboptimal gimple foding for blendvpd under sse4.1

2022-03-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946 Hongtao.liu changed: What|Removed |Added Keywords||missed-optimization Target|

[Bug target/104946] New: [12 regression] Suboptimal gimple foding for blendvpd under sse4.1

2022-03-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946 Bug ID: 104946 Summary: [12 regression] Suboptimal gimple foding for blendvpd under sse4.1 Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #41 from Hongtao.liu --- (In reply to Richard Biener from comment #22) > (In reply to Hongtao.liu from comment #21) > > Now we have SLP node available in vector cost hook, maybe we can do sth in > > cost model to prevent vectorizatio

[Bug target/104915] New: Miss optimization for vec_setv8hi_0

2022-03-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915 Bug ID: 104915 Summary: Miss optimization for vec_setv8hi_0 Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #39 from Hongtao.liu --- > I'll see if I get around to prototype some argument classification > in the vectorizer (looking how hard it is to use > INIT_CUMULATIVE_ARGS in a context where we are not expanding to RTL), > unfortunately

[Bug target/104666] [12 Regression] ICE in related_vector_mode, at stor-layout.c:537

2022-03-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666 --- Comment #8 from Hongtao.liu --- Fixed in GCC12.

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #37 from Hongtao.liu --- > There is not much value in the vectorization we do in this function > (when manually fixing the STLF issue the speed is as good as with the > scalar code). We cost > > ray.dir.x 1 times scalar_load costs

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #35 from Hongtao.liu --- (In reply to Richard Biener from comment #34) > I can confirm this observation on Zen2. Note perf still records STLF > failures The penalty is much higher on Znver3 than on Zen2 for the same case (v2df).

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #33 from Hongtao.liu --- (In reply to Hongtao.liu from comment #32) > (In reply to Hongtao.liu from comment #31) > > Created attachment 52595 [details] > > microbenchmark > The interesting thing is that the microbenchmark didn't hit store forward

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #32 from Hongtao.liu --- (In reply to Hongtao.liu from comment #31) > Created attachment 52595 [details] > microbenchmark The microbenchmark is used to test the penalty for STFS. I've run it on CLX and found that 1 stalled vector load is fas

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #31 from Hongtao.liu --- Created attachment 52595 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52595&action=edit microbenchmark

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #30 from Hongtao.liu --- Created attachment 52594 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52594&action=edit tar -xvf micro.tar.gz Num/Type char/s char/v char/vn short/s short/v short/vn int/s int/v i

[Bug target/101929] [12 Regression] r12-7319 regress x264_r by 4% on CLX.

2022-03-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929 --- Comment #8 from Hongtao.liu --- (In reply to Richard Biener from comment #7) > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index 9188d727e33..7f1f12fb6c6 100644 > --- a/gcc/tree-vect-slp.cc > +++ b/gcc/tree-vect-slp.cc > @@ -2

[Bug target/104773] compare with 1 not merged with subtract 1

2022-03-03 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773 --- Comment #1 from Hongtao.liu --- It looks like the same issue as PR98977.

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-03-03 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #13 from Hongtao.liu --- (In reply to H.J. Lu from comment #10) > Created attachment 52553 [details] > A patch to always return pseudo register in ix86_gen_scratch_sse_rtx Please go ahead with this patch, I'll submit an incremental

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-03-02 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #12 from Hongtao.liu --- (In reply to H.J. Lu from comment #10) > Created attachment 52553 [details] > A patch to always return pseudo register in ix86_gen_scratch_sse_rtx For pr100865-8a.c,pr100865-9c.c,pr100865-8c.c +/* { dg-fina

[Bug target/104762] x86_64 538.imagick_r 8%-28% regressions and 10% 525.x264_r regressions after r12-7319-g90d693bdc9d718

2022-03-02 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104762 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #2

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-03-02 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #11 from Hongtao.liu --- (In reply to H.J. Lu from comment #9) > --- pieces-memset-46.s 2022-03-02 06:44:55.845212762 -0800 > +++ > /export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/pieces- > memset-46.s

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-03-01 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #8 from Hongtao.liu --- (In reply to H.J. Lu from comment #4) > (In reply to Hongtao.liu from comment #3) > > (In reply to H.J. Lu from comment #1) > > > ix86_expand_vector_move shouldn't use ix86_gen_scratch_sse_rtx. > > > > Is it

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-01 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #29 from Hongtao.liu --- From Agner Fog's excellent optimization manuals (https://www.agner.org/optimize/microarchitecture.pdf). For ICX/TGL: An aligned write of 128 bits or more followed by a read of one or both of the two halves

[Bug target/104723] [12 regression] Redundant usage of stack

2022-03-01 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723 --- Comment #2 from Hongtao.liu --- Updated testcase: void f256(char *a) { char t[] = "012345678901234567890123456789012345678901234567"; __builtin_memcpy(a, &t[0], sizeof(t)); }
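
For readability, the testcase from this comment is laid out below on separate lines, together with a small checking helper. The helper f256_copied_length is hypothetical and not part of the report; it only confirms that the function copies the 48 characters plus the terminating NUL (49 bytes in total) and says nothing about the stack usage the bug is about, which has to be judged from the generated assembly.

```c
#include <assert.h>
#include <string.h>

/* Testcase as given in the comment: copy a 49-byte local array to a. */
void f256(char *a)
{
    char t[] = "012345678901234567890123456789012345678901234567";
    __builtin_memcpy(a, &t[0], sizeof(t));
}

/* Hypothetical helper (not in the report): run f256 into a pre-filled
   buffer and return the length of the copied string. */
static size_t f256_copied_length(void)
{
    char dst[64];
    memset(dst, 'x', sizeof dst);
    f256(dst);
    assert(dst[49] == 'x');   /* nothing written past the 49-byte array */
    return strlen(dst);
}
```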

[Bug target/104723] [12 regression] Redundant usage of stack

2022-03-01 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723 --- Comment #1 from Hongtao.liu --- (In reply to Hongtao.liu from comment #0) > bool f256(char *a) > { > char t[] = "012345678901234567890123456789012345678901234567"; > return __builtin_memcpy(a, &t[0], sizeof(t)) == 0; > } > > https://god

[Bug target/104723] New: [12 regression] Redundant usage of stack

2022-03-01 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723 Bug ID: 104723 Summary: [12 regression] Redundant usage of stack Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-02-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #7 from Hongtao.liu --- (In reply to Hongtao.liu from comment #6) > (In reply to Hongtao.liu from comment #5) > > I notice it regresses > > > > FAIL: gcc.target/i386/incoming-11.c scan-assembler-not andl[\\t > > ]*\\$-16,[\\t ]*%esp

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-02-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #6 from Hongtao.liu --- (In reply to Hongtao.liu from comment #5) > I notice it regresses > > FAIL: gcc.target/i386/incoming-11.c scan-assembler-not andl[\\t > ]*\\$-16,[\\t ]*%esp Why replace ix86_gen_scratch_sse_rtx with gen_reg_

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-02-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #5 from Hongtao.liu --- I notice it regresses FAIL: gcc.target/i386/incoming-11.c scan-assembler-not andl[\\t ]*\\$-16,[\\t ]*%esp

[Bug target/104686] [12 Regression] Huge compile-time regression building SPEC 2017 538.imagick_r with -march=skylake

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104686 --- Comment #12 from Hongtao.liu --- (In reply to Hongtao.liu from comment #11) > (In reply to Martin Liška from comment #8) > > (In reply to Martin Liška from comment #7) > > > (In reply to Richard Biener from comment #6) > > > > Both revisions

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #3 from Hongtao.liu --- (In reply to H.J. Lu from comment #1) > ix86_expand_vector_move shouldn't use ix86_gen_scratch_sse_rtx. Is it problematic for TARGET_GEN_MEMSET_SCRATCH_RTX?

[Bug target/104704] [12 Regression] ix86_gen_scratch_sse_rtx doesn't work with explicit XMM7/XMM15/XMM31 usage

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704 --- Comment #2 from Hongtao.liu --- Yes, thanks for the reproduced testcase.

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #27 from Hongtao.liu --- > We can start with disabling vectorization with very cheap cost model to fix Of course only for (>=)16-byte struct passing.

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #26 from Hongtao.liu --- (In reply to Richard Biener from comment #22) > (In reply to Hongtao.liu from comment #21) > > Now we have SLP node available in vector cost hook, maybe we can do sth in > > cost model to prevent vectorizatio

[Bug target/104686] [12 Regression] Huge compile-time regression building SPEC 2017 538.imagick_r with -march=skylake

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104686 --- Comment #11 from Hongtao.liu --- (In reply to Martin Liška from comment #8) > (In reply to Martin Liška from comment #7) > > (In reply to Richard Biener from comment #6) > > > Both revisions affect vectorizer cost modeling only. With > > >

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #21 from Hongtao.liu --- Now that we have the SLP node available in the vector cost hook, maybe we can do something in the cost model to prevent vectorization when a node's definition comes from a big-size parameter.
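
As an illustration of the kind of code comment #21 refers to, here is a hypothetical sketch, not the testcase from the bug: a struct larger than 16 bytes is passed by value, and the adjacent field loads from the parameters are candidates for SLP vectorization. The names vec3, vec3_add and vec3_add_check are assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical 24-byte struct; on x86-64 a struct of this size is
   passed by value in memory rather than in registers. */
struct vec3 { double x, y, z; };

/* The loads of a.x/a.y and b.x/b.y come straight from the by-value
   parameters. With -O2 -ftree-slp-vectorize the compiler may combine
   them into vector loads, which is the situation a cost-model check
   on the SLP node's definition would try to detect. */
struct vec3 vec3_add(struct vec3 a, struct vec3 b)
{
  struct vec3 r;
  r.x = a.x + b.x;
  r.y = a.y + b.y;
  r.z = a.z + b.z;
  return r;
}

/* Hypothetical helper: checks the arithmetic only, not performance. */
int vec3_add_check(void)
{
  struct vec3 a = { 1.0, 2.0, 3.0 };
  struct vec3 b = { 4.0, 5.0, 6.0 };
  struct vec3 r = vec3_add(a, b);

  assert(r.x == 5.0 && r.y == 7.0 && r.z == 9.0);
  return 1;
}
```

Whether vectorizing such a function is a win or a loss depends on the target and on how the caller wrote the arguments, which is why the comment proposes handling it in the cost model.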

[Bug target/104666] [12 Regression] ICE in related_vector_mode, at stor-layout.c:537

2022-02-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666 --- Comment #6 from Hongtao.liu --- (In reply to Jakub Jelinek from comment #5) > Wouldn't the right fix be instead to move the ix86_expand_builtin Good idea!

[Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue

2022-02-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438 --- Comment #8 from Hongtao.liu --- (In reply to Martin Liška from comment #7) > (In reply to Hongtao.liu from comment #6) > > The opportunity disappear after r12-7125. > > Can you please install the latest contrib/gcc-git-customization.sh? Doi

[Bug target/104666] [12 Regression] ICE in related_vector_mode, at stor-layout.c:537

2022-02-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666 --- Comment #4 from Hongtao.liu --- The same ICE exists for __builtin_ia32_blendvpd, __builtin_ia32_blendvps, __builtin_ia32_blendvpd256, __builtin_ia32_blendvps256, __builtin_ia32_pblendvb128 and __builtin_ia32_pblendvb256.

[Bug target/104666] [12 Regression] ICE in related_vector_mode, at stor-layout.c:537

2022-02-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666 --- Comment #3 from Hongtao.liu --- (In reply to Hongtao.liu from comment #2) > So builtins are registered in the beginning, but isa checking is during > pass_expand, and gimple folding is between them, maybe we should restrict > builtin gimple
