[Bug target/94373] 548.exchange2_r run time is 7-12% worse than GCC 9 at -O2 and generic march/mtune

2020-03-29 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94373 --- Comment #2 from Hongtao.liu --- I think changing lea_cost from 2 to 1 for skylake can fix this regression. Since it's stage 4 now, I will hold my patch.

[Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native

2020-03-29 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375 --- Comment #1 from Hongtao.liu --- Try -mprefer-vector-width=128; 256-bit vectorization is not helpful for 548 according to our experience.

[Bug target/94373] 548.exchange2_r run time is 7-12% worse than GCC 9 at -O2 and generic march/mtune

2020-03-29 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94373 --- Comment #3 from Hongtao.liu --- (In reply to Hongtao.liu from comment #2) > I think > Change lea_cost from 2 --> 1 in skylake can fix this regressions. > > Since it's stage4 now, i hold my patch. To clarify: it's for -O2 -mtune=skylake-avx512

[Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native

2020-03-30 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375 --- Comment #4 from Hongtao.liu --- (In reply to Martin Jambor from comment #3) > (In reply to Hongtao.liu from comment #1) > > Try -mprefer-vector-width=128,256-bit vectorization is not helpful for 548 > > according to our experience. > > I hav

[Bug tree-optimization/94375] 548.exchange2_r run time is 8-18% worse than GCC 9 at -Ofast -march=native

2020-03-30 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94375 --- Comment #5 from Hongtao.liu --- (In reply to Hongtao.liu from comment #4) > (In reply to Martin Jambor from comment #3) > > (In reply to Hongtao.liu from comment #1) > > > Try -mprefer-vector-width=128,256-bit vectorization is not helpful for

[Bug target/94736] Missing ENDBR at label

2020-04-25 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94736 --- Comment #1 from Hongtao.liu --- The indirect jump `goto *p` is optimized away, so there is no indirect jump left and therefore no need to insert endbr64.
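A minimal sketch of the situation described in this comment. This is not the testcase from the report; the function name is hypothetical, and the code relies on the GNU C "labels as values" extension. Because the only possible target of `goto *p` is known at compile time, an optimizing compiler is free to replace the indirect jump with a direct one or remove it entirely.

```c
/* Sketch only: uses the GNU C "labels as values" extension
   (&&label and goto *p); not the testcase from the bug report. */

/* Hypothetical helper: p can only ever point at `done`, so the
   compiler may fold the indirect jump into straight-line code. */
int constant_target(void)
{
    void *p = &&done;   /* take the address of a label */
    goto *p;            /* indirect jump in the source */
    return 0;           /* never reached */
done:
    return 1;
}
```

Whether an endbr64 appears at the label then depends on whether any indirect jump to it survives optimization, which can be checked by inspecting the output of -S with -fcf-protection.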

[Bug target/94841] New: [10 Regression]527.cam4_r 7.68% regression on Intel Cascadelaker with -O2, 9.57% regression with -Ofast -march=native -funroll-loops -flto

2020-04-28 Thread crazylht at gmail dot com
: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com CC: hjl.tools at gmail dot com, tkoenig at gcc dot

[Bug target/94118] Undocumented inline assembly [target] operand modifiers

2020-05-06 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94118 --- Comment #2 from Hongtao.liu --- (In reply to Frédéric Recoules from comment #0) > The section 6.47.2.8 x86 Operand Modifiers of > https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html is only about x86. > > As it was done for Operand Constrai

[Bug target/95078] New: Missing fwprop for SIB address

2020-05-12 Thread crazylht at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com CC: hjl.tools at gmail dot com Target Milestone: --- Target: i386, x86-64 cat test.c int foo (int* p1, int* p2, int scale) { int ret = *(p1 + scale * 4 + 11); *p2 = 3; int

[Bug target/95078] Missing fwprop for SIB address

2020-05-12 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95078 --- Comment #2 from Hongtao.liu --- (In reply to Richard Biener from comment #1) > TER should go away, not be extended. So you are suggesting that we replace > > leaq 44(%rdi,%rdx,4), %rdx --- redundant could be fwprop > mov

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2020-05-15 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 92611, which changed state. Bug 92611 Summary: auto vectorization failed for type promotation https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92611 What|Removed |Added -

[Bug middle-end/92492] AVX512: Missed vectorization opportunity

2020-05-15 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92492 Bug 92492 depends on bug 92611, which changed state. Bug 92611 Summary: auto vectorization failed for type promotation https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92611 What|Removed |Added -

[Bug target/92611] auto vectorization failed for type promotation

2020-05-15 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92611 Hongtao.liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/92658] x86 lacks vector extend / truncate

2020-05-15 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 --- Comment #13 from Hongtao.liu --- *** Bug 92611 has been marked as a duplicate of this bug. ***

[Bug target/94962] Suboptimal AVX2 code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))

2020-05-18 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94962 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/94962] Suboptimal AVX2 code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))

2020-05-18 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94962 --- Comment #3 from Hongtao.liu --- You're right, from intel SDM: VEX.128 encoded version: Bits (MAXVL-1:128) of the destination register are zeroed.

[Bug target/94962] Suboptimal AVX2 code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))

2020-05-18 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94962 --- Comment #4 from Hongtao.liu --- (In reply to Jakub Jelinek from comment #2) > But such an instruction isn't always redundant, it really depends on what > the previous setter of the register did, whether the upper 128 bit of the > 256-bit regi

[Bug target/94962] Suboptimal AVX2 code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))

2020-05-18 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94962 --- Comment #6 from Hongtao.liu --- (In reply to Nemo from comment #5) > (In reply to Jakub Jelinek from comment #2) > > I would be happy if GCC could just emit optimal code (single vcmpeqd > instruction) for this useful constant: > > _mm25

[Bug target/92658] x86 lacks vector extend / truncate

2020-05-19 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 --- Comment #16 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #15) > I will leave truncations (Down Converts in Intel speak) which are AVX512F > instructions to someone else. It should be easy to add missing patterns and > tests foll

[Bug target/92658] x86 lacks vector extend / truncate

2020-05-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 --- Comment #17 from Hongtao.liu --- Created attachment 48570 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48570&action=edit 0001-Add-missing-vector-truncmn2-expanders-PR92658.patch Seems there're only truncmn2 for truncate, not expander

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-21 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125 --- Comment #4 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #3) > It turns out that a bunch of patterns have to be renamed (and testcases > added). > > Easyhack, waiting for someone to show some love to conversion patterns in > sse

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-22 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125 --- Comment #5 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #3) > It turns out that a bunch of patterns have to be renamed (and testcases > added). > > Easyhack, waiting for someone to show some love to conversion patterns in > sse

[Bug target/92658] x86 lacks vector extend / truncate

2020-05-22 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658 --- Comment #20 from Hongtao.liu --- (In reply to Mark Wielaard from comment #19) > (In reply to CVS Commits from comment #18) > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr92658-avx512f.c: New test. > > * gcc.t

[Bug target/95256] [11 Regression] ICE in convert_move, at expr.c:278 since r11-263-g7c355156aa20eaec

2020-05-29 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95256 --- Comment #6 from Hongtao.liu --- (In reply to Arseny Solokha from comment #5) > Is there some further work pending, or should this PR be closed now? It's fixed.

[Bug target/95211] [11 Regression] ICE in emit_unop_insn, at optabs.c:3622

2020-05-29 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95211 --- Comment #9 from Hongtao.liu --- (In reply to Arseny Solokha from comment #8) > Is there some further work pending, or should this PR be closed now? Fixed in GCC11.

[Bug target/95453] Failure to avoid useless sign extension

2020-06-01 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95453 --- Comment #2 from Hongtao.liu --- Duplicate of PR95076?

[Bug target/95488] New: Suboptimal multiplication codegen for v16qi

2020-06-02 Thread crazylht at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- Target: x86_64-*-* i?86-*-* cat test.c --- typedef unsigned char v16qi __attribute__ ((vector_size (16))); v16qi foo (v16qi

[Bug target/95488] Suboptimal multiplication codegen for v16qi

2020-06-02 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488 --- Comment #1 from Hongtao.liu --- I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).

[Bug target/95488] Suboptimal multiplication codegen for v16qi

2020-06-03 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488 --- Comment #3 from Hongtao.liu --- (In reply to Richard Biener from comment #2) > (In reply to Hongtao.liu from comment #1) > > I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))). > > That's not reliable. Multiplication shouldn't care ab
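The quoted reply is cut off, but the underlying arithmetic fact can be illustrated: the low 8 bits of a byte product are the same whether the inputs are read as signed or unsigned. The sketch below checks this exhaustively in scalar C. The function names are hypothetical and this is not code from the bug report or from GCC.

```c
#include <stdint.h>

/* Low 8 bits of the product, treating both bytes as unsigned. */
uint8_t mul_low8_unsigned(uint8_t a, uint8_t b)
{
    return (uint8_t)((unsigned)a * (unsigned)b);
}

/* Same bit patterns reinterpreted as signed bytes before multiplying.
   The uint8_t -> int8_t conversion is implementation-defined in C;
   this assumes the usual two's-complement wraparound. */
uint8_t mul_low8_signed(uint8_t a, uint8_t b)
{
    int sa = (int8_t)a;
    int sb = (int8_t)b;
    return (uint8_t)(sa * sb);
}

/* Returns 1 if both variants agree for all 65536 input pairs. */
int low8_agrees_for_all_inputs(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (mul_low8_unsigned((uint8_t)a, (uint8_t)b) !=
                mul_low8_signed((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```

Since the truncated product does not depend on the sign of the element type, a v16qi multiply can in principle use the same instruction sequence for signed and unsigned char vectors.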

[Bug target/95488] Suboptimal multiplication codegen for v16qi

2020-06-03 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488 --- Comment #4 from Hongtao.liu --- (In reply to Hongtao.liu from comment #3) > (In reply to Richard Biener from comment #2) > > (In reply to Hongtao.liu from comment #1) > > > I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))). > > > > Th

[Bug target/95524] New: Subtimal codegen for shift by constant for v16qi/v32qi under -march=skylake

2020-06-03 Thread crazylht at gmail dot com
: missed-optimization Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- Target: x86_64-*-* i?86-*-* cat test.c --- typedef char v16qi

[Bug target/95400] -march=native and -march=icelake-client produce different results on icelake client

2020-06-04 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95400 --- Comment #5 from Hongtao.liu --- (In reply to Martin Liška from comment #4) > Can we backport the change to active branches? Backport to GCC9 and GCC10. Partially backport to GCC8 (drop the tremont and tigerlake parts).

[Bug target/95488] Suboptimal multiplication codegen for v16qi

2020-06-11 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488 --- Comment #5 from Hongtao.liu --- Microbenchmark cat test.c #include #include #include typedef char v16qi __attribute__ ((vector_size (16))); extern v16qi interleave_mul (v16qi, v16qi); extern v16qi extend_mul (v16qi, v16qi); #defi

[Bug target/95524] Subtimal codegen for shift by constant for v16qi/v32qi under -march=skylake

2020-06-11 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95524 --- Comment #1 from Hongtao.liu --- Microbenchmark shows interleave_ashiftrt: 69023847, magic_ashiftrt: 62488066. Seems to be about a 10% improvement.
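The "10%" figure can be reproduced from the two numbers in this comment. The sketch below assumes the numbers are elapsed counts where lower is better (the comment does not state the unit); the function names are hypothetical.

```c
/* Relative reduction in percent when going from `before` to `after`. */
double reduction_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Speedup in percent: how much more work fits in the same time. */
double speedup_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

With before = 69023847 and after = 62488066, the reduction comes out at roughly 9.5% and the speedup at roughly 10.5%, so either reading is consistent with "about 10%".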

[Bug target/95488] Suboptimal multiplication codegen for v16qi

2020-06-14 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/95524] Subtimal codegen for shift by constant for v16qi/v32qi under -march=skylake

2020-06-15 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95524 --- Comment #2 from Hongtao.liu --- Microbenchmark show on Skylake client --- benchmark Skylake client ashift improvement v16qi 13% v32qi 5% v64qi 7% ashiftrt v16qi 5% v32q

[Bug target/95524] Subtimal codegen for shift by constant for v16qi/v32qi under -march=skylake

2020-06-15 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95524 --- Comment #3 from Hongtao.liu --- (In reply to Hongtao.liu from comment #0) > icc has > --- > ashift(char __vector(16)): > vpsllw xmm1, xmm0, 5 #9.16 > vpand xmm0, xmm1, XMMWORD PTR .L_2il
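The icc sequence quoted above shifts 16-bit lanes and then masks, since x86 has no byte-granular vector shift. A scalar sketch of the same idea, with two bytes packed into one 16-bit word, is given below. The function names are hypothetical, the mask constant is derived here rather than taken from the truncated listing, and the shift count is assumed to be in the range 0..7.

```c
#include <stdint.h>

/* Reference: shift each of the two bytes packed in x left by n. */
uint16_t shl_bytes_reference(uint16_t x, unsigned n)
{
    uint8_t lo = (uint8_t)(x & 0xff);
    uint8_t hi = (uint8_t)(x >> 8);
    lo = (uint8_t)(lo << n);
    hi = (uint8_t)(hi << n);
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* One 16-bit shift followed by a mask, mirroring vpsllw + vpand.
   The mask clears the bits that crossed from the low byte into
   the high byte. */
uint16_t shl_bytes_word_and_mask(uint16_t x, unsigned n)
{
    unsigned byte_mask = (0xffu << n) & 0xffu;   /* 0xe0 for n = 5 */
    unsigned mask = (byte_mask << 8) | byte_mask;
    return (uint16_t)((((unsigned)x << n) & 0xffffu) & mask);
}

/* Returns 1 if both variants agree for every 16-bit input at count n. */
int shift_variants_agree(unsigned n)
{
    for (unsigned x = 0; x <= 0xffffu; x++)
        if (shl_bytes_reference((uint16_t)x, n) !=
            shl_bytes_word_and_mask((uint16_t)x, n))
            return 0;
    return 1;
}
```

For a left shift by 5 the per-byte mask is 0xe0, so the vector constant loaded by vpand would plausibly be sixteen copies of that byte.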

[Bug target/95488] Suboptimal multiplication codegen for v16qi

2020-06-16 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488 --- Comment #9 from Hongtao.liu --- (In reply to H.J. Lu from comment #8) > -march=skylake-avx512 gave: > > [hjl@gnu-cfl-2 gcc]$ > /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc > -B/export/build/gnu/tools-build/gcc-debug/b

[Bug target/95740] Failure to avoid using the stack when interpreting a float as an integer when it is modified afterwards

2020-06-19 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95740 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #2

[Bug target/95524] Subtimal codegen for shift by constant for v16qi/v32qi under -march=skylake

2020-07-08 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95524 Hongtao.liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/95488] Suboptimal multiplication codegen for v16qi

2020-07-08 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488 Hongtao.liu changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|---

[Bug target/95766] Failure to directly use vpbroadcastd for _mm_set1_epi32 when passing unsigned short

2020-07-09 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95766 --- Comment #1 from Hongtao.liu --- Shouldn't **a** be extended to int first?

[Bug target/95766] Failure to directly use vpbroadcastd for _mm_set1_epi32 when passing unsigned short

2020-07-09 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95766 --- Comment #4 from Hongtao.liu --- Simple case: cat test.c: int f(unsigned short a) { return a * 101; } gcc: f(unsigned short): movzwl %di, %eax imull $101, %eax, %eax ret llvm: f(unsigned short): # @f(unsigned short) imull $101,
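For reference, the C semantics behind the simple case: the unsigned short operand is promoted to int before the multiplication, so the result must be computed on the zero-extended value. The sketch below restates `f` from the comment and adds a hypothetical `f_truncated` for contrast; it assumes int is at least 32 bits wide.

```c
/* Same function as the simple case quoted above: a is promoted
   to int, so the multiply is done on the zero-extended value. */
int f(unsigned short a) { return a * 101; }

/* Hypothetical contrast: what would remain if only 16 bits of
   the product were kept. */
unsigned short f_truncated(unsigned short a)
{
    return (unsigned short)(a * 101);
}
```

For a = 65535, f returns 6619035 while f_truncated returns 65435, which is why the upper bits of the input register matter unless the caller is known to have extended the argument already.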

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2020-07-13 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 --- Comment #7 from Hongtao.liu --- A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549713.html

[Bug target/96186] [11 regressoion] ICE: Unrecognizable insn since r11-1970-fab263ab0fc10ea08409b80afa7e8569438b8d28

2020-07-14 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96186 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #2

[Bug target/96201] x86 movsd/movsq string instructions and alignment inference

2020-07-14 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/96243] New: For vector compare to mask register, UNSPEC is needed instead of comparison operator

2020-07-19 Thread crazylht at gmail dot com
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- Target: i386, x86-64 When trying to relax (define_expand "_eq3" [(set (match_

[Bug tree-optimization/96244] New: Redudant mask load generated

2020-07-19 Thread crazylht at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- cat test.c --- typedef int v8si __attribute__ ((__vector_size__ (32))); v8si foo (v8si a, v8si b, v8si c, v8si d) { v8si e; for (int i = 0; i != 8; i++) e[i] = a[i] >

[Bug target/96243] For vector compare to mask register, UNSPEC is needed instead of comparison operator

2020-07-19 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96243 --- Comment #1 from Hongtao.liu --- cut from cse.c --- 3342 case RTX_COMPARE: 3343 case RTX_COMM_COMPARE: 3344 /* See what items are actually being compared and set FOLDED_ARG[01] 3345 to those values and CODE to the actual

[Bug target/96246] New: [AVX512] unefficient code generatation for vpblendm*

2020-07-19 Thread crazylht at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- Target: i386, x86-64 cat test.c --- typedef int v8si __attribute__ ((__vector_size__ (32))); v8si foo (v8si a, v8si b, v8si c, v8si d) { return a >

[Bug tree-optimization/96244] Redudant mask load generated

2020-07-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96244 --- Comment #2 from Hongtao.liu --- (In reply to Richard Biener from comment #1) > so range-info is one index too pessimistic here. So IMHO it's not about > "redundant" masked loads, it's about the fact that we end up with loads > at all here.

[Bug target/96246] [AVX512] unefficient code generatation for vpblendm*

2020-07-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96246 --- Comment #2 from Hongtao.liu --- (In reply to Richard Biener from comment #1) > With -mavx2 it works: > > vpcmpgtd %ymm1, %ymm0, %ymm0 > vpblendvb %ymm0, %ymm2, %ymm3, %ymm0 > > not sure how _load comes into play

[Bug target/96262] [11 Regression] ICE: in decompose, at rtl.h:2280 with -O -mavx512bw

2020-07-21 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96262 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/96262] [11 Regression] ICE: in decompose, at rtl.h:2280 with -O -mavx512bw

2020-07-21 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96262 --- Comment #2 from Hongtao.liu --- 2268inline wi::storage_ref 2269wi::int_traits ::decompose (HOST_WIDE_INT *, 2270unsigned int precision, 2271const rtx_mode_t &x) 2

[Bug target/96273] ice in extract_insn, at recog.c:2294, unrecognizable insn:

2020-07-21 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96273 Hongtao.liu changed: What|Removed |Added CC||ubizjak at gmail dot com --- Comment #2 fr

[Bug target/96271] Failure to optimize memcmp of doubles to avoid going through memory

2020-07-22 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96271 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #3

[Bug target/96262] [11 Regression] ICE: in decompose, at rtl.h:2280 with -O -mavx512bw since r11-1411-gc7199fb6e694d1a0

2020-07-23 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96262 --- Comment #3 from Hongtao.liu --- A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550427.html

[Bug target/96350] New: [cet] For ENDBR immediate, the binary would include a gadget that starts with a fake ENDBR64 opcode.

2020-07-27 Thread crazylht at gmail dot com
: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com CC: hjl.tools at gmail dot com Target Milestone: --- Target: i386, x86-64 ENDBR32 and ENDBR64

[Bug target/96476] [Request] expose preferred vector width to preprocessor

2020-08-05 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96476 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug tree-optimization/96481] New: SLP fail to vectorize VEC_COND_EXPR pattern.

2020-08-05 Thread crazylht at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- testcase not vectorized: - #include inline unsigned opt(unsigned a, unsigned b, unsigned c, unsigned d) { return a > b ? c : d; } void opt( unsig

[Bug target/70314] AVX512 not using kandw to combine comparison results

2020-08-05 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #5

[Bug target/70314] AVX512 not using kandw to combine comparison results

2020-08-05 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314 --- Comment #6 from Hongtao.liu --- Same issue mentioned in PR88808

[Bug target/96243] For vector compare to mask register, UNSPEC is needed instead of comparison operator

2020-08-09 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96243 Hongtao.liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/96512] wrong code generated with avx512 intrinsics in some cases

2020-08-09 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512 --- Comment #4 from Hongtao.liu --- It's ok with GCC8.4.0. /export/liuhongt/install/gcc8.4.0/bin/gcc -O1 -D_GCC_VEC_=1 -march=skylake-avx512 test.c -lm ./a.out SIMD: avx512 -- vector size = 8 :: 0 == 0 :: 0.067 == 0.067 :: 0.13 =

[Bug c++/96535] GCC 10 ignoring function __attribute__ optimize for all x86

2020-08-10 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96535 --- Comment #1 from Hongtao.liu --- For the command-line option, this is handled in process_options, which enables flag_cunroll_grow_size; that is the flag that actually causes the loop in the testcase to be unrolled. cut from toplev.c --- /* Unrolling all loops impl

[Bug target/96536] -fcf-protection code in i386.md:restore_stack_nonlocal uses invalid compare-and-jump rtl

2020-08-10 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96536 --- Comment #1 from Hongtao.liu --- I'm testing patch like diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index b24a4557871..269c528c3ad 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -19132,15 +19132,15 @@

[Bug target/96536] -fcf-protection code in i386.md:restore_stack_nonlocal uses invalid compare-and-jump rtl

2020-08-10 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96536 --- Comment #3 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #2) > (In reply to Hongtao.liu from comment #1) > > I'm testing patch like > > You can probably use gen_sub2_insn here. > > On a related note, "@" prefix can be used for

[Bug target/96551] [10/11 Regression] FAIL: gcc.target/i386/vectorize8.c (internal compiler error)

2020-08-10 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96551 --- Comment #1 from Hongtao.liu --- For `vec_unpacku_float_hi_v16si` `vec_unpacku_float_lo_v16si` --- diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index cf083ca28aa..2e60f596bc1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/confi

[Bug target/96562] Rather poor assembly generated for copy-list-initialization in return statement.

2020-08-11 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96562 --- Comment #3 from Hongtao.liu --- A simple C testcase: typedef struct { unsigned char* p; unsigned int a; }st; st foo (unsigned char* p, unsigned char* q) { return {p, (unsigned int)(q-p)}; } There are two issues here. 1. gcc uses memory

[Bug target/96562] Rather poor assembly generated for copy-list-initialization in return statement.

2020-08-11 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96562 --- Comment #4 from Hongtao.liu --- in ix86_expand_pinsr with src:(reg:DI 88) dst:(subreg:DI (reg:TI 84 [ D.1940 ]) 8) pos: 64 size: 32 it goes into --- 20360 20361 case E_SImode: 20362 if (!TARGET_SSE4_1) 20363

[Bug testsuite/96574] FAIL: gcc.target/i386/pr92865-1.c scan-assembler-times vmovdq[au]16[\t ] 6

2020-08-11 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96574 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1

[Bug target/96562] Rather poor assembly generated for copy-list-initialization in return statement.

2020-08-11 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96562 --- Comment #6 from Hongtao.liu --- I'm testing this patch diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index e194214804b..29809d69782 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @

[Bug target/96578] [11 Regression] ICE in extract_insn, at recog.c:2294 since r11-2623-g99e4891ed552aca4

2020-08-12 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96578 --- Comment #1 from Hongtao.liu --- It's the same as PR96551.

[Bug target/96246] [AVX512] unefficient code generatation for vpblendm*

2020-08-12 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96246 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/96536] -fcf-protection code in i386.md:restore_stack_nonlocal uses invalid compare-and-jump rtl

2020-08-13 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96536 --- Comment #5 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #4) > Created attachment 49060 [details] > Proposed patch > > Attached patch completely rewrites restore_stack_nonlocal expander. > > Can someone please test the patch on

[Bug target/96350] [cet] For ENDBR immediate, the binary would include a gadget that starts with a fake ENDBR64 opcode.

2020-08-16 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96350 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug testsuite/96574] FAIL: gcc.target/i386/pr92865-1.c scan-assembler-times vmovdq[au]16[\t ] 6

2020-08-16 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96574 --- Comment #2 from Hongtao.liu --- This testcase is used to check vector compare to integer mask, so I deleted the scan-assembler for the vmov instruction and also added -mprefer-vector-width=512 to avoid the impact of GCC's different default arch. --- a/gcc/t

[Bug middle-end/96625] Unnecessarily large assembly generated when a bit-offsetted higher-end end of a uint64_t-backed bitfield is shifted toward the high end (left) by its bit-offset

2020-08-17 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96625 --- Comment #1 from Hongtao.liu --- movabs rax,0x1ff8 --- it also clears the high 3 bits. and rax,rdi differs from and rax,0xfff8. Using g++ -O2 test.c -S I got --- movq %rdi, %rax andq $-8, %rax

[Bug testsuite/96574] FAIL: gcc.target/i386/pr92865-1.c scan-assembler-times vmovdq[au]16[\t ] 6

2020-08-17 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96574 --- Comment #4 from Hongtao.liu --- Fixed in GCC11.

[Bug target/93897] Poor trivial structure initialization code with -O3

2020-08-17 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93897 --- Comment #6 from Hongtao.liu --- Fixed in GCC11, backport to GCC10.

[Bug target/96562] Rather poor assembly generated for copy-list-initialization in return statement.

2020-08-17 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96562 --- Comment #9 from Hongtao.liu --- Fixed in GCC11, backport to GCC10.

[Bug target/96667] New: FAIL: gcc.target/i386/avx512bw-pr96246-1.c

2020-08-17 Thread crazylht at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- Target: i386, x86-64 On Linux/x86_64, 7123217afb33d4a2860f552ad778a819cc8dea5e is the first bad commit commit 7123217afb33d4a2860f552ad778a819cc8dea5e Author

[Bug target/96667] FAIL: gcc.target/i386/avx512bw-pr96246-1.c

2020-08-17 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96667 --- Comment #1 from Hongtao.liu --- The testcase needs to be adjusted. I'll rewrite it as a C++ source file so that the vector compare operator can be used directly. --- a/gcc/testsuite/gcc.target/i386/avx512bw-pr96246-1.c +++ b/gcc/testsuite/g++.t

[Bug target/96536] -fcf-protection code in i386.md:restore_stack_nonlocal uses invalid compare-and-jump rtl

2020-08-18 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96536 --- Comment #7 from Hongtao.liu --- (In reply to Hongtao.liu from comment #5) > (In reply to Uroš Bizjak from comment #4) > > Created attachment 49060 [details] > > Proposed patch > > > > Attached patch completely rewrites restore_stack_nonloc

[Bug target/88808] bitwise operators on AVX512 masks fail to use the new mask instructions

2020-08-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88808 --- Comment #5 from Hongtao.liu --- Fixed in GCC11.

[Bug target/88798] AVX512BW code does not use bit-operations that work on mask registers

2020-08-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #5

[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions

2020-08-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #6

[Bug target/71453] Spills to vector registers are sub-optimal.

2020-08-20 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71453 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #7

[Bug target/96262] [11 Regression] ICE: in decompose, at rtl.h:2280 with -O -mavx512bw since r11-1411-gc7199fb6e694d1a0

2020-08-21 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96262 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/96744] [11 Regression] FAIL: gcc.target/i386/avx512bitalgvl-vpopcntb-1.c execution test

2020-08-24 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96744 --- Comment #2 from Hongtao.liu --- Created attachment 49107 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49107&action=edit Enable spill to mask only under m_core_AVX512 this patch will fail cat test.c #include void _mm512_2interse

[Bug target/96755] [11 Regression] ICE in final_scan_insn_1, at final.c:3073 with -O3 -march=skylake-avx512

2020-08-24 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96755 --- Comment #2 from Hongtao.liu --- Sorry for TYPO --- (define_split [(set (match_operand:DI 0 "mask_reg_operand") (zero_extend:DI - (not:DI (match_operand:SI 1 "mask_reg_operand"] + (not:SI (match_operand:SI 1 "ma

[Bug target/96744] [11 Regression] FAIL: gcc.target/i386/avx512bitalgvl-vpopcntb-1.c execution test

2020-08-24 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96744 --- Comment #7 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #5) > (In reply to Hongtao.liu from comment #2) > > > Need to add define_insn for movp2qi/movp2hi? > > Yes, this is needed to cover some corner cases. Please see attachme

[Bug target/96755] [11 Regression] ICE in final_scan_insn_1, at final.c:3073 with -O3 -march=skylake-avx512

2020-08-24 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96755 --- Comment #4 from Hongtao.liu --- Fixed in GCC11.

[Bug target/96667] FAIL: gcc.target/i386/avx512bw-pr96246-1.c

2020-08-25 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96667 Hongtao.liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/96744] [11 Regression] FAIL: gcc.target/i386/avx512bitalgvl-vpopcntb-1.c execution test

2020-08-27 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96744 --- Comment #9 from Hongtao.liu --- (In reply to Hongtao.liu from comment #7) > (In reply to Uroš Bizjak from comment #5) > > (In reply to Hongtao.liu from comment #2) > > > > > Need to add define_insn for movp2qi/movp2hi? > > > > Yes, this is

[Bug target/96744] [11 Regression] FAIL: gcc.target/i386/avx512bitalgvl-vpopcntb-1.c execution test

2020-08-27 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96744 --- Comment #10 from Hongtao.liu --- (In reply to Uroš Bizjak from comment #3) > Created attachment 49112 [details] > Retune mask <-> general moves cost > > It looks to me that mask <-> general cost is too low, so the compiler now > prefers thes

[Bug target/96246] [AVX512] unefficient code generatation for vpblendm*

2020-08-28 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96246 --- Comment #6 from Hongtao.liu --- (In reply to Nathan Sidwell from comment #5) > FAIL: g++.target/i386/avx512bw-pr96246-2.C execution test > FAIL: g++.target/i386/avx512vl-pr96246-2.C execution test > > > the tests can fail at runtime, be

[Bug target/96849] [11 Regression] ICE: in extract_insn, at recog.c:2294 (error: unrecognizable insn) since r11-2623

2020-08-30 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96849 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #3

[Bug target/96551] [10/11 Regression] FAIL: gcc.target/i386/vectorize8.c (internal compiler error)

2020-08-30 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96551 --- Comment #3 from Hongtao.liu --- A patch is posted at https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552230.html

[Bug target/96855] New: r11-571 regression FAIL: gcc.target/i386/pr92658-1.c

2020-08-30 Thread crazylht at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com Target Milestone: --- Target: x86_64-*-* i?86-*-* On Linux/x86_64, e740f3d73144abbca1ad98a04825c6bd63314a0b is the first bad commit commit
