https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105363
--- Comment #2 from Hongtao.liu ---
It looks like neither ICC nor LLVM vectorizes the loop:
https://godbolt.org/z/sbheqbE6Y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105363
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354
--- Comment #2 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #1)
> Yes, and I think it's only available for simd128u8, not for
> simd128u16/u32/u64.
No, under SSE2 the optimization is also available for simd128u16, directly
generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105339
--- Comment #2 from Hongtao.liu ---
We need to add macros for the _mm_{mask,maskz}_scalef_round_{sd,ss} intrinsics for
gcc-9/10/11/12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105339
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105288
--- Comment #2 from Hongtao.liu ---
I think HJ means avx__ can be extended to
EVEX SSE registers by changing "x" to "v" when AVX512VL is available.
For avx512f__, it should be
"=Yv,m"
" vm,v"
since operands[0] could be allocated as an EVEX register.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105253
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #11 from Hongtao.liu ---
There's a post_loop pass_thread_jumps; adding a copy of pass_pre doesn't help.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #10 from Hongtao.liu ---
Probably related to the 3 cancelled jump threads below, which affect PRE.
Threading through latch before loop opts would create non-empty latch:
 Cancelling jump thread: (15, 16) incoming edge; (16, 43)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #9 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #8)
> (In reply to rguent...@suse.de from comment #7)
> > On Mon, 11 Apr 2022, crazylht at gmail dot com wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #8 from Hongtao.liu ---
(In reply to rguent...@suse.de from comment #7)
> On Mon, 11 Apr 2022, crazylht at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
> >
> > --- Comment #5 from Hongtao.liu ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #6 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #5)
> My bisect shows it's caused by
> r12-3876-g4a960d548b7d7d942f316c5295f6d849b74214f5
PRE dump, before vs. after:
-goto ; [11.00%]
-
- [local count: 105119324]:
-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #5 from Hongtao.liu ---
My bisect shows it's caused by
r12-3876-g4a960d548b7d7d942f316c5295f6d849b74214f5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
--- Comment #3 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #1)
> Does -fno-tree-vectorizer help? There is definitely another big recording
> the fact pre had to turn something off while vectorization is turned on.
No, not relat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105216
Bug ID: 105216
Summary: [12 regression] 8% regression for m-queens compared to
gcc11 O2
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033
--- Comment #1 from Hongtao.liu ---
Created attachment 52776
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52776&action=edit
Patch pending for GCC13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102583
--- Comment #4 from Hongtao.liu ---
Created attachment 52771
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52771&action=edit
Pending patch for GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #46 from Hongtao.liu ---
Another issue is how to split the vector load: into halves or into elements. The latter
requires scratch registers, which may not be available; the former doesn't
require an extra register but may still trigger STLF stalls. For
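A minimal C sketch of the access pattern behind the halves-vs-elements trade-off above, assuming a hypothetical two-double struct (`struct vec2d`, not from the PR) that is written with two 8-byte scalar stores and later read back as one 16-byte vector. `load_whole` issues a single 16-byte load spanning both stores, the shape that can miss store-to-load forwarding; `load_halves` builds the same value from two 8-byte loads, each lining up with exactly one earlier store. For two doubles the halves coincide with the elements, so no scratch register beyond the destination is needed. Whether a stall actually occurs depends on the microarchitecture and on the code the compiler emits.

```c
#include <emmintrin.h>

/* Hypothetical 16-byte struct standing in for the small aggregates in this PR. */
struct vec2d {
  double x, y;
};

/* Writes the struct with two separate 8-byte scalar stores. */
void store_scalars(struct vec2d *d, double x, double y) {
  d->x = x;
  d->y = y;
}

/* One 16-byte load covering both 8-byte stores: while those stores are still
   in the store buffer, neither of them can forward to this wider load. */
__m128d load_whole(const struct vec2d *d) {
  return _mm_loadu_pd(&d->x);
}

/* The same value from two 8-byte loads; each matches exactly one earlier
   store, so each can be forwarded. Only the destination register is used. */
__m128d load_halves(const struct vec2d *d) {
  __m128d lo = _mm_load_sd(&d->x);   /* x in lane 0, lane 1 zeroed */
  return _mm_loadh_pd(lo, &d->y);    /* y inserted into lane 1 */
}
```

Both functions return identical vectors; they differ only in how the load is shaped relative to the preceding stores.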
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #44 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #43)
> One thing I found by experiments:
> Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between each other,
> just emulate for pipeline) before stalled load, stl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #43 from Hongtao.liu ---
One thing I found by experiment:
inserting 64 vaddps %xmm18, %xmm19, %xmm20 instructions (no dependence between
each other, just to keep the pipeline busy) before the stalled load makes the
STLF-stall case as fast as the no-stall case on CLX.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915
--- Comment #2 from Hongtao.liu ---
Created attachment 52705
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52705&action=edit
Patch pending for GCC13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072
--- Comment #1 from Hongtao.liu ---
Created attachment 52699
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52699&action=edit
Patch pending for GCC13
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105066
--- Comment #4 from Hongtao.liu ---
Fixed in GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610
Hongtao.liu changed:
What|Removed |Added
Attachment #52495 is obsolete|0 |1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610
--- Comment #15 from Hongtao.liu ---
Could someone help mark this as blocking PR105073? The patch is ready and
waiting for GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105073
Bug ID: 105073
Summary: [meta bug]Patch pending for GCC13.
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105066
--- Comment #2 from Hongtao.liu ---
> That may be a separate bug, IDK
>
Opened PR105072 for it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105072
Bug ID: 105072
Summary: Miss optimization for pmovzxbq.
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
--- Comment #9 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #7)
> >
> > But that's unrelated to correctness; this bug can be closed unless we're
> > keeping it open until it's fixed in the GCC11 current stable series.
>
> Let me d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105066
--- Comment #1 from Hongtao.liu ---
pinsrw is available under SSE2 for both register and memory operands, but
pextrw requires SSE4.1 for a memory operand.
(define_insn "vec_set_0"
 [(set (match_operand:V8_128 0 "register_operand"
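To make the ISA split concrete, a small C sketch of storing one 16-bit lane to memory (the function name is illustrative, not from the PR). Under plain SSE2, `pextrw` can only target a general-purpose register, so this needs `pextrw` plus a 16-bit store; with SSE4.1 enabled, the memory-destination form of `pextrw` becomes available. The exact instructions depend on the GCC version and flags.

```c
#include <emmintrin.h>

/* Store lane 3 of an 8 x 16-bit vector to *p.
   Typical SSE2-only codegen: pextrw $3, %xmm0, %eax ; movw %ax, (%rdi).
   With SSE4.1 the compiler may instead use pextrw with a memory destination. */
void store_lane3(unsigned short *p, __m128i v) {
  *p = (unsigned short)_mm_extract_epi16(v, 3);
}
```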
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915
--- Comment #1 from Hongtao.liu ---
As described in PR105066, pinsrw mem should be better than movzx + vmovd.
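A minimal C sketch of the vec_setv8hi_0 pattern from this PR: replacing lane 0 of an 8 x 16-bit vector with a value loaded from memory (the function name is illustrative). Because `pinsrw` accepts a memory operand under SSE2, this can in principle be a single `pinsrw $0, (mem), %xmm0` rather than the movzx + vmovd sequence; what a given GCC release actually emits may differ.

```c
#include <emmintrin.h>

/* Replace lane 0 of v with *p, leaving lanes 1..7 untouched.
   Ideal SSE2 codegen is one pinsrw with a memory source operand. */
__m128i set_lane0_from_mem(__m128i v, const unsigned short *p) {
  return _mm_insert_epi16(v, *p, 0);
}
```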
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #15
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
--- Comment #7 from Hongtao.liu ---
>
> But that's unrelated to correctness; this bug can be closed unless we're
> keeping it open until it's fixed in the GCC11 current stable series.
Let me do the backporting.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105058
--- Comment #1 from Hongtao.liu ---
cat test.c
#include
unsigned int ctrl;
__m128i k1, k2, k3;
void
test_keylocker_11 (void)
{
register __m128i k4 __asm ("xmm16") = k2;
asm volatile ("" : "+v" (k4));
_mm_loadiwkey (ctrl, k1, k4,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976
--- Comment #4 from Hongtao.liu ---
ICE is fixed in GCC12, and I'd like to keep this PR open for refining
validate_subreg in GCC13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105034
Bug ID: 105034
Summary: [10/11/12 regression]Suboptimal codegen for min/max
with -Os
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105033
Bug ID: 105033
Summary: Suboptimal for vec_concat lower halves of two vectors.
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Comp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105000
--- Comment #4 from Hongtao.liu ---
(In reply to Martin Liška from comment #3)
> Can we close it as fixed?
I think so.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982
--- Comment #7 from Hongtao.liu ---
Fixed in GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982
--- Comment #4 from Hongtao.liu ---
I'm testing
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 02f298c2846..c74edd1aaef 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14182,12 +14182,12 @@ (define_i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982
--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #2)
> Failed to match this instruction:
> (set (reg/v:SI 88 [ z ])
> (if_then_else:SI (eq (zero_extract:SI (reg:SI 92)
> (const_int 1 [0x1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104982
--- Comment #2 from Hongtao.liu ---
Failed to match this instruction:
(set (reg/v:SI 88 [ z ])
(if_then_else:SI (eq (zero_extract:SI (reg:SI 92)
(const_int 1 [0x1])
(zero_extend:SI (subreg:QI (r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104977
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978
--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Hongtao.liu from comment #0)
> > #include
> > __m128h
> > foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> > {
> > return _mm_mask_fcmadd_round_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978
--- Comment #2 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #0)
> #include
> __m128h
> foo (__m128h a, __m128h b, __m128h c, __mmask8 m)
> {
> return _mm_mask_fcmadd_round_sch (a, m, b, c, 8);
> }
>
>
> _Z3fooDv8_DF16_S_S_h:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978
--- Comment #1 from Hongtao.liu ---
Similar for _mm_mask_fmadd_round_sch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104977
--- Comment #1 from Hongtao.liu ---
Similar for _mm_fmadd_round_sch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104978
Bug ID: 104978
Summary: [avx512fp16] wrong code for _mm_mask_fcmadd_round_sch
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Compo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104977
Bug ID: 104977
Summary: [avx512fp16] wrong code for vfmaddcsh when
-masm=intel.
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Prior
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104974
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104963
Hongtao.liu changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976
--- Comment #2 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #1)
> The walkround in the backend is force_reg operand[1] before lowpart_subreg
> to avoid NULL_RTX.
It would be nice if we extended validate_subreg to avoid weird situati
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976
Hongtao.liu changed:
What|Removed |Added
Target||x86_64-*-* i?86-*-*
Keywords|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104976
Bug ID: 104976
Summary: [avx512fp16] lowpart_subreg return NULL_RTX cause ICE
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Compo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104974
Hongtao.liu changed:
What|Removed |Added
Target||x86_64-*-* i?86-*-*
Keywords|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104974
Bug ID: 104974
Summary: [avx512fp16] Error: operand type mismatch for `vmovw'
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Comp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104963
Bug ID: 104963
Summary: GCC11/12 -march=sapphirerapids miss some isa.
Product: gcc
Version: 11.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950
--- Comment #6 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Hongtao.liu from comment #4)
> > (In reply to Richard Biener from comment #3)
> > > Ah, on aarch64 we get
> > >
> > > cmp w0, 0
> > >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946
Hongtao.liu changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946
Hongtao.liu changed:
What|Removed |Added
Keywords||missed-optimization
Target|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946
Bug ID: 104946
Summary: [12 regression] Suboptimal gimple foding for blendvpd
under sse4.1
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #41 from Hongtao.liu ---
(In reply to Richard Biener from comment #22)
> (In reply to Hongtao.liu from comment #21)
> > Now we have SLP node available in vector cost hook, maybe we can do sth in
> > cost model to prevent vectorizatio
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104915
Bug ID: 104915
Summary: Miss optimization for vec_setv8hi_0
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #39 from Hongtao.liu ---
> I'll see if I get around to prototype some argument classification
> in the vectorizer (looking how hard it is to use
> INIT_CUMULATIVE_ARGS in a context where we are not expanding to RTL),
> unfortunately
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666
--- Comment #8 from Hongtao.liu ---
Fixed in GCC12.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #37 from Hongtao.liu ---
> There is not much value in the vectorization we do in this function
> (when manually fixing the STLF issue the speed is as good as with the
> scalar code). We cost
>
> ray.dir.x 1 times scalar_load costs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #35 from Hongtao.liu ---
(In reply to Richard Biener from comment #34)
> I can confirm this observation on Zen2. Note perf still records STLF
> failures
The penalty is much higher on Znver3 than on Zen2 for the same case (v2df).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #33 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #32)
> (In reply to Hongtao.liu from comment #31)
> > Created attachment 52595 [details]
> > microbenchmark
>
Interestingly, the microbenchmark didn't hit store forward
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #32 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #31)
> Created attachment 52595 [details]
> microbenchmark
The microbenchmark is used to test the STLF penalty. I've run it on CLX and
found that 1 stalled vector load is fas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #31 from Hongtao.liu ---
Created attachment 52595
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52595&action=edit
microbenchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #30 from Hongtao.liu ---
Created attachment 52594
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52594&action=edit
tar -xvf micro.tar.gz
Num/Type char/s char/v char/vn short/s short/v short/vn int/s
int/v i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
--- Comment #8 from Hongtao.liu ---
(In reply to Richard Biener from comment #7)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9188d727e33..7f1f12fb6c6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104773
--- Comment #1 from Hongtao.liu ---
It looks like the same issue as PR98977.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #13 from Hongtao.liu ---
(In reply to H.J. Lu from comment #10)
> Created attachment 52553 [details]
> A patch to always return pseudo register in ix86_gen_scratch_sse_rtx
Please go ahead with this patch; I'll submit an incremental
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #12 from Hongtao.liu ---
(In reply to H.J. Lu from comment #10)
> Created attachment 52553 [details]
> A patch to always return pseudo register in ix86_gen_scratch_sse_rtx
For pr100865-8a.c,pr100865-9c.c,pr100865-8c.c
+/* { dg-fina
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104762
Hongtao.liu changed:
What|Removed |Added
CC||crazylht at gmail dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #11 from Hongtao.liu ---
(In reply to H.J. Lu from comment #9)
> --- pieces-memset-46.s 2022-03-02 06:44:55.845212762 -0800
> +++ /export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/pieces-memset-46.s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #8 from Hongtao.liu ---
(In reply to H.J. Lu from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > (In reply to H.J. Lu from comment #1)
> > > ix86_expand_vector_move shouldn't use ix86_gen_scratch_sse_rtx.
> >
> > Is it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #29 from Hongtao.liu ---
From Agner Fog's excellent optimization manuals
(https://www.agner.org/optimize/microarchitecture.pdf):
For ICX/TGL:
An aligned write of 128 bits or more followed by a read of one or both of the
two halves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #2 from Hongtao.liu ---
Updated testcase:
void f256(char *a)
{
char t[] = "012345678901234567890123456789012345678901234567";
__builtin_memcpy(a, &t[0], sizeof(t));
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #1 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #0)
> bool f256(char *a)
> {
> char t[] = "012345678901234567890123456789012345678901234567";
> return __builtin_memcpy(a, &t[0], sizeof(t)) == 0;
> }
>
> https://god
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
Bug ID: 104723
Summary: [12 regression] Redundant usage of stack
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #7 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > I notice it regresses
> >
> > FAIL: gcc.target/i386/incoming-11.c scan-assembler-not andl[\\t ]*\\$-16,[\\t ]*%esp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #6 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #5)
> I notice it regresses
>
> FAIL: gcc.target/i386/incoming-11.c scan-assembler-not andl[\\t ]*\\$-16,[\\t ]*%esp
Why replace ix86_gen_scratch_sse_rtx with gen_reg_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #5 from Hongtao.liu ---
I notice it regresses:
FAIL: gcc.target/i386/incoming-11.c scan-assembler-not andl[\\t ]*\\$-16,[\\t ]*%esp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104686
--- Comment #12 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #11)
> (In reply to Martin Liška from comment #8)
> > (In reply to Martin Liška from comment #7)
> > > (In reply to Richard Biener from comment #6)
> > > > Both revisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #3 from Hongtao.liu ---
(In reply to H.J. Lu from comment #1)
> ix86_expand_vector_move shouldn't use ix86_gen_scratch_sse_rtx.
Is it problematic for TARGET_GEN_MEMSET_SCRATCH_RTX?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #2 from Hongtao.liu ---
Yes, thanks for the reproduced testcase.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #27 from Hongtao.liu ---
> We can start with disabling vectorization with very cheap cost model to fix
Of course, only for passing structs of 16 bytes or more.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #26 from Hongtao.liu ---
(In reply to Richard Biener from comment #22)
> (In reply to Hongtao.liu from comment #21)
> > Now we have SLP node available in vector cost hook, maybe we can do sth in
> > cost model to prevent vectorizatio
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104686
--- Comment #11 from Hongtao.liu ---
(In reply to Martin Liška from comment #8)
> (In reply to Martin Liška from comment #7)
> > (In reply to Richard Biener from comment #6)
> > > Both revisions affect vectorizer cost modeling only. With
> > >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #21 from Hongtao.liu ---
Now that the SLP node is available in the vector cost hook, maybe we can do something
in the cost model to prevent vectorization when the node's definition comes from a
big-size parameter.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666
--- Comment #6 from Hongtao.liu ---
(In reply to Jakub Jelinek from comment #5)
> Wouldn't the right fix be instead to move the ix86_expand_builtin
Good idea!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438
--- Comment #8 from Hongtao.liu ---
(In reply to Martin Liška from comment #7)
> (In reply to Hongtao.liu from comment #6)
> > The opportunity disappear after r12-7125.
>
> Can you please install the latest contrib/gcc-git-customization.sh? Doi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666
--- Comment #4 from Hongtao.liu ---
Same ICE exists for
__builtin_ia32_blendvpd
__builtin_ia32_blendvps
__builtin_ia32_blendvpd256
__builtin_ia32_blendvps256
__builtin_ia32_pblendvb128
__builtin_ia32_pblendvb256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104666
--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #2)
> So builtins are registered in the beginning, but isa checking is during
> pass_expand, and gimple folding is between them, maybe we should restrict
> builtin gimple
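Since the ISA check for these builtins only happens at pass_expand, one way to use a blend from a translation unit that is not built with `-msse4.1` is to enable the ISA on the function itself. A minimal sketch using GCC's `target` function attribute and the `_mm_blendv_pd` intrinsic (which GCC's headers implement via `__builtin_ia32_blendvpd`); the function name is illustrative.

```c
#include <immintrin.h>

/* Enable SSE4.1 for this function only, so the blend intrinsic is accepted
   even if the rest of the file targets baseline x86-64. Callers must make
   sure the CPU supports SSE4.1 before calling it. */
__attribute__((target("sse4.1")))
__m128d select_by_sign(__m128d a, __m128d b, __m128d mask) {
  /* For each 64-bit lane: take b if the sign bit of mask is set, else a. */
  return _mm_blendv_pd(a, b, mask);
}
```

At run time this would typically be paired with a check such as GCC's `__builtin_cpu_supports("sse4.1")` before the call.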