[r14-4046 Regression] FAIL: 23_containers/vector/bool/110807.cc -std=gnu++17 (test for excess errors) on Linux/x86_64

2023-09-17 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a is the first bad commit
commit 3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a
Author: Jonathan Wakely 
Date:   Fri Sep 1 21:27:57 2023 +0100

libstdc++: Add support for running tests with multiple -std options

caused

FAIL: 23_containers/vector/bool/110807.cc  -std=gnu++17 (test for excess errors)

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-4046/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check \
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check \
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc --target_board='unix{-m32\ -march=cascadelake}'"

(If you run into problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[r14-3823 Regression] FAIL: c-c++-common/analyzer/compound-assignment-1.c -std=c++98 (test for warnings, line 72) on Linux/x86_64

2023-09-11 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

50b5199cff690891726877e1c00ac53dfb7cc1c8 is the first bad commit
commit 50b5199cff690891726877e1c00ac53dfb7cc1c8
Author: benjamin priour 
Date:   Sat Sep 9 18:03:56 2023 +0200

analyzer: Move gcc.dg/analyzer tests to c-c++-common (2) [PR96395]

caused

FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++14  (test for warnings, line 72)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++17 (test for excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++17  (test for warnings, line 72)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++20 (test for excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++20  (test for warnings, line 72)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++98 (test for excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++98  (test for warnings, line 72)

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3823/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="analyzer.exp=c-c++-common/analyzer/compound-assignment-1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="analyzer.exp=c-c++-common/analyzer/compound-assignment-1.c --target_board='unix{-m32\ -march=cascadelake}'"

(If you run into problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[r14-3571 Regression] FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr on Linux/x86_64

2023-08-30 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

caa7a99a052929d5970677c5b639e1fa5166e334 is the first bad commit
commit caa7a99a052929d5970677c5b639e1fa5166e334
Author: Richard Biener 
Date:   Wed Aug 30 11:57:47 2023 +0200

tree-optimization/111228 - combine two VEC_PERM_EXPRs

caused

FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3571/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c --target_board='unix{-m64\ -march=cascadelake}'"

(For questions about this report, contact me at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


RE: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Richard Biener 
> Sent: Wednesday, August 23, 2023 3:32 PM
> To: Hongtao Liu 
> Cc: Jakub Jelinek ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as a command line option which enables/disables
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> 
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
> 
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
> 
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).

We could first work on -mevex512 and then further discuss -mavx10.1-256/512,
since those two options are quite controversial.

Just to clarify, -mno-evex512 -mavx512f should not enable 512 bit vectors, right?

Thx,
Haochen

> 
> Richard.
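
To make the option semantics in this exchange concrete, here is a minimal
Python sketch. It is a toy model under stated assumptions, not GCC code: the
alias table follows the "-mavx10.1-256 = -mavx10.1 + -mno-evex512" wording
quoted above, the ISA list is the one from the thread, and resolve() and its
sticky_no_evex512 parameter are hypothetical names invented for illustration.
The parameter exists only to show the two possible answers to the
"-mno-evex512 -mavx512f" question.

```python
# Hypothetical model of the option scheme discussed above; this is not GCC code.
AVX10_1_ISAS = frozenset({
    "f", "vl", "bw", "dq", "cd", "bf16", "fp16",
    "vbmi", "vbmi2", "vnni", "ifma", "bitalg", "vpopcntdq",
})

# Aliases as suggested in the quoted message (assumption: simple expansion).
ALIASES = {
    "-mavx10.1-256": ("-mavx10.1", "-mno-evex512"),
    "-mavx10.1-512": ("-mavx10.1", "-mevex512"),
}


def resolve(options, sticky_no_evex512=False):
    """Return (enabled AVX512 sub-ISAs, whether 512-bit vectors are allowed)."""
    isas = set()
    evex512 = False
    explicit_off = False  # True once -mno-evex512 was the last explicit choice
    for opt in options:
        for prim in ALIASES.get(opt, (opt,)):
            if prim == "-mevex512":
                evex512, explicit_off = True, False
            elif prim == "-mno-evex512":
                evex512, explicit_off = False, True
            elif prim == "-mavx10.1":
                # Enables the ISAs without touching the 512-bit setting.
                isas |= AVX10_1_ISAS
            elif prim.startswith("-mavx512"):
                # Modeled as implying AVX512F and, by default, evex512.
                isas |= {"f", prim[len("-mavx512"):]}
                if not (sticky_no_evex512 and explicit_off):
                    evex512 = True
            else:
                raise ValueError(f"unknown option: {prim}")
    return isas, evex512
```

With plain last-option-wins processing, resolve(["-mno-evex512", "-mavx512f"])
turns 512 bit vectors back on; passing sticky_no_evex512=True keeps them off,
which is the behavior the question above asks for.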


RE: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Hongtao Liu 
> Sent: Wednesday, August 23, 2023 10:19 AM
> To: Jiang, Haochen 
> Cc: Jakub Jelinek ; Richard Biener
> ; ZiNgA BuRgA ;
> gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Wed, Aug 23, 2023 at 9:58 AM Jiang, Haochen
>  wrote:
> >
> > > -----Original Message-----
> > > From: Jakub Jelinek 
> > > Sent: Tuesday, August 22, 2023 11:02 PM
> > > To: Hongtao Liu 
> > > Cc: Richard Biener ; Jiang, Haochen
> > > ; ZiNgA BuRgA ;
> > > gcc- patc...@gcc.gnu.org
> > > Subject: Re: Intel AVX10.1 Compiler Design and Support
> > >
> > > On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > > > Let's assume there's no delta now, AVX10.1-512 is equal to
> > > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > > other stuff.
> > > > > The current common/config/i386/i386-common.cc
> > > > > OPTION_MASK_ISA*SET* would be like now, except that the current
> > > > > AVX512* sets imply also EVEX512/whatever it will be called, that
> > > > > option itself enables nothing (or TARGET_AVX512F), and unsetting it
> > > > > doesn't disable all the TARGET_AVX512*.
> > > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > > EVEX512)
> > > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> >
> > I think we still need that since the current w/o AVX512VL, we will not
> > only enable 512 bit vector instructions but also enable scalar
> > instructions, which means when it comes to -mavx512bw -mno-evex512, we
> > should enable the scalar function.
> >
> > And scalar functions will also be enabled in AVX10.1-256, we need
> > something to distinguish them out from the ISA set w/o AVX512VL.
> Why do we need to distinguish scalar evex instruction?
> As long as -mavx512XXX -mno-evex does not generate zmm/64-bit kmask, it
> should be ok.
> 
> Assume there's no delta in AVX10.1, It sounds to me the design should be like
> 
> avx512*      <== mno-evex512 ==   avx512* + mevex512
> (no-evex512)                      (original AVX512 stuff)
>     /\                                /\
>     ||(equal)                         ||(equal)
>     \/                                \/
> avx10.1-256                       avx10.1-512
>     /\                                /\
>     ||                                ||
>     ||                                ||
>   implied                           implied
>     ||                                ||
>     ||                                ||
> avx10.2-256  <== implied ==       avx10.2-512
>     /\                                /\
>     ||                                ||
>     ||                                ||
>   implied                           implied
>     ||                                ||
>     ||                                ||
> avx10.3-256  <== implied ==       avx10.3-512
> 
> 1. The new instructions in avx10.x should be put in either avx10.x-256 or
> avx10.x-512 according to vector/kmask size
> 2. -mno-evex512 should disable -avx10.x-512.
> 3. -mavx512* will by default enable -mevex512, but -mavx10.1-256 will just
> enable -mavx512* but not -mevex512
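
The implication chain in the diagram above can be sketched as a small
reachability computation. This is only an illustration in Python, not GCC
code: levels are written as (minor version, width) pairs, implied_levels()
and drop_512() are hypothetical helper names, and treating avx10.1-512 as
implying avx10.1-256 is an assumption read off the two "equal" rows rather
than an arrow drawn in the diagram.

```python
def implied_levels(minor, width):
    """Return every (minor, width) level implied by enabling avx10.<minor>-<width>."""
    seen = set()
    stack = [(minor, width)]
    while stack:
        m, w = stack.pop()
        if (m, w) in seen:
            continue
        seen.add((m, w))
        if w == 512:
            # Horizontal arrows: the 512 bit level implies the 256 bit level.
            stack.append((m, 256))
        if m > 1:
            # Vertical arrows: each version implies the previous one.
            stack.append((m - 1, w))
    return seen


def drop_512(levels):
    """Model of rule 2: -mno-evex512 removes every avx10.x-512 level."""
    return {(m, w) for (m, w) in levels if w != 512}
```

For example, implied_levels(2, 512) contains all four of avx10.2-512,
avx10.2-256, avx10.1-512 and avx10.1-256, and drop_512() of that set leaves
only the two 256 bit levels.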

Since the design has changed, I will revert all the AVX10.1 patches that have
been committed to trunk if there is no objection within 24 hours.

Also, I am working on a sample patch for -mevex512. Although there is a small
encoding issue with the APX EVEX promoted KMOVQ, most users will not
notice it. And -mevex512 is quite straightforward.

Thx,
Haochen

> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > I think that would be my expectation.  -mavx512bw currently implies
> > > 512-bit vector support of avx512f and avx512bw, and with
> > > -mavx512{bw,vl} also 128-bit/256-bit vector support.  All pre-AVX10
> > > chips which do support AVX512BW support 512-bit vectors.  Now,
> > > -mavx10.1 will bring in also
> > > vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you
> > > wrote which weren't enabled before, but unless there is some
> > > existing or planned CPU which would support 512-bit vectors in
> > > avx512f and avx512bw ISAs and only support 128/256-bit vectors in
> > > those dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I
> > > think there is no need to differentiate further; the only CPUs which
> > > will support both what -mavx512bw and -mavx10.1 requires will be (if
> > > there is no delta) either CPUs with 128/256/512-bit vector support of
> > > those f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> > > -mavx512vl -mavx512bw -mno-evex512 

RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Jakub Jelinek 
> Sent: Tuesday, August 22, 2023 11:02 PM
> To: Hongtao Liu 
> Cc: Richard Biener ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > Let's assume there's no delta now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > EVEX512)
> > If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think we still need that, since currently, w/o AVX512VL, we enable not only
512 bit vector instructions but also scalar instructions, which means that when
it comes to -mavx512bw -mno-evex512, we should still enable the scalar
instructions.

Scalar instructions will also be enabled in AVX10.1-256, so we need something
to distinguish them from the ISA set w/o AVX512VL.

Thx,
Haochen

> 
> I think that would be my expectation.  -mavx512bw currently implies
> 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> which weren't enabled before, but unless there is some existing or planned
> CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> only support 128/256-bit vectors in those
> dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> is no need to differentiate further; the only CPUs which will support both
> what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> either CPUs with 128/256/512-bit vector support of those
> f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> disable all 512-bit vector instructions and in the end just mean the
> same as -mavx10.1-256.
> For just
> -mavx512bw -mno-evex512 -mavx10.1-256
> the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> avx512vl isn't enabled at that point during processing, or if we do that
> only at the end as a special case.  Of course, in this exact case there is
> no difference, because -mavx10.1-256 turns that back on.
> But it would make a difference on
> -mavx512bw -mno-evex512 -mavx512vl
> (when processed right away would disable AVX512BW (because VL isn't on)
> and in the end enable VL,F including EVEX512, or be equivalent to just
> -mavx512bw -mavx512vl if processed at the end, because -mavx512vl implied
> -mevex512 again.
> 
>   Jakub
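
The two processing orders described in the quoted message can be compared with
a short Python sketch. It is a toy model, not GCC's option handling:
apply_options(), prune() and the eager flag are hypothetical names, and the
rule "without VL, turning off 512 bit vectors leaves no usable AVX512 ISA" is
the assumption made in the quoted message.

```python
ALL_AVX512 = frozenset({
    "f", "vl", "bw", "dq", "cd", "bf16", "fp16",
    "vbmi", "vbmi2", "vnni", "ifma", "bitalg", "vpopcntdq",
})


def prune(isas):
    """Without VL the AVX512 ISAs are 512-bit only, so nothing usable remains."""
    return set(isas) if "vl" in isas else set()


def apply_options(options, eager):
    """Process options left to right.

    eager=True  applies the -mno-evex512 consequences right away.
    eager=False applies them once, after all options have been seen.
    """
    isas, evex512 = set(), False
    for opt in options:
        if opt == "-mno-evex512":
            evex512 = False
            if eager:
                isas = prune(isas)
        elif opt == "-mavx10.1-256":
            # Positive enablement only; the 512-bit setting is left untouched.
            isas |= ALL_AVX512
        elif opt.startswith("-mavx512"):
            isas |= {"f", opt[len("-mavx512"):]}
            evex512 = True
        else:
            raise ValueError(f"unknown option: {opt}")
    if not eager and not evex512:
        isas = prune(isas)
    return isas, evex512
```

Under this model, -mavx512bw -mno-evex512 -mavx512vl ends with only F and VL
(plus 512 bit vectors) when processed eagerly, but with F, BW and VL when the
check is deferred to the end, while -mavx512bw -mno-evex512 -mavx10.1-256
gives the same result either way.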



RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Richard Biener 
> Sent: Tuesday, August 22, 2023 4:36 PM
> To: Jakub Jelinek 
> Cc: Jiang, Haochen ; ZiNgA BuRgA
> ; Hongtao Liu ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches wrote:
> > > I think internally we should have conditional 512bit support work across
> > > AVX512 and AVX10.
> > >
> > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > redundancy and possibly make providing inter-operation between
> > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > just as "re-branding" latest AVX512, so we should treat it that way
> > > (making it an alias to the AVX512 features).
> > >
> > > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > is an entirely separate
> > > question.  But I think to not wreck the core idea (more interoperability,
> > > here between small/big cores) we absolutely have to
> > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > effectively means AVX512 with disabled 512bit support.
> >
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.
> 
> Works for me.  Note it also implies mask regs are SImode, not DImode,
> not sure if that relates to evex more than mask reg encodings are all evex ...
> 

Just to make sure we are on the same page:

So what we are looking for is an "extended" -m[no-]avx10-max-512bit option,
which can also be used with AVX512. The rest of the basic logic will not change.

BTW, -mevex512 is not a good name since there will be 64 bit mask operations
promoted to EVEX128 in APX, which might cause confusion.

Thx,
Haochen

> >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> > enabled only in 128/256 bit variants if we differentiate that level.
> > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> >
> > Jakub
> >


RE: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: ZiNgA BuRgA 
> Sent: Monday, August 21, 2023 5:27 PM
> To: Richard Biener ; Hongtao Liu
> 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> Another way (not saying this is better, just throwing out ideas) is to
> break AVX10.1 into all the AVX-512 subsets.
> So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.
> 
> * -mavx10.1-256  would effectively be an alias for all the 128+256-bit
> subsets, and set the __AVX10_1__ define
> * -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi
> -mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define
> (`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
> * -mno-avx512vbmi  would similarly be an alias for
> `-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`;
> with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if
> unusual (enable all AVX10.1 but disable all VBMI)
> * -mavx10.2-256  would act as a single feature, cementing in AVX10.2
> like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off

I am considering a proposal quite similar to this if we want to change the
design so that it is flexible.

But there are a few proposals on the table. The concern with this proposal
is whether splitting every AVX512 feature is over-design, since in most
scenarios we just need to keep the vector width the same.

Thx,
Haochen

> 
> 
> On 21/08/2023 5:36 pm, Richard Biener wrote:
> > On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> >  wrote:
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
> >
> > Richard.
> 



[r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches
From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c 
scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

On Linux/x86_64,

46c8c225455273ce7f7da7cc5707aed54f23e78d is the first bad commit
commit 46c8c225455273ce7f7da7cc5707aed54f23e78d
Author: Richard Biener 
Date:   Wed Jul 26 15:23:45 2023 +0200

Improve sinking with unrelated defs

caused

FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2946/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches
From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c 
scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

On Linux/x86_64,

3a13884b23ae32b43d56d68a9c6bd4ce53d60017 is the first bad commit
commit 3a13884b23ae32b43d56d68a9c6bd4ce53d60017
Author: Richard Biener 
Date:   Fri Aug 11 12:08:10 2023 +0200

Improve BB vectorization opt-info

caused

FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3148/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches
Hi all,

There are lots of discussions on arch level and ABIs and I really appreciate
that.

For the arch level issue, it might be a little early to discuss and should not
block these patches.

For the ABI issue, the problem comes from GCC and clang/LLVM currently behaving
differently in the return value for m512 w/o 512 bit support. Getting that
unified became a question of its own, which is how we got the whole discussion.
However, it is a corner case.

So let's first focus on the options design and the behavior of that. We could
continue to discuss those two issues after the main behavior is settled.
Richard has raised some concerns about option combinations. Any other concerns?

Thx,
Haochen

> -----Original Message-----
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang via
> Gcc-patches
> Sent: Tuesday, August 8, 2023 3:13 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao 
> Subject: Intel AVX10.1 Compiler Design and Support
> 
> Hi all,
> 
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we
> would like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - With the introduction of AVX10, we would like to establish a common,
>     converged vector instruction set across all Intel architectures, including
>     Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit is
>     optional.
>   - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
>   - There will be no new AVX512 CPUID introduced in the future. All EVEX vector
>     instructions will be under the AVX10 umbrella.
>   - AVX10 will be a version-based ISA instead of tons of different CPUIDs like
>     AVX512BW, AVX512DQ, AVX512FP16, etc.
>   - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
>     (Suppressed All Exceptions) control and new instructions.
> 
> If you would like to have a closer look at the details, please follow the links
> below:
> 
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification
> It describes the Intel Advanced Vector Extensions 10 Instruction Set
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
> 
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper
> It provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
> Hence, we will have several compiler design ground rules for AVX10:
>   - AVX10 is a converged ISA feature set.
>     We will not provide -m[no-]xxx to enable/disable each single vector feature
>     in one version as we used to before. Instead, a simple option -m[no-]avx10.x
>     is used. If the 512 bit version is needed, -mavx10.x-512 is all you need.
>     Also, the maximum vector width should be the same when different versions of
>     AVX10 are used. For example, enabling AVX10.1 with 512 bit vector width
>     while enabling AVX10.2 with only 256 bit vector width is not a desired
>     behavior.
>   - AVX10 is an evolving ISA feature set.
>     Every feature that shows up in the current version will always show up in
>     future versions.
>   - AVX10 is an independent ISA feature set.
>     Although sharing the same instructions and encodings, AVX10 and AVX512 are
>     conceptually independent features, which means they are orthogonal.
> Since AVX10 will have several benefits like bringing AVX512 features on Atom
> Server and Clients and getting rid of tons of AVX512 CPUIDs but a simple AVX10
> option to enable features, we lean towards the adoption of AVX10 instead of
> AVX512 from now on.
> 
> Based on all we got, we would like to introduce the following compiler options:
>   - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
>     256 bit vector width to make sure the compatibility on all platforms.
>   - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512 bit
>     vector width. “-mno-avx10.x-512” option will not be provided to avoid
>     confusion of disabling 512 vector width or avx10.x itself.
>   - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256 bit
>     vector width. But it will disable 512 bit vector width since the vector size
>     is indicated in option. “-mno-avx10.x-256” option will not be provided to
>     keep align with the 512 ones.
>   - -mno-avx10.x: The option will disable all the features introduced >=avx10.x
>     (both 256 and 512 bit) and keep features  how
>     -mno- options behave previously.
> 
> When there comes an option combination of various vector size indicated (e.g.
> -mavx10.x-512 -mavx10.y-256), we would like to emit a 

RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Jan Beulich 
> Sent: Thursday, August 10, 2023 9:31 PM
> To: Phoebe Wang 
> Cc: Joseph Myers ; Wang, Phoebe
> ; Hongtao Liu ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com; Liu,
> Hongtao ; Zhang, Annita ;
> x86-64-abi ; llvm-dev  d...@lists.llvm.org>; Craig Topper ; Richard Biener
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined
> > ABI in practice. A well defined ABI should guarantee 1) interlinkable
> > across different compile options within the same compiler; 2)
> > interlinkable across different compilers. Both aspects are failed in the
> > non 512-bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can
> > run on both AVX10-256 and AVX10-512. It would be common that binaries
> > compiled with AVX10-256 may link with native built binaries on AVX10-512
> > targets.

IMO it is not acceptable for AVX10-256 to generate code that uses zmm registers.

If I have to choose among the three proposals, the second is better.

But the best choice, I suppose, is to keep what we are doing currently, which is
passing them in memory and emitting a warning. It is a reasonable behavior.

Thx,
Haochen

> 
> But you're only describing a pre-existing problem here afaict. Code compiled
> with -mavx512f passing __m512 type data to a function compiled with only, say,
> -mavx2 won't interoperate properly either. What's worse, imo the psABI doesn't
> sufficiently define what __m256 etc actually are. After all these aren't types
> defined by the C standard (as opposed to at least most other types in the
> respective table there), and you can't really make assumptions like "this is
> what certain compilers think this is".
> 
> Jan


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Richard Biener 
> Sent: Tuesday, August 8, 2023 8:45 PM
> To: Jiang, Haochen 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org;
> ubiz...@gmail.com; Liu, Hongtao 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi Jakub,
> >
> > > So, what does this imply for the current ISAs?
> >
> > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > independent ISA feature set. Although sharing the same instructions
> > and encodings, AVX10 and AVX512 are conceptual independent features,
> > which means they are orthogonal.
> >
> > > The expectations in lots of config/i386/* is that -mavx512f /
> > > TARGET_AVX512F means 512 bit vector support is available and most of
> > > the various -mavx512XXX options imply -mavx512f (and -mno-avx512f
> > > turns those off).  And if -mavx512vl / TARGET_AVX512VL isn't
> > > available, tons of places just use 512-bit EVEX instructions for
> > > 256-bit or 128-bit stuff (mostly to be able to access [xy]mm16+).
> >
> > For AVX10, the 128/256/scalar version of the instructions are always
> > there, and also for [xy]mm16+. 512 version is "optional", which needs
> > user to indicate them in options. When 512 version is enabled,
> > 128/256/scalar version is also enabled, which is kind of reverse relation
> > between the current AVX512F/AVX512VL.
> >
> > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic
> > for the current pattern, which is shown in our AVX512DQ+VL sample patches.
> 
> Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
> complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> parts of the former AVX512 ISA one doesn't like to get code generated for?
> -mavx10 would then enable all the existing sub-AVX512 ISAs?
>

We treat AVX10 and AVX512 as two independent ISAs.

Therefore, it would be quite odd to disable something via another, unrelated ISA.
I don't think -mavx10.1 -mno-avx512f should disable anything.

Thx,
Haochen
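
To make the option semantics described in this reply concrete, here is a minimal C sketch. It is only an illustrative model under stated assumptions: `isa_state` and `apply_option` are hypothetical names, not GCC code, and only three features are modelled. The rules encoded are the ones stated in the thread (-mavx512vl implies -mavx512f, -mno-avx512f implies -mno-avx512vl, and AVX10.1 is untouched by AVX512 options).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of three ISA switches; not GCC's real option state. */
struct isa_state {
    bool avx512f;
    bool avx512vl;
    bool avx10_1;
};

/* Apply one command-line option to the model.
   Assumed rules, taken from the discussion:
   - -mavx512vl implies -mavx512f, and -mno-avx512f implies -mno-avx512vl;
   - AVX10.1 is independent, so no AVX512 option ever touches it. */
void apply_option(struct isa_state *s, const char *opt)
{
    assert(s != NULL && opt != NULL);
    if (strcmp(opt, "-mavx512f") == 0) {
        s->avx512f = true;
    } else if (strcmp(opt, "-mavx512vl") == 0) {
        s->avx512vl = true;
        s->avx512f = true;
    } else if (strcmp(opt, "-mno-avx512f") == 0) {
        s->avx512f = false;
        s->avx512vl = false;
    } else if (strcmp(opt, "-mno-avx512vl") == 0) {
        s->avx512vl = false;
    } else if (strcmp(opt, "-mavx10.1") == 0) {
        s->avx10_1 = true;
    }
    /* Unknown options are ignored in this sketch. */
}
```

Applying "-mavx10.1" and then "-mno-avx512f" to this model leaves `avx10_1` set, which is the behavior argued for above.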

> > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they
> > > have AVX512F CPUID even when the 512-bit vectors aren't present?
> > > What happens if one mixes the -mavx10* options together with
> > > -mno-avx512vl or similar options?  Will -mno-avx512f still imply 
> > > -mno-avx512vl etc.?
> >
> > For the CPUID part, AVX10 and AVX512 have different emulation. Only
> > Xeon Server will have AVX512 related CPUIDs for backward
> > compatibility. For GNR, it will be AVX512F, AVX512VL, AVX512CD,
> > AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI, AVX512_VNNI,
> > AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
> > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom
> > Server and client will only have AVX10 CPUIDs with 256 bit support set.
> >
> > -mno-avx512f will still imply -mno-avx512vl.
> >
> > As we mentioned below, we don't recommend users to combine the AVX10
> > and legacy
> > AVX512 options. We understand that there will be different opinions on
> > what should compiler behave on some controversial option combinations.
> >
> > If there is someone mixes the options, the golden rule is that we are using 
> > OR logic.
> > Therefore, enabling either feature will turn on the shared
> > instructions, no matter the other feature is not mentioned or closed.
> > That is why we are emitting warning for some scenarios, which is also
> > mentioned in the letter.
> 
> I'm refraining from commenting on the senselessness of AVX10 as you're likely on
> the same receiving side as us.
> 
> Thanks,
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > >   Jakub
> >


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 9, 2023 1:38 PM
> To: Phoebe Wang 
> Cc: Hongtao Liu ; Joseph Myers
> ; Jiang, Haochen ; gcc-
> patc...@gcc.gnu.org; ubiz...@gmail.com; Liu, Hongtao
> ; Zhang, Annita ; Wang,
> Phoebe ; x86-64-abi  a...@googlegroups.com>; llvm-dev ; Craig Topper
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> 
> 
> > On 09.08.2023 at 06:02, Phoebe Wang via Gcc-patches <gcc-patc...@gcc.gnu.org> wrote:
> >
> > I have some proposals about unifying ABI on AVX10 for both 256-bit
> > and 512-bit.
> >
> >
> >
> > Proposal 1: Promote attribute from AVX10-256 to AVX10-512 for any
> > function which has 512-bit or above vectors in passing/returning arguments.
> >
> > Problem: Binary cannot run on AVX10-256 only target.
> >
> > Reason:
> >
> > When user tries to pass/return 512-bit vector, they should be aware of
> > it will become target dependent. User should be taught not to use it
> > on 256-bit targets and there will be unexpected things happening if
> > they insist.
> >
> > Actually, ICC and MSVC already have chosen to promote for the argument:
> > https://godbolt.org/z/vcrf9qW5z I think if compiler have to choose the
> > misbehavior between fail in result and crash due to illegal
> > instruction, the latter is definitely better than the former.
> >
> > In this way, we can also declare x86-64-v5 is inherit from x86-64-v4
> > and has the interaction with previous versions.
> >
> >
> >
> > Proposal 2: Abort compilation when user tries to pass/return 512-bit
> > vectors.
> >
> > Reason: This turns possible run time crash into compile time error.
> >
> >
> >
> > Proposal 3: Change the ABI of 512-bit vector and always be
> > passed/returned from memory.
> 
> I don’t think we can realistically change the ABI.  If we could, passing them
> in two 256bit registers would be possible as well.
> 
> Note I fully expect intel to turn around and implement 512 bits on a 256 bit
> data path on the E cores in 5 years.  And it will take at least that time for
> AVX10 to take off (look at AVX512 for this and how they cautiously chose to
> include bf16 to cut off Zen4).  So IMHO we shouldn’t worry at all and just
> wait and see for AVX42 to arrive.

Let me try to clarify the whole thing.

I suppose Phoebe's "change" is based on LLVM.

In GCC, the current behavior is to pass 512-bit vectors in memory when there is
no 512-bit support. When there is support, everything is passed in registers.

For AVX10, I would prefer to keep this pattern. But if most of you want to
change it, I have no objection, since AVX10 is a new start.

Thx,
Haochen
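
As an illustration of the case being discussed, the following C sketch passes and returns 512-bit vectors by value. It is a minimal example under stated assumptions: it uses the GCC/Clang `vector_size` extension instead of `__m512`, and the names `v16sf`, `add_v16sf` and `sum_v16sf` are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* 512-bit vector of 16 floats, using the GCC/Clang vector extension.
   Unlike __m512 from <immintrin.h>, this needs no target-specific header. */
typedef float v16sf __attribute__((vector_size(64)));

static_assert(sizeof(v16sf) == 64, "v16sf must be 512 bits wide");

/* Receives and returns 512-bit vectors by value, so the calling convention
   for 512-bit vectors applies at every call site. */
v16sf add_v16sf(v16sf a, v16sf b)
{
    return a + b;
}

/* Horizontal sum, used here only to inspect the result. */
float sum_v16sf(v16sf v)
{
    float total = 0.0f;
    for (size_t i = 0; i < 16; i++)
        total += v[i];
    return total;
}
```

Built without 512-bit support, GCC is expected to pass these vectors in memory and emit a -Wpsabi warning; built with -mavx512f, they go in registers. Linking a caller built one way against a callee built the other way is the interoperability hazard this thread is about.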

> 
> Richard
> 
> > Reason: We expect AVX10-256 is a universal configuration and in most
> > scenarios, 512-bit vector won't bring performance improvements. So we
> > can sacrifice a little 512-bit performance to achieve the interaction
> > between
> > AVX10-256 and AVX10-512. In this way, there won't have any runtime
> > issue in the future either.
> >
> >
> >
> > Thanks
> >
> > Phoebe
> >
> > Hongtao Liu  wrote on Wed, Aug 9, 2023 at 10:18:
> >
> >>> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
> >>>
> >>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> 
>  On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers
>  
> >> wrote:
> >
> > Do you have any comments on the interaction of AVX10 with the
> > micro-architecture levels defined in the ABI (and supported with
> > glibc-hwcaps directories in glibc)?  Given that the levels are
> >> cumulative,
> > should we take it that any future levels will be ones supporting
> >> 512-bit
> > vector width for AVX10 (because x86-64-v4 requires the current
> >> AVX512F,
> > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future
> >> processors
> > that only support 256-bit vector width will be considered to match
> >> the
> > x86-64-v3 micro-architecture level but not any higher level?
>  This is actually something we really want to discuss in the
>  community, our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-256)
>  + APX.
>  One big reason is Intel E-core will only support AVX10 256-bit, if
>  we want to use x86-64-v5 across server and client, it's better to
>  default to 256-bit.
> >>> + ABI and LLVM folked for this topic.
> >> s/folked/folks/
> >>
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
> 
> 
> 
>  --
>  BR,
>  Hongtao
> >>>
> >>>
> >>>
> >>> --
> >>> BR,
> >>> Hongtao
> >>
> >>
> >>
> >> --
> >> BR,
> >> Hongtao
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups
> >> "X86-64 System V Application Binary Interface" group.
> >> To unsubscribe from this group and stop receiving emails from it,
> >> send an email to x86-64-abi+unsubscr...@googlegroups.com.
> >> To view this discussion on the web visit
> >> https://groups.google.com/d/msgid/x86-64-abi/CAMZc-bzj5971PJ4UN2aB4LB
> >> 

RE: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Jiang, Haochen via Gcc-patches
Hi Jakub,

> So, what does this imply for the current ISAs?

AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
independent ISA feature set. Although sharing the same instructions and
encodings, AVX10 and AVX512 are conceptually independent features, which
means they are orthogonal.

> The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> means 512 bit vector support is available and most of the various -mavx512XXX
> options imply -mavx512f (and -mno-avx512f turns those off).  And if
> -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> access [xy]mm16+).

For AVX10, the 128/256/scalar versions of the instructions are always there, and
so is [xy]mm16+. The 512 version is "optional", which needs the user to request it
in options. When the 512 version is enabled, the 128/256/scalar versions are also
enabled, which is roughly the reverse of the current AVX512F/AVX512VL relation.

Since we treat AVX10 and AVX512 as orthogonal, we will add OR logic to the
current patterns, as shown in our AVX512DQ+VL sample patches.

> Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> one mixes the -mavx10* options together with -mno-avx512vl or similar
> options?  Will -mno-avx512f still imply -mno-avx512vl etc.?

For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon Server
will have AVX512 related CPUIDs for backward compatibility. For GNR, it will be
AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AVX512_VBMI2,
AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom
Server and client will only have AVX10 CPUIDs with 256 bit support set.

-mno-avx512f will still imply -mno-avx512vl.

As we mention below, we don't recommend that users combine the AVX10 and legacy
AVX512 options. We understand that there will be different opinions on how the
compiler should behave for some controversial option combinations.

If someone does mix the options, the golden rule is that we use OR logic.
Therefore, enabling either feature will turn on the shared instructions,
regardless of whether the other feature is unmentioned or disabled. That is why
we emit a warning in some scenarios, which is also mentioned in the letter.

Thx,
Haochen
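
The "OR logic" rule can be sketched in a few lines of C. This is a simplified model, not the actual pattern condition from the sample patches: the `isa_flags` struct and `shared_dq_vl_insn_enabled` function are hypothetical, and a single boolean stands in for AVX10.1 at 256-bit vector length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flags; the field names are illustrative only. */
struct isa_flags {
    bool avx512dq;
    bool avx512vl;
    bool avx10_1;   /* AVX10.1 at 256-bit vector length */
};

/* A 128/256-bit instruction shared by AVX512DQ+VL and AVX10.1 is usable
   if EITHER side enables it (the "OR logic" described above). */
bool shared_dq_vl_insn_enabled(const struct isa_flags *f)
{
    assert(f != NULL);
    return (f->avx512dq && f->avx512vl) || f->avx10_1;
}
```

With this shape, turning off the AVX512 side cannot disable an instruction that AVX10.1 has enabled, and vice versa, which is why mixed option combinations may deserve a warning.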

> 
>   Jakub



RE: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64

2023-07-20 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 20, 2023 9:28 PM
> To: Maciej W. Rozycki 
> Cc: haochen.jiang ; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> 
> Subject: Re: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c
> scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64
> 
> On Thu, Jul 20, 2023 at 3:13 PM Maciej W. Rozycki 
> wrote:
> >
> > On Thu, 20 Jul 2023, Richard Biener wrote:
> >
> > > > c1e420549f2305efb70ed37e693d380724eb7540 is the first bad commit
> > > > commit c1e420549f2305efb70ed37e693d380724eb7540
> > > > Author: Maciej W. Rozycki 
> > > > Date:   Wed Jul 19 11:59:29 2023 +0100
> > > >
> > > > testsuite: Add 64-bit vector variant for bb-slp-pr95839.c
> > >
> > > I think the issue is we disable V2SF on ia32 because of the conflict
> > > with MMX which we don't want to use.
> >
> >  I'm not sure if I have a way to test with such a target.  Would you
> > expect:
> >
> > /* { dg-require-effective-target vect64 } */
> >
> > to cover it?  If so, then I'll put it back as in the original version
> > and post for Haochen to verify.

I suppose you can just commit it to trunk; it should be OK since it is only an -m32 issue.

Thx,
Haochen

> 
> Yeah, that should work here.
> 
> Richard.
> 
> >   Maciej


RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-16 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Jiang, Haochen
> Sent: Friday, July 14, 2023 10:50 AM
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' 
> Subject: RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
> 
> > The recent change in TImode parameter passing on x86_64 results in the
> > FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> > the combine pass is now spoilt for choice between using either the
> > *add3_doubleword_concat or the *add3_doubleword_zext
> > patterns, when one operand is a *concat and the other is a zero_extend.
> > The solution proposed below is provide an
> > *add3_doubleword_concat_zext define_insn_and_split, that can
> > benefit both from the register allocation of *concat, and still avoid
> > the xor normally required by zero extension.
> >
> > I'm investigating a follow-up refinement to improve register
> > allocation further by avoiding the early clobber in the =, and
> > handling (custom) reloads explicitly, but this piece resolves the testcase
> > failure.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-11  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/91681
> > * config/i386/i386.md (*add3_doubleword_concat_zext): New
> > define_insn_and_split derived from *add3_doubleword_concat
> > and *add3_doubleword_zext.
> 
> Hi Roger,
> 
> This commit currently changed the codegen of testcase p443644-2.c from:

Oops, a typo, I mean pr43644-2.c.

Haochen

> 
>         movq    %rdx, %rax
>         xorl    %edx, %edx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
> to:
> 
>         movq    %rdx, %rcx
>         movq    %rdi, %rax
>         movq    %rsi, %rdx
>         addq    %rcx, %rax
>         adcq    $0, %rdx
> 
> which causes the testcase to fail under -m64.
> 
> Is this within your expectation?
> 
> BRs,
> Haochen
> 
> >
> >
> > Thanks,
> > Roger
> > --



RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-13 Thread Jiang, Haochen via Gcc-patches
> The recent change in TImode parameter passing on x86_64 results in the FAIL
> of pr91681-1.c.  The issue is that with the extra flexibility, the combine 
> pass is
> now spoilt for choice between using either the
> *add3_doubleword_concat or the *add3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is provide an
> *add3_doubleword_concat_zext define_insn_and_split, that can
> benefit both from the register allocation of *concat, and still avoid the xor
> normally required by zero extension.
> 
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no
> new failures.  Ok for mainline?
> 
> 
> 2023-07-11  Roger Sayle  
> 
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (*add3_doubleword_concat_zext): New
> define_insn_and_split derived from *add3_doubleword_concat
> and *add3_doubleword_zext.

Hi Roger,

This commit currently changed the codegen of testcase p443644-2.c from:

        movq    %rdx, %rax
        xorl    %edx, %edx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
to:

        movq    %rdx, %rcx
        movq    %rdi, %rax
        movq    %rsi, %rdx
        addq    %rcx, %rax
        adcq    $0, %rdx

which causes the testcase to fail under -m64.

Is this within your expectation?

BRs,
Haochen
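
For readers following the assembly, both sequences appear to compute a 128-bit value plus a zero-extended 64-bit value; the second is one instruction longer. The C sketch below models that arithmetic with explicit halves. It is an assumption-laden illustration: the register roles (x in %rdi:%rsi, y in %rdx) are inferred from the listing, and `u128` and `add_u128_zext` are names invented here, not taken from the testcase.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves, like a register pair. */
struct u128 {
    uint64_t lo;
    uint64_t hi;
};

static_assert(sizeof(struct u128) == 16, "expected two packed 64-bit halves");

/* x + zero_extend(y): the high half of y is known to be zero, so only the
   carry out of the low half needs to be added to x.hi. */
struct u128 add_u128_zext(struct u128 x, uint64_t y)
{
    struct u128 r;
    r.lo = x.lo + y;                 /* corresponds to addq */
    r.hi = x.hi + (r.lo < y);        /* corresponds to adcq $0 */
    return r;
}
```

The first sequence instead materializes the zero high half with xorl and adds the two full pairs with addq/adcq.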

> 
> 
> Thanks,
> Roger
> --



[r14-2462 Regression] FAIL: libgomp.c++/../libgomp.c-c++-common/alloc-12.c execution test on Linux/x86_64

2023-07-13 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

450b05ce54d3f08c583c3b5341233ce0df99725b is the first bad commit
commit 450b05ce54d3f08c583c3b5341233ce0df99725b
Author: Tobias Burnus 
Date:   Wed Jul 12 13:50:21 2023 +0200

libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

caused


with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2462/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-11.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-11.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-12.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-12.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


RE: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8 on Linux/x86_64

2023-07-07 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Friday, July 7, 2023 3:55 PM
> To: Beulich, Jan 
> Cc: haochen.jiang ; Jiang, Haochen
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-
> assembler-times vpandn 8 on Linux/x86_64
> 
> On Fri, Jul 7, 2023 at 3:50 PM Hongtao Liu  wrote:
> >
> > On Fri, Jul 7, 2023 at 3:50 PM Jan Beulich  wrote:
> > >
> > > On 07.07.2023 09:46, Hongtao Liu wrote:
> > > > On Fri, Jul 7, 2023 at 3:18 PM Jan Beulich via Gcc-regression
> > > >  wrote:
> > > >>
> > > >> On 06.07.2023 13:57, haochen.jiang wrote:
> > > >>> On Linux/x86_64,
> > > >>>
> > > >>> e007369c8b67bcabd57c4fed8cff2a6db82e78e6 is the first bad commit
> > > >>> commit e007369c8b67bcabd57c4fed8cff2a6db82e78e6
> > > >>> Author: Jan Beulich 
> > > >>> Date:   Wed Jul 5 09:49:16 2023 +0200
> > > >>>
> > > >>> x86: yet more PR target/100711-like splitting
> > > >>>
> > > >>> caused
> > > >>>
> > > >>> FAIL: gcc.target/i386/pr100711-1.c scan-assembler-times pandn 2
> > > >>> FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8
> > > >>
> > > >> I expect the same applies here - -mno-avx512f (or -mno-avx512vl)
> > > >> might
> > > > > For this one, we can just add -mno-avx512f to the testcase, it aims
> > > > to optimize pandn for avx2 target.
> > > >> address this failure. But whether that's really the way to go I'm
> > > >> not sure of. Plus of course such adjustments should have been
> > > >> done ahead of time, when it was decided that testing with certain
> > > > >> -march= settings is a goal. My changes have merely uncovered the
> > > > >> prior omissions.
> > > > > It's not a standard request, it's just our private tester which is
> > > > > used to find gcc bugs and missed optimizations.
> > > > It sometimes generates false positive reports (usually adding
> > > > -mno-avx512f to the testcase can fix that), hope that's not too
> > > > annoying.
> > >
> > > Wouldn't that then better be done once uniformly for all affected
> > > tests, rather than being discovered piecemeal?
> This also prevents us from finding potential problems.

Yes, -march=cascadelake actually enables the AVX512F related features. Sometimes
it exposes real problems, and sometimes it produces false positives.

I will add a hint in the script email.

Thx,
Haochen
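
One way to see what the extra -march=cascadelake run changes is to look at the predefined feature macros. The sketch below is only an illustration; `widest_enabled_vector_bits` is a hypothetical helper, and the macros checked (`__AVX512F__`, `__AVX__`, `__SSE2__`) are the usual GCC/Clang predefines.

```c
/* Returns the widest vector width, in bits, that the compiler was allowed
   to use for this translation unit, judged by predefined macros. */
int widest_enabled_vector_bits(void)
{
#if defined(__AVX512F__)
    return 512;
#elif defined(__AVX__)
    return 256;
#elif defined(__SSE2__)
    return 128;
#else
    return 0;
#endif
}
```

Built with -march=cascadelake this is expected to report 512, and with -mno-avx512f appended it should drop back to 256, which is why adding -mno-avx512f to a testcase that targets AVX2 code generation usually silences the false positive.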

> > >
> > > Anyway, in this case: Since you said you'd take care of the other
> > > test, will/can you do so for the two ones here as well, or am I on the 
> > > hook?
> > I'll do that.
> > >
> > > Jan
> >
> >
> >
> > --
> > BR,
> > Hongtao
> 
> 
> 
> --
> BR,
> Hongtao


RE: [COMMITTED] i386: Use 2x-wider modes when emulating QImode vector instructions

2023-05-25 Thread Jiang, Haochen via Gcc-patches
> gcc/ChangeLog:
> 
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
> instructions when available.  Emulate truncation via
> ix86_expand_vec_perm_const_1 when native truncate insn
> is not available.
> (ix86_expand_vecop_qihi_partial) : Use pmovzx
> when available.  Trivially rename some variables.
> (ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.

Hi Uros,

I suppose you pushed the wrong patch to trunk.

On trunk, we see this:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;

-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;

   switch (qimode)

It should not be if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2)).

The patch in this thread is correct; it reads:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;
 
-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;
 
   switch (qimode)

Thx,
Haochen
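
The practical difference between the two hunks is that `0 && f()` short-circuits, so the helper is never called. A minimal C sketch of that effect follows; `try_wide_expand` is a hypothetical stand-in for ix86_expand_vecop_qihi2, and none of this is GCC code.

```c
#include <stdbool.h>

/* Counts how often the stand-in expander is invoked. */
int expander_calls = 0;

/* Hypothetical stand-in for ix86_expand_vecop_qihi2: pretends to succeed. */
bool try_wide_expand(void)
{
    expander_calls++;
    return true;
}

/* Shape of the code on trunk: "0 &&" short-circuits, so the helper is
   never called and the function always falls through. */
bool expand_as_pushed(void)
{
    if (0 && try_wide_expand())
        return true;
    return false;   /* fall through to the generic path */
}

/* Shape of the code in the posted patch: the helper is always tried. */
bool expand_as_posted(void)
{
    if (try_wide_expand())
        return true;
    return false;
}
```

In other words, the pushed version effectively disables the new 2x-wider expansion path rather than making it unconditional.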

> * config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
> calculation of V*QImode emulations to account for generation of
> 2x-wider mode instructions.
> (ix86_shift_rotate_cost): Update cost calculation of V*QImode
> emulations to account for generation of 2x-wider mode instructions.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Uros.


RE: [PATCH] i386: Share AES xmm intrin with VAES

2023-04-18 Thread Jiang, Haochen via Gcc-patches
> > a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 33e281901cf..e7d565a8389 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -25107,67 +25107,71 @@
> >
> > ;;
> > ;;
> >
> >  (define_insn "aesenc"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESENC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesenc\t{%2, %0|%0, %2}
> > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> TARGET_AVX512VL)" from condition.

Since VAES should not imply AES, we need that "|| (TARGET_VAES &&
TARGET_AVX512VL)".

And there is no need to add vaes_avx512vl, since the last alternative will only
be hit when there is no aes. When there is no aes, the pattern needs both vaes
and avx512vl, or it cannot be used at all. avx512vl here is just a placeholder.

BRs,
Haochen
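
The argument can be checked against a small boolean model. This is a deliberately simplified sketch, not the machine description semantics in full: it assumes an alternative is live when the pattern condition holds and its "isa" tag is enabled, and the function names are invented for illustration.

```c
#include <stdbool.h>

/* Pattern condition from the patch:
   TARGET_AES || (TARGET_VAES && TARGET_AVX512VL). */
bool aes_pattern_enabled(bool aes, bool vaes, bool avx512vl)
{
    return aes || (vaes && avx512vl);
}

/* Simplified model of the third (EVEX, "v" constraint) alternative:
   the pattern condition must hold and its isa tag "avx512vl" must be
   enabled. */
bool evex_alternative_enabled(bool aes, bool vaes, bool avx512vl)
{
    return aes_pattern_enabled(aes, vaes, avx512vl) && avx512vl;
}
```

Without AES, the EVEX alternative in this model is live only when both VAES and AVX512VL are on, which matches the reasoning above. The model also reports it as live for AES plus AVX512VL without VAES; as far as can be told from the thread, that is the combination the suggested vaes_avx512vl tag would exclude.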

> Similar for below patterns.
> Others LGTM.
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesenclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESENCLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesenclast\t{%2, %0|%0, %2}
> > +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> > vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdec"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESDEC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesdec\t{%2, %0|%0, %2}
> > +   vaesdec\t{%2, %1, %0|%0, %1, %2}
> > vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdeclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESDECLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesdeclast\t{%2, %0|%0, %2}
> > +   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> > vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr 

[r13-5971 Regression] FAIL: gcc.target/i386/pr108774.c (test for excess errors) on Linux/x86_64

2023-02-14 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

a33e3dcbd15e73603796e30b5eeec11a0c8bacec is the first bad commit
commit a33e3dcbd15e73603796e30b5eeec11a0c8bacec
Author: Vladimir N. Makarov 
Date:   Mon Feb 13 16:05:04 2023 -0500

RA: Clear reg equiv caller_save_p flag when clearing defined_p flag

caused

FAIL: gcc.target/i386/pr108774.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5971/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108774.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108774.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


[r13-5318 Regression] FAIL: g++.dg/init/new51.C -std=c++98 (test for excess errors) on Linux/x86_64

2023-01-25 Thread Jiang, Haochen via Gcc-patches
This is a recent regression on gcc trunk.

It seems it has already been fixed. If that is the case, please ignore this.

As mentioned in the previous thread, until the mail system on the server is
fixed, I will forward these emails manually.

BRs,
Haochen

> -Original Message-
> From: haochen.jiang 
> Sent: Wednesday, January 25, 2023 7:27 AM
> To: ja...@redhat.com; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> Jiang, Haochen 
> Subject: [r13-5318 Regression] FAIL: g++.dg/init/new51.C -std=c++98 (test for
> excess errors) on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 049a52909075117f5112971cc83952af2a818bc1 is the first bad commit
> commit 049a52909075117f5112971cc83952af2a818bc1
> Author: Jason Merrill 
> Date:   Mon Jan 23 16:25:07 2023 -0500
> 
> c++: TARGET_EXPR collapsing [PR107303]
> 
> caused
> 
> FAIL: g++.dg/init/new51.C  -std=c++14 (test for excess errors)
> FAIL: g++.dg/init/new51.C  -std=c++17 (test for excess errors)
> FAIL: g++.dg/init/new51.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/init/new51.C  -std=c++98 (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5318/usr
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/init/new51.C --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/init/new51.C --target_board='unix{-m32\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact
> me at haochen dot jiang at intel.com)


[r13-5244 Regression] FAIL: gcc.dg/analyzer/SARD-tc841-basic-00182-min.c (test for excess errors) on Linux/x86_64

2023-01-18 Thread Jiang, Haochen via Gcc-patches
The mail system is still broken on that machine, so I am still sending this
manually. Until it is fixed, I will keep checking the script daily to see
whether there are new regressions.

BTW, since there is a Bugzilla entry for the r13-5202 regression, I am not
resending that report.

On Linux/x86_64,

c6a09bfa038ccbfc9f123ede14a3d6237fab is the first bad commit
commit c6a09bfa038ccbfc9f123ede14a3d6237fab
Author: David Malcolm dmalc...@redhat.com
Date:   Wed Jan 18 11:41:47 2023 -0500

analyzer: add SARD testsuite 81

caused

FAIL: gcc.dg/analyzer/SARD-tc841-basic-00182-min.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5244/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


[r13-5092 Regression] FAIL: gcc.dg/tree-ssa/ssa-dse-46.c (test for excess errors) on Linux/x86_64

2023-01-10 Thread Jiang, Haochen via Gcc-patches
Hi all,

This is the bisect result for the latest regression, which failed to be sent to 
the mailing list.

It seems that the mail command in s-nail went down after my machine got 
upgraded; I am still investigating why.

On Linux/x86_64,

4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1 is the first bad commit
commit 4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1
Author: Richard Biener <rguent...@suse.de>
Date:   Tue Jan 10 13:48:51 2023 +0100

tree-optimization/106293 - missed DSE with virtual LC PHI

caused

FAIL: gcc.dg/tree-ssa/ssa-dse-46.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5092/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-dse-46.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-dse-46.c 
--target_board='unix{-m32\ -march=cascadelake}'"

BRs,
Haochen


RE: [wwwdocs] gcc-13: Mention Intel new ISA and march support.

2022-11-15 Thread Jiang, Haochen via Gcc-patches
Hi Gerald,

I will remove "to GCC" here on Thursday if there are no further comments from others.
It seems reasonable to me.

Thx,
Haochen

> -Original Message-
> From: Gerald Pfeifer 
> Sent: Monday, November 14, 2022 9:56 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [wwwdocs] gcc-13: Mention Intel new ISA and march support.
> 
> On Thu, 10 Nov 2022, Haochen Jiang via Gcc-patches wrote:
> > +  New ISA extension support for Intel AVX-IFMA was added to GCC.
> 
> Here and in the other cases I'd skip "to GCC". This is clear from the context
> (this being the GCC release notes :-) and makes it shorter.
> 
> Gerald


RE: [PATCH] Support Intel CMPccXADD

2022-10-24 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang
> via Gcc-patches
> Sent: Monday, October 24, 2022 5:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao 
> Subject: [PATCH] Support Intel CMPccXADD
> 
> Hi all,
> 
> I just refined CMPccXADD patch to make the enum in order intrin file aligned
> with how opcode does.
> 

I just found a testcase issue for the enum that is not fixed yet. I will send the
fixed patch soon.

> Ok for trunk?
> 
> BRs,
> Haochen
> 
> gcc/ChangeLog:
> 
> * common/config/i386/cpuinfo.h (get_available_features):
>   Detect cmpccxadd.
>   * common/config/i386/i386-common.cc
>   (OPTION_MASK_ISA2_CMPCCXADD_SET,
>   OPTION_MASK_ISA2_CMPCCXADD_UNSET): New.
>   (ix86_handle_option): Handle -mcmpccxadd, unset cmpccxadd when
> avx2
>   is disabled.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
>   Add FEATURE_CMPCCXADD.
> * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
>   cmpccxadd.
>   * config.gcc: Add cmpccxaddintrin.h.
>   * config/i386/cpuid.h (bit_CMPCCXADD): New.
>   * config/i386/i386-builtin-types.def:
>   Add DEF_FUNCTION_TYPE(INT, PINT, INT, INT, INT)
>   and DEF_FUNCTION_TYPE(LONGLONG, PLONGLONG, LONGLONG,
> LONGLONG, INT).
>   * config/i386/i386-builtin.def (BDESC): Add new builtins.
>   * config/i386/i386-c.cc (ix86_target_macros_internal): Define
>   __CMPCCXADD__.
>   * config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
>   Add new parameter to indicate constant position.
>   Handle INT_FTYPE_PINT_INT_INT_INT
>   and LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT.
>   * config/i386/i386-isa.def (CMPCCXADD): Add DEF_PTA(CMPCCXADD).
>   * config/i386/i386-options.cc (isa2_opts): Add -mcmpccxadd.
>   (ix86_valid_target_attribute_inner_p): Handle cmpccxadd.
>   * config/i386/i386.opt: Add option -mcmpccxadd.
>   * config/i386/sync.md (cmpccxadd_): New define insn.
>   * config/i386/x86gprintrin.h: Include cmpccxaddintrin.h.
>   * doc/extend.texi: Document cmpccxadd.
>   * doc/invoke.texi: Document -mcmpccxadd.
>   * doc/sourcebuild.texi: Document target cmpccxadd.
>   * config/i386/cmpccxaddintrin.h: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/other/i386-2.C: Add -mcmpccxadd.
>   * g++.dg/other/i386-3.C: Ditto.
>   * gcc.target/i386/avx-1.c: Add builtin define for enum.
>   * gcc.target/i386/funcspec-56.inc: Add new target attribute.
>   * gcc.target/i386/sse-13.c: Add builtin define for enum.
>   * gcc.target/i386/sse-23.c: Ditto.
>   * gcc.target/i386/x86gprintrin-1.c: Add -mcmpccxadd for 64 bit target.
>   * gcc.target/i386/x86gprintrin-2.c: Add -mcmpccxadd for 64 bit target.
>   Add builtin define for enum.
>   * gcc.target/i386/x86gprintrin-3.c: Add -mcmpccxadd for 64 bit target.
>   * gcc.target/i386/x86gprintrin-4.c: Add -mcmpccxadd for 64 bit target.
>   * gcc.target/i386/x86gprintrin-5.c: Add -mcmpccxadd for 64 bit target.
>   Add builtin define for enum.
>   * gcc.target/i386/cmpccxadd-1.c: New test.
>   * gcc.target/i386/cmpccxadd-2.c: New test.
> ---
>  gcc/common/config/i386/cpuinfo.h  |   2 +
>  gcc/common/config/i386/i386-common.cc |  15 ++
>  gcc/common/config/i386/i386-cpuinfo.h |   1 +
>  gcc/common/config/i386/i386-isas.h|   1 +
>  gcc/config.gcc|   3 +-
>  gcc/config/i386/cmpccxaddintrin.h |  89 +++
>  gcc/config/i386/cpuid.h   |   1 +
>  gcc/config/i386/i386-builtin-types.def|   4 +
>  gcc/config/i386/i386-builtin.def  |   4 +
>  gcc/config/i386/i386-c.cc |   2 +
>  gcc/config/i386/i386-expand.cc|  22 ++-
>  gcc/config/i386/i386-isa.def  |   1 +
>  gcc/config/i386/i386-options.cc   |   4 +-
>  gcc/config/i386/i386.opt  |   5 +
>  gcc/config/i386/sync.md   |  42 ++
>  gcc/config/i386/x86gprintrin.h|   2 +
>  gcc/doc/extend.texi   |   5 +
>  gcc/doc/invoke.texi   |  10 +-
>  gcc/doc/sourcebuild.texi  |   3 +
>  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
>  gcc/testsuite/gcc.target/i386/cmpccxadd-1.c   |  61 
>  gcc/testsuite/gcc.target/i386/cmpccxadd-2.c   | 138 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/sse-13.c|   6 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c|   6 +-
>  .../gcc.target/i386/x86gprintrin-1.c  |   2 +-
>  .../gcc.target/i386/x86gprintrin-2.c  |   6 +-
>  

RE: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-20 Thread Jiang, Haochen via Gcc-patches



> -Original Message-
> From: Segher Boessenkool 
> Sent: Friday, October 21, 2022 2:54 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; Liu, Hongtao
> ; ubiz...@gmail.com; richard.earns...@arm.com;
> richard.sandif...@arm.com; marcus.shawcr...@arm.com;
> kyrylo.tkac...@arm.com; r...@gcc.gnu.org; g...@amylaar.uk;
> claz...@synopsys.com; ni...@redhat.com; ramana.radhakrish...@arm.com;
> aol...@gcc.gnu.org; hubi...@ucw.cz; mfort...@gmail.com;
> dje@gmail.com; li...@gcc.gnu.org; uweig...@de.ibm.com;
> kreb...@linux.ibm.com; olege...@gcc.gnu.org; da...@redhat.com;
> ebotca...@libertysurf.fr; jeffreya...@gmail.com; dave.ang...@bell.net
> Subject: Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch
> to align with LLVM
> 
> On Thu, Oct 20, 2022 at 07:34:13AM +, Jiang, Haochen wrote:
> > > > +  /* Argument 3 must be either zero or one.  */
> > > > +  if (INTVAL (op3) != 0 && INTVAL (op3) != 1)
> > > > +{
> > > > +  warning (0, "invalid fourth argument to %<__builtin_prefetch%>;"
> > > > +   " using one");
> > >
> > > "using 1" makes sense maybe, but "using one" reads as "using an
> > > argument", not very sane.
> > >
> > > An error would be better here anyway?
> >
> > Will change to 1 to avoid confusion in that. The reason why this is a 
> > warning
> > is because previous ones related to constant arguments out of range in
> prefetch
> > are also using warning.
> 
> Please don't repeat historical mistakes.  You might not want to fix the
> existing code (since that can in theory break existing user code), but
> that is not a reason to punish users of a new feature as well ;-)
> 
> > > Please use a separate pattern for this, and leave prefetch to mean data
> > > prefetch, as documented!  Documentation you didn't change btw.  Call
> the
> > > new one instruction_prefetch or something equally boring maybe :-)
> >
> > Actually I changed documentation for prefetch but it is flooded in the patch
> > (Sorry for that).
> 
> Oh huh, I looked for it but didn't find it.  Another argument for making
> better patch series ;-)
> 
> > 1. Previously we are using parameter to indicate r/w and locality in 
> > prefetch.
> I
> > suppose it is quite similar in this case. Since the pattern is already 
> > there, I
> prefer
> > reusing them.
> 
> You can use the data prefetch RTL code for all data loads just as well,
> it is more closely related than this -- but most people would call that
> insanity!

You may have a point here. I will write another patch with a new RTL code to see
which implementation is better.

Thx,
Haochen

> 
> 
> Segher


RE: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-20 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Segher Boessenkool 
> Sent: Thursday, October 20, 2022 5:07 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; Liu, Hongtao
> ; ubiz...@gmail.com; richard.earns...@arm.com;
> richard.sandif...@arm.com; marcus.shawcr...@arm.com;
> kyrylo.tkac...@arm.com; r...@gcc.gnu.org; g...@amylaar.uk;
> claz...@synopsys.com; ni...@redhat.com; ramana.radhakrish...@arm.com;
> aol...@gcc.gnu.org; hubi...@ucw.cz; mfort...@gmail.com;
> dje@gmail.com; li...@gcc.gnu.org; uweig...@de.ibm.com;
> kreb...@linux.ibm.com; olege...@gcc.gnu.org; da...@redhat.com;
> ebotca...@libertysurf.fr; jeffreya...@gmail.com; dave.ang...@bell.net
> Subject: Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch
> to align with LLVM
> 
> On Fri, Oct 14, 2022 at 04:34:05PM +0800, Haochen Jiang wrote:
> > * config/s390/s390.cc (s390_expand_cpymem): Generate fourth
> parameter for
> 
> (Many too long lines here, this is the first one.  Changelog lines are
> max. 80 positions; a tab is eight).

I will change that in the next patch.
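A minimal sketch of the line-length rule quoted above (at most 80 positions, with a tab counting as eight), assuming each tab advances to the next multiple of eight columns and every other byte occupies one column. The helper name display_width is hypothetical and is not part of any GCC tooling.

```c
#include <stddef.h>

/* Return the display width of LINE (up to the first newline or the
   terminating NUL).  A tab advances to the next multiple of eight
   columns; every other byte is assumed to occupy one column.  */
size_t
display_width (const char *line)
{
  size_t col = 0;

  for (; *line != '\0' && *line != '\n'; line++)
    {
      if (*line == '\t')
        col = (col / 8 + 1) * 8;
      else
        col++;
    }
  return col;
}

/* Return nonzero if LINE fits within the 80-column limit.  */
int
fits_changelog_limit (const char *line)
{
  return display_width (line) <= 80;
}
```

A ChangeLog entry that starts with a tab therefore has only 72 columns left for text.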

> 
> > +  /* Argument 3 must be either zero or one.  */
> > +  if (INTVAL (op3) != 0 && INTVAL (op3) != 1)
> > +{
> > +  warning (0, "invalid fourth argument to %<__builtin_prefetch%>;"
> > +   " using one");
> 
> "using 1" makes sense maybe, but "using one" reads as "using an
> argument", not very sane.
> 
> An error would be better here anyway?

I will change it to "1" to avoid that confusion. The reason this is a warning
is that the existing checks for out-of-range constant arguments to prefetch
also use warnings.

  /* Argument 2 must be 0, 1, 2, or 3.  */
  if (INTVAL (op2) < 0 || INTVAL (op2) > 3)
    {
      warning (0, "invalid third argument to %<__builtin_prefetch%>; using zero");
      op2 = const0_rtx;
    }

Therefore I use warning to align with them.
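The "warn and fall back to a default" behaviour discussed above can be modelled outside the compiler. The following is only a standalone sketch, not GCC's implementation: the function names are hypothetical, the diagnostics go to stderr instead of through warning (), and the fourth-argument check reflects the proposed patch rather than anything committed.

```c
#include <stdio.h>

/* Model of the existing third-argument check: locality must be 0..3,
   anything else is diagnosed and replaced by 0.  */
int
normalize_locality (int locality)
{
  if (locality < 0 || locality > 3)
    {
      fprintf (stderr, "warning: invalid third argument; using 0\n");
      return 0;
    }
  return locality;
}

/* Model of the proposed fourth-argument check: the value must be
   0 (instruction prefetch) or 1 (data prefetch), anything else is
   diagnosed and replaced by 1.  */
int
normalize_cache (int cache)
{
  if (cache != 0 && cache != 1)
    {
      fprintf (stderr, "warning: invalid fourth argument; using 1\n");
      return 1;
    }
  return cache;
}
```

Turning these into hard errors, as suggested in the review, would mean rejecting the call instead of substituting a default.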

> 
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -14060,10 +14060,25 @@
> >DONE;
> >  })
> >
> > -(define_insn "prefetch"
> > +(define_expand "prefetch"
> > +  [(prefetch (match_operand 0 "indexed_or_indirect_address")
> > +(match_operand:SI 1 "const_int_operand")
> > +(match_operand:SI 2 "const_int_operand")
> > +(match_operand:SI 3 "const_int_operand"))]
> > +  ""
> > +{
> > +  if (INTVAL (operands[3]) == 0)
> > +  {
> 
> Broken indentation.

I will fix that in the updated patch.

> 
> > +warning (0, "instruction prefetch is not supported; using data 
> > prefetch");
> 
> Please use a separate pattern for this, and leave prefetch to mean data
> prefetch, as documented!  Documentation you didn't change btw.  Call the
> new one instruction_prefetch or something equally boring maybe :-)
> 

Actually, I did change the documentation for prefetch, but it got buried in the patch
(sorry for that).

In gcc/doc/rtl.texi

-@item (prefetch:@var{m} @var{addr} @var{rw} @var{locality})
+@item (prefetch:@var{m} @var{addr} @var{rw} @var{locality} @var{cache})
 
+Operand @var{cache} is 1 if the prefetch is prefetching data, 0 for prefetching
+instruction;
+targets that do not support instruction prefetch should treat all as data
+prefetch.
 
As for implementing instruction prefetch as a separate pattern, I did consider
that approach earlier, but I chose the approach in the current patch for the
following reasons.

1. We already use parameters to indicate r/w and locality in prefetch, and I
think this case is quite similar. Since the pattern is already there, I prefer
reusing it.

2. It will be more natural for developers to extend their prefetch support in the future.

If anyone has other points, further discussion is welcome.

> When you send an updated patch, please split it up better?  Generic
> changes and documentation in one patch, target changes in a separate
> patch or patches, and testsuite is distinct as well.  It isn't nice to
> have to scroll through thousands of lines to see if there is anything
> relevant to you.

Really sorry for that. Hongtao has explained why we arranged the patch
this way, and I will split the testcases into a separate patch.

Also, if the testsuite changes in this patch are reduced to the minimum,
the patch will be much smaller than the current one.

BRs,
Haochen

> 
> Thanks,
> 
> 
> Segher


RE: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-19 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Segher Boessenkool 
> Sent: Thursday, October 20, 2022 5:14 AM
> To: Andrew Pinski 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org;
> aol...@gcc.gnu.org; richard.sandif...@arm.com; uweig...@de.ibm.com;
> li...@gcc.gnu.org; g...@amylaar.uk; dje@gmail.com;
> olege...@gcc.gnu.org; claz...@synopsys.com; mfort...@gmail.com;
> da...@redhat.com; dave.ang...@bell.net; hubi...@ucw.cz;
> richard.earns...@arm.com; rguent...@suse.de;
> marcus.shawcr...@arm.com; ramana.radhakrish...@arm.com; Liu, Hongtao
> 
> Subject: Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch
> to align with LLVM
> 
> On Wed, Oct 19, 2022 at 10:14:28AM -0700, Andrew Pinski wrote:
> > Do the testcases really need to be changed rather than adding new
> testcases?
> > Usually it is better if the testcases not change unless really needed
> > to be. That is do these testcases pass without being changed? If not
> > this seems not backwards compatible change and is not something which
> > we should do.  Otherwise you should just add new testcases instead.
> 
> Yes, that is another reason why adding parameters to random builtins is not a
> good idea :-)  s/random/only vaguely related/, if you want.
> 
> This also makes all existing code using these builtins invalid.  If you need 
> such
> testcase changes, that is a red flag.
> 

Maybe the testcase changes caused some misunderstanding and concern.

Actually, the patch does not break existing uses of the builtin, because
__builtin_prefetch takes variable arguments. I set the default value of the new
parameter to data prefetch, which means that if the fourth parameter is omitted,
as in all existing uses of prefetch, the behavior is unchanged.

The reason I made most of the testcase changes was to make the tests look more
complete on the parameter side. I could revert the changes that add the
parameter to the existing testcases and just add tests for the new parameter,
which I suppose would be a minimal change to the current tests.
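For reference, a small sketch of how the optional arguments already behave in the documented builtin: rw defaults to 0 (read) and locality defaults to 3 when omitted. The function names and the prefetch distance of 16 elements are arbitrary assumptions for illustration, and the fourth argument from the proposed patch is deliberately not used.

```c
#include <stddef.h>

/* One-argument form: equivalent to __builtin_prefetch (p, 0, 3).  */
long
sum_default (const int *data, size_t n)
{
  long total = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (i + 16 < n)
        __builtin_prefetch (&data[i + 16]);
      total += data[i];
    }
  return total;
}

/* Two-argument form: prefetch for writing, default locality.  */
void
scale_in_place (int *data, size_t n, int factor)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i + 16 < n)
        __builtin_prefetch (&data[i + 16], 1);
      data[i] *= factor;
    }
}

/* Three-argument form: read access with no temporal locality.  */
long
sum_streaming (const int *data, size_t n)
{
  long total = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (i + 16 < n)
        __builtin_prefetch (&data[i + 16], 0, 0);
      total += data[i];
    }
  return total;
}
```

All three call forms compile with GCC today, which is the compatibility property the reply relies on: code that omits trailing arguments keeps its current meaning.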

BRs,
Haochen

> 
> Segher


RE: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 on Linux/x86_64

2022-10-17 Thread Jiang, Haochen via Gcc-patches
Yes, the mail service on the script machine went down after an expected reboot.
It has just recovered, but it still ran into some problems when sending the
previously queued emails.

That is why this is the only stuck mail that got sent. Sorry for the disturbance.

> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, October 17, 2022 4:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org;
> andre.simoesdiasvie...@arm.com
> Subject: Re: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-
> assembler-times pmovzxwq 2 on Linux/x86_64
> 
> This should be already fixed.
> 
> On Mon, Oct 17, 2022 at 4:34 PM haochen.jiang via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > On Linux/x86_64,
> >
> > 25413fdb2ac24933214123e24ba165026452a6f2 is the first bad commit
> > commit 25413fdb2ac24933214123e24ba165026452a6f2
> > Author: Andre Vieira 
> > Date:   Tue Oct 11 10:49:27 2022 +0100
> >
> > vect: Teach vectorizer how to handle bitfield accesses
> >
> > caused
> >
> > FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times
> pmovzxbw
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times
> pmovzxwd
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times
> pmovzxwq
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times
> > vpmovwb 3
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[
> > \t]*%xmm 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[
> > \t]*%ymm 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw
> > 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
> >
> > with GCC configured with
> >
> > ../../gcc/configure
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3219/
> > usr --enable-clocale=gnu --with-system-zlib 

RE: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-17 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, October 17, 2022 12:05 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8
> 
> On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > From: Kong Lingling 
> >
> > gcc/ChangeLog
> >
> > * common/config/i386/cpuinfo.h (get_available_features): Detect
> > avxvnniint8.
> > * common/config/i386/i386-common.cc
> > (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
> > (OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
> > (ix86_handle_option): Handle -mavxvnniint8.
> > * common/config/i386/i386-cpuinfo.h (enum processor_features):
> > Add FEATURE_AVXVNNIINT8.
> > * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> > avxvnniint8.
> > * config.gcc: Add avxvnniint8intrin.h.
> > * config/i386/avxvnniint8intrin.h: New file.
> > * config/i386/cpuid.h (bit_AVXVNNIINT8): New.
> > * config/i386/i386-builtin.def: Add new builtins.
> > * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> > __AVXVNNIINT8__.
> > * config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
> > (ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
> > * config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8) New..
> > * config/i386/i386.opt: Add option -mavxvnniint8.
> > * config/i386/immintrin.h: Include avxvnniint8intrin.h.
> > * config/i386/sse.md
> > (vpdp_): New define_insn.
> > * doc/extend.texi: Document avxvnniint8.
> > * doc/invoke.texi: Document -mavxvnniint8.
> > * doc/sourcebuild.texi: Document target avxvnniint8.
> >
> > gcc/testsuite/ChangeLog
> >
> > * g++.dg/other/i386-2.C: Add -mavxvnniint8.
> > * g++.dg/other/i386-3.C: Ditto.
> > * gcc.target/i386/avx-check.h: Add avxvnniint8 check.
> > * gcc.target/i386/sse-12.c: Add -mavxvnniint8.
> > * gcc.target/i386/sse-13.c: Ditto.
> > * gcc.target/i386/sse-14.c: Ditto.
> > * gcc.target/i386/sse-22.c: Ditto.
> > * gcc.target/i386/sse-23.c: Ditto.
> > * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > * lib/target-supports.exp
> > (check_effective_target_avxvnniint8): New.
> > * gcc.target/i386/avxvnniint8-1.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
> >
> > Co-authored-by: Hongyu Wang 
> > Co-authored-by: Haochen Jiang 
> > ---
> >  gcc/common/config/i386/cpuinfo.h  |   2 +
> >  gcc/common/config/i386/i386-common.cc |  22 ++-
> >  gcc/common/config/i386/i386-cpuinfo.h |   1 +
> >  gcc/common/config/i386/i386-isas.h|   2 +
> >  gcc/config.gcc|   2 +-
> >  gcc/config/i386/avxvnniint8intrin.h   | 138 ++
> >  gcc/config/i386/cpuid.h   |   1 +
> >  gcc/config/i386/i386-builtin.def  |  14 ++
> >  gcc/config/i386/i386-c.cc |   2 +
> >  gcc/config/i386/i386-isa.def  |   1 +
> >  gcc/config/i386/i386-options.cc   |   4 +-
> >  gcc/config/i386/i386.opt  |   5 +
> >  gcc/config/i386/immintrin.h   |   2 +
> >  gcc/config/i386/sse.md|  31 
> >  gcc/doc/extend.texi   |   5 +
> >  gcc/doc/invoke.texi   |   9 +-
> >  gcc/doc/sourcebuild.texi  |   3 +
> >  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
> >  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
> >  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
> >  gcc/testsuite/gcc.target/i386/avxvnniint8-1.c |  43 ++
> > .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c |  72 +
> >  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
> >  gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
> >  gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
> >  gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
> >  

RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-17 Thread Jiang, Haochen via Gcc-patches
Hi Rozenfeld,

I just checked out your commit and the test still fails.

It reports the following:
xgcc: error: /export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oacc-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c: '-fcompare-debug' failure (length)

I also fixed a typo from sending the report manually; the commands to reproduce should be as follows.

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"

BRs,
Haochen

From: Jiang, Haochen
Sent: Monday, October 17, 2022 1:41 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

If that has been fixed, just ignore that mail.

The result was produced by a script a few days ago. However, the sendmail
service was down on that machine and I only just noticed the issue, so I sent the
result manually today in case it is not fixed.

Sorry for the disturbance!

BRs,
Haochen

From: Eugene Rozenfeld <eugene.rozenf...@microsoft.com>
Sent: Monday, October 17, 2022 1:23 PM
To: Jiang, Haochen <haochen.ji...@intel.com>; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen <haochen.ji...@intel.com>
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld 
<eugene.rozenf...@microsoft.com>; Jiang, Haochen <haochen.ji...@intel.com>; 
gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"



RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches
If that has been fixed, just ignore that mail.

The result was produced by a script a few days ago. However, the sendmail
service was down on that machine and I only just noticed the issue, so I sent the
result manually today in case it is not fixed.

Sorry for the disturbance!

BRs,
Haochen

From: Eugene Rozenfeld 
Sent: Monday, October 17, 2022 1:23 PM
To: Jiang, Haochen ; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen <haochen.ji...@intel.com>
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org; Eugene Rozenfeld 
<eugene.rozenf...@microsoft.com>; Jiang, Haochen <haochen.ji...@intel.com>; 
gcc-regress...@gcc.gnu.org
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"



[r13-3212 Regression] FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR. on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

b88adba751da635c6f0c353c5bc51bbe2ecf4c89 is the first bad commit
commit b88adba751da635c6f0c353c5bc51bbe2ecf4c89
Author: Liwei Xu liwei...@intel.com
Date:   Fri Sep 23 13:46:02 2022 +0800

Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

caused

FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR.

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3212/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64\ -march=cascadelake}'"



[r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld <ero...@microsoft.com>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"



RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-09-22 Thread Jiang, Haochen via Gcc-patches
Hi all,

I would like to backport this patch to the GCC 12 release branch. Machines whose
default GCC is 12.x always run newer kernels, so if the patch is not backported,
the AMX tests will always fail on them.

Ok for backport?

BRs,
Haochen
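
For readers following the backport request, the permission request that the
patch adds to amx-check.h can be sketched as a standalone C fragment. This is
only a sketch reconstructed from the quoted diff: the XFEATURE_* and
ARCH_*_XCOMP_PERM values are copied from the patch, while the __x86_64__ guard,
the _GNU_SOURCE define, the header choices, and the fallback stub for other
targets are assumptions added so that the fragment compiles on its own.

```c
#define _GNU_SOURCE /* assumption: makes syscall () visible in <unistd.h> */

#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>
#include <unistd.h>

/* Values copied from the quoted patch.  */
#define XFEATURE_XTILECFG       17
#define XFEATURE_XTILEDATA      18
#define XFEATURE_MASK_XTILECFG  (1UL << XFEATURE_XTILECFG)
#define XFEATURE_MASK_XTILEDATA (1UL << XFEATURE_XTILEDATA)
#define XFEATURE_MASK_XTILE \
  (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)

#define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_REQ_XCOMP_PERM 0x1023

/* Ask the kernel for permission to use AMX tile data, then read the
   permission bitmask back.  Returns 1 if permission was granted and
   0 otherwise (including when the kernel rejects the request).  */
static int
request_perm_xtile_data (void)
{
  unsigned long bitmask = 0;

  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)
      || syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
    return 0;

  return (bitmask & XFEATURE_MASK_XTILE) != 0;
}
#else
/* Assumed fallback: no arch_prctl interface, so report "not usable".  */
static int
request_perm_xtile_data (void)
{
  return 0;
}
#endif
```

A test would then run its AMX body only when request_perm_xtile_data ()
returns nonzero, which is what the patch does in main.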

> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, June 21, 2022 10:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Tue, Jun 21, 2022 at 9:41 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Tuesday, June 21, 2022 3:06 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > kernels
> > >
> > > On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen
> > > 
> > > wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Uros Bizjak 
> > > > > Sent: Monday, June 20, 2022 10:54 PM
> > > > > To: Jiang, Haochen 
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > 
> > > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > > > kernels
> > > > >
> > > > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > > > 
> > > > > wrote:
> > > > > >
> > > > > > From: "Jiang, Haochen" 
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We need syscall to enable AMX for kernels>=5.4. It is missing
> > > > > > in current amx tests, which will cause test fail.
> > > > >
> > > > > So this new code is only valid for linux & co?
> > > >
> > > > Thanks for reminding me for that, I only test on linux since the
> > > > header file is
> > > only in linux.
> > > >
> > > > Just updated a patch wrapping with a macro not to change the
> > > > behavior on
> > > windows.
> > >
> > > I think you want __linux__ there, not __unix__.
> >
> > Fixed with __linux__.
> 
> OK.
> 
> Thanks,
> Uros.
> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > Uros.
> > >
> > > >
> > > > Regtested on x86_64-pc-linux-gnu.
> > > >
> > > > Thx,
> > > > Haochen
> > > > >
> > > > > Uros.
> > > > >
> > > > > >
> > > > > > This patch aims to add them to fix this bug.
> > > > > >
> > > > > > BRs,
> > > > > > Haochen
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > > > New function to check if AMX is usable and enable AMX.
> > > > > > (main): Run test if AMX is usable.
> > > > > > ---
> > > > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > > > +++
> > > > > >  1 file changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > index 434b0e59703..92ed8669304 100644
> > > > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > @@ -4,11 +4,22 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > > +#include 
> > > > > >  #ifdef DEBUG
> > > > > >  #include 
> > > > > >  #endif
> > > > > >  #include "cpuid.h"
> > > > > >
> > > > > > > +#define XFEATURE_XTILECFG 17
> > > > > > > +#define XFEATURE_XTILEDATA 18
> > > > > > > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > > > > > > +#define XFEATURE_MASK_XTILEDATA (1 << XFEATURE_XTILEDATA)
> > > > > > > +#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
> > > > > > > +
> > > > > > > +#define ARCH_GET_XCOMP_PERM 0x1022
> > > > > > > +#define ARCH_REQ_XCOMP_PERM 0x1023
> > > > > >  /* TODO: The tmm emulation is temporary for current
> > > > > > AMX implementation with no tmm regclass, should
> > > > > > be changed in the future. */ @@ -44,6 +55,18 @@ typedef
> > > > > > struct __tile
> > > > > >  /* Stride (colum width in byte) used for tileload/store */
> > > > > > #define _STRIDE 64
> > > > > >
> > > > > > +/* We need syscall to use amx functions */ int
> > > > > > +request_perm_xtile_data() {
> > > > > > +  unsigned long bitmask;
> > > > > > +
> > > > > > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> > > > > XFEATURE_XTILEDATA) ||
> > > > > > > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> > > > > > +return 0;
> > > > > > +
> > > > > > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > > > > > +
> > > > > >  /* Initialize tile config by setting all tmm size to 16x64 */
> > > > > > void init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7
> > > > > > @@ main () #ifdef AMX_BF16
> > > > > >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > > > > > +  && request_perm_xtile_data ()
> > > > > >)
> > > > > >  {
> > > > > >DO_TEST ();
> > > > > > --
> > > > > > 2.18.2
> > > > > >


RE: [PATCH][pushed] MAINTAINERS: fix alphabetic sorting

2022-07-04 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Martin Liška 
> Sent: Monday, July 4, 2022 6:17 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jiang, Haochen 
> Subject: [PATCH][pushed] MAINTAINERS: fix alphabetic sorting
> 
> ChangeLog:
> 
>   * MAINTAINERS: fix sorting of names
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f4a11cdc755..7d9aab76dd9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -463,8 +463,8 @@ Andreas Jaeger
>   
>  Harsha Jagasia   
>  Fariborz Jahanian
>  Surya Kumari Jangala 
> -Qian Jianhua 
>  Haochen Jiang
> 
> +Qian Jianhua 

Sorry for misordering g and h in the alphabet.

Maybe it is time to go back to kindergarten and review that. Thanks for
fixing!

Haochen

>  Janis Johnson
>   
>  Teresa Johnson
>   
>  Kean Johnston
> --
> 2.36.1



RE: [PATCH] i386: Extend cvtps2pd to memory

2022-07-03 Thread Jiang, Haochen via Gcc-patches
Hi all,

I revised my patch according to all your reviews.

Regtested on x86_64-pc-linux-gnu.

BRs,
Haochen

> -Original Message-
> From: Liu, Hongtao 
> Sent: Thursday, June 30, 2022 4:57 PM
> To: Uros Bizjak ; Jiang, Haochen
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] i386: Extend cvtps2pd to memory
> 
> 
> 
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Thursday, June 30, 2022 4:53 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> >
> > On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen
> 
> > wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Uros Bizjak 
> > > > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > > > To: Jiang, Haochen 
> > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > 
> > > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang
> > > > > > 
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This patch aims to fix the cvtps2pd insn, which should also
> > > > > > > work on memory operand but currently does not. After this fix,
> > > > > > > when loop == 2, it will eliminate movq instruction.
> > > > > > >
> > > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > > > >
> > > > > > > BRs,
> > > > > > > Haochen
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > PR target/43618
> > > > > > > * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > > > > (sse2_cvtps2pd_load): Rename
> extendvsdfv2df2.
> > >
> > > Rename FROM ...
> > >
> > > Please also mention change to sse2_cvtps2pd.
> > >
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > PR target/43618
> > > > > > > * gcc.target/i386/pr43618-1.c: New test.
> > > > > >
> > > > > > This patch could be as simple as:
> > > > > >
> > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > index 8cd0f617bf3..c331445cb2d 100644
> > > > > > --- a/gcc/config/i386/sse.md
> > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > @@ -9195,7 +9195,7 @@
> > > > > > (define_insn "extendv2sfv2df2"
> > > > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > > > >(float_extend:V2DF
> > > > > > - (match_operand:V2SF 1 "register_operand" "v")))]
> > > > > > + (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > > > >   "TARGET_MMX_WITH_SSE"
> > > > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > > > >   [(set_attr "type" "ssecvt")
> > > > >
> > > > > We also tested on this version, it is ok.
> > > > >
> > > > > The reason why the patch looks like this is because in the
> > > > > previous insn sse2_cvtps2pd, the constraint vm and
> > > > > vector_operand actually does not match the actual instruction.
> > > > > Memory operand is V2SF, not V4SF.
> > > > >
> > > > > Therefore, we changed the constraint in that insn. Then it caused
> another
> > issue.
> > > > > For memory operand, it seems that we cannot generate those mask
> > instructions.
> > > > > So I change the pattern to how extendv2hfv2df2 works.
> > > >
> > > > If you want to change the memory access in
> sse2_cvtps2pd,
> > > > then please see how e.g. v2hiv2di is handled in sse.md. In
> > > > addition to two instructions, you will need one
> > > > define_insn_and_split with a pre-reload splitter.
> > >
> > > Oh, nowadays combine does vec_select from a paradoxical subreg on its
> own.
> > >
> > > +(define_expand "extendv2sfv2df2"
> > > +  [(set (match_operand:V2DF 0 "register_operand")
> > > +(float_extend:V2DF
> > > +  (match_operand:V2SF 1 "nonimmediate_operand")))]
> > > +  "TARGET_MMX_WITH_SSE"
> > > +{
> > > +  if (!MEM_P (operands[1]))
> > > +{
> > >
> > > You will need force reg here:
> > >
> > > rtx op1 = force_reg (V2SFmode, operands[1]);
> > > +  operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode);
> > > +  emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
> > > +  DONE;
> > > +}
> > > +})
> > >
> > >
> > > -(define_insn "extendv2sfv2df2"
> > > +(define_insn "sse2_cvtps2pd_load"
> > >
> > > Please name this insn "*sse2_cvtps2pd_1". Please note
> the
> > > star at the beginning, You don't have to make the name public.
> > >
> > > OK with the above changes.
> >
> > Forgot to mention:
> >
> >
> > - (match_operand:V2SF 1 "register_operand" "v")))]
> > -  "TARGET_MMX_WITH_SSE"
> > -  "%vcvtps2pd\t{%1, %0|%0, %1}"
> > + (match_operand:V2SF 1 "memory_operand" "m")))]
> > + "TARGET_MMX_WITH_SSE && "
> > +  "%vcvtps2pd\t{%1, %0|%0 > and2>, %q1}"
> >[(set_attr "type" "ssecvt")
> >
> > The new insn does not need to be limited to TARGET_MMX_WITH_SSE, so
> we
> > can use 

RE: [PATCH] i386: Extend cvtps2pd to memory

2022-06-30 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, June 30, 2022 2:20 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> 
> On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > This patch aims to fix the cvtps2pd insn, which should also work on
> > memory operand but currently does not. After this fix, when loop == 2,
> > it will eliminate movq instruction.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > PR target/43618
> > * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > (sse2_cvtps2pd_load): Rename extendvsdfv2df2.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/43618
> > * gcc.target/i386/pr43618-1.c: New test.
> 
> This patch could be as simple as:
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> 8cd0f617bf3..c331445cb2d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -9195,7 +9195,7 @@
> (define_insn "extendv2sfv2df2"
>   [(set (match_operand:V2DF 0 "register_operand" "=v")
>(float_extend:V2DF
> - (match_operand:V2SF 1 "register_operand" "v")))]
> + (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
>   "TARGET_MMX_WITH_SSE"
>   "%vcvtps2pd\t{%1, %0|%0, %1}"
>   [(set_attr "type" "ssecvt")

We also tested this version and it works.

The patch looks the way it does because, in the existing insn
sse2_cvtps2pd, the constraint vm and the predicate vector_operand
do not match the actual instruction: the memory operand is V2SF,
not V4SF.

Therefore, we changed the constraint in that insn, which caused another issue:
for a memory operand, it seems that we cannot generate the mask instructions.
So I changed the pattern to follow how extendv2hfv2df2 works.

Haochen
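
As an illustration of the kind of source this discussion is about, a
two-element float-to-double widening loop might look like the sketch below.
The function name widen2 is hypothetical and this is not the pr43618-1.c
testcase; whether a given GCC emits cvtps2pd with a memory operand for it
depends on the compiler version and options.

```c
/* Widen two floats loaded from memory to two doubles.  With the source
   operand in memory, the conversion does not need a separate load into
   a register first, provided the insn pattern accepts a memory operand.  */
static void
widen2 (double *dst, const float *src)
{
  for (int i = 0; i < 2; i++)
    dst[i] = src[i];
}
```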

> Uros.


RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-06-21 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, June 21, 2022 3:06 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Monday, June 20, 2022 10:54 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > kernels
> > >
> > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > 
> > > wrote:
> > > >
> > > > From: "Jiang, Haochen" 
> > > >
> > > > Hi all,
> > > >
> > > > We need syscall to enable AMX for kernels>=5.4. It is missing in
> > > > current amx tests, which will cause test fail.
> > >
> > > So this new code is only valid for linux & co?
> >
> > Thanks for reminding me for that, I only test on linux since the header 
> > file is
> only in linux.
> >
> > Just updated a patch wrapping with a macro not to change the behavior on
> windows.
> 
> I think you want __linux__ there, not __unix__.

Fixed with __linux__.

Thx,
Haochen
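
The reason for the macro choice can be sketched as follows. On Linux, GCC
predefines both __linux__ and __unix__, but __unix__ is also predefined on
other Unix-like targets (the BSDs, for instance), where the Linux-only
arch_prctl interface is not available. The helper name platform_guard is
hypothetical and exists only to show which branch the preprocessor selects.

```c
/* Return a label for the branch a Linux-only code path should key on.
   Testing __linux__ first keeps Linux-specific syscalls out of builds
   for other Unix-like systems that only define __unix__.  */
static const char *
platform_guard (void)
{
#if defined(__linux__)
  return "linux";
#elif defined(__unix__)
  return "unix-not-linux";
#else
  return "other";
#endif
}
```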

> 
> Uros.
> 
> >
> > Regtested on x86_64-pc-linux-gnu.
> >
> > Thx,
> > Haochen
> > >
> > > Uros.
> > >
> > > >
> > > > This patch aims to add them to fix this bug.
> > > >
> > > > BRs,
> > > > Haochen
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > New function to check if AMX is usable and enable AMX.
> > > > (main): Run test if AMX is usable.
> > > > ---
> > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > +++
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > index 434b0e59703..92ed8669304 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > @@ -4,11 +4,22 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > > +#include 
> > > >  #ifdef DEBUG
> > > >  #include 
> > > >  #endif
> > > >  #include "cpuid.h"
> > > >
> > > > +#define XFEATURE_XTILECFG 17
> > > > +#define XFEATURE_XTILEDATA 18
> > > > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > > > +#define XFEATURE_MASK_XTILEDATA (1 << XFEATURE_XTILEDATA)
> > > > +#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
> > > > +
> > > > +#define ARCH_GET_XCOMP_PERM 0x1022
> > > > +#define ARCH_REQ_XCOMP_PERM 0x1023
> > > >  /* TODO: The tmm emulation is temporary for current
> > > > AMX implementation with no tmm regclass, should
> > > > be changed in the future. */
> > > > @@ -44,6 +55,18 @@ typedef struct __tile
> > > >  /* Stride (colum width in byte) used for tileload/store */
> > > > #define _STRIDE 64
> > > >
> > > > +/* We need syscall to use amx functions */ int
> > > > +request_perm_xtile_data() {
> > > > +  unsigned long bitmask;
> > > > +
> > > > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> > > XFEATURE_XTILEDATA) ||
> > > > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> > > > +return 0;
> > > > +
> > > > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > > > +
> > > >  /* Initialize tile config by setting all tmm size to 16x64 */
> > > > void init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7 @@
> > > > main () #ifdef AMX_BF16
> > > >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > > > +  && request_perm_xtile_data ()
> > > >)
> > > >  {
> > > >DO_TEST ();
> > > > --
> > > > 2.18.2
> > > >


0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch
Description: 0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch


RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-06-20 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Uros Bizjak 
> Sent: Monday, June 20, 2022 10:54 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang 
> wrote:
> >
> > From: "Jiang, Haochen" 
> >
> > Hi all,
> >
> > We need syscall to enable AMX for kernels>=5.4. It is missing in
> > current amx tests, which will cause test fail.
> 
> So this new code is only valid for linux & co?

Thanks for the reminder. I only tested on Linux, since the header file only
exists on Linux.

I just updated the patch to wrap the new code in a macro so that the behavior
on Windows does not change.

Regtested on x86_64-pc-linux-gnu.

Thx,
Haochen
> 
> Uros.
> 
> >
> > This patch aims to add them to fix this bug.
> >
> > BRs,
> > Haochen
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > New function to check if AMX is usable and enable AMX.
> > (main): Run test if AMX is usable.
> > ---
> >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > +++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > index 434b0e59703..92ed8669304 100644
> > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > @@ -4,11 +4,22 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  #ifdef DEBUG
> >  #include 
> >  #endif
> >  #include "cpuid.h"
> >
> > +#define XFEATURE_XTILECFG 17
> > +#define XFEATURE_XTILEDATA 18
> > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > +#define XFEATURE_MASK_XTILEDATA (1 << XFEATURE_XTILEDATA)
> > +#define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILECFG | XFEATURE_MASK_XTILEDATA)
> > +
> > +#define ARCH_GET_XCOMP_PERM 0x1022
> > +#define ARCH_REQ_XCOMP_PERM 0x1023
> >  /* TODO: The tmm emulation is temporary for current
> > AMX implementation with no tmm regclass, should
> > be changed in the future. */
> > @@ -44,6 +55,18 @@ typedef struct __tile
> >  /* Stride (colum width in byte) used for tileload/store */  #define
> > _STRIDE 64
> >
> > +/* We need syscall to use amx functions */ int
> > +request_perm_xtile_data() {
> > +  unsigned long bitmask;
> > +
> > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> XFEATURE_XTILEDATA) ||
> > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, &bitmask))
> > +return 0;
> > +
> > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > +
> >  /* Initialize tile config by setting all tmm size to 16x64 */  void
> > init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7 @@ main ()
> > #ifdef AMX_BF16
> >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > +  && request_perm_xtile_data ()
> >)
> >  {
> >DO_TEST ();
> > --
> > 2.18.2
> >


0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch
Description: 0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch


RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-12 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, May 12, 2022 5:12 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Hongyu
> Wang 
> Subject: Re: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> 
> On Thu, May 12, 2022 at 5:01 AM Jiang, Haochen 
> wrote:
> >
> > Hi all,
> >
> > I just refined this patch with more explanation in commit message.
> 
> The ChangeLog entry should in fact read as:
> 
> PR target/104371
> * config/i386/sse.md (vi1avx2const): New define_mode_attr.
> (pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest splitter):
> New define_split pattern.

Fixed ChangeLog with this.

Thx,
Haochen

> 
> Please see  [1].
> 
> [1] https://www.gnu.org/prep/standards/html_node/Change-
> Logs.html#Change-Logs
> 
> OK with the fixed ChangeLog.
> 
> Uros.
> 
> > No code change compare to last change, which removed ix86_match_ccmode.
> >
> > Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > > -Original Message-
> > > From: Jiang, Haochen
> > > Sent: Saturday, May 7, 2022 9:55 AM
> > > To: Uros Bizjak 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: RE: [PATCH] [i386]Add combine splitter to transform
> > > pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Uros Bizjak 
> > > > Sent: Friday, May 6, 2022 4:59 PM
> > > > To: Jiang, Haochen 
> > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > > Subject: Re: [PATCH] [i386]Add combine splitter to transform
> > > > pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > > >
> > > > On Fri, May 6, 2022 at 10:01 AM Haochen Jiang
> > > > 
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch aims to add a combine splitter to transform
> > > pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > > > >
> > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > >
> > > > > BRs,
> > > > > Haochen
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/104371
> > > > > * config/i386/sse.md: Add new define_mode_attr and 
> > > > > define_split.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR target/104371
> > > > > * gcc.target/i386/pr104371-1.c: New test.
> > > > > * gcc.target/i386/pr104371-2.c: Ditto.
> > > > > ---
> > > > >  gcc/config/i386/sse.md | 19 +++
> > > > >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> > > > > gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> > > > >  3 files changed, 47 insertions(+)  create mode 100644
> > > > > gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > > >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> > > > >
> > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > index 7b791def542..71afda73c8f 100644
> > > > > --- a/gcc/config/i386/sse.md
> > > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -20083,6 +20083,25 @@
> > > > > (set_attr "prefix" "maybe_vex")
> > > > > (set_attr "mode" "SI")])
> > > > >
> > > > > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > > > > +(define_mode_attr vi1avx2const
> > > > > +  [(V32QI "0x") (V16QI "0x")])
> > > > > +
> > > > > +(define_split
> > > > > +  [(set (reg:CCZ FLAGS_REG)
> > > > > +   (compare:CCZ (unspec:SI
> > > > > +   [(eq:VI1_AVX2
> > > > > +   (match_operand:VI1_AVX2 0 
> > > > > "vector_operand")
> > > > > +   (match_operand:VI1_AVX2 1 
> > > > > "const0_operand"))]
> > > > > +   UNSPEC_MOVMSK)
> > > > > +(match_operand 2 "const_int_operand")))]
> > > > > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> > > >
> > > > No need to use ix86_match_ccmode here, the pattern is already
> > > > limited to CCZmode,
> > > >
> > > > Uros.
> > > >
> > >
> > > Removed this condition in my new patch, also make the testcase
> > > change according to Hongyu's review.
> > >
> > > Is the patch Ok for trunk?
> > >
> > > Haochen
> > >
> > > > > +  && (INTVAL (operands[2]) == (int) ())"
> > > > > +  [(set (reg:CC FLAGS_REG)
> > > > > +   (unspec:CC [(match_dup 0)
> > > > > +   (match_dup 0)]
> > > > > +  UNSPEC_PTEST))])
> > > > > +
> > > > >  (define_expand "sse2_maskmovdqu"
> > > > >[(set (match_operand:V16QI 0 "memory_operand")
> > > > > (unspec:V16QI [(match_operand:V16QI 1
> > > > > "register_operand") diff --git
> > > > > a/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > > > b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > > > new file mode 100644
> > > > > index 000..df7c0b074e3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > > > @@ -0,0 +1,14 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -msse4" } */
> > > > > +/* { dg-final { scan-assembler "ptest\[ \\t\]" } } */
> > > > > 

RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-11 Thread Jiang, Haochen via Gcc-patches
Hi all,

I have refined this patch with more explanation in the commit message.

There is no code change compared to the last version, which removed ix86_match_ccmode.

Ok for trunk?

BRs,
Haochen
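
The equivalence the splitter relies on can be checked with a small C sketch.
For a 16-byte vector, comparing every byte with zero and testing the byte mask
against 0xffff gives the same answer as ptest of the vector with itself. The
helper names below and the runtime SSE4.1 guard are assumptions for
illustration; this is not the pr104371 testcase.

```c
#include <stdbool.h>

#if defined(__x86_64__)
#include <immintrin.h>

/* SSE2 form: compare each byte with zero and collect the 16 result bits.
   The mask is 0xffff only when all 16 bytes are zero.  */
static bool
is_zero_movemask (__m128i x)
{
  return _mm_movemask_epi8 (_mm_cmpeq_epi8 (x, _mm_setzero_si128 ()))
         == 0xffff;
}

/* SSE4.1 form: _mm_testz_si128 (x, x) returns 1 when (x & x) is all zero.  */
static __attribute__ ((target ("sse4.1"))) bool
is_zero_ptest (__m128i x)
{
  return _mm_testz_si128 (x, x) != 0;
}

/* Return 1 if both forms agree on an all-zero and a non-zero vector.
   The comparison is skipped (returning 1) when SSE4.1 is unavailable.  */
static int
forms_agree (void)
{
  __m128i zero = _mm_setzero_si128 ();
  __m128i one = _mm_set_epi32 (0, 0, 0, 1);

  if (!__builtin_cpu_supports ("sse4.1"))
    return 1;

  return is_zero_movemask (zero) == is_zero_ptest (zero)
         && is_zero_movemask (one) == is_zero_ptest (one);
}
#else
/* Assumed fallback for non-x86-64 targets: nothing to compare.  */
static int
forms_agree (void)
{
  return 1;
}
#endif
```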

> -Original Message-
> From: Jiang, Haochen
> Sent: Saturday, May 7, 2022 9:55 AM
> To: Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: RE: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> 
> 
> 
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Friday, May 6, 2022 4:59 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] [i386]Add combine splitter to transform
> > pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> >
> > On Fri, May 6, 2022 at 10:01 AM Haochen Jiang 
> > wrote:
> > >
> > > Hi all,
> > >
> > > This patch aims to add a combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > >
> > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > >
> > > BRs,
> > > Haochen
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/104371
> > > * config/i386/sse.md: Add new define_mode_attr and define_split.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/104371
> > > * gcc.target/i386/pr104371-1.c: New test.
> > > * gcc.target/i386/pr104371-2.c: Ditto.
> > > ---
> > >  gcc/config/i386/sse.md | 19 +++
> > >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> > > gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> > >  3 files changed, 47 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104371-1.c
> > >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > 7b791def542..71afda73c8f 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -20083,6 +20083,25 @@
> > > (set_attr "prefix" "maybe_vex")
> > > (set_attr "mode" "SI")])
> > >
> > > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > > +(define_mode_attr vi1avx2const
> > > +  [(V32QI "0x") (V16QI "0x")])
> > > +
> > > +(define_split
> > > +  [(set (reg:CCZ FLAGS_REG)
> > > +   (compare:CCZ (unspec:SI
> > > +   [(eq:VI1_AVX2
> > > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > > +   UNSPEC_MOVMSK)
> > > +(match_operand 2 "const_int_operand")))]
> > > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> >
> > No need to use ix86_match_ccmode here, the pattern is already limited to
> > CCZmode,
> >
> > Uros.
> >
> 
> Removed this condition in my new patch, also make the testcase change
> according to
> Hongyu's review.
> 
> Is the patch Ok for trunk?
> 
> Haochen
> 
> > > +  && (INTVAL (operands[2]) == (int) ())"
> > > +  [(set (reg:CC FLAGS_REG)
> > > +   (unspec:CC [(match_dup 0)
> > > +   (match_dup 0)]
> > > +  UNSPEC_PTEST))])
> > > +
> > >  (define_expand "sse2_maskmovdqu"
> > >[(set (match_operand:V16QI 0 "memory_operand")
> > > (unspec:V16QI [(match_operand:V16QI 1 "register_operand") diff
> > > --git a/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > new file mode 100644
> > > index 000..df7c0b074e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -msse4" } */
> > > +/* { dg-final { scan-assembler "ptest\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "pxor\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "pcmpeqb\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "pmovmskb\[ \\t\]" } } */
> > > +
> > > +#include <immintrin.h>
> > > +#include <stdbool.h>
> > > +
> > > +bool is_zero(__m128i x)
> > > +{
> > > +  return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > > b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > > new file mode 100755
> > > index 000..f0d0afd5897
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -mavx2" } */
> > > +/* { dg-final { scan-assembler "vptest\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "vpxor\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "vpcmpeqb\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "vpmovmskb\[ \\t\]" } } */
> > > +
> > > +#include <immintrin.h>
> > > +#include <stdbool.h>
> > > +
> > > +bool is_zero256(__m256i x)
> > > +{
> > > +  return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256())) == 0xffffffff;
> > > +}
> > > --
> > > 2.18.1
> > >



RE: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

2022-05-09 Thread Jiang, Haochen via Gcc-patches
That makes sense to me. Thanks!

> -----Original Message-----
> From: Uros Bizjak 
> Sent: Saturday, May 7, 2022 5:04 PM
> To: Jiang, Haochen 
> Cc: Hongyu Wang ; Liu, Hongtao
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports
> 
> On Sat, May 7, 2022 at 3:20 AM Jiang, Haochen 
> wrote:
> >
> > Hi Uros,
> >
> > I understand that we always keep the old testcases there. It is always safe
> > to do that.
> >
> > But I have another question, if we add something new in one of the
> > existing files in the future, should we use __builtin_cpu_supports to keep
> > the code clearer or stick to cpuids?
> 
> We should use __builtin_cpu_supports.
> 
> > I believe __builtin_cpu_supports will be a clearer way for a coder to
> > understand under current circumstance.
> > So if we use that in future use, why don't we change everything to the same
> > way?
> 
> Because we test the old and the new approach this way.
> 
> Uros.
> 
> > BRs,
> > Haochen
> >
> > -----Original Message-----
> > From: Uros Bizjak 
> > Sent: Friday, May 6, 2022 5:17 PM
> > To: Hongyu Wang 
> > Cc: Jiang, Haochen ; Liu, Hongtao
> > ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Reconstruct i386 testsuite with
> > __builtin_cpu_supports
> >
> > On Fri, May 6, 2022 at 11:00 AM Hongyu Wang 
> wrote:
> > >
> > > > I don't think *_os_support calls should be removed. IIRC,
> > > > __builtin_cpu_supports function checks if the feature is supported
> > > > by CPU, whereas *_os_supports calls check via xgetbv if OS
> > > > supports handling of new registers.
> > >
> > > avx_os_support is like
> > >
> > > avx_os_support (void)
> > > {
> > >   unsigned int eax, edx;
> > >   unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
> > >
> > >   __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
> > >
> > >   return (eax & (XSTATE_SSE | XSTATE_YMM)) == (XSTATE_SSE |
> > > XSTATE_YMM); }
> > >
> > > > While in get_available_features we have
> > >
> > > #define XCR_AVX_ENABLED_MASK \
> > >   (XSTATE_SSE | XSTATE_YMM)
> > >   if ((ecx & bit_OSXSAVE))
> > > {
> > >   /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
> > > ZMM16-ZMM31 states are supported by OSXSAVE.  */
> > >   unsigned int xcrlow;
> > >   unsigned int xcrhigh;
> > >   __asm__ (".byte 0x0f, 0x01, 0xd0" /* xgetbv  */
> > >: "=a" (xcrlow), "=d" (xcrhigh)
> > >: "c" (XCR_XFEATURE_ENABLED_MASK));
> > >   if ((xcrlow & XCR_AVX_ENABLED_MASK) == XCR_AVX_ENABLED_MASK) {
> > >   avx_usable = 1;
> > >
> > > So __builtin_cpu_supports already inherits same check
> >
> > Indeed, thanks for the explanation.
> >
> > OTOH, we don't change the existing tests (perhaps only dg- directives
> > when infrastructure improves), so I would leave the existing testcases
> > as they are. In future, new helper functions should be implemented
> > with __builtin_cpu_supports, but let's leave existing ones as they
> > are.
> >
> > Uros.
> >
> > > Uros Bizjak via Gcc-patches wrote on Fri, May 6, 2022 at 16:27:
> > > >
> > > > On Fri, May 6, 2022 at 9:57 AM Haochen Jiang 
> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Some check files in the i386 testsuite were written before the
> > > > > function __builtin_cpu_supports was introduced. All of them use
> > > > > __get_cpuid_count. This patch aims to reconstruct the i386 testsuite
> > > > > with __builtin_cpu_supports so that the code is much clearer.
> > > > >
> > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > >
> > > > I don't think *_os_support calls should be removed. IIRC,
> > > > __builtin_cpu_supports function checks if the feature is supported
> > > > by CPU, whereas *_os_supports calls check via xgetbv if OS
> > > > supports handling of new registers.
> > > >
> > > > Uros.
> > > >
> > > > >
> > > > > Also, when writing this patch, I found some files in the testsuite
> > > > > that might currently be useless. For example, the check in
> > > > > gcc/testsuite/gcc.target/i386/sse-os-support.h always returns 1. There
> > > > > are also some files that will no longer be included at all with this
> > > > > patch. Should we remove those files when we have time?
> > > > >
> > > > > BRs,
> > > > > Haochen
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/i386/adx-check.h: Change bit check to
> > > > > __builtin_cpu_supports.
> > > > > * gcc.target/i386/aes-avx-check.h: Ditto.
> > > > > * gcc.target/i386/aes-check.h: Ditto.
> > > > > * gcc.target/i386/avx-check.h: Ditto.
> > > > > * gcc.target/i386/avx2-check.h: Ditto.
> > > > > * gcc.target/i386/avx512-check.h: Ditto.
> > > > > * gcc.target/i386/bmi-check.h: Ditto.
> > > > > * gcc.target/i386/bmi2-check.h: Ditto.
> > > > > * gcc.target/i386/f16c-check.h: Ditto.
> > > > > * gcc.target/i386/fma-check.h: Ditto.
> > > > > * gcc.target/i386/fma4-check.h: Ditto.
> > > > > * gcc.target/i386/lzcnt-check.h: 

RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-06 Thread Jiang, Haochen via Gcc-patches


> -----Original Message-----
> From: Uros Bizjak 
> Sent: Friday, May 6, 2022 4:59 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> 
> On Fri, May 6, 2022 at 10:01 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > This patch aims to add a combine splitter to transform 
> > pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > PR target/104371
> > * config/i386/sse.md: Add new define_mode_attr and define_split.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/104371
> > * gcc.target/i386/pr104371-1.c: New test.
> > * gcc.target/i386/pr104371-2.c: Ditto.
> > ---
> >  gcc/config/i386/sse.md | 19 +++
> >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> > gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> >  3 files changed, 47 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104371-1.c
> >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 7b791def542..71afda73c8f 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -20083,6 +20083,25 @@
> > (set_attr "prefix" "maybe_vex")
> > (set_attr "mode" "SI")])
> >
> > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > +(define_mode_attr vi1avx2const
> > +  [(V32QI "0xffffffff") (V16QI "0xffff")])
> > +
> > +(define_split
> > +  [(set (reg:CCZ FLAGS_REG)
> > +   (compare:CCZ (unspec:SI
> > +   [(eq:VI1_AVX2
> > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > +   UNSPEC_MOVMSK)
> > +(match_operand 2 "const_int_operand")))]
> > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> 
> No need to use ix86_match_ccmode here, the pattern is already limited to
> CCZmode,
> 
> Uros.
> 

I removed this condition in my new patch and also changed the testcase
according to Hongyu's review.

Is the patch Ok for trunk?

Haochen

> > +  && (INTVAL (operands[2]) == (int) (<vi1avx2const>))"
> > +  [(set (reg:CC FLAGS_REG)
> > +   (unspec:CC [(match_dup 0)
> > +   (match_dup 0)]
> > +  UNSPEC_PTEST))])
> > +
> >  (define_expand "sse2_maskmovdqu"
> >[(set (match_operand:V16QI 0 "memory_operand")
> > (unspec:V16QI [(match_operand:V16QI 1 "register_operand") diff
> > --git a/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > new file mode 100644
> > index 000..df7c0b074e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse4" } */
> > +/* { dg-final { scan-assembler "ptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pmovmskb\[ \\t\]" } } */
> > +
> > +#include <immintrin.h>
> > +#include <stdbool.h>
> > +
> > +bool is_zero(__m128i x)
> > +{
> > +  return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > new file mode 100755
> > index 000..f0d0afd5897
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2" } */
> > +/* { dg-final { scan-assembler "vptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpmovmskb\[ \\t\]" } } */
> > +
> > +#include <immintrin.h>
> > +#include <stdbool.h>
> > +
> > +bool is_zero256(__m256i x)
> > +{
> > +  return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256())) == 0xffffffff;
> > +}
> > --
> > 2.18.1
> >


0001-i386-Add-combine-splitter-to-transform-pxor-pcmpeqb-.patch
Description: 0001-i386-Add-combine-splitter-to-transform-pxor-pcmpeqb-.patch
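
The equivalence the splitter relies on can be checked with a small scalar model. This is only a sketch under stated assumptions: the function names below are hypothetical and not part of the patch, and the byte loops merely imitate what pcmpeqb against zero, pmovmskb, and "ptest x, x" compute for a 16-byte vector.

```c
#include <stdint.h>

/* Scalar model of pcmpeqb against zero followed by pmovmskb:
   bit i of the result is set when byte i of x equals zero.  */
unsigned model_movemask_eq_zero (const uint8_t x[16])
{
  unsigned mask = 0;
  for (int i = 0; i < 16; i++)
    if (x[i] == 0)
      mask |= 1u << i;
  return mask;
}

/* Scalar model of the ZF result of "ptest x, x": ZF is set when
   (x AND x) has no bit set, i.e. when x is all zero.  */
int model_ptest_zf (const uint8_t x[16])
{
  uint8_t any = 0;
  for (int i = 0; i < 16; i++)
    any |= x[i];
  return any == 0;
}

/* Nonzero when the compare-with-0xffff form and the ptest form agree.  */
int model_forms_agree (const uint8_t x[16])
{
  return (model_movemask_eq_zero (x) == 0xffffu) == model_ptest_zf (x);
}
```

In this model the mask equals 0xffff exactly when every byte is zero, which is the same condition under which ptest of a register with itself sets ZF.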


RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-06 Thread Jiang, Haochen via Gcc-patches
> -----Original Message-----
> From: Hongyu Wang 
> Sent: Friday, May 6, 2022 4:50 PM
> To: Jiang, Haochen 
> Cc: GCC Patches ; Liu, Hongtao
> 
> Subject: Re: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> 
> > +(define_split
> > +  [(set (reg:CCZ FLAGS_REG)
> > +   (compare:CCZ (unspec:SI
> > +   [(eq:VI1_AVX2
> > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > +   UNSPEC_MOVMSK)
> > +(match_operand 2 "const_int_operand")))]
> > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> 
> It looks like set_src and set_dst are all CCZmode, do we really need
> ix86_match_ccmode?
> 
> > +  && (INTVAL (operands[2]) == (int) (<vi1avx2const>))"
> 
> I think (int) convert is not needed for const, and INTVAL actually
> returns HOST_WIDE_INT

The (int) conversion is needed here, because 0xffffffff has to become -1 in
this comparison.

Haochen.
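
A minimal sketch of why the cast matters, assuming the 256-bit mask constant is 0xffffffff and that a 32-bit constant is held sign-extended in a wider host integer. The helper model_intval_si is a hypothetical stand-in for that behaviour, not GCC's INTVAL itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for reading a 32-bit constant that is stored
   sign-extended in a 64-bit host integer.  */
int64_t model_intval_si (uint32_t bits)
{
  return (int64_t) (int32_t) bits;
}

/* Without the cast, 0xffffffff is an unsigned constant that converts to
   4294967295, which never equals the sign-extended value -1.  */
int matches_without_cast (uint32_t bits)
{
  return model_intval_si (bits) == 0xffffffff;
}

/* With the cast, the constant becomes -1 on the usual two's complement
   targets, so the all-ones mask matches.  */
int matches_with_cast (uint32_t bits)
{
  return model_intval_si (bits) == (int) 0xffffffff;
}
```

For the 128-bit case the constant 0xffff is positive either way, so the cast only changes the outcome for the all-ones 32-bit mask.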

> 
> > +#include <stdbool.h>
> > +
> > +bool is_zero(__m128i x)
> 
> bool is not necessary here, we can use int and drop stdbool.
> 
> Haochen Jiang via Gcc-patches wrote on Fri, May 6, 2022 at 16:01:
> >
> > Hi all,
> >
> > This patch aims to add a combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > PR target/104371
> > * config/i386/sse.md: Add new define_mode_attr and define_split.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/104371
> > * gcc.target/i386/pr104371-1.c: New test.
> > * gcc.target/i386/pr104371-2.c: Ditto.
> > ---
> >  gcc/config/i386/sse.md | 19 +++
> >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> >  gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> >  3 files changed, 47 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104371-1.c
> >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 7b791def542..71afda73c8f 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -20083,6 +20083,25 @@
> > (set_attr "prefix" "maybe_vex")
> > (set_attr "mode" "SI")])
> >
> > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.
> > +(define_mode_attr vi1avx2const
> > +  [(V32QI "0xffffffff") (V16QI "0xffff")])
> > +
> > +(define_split
> > +  [(set (reg:CCZ FLAGS_REG)
> > +   (compare:CCZ (unspec:SI
> > +   [(eq:VI1_AVX2
> > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > +   UNSPEC_MOVMSK)
> > +(match_operand 2 "const_int_operand")))]
> > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> > +  && (INTVAL (operands[2]) == (int) (<vi1avx2const>))"
> > +  [(set (reg:CC FLAGS_REG)
> > +   (unspec:CC [(match_dup 0)
> > +   (match_dup 0)]
> > +  UNSPEC_PTEST))])
> > +
> >  (define_expand "sse2_maskmovdqu"
> >[(set (match_operand:V16QI 0 "memory_operand")
> > (unspec:V16QI [(match_operand:V16QI 1 "register_operand")
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104371-1.c
> b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > new file mode 100644
> > index 000..df7c0b074e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse4" } */
> > +/* { dg-final { scan-assembler "ptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pmovmskb\[ \\t\]" } } */
> > +
> > +#include <immintrin.h>
> > +#include <stdbool.h>
> > +
> > +bool is_zero(__m128i x)
> > +{
> > +  return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104371-2.c
> b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > new file mode 100755
> > index 000..f0d0afd5897
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2" } */
> > +/* { dg-final { scan-assembler "vptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpmovmskb\[ \\t\]" } } */
> > +
> > +#include <immintrin.h>
> > +#include <stdbool.h>
> > +
> > +bool is_zero256(__m256i x)
> > +{
> > +  return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256())) == 0xffffffff;
> > +}
> > --
> > 2.18.1
> >


RE: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

2022-05-06 Thread Jiang, Haochen via Gcc-patches
Hi Uros,

I understand that we always keep the old testcases as they are; that is always
the safe choice.

But I have another question: if we add something new to one of the existing
files in the future, should we use __builtin_cpu_supports to keep the code
clearer, or stick to cpuid checks?

I believe __builtin_cpu_supports is the clearer way for a reader to understand
the code under the current circumstances.
So if we are going to use it in the future, why not change everything to the
same approach?

BRs,
Haochen 

-----Original Message-----
From: Uros Bizjak  
Sent: Friday, May 6, 2022 5:17 PM
To: Hongyu Wang 
Cc: Jiang, Haochen ; Liu, Hongtao 
; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

On Fri, May 6, 2022 at 11:00 AM Hongyu Wang  wrote:
>
> > I don't think *_os_support calls should be removed. IIRC,
> > __builtin_cpu_supports function checks if the feature is supported by
> > CPU, whereas *_os_supports calls check via xgetbv if OS supports
> > handling of new registers.
>
> avx_os_support is like
>
> avx_os_support (void)
> {
>   unsigned int eax, edx;
>   unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
>
>   __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
>
>   return (eax & (XSTATE_SSE | XSTATE_YMM)) == (XSTATE_SSE | XSTATE_YMM);
> }
>
> While in get_available_features we have
>
> #define XCR_AVX_ENABLED_MASK \
>   (XSTATE_SSE | XSTATE_YMM)
>   if ((ecx & bit_OSXSAVE))
> {
>   /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
> ZMM16-ZMM31 states are supported by OSXSAVE.  */
>   unsigned int xcrlow;
>   unsigned int xcrhigh;
>   __asm__ (".byte 0x0f, 0x01, 0xd0" /* xgetbv  */
>: "=a" (xcrlow), "=d" (xcrhigh)
>: "c" (XCR_XFEATURE_ENABLED_MASK));
>   if ((xcrlow & XCR_AVX_ENABLED_MASK) == XCR_AVX_ENABLED_MASK)
> {
>   avx_usable = 1;
>
> So __builtin_cpu_supports already inherits same check

Indeed, thanks for the explanation.

OTOH, we don't change the existing tests (perhaps only dg- directives
when infrastructure improves), so I would leave the existing testcases
as they are. In future, new helper functions should be implemented
with __builtin_cpu_supports, but let's leave existing ones as they
are.

Uros.
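
For reference, a new helper written in the recommended style could look roughly like the sketch below. The function name avx_usable is hypothetical and is not taken from the testsuite; the only built-ins used are __builtin_cpu_init and __builtin_cpu_supports, and the non-x86 fallback is an assumption added so the snippet compiles everywhere.

```c
/* Hypothetical helper: returns 1 when AVX can be used on this machine,
   0 otherwise (including on non-x86 hosts).  */
int avx_usable (void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  /* Make sure the CPU feature data is initialized before querying it.  */
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx") != 0;
#else
  return 0;
#endif
}
```

As explained above, the feature detection behind __builtin_cpu_supports already performs the xgetbv check, so a helper like this should not need a separate *_os_support call.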

> Uros Bizjak via Gcc-patches wrote on Fri, May 6, 2022 at 16:27:
> >
> > On Fri, May 6, 2022 at 9:57 AM Haochen Jiang  
> > wrote:
> > >
> > > Hi all,
> > >
> > > Some check files in the i386 testsuite were written before the
> > > function __builtin_cpu_supports was introduced. All of them use
> > > __get_cpuid_count. This patch aims to reconstruct the i386 testsuite with
> > > __builtin_cpu_supports so that the code is much clearer.
> > >
> > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > I don't think *_os_support calls should be removed. IIRC,
> > __builtin_cpu_supports function checks if the feature is supported by
> > CPU, whereas *_os_supports calls check via xgetbv if OS supports
> > handling of new registers.
> >
> > Uros.
> >
> > >
> > > Also, when writing this patch, I found some files in the testsuite that
> > > might currently be useless. For example, the check in
> > > gcc/testsuite/gcc.target/i386/sse-os-support.h always returns 1. There
> > > are also some files that will no longer be included at all with this
> > > patch. Should we remove those files when we have time?
> > >
> > > BRs,
> > > Haochen
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/adx-check.h: Change bit check to
> > > __builtin_cpu_supports.
> > > * gcc.target/i386/aes-avx-check.h: Ditto.
> > > * gcc.target/i386/aes-check.h: Ditto.
> > > * gcc.target/i386/avx-check.h: Ditto.
> > > * gcc.target/i386/avx2-check.h: Ditto.
> > > * gcc.target/i386/avx512-check.h: Ditto.
> > > * gcc.target/i386/bmi-check.h: Ditto.
> > > * gcc.target/i386/bmi2-check.h: Ditto.
> > > * gcc.target/i386/f16c-check.h: Ditto.
> > > * gcc.target/i386/fma-check.h: Ditto.
> > > * gcc.target/i386/fma4-check.h: Ditto.
> > > * gcc.target/i386/lzcnt-check.h: Ditto.
> > > * gcc.target/i386/mmx-3dnow-check.h: Ditto.
> > > * gcc.target/i386/mmx-check.h: Ditto.
> > > * gcc.target/i386/pclmul-avx-check.h: Ditto.
> > > * gcc.target/i386/pclmul-check.h: Ditto.
> > > * gcc.target/i386/rtm-check.h: Ditto.
> > > * gcc.target/i386/sha-check.h: Ditto.
> > > * gcc.target/i386/sse-check.h: Ditto.
> > > * gcc.target/i386/sse2-check.h: Ditto.
> > > * gcc.target/i386/sse3-check.h: Ditto.
> > > * gcc.target/i386/sse4_1-check.h: Ditto.
> > > * gcc.target/i386/sse4_2-check.h: Ditto.
> > > * gcc.target/i386/sse4a-check.h: Ditto.
> > > * gcc.target/i386/ssse3-check.h: Ditto.
> > > * gcc.target/i386/xop-check.h: Ditto.
> > > ---
> > >  gcc/testsuite/gcc.target/i386/adx-check.h | 10 +---
> > >  

RE: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask).

2022-01-12 Thread Jiang, Haochen via Gcc-patches
Hi Uros,

I have fixed that formatting issue in this new patch. OK for trunk?

Thx,
Haochen

-----Original Message-----
From: Uros Bizjak  
Sent: Thursday, January 13, 2022 3:22 AM
To: Jiang, Haochen 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & 
mask).

On Wed, Jan 12, 2022 at 9:11 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This patch targets PR94790 and changes the instruction selection in the
> following case.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?

Please also test with -m32, e.g.:

make -j 12 -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"

OK (with a nit below), if the new testcases do not FAIL with -m32.

Thanks,
Uros.

>
> BRs,
> Haochen
>
> From the perspective of the pipeline, the `andn + and + ior` version takes
> 2 cycles (AND and ANDN do not depend on each other), but xor + and + xor
> takes 3 cycles.
>
> -   xorl%edi, %esi
> andl%edx, %esi
> -   movl%esi, %eax
> -   xorl%edi, %eax
> +   andn%edi, %edx, %eax
> +   orl %esi, %eax
>
> gcc/ChangeLog:
>
> PR target/94790
> * config/i386/i386.md (*xor2andn): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> PR target/94790
> * gcc.target/i386/pr94790-1.c: New test.
> * gcc.target/i386/pr94790-2.c: Ditto.
> ---
>  gcc/config/i386/i386.md   | 39 +++
>  gcc/testsuite/gcc.target/i386/pr94790-1.c | 14   
> gcc/testsuite/gcc.target/i386/pr94790-2.c |  9 ++
>  3 files changed, 62 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-1.c
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
> 9b424a3935b..38efc6d5837 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10452,6 +10452,45 @@
> (set_attr "znver1_decode" "double")
> (set_attr "mode" "DI")])
>
> +;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
> +(define_insn_and_split "*xor2andn"
> +  [(set (match_operand:SWI248 0 "nonimmediate_operand")
> +   (xor:SWI248
> + (and:SWI248
> +   (xor:SWI248
> + (match_operand:SWI248 1 "nonimmediate_operand")
> + (match_operand:SWI248 2 "nonimmediate_operand"))
> +   (match_operand:SWI248 3 "nonimmediate_operand"))
> + (match_dup 1)))
> +(clobber (reg:CC FLAGS_REG))]
> +  "(TARGET_BMI || TARGET_AVX512BW)
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(parallel [(set (match_dup 4)
> +   (and:SWI248
> + (not:SWI248
> +   (match_dup 3))
> + (match_dup 1)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 5)
> +   (and:SWI248
> + (match_dup 2)
> + (match_dup 3)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 0)
> +   (ior:SWI248
> + (match_dup 4)
> + (match_dup 5)))
> + (clobber (reg:CC FLAGS_REG))])]
> +  {
> +operands[1] = force_reg (<MODE>mode, operands[1]);
> +operands[3] = force_reg (<MODE>mode, operands[3]);
> +operands[4] = gen_reg_rtx (<MODE>mode);
> +operands[5] = gen_reg_rtx (<MODE>mode);
> +  }
> +)

Please put the closing parenthesis right after the closing curly brace, i.e.
"})"; see the numerous examples in .md files.

> +
>  ;; See comment for addsi_1_zext why we do use nonimmediate_operand  
> (define_insn "*si_1_zext"
>[(set (match_operand:DI 0 "register_operand" "=r") diff --git 
> a/gcc/testsuite/gcc.target/i386/pr94790-1.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> new file mode 100755
> index 000..6ebbec15cfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-times "andn\[ \\t\]" 2 } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \\t\]" } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask);
> +}
> +
> +unsigned r2(unsigned a, unsigned b, unsigned mask) {
> +  return (~mask & a) | (b & mask);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr94790-2.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> new file mode 100755
> index 000..d7b0eec5bef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-not "andn\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-times "xorl\[ \\t\]" 2 } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask) + (a ^ b); }
> --
> 2.18.1
>


0001-i386-Optimize-a-a-b-mask-to-mask-a-b-mask.patch
Description: 0001-i386-Optimize-a-a-b-mask-to-mask-a-b-mask.patch
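
The rewrite discussed in this thread rests on a bitwise identity that is easy to check directly. The sketch below is illustrative only: the function names are hypothetical, and the exhaustive loop over 4-bit inputs is sufficient because every bit position is computed independently.

```c
/* Original form: two dependent xors around an and.  */
unsigned select_xor (unsigned a, unsigned b, unsigned mask)
{
  return a ^ ((a ^ b) & mask);
}

/* Rewritten form: the andn and the and are independent of each other.  */
unsigned select_andn (unsigned a, unsigned b, unsigned mask)
{
  return (~mask & a) | (b & mask);
}

/* Returns 1 when both forms agree for all 4-bit a, b and mask.  */
int identity_holds_4bit (void)
{
  for (unsigned a = 0; a < 16; a++)
    for (unsigned b = 0; b < 16; b++)
      for (unsigned mask = 0; mask < 16; mask++)
        if (select_xor (a, b, mask) != select_andn (a, b, mask))
          return 0;
  return 1;
}
```

Both forms select bits of b where mask is set and bits of a elsewhere; only the dependency chain between the instructions differs.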


RE: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask).

2022-01-12 Thread Jiang, Haochen via Gcc-patches
Hi Uros,

I have also tested with -m32. The new testcases do not fail.

Thx,
Haochen

-----Original Message-----
From: Uros Bizjak  
Sent: Thursday, January 13, 2022 3:22 AM
To: Jiang, Haochen 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & 
mask).

On Wed, Jan 12, 2022 at 9:11 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This patch targets PR94790 and changes the instruction selection in the
> following case.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?

Please also test with -m32, e.g.:

make -j 12 -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"

OK (with a nit below), if the new testcases do not FAIL with -m32.

Thanks,
Uros.

>
> BRs,
> Haochen
>
> From the perspective of the pipeline, the `andn + and + ior` version takes
> 2 cycles (AND and ANDN do not depend on each other), but xor + and + xor
> takes 3 cycles.
>
> -   xorl%edi, %esi
> andl%edx, %esi
> -   movl%esi, %eax
> -   xorl%edi, %eax
> +   andn%edi, %edx, %eax
> +   orl %esi, %eax
>
> gcc/ChangeLog:
>
> PR target/94790
> * config/i386/i386.md (*xor2andn): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> PR target/94790
> * gcc.target/i386/pr94790-1.c: New test.
> * gcc.target/i386/pr94790-2.c: Ditto.
> ---
>  gcc/config/i386/i386.md   | 39 +++
>  gcc/testsuite/gcc.target/i386/pr94790-1.c | 14   
> gcc/testsuite/gcc.target/i386/pr94790-2.c |  9 ++
>  3 files changed, 62 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-1.c
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
> 9b424a3935b..38efc6d5837 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10452,6 +10452,45 @@
> (set_attr "znver1_decode" "double")
> (set_attr "mode" "DI")])
>
> +;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
> +(define_insn_and_split "*xor2andn"
> +  [(set (match_operand:SWI248 0 "nonimmediate_operand")
> +   (xor:SWI248
> + (and:SWI248
> +   (xor:SWI248
> + (match_operand:SWI248 1 "nonimmediate_operand")
> + (match_operand:SWI248 2 "nonimmediate_operand"))
> +   (match_operand:SWI248 3 "nonimmediate_operand"))
> + (match_dup 1)))
> +(clobber (reg:CC FLAGS_REG))]
> +  "(TARGET_BMI || TARGET_AVX512BW)
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(parallel [(set (match_dup 4)
> +   (and:SWI248
> + (not:SWI248
> +   (match_dup 3))
> + (match_dup 1)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 5)
> +   (and:SWI248
> + (match_dup 2)
> + (match_dup 3)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 0)
> +   (ior:SWI248
> + (match_dup 4)
> + (match_dup 5)))
> + (clobber (reg:CC FLAGS_REG))])]
> +  {
> +operands[1] = force_reg (<MODE>mode, operands[1]);
> +operands[3] = force_reg (<MODE>mode, operands[3]);
> +operands[4] = gen_reg_rtx (<MODE>mode);
> +operands[5] = gen_reg_rtx (<MODE>mode);
> +  }
> +)

Please put the closing parenthesis right after the closing curly brace, i.e.
"})"; see the numerous examples in .md files.

> +
>  ;; See comment for addsi_1_zext why we do use nonimmediate_operand  
> (define_insn "*si_1_zext"
>[(set (match_operand:DI 0 "register_operand" "=r") diff --git 
> a/gcc/testsuite/gcc.target/i386/pr94790-1.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> new file mode 100755
> index 000..6ebbec15cfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-times "andn\[ \\t\]" 2 } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \\t\]" } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask);
> +}
> +
> +unsigned r2(unsigned a, unsigned b, unsigned mask) {
> +  return (~mask & a) | (b & mask);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr94790-2.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> new file mode 100755
> index 000..d7b0eec5bef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-not "andn\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-times "xorl\[ \\t\]" 2 } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask) + (a ^ b); }
> --
> 2.18.1
>


RE: [PATCH] [i386] Remove register restriction on operands for andnot insn

2022-01-09 Thread Jiang, Haochen via Gcc-patches
Hi Hongtao,

I have changed that message in this patch. Ok for trunk?

Thx,
Haochen

-----Original Message-----
From: Hongtao Liu  
Sent: Monday, January 10, 2022 3:25 PM
To: Jiang, Haochen 
Cc: GCC Patches ; Liu, Hongtao 
Subject: Re: [PATCH] [i386] Remove register restriction on operands for andnot 
insn

On Mon, Jan 10, 2022 at 2:23 PM Haochen Jiang via Gcc-patches 
 wrote:
>
> Hi all,
>
> This patch removes the register restriction on operands for andnot insn so 
> that it can be used from memory.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/53652
> * config/i386/sse.md (*andnot3): Remove register restriction.
It should be "Extend predicate of operands[1] from register_operand to
vector_operand".
Likewise for your commit message.
>
> gcc/testsuite/ChangeLog:
>
> PR target/53652
> * gcc.target/i386/pr53652-1.c: New test.
> ---
>  gcc/config/i386/sse.md|  2 +-
>  gcc/testsuite/gcc.target/i386/pr53652-1.c | 16 
>  2 files changed, 17 insertions(+), 1 deletion(-)  create mode 100644 
> gcc/testsuite/gcc.target/i386/pr53652-1.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
> 0997d9edf9d..4448b875d35 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16630,7 +16630,7 @@
>  (define_insn "*andnot3"
>[(set (match_operand:VI 0 "register_operand" "=x,x,v")
> (and:VI
> - (not:VI (match_operand:VI 1 "register_operand" "0,x,v"))
> + (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
>   (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
>"TARGET_SSE"
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/pr53652-1.c 
> b/gcc/testsuite/gcc.target/i386/pr53652-1.c
> new file mode 100644
> index 000..bd07ee29f4d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr53652-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +/* { dg-final { scan-assembler-times "pandn\[ \\t\]" 2 } } */
> +/* { dg-final { scan-assembler-not "vpternlogq\[ \\t\]" } } */
> +
> +typedef unsigned long long vec __attribute__((vector_size (16)));
> +vec g;
> +vec f1 (vec a, vec b)
> +{
> +  return ~a&b;
> +}
> +vec f2 (vec a, vec b)
> +{
> +  return ~g&b;
> +}
> +
> --
> 2.18.1
>


-- 
BR,
Hongtao


0001-i386-Extend-predicate-of-operands-1-from-register_op.patch
Description: 0001-i386-Extend-predicate-of-operands-1-from-register_op.patch
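
The source-level shape that this pattern is meant to cover can be written with GCC's generic vector extension. The sketch below is an assumption-laden illustration, not the testcase from the patch: the names andnot_reg, andnot_mem and g_vec are hypothetical, and nothing here asserts which instruction the compiler will pick.

```c
/* 128-bit vector of two 64-bit lanes, using the GCC vector extension.  */
typedef unsigned long long vec __attribute__ ((vector_size (16)));

/* Hypothetical global, so that the inverted operand comes from memory.  */
vec g_vec;

/* Inverted operand arrives in a register.  */
vec andnot_reg (vec a, vec b)
{
  return ~a & b;
}

/* Inverted operand is loaded from memory, the case the relaxed
   predicate is meant to allow.  */
vec andnot_mem (vec b)
{
  return ~g_vec & b;
}
```

Compiling such functions with -O2 -msse2 and inspecting the assembly is one way to see whether pandn is selected for both variants.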


RE: [PATCH] [i386]Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

2021-12-07 Thread Jiang, Haochen via Gcc-patches
Hi Uros,

I have fixed that in this patch attached for checking in. Is that ok for trunk?

Regtested on x86_64-pc-linux-gnu.

Thx,
Haochen

-----Original Message-----
From: Uros Bizjak  
Sent: Wednesday, December 8, 2021 12:14 AM
To: Jiang, Haochen 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] [i386]Add combine splitter to transform 
vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

On Tue, Dec 7, 2021 at 3:10 AM Haochen Jiang via Gcc-patches wrote:
>
> This patch adds a combine splitter to transform vpcmpeqd/vpxor/vblendvps
> to vblendvps for ~op0.
>
> OK for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/100738
> * config/i386/sse.md 
> (*_blendv_not_ltint):
> Add new define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100738
> * g++.target/i386/pr100738-1.C: New test.

OK with a change below.

Thanks,
Uros.

>
> ---
>  gcc/config/i386/sse.md | 28 ++
>  gcc/testsuite/g++.target/i386/pr100738-1.C | 19 +++
>  2 files changed, 47 insertions(+)
>  create mode 100755 gcc/testsuite/g++.target/i386/pr100738-1.C
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 08bdcddc111..db3506c78d7 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -20659,6 +20659,34 @@
> (set_attr "btver2_decode" "vector,vector,vector")
> (set_attr "mode" "")])
>
> +;; PR target/100738: Transform vpcmpeqd + vpxor + vblendvps to vblendvps for inverted mask;
> +(define_insn_and_split "*_blendv_not_ltint"
> +  [(set (match_operand: 0 "register_operand")
> +   (unspec:
> + [(match_operand: 1 "register_operand")
> +  (match_operand: 2 "vector_operand")
> +  (subreg:
> +(lt:VI48_AVX
> +  (subreg:VI48_AVX
> +  (not:
> +(match_operand: 3 "register_operand")) 0)
> +  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> + UNSPEC_BLENDV))]
> +  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +   (unspec:
> +[(match_dup 2) (match_dup 1) (match_dup 3)] UNSPEC_BLENDV))] 
> +{
> +  operands[0] = gen_lowpart (mode, operands[0]);
> +  operands[1] = gen_lowpart (mode, operands[1]);
> +  operands[2] = gen_lowpart (mode, operands[2]);
> +  operands[3] = gen_lowpart (mode, operands[3]);
> +  if (MEM_P (operands[2]))
> +operands[2] = force_reg (mode, operands[2]);

You don't need to check for MEM_P; force_reg will do it for you.

> +})
> +
>  (define_insn "_dp"
>[(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> (unspec:VF_128_256
> diff --git a/gcc/testsuite/g++.target/i386/pr100738-1.C b/gcc/testsuite/g++.target/i386/pr100738-1.C
> new file mode 100755
> index 000..5a04c5b031f
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr100738-1.C
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mavx2" } */
> +/* { dg-final {scan-assembler-times "vblendvps\[ \\t\]" 2 } } */
> +/* { dg-final {scan-assembler-not "vpcmpeqd\[ \\t\]" } } */
> +/* { dg-final {scan-assembler-not "vpxor\[ \\t\]" } } */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +typedef char v16qi __attribute__((vector_size(16)));
> +v4si
> +foo_1 (v16qi a, v4si b, v4si c, v4si d)
> +{
> +  return ((v4si)~a) < 0 ? c : d;
> +}
> +
> +v4si
> +foo_2 (v16qi a, v4si b, v4si c, v4si d)
> +{
> +  return ((v4si)~a) >= 0 ? c : d;
> +}
> --
> 2.18.1
>
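
The splitter quoted above replaces a blend whose mask is the complement of operand 3 with a blend on operand 3 itself, with operands 1 and 2 swapped (match_dup 2, match_dup 1). That works because complementing a value flips its sign bit, and a variable blend selects per lane on the mask's sign bit. The block below is a scalar sketch of that identity over four 32-bit lanes, not GCC code. The names LANES, blendv_model and inverted_mask_equals_swapped_operands are hypothetical, and the blend model is a simplification of the instruction.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Scalar model of a variable blend: per lane, pick b when the mask's
   sign bit is set, otherwise pick a. */
void blendv_model(const int32_t *a, const int32_t *b,
                  const int32_t *mask, int32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = (mask[i] < 0) ? b[i] : a[i];
}

/* Returns 1 if blending with the complemented mask gives the same lanes
   as blending with the original mask and swapped data operands. */
int inverted_mask_equals_swapped_operands(const int32_t *a,
                                          const int32_t *b,
                                          const int32_t *mask)
{
    int32_t not_mask[LANES], lhs[LANES], rhs[LANES];

    for (int i = 0; i < LANES; i++)
        not_mask[i] = ~mask[i];          /* flips the sign bit of each lane */

    blendv_model(a, b, not_mask, lhs);   /* blend on ~mask */
    blendv_model(b, a, mask, rhs);       /* blend on mask, operands swapped */

    for (int i = 0; i < LANES; i++)
        if (lhs[i] != rhs[i])
            return 0;
    return 1;
}
```

Under this model the complement never needs to be materialized, which is why the test expects no vpcmpeqd or vpxor in the generated assembly.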


0001-i386-Add-combine-splitter-to-transform-vpcmpeqd-vpxo.patch