from:"Jiang, Haochen"

RE: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-14 Thread Jiang, Haochen

Also cc Honza and Richard since we touched generic tune.

Thx,
Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Wednesday, May 15, 2024 11:04 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
> 
> Hi all,
> 
> Recently, we have encountered several random performance regressions in
> benchmarks commit to commit. It is caused by cross cacheline issue for tight
> loops.
> 
> We are trying to solve the issue by two patches. One is adjusting the loop
> alignment for generic tune, the other is aligning tight and hot loops more
> aggressively.
> 
> For SPECINT, we get a 0.85% improvement overall in rates, under option
> -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
> 
> BenchMarks  EMR Rates
> 500.perlbench_r -1.21%
> 502.gcc_r   0.78%
> 505.mcf_r   0.00%
> 520.omnetpp_r   0.41%
> 523.xalancbmk_r 1.33%
> 525.x264_r  2.83%
> 531.deepsjeng_r 1.11%
> 541.leela_r 0.00%
> 548.exchange2_r 2.36%
> 557.xz_r0.98%
> Geomean-int 0.85%
> 
> Side effect is that we get a 1.40% increase in codesize.
> 
> BenchMarks  EMR Codesize
> 500.perlbench_r 0.70%
> 502.gcc_r   0.67%
> 505.mcf_r   3.26%
> 520.omnetpp_r   0.31%
> 523.xalancbmk_r 1.15%
> 525.x264_r  1.11%
> 531.deepsjeng_r 1.40%
> 541.leela_r 1.31%
> 548.exchange2_r 3.06%
> 557.xz_r1.04%
> Geomean-int 1.40%
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> 
> After we committed into trunk for a month, if there isn't any unexpected
> happen. We planned to backport it to GCC14.2.
> 
> Thx,
> Haochen
> 
> Haochen Jiang (1):
>   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
> 
> liuhongt (1):
>   Align tight loop without considering max skipping bytes.
> 
>  gcc/config/i386/i386.cc  | 148 ++-
>  gcc/config/i386/i386.md  |  10 ++-
>  gcc/config/i386/x86-tune-costs.h |   2 +-
>  3 files changed, 154 insertions(+), 6 deletions(-)
> 
> --
> 2.31.1

RE: [r15-429 Regression] FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test for excess errors) on Linux/x86_64

2024-05-14 Thread Jiang, Haochen

Hi Matthias,

From my side, I get several error like this:

/export/users/haochenj/src/gcc-bisect/master/master/r15-429/bld/x86_64-linux/32/libstdc++-v3/include/experimental/bits/simd_builtin.h:131:
 error: could not convert 
'std::experimental::parallelism_v2::__vec_shuffle<__vector(4) wchar_t, 
__extract_part<2, 3, 2, wchar_t, 3>(_SimdWrapper)::, std::integer_sequence 
>(std::experimental::parallelism_v2::__as_vector<_SimdWrapper 
>(__x), (std::make_index_sequence<2>(), std::make_index_sequence<2>()), 
(std::experimental::parallelism_v2::__extract_part<2, 3, 
2, wchar_t, 3>(_SimdWrapper)::(), 
std::experimental::parallelism_v2::__extract_part<2, 3, 2, wchar_t, 
3>(_SimdWrapper)::()))' from 
'__vector(2) wchar_t' to 'std::conditional_t >' {aka 
'std::conditional >::type'}

See if this helps.

Thx,
Haochen

> -Original Message-
> From: Matthias Kretz 
> Sent: Tuesday, May 14, 2024 9:26 PM
> To: Jiang, Haochen 
> Cc: gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org
> Subject: Re: [r15-429 Regression] FAIL:
> experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test
> for excess errors) on Linux/x86_64
> 
> Thanks for the report. But I'm unable to reproduce the issue. I'm testing on a
> Skylake-AVX512 system. I even did a clean rebuild of all of GCC using your
> configuration (minus your prefix) and still no failure.
> 
> Could you please send me your libstdc++.log after failing the test?
> 
> Best,
>   Matthias
> 
> On Montag, 13. Mai 2024 18:55:13 MESZ haochen. jiang wrote:
> > On Linux/x86_64,
> >
> > fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9 is the first bad commit
> > commit fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9
> > Author: Matthias Kretz 
> > Date:   Mon May 6 12:13:55 2024 +0200
> >
> > libstdc++: Use __builtin_shufflevector for simd split and concat
> >
> > caused
> >
> > FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi
> > (test for excess errors)
> >
> > with GCC configured with
> >
> > ../../gcc/configure
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-429/usr
> > --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check
> >
> RUNTESTFLAGS="conformance.exp=experimental/simd/pr109261_constexpr_si
> md.cc
> > --target_board='unix{-m32}'"
> >
> > (Please do not reply to this email, for question about this report, contact
> > me at haochen dot jiang at intel.com.) (If you met problems with
> > cascadelake related, disabling AVX512F in command line might save that.)
> > (However, please make sure that there is no potential problems with
> > AVX512.)
> 
> 
> --
> ─
> ─
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
>  std::simd
> ─
> ─

RE: [PATCH] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Jiang, Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, April 26, 2024 5:13 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Fix array index overflow in pr105354-2.c
> 
> On Fri, Apr 26, 2024 at 11:03 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > The array index should not be over 8 for v8hi, or it will fail
> > under -O0 or using -fstack-protector.
> >
> > This patch aims to fix that, which is mentioned in PR110621.
> >
> > Commit as obvious and backport to GCC13.
> >
> > Thx,
> > Haochen
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/110621
> > * gcc.target/i386/pr105354-2.c: As mentioned.
> 
> Please note that the ChangeLog entry gets copied into the relevant
> ChangeLog file independently of the commit message. So, the above
> entry will be copied to gcc/testsuite/ChangeLog without any reference
> to what was mentioned.
>

I see. Forget to pay attention to that ChangeLog entry. My Bad.

Thx,
Haochen
 
> Uros.
>

RE: [PATCH] i386: Fix aes/vaes patterns [PR114576]

2024-04-08 Thread Jiang, Haochen

> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, April 8, 2024 9:43 PM
> To: Jiang, Haochen 
> Cc: Hongtao Liu ; gcc-patches@gcc.gnu.org; Liu, Hongtao
> ; ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]
> 
> On Mon, Apr 08, 2024 at 12:33:39PM +, Jiang, Haochen wrote:
> > Sorry for the late response since I am on vacation for now.
> >
> > > As the following testcase shows, the above change was incorrect.
> > >
> > > Using aes isa for the second alternative is obviously wrong, aes is 
> > > enabled
> > > whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> > > means that for -maes -mno-avx RA can choose, either it matches the first
> > > alternative with the dup operand, or it matches the second one (but that
> > > is of course wrong because vaesenc VEX encoded insn needs AES & AVX
> CPUID).
> >
> > When I wrote that patch, I suppose it will never match the second one when
> > AVX is not enabled because it will immediately drop to the first one so the
> > second one is automatically AES && AVX, which is tricky here.
> 
> Before the -mvaes changes the alternatives were noavx,avx isa and so clearly
> it was either the first alternative is the solely available, or the second,
> depending on TARGET_AVX.  But with noavx,aes on the first alternative is
> enabled only for !TARGET_AVX, but the second one whenever TARGET_AES, which
> is both if !TARGET_AVX and TARGET_AVX.  So, the RA is free to consider both
> alternatives, and because the first one is more restrictive (requires
> output matching input), if there is a match between those, it will use the
> first alternative, but if there isn't, it will happily use the second
> alternative.
> 

Aha, I see. Thanks for the explanation.

Thx,
Haochen

RE: [PATCH] i386: Fix aes/vaes patterns [PR114576]

2024-04-08 Thread Jiang, Haochen

Hi Jakub,

Sorry for the late response since I am on vacation for now.

> As the following testcase shows, the above change was incorrect.
> 
> Using aes isa for the second alternative is obviously wrong, aes is enabled
> whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> means that for -maes -mno-avx RA can choose, either it matches the first
> alternative with the dup operand, or it matches the second one (but that
> is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).

When I wrote that patch, I suppose it will never match the second one when
AVX is not enabled because it will immediately drop to the first one so the
second one is automatically AES && AVX, which is tricky here.

But this patch is buggy when "-maes -mavx512vl -mno-vaes" with %xmm16+ so
your change is needed, really appreciate that.

> 
> The big question is if "Since VAES should not imply AES" is the case or not.
> Looking around at what LLVM does on godbolt, seems since clang 6 which added
> -mvaes support -mvaes there implies -maes, but GCC treats those two
> independent.
> 
> Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
> imply -mno-vaes, then we should probably just revert the above patch and
> tweak common/config/i386/ to do the implications (+ add the testcase from
> this patch).

LLVM always had less restrictions on ISA under such circumstances, I would like 
to
stick to how SDM did when implementing that, which is a little conservative.

However, I am also ok with VAES implying AES if there is no real HW that has
VAES w/o AES to reduce complexity in this scenario.

Thx,
Haochen

RE: [PATCH] i386: Add AVX10.1 related macros

2024-01-11 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, January 11, 2024 4:19 PM
> To: Liu, Hongtao 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org;
> ubiz...@gmail.com; bur...@net-b.de; san...@codesourcery.com
> Subject: Re: [PATCH] i386: Add AVX10.1 related macros
> 
> On Thu, Jan 11, 2024 at 2:16 AM Liu, Hongtao 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, January 10, 2024 5:44 PM
> > > To: Liu, Hongtao 
> > > Cc: Jiang, Haochen ;
> > > gcc-patches@gcc.gnu.org; ubiz...@gmail.com; bur...@net-b.de;
> > > san...@codesourcery.com
> > > Subject: Re: [PATCH] i386: Add AVX10.1 related macros
> > >
> > > On Wed, Jan 10, 2024 at 9:01 AM Liu, Hongtao 
> > > wrote:
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Jiang, Haochen 
> > > > > Sent: Wednesday, January 10, 2024 3:35 PM
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > Cc: Liu, Hongtao ; ubiz...@gmail.com;
> > > > > burnus@net- b.de; san...@codesourcery.com
> > > > > Subject: [PATCH] i386: Add AVX10.1 related macros
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch aims to add AVX10.1 related macros for libgomp's request.
> > > > > The request comes following:
> > > > >
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.ht
> > > > > ml
> > > > >
> > > > > Ok for trunk?
> > > > >
> > > > > Thx,
> > > > > Haochen
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   PR target/113288
> > > > >   * config/i386/i386-c.cc (ix86_target_macros_internal):
> > > > >   Add __AVX10_1__, __AVX10_1_256__ and __AVX10_1_512__.
> > > > > ---
> > > > >  gcc/config/i386/i386-c.cc | 7 +++
> > > > >  1 file changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/gcc/config/i386/i386-c.cc
> > > > > b/gcc/config/i386/i386-c.cc index c3ae984670b..366b560158a
> > > > > 100644
> > > > > --- a/gcc/config/i386/i386-c.cc
> > > > > +++ b/gcc/config/i386/i386-c.cc
> > > > > @@ -735,6 +735,13 @@ ix86_target_macros_internal
> (HOST_WIDE_INT
> > > > > isa_flag,
> > > > >  def_or_undef (parse_in, "__EVEX512__");
> > > > >if (isa_flag2 & OPTION_MASK_ISA2_USER_MSR)
> > > > >  def_or_undef (parse_in, "__USER_MSR__");
> > > > > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_256)
> > > > > +{
> > > > > +  def_or_undef (parse_in, "__AVX10_1_256__");
> > > > > +  def_or_undef (parse_in, "__AVX10_1__");
> > > > I think this is not needed, others LGTM.
> > >
> > > So __AVX10_1_256__ and __AVX10_1_512__ are redundant with
> > > __AVX10_1__ and __EVEX512__, right?
> > No, I mean __AVX10_1__ is redundant of __AVX10_1_256__ since -
> mavx10.1 is just alias of -mavx10.1-256.
> > We want explicit __AVX10_1_256__ and __AVX10_1_512__ and don't want
> mix __EVEX512__ with AVX10(They are related in their internal
> implementation, but we don't want the user to control the vector length of
> avx10 with -mno-evex512, -mno-evex512 is supposed for the existing
> AVX512).

Let's keep both of them if we prefer __AVX10_1_256__ since I just found
that LLVM got macro __AVX10_1__.

https://github.com/llvm/llvm-project/pull/67278/files#diff-7435d50346a810555df89deb1f879b767ee985ace43fb3990de17fb23a47f004

in file clang/lib/Basic/Targets/X86.cpp L774-777.

Thx,
Haochen

> 
> Ah, that makes sense.
> 
> > > > > +}
> > > > > +  if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
> > > > > +def_or_undef (parse_in, "__AVX10_1_512__");
> > > > >if (TARGET_IAMCU)
> > > > >  {
> > > > >def_or_undef (parse_in, "__iamcu");
> > > > > --
> > > > > 2.31.1
> > > >

RE: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for x86_64 backend

2023-12-27 Thread Jiang, Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Thursday, December 21, 2023 4:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao ;
> ger...@pfeifer.com
> Subject: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for
> x86_64 backend
> 
> Hi all,
> 
> This is the v2 patch for the wwwdocs change regarding to review.
> 
> If there is no objection, I will push this change next Tuesday.

I will commit the doc change patch.

Thx,
Haochen

> 
> Changes is v2:
> 
>   - Remove RAO-INT from Grand Ridge
>   - Remove the mask register restriction for -mno-evex512
>   - Arrange the options alphabetically
>   - Other minor text change
> 
> Thx,
> Haochen
> 
> Messages in v1:
> 
> This patch will mention the following changes in wwwdocs for x86_64
> backend:
> 
>   - AVX10.1 support
>   - APX EGPR, PUSH2POP2, PPX and NDD support
>   - Xeon Phi ISAs deprecated
> 
> Also I adjust the words in x86_64 part for GCC 13.
> 
> ---
> Mention AVX10.1 support, APX support and Xeon Phi deprecate in GCC 14.
> Also adjust documentation in GCC 13.
> ---
>  htdocs/gcc-13/changes.html | 38 --
>  htdocs/gcc-14/changes.html | 27 ++-
>  2 files changed, 42 insertions(+), 23 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index
> d3bacc16..b4b1a39a 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -543,24 +543,28 @@ You may also want to check out our
>__bf16 type to x86 psABI. Users need to adjust their
>AVX512BF16-related source code when upgrading GCC12 to GCC13.
>
> -  New ISA extension support for Intel AVX-IFMA was added.
> -  AVX-IFMA intrinsics are available via the -mavxifma
> +  New ISA extension support for Intel AMX-COMPLEX was added.
> +  AMX-COMPLEX intrinsics are available via the
> + -mamx-complex
>compiler switch.
>
> -  New ISA extension support for Intel AVX-VNNI-INT8 was added.
> -  AVX-VNNI-INT8 intrinsics are available via the -
> mavxvnniint8
> +  New ISA extension support for Intel AMX-FP16 was added.
> +  AMX-FP16 intrinsics are available via the -mamx-fp16
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AVX-IFMA was added.
> +  AVX-IFMA intrinsics are available via the -mavxifma
>compiler switch.
>
>New ISA extension support for Intel AVX-NE-CONVERT was added.
>AVX-NE-CONVERT intrinsics are available via the
>-mavxneconvert compiler switch.
>
> -  New ISA extension support for Intel CMPccXADD was added.
> -  CMPccXADD intrinsics are available via the -mcmpccxadd
> +  New ISA extension support for Intel AVX-VNNI-INT8 was added.
> +  AVX-VNNI-INT8 intrinsics are available via the
> + -mavxvnniint8
>compiler switch.
>
> -  New ISA extension support for Intel AMX-FP16 was added.
> -  AMX-FP16 intrinsics are available via the -mamx-fp16
> +  New ISA extension support for Intel CMPccXADD was added.
> +  CMPccXADD intrinsics are available via the
> + -mcmpccxadd
>compiler switch.
>
>New ISA extension support for Intel PREFETCHI was added.
> @@ -571,10 +575,6 @@ You may also want to check out our
>RAO-INT intrinsics are available via the -mraoint
>compiler switch.
>
> -  New ISA extension support for Intel AMX-COMPLEX was added.
> -  AMX-COMPLEX intrinsics are available via the -mamx-
> complex
> -  compiler switch.
> -  
>GCC now supports the Intel CPU named Raptor Lake through
>  -march=raptorlake.
>  Raptor Lake is based on Alder Lake.
> @@ -585,13 +585,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Sierra Forest through
>  -march=sierraforest.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT,
> CMPccXADD,
> -ENQCMD and UINTR ISA extensions.
> +Based on ISA extensions enabled on Alder Lake, the switch further enables
> +the AVX-IFMA, AVX-NE-CONVERT, AVX-VNNI-INT8, CMPccXADD,
> ENQCMD and UINTR
> +ISA extensions.
>
>GCC now supports the Intel CPU named Grand Ridge through
>  -march=grandridge.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT,
> CMPccXADD,
> -ENQCMD, UINTR and RAO-INT ISA extensions.
> +Grand Ridge is based on Sierra Forest.
>
>GCC now supports the Intel CPU named Emerald Rapids through
>  -march=emeraldrapids.
> @@ -599,11 +599,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Granite Rapids through
>  -march=graniterapids.
> -The switch enables the AMX-FP16 and PREFETCHI ISA extensions.
> +Based on Sapphire Rapids, the switch further enables the AMX-FP16 and
> +PREFETCHI ISA extensions.
>
>GCC now supports the Intel CPU named Granite Rapids D through
>  -march=graniterapids-d.
> -The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA
>

RE: [r14-6770 Regression] FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors) on Linux/x86_64

2023-12-24 Thread Jiang, Haochen

It is not a target specific issue, it will fail if we enabled AVX.

e.g.:

$ /export/users/haochenj/env/build_no_bootstrap_master/gcc/xgcc 
-B/export/users/haochenj/env/build_no_bootstrap_master/gcc/  
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c  -m64 
-mavx   -fdiagnostics-plain-output   -std=gnu23 -S -o gnu23-tag-4.s
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c: In 
function ‘bar’:
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c:18:47: 
error: initialization of ‘struct g *’ from incompatible pointer type ‘struct g 
*’ [-Wincompatible-pointer-types]

Thx,
Haochen

> -Original Message-
> From: Martin Uecker 
> Sent: Friday, December 22, 2023 5:39 PM
> To: gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> ; Joseph Myers 
> Subject: Re: [r14-6770 Regression] FAIL: gcc.dg/gnu23-tag-4.c (test for excess
> errors) on Linux/x86_64
> 
> 
> Hm, this is weird, as it really seems to depend on the -march=  So if 
> there is
> really a difference between those structs which make them incompatible on
> some archs, we should not consider them to be compatible in general.
> 
> struct g { int a[n]; int b; } *y;
> { struct g { int a[4]; int b; } *y2 = y; }
> 
> But I do not see what could go wrong here as sizeof / alignment is the same 
> for
> n = 4.  So there is something else I missed
> 
> 
> 
> Am Freitag, dem 22.12.2023 um 05:07 +0800 schrieb haochen.jiang:
> > On Linux/x86_64,
> >
> > 23fee88f84873b0b8b41c8e5a9b229d533fb4022 is the first bad commit
> > commit 23fee88f84873b0b8b41c8e5a9b229d533fb4022
> > Author: Martin Uecker 
> > Date:   Tue Aug 15 14:58:32 2023 +0200
> >
> > c23: tag compatibility rules for struct and unions
> >
> > caused
> >
> > FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors)
> >
> > with GCC configured with
> >
> > ../../gcc/configure
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6770/
> > usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-
> tag-4.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-
> tag-4.c --target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report,
> > contact me at haochen dot jiang at intel.com.) (If you met problems
> > with cascadelake related, disabling AVX512F in command line might save
> that.) (However, please make sure that there is no potential problems with
> AVX512.)

RE: [gcc-wwwdocs PATCH] gcc-13/14: Mention recent update for x86_64 backend

2023-12-13 Thread Jiang, Haochen

> -Original Message-
> From: Gerald Pfeifer 
> Sent: Wednesday, December 13, 2023 2:20 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [gcc-wwwdocs PATCH] gcc-13/14: Mention recent update for
> x86_64 backend
> 
> On Fri, 8 Dec 2023, Haochen Jiang wrote:
> > +++ b/htdocs/gcc-13/changes.html
> 
> > +Based on ISA extensions enabled on Alder Lake, the switch further
> enables
> > +the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD,
> ENQCMD and UINTR
> > +ISA extensions.
> 
> Personally I would alphabetically sort all the options, like you have mostly
> done already. Just AVX-VNNI-INT8 and AVX-NE-CONVERT are not.
> 
> (Maybe you have a reason, and in any case this should not block this
> patch.)
> 
> 
> > +++ b/htdocs/gcc-14/changes.html
> > +  New ISA extension support for Intel AVX10.1 was added.
> > +  AVX10.1 intrinsics are available via the -mavx10.1 or
> > +  -mavx10.1-256 compiler switch with 256 bit vector size
> > +  support. 512 bit vector size support for AVX10.1 intrinsics are
> 
> We usually write 256-bit and 512-bit as adjectives, cf.
> gcc.gnu.org/codingconventions.html .
> 
> > +  Part of new feature support for Intel APX was added, including EGPR,
> > +  PUSH2POP2, PPX and NDD.
> 
> Alphabetically?
> 
> > APX features are available via the
> > +  -mapxf compiler switch.
> 
> Could we say "APX is enabled via..." or "APX support is available via..."?
> 
> > +  Xeon Phi CPUs support (a.k.a. Knight Landing and Knight Mill) are
> marked
> > +as deprecated. GCC will emit a warning when using the
> > +-mavx5124fmaps, -mavx5124vnniw,
> > +-mavx512er, -mavx512pf,
> > +-mprefetchwt1, -march=knl,
> > +-march=knm, -mtune=knl and -
> mtune=knm
> > +compiler switch. The support will be removed in GCC 15.
> > +  
> 
> I believe "or" instead of "and" will be clearer.
> 
> And "compiler switches" (plural).
> 
> And just "Support" in the last sentence.
> 
> 
> Thanks for submitting these! No need for further review before committing
> (a minor variation).

Thanks for your review. I will fix them and also other alphabetic issue in 
GCC13/14
doc. Since there will be a Grand Ridge ISA change coming soon in GCC13 (I have
not sent out the patch but will happen within one week), I will commit the whole
patch after that landed.

Thx,
Haochen

> 
> Gerald

RE: [RFC] Intel AVX10.1 Compiler Design and Support

2023-12-12 Thread Jiang, Haochen

> > On the other hand, a new EVEX-capable level might bring earlier adoption
> > of EVEX capabilities to AMD CPUs, which still should be an improvement
> > over AVX2.  This could benefit AMD as well.  So I would really like to
> > see some AMD feedback here.
> >
> > There's also the matter that time scales for EVEX adoption are so long
> > that by then, Intel CPUs may end up supporting and preferring 512 bit
> > vectors again.
> 
> True, there isn't even widespread VEX adoption yet ... and now there's
> APX as the next best thing to target.
> 
> That said, my main point was that x86-64-v4 is "broken" as it appears
> as a dead end - AVX512 is no more, the future is AVX10, but yet we have
> to define x86-64-v5 as something that includes x86-64-v4.
> 
> So, can we un-do x86-64-v4?

As far as I have heard, x86-64-v4 is rarely used. There should be a small
chance to un-do that and not to break too many things. But I am not sure.

Thx,
Haochen

> 
> Richard.
> 
> > Thanks,
> > Florian
> >

RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-12 Thread Jiang, Haochen

> -Original Message-
> From: Hongtao Liu 
> Sent: Tuesday, December 12, 2023 2:06 PM
> To: Jiang, Haochen 
> Cc: Andrew Pinski (QUIC) ; haochen.jiang
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> On Tue, Dec 12, 2023 at 1:47 PM Jiang, Haochen via Gcc-regression
>  wrote:
> >
> > > -----Original Message-
> > > From: Jiang, Haochen
> > > Sent: Tuesday, December 12, 2023 9:11 AM
> > > To: Andrew Pinski (QUIC) ; haochen.jiang
> > > ; gcc-regress...@gcc.gnu.org; gcc-
> > > patc...@gcc.gnu.org
> > > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > > assembler-times shrq 2 on Linux/x86_64
> > >
> > > > -Original Message-
> > > > From: Andrew Pinski (QUIC) 
> > > > Sent: Tuesday, December 12, 2023 9:01 AM
> > > > To: haochen.jiang ; Andrew Pinski
> (QUIC)
> > > > ; gcc-regress...@gcc.gnu.org; gcc-
> > > > patc...@gcc.gnu.org; Jiang, Haochen 
> > > > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> > > scan-
> > > > assembler-times shrq 2 on Linux/x86_64
> > > >
> > > > > -Original Message-
> > > > > From: haochen.jiang 
> > > > > Sent: Monday, December 11, 2023 4:54 PM
> > > > > To: Andrew Pinski (QUIC) ; gcc-
> > > > > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> > > haochen.ji...@intel.com
> > > > > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > > > > assembler-times shrq 2 on Linux/x86_64
> > > > >
> > > > > On Linux/x86_64,
> > > > >
> > > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
> > > commit
> > > > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > > > > Author: Andrew Pinski 
> > > > > Date:   Sat Nov 11 15:54:10 2023 -0800
> > > > >
> > > > > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one
> type
> > > are
> > > > > the same
> > > > >
> > > > > caused
> > > > >
> > > > > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> > > >
> > > >
> > > > So I think this is a testsuite issue, in that shrx instruction is being 
> > > > used
> here
> > > > instead of just ` shrq` due to that instruction being enabled with `-
> > > > march=cascadelake` .
> > > > Can someone confirm that and submit a testcase change?
> > >
> > > I will do that today.
> >
> > I suppose we might just need to change the scan-asm from shrq to shr to
> cover
> > shrx.
> Please use shr\[qx\], not shr.

I see. I will take that.

Thx,
Haochen

> >
> > Is that ok? If it is, I will commit a patch to change that.
> >
> > Thx,
> > Haochen
> >
> > >
> > > Thx,
> > > Haochen
> > >
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > >
> > > > > with GCC configured with
> > > > >
> > > > > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > > > > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-
> system-
> > > zlib -
> > > > > -with-demangler-in-ld --with-fpmath=sse --enable-
> > > languages=c,c++,fortran --
> > > > > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-
> bootstrap
> > > > >
> > > > > To reproduce:
> > > > >
> > > > > $ cd {build_dir}/gcc && make check
> > > > > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > > > > target_board='unix{-m64\ -march=cascadelake}'"
> > > > >
> > > > > (Please do not reply to this email, for question about this report,
> contact
> > > me at
> > > > > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > > > > related, disabling AVX512F in command line might save that.)
> (However,
> > > > > please make sure that there is no potential problems with AVX512.)
> 
> 
> 
> --
> BR,
> Hongtao

RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Jiang, Haochen

> -Original Message-
> From: Jiang, Haochen
> Sent: Tuesday, December 12, 2023 9:11 AM
> To: Andrew Pinski (QUIC) ; haochen.jiang
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org
> Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> > -Original Message-
> > From: Andrew Pinski (QUIC) 
> > Sent: Tuesday, December 12, 2023 9:01 AM
> > To: haochen.jiang ; Andrew Pinski (QUIC)
> > ; gcc-regress...@gcc.gnu.org; gcc-
> > patc...@gcc.gnu.org; Jiang, Haochen 
> > Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c
> scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > > -Original Message-
> > > From: haochen.jiang 
> > > Sent: Monday, December 11, 2023 4:54 PM
> > > To: Andrew Pinski (QUIC) ; gcc-
> > > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> haochen.ji...@intel.com
> > > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > > assembler-times shrq 2 on Linux/x86_64
> > >
> > > On Linux/x86_64,
> > >
> > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit
> commit
> > > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > > Author: Andrew Pinski 
> > > Date:   Sat Nov 11 15:54:10 2023 -0800
> > >
> > > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type
> are
> > > the same
> > >
> > > caused
> > >
> > > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> >
> >
> > So I think this is a testsuite issue, in that shrx instruction is being 
> > used here
> > instead of just ` shrq` due to that instruction being enabled with `-
> > march=cascadelake` .
> > Can someone confirm that and submit a testcase change?
> 
> I will do that today.

I suppose we might just need to change the scan-asm from shrq to shr to cover
shrx.

Is that ok? If it is, I will commit a patch to change that.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> >
> > Thanks,
> > Andrew
> >
> > >
> > > with GCC configured with
> > >
> > > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-
> zlib -
> > > -with-demangler-in-ld --with-fpmath=sse --enable-
> languages=c,c++,fortran --
> > > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> > >
> > > To reproduce:
> > >
> > > $ cd {build_dir}/gcc && make check
> > > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > > target_board='unix{-m64\ -march=cascadelake}'"
> > >
> > > (Please do not reply to this email, for question about this report, 
> > > contact
> me at
> > > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > > related, disabling AVX512F in command line might save that.) (However,
> > > please make sure that there is no potential problems with AVX512.)

RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2 on Linux/x86_64

2023-12-11 Thread Jiang, Haochen

> -Original Message-
> From: Andrew Pinski (QUIC) 
> Sent: Tuesday, December 12, 2023 9:01 AM
> To: haochen.jiang ; Andrew Pinski (QUIC)
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Jiang, Haochen 
> Subject: RE: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> assembler-times shrq 2 on Linux/x86_64
> 
> > -Original Message-
> > From: haochen.jiang 
> > Sent: Monday, December 11, 2023 4:54 PM
> > To: Andrew Pinski (QUIC) ; gcc-
> > regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; haochen.ji...@intel.com
> > Subject: [r14-6420 Regression] FAIL: gcc.target/i386/pr110790-2.c scan-
> > assembler-times shrq 2 on Linux/x86_64
> >
> > On Linux/x86_64,
> >
> > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa is the first bad commit commit
> > 85c5efcffed19ca6160eeecc2d4faebd9fee63aa
> > Author: Andrew Pinski 
> > Date:   Sat Nov 11 15:54:10 2023 -0800
> >
> > MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type 
> > are
> > the same
> >
> > caused
> >
> > FAIL: gcc.target/i386/pr110790-2.c scan-assembler-times shrq 2
> 
> 
> So I think this is a testsuite issue, in that shrx instruction is being used 
> here
> instead of just ` shrq` due to that instruction being enabled with `-
> march=cascadelake` .
> Can someone confirm that and submit a testcase change?

I will do that today.

Thx,
Haochen

> 
> Thanks,
> Andrew
> 
> >
> > with GCC configured with
> >
> > ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> > bisect/master/master/r14-6420/usr --enable-clocale=gnu --with-system-zlib -
> > -with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> > enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check
> > RUNTESTFLAGS="i386.exp=gcc.target/i386/pr110790-2.c --
> > target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report, contact 
> > me at
> > haochen dot jiang at intel.com.) (If you met problems with cascadelake
> > related, disabling AVX512F in command line might save that.) (However,
> > please make sure that there is no potential problems with AVX512.)

RE: [PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-12-01 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Friday, December 1, 2023 4:37 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> 
> On Fri, Dec 1, 2023 at 8:34 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, December 1, 2023 3:04 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > ubiz...@gmail.com
> > > Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> > >
> > > On Fri, Dec 1, 2023 at 3:22 AM Haochen Jiang 
> > > wrote:
> > > >
> > > > Since Knight Landing and Knight Mill microarchitectures are EOL, we
> > > > would like to remove its support in GCC 15. In GCC 14, we will first
> > > > emit a warning for the usage.
> > >
> > > I think it's better to keep supporting -mtune/arch=knl without diagnostics
> >
> > I see, it could be a choice and might be better. But if we take this, how 
> > should
> > we define -mtune=knl remains a question.
> 
> I'd say mapping it to a "close" micro-architecture makes most sense, but
> we could also simply keep the tuning entry for knl?

Actually I have written a removal test patch, one of the issue might be there is
something specific about knl in tuning for VZEROUPPER, which is also reflected 
in
PR82990.

/* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion
   before a transfer of control flow out of the function.  */
DEF_TUNE (X86_TUNE_EMIT_VZEROUPPER, "emit_vzeroupper", ~m_KNL)

If we chose to keep them, this behavior will be changed.

> 
> > > but simply not enable the ISAs we don't support.  The better question is
> > > what to do about KNL specific intrinsics headers / intrinsics?  Will we
> > > simply remove those?
> >
> > If there is no objection, The intrinsics are planned to be removed in GCC 
> > 15.
> > As far as concerned, almost nobody are using them with the latest GCC. And
> > there is no complaint when removing them in ICC/ICX.
> 
> I see.  Replacing the header contents with #error "XYZ is no longer supported"
> might be nicer.  OTOH x86intrin.h should simply no longer include them.

That is nicer. I will take that in GCC 15 patch.

Thx,
Haochen

> 
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > > Richard.
> > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/driver-i386.cc (host_detect_local_cpu):
> > > > Do not append "-mno-" for Xeon Phi ISAs.
> > > > * config/i386/i386-options.cc (ix86_option_override_internal):
> > > > Emit a warning for KNL/KNM targets.
> > > > * config/i386/i386.opt: Emit a warning for Xeon Phi ISAs.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * g++.dg/other/i386-2.C: Adjust testcases.
> > > > * g++.dg/other/i386-3.C: Ditto.
> > > > * g++.dg/pr80481.C: Ditto.
> > > > * gcc.dg/pr71279.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
> > > > * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
> > > > * gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2pd-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2pd-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2ps-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vexp2ps-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28pd-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28pd-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28ps-1.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28ps-2.c: Ditto.
> > > > * gcc.target/i386/avx512er-vrcp28ps-3

RE: [PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-11-30 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Friday, December 1, 2023 3:04 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] i386: Mark Xeon Phi ISAs as deprecated
> 
> On Fri, Dec 1, 2023 at 3:22 AM Haochen Jiang 
> wrote:
> >
> > Since Knight Landing and Knight Mill microarchitectures are EOL, we
> > would like to remove its support in GCC 15. In GCC 14, we will first
> > emit a warning for the usage.
> 
> I think it's better to keep supporting -mtune/arch=knl without diagnostics

I see, it could be a choice and might be better. But if we take this, how should
we define -mtune=knl remains a question.

> but simply not enable the ISAs we don't support.  The better question is
> what to do about KNL specific intrinsics headers / intrinsics?  Will we
> simply remove those?

If there is no objection, The intrinsics are planned to be removed in GCC 15.
As far as concerned, almost nobody are using them with the latest GCC. And
there is no complaint when removing them in ICC/ICX.

Thx,
Haochen

> 
> Richard.
> 
> > gcc/ChangeLog:
> >
> > * config/i386/driver-i386.cc (host_detect_local_cpu):
> > Do not append "-mno-" for Xeon Phi ISAs.
> > * config/i386/i386-options.cc (ix86_option_override_internal):
> > Emit a warning for KNL/KNM targets.
> > * config/i386/i386.opt: Emit a warning for Xeon Phi ISAs.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/other/i386-2.C: Adjust testcases.
> > * g++.dg/other/i386-3.C: Ditto.
> > * g++.dg/pr80481.C: Ditto.
> > * gcc.dg/pr71279.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
> > * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.
> > * gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2pd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2pd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2ps-1.c: Ditto.
> > * gcc.target/i386/avx512er-vexp2ps-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28pd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28pd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-3.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ps-4.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28sd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28sd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ss-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrcp28ss-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28pd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28pd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-3.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-4.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-5.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ps-6.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28sd-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28sd-2.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ss-1.c: Ditto.
> > * gcc.target/i386/avx512er-vrsqrt28ss-2.c: Ditto.
> > * gcc.target/i386/avx512f-gather-1.c: Ditto.
> > * gcc.target/i386/avx512f-gather-2.c: Ditto.
> > * gcc.target/i386/avx512f-gather-3.c: Ditto.
> > * gcc.target/i386/avx512f-gather-4.c: Ditto.
> > * gcc.target/i386/avx512f-gather-5.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherd512-1.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherd512-2.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherpd512-1.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherpd512-2.c: Ditto.
> > * gcc.target/i386/avx512f-i32gatherps512-1.c: Ditto.
> > * gcc.target/i386/avx512f-vect-perm

RE: [r14-5578 Regression] FAIL: gfortran.dg/gomp/pr27573.f90 -O (test for excess errors) on Linux/x86_64

2023-11-27 Thread Jiang, Haochen

> -Original Message-
> From: Sebastian Huber 
> Sent: Monday, November 27, 2023 3:58 PM
> To: haochen.jiang ; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> 
> Subject: Re: [r14-5578 Regression] FAIL: gfortran.dg/gomp/pr27573.f90 -O (test
> for excess errors) on Linux/x86_64
> 
> On 26.11.23 12:18, haochen.jiang wrote:
> > On Linux/x86_64,
> >
> > a350a74d6113e3a84943266eb691275951c109d9 is the first bad commit
> > commit a350a74d6113e3a84943266eb691275951c109d9
> > Author: Sebastian Huber
> > Date:   Sat Oct 21 15:52:15 2023 +0200
> >
> >  gcov: Add gen_counter_update()
> >
> > caused
> >
> > FAIL: gcc.dg/gomp/pr27573.c (internal compiler error: verify_gimple
> > failed)
> > FAIL: gcc.dg/gomp/pr27573.c (test for excess errors)
> > FAIL: gcc.dg/profile-update-warning.c (internal compiler error:
> > verify_gimple failed)
> > FAIL: gcc.dg/profile-update-warning.c (test for excess errors)
> > FAIL: gfortran.dg/gomp/pr27573.f90   -O  (internal compiler error: 
> > verify_gimple
> failed)
> > FAIL: gfortran.dg/gomp/pr27573.f90   -O  (test for excess errors)
> 
> The errors were fixed by:
> 
> commit 41aacdea55c5d795a7aa195357d966645845d00e
> Author: Sebastian Huber 
> Date:   Mon Nov 20 15:26:38 2023 +0100
> 
>  gcov: Fix integer types in gen_counter_update()
> 
> commit a034cca0a222598cb42302c059262b654685ff19
> Author: Sebastian Huber 
> Date:   Mon Nov 20 14:48:03 2023 +0100
> 
>  gcov: Use unshare_expr() in gen_counter_update()
> 

Hi Sebastian,

Thanks for your fix! This mail was automatically sent and delayed due to
the previous bootstrap fail on the trunk. If everything got fixed, that is
ok.

Thx,
Haochen

> --
> embedded brains GmbH & Co. KG
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
> 
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/

RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march support

2023-11-26 Thread Jiang, Haochen

> -Original Message-
> From: Gerald Pfeifer 
> Sent: Saturday, November 25, 2023 7:29 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march
> support
> 
> On Mon, 17 Jul 2023, Haochen Jiang via Gcc-patches wrote:
> >GCC now supports the Intel CPU named Granite Rapids through
> >  -march=graniterapids.
> > +The switch enables the AMX-FP16, PREFETCHI ISA extensions.
> 
> Do I understand correclty that it enables AMX-FP16 and PREFETCHI?
> 
> How about changing this to use "and", as in
>   "The switch enables the AMX-FP16, PREFETCHI ISA extensions."
> ?
> 
> Let me know, and I can make the change.
> 

Ok for me.

Thx,
Haochen

> Gerald

RE: [PATCH v2] gcov: Fix integer types in gen_counter_update()

2023-11-23 Thread Jiang, Haochen

> -Original Message-
> From: Sebastian Huber 
> Sent: Wednesday, November 22, 2023 10:24 PM
> To: Christophe Lyon 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v2] gcov: Fix integer types in gen_counter_update()
> 
> On 22.11.23 15:22, Christophe Lyon wrote:
> > On Tue, 21 Nov 2023 at 12:22, Sebastian Huber
> >   wrote:
> >> On 21.11.23 11:46, Jakub Jelinek wrote:
> >>> On Tue, Nov 21, 2023 at 11:42:06AM +0100, Sebastian Huber wrote:
>  On 21.11.23 11:34, Jakub Jelinek wrote:
> >> --- a/gcc/tree-profile.cc
> >> +++ b/gcc/tree-profile.cc
> >> @@ -281,10 +281,13 @@ gen_assign_counter_update
> (gimple_stmt_iterator *gsi, gcall *call, tree func,
> >>   if (result)
> >> {
> >>   tree result_type = TREE_TYPE (TREE_TYPE (func));
> >> -  tree tmp = make_temp_ssa_name (result_type, NULL, name);
> >> -  gimple_set_lhs (call, tmp);
> >> +  tree tmp1 = make_temp_ssa_name (result_type, NULL, name);
> >> +  gimple_set_lhs (call, tmp1);
> >>   gsi_insert_after (gsi, call, GSI_NEW_STMT);
> >> -  gassign *assign = gimple_build_assign (result, tmp);
> >> +  tree tmp2 = make_ssa_name (TREE_TYPE (result));
> >> +  gassign *assign = gimple_build_assign (tmp2, NOP_EXPR, tmp1);
> >> +  gsi_insert_after (gsi, assign, GSI_NEW_STMT);
> >> +  assign = gimple_build_assign (result, gimple_assign_lhs 
> >> (assign));
> > When you use a temporary tmp2 for the lhs of the conversion, you can
> just
> > use it here,
> >  assign = gimple_build_assign (result, tmp2);
> >
> > Ok for trunk with that change.
>  Just a question, could I also use
> 
>  tree tmp2 = make_temp_ssa_name (TREE_TYPE (result), NULL, name);
> 
>  ?
> 
>  This make_temp_ssa_name() is used throughout the file and the new
>  make_ssa_name() would be the first use in this file.
> >>> Yes.  The only difference is that it won't be _234 = (type) something;
> >>> but PROF_time_profile_234 = (type) something; in the dumps, but sure,
> >>> consistency is useful.
> >> Thanks for your help. I checked in an updated version.
> >>
> > Our CI bisected a regression to this commit:
> > Running gcc:gcc.dg/tree-prof/tree-prof.exp ...
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
> > "Read tp_first_run: 0" 1
> > FAIL: gcc.dg/tree-prof/time-profiler-3.c scan-ipa-dump-times profile
> > "Read tp_first_run: 2" 1
> >
> > (on aarch64)
> >
> > Can you check?
> 
> Yes, I will have a look at it.

The same issue also happened on i386. You can also reproduce that on
x86-64 platforms.

Thx,
Haochen

> 
> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
> 
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/

RE: [RFC] Intel AVX10.1 Compiler Design and Support

2023-11-13 Thread Jiang, Haochen

> > > > I wonder whether adoption could be made easier by also providing a
> > > > -mavx10[.0] level that removes some of the more obscure sub-ISA
> > > > requirements to cover more existing implementations (I'd not add 
> > > > -mavx10.0-512 here).
> > > > I'd require only skylake-AVX512 features here, basically all
> > > > non-KNL AVX512 CPUs should have a "virtual" AVX10 level that
> > > > allows to use that feature set,
> > >
> > > We have -mno-evex512 can cover those cases, so what you want is like
> > > a simple alias of "-march=skylake-avx512 -mno-evex512"?
> >
> > For the AVX512 enabled sub-isas of skylake-avx512 yes I guess.
> >
> > > > restricted to 256bits so future AVX10-256 implementations can
> > > > handle it as well as all existing (and relevant, which excludes
> > > > KNL) AVX512 implementations.
> > > >
> > > > Otherwise AVX10 is really a hard sell (as AVX512 was originally).
> > >
> > > It's a rebranding of the existing AVX512 to AVX10, AVX10.0  just
> > > complicated things further(considering we already have x86-64-v4
> > > which is different from skylake-avx512).
> >
> > Well, the cut-off for "AVX512" is quite arbitrary.  Introducing a
> > "new" ISA that's only available in HW available in the future and
> > suggesting users to embrace that already (like Intel did with AVX512
> > without offering client SKU support) is a hard sell.
> >
> > I realize Intel thinks client SKU support for AVX10 (restricted to
> > 256bit) will be "easier".  But then don't expect anybody to adopt that in 
> > the next 10 years.
> >
> > Just to add - we were suggesting to use x86_64-v3 for the "next"
> > enterprise product but got downvoted to x86_64-v2 for compatibility reasons.
> >
> > If it were possible I'd axe x86_64-v4.  Maybe we should add a
> > x86_64-v3.5 that sits inbetween v3 and v4, offering AVX512 but
> > restricted to 256bit (and obviously not requiring more of the AVX512 
> > features that v4 requires).
>
> About the arch level is indeed a problem, especially since the default size of
> avx10 is 256.
> +Florian Weimer for more inputs.

IMO, AVX10.1 options should be there and the arch level issue should not affect 
the
existence of this series of options.

The issue currently we are facing is much about the arch level issue actually 
since
we have defined x86-64-v4 before. The "-march=skylake-server -mno-evex512" is
much like something x86-64-v4-256.

Thx,
Haochen

RE: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Jiang, Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, November 2, 2023 3:23 AM
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> using peephole2.
> 
> On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle 
> wrote:
> >
> >
> > Hi Uros,
> >
> > > From: Uros Bizjak 
> > > Sent: 01 November 2023 10:05
> > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register
> allocation
> > > using peephole2.
> > >
> > > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle
> 
> > > wrote:
> > > >
> > > >
> > > > This patch is a follow-up to my previous PR target/110551 patch, this
> > > > time to address the additional move after mulx, seen on TARGET_BMI2
> > > > architectures (such as -march=haswell).  The complication here is that
> > > > the flexible multiple-set mulx instruction is introduced into RTL
> > > > after reload, by split2, and therefore can't benefit from register
> > > > preferencing.  This results in RTL like the following:
> > > >
> > > > (insn 32 31 17 2 (parallel [
> > > > (set (reg:DI 4 si [orig:101 r ] [101])
> > > > (mult:DI (reg:DI 1 dx [109])
> > > > (reg:DI 5 di [109])))
> > > > (set (reg:DI 5 di [ r+8 ])
> > > > (umul_highpart:DI (reg:DI 1 dx [109])
> > > > (reg:DI 5 di [109])))
> > > > ]) "pr110551-2.c":8:17 -1
> > > >  (nil))
> > > >
> > > > (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> > > > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
> > > >  (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> > > > (nil)))
> > > >
> > > > Here insn 32, the mulx instruction, places its results in si and di,
> > > > and then immediately after decides to move di to ax, with di now dead.
> > > > This can be trivially cleaned up by a peephole2.  I've added an
> > > > additional constraint that the two SET_DESTs can't be the same
> > > > register to avoid confusing the middle-end, but this has well-defined
> > > > behaviour on x86_64/BMI2, encoding a umul_highpart.
> > > >
> > > > For the new test case, compiled on x86_64 with -O2 -march=haswell:
> > > >
> > > > Before:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > > mulx%rdi, %rsi, %rdi
> > > > movq%rdi, %rax
> > > > xorq%rsi, %rax
> > > > ret
> > > >
> > > > After:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > > mulx%rdi, %rsi, %rax
> > > > xorq%rsi, %rax
> > > > ret
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > > and make -k check, both with and without --target_board=unix{-m32}
> > > > with no new failures.  Ok for mainline?
> > >
> > > It looks that your previous PR110551 patch regressed -march=cascadelake 
> > > [1].

Actually it is not only on -march=cascadelake, w/o -march=cascadelake will also
fail.

Thx,
Haochen

> > > Let's fix these regressions first.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
> > >
> > > Uros.
> >
> > This patch fixes that "regression".  Originally, the test case in PR110551 
> > contained
> > one unnecessary mov on "default" x86_targets, but two extra movs on BMI2
> > targets, including -march=haswell and -march=cascadelake.  The first patch
> > eliminated one of these MOVs, this patch eliminates the second.  I'm not 
> > sure
> > that you can call it a regression, the added test failed when run with a 
> > non-standard
> > -march setting.  The good news is that test case doesn't have to be changed 
> > with
> > this patch applied, i.e. the correct intended behaviour is no MOVs on all
> architectures.
> 
> I was not worried about the extra mov, but more about [2], also
> referred from [1], but it looks that that regression was wrongly
> attributed to your patch.
> 
> [2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html
> 
> > I'll admit the timing is unusual; I had already written and was regression 
> > testing a
> > patch for the BMI2 issue, when the -march=cascadelake regression tester let 
> > me
> > know it was required for folks that helpfully run the regression suite with 
> > non
> > standard settings.  i.e. a long standing bug that wasn't previously tested 
> > for by
> > the testsuite.
> >
> > > > 2023-10-30  Roger Sayle  
> > > >
> > > > gcc/ChangeLog
> > > > PR target/110551
> > > > * config/i386/i386.md (*bmi2_umul3_1): Tidy condition
> > > > as operands[2] with predicate register_operand must be !MEM_P.
> > > > (peephole2): Optimize a mulx followed by a register-to-register
> > > > move, to place result in the correct destination if possible.
> > > >
> > > > gcc/testsuite/ChangeLog
> > > > PR target/110551
> > > > * gcc.target/i386/pr110551-2.c: New test case.
> 
> The patch is OK.
> 
> Thanks,
> Uros.

RE: [x86 PATCH] PR target/110551: Fix reg allocation for widening multiplications.

2023-10-30 Thread Jiang, Haochen

Hi Roger,

It seems that your patch caused some regression on x86_64:

https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078390.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html

Could you help verify that?

A simple reproducer under build folder will be:

make check RUNTESTFLAGS="conformance.exp=std/time/year_month_day/io.cc 
--target_board='unix{-m64\ -march=cascadelake,-m32\ 
-march=cascadelake,-m32,-m64}'"

Thx,
Haochen

> -Original Message-
> From: Roger Sayle 
> Sent: Wednesday, October 18, 2023 10:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' ; tobias.bur...@siemens.com
> Subject: RE: [x86 PATCH] PR target/110551: Fix reg allocation for widening
> multiplications.
> 
> 
> Many thanks to Tobias Burnus for pointing out the mistake/typo in the PR
> number.
> This fix is for PR 110551, not PR 110511.  I'll update the ChangeLog and
> filename
> of the new testcase, if approved.
> 
> Sorry for any inconvenience/confusion.
> Cheers,
> Roger
> --
> 
> > -Original Message-
> > From: Roger Sayle 
> > Sent: 17 October 2023 20:06
> > To: 'gcc-patches@gcc.gnu.org' 
> > Cc: 'Uros Bizjak' 
> > Subject: [x86 PATCH] PR target/110511: Fix reg allocation for widening
> > multiplications.
> >
> >
> > This patch contains clean-ups of the widening multiplication patterns in
> i386.md,
> > and provides variants of the existing highpart multiplication
> > peephole2 transformations (that tidy up register allocation after reload),
> and
> > thereby fixes PR target/110511, which is a superfluous move instruction.
> >
> > For the new test case, compiled on x86_64 with -O2.
> >
> > Before:
> > mulx64: movabsq $-7046029254386353131, %rcx
> > movq%rcx, %rax
> > mulq%rdi
> > xorq%rdx, %rax
> > ret
> >
> > After:
> > mulx64: movabsq $-7046029254386353131, %rax
> > mulq%rdi
> > xorq%rdx, %rax
> > ret
> >
> > The clean-ups are (i) that operand 1 is consistently made register_operand
> and
> > operand 2 becomes nonimmediate_operand, so that predicates match the
> > constraints, (ii) the representation of the BMI2 mulx instruction is
> updated to use
> > the new umul_highpart RTX, and (iii) because operands
> > 0 and 1 have different modes in widening multiplications, "a" is a more
> > appropriate constraint than "0" (which avoids spills/reloads containing
> SUBREGs).
> > The new peephole2 transformations are based upon those at around line
> 9951
> of
> > i386.md, that begins with the comment ;; Highpart multiplication
> peephole2s to
> > tweak register allocation.
> > ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq %rdi
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> > make -k check, both with and without --target_board=unix{-m32} with no
> new
> > failures.  Ok for mainline?
> >
> >
> > 2023-10-17  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/110511
> > * config/i386/i386.md (mul3): Make operands 1 and
> > 2 take "regiser_operand" and "nonimmediate_operand" respectively.
> > (mulqihi3): Likewise.
> > (*bmi2_umul3_1): Operand 2 needs to be
> register_operand
> > matching the %d constraint.  Use umul_highpart RTX to represent
> > the highpart multiplication.
> > (*umul3_1):  Operand 2 should use regiser_operand
> > predicate, and "a" rather than "0" as operands 0 and 2 have
> > different modes.
> > (define_split): For mul to mulx conversion, use the new
> > umul_highpart RTX representation.
> > (*mul3_1):  Operand 1 should be register_operand
> > and the constraint %a as operands 0 and 1 have different modes.
> > (*mulqihi3_1): Operand 1 should be register_operand matching
> > the constraint %0.
> > (define_peephole2): Providing widening multiplication variants
> > of the peephole2s that tweak highpart multiplication register
> > allocation.
> >
> > gcc/testsuite/ChangeLog
> > PR target/110511
> > * gcc.target/i386/pr110511.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
>

RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march support

2023-10-26 Thread Jiang, Haochen

> -Original Message-
> From: Jiang, Haochen 
> Sent: Friday, October 27, 2023 10:52 AM
> To: Jiang, Haochen ; gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com; Gerald Pfeifer
> 
> Subject: RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and
> march support
> 
> > -Original Message-
> > From: Gcc-patches  > bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen
> > bounces+Jiang
> > via Gcc-patches
> > Sent: Monday, July 17, 2023 11:34 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; ubiz...@gmail.com
> > Subject: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and
> > march support
> >
> > Hi all,
> >
> > This patch adds documentation to wwwdocs to mention the recent
> > introduction of Intel new ISA and march.
> >
> > Ok for trunk?
> 
> I will commit the patch next Monday if there is no objection.

Sorry for the disturb since I find the wrong mail to reply because they
are too similar.

> 
> Thx,
> Haochen
> 
> >
> > BRs,
> > Haochen
> >
> > ---
> >  htdocs/gcc-13/changes.html |  4 
> >  htdocs/gcc-14/changes.html | 34
> > +-
> >  2 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 39414e18..68e8c5cc 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -593,6 +593,10 @@ You may also want to check out our
> >
> >GCC now supports the Intel CPU named Granite Rapids through
> >  -march=graniterapids.
> > +The switch enables the AMX-FP16, PREFETCHI ISA extensions.
> > +  
> > +  GCC now supports the Intel CPU named Granite Rapids D through
> > +-march=graniterapids-d.
> >  The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA
> > extensions.
> >
> >GCC now supports AMD CPUs based on the znver4 core
> > diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> > index
> > 3f797642..dad1ba53 100644
> > --- a/htdocs/gcc-14/changes.html
> > +++ b/htdocs/gcc-14/changes.html
> > @@ -108,7 +108,39 @@ a work-in-progress.
> >
> >  
> >
> > -
> > +IA-32/x86-64
> > +
> > +  New ISA extension support for Intel AVX-VNNI-INT16 was added.
> > +  AVX-VNNI-INT16 intrinsics are available via the -
> > mavxvnniint16
> > +  compiler switch.
> > +  
> > +  New ISA extension support for Intel SHA512 was added.
> > +  SHA512 intrinsics are available via the -msha512
> > +  compiler switch.
> > +  
> > +  New ISA extension support for Intel SM3 was added.
> > +  SM3 intrinsics are available via the -msm3
> > +  compiler switch.
> > +  
> > +  New ISA extension support for Intel SM4 was added.
> > +  SM4 intrinsics are available via the -msm4
> > +  compiler switch.
> > +  
> > +  GCC now supports the Intel CPU named Arrow Lake through
> > +-march=arrowlake.
> > +Based on Alder Lake, the switch further enables the AVX-IFMA,
> > +AVX-VNNI-INT8, AVX-NE-CONVERT and CMPccXADD ISA extensions.
> > +  
> > +  GCC now supports the Intel CPU named Arrow Lake S through
> > +-march=arrowlake-s.
> > +Based on Arrow Lake, the switch further enables the
> > + AVX-VNNI-INT16,
> > SHA512,
> > +SM3 and SM4 ISA extensions.
> > +  
> > +  GCC now supports the Intel CPU named Lunar Lake through
> > +-march=lunarlake.
> > +Lunar Lake is based on Arrow Lake S.
> > +  
> > +
> >
> >  
> >
> > --
> > 2.31.1

RE: [gccwwwdocs PATCH] gcc-13/14: Mention Intel new ISA and march support

2023-10-26 Thread Jiang, Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Monday, October 23, 2023 10:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: ger...@pfeifer.com; ubiz...@gmail.com; Liu, Hongtao
> 
> Subject: [gccwwwdocs PATCH] gcc-13/14: Mention Intel new ISA and march
> support
> 
> Hi all,
> 
> This patch mentions recent update for x86-64 backend, including ISAs enabled
> update on previous introduced CPU and newly introduced
> options/ISAs/CPUs.
> 
> Ok for wwwdocs?

I will commit the patch if there is no objection.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> ---
>  htdocs/gcc-13/changes.html |  8   htdocs/gcc-14/changes.html | 19
> +++
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index
> 10c54689..8ef3d639 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -579,13 +579,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Sierra Forest through
>  -march=sierraforest.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT and
> -CMPccXADD ISA extensions.
> +The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT,
> CMPccXADD,
> +ENQCMD and UINTR ISA extensions.
>
>GCC now supports the Intel CPU named Grand Ridge through
>  -march=grandridge.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT,
> CMPccXADD
> -and RAO-INT ISA extensions.
> +The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT,
> CMPccXADD,
> +ENQCMD, UINTR and RAO-INT ISA extensions.
>
>GCC now supports the Intel CPU named Emerald Rapids through
>  -march=emeraldrapids.
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index
> c817dde4..4f71061f 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -186,6 +186,10 @@ a work-in-progress.
> 
>  IA-32/x86-64
>  
> +  New compiler option -m[no-]evex512 was added.
> +  The compiler switch enables/disables 512 bit vector and 64 bit mask
> +  register. It will be default on if AVX512F is enabled.
> +  
>New ISA extension support for Intel AVX-VNNI-INT16 was added.
>AVX-VNNI-INT16 intrinsics are available via the -
> mavxvnniint16
>compiler switch.
> @@ -202,6 +206,16 @@ a work-in-progress.
>SM4 intrinsics are available via the -msm4
>compiler switch.
>
> +  New ISA extension support for Intel USER_MSR was added.
> +  USER_MSR intrinsics are available via the -muser_msr
> +  compiler switch.
> +  
> +  GCC now supports the Intel CPU named Clearwater Forest through
> +-march=clearwaterforest.
> +Based on Sierra Forest, the switch further enables the AVX-VNNI-INT16,
> +SHA512, SM3, SM4, USER_MSR and PREFETCHI ISA extensions.
> +extensions.
> +  
>GCC now supports the Intel CPU named Arrow Lake through
>  -march=arrowlake.
>  Based on Alder Lake, the switch further enables the AVX-IFMA, @@ -216,6
> +230,11 @@ a work-in-progress.
>  -march=lunarlake.
>  Lunar Lake is based on Arrow Lake S.
>
> +  GCC now supports the Intel CPU named Panther Lake through
> +-march=pantherlake.
> +Based on Arrow Lake S, the switch further enables the PREFETCHI ISA
> +extensions.
> +  
>  
> 
>  
> --
> 2.31.1

RE: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march support

2023-10-26 Thread Jiang, Haochen

> -Original Message-
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang
> via Gcc-patches
> Sent: Monday, July 17, 2023 11:34 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] [gcc-wwwdocs]gcc-13/14: Mention Intel new ISA and march
> support
> 
> Hi all,
> 
> This patch adds documentation to wwwdocs to mention the recent
> introduction of Intel new ISA and march.
> 
> Ok for trunk?

I will commit the patch next Monday if there is no objection.

Thx,
Haochen

> 
> BRs,
> Haochen
> 
> ---
>  htdocs/gcc-13/changes.html |  4 
>  htdocs/gcc-14/changes.html | 34
> +-
>  2 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index
> 39414e18..68e8c5cc 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -593,6 +593,10 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Granite Rapids through
>  -march=graniterapids.
> +The switch enables the AMX-FP16, PREFETCHI ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Granite Rapids D through
> +-march=graniterapids-d.
>  The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA
> extensions.
>
>GCC now supports AMD CPUs based on the znver4 core
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index
> 3f797642..dad1ba53 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -108,7 +108,39 @@ a work-in-progress.
> 
>  
> 
> -
> +IA-32/x86-64
> +
> +  New ISA extension support for Intel AVX-VNNI-INT16 was added.
> +  AVX-VNNI-INT16 intrinsics are available via the -
> mavxvnniint16
> +  compiler switch.
> +  
> +  New ISA extension support for Intel SHA512 was added.
> +  SHA512 intrinsics are available via the -msha512
> +  compiler switch.
> +  
> +  New ISA extension support for Intel SM3 was added.
> +  SM3 intrinsics are available via the -msm3
> +  compiler switch.
> +  
> +  New ISA extension support for Intel SM4 was added.
> +  SM4 intrinsics are available via the -msm4
> +  compiler switch.
> +  
> +  GCC now supports the Intel CPU named Arrow Lake through
> +-march=arrowlake.
> +Based on Alder Lake, the switch further enables the AVX-IFMA,
> +AVX-VNNI-INT8, AVX-NE-CONVERT and CMPccXADD ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Arrow Lake S through
> +-march=arrowlake-s.
> +Based on Arrow Lake, the switch further enables the AVX-VNNI-INT16,
> SHA512,
> +SM3 and SM4 ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Lunar Lake through
> +-march=lunarlake.
> +Lunar Lake is based on Arrow Lake S.
> +  
> +
> 
>  
> 
> --
> 2.31.1

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-26 Thread Jiang, Haochen

> -Original Message-
> From: Jiang, Haochen
> Sent: Wednesday, October 25, 2023 4:47 PM
> To: Richard Sandiford ; HAO CHEN GUI
> 
> Cc: gcc-patches 
> Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Wednesday, October 25, 2023 4:40 PM
> > To: HAO CHEN GUI 
> > Cc: Jiang, Haochen ; gcc-patches  > patc...@gcc.gnu.org>
> > Subject: Re: [PATCH-1v4, expand] Enable vector mode for
> > compare_by_pieces [PR111449]
> >
> > HAO CHEN GUI  writes:
> > > Hi Haochen,
> > >   The regression cases are caused by "targetm.scalar_mode_supported_p"
> > > I added for scalar mode checking. XImode, OImode and TImode (with
> > > -m32) are not enabled in ix86_scalar_mode_supported_p. So they're
> > > excluded from by pieces operations on i386.
> > >
> > >   The original code doesn't do a check for scalar modes. I think it
> > > might be incorrect as not all scalar modes support move and compare
> optabs. (e.g.
> > > TImode with -m32 on rs6000).
> > >
> > >   I drafted a new patch to manually check optabs for scalar mode.
> > > Now both vector and scalar modes are checked for optabs.
> > >
> > >   I did a simple test. All former regression cases are back. Could
> > > you help do a full regression test? I am worry about the coverage of my CI
> system.
> 
> Thanks for that. I am running the regression test now.

The patch works. Thanks a lot!

Sorry for the delay since my CI accidentally crashed.

Thx,
Haochen

> 
> Thx,
> Haochen
> 
> >
> > Thanks for the quick fix.  The patch LGTM FWIW.  Just a small
> > suggestion for the function name:
> >
> > >
> > > Thanks
> > > Gui Haochen
> > >
> > > patch.diff
> > > diff --git a/gcc/expr.cc b/gcc/expr.cc index
> > > 7aac575eff8..2af9fcbed18
> > > 100644
> > > --- a/gcc/expr.cc
> > > +++ b/gcc/expr.cc
> > > @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation
> op)
> > >  /* Return true if optabs exists for the mode and certain by pieces
> > > operations.  */
> > >  static bool
> > > -qi_vector_mode_supported_p (fixed_size_mode mode,
> > by_pieces_operation
> > > op)
> > > +mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> >
> > Might be worth calling this something more specific, such as
> > by_pieces_mode_supported_p.
> >
> > Otherwise the patch is OK for trunk if it passes the x86 testing.
> >
> > Thanks,
> > Richard
> >
> > >  {
> > > +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
> > > +return false;
> > > +
> > >if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> > > -  && optab_handler (vec_duplicate_optab, mode) !=
> CODE_FOR_nothing)
> > > -return true;
> > > +  && VECTOR_MODE_P (mode)
> > > +  && optab_handler (vec_duplicate_optab, mode) ==
> > CODE_FOR_nothing)
> > > +return false;
> > >
> > >if (op == COMPARE_BY_PIECES
> > > -  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
> > > -  && can_compare_p (EQ, mode, ccp_jump))
> > > -return true;
> > > +  && !can_compare_p (EQ, mode, ccp_jump))
> > > +return false;
> > >
> > > -  return false;
> > > +  return true;
> > >  }
> > >
> > >  /* Return the widest mode that can be used to perform part of an @@
> > > -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int
> > > size,
> > by_pieces_operation op)
> > > {
> > >   if (GET_MODE_SIZE (candidate) >= size)
> > > break;
> > > - if (qi_vector_mode_supported_p (candidate, op))
> > > + if (mode_supported_p (candidate, op))
> > > result = candidate;
> > > }
> > >
> > > @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int
> > size, by_pieces_operation op)
> > >  {
> > >mode = tmode.require ();
> > >if (GET_MODE_SIZE (mode) < size
> > > -   && targetm.scalar_mode_supported_p (mode))
> > > +   && mode_supported_p (mode, op))
> > >result = mode;
> > >  }
> > >
> > > @@ -1454,7 +1457,7 @@
> > op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
> > > break;
> > >
> > >   if (GET_MODE_SIZE (candidate) >= size
> > > - && qi_vector_mode_supported_p (candidate, m_op))
> > > + && mode_supported_p (candidate, m_op))
> > > return candidate;
> > > }
> > >  }

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread Jiang, Haochen

> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, October 25, 2023 4:40 PM
> To: HAO CHEN GUI 
> Cc: Jiang, Haochen ; gcc-patches  patc...@gcc.gnu.org>
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> HAO CHEN GUI  writes:
> > Hi Haochen,
> >   The regression cases are caused by "targetm.scalar_mode_supported_p"
> > I added for scalar mode checking. XImode, OImode and TImode (with
> > -m32) are not enabled in ix86_scalar_mode_supported_p. So they're
> > excluded from by pieces operations on i386.
> >
> >   The original code doesn't do a check for scalar modes. I think it
> > might be incorrect as not all scalar modes support move and compare optabs. 
> > (e.g.
> > TImode with -m32 on rs6000).
> >
> >   I drafted a new patch to manually check optabs for scalar mode. Now
> > both vector and scalar modes are checked for optabs.
> >
> >   I did a simple test. All former regression cases are back. Could you
> > help do a full regression test? I am worry about the coverage of my CI 
> > system.

Thanks for that. I am running the regression test now.

Thx,
Haochen

> 
> Thanks for the quick fix.  The patch LGTM FWIW.  Just a small suggestion for
> the function name:
> 
> >
> > Thanks
> > Gui Haochen
> >
> > patch.diff
> > diff --git a/gcc/expr.cc b/gcc/expr.cc index 7aac575eff8..2af9fcbed18
> > 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -1000,18 +1000,21 @@ can_use_qi_vectors (by_pieces_operation op)
> >  /* Return true if optabs exists for the mode and certain by pieces
> > operations.  */
> >  static bool
> > -qi_vector_mode_supported_p (fixed_size_mode mode,
> by_pieces_operation
> > op)
> > +mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
> 
> Might be worth calling this something more specific, such as
> by_pieces_mode_supported_p.
> 
> Otherwise the patch is OK for trunk if it passes the x86 testing.
> 
> Thanks,
> Richard
> 
> >  {
> > +  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
> > +return false;
> > +
> >if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> > -  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
> > -return true;
> > +  && VECTOR_MODE_P (mode)
> > +  && optab_handler (vec_duplicate_optab, mode) ==
> CODE_FOR_nothing)
> > +return false;
> >
> >if (op == COMPARE_BY_PIECES
> > -  && optab_handler (mov_optab, mode) != CODE_FOR_nothing
> > -  && can_compare_p (EQ, mode, ccp_jump))
> > -return true;
> > +  && !can_compare_p (EQ, mode, ccp_jump))
> > +return false;
> >
> > -  return false;
> > +  return true;
> >  }
> >
> >  /* Return the widest mode that can be used to perform part of an @@
> > -1035,7 +1038,7 @@ widest_fixed_size_mode_for_size (unsigned int size,
> by_pieces_operation op)
> >   {
> > if (GET_MODE_SIZE (candidate) >= size)
> >   break;
> > -   if (qi_vector_mode_supported_p (candidate, op))
> > +   if (mode_supported_p (candidate, op))
> >   result = candidate;
> >   }
> >
> > @@ -1049,7 +1052,7 @@ widest_fixed_size_mode_for_size (unsigned int
> size, by_pieces_operation op)
> >  {
> >mode = tmode.require ();
> >if (GET_MODE_SIZE (mode) < size
> > - && targetm.scalar_mode_supported_p (mode))
> > + && mode_supported_p (mode, op))
> >result = mode;
> >  }
> >
> > @@ -1454,7 +1457,7 @@
> op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size)
> >   break;
> >
> > if (GET_MODE_SIZE (candidate) >= size
> > -   && qi_vector_mode_supported_p (candidate, m_op))
> > +   && mode_supported_p (candidate, m_op))
> >   return candidate;
> >   }
> >  }

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread Jiang, Haochen

Hi Haochen Gui,

It seems that the commit caused lots of test case fail on x86 platforms:

https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html

Please help verify that if we need some testcase change or we get bug here.

A simple reproducer under build folder is:

make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C 
--target_board='unix{-m64\ -march=cascadelake,-m32\ 
-march=cascadelake,-m32,-m64}'"

Thx,
Haochen

> -Original Message-
> From: HAO CHEN GUI 
> Sent: Monday, October 23, 2023 9:30 AM
> To: Richard Sandiford 
> Cc: gcc-patches 
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces
> [PR111449]
> 
> Committed as r14-4835.
> 
> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
> 
> Thanks
> Gui Haochen
> 
> 在 2023/10/20 16:49, Richard Sandiford 写道:
> > HAO CHEN GUI  writes:
> >> Hi,
> >>   Vector mode instructions are efficient for compare on some targets.
> >> This patch enables vector mode for compare_by_pieces. Two help
> >> functions are added to check if vector mode is available for certain
> >> by pieces operations and if if optabs exists for the mode and certain
> >> by pieces operations. One member is added in class op_by_pieces_d to
> >> record the type of operations.
> >>
> >>   The test case is in the second patch which is rs6000 specific.
> >>
> >>   Compared to last version, the main change is to add a target hook
> >> check - scalar_mode_supported_p when retrieving the available scalar
> >> modes. The mode which is not supported for a target should be skipped.
> >> (e.g. TImode on ppc). Also some function names and comments are refined
> >> according to reviewer's advice.
> >>
> >>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> >> regressions.
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >> ChangeLog
> >> Expand: Enable vector mode for by pieces compares
> >>
> >> Vector mode compare instructions are efficient for equality compare on
> >> rs6000. This patch refactors the codes of by pieces operation to enable
> >> vector mode for compare.
> >>
> >> gcc/
> >>PR target/111449
> >>* expr.cc (can_use_qi_vectors): New function to return true if
> >>we know how to implement OP using vectors of bytes.
> >>(qi_vector_mode_supported_p): New function to check if optabs
> >>exists for the mode and certain by pieces operations.
> >>(widest_fixed_size_mode_for_size): Replace the second argument
> >>with the type of by pieces operations.  Call can_use_qi_vectors
> >>and qi_vector_mode_supported_p to do the check.  Call
> >>scalar_mode_supported_p to check if the scalar mode is supported.
> >>(by_pieces_ninsns): Pass the type of by pieces operation to
> >>widest_fixed_size_mode_for_size.
> >>(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
> >>record the type of by pieces operations.
> >>(op_by_pieces_d::op_by_pieces_d): Change last argument to the
> >>type of by pieces operations, initialize m_op with it.  Pass
> >>m_op to function widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::get_usable_mode): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
> >>can_use_qi_vectors and qi_vector_mode_supported_p to do the
> >>check.
> >>(op_by_pieces_d::run): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(move_by_pieces_d::move_by_pieces_d): Set m_op to
> MOVE_BY_PIECES.
> >>(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
> >>(can_store_by_pieces): Pass the type of by pieces operations to
> >>widest_fixed_size_mode_for_size.
> >>(clear_by_pieces): Initialize class store_by_pieces_d with
> >>CLEAR_BY_PIECES.
> >>(compare_by_pieces_d::compare_by_pieces_d): Set m_op to
> >>COMPARE_BY_PIECES.
> >
> > OK, thanks.  And thanks for your patience.
> >
> > Richard
> >
> >> patch.diff
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index 2c9930ec674..ad5f9dd8ec2 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -988,18 +988,44 @@ alignment_for_piecewise_move (unsigned int
> max_pieces, unsigned int align)
> >>return align;
> >>  }
> >>
> >> -/* Return the widest QI vector, if QI_MODE is true, or integer mode
> >> -   that is narrower than SIZE bytes.  */
> >> +/* Return true if we know how to implement OP using vectors of bytes.  */
> >> +static bool
> >> +can_use_qi_vectors (by_pieces_operation op)
> >> +{
> >> +  return (op == COMPARE_BY_PIECES
> >> +|| op == SET_BY_PIECES
> >> +|| op == CLEAR_BY_PIECES);
> >> +}
> >> +

RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread Jiang, Haochen

It seems that the mail got caught elsewhere and did not send into gcc-patches
mailing thread. Resending that.

Thx,
Haochen

-Original Message-
From: Jiang, Haochen 
Sent: Tuesday, October 24, 2023 4:43 PM
To: HAO CHEN GUI ; Richard Sandiford 

Cc: gcc-patches 
Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces 
[PR111449]

Hi Haochen Gui,

It seems that the commit caused lots of test case fail on x86 platforms:

https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078379.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078380.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078381.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078382.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078383.html
https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078384.html

Please help verify that if we need some testcase change or we get bug here.

A simple reproducer under build folder is:

make check RUNTESTFLAGS="i386.exp=g++.target/i386/pr80566-2.C 
--target_board='unix{-m64\ -march=cascadelake,-m32\ 
-march=cascadelake,-m32,-m64}'"

Thx,
Haochen

> -Original Message-
> From: HAO CHEN GUI 
> Sent: Monday, October 23, 2023 9:30 AM
> To: Richard Sandiford 
> Cc: gcc-patches 
> Subject: Re: [PATCH-1v4, expand] Enable vector mode for 
> compare_by_pieces [PR111449]
> 
> Committed as r14-4835.
> 
> https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085
> 
> Thanks
> Gui Haochen
> 
> 在 2023/10/20 16:49, Richard Sandiford 写道:
> > HAO CHEN GUI  writes:
> >> Hi,
> >>   Vector mode instructions are efficient for compare on some targets.
> >> This patch enables vector mode for compare_by_pieces. Two help 
> >> functions are added to check if vector mode is available for 
> >> certain by pieces operations and if if optabs exists for the mode 
> >> and certain by pieces operations. One member is added in class 
> >> op_by_pieces_d to record the type of operations.
> >>
> >>   The test case is in the second patch which is rs6000 specific.
> >>
> >>   Compared to last version, the main change is to add a target hook 
> >> check - scalar_mode_supported_p when retrieving the available 
> >> scalar modes. The mode which is not supported for a target should be 
> >> skipped.
> >> (e.g. TImode on ppc). Also some function names and comments are 
> >> refined according to reviewer's advice.
> >>
> >>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with 
> >> no regressions.
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >> ChangeLog
> >> Expand: Enable vector mode for by pieces compares
> >>
> >> Vector mode compare instructions are efficient for equality compare 
> >> on rs6000. This patch refactors the codes of by pieces operation to 
> >> enable vector mode for compare.
> >>
> >> gcc/
> >>PR target/111449
> >>* expr.cc (can_use_qi_vectors): New function to return true if
> >>we know how to implement OP using vectors of bytes.
> >>(qi_vector_mode_supported_p): New function to check if optabs
> >>exists for the mode and certain by pieces operations.
> >>(widest_fixed_size_mode_for_size): Replace the second argument
> >>with the type of by pieces operations.  Call can_use_qi_vectors
> >>and qi_vector_mode_supported_p to do the check.  Call
> >>scalar_mode_supported_p to check if the scalar mode is supported.
> >>(by_pieces_ninsns): Pass the type of by pieces operation to
> >>widest_fixed_size_mode_for_size.
> >>(class op_by_pieces_d): Remove m_qi_vector_mode.  Add m_op to
> >>record the type of by pieces operations.
> >>(op_by_pieces_d::op_by_pieces_d): Change last argument to the
> >>type of by pieces operations, initialize m_op with it.  Pass
> >>m_op to function widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::get_usable_mode): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
> >>can_use_qi_vectors and qi_vector_mode_supported_p to do the
> >>check.
> >>(op_by_pieces_d::run): Pass m_op to function
> >>widest_fixed_size_mode_for_size.
> >>(move_by_pieces_d::move_by_pieces_d): Set m_op to
> MOVE_BY_PIECES.
> >>(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
> >>(can_store_by_pieces): Pass the type of by pieces operations to
> >>widest_fixed_

RE: [PATCH 0/3] Add Intel new cpu archs

2023-10-18 Thread Jiang, Haochen



> -Original Message-
> From: Hongtao Liu 
> Sent: Wednesday, October 18, 2023 8:25 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: Re: [PATCH 0/3] Add Intel new cpu archs
> 
> On Mon, Oct 16, 2023 at 2:25 PM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > The patches aim to add new cpu archs Clear Water Forest and Panther
> > Lake. Here comes the documentation:
> >
> > https://cdrdv2.intel.com/v1/dl/getContent/671368
> >
> > Also in the patches, I refactored how we detect cpu according to
> > features and added m_CORE_ATOM.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> Ok, please also update https://gcc.gnu.org/gcc-14/changes.html with your
> patches and USER_MSR.

I will commit the patches with naming change from Clear Water Forest to
Clearwater Forest.

Thx,
Haochen

> >
> > Thx,
> > Haochen
> >
> >
> 
> 
> 
> --
> BR,
> Hongtao

[r14-4629 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-10-17 Thread Jiang, Haochen

On Linux/x86_64,

3179ad72f67f31824c444ef30ef171ad7495d274 is the first bad commit
commit 3179ad72f67f31824c444ef30ef171ad7495d274
Author: Richard Biener rguent...@suse.de
Date:   Fri Oct 13 12:32:51 2023 +0200

OMP SIMD inbranch call vectorization for AVX512 style masks

caused

FAIL: gcc.dg/vect/vect-simd-clone-16b.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17b.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18b.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-4629/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16b.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16b.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17b.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17b.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18b.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18b.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

[r14-4046 Regression] FAIL: 23_containers/vector/bool/110807.cc -std=gnu++17 (test for excess errors) on Linux/x86_64

2023-09-17 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a is the first bad commit
commit 3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a
Author: Jonathan Wakely 
Date:   Fri Sep 1 21:27:57 2023 +0100

libstdc++: Add support for running tests with multiple -std options

caused

FAIL: 23_containers/vector/bool/110807.cc  -std=gnu++17 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-4046/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32\ -march=cascadelake}'"

(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

[r14-3823 Regression] FAIL: c-c++-common/analyzer/compound-assignment-1.c -std=c++98 (test for warnings, line 72) on Linux/x86_64

2023-09-11 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

50b5199cff690891726877e1c00ac53dfb7cc1c8 is the first bad commit
commit 50b5199cff690891726877e1c00ac53dfb7cc1c8
Author: benjamin priour 
Date:   Sat Sep 9 18:03:56 2023 +0200

analyzer: Move gcc.dg/analyzer tests to c-c++-common (2) [PR96395]

caused

FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++14 (test for 
excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++14  (test for 
warnings, line 72)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++17 (test for 
excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++17  (test for 
warnings, line 72)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++20 (test for 
excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++20  (test for 
warnings, line 72)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++98 (test for 
excess errors)
FAIL: c-c++-common/analyzer/compound-assignment-1.c  -std=c++98  (test for 
warnings, line 72)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3823/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=c-c++-common/analyzer/compound-assignment-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=c-c++-common/analyzer/compound-assignment-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

[r14-3571 Regression] FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr on Linux/x86_64

2023-08-30 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

caa7a99a052929d5970677c5b639e1fa5166e334 is the first bad commit
commit caa7a99a052929d5970677c5b639e1fa5166e334
Author: Richard Biener 
Date:   Wed Aug 30 11:57:47 2023 +0200

tree-optimization/111228 - combine two VEC_PERM_EXPRs

caused

FAIL: gcc.target/i386/pr52252-atom.c scan-assembler palignr

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3571/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr52252-atom.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(For question about this report, contact me at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

RE: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 23, 2023 3:32 PM
> To: Hongtao Liu 
> Cc: Jakub Jelinek ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 4:36 PM Hongtao Liu  wrote:
> >
> > On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > > > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > > > evex instruction patterns.
> > >
> > > Why?
> > > Internally for md etc. purposes, we should have the current
> > > TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> > > (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> > > etc., or some other name) which says if 512-bit vector modes can be used,
> > > if g modifier can be used, if the 64-bit mask operations can be used etc.
> > > Plus, if AVX10.1 contains any instructions not covered in the preexisting
> > > TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> > > keep -mavx10.1 just as an command line option which enables/disables
> > Let's assume there's no detla now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,
> VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET*
> would be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> 
> As I said earlier -mavx10.1-256 (and -mavx10.1-512) should not exist.
> So instead
> we'd have -mavx512bw -mavx10.1 where -mavx512bw enables evex512 and
> -mavx10.1 will enable the 10.1 ISAs _not affecting_ whether evex512 is
> set or not.
> 
> We then have the -mevex512 flag (or whatever name we agree to) to enable
> (or disable) 512bit support.
> 
> If you insist on having -mavx10.1-256 that should alias to -mavx10.1 +
> -mno-evex512,
> but Jakub disagrees here, so I'd rather not have it at all.  We could have
> -mavx10.1-512 aliasing to -mavx10.1 + -mevex512 (Jakub would agree here).

We could first work on -mevex512 then further discuss -mavx10.1-256/512 since
these -mavx10.1-256/512 is quite controversial.

Just to clarify, -mno-evex512 -mavx512f should not enable 512 bit vector right?

Thx,
Haochen

> 
> Richard.

RE: Intel AVX10.1 Compiler Design and Support

2023-08-23 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Hongtao Liu 
> Sent: Wednesday, August 23, 2023 10:19 AM
> To: Jiang, Haochen 
> Cc: Jakub Jelinek ; Richard Biener
> ; ZiNgA BuRgA ;
> gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Wed, Aug 23, 2023 at 9:58 AM Jiang, Haochen
>  wrote:
> >
> > > -Original Message-
> > > From: Jakub Jelinek 
> > > Sent: Tuesday, August 22, 2023 11:02 PM
> > > To: Hongtao Liu 
> > > Cc: Richard Biener ; Jiang, Haochen
> > > ; ZiNgA BuRgA ;
> > > gcc- patc...@gcc.gnu.org
> > > Subject: Re: Intel AVX10.1 Compiler Design and Support
> > >
> > > On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > > > Let's assume there's no detla now, AVX10.1-512 is equal to
> > > >
> AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPC
> NT
> > > > DQ}
> > > > > other stuff.
> > > > > The current common/config/i386/i386-common.cc
> > > > > OPTION_MASK_ISA*SET* would be like now, except that the current
> > > > > AVX512* sets imply also EVEX512/whatever it will be called, that
> > > > > option itself enables nothing (or TARGET_AVX512F), and unsetting it
> doesn't disable all the TARGET_AVX512*.
> > > > > -mavx10.1 would enable the AVX512* sets without
> EVEX512/whatever.
> > > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > > then the combination basically is equal to AVX10.1-512(AVX512*
> > > > sets +
> > > > EVEX512)
> > > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
> >
> > I think we still need that since the current w/o AVX512VL, we will not
> > only enable 512 bit vector instructions but also enable scalar
> > instructions, which means when it comes to -mavx512bw -mno-evex512,
> we
> > should enable the scalar function.
> >
> > And scalar functions will also be enabled in AVX10.1-256, we need
> > something to distinguish them out from the ISA set w/o AVX512VL.
> Why do we need to distinguish scalar evex instruction?
> As long as -mavx512XXX -mno-evex does not generate zmm/64-bit kmask, it
> should be ok.
> 
> Assume there's no delta in AVX10.1, It sounds to me the design should be like
> 
> avx512*  <== mno-evex512==  avx512* + mevex512
> (no-evex512)(original AVX512 stuff)
>/\  /\
>||(equal)   ||(equal)
>\/  \/
> avx10.1-256   avx10.1-512
> /\  /\
> ||  ||
> ||  ||
> impliedimplied
> ||  ||
> ||  ||
> avx10.2-256 <== implied ==  avx10.2-512
> /\ /\
> || ||
> || ||
> impliedImplied
> || ||
> || ||
> avx10.3-256 <== implied ==   avx10.3-512
> 
> 1. The new instructions in avx10.x should be put in either avx10.x-256 or
> avx10.x-512 according to vector/kmask size 2. -mno-evex512 should disable -
> avx10.x-512.
> 3. -mavx512* will defaultly enable -mevex512, but -mavx10.1-256 will just
> enable -mavx512* but not -mevex512

I will revert all the AVX10.1 patches that have been committed in trunk since
the design changed if there is no objection in 24 hours.

Also I am working on a sample patch for -mevex512. Although there is a little
encoding issue in APX EVEX promoted KMOVQ, most of the users will not
notice that. And -mavxex512 is quite straightforward.

Thx,
Haochen

> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > I think that would be my expectation.  -mavx512bw currently implies
> > > 512-bit vector support of avx512f and avx512bw, and with
> > > -mavx512{bw,vl} also 128-bit/256-bit vector support.  All pre-AVX10
> > > chips which do support AVX512BW support 512-bit vectors.  Now,
> > > -mavx10.1 will bring in also
> > > vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you
> > > wrote whic

RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, August 22, 2023 11:02 PM
> To: Hongtao Liu 
> Cc: Richard Biener ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > Let's assume there's no detla now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would 
> > > be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > EVEX512)
> > If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think we still need that since the current w/o AVX512VL, we will not only
enable 512 bit vector instructions but also enable scalar instructions, which
means when it comes to -mavx512bw -mno-evex512, we should enable
the scalar function.

And scalar functions will also be enabled in AVX10.1-256, we need something
to distinguish them out from the ISA set w/o AVX512VL.

Thx,
Haochen

> 
> I think that would be my expectation.  -mavx512bw currently implies
> 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> which weren't enabled before, but unless there is some existing or planned
> CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> only support 128/256-bit vectors in those
> dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> is no need to differentiate further; the only CPUs which will support both
> what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> either CPUs with 128/256/512-bit vector support of those
> f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> disable all 512-bit vector instructions and in the end just mean the
> same as -mavx10.1-256.
> For just
> -mavx512bw -mno-evex512 -mavx10.1-256
> the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> avx512vl isn't enabled at that point during processing, or if we do that
> only at the end as a special case.  Of course, in this exact case there is
> no difference, because -mavx10.1-256 turns that back on.
> But it would make a difference on
> -mavx512bw -mno-evex512 -mavx512vl
> (when processed right away would disable AVX512BW (because VL isn't on)
> and in the end enable VL,F including EVEX512, or be equivalent to just
> -mavx512bw -mavx512vl if processed at the end, because -mavx512vl implied
> -mevex512 again.
> 
>   Jakub

RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 22, 2023 4:36 PM
> To: Jakub Jelinek 
> Cc: Jiang, Haochen ; ZiNgA BuRgA
> ; Hongtao Liu ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches
> wrote:
> > > I think internally we should have conditional 512bit support work across
> > > AVX512 and AVX10.
> > >
> > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > redundancy and possibly make providing inter-operation between
> > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > just as "re-branding" latest AVX512, so we should treat it that way
> > > (making it an alias to the AVX512 features).
> > >
> > > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > is an entirely separate
> > > question.  But I think to not wreck the core idea (more interoperability,
> > > here between small/big cores) we absolutely have to
> > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > effectively means AVX512 with disabled 512bit support.
> >
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.
> 
> Works for me.  Note it also implies mask regs are SImode, not DImode,
> not sure if that relates to evex more than mask reg encodings are all evex ...
> 

Just in case we are not on the same page.

So we are looking forward to an "extended" -m[no-]avx10-max-512bit option,
which can also be used on AVX512. The other basic logic will not change.

BTW, -mevex512 is not a good name since there will be 64 bit mask operations
promoted to EVEX128 in APX, which might cause confusion.

Thx,
Haochen

> >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive
> enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> > enabled only in 128/256 bit variants if we differentiate that level.
> > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> >
> > Jakub
> >

RE: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: ZiNgA BuRgA 
> Sent: Monday, August 21, 2023 5:27 PM
> To: Richard Biener ; Hongtao Liu
> 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> Another way (not saying this is better, just throwing out ideas) is to
> break AVX10.1 into all the AVX-512 subsets.
> So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.
> 
> * -mavx10.1-256  would effectively be an alias for all the 128+256-bit
> subsets, and set the __AVX10_1__ define
> * -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi
> -mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define
> (`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
> * -mno-avx512vbmi  would similarly be an alias for
> `-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`;
> with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if
> unusual (enable all AVX10.1 but disable all VBMI)
> * -mavx10.2-256  would act as a single feature, cementing in AVX10.2
> like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off

I am considering a proposal quite similar to this if we want to change the
design so that it is flexible.

But there are a few proposals on the table. The problem for this proposal
is that if it is a over-design to make each AVX512 feature to split since in 
most
scenarios we just need to keep the vector width as the same.

Thx,
Haochen

> 
> 
> On 21/08/2023 5:36 pm, Richard Biener wrote:
> > On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
> >  wrote:
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
> >
> > Richard.
>

[r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches

From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c 
scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

On Linux/x86_64,

46c8c225455273ce7f7da7cc5707aed54f23e78d is the first bad commit
commit 46c8c225455273ce7f7da7cc5707aed54f23e78d
Author: Richard Biener 
Date:   Wed Jul 26 15:23:45 2023 +0200

Improve sinking with unrelated defs

caused

FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2946/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

[r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches

From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c 
scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

On Linux/x86_64,

3a13884b23ae32b43d56d68a9c6bd4ce53d60017 is the first bad commit commit 
3a13884b23ae32b43d56d68a9c6bd4ce53d60017
Author: Richard Biener 
Date:   Fri Aug 11 12:08:10 2023 +0200

Improve BB vectorization opt-info

caused

FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  
scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: 
basic block" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3148/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches

Hi all,

There are lots of discussions on arch level and ABIs and I really appreciate 
that.

For the arch level issue, it might be a little early to discuss and should not 
block
these patches.

For ABI issue, the problem actually comes from the current behavior between
GCC and clang/LLVM are different in return value for m512 w/o 512 bit support.
Then it becomes a question to get unified and we get the whole discussion.
However, it is a corner case.

So let's first focus on the options design and the behavior on that. We could
continue to discuss those two issues after the main behavior is settled down.
Richard has raised some concerns in option combinations. Any other concerns?

Thx,
Haochen

> -Original Message-
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang via
> Gcc-patches
> Sent: Tuesday, August 8, 2023 3:13 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao 
> Subject: Intel AVX10.1 Compiler Design and Support
> 
> Hi all,
> 
> We will send out our initial support of AVX10 and some sample patches in this
> mailing thread. And there will be more coming up afterwards. Therefore, we
> would like to share our proposed AVX10 design in GCC.
> 
> Here is a quick introduction to AVX10:
>   - AVX10 is the first major new ISA since the introduction of AVX512 in 2013.
>   - Since the introduction of AVX10, we would like to establish a common,
> converged vector instruction set across all Intel architectures, including
> Xeon Server, Atom Server and Clients.
>   - The default maximum vector size for AVX10 will be 256 bit, while 512 bit 
> is
> optional.
>   - AVX10.1 will include all existing AVX512 instructions in Granite Rapids.
>   - There will be no new AVX512 CPUID introduced in future. All EVEX vector
> instructions will be under AVX10 umbrella.
>   - AVX10 will be version-based ISA instead of tons of different CPUIDs like
> AVX512BW, AVX512DQ, AVX512FP16, etc.
>   - Based on AVX10.1, AVX10.2 will introduce ymm embedded rounding, SAE
> (Suppressed All Exceptions) control and new instructions.
> 
> If you would like to have a closed look at the details, please follow the 
> links
> below:
> 
> Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification 
> It
> describes the Intel Advanced Vector Extensions 10 Instruction Set 
> Architecture.
> https://cdrdv2.intel.com/v1/dl/getContent/784267
> 
> The Converged Vector ISA: Intel Advanced Vector Extensions 10 Technical Paper 
> It
> provides introductory information regarding the converged vector ISA: Intel
> Advanced Vector Extensions 10.
> https://cdrdv2.intel.com/v1/dl/getContent/784343
> 
> Hence, we will have several compiler design ground rules for AVX10:
>   - AVX10 is a converged ISA feature set.
> We will not provide -m[no-]xxx to enable/disable each single vector 
> feature
> in one version as we used to before. Instead, a simple option 
> -m[no-]avx10.x
> is used. If 512 bit version is needed, -mavx10.x-512 is all you need. 
> Also,
> maximum vector width should be the same when different version of AVX10 is
> used. For example, enabling AVX10.1 with 512 bit vector width while 
> enabling
> AVX10.2 with only 256 bit vector width is not a desired behavior.
>   - AVX10 is an evolving ISA feature set.
> Every feature showed up in the current version will always show up in 
> future
> version.
>   - AVX10 is an independent ISA feature set.
> Although sharing the same instructions and encodings, AVX10 and AVX512 are
> conceptual independent features, which means they are orthogonal.
> 
> Since AVX10 will have several benefits like bringing AVX512 features on Atom
> Server and Clients and getting rid of tons of AVX512 CPUIDs but a simple AVX10
> option to enable features, we lean towards the adoption of AVX10 instead of
> AVX512 from now on.
> 
> Based on all we got, we would like to introduce the following compiler 
> options:
>   - -mavx10.x: The option will enable AVX10.1-AVX10.x features with a default
> 256 bit vector width to make sure the compatibility on all platforms.
>   - -mavx10.x-512: The option will enable AVX10.1-AVX10.x features with 512 
> bit
> vector width. “-mno-avx10.x-512” option will not be provided to avoid
> confusion of disabling 512 vector width or avx10.x itself.
>   - -mavx10.x-256: The option will enable AVX10.1-AVX10.x features with 256 
> bit
> vector width. But it will disable 512 bit vector width since the vector 
> size
> is indicated in option. “-mno-avx10.x-256” option will not be provided to
> keep align with the 512 ones.
>   - -mno-avx10.x: The option will disable all the features introduced 
> >=avx10.x
> (both 256 and 512 bit) and keep features  how
> -mno- options behave previously.
> 
> When there comes an option combination of various vector size indicated (e.g. 
> -
> mavx10.x-512 -mavx10.y-256), we would like to emit a

RE: Intel AVX10.1 Compiler Design and Support

2023-08-10 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Jan Beulich 
> Sent: Thursday, August 10, 2023 9:31 PM
> To: Phoebe Wang 
> Cc: Joseph Myers ; Wang, Phoebe
> ; Hongtao Liu ; Jiang, Haochen
> ; gcc-patches@gcc.gnu.org; ubiz...@gmail.com; Liu,
> Hongtao ; Zhang, Annita ;
> x86-64-abi ; llvm-dev  d...@lists.llvm.org>; Craig Topper ; Richard Biener
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On 10.08.2023 15:12, Phoebe Wang wrote:
> >>  The psABI should have some simple rule covering all of the above I think.
> >
> > psABI has a rule for the case doesn't mean the rule is a well defined
> > ABI in practice. A well defined ABI should guarantee 1) interlinkable
> > across different compile options within the same compiler; 2)
> > interlinkable across different compilers. Both aspects are failed in the 
> > non 512-
> bit version.
> >
> > 1) is more important than 2) and becomes more critical on AVX10 targets.
> > Because we expect AVX10-256 is a general setting for binaries that can
> > run on both AVX10-256 and AVX10-512. It would be common that binaries
> > compiled with AVX10-256 may link with native built binaries on AVX10-512
> targets.

IMO it is not acceptable for AVX10-256 to generate zmm registers.

If I have to choose among the three proposal, the second is better.

But the best choice I suppose is to keep what we are doing currently, which is
passing them in memory and emit a warning. It is a reasonable behavior.

Thx,
Haochen

> 
> But you're only describing a pre-existing problem here afaict. Code compiled 
> with
> -mavx51f passing __m512 type data to a function compiled with only, say, 
> -maxv2
> won't interoperate properly either. What's worse, imo the psABI doesn't
> sufficiently define what __m256 etc actually are. After all these aren't types
> defined by the C standard (as opposed to at least most other types in the
> respective table there), and you can't really make assumptions like "this is 
> what
> certain compilers think this is".
> 
> Jan

RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 8, 2023 8:45 PM
> To: Jiang, Haochen 
> Cc: Jakub Jelinek ; gcc-patches@gcc.gnu.org;
> ubiz...@gmail.com; Liu, Hongtao 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 8, 2023 at 10:15 AM Jiang, Haochen via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi Jakub,
> >
> > > So, what does this imply for the current ISAs?
> >
> > AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
> > independent ISA feature set. Although sharing the same instructions
> > and encodings, AVX10 and AVX512 are conceptual independent features,
> > which means they are orthogonal.
> >
> > > The expectations in lots of config/i386/* is that -mavx512f /
> > > TARGET_AVX512F means 512 bit vector support is available and most of
> > > the various -mavx512XXX options imply -mavx512f (and -mno-avx512f
> > > turns those off).  And if -mavx512vl / TARGET_AVX512VL isn't
> > > available, tons of places just use 512-bit EVEX instructions for
> > > 256-bit or 128-bit stuff (mostly to be able to access [xy]mm16+).
> >
> > For AVX10, the 128/256/scalar version of the instructions are always
> > there, and also for [xy]mm16+. 512 version is "optional", which needs
> > user to indicate them in options. When 512 version is enabled,
> > 128/256/scalar version is also enabled, which is kind of reverse relation
> > between the current AVX512F/AVX512VL.
> >
> > Since we take AVX10 and AVX512 are orthogonal, we will add OR logic
> > for the current pattern, which is shown in our AVX512DQ+VL sample patches.
> 
> Hmm, so it sounds like AVX10 is currently, at the 10.1 level, a way to specify
> AVX512F and AVX512VL "differently", so wouldn't it make sense to make it
> complement those only so one can use, say, -mavx10 -mno-avx512bf16 to disable
> parts of the former AVX512 ISA one doesn't like to get code generated for?
> -mavx10 would then enable all the existing sub-AVX512 ISAs?
>

We take AVX10 and AVX512 two independent ISAs.

Therefore, it is quite weird to disable something with another unrelated ISA.
I don't think -mavx10.1 -mno-avx512f should disable anything.

Thx,
Haochen

> > > Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they
> > > have AVX512F CPUID even when the 512-bit vectors aren't present?
> > > What happens if one mixes the -mavx10* options together with
> > > -mno-avx512vl or similar options?  Will -mno-avx512f still imply 
> > > -mno-avx512vl etc.?
> >
> > For the CPUID part, AVX10 and AVX512 have different emulation. Only
> > Xeon Server will have AVX512 related CPUIDs for backward
> > compatibility. For GNR, it will be AVX512F, AVX512VL, AVX512CD,
> > AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI, AVX512_VNNI,
> > AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AV512_VBMI2,
> > AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom
> Server and client will only have AVX10 CPUIDs with 256 bit support set.
> >
> > -mno-avx512f will still imply -mno-avx512vl.
> >
> > As we mentioned below, we don't recommend users to combine the AVX10
> > and legacy
> > AVX512 options. We understand that there will be different opinions on
> > what should compiler behave on some controversial option combinations.
> >
> > If there is someone mixes the options, the golden rule is that we are using 
> > OR logic.
> > Therefore, enabling either feature will turn on the shared
> > instructions, no matter the other feature is not mentioned or closed.
> > That is why we are emitting warning for some scenarios, which is also
> > mentioned in the letter.
> 
> I'm refraining from commenting on the senslesness of AVX10 as you're likely on
> the same receiving side as us.
> 
> Thanks,
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > >   Jakub
> >

RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 9, 2023 1:38 PM
> To: Phoebe Wang 
> Cc: Hongtao Liu ; Joseph Myers
> ; Jiang, Haochen ; gcc-
> patc...@gcc.gnu.org; ubiz...@gmail.com; Liu, Hongtao
> ; Zhang, Annita ; Wang,
> Phoebe ; x86-64-abi  a...@googlegroups.com>; llvm-dev ; Craig Topper
> 
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> 
> 
> > Am 09.08.2023 um 06:02 schrieb Phoebe Wang via Gcc-patches  patc...@gcc.gnu.org>:
> >
> > I have some proposals about unifying ABI on AVX10 for both 256-bit
> > and 512-bit.
> >
> >
> >
> > Proposal 1: Promote attribute from AVX10-256 to AVX10-512 for any
> > function which has 512-bit or above vectors in passing/returning arguments.
> >
> > Problem: Binary cannot run on AVX10-256 only target.
> >
> > Reason:
> >
> > When user tries to pass/return 512-bit vector, they should be aware of
> > it will become target dependent. User should be taught not to use it
> > on 256-bit targets and there will be unexpected things happening if
> > they insist.
> >
> > Actually, ICC and MSVC already have chosen to promote for the argument:
> > https://godbolt.org/z/vcrf9qW5z I think if compiler have to choose the
> > misbehavior between fail in result and crash due to illegal
> > instruction, the latter is definitely better than the former.
> >
> > In this way, we can also declare x86-64-v5 is inherit from x86-64-v4
> > and has the interaction with previous versions.
> >
> >
> >
> > Proposal 2: Abort compilation when user tries to pass/return 512-bit
> > vectors.
> >
> > Reason: This turns possible run time crash into compile time error.
> >
> >
> >
> > Proposal 3: Change the ABI of 512-bit vector and always be
> > passed/returned from memory.
> 
> I don’t think we can realistically change the ABI.  If we could passing them 
> in two
> 256bit registers would be possible as well.
> 
> Note I fully expect intel to turn around and implement 512 bits on a 256 but 
> data
> path on the E cores in 5 years.  And it will take at least that time for 
> AVX10 to take
> off (look at AVX512 for this and how they cautionously chose to include bf16 
> to
> cut off Zen4).  So IMHO we shouldn’t worry at all and just wait and see for 
> AVX42
> to arrive.

Let me try to clarify the whole thing.

I suppose Phoebe's "change" is based on LLVM.

In GCC, current behavior is to pass 512 bit vector in memory when there is no
512 bit support. But when there is support, everything should be passed in 
register.

In AVX10, I prefer to still keep to this pattern. But if most of you want to 
change it,
I have no objection since AVX10 is a new start.

Thx,
Haochen

> 
> Richard
> 
> > Reason: We expect AVX10-256 is a universal configuration and in most
> > scenarios, 512-bit vector won't bring performance improvements. So we
> > can sacrifice a little 512-bit performance to achieve the interaction
> > between
> > AVX10-256 and AVX10-512. In this way, there won't have any runtime
> > issue in the future either.
> >
> >
> >
> > Thanks
> >
> > Phoebe
> >
> > Hongtao Liu  于2023年8月9日周三 10:18写道：
> >
> >>> On Wed, Aug 9, 2023 at 10:14 AM Hongtao Liu  wrote:
> >>>
> >>> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu  wrote:
> >>>>
> >>>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers
> >>>> 
> >> wrote:
> >>>>>
> >>>>> Do you have any comments on the interaction of AVX10 with the
> >>>>> micro-architecture levels defined in the ABI (and supported with
> >>>>> glibc-hwcaps directories in glibc)?  Given that the levels are
> >> cumulative,
> >>>>> should we take it that any future levels will be ones supporting
> >> 512-bit
> >>>>> vector width for AVX10 (because x86-64-v4 requires the current
> >> AVX512F,
> >>>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future
> >> processors
> >>>>> that only support 256-bit vector width will be considered to match
> >> the
> >>>>> x86-64-v3 micro-architecture level but not any higher level?
> >>>> This is actually something we really want to discuss in the
> >>>> community, our proposal for x86-64-v5: AVX10.2-256(Implying AVX10.1-
> 256) + APX.
> >>>> One big reason is Intel E-core will only support AVX10 256-bit, if
> >>>> we want to us

RE: Intel AVX10.1 Compiler Design and Support

2023-08-08 Thread Jiang, Haochen via Gcc-patches

Hi Jakub,

> So, what does this imply for the current ISAs?

AVX10 will imply AVX2 on the ISA level. And we suppose AVX10 is an
independent ISA feature set. Although sharing the same instructions and
encodings, AVX10 and AVX512 are conceptual independent features, which
means they are orthogonal.

> The expectations in lots of config/i386/* is that -mavx512f / TARGET_AVX512F
> means 512 bit vector support is available and most of the various -mavx512XXX
> options imply -mavx512f (and -mno-avx512f turns those off).  And if
> -mavx512vl / TARGET_AVX512VL isn't available, tons of places just use
> 512-bit EVEX instructions for 256-bit or 128-bit stuff (mostly to be able to
> access [xy]mm16+).

For AVX10, the 128/256/scalar version of the instructions are always there, and
also for [xy]mm16+. 512 version is "optional", which needs user to indicate them
in options. When 512 version is enabled, 128/256/scalar version is also enabled,
which is kind of reverse relation between the current AVX512F/AVX512VL.

Since we take AVX10 and AVX512 are orthogonal, we will add OR logic for the 
current
pattern, which is shown in our AVX512DQ+VL sample patches.

> Sure, I expect all AVX10.N CPUs will have AVX512VL CPUID, will they have
> AVX512F CPUID even when the 512-bit vectors aren't present? What happens if
> one mixes the -mavx10* options together with -mno-avx512vl or similar
> options?  Will -mno-avx512f still imply -mno-avx512vl etc.?

For the CPUID part, AVX10 and AVX512 have different emulation. Only Xeon Server
will have AVX512 related CPUIDs for backward compatibility. For GNR, it will be
AVX512F, AVX512VL, AVX512CD, AVX512BW, AVX512DQ, AVX512_IFMA, AVX512_VBMI,
AVX512_VNNI, AVX512_BF16, AVX512_BITALG, AVX512_VPOPCNTDQ, AV512_VBMI2,
AVX512_FP16. Also, it will have AVX10 CPUIDs with 512 bit support set. Atom 
Server and
client will only have AVX10 CPUIDs with 256 bit support set.

-mno-avx512f will still imply -mno-avx512vl.

As we mentioned below, we don't recommend users to combine the AVX10 and legacy
AVX512 options. We understand that there will be different opinions on what 
should
compiler behave on some controversial option combinations.

If there is someone mixes the options, the golden rule is that we are using OR 
logic.
Therefore, enabling either feature will turn on the shared instructions, no 
matter the other
feature is not mentioned or closed. That is why we are emitting warning for 
some scenarios,
which is also mentioned in the letter.

Thx,
Haochen

> 
>   Jakub

RE: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64

2023-07-20 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 20, 2023 9:28 PM
> To: Maciej W. Rozycki 
> Cc: haochen.jiang ; gcc-
> regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> 
> Subject: Re: [r14-2639 Regression] FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c
> scan-tree-dump slp2 "optimized: basic block" on Linux/x86_64
> 
> On Thu, Jul 20, 2023 at 3:13 PM Maciej W. Rozycki 
> wrote:
> >
> > On Thu, 20 Jul 2023, Richard Biener wrote:
> >
> > > > c1e420549f2305efb70ed37e693d380724eb7540 is the first bad commit
> > > > commit c1e420549f2305efb70ed37e693d380724eb7540
> > > > Author: Maciej W. Rozycki 
> > > > Date:   Wed Jul 19 11:59:29 2023 +0100
> > > >
> > > > testsuite: Add 64-bit vector variant for bb-slp-pr95839.c
> > >
> > > I think the issue is we disable V2SF on ia32 because of the conflict
> > > with MMX which we don't want to use.
> >
> >  I'm not sure if I have a way to test with such a target.  Would you
> > expect:
> >
> > /* { dg-require-effective-target vect64 } */
> >
> > to cover it?  If so, then I'll put it back as in the original version
> > and post for Haochen to verify.

I suppose just commit to trunk and it should be ok since it is only -m32 issue.

Thx,
Haochen

> 
> Yeah, that should work here.
> 
> Richard.
> 
> >   Maciej

RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-16 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Jiang, Haochen
> Sent: Friday, July 14, 2023 10:50 AM
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' 
> Subject: RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
> 
> > The recent change in TImode parameter passing on x86_64 results in the
> > FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> > the combine pass is now spoilt for choice between using either the
> > *add3_doubleword_concat or the *add3_doubleword_zext
> > patterns, when one operand is a *concat and the other is a zero_extend.
> > The solution proposed below is provide an
> > *add3_doubleword_concat_zext define_insn_and_split, that can
> > benefit both from the register allocation of *concat, and still avoid
> > the xor normally required by zero extension.
> >
> > I'm investigating a follow-up refinement to improve register
> > allocation further by avoiding the early clobber in the =, and
> > handling (custom) reloads explicitly, but this piece resolves the testcase
> failure.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-11  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/91681
> > * config/i386/i386.md (*add3_doubleword_concat_zext): New
> > define_insn_and_split derived from *add3_doubleword_concat
> > and *add3_doubleword_zext.
> 
> Hi Roger,
> 
> This commit currently changed the codegen of testcase p443644-2.c from:

Oops, a typo, I mean pr43644-2.c.

Haochen

> 
> movq%rdx, %rax
> xorl%edx, %edx
> addq%rdi, %rax
> adcq%rsi, %rdx
> to:
> 
> movq%rdx, %rcx
> movq%rdi, %rax
> movq%rsi, %rdx
> addq%rcx, %rax
> adcq$0, %rdx
> 
> which causes the testcase fail under -m64.
> 
> Is this within your expectation?
> 
> BRs,
> Haochen
> 
> >
> >
> > Thanks,
> > Roger
> > --

RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-13 Thread Jiang, Haochen via Gcc-patches

> The recent change in TImode parameter passing on x86_64 results in the FAIL
> of pr91681-1.c.  The issue is that with the extra flexibility, the combine 
> pass is
> now spoilt for choice between using either the
> *add3_doubleword_concat or the *add3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is provide an
> *add3_doubleword_concat_zext define_insn_and_split, that can
> benefit both from the register allocation of *concat, and still avoid the xor
> normally required by zero extension.
> 
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no
> new failures.  Ok for mainline?
> 
> 
> 2023-07-11  Roger Sayle  
> 
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (*add3_doubleword_concat_zext): New
> define_insn_and_split derived from *add3_doubleword_concat
> and *add3_doubleword_zext.

Hi Roger,

This commit currently changed the codegen of testcase p443644-2.c from:

movq%rdx, %rax
xorl%edx, %edx
addq%rdi, %rax
adcq%rsi, %rdx
to:

movq%rdx, %rcx
movq%rdi, %rax
movq%rsi, %rdx
addq%rcx, %rax
adcq$0, %rdx

which causes the testcase fail under -m64.

Is this within your expectation?

BRs,
Haochen

> 
> 
> Thanks,
> Roger
> --

[r14-2462 Regression] FAIL: libgomp.c++/../libgomp.c-c++-common/alloc-12.c execution test on Linux/x86_64

2023-07-13 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

450b05ce54d3f08c583c3b5341233ce0df99725b is the first bad commit commit 
450b05ce54d3f08c583c3b5341233ce0df99725b
Author: Tobias Burnus 
Date:   Wed Jul 12 13:50:21 2023 +0200

libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

caused


with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2462/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-11.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-11.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-12.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/alloc-12.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.) (If you met problems with cascadelake 
related, disabling AVX512F in command line might save that.) (However, please 
make sure that there is no potential problems with AVX512.)

RE: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8 on Linux/x86_64

2023-07-07 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Hongtao Liu 
> Sent: Friday, July 7, 2023 3:55 PM
> To: Beulich, Jan 
> Cc: haochen.jiang ; Jiang, Haochen
> ; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [r14-2314 Regression] FAIL: gcc.target/i386/pr100711-2.c scan-
> assembler-times vpandn 8 on Linux/x86_64
> 
> On Fri, Jul 7, 2023 at 3:50 PM Hongtao Liu  wrote:
> >
> > On Fri, Jul 7, 2023 at 3:50 PM Jan Beulich  wrote:
> > >
> > > On 07.07.2023 09:46, Hongtao Liu wrote:
> > > > On Fri, Jul 7, 2023 at 3:18 PM Jan Beulich via Gcc-regression
> > > >  wrote:
> > > >>
> > > >> On 06.07.2023 13:57, haochen.jiang wrote:
> > > >>> On Linux/x86_64,
> > > >>>
> > > >>> e007369c8b67bcabd57c4fed8cff2a6db82e78e6 is the first bad commit
> > > >>> commit e007369c8b67bcabd57c4fed8cff2a6db82e78e6
> > > >>> Author: Jan Beulich 
> > > >>> Date:   Wed Jul 5 09:49:16 2023 +0200
> > > >>>
> > > >>> x86: yet more PR target/100711-like splitting
> > > >>>
> > > >>> caused
> > > >>>
> > > >>> FAIL: gcc.target/i386/pr100711-1.c scan-assembler-times pandn 2
> > > >>> FAIL: gcc.target/i386/pr100711-2.c scan-assembler-times vpandn 8
> > > >>
> > > >> I expect the same applies here - -mno-avx512f (or -mno-avx512vl)
> > > >> might
> > > > For this one, we can just add -mno-avx512f to the testcase,it aims
> > > > to optimize pandn for avx2 target.
> > > >> address this failure. But whether that's really the way to go I'm
> > > >> not sure of. Plus of course such adjustments should have been
> > > >> done ahead of time, when it was decided that testing with certain
> > > >> -march= settings is a goal. My changes have merely uncovered the
> prior omissions.
> > > > It's not a standard request, it's just our private tester which is
> > > > used to find gcc bugs and miss-optimizations.
> > > > It sometimes generates false positive reports (usually adding
> > > > -mno-avx512f to the testcase can fix that), hope that's not too
> > > > annoying.
> > >
> > > Wouldn't that then better be done once uniformly for all affected
> > > tests, rather than being discovered piecemeal?
> This also prevents us from finding potential problems.

Yes, -march=cascadelake actually opens AVX512F related features. It sometimes
show the potential problems while sometimes false positive.

I will add a hint in the script email.

Thx,
Haochen

> > >
> > > Anyway, in this case: Since you said you'd take care of the other
> > > test, will/can you do so for the two ones here as well, or am I on the 
> > > hook?
> > I'll do that.
> > >
> > > Jan
> >
> >
> >
> > --
> > BR,
> > Hongtao
> 
> 
> 
> --
> BR,
> Hongtao

RE: [COMMITTED] i386: Use 2x-wider modes when emulating QImode vector instructions

2023-05-25 Thread Jiang, Haochen via Gcc-patches

> gcc/ChangeLog:
> 
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
> instructions when available.  Emulate truncation via
> ix86_expand_vec_perm_const_1 when native truncate insn
> is not available.
> (ix86_expand_vecop_qihi_partial) : Use pmovzx
> when available.  Trivially rename some variables.
> (ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.

Hi Uros,

I suppose you pushed wrong patch to trunk.

On trunk, we see this:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;

-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;

   switch (qimode)

It should not be if (0 && ix86_expand_vecop_qihi2 (code, dest, op1, op2))

The patch in this thread is correct, where is:

@@ -23409,9 +23457,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   && ix86_expand_vec_shift_qihi_constant (code, dest, op1, op2))
 return;
 
-  if (TARGET_AVX512BW
-  && VECTOR_MODE_P (GET_MODE (op2))
-  && ix86_expand_vecop_qihi2 (code, dest, op1, op2))
+  if (ix86_expand_vecop_qihi2 (code, dest, op1, op2))
 return;
 
   switch (qimode)

Thx,
Haochen

> * config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
> calculation of V*QImode emulations to account for generation of
> 2x-wider mode instructions.
> (ix86_shift_rotate_cost): Update cost calculation of V*QImode
> emulations to account for generation of 2x-wider mode instructions.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Uros.

RE: [PATCH] i386: Share AES xmm intrin with VAES

2023-04-18 Thread Jiang, Haochen via Gcc-patches

> > a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 33e281901cf..e7d565a8389 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -25107,67 +25107,71 @@
> >
> > ;;
> > ;;
> >
> >  (define_insn "aesenc"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESENC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesenc\t{%2, %0|%0, %2}
> > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> TARGET_AVX512VL)" from condition.

Since VAES should not imply AES, we need that "|| (TARGET_VAES && 
TARGET_AVX512VL)"

And there is no need to add vaes_avx512vl since the last alternative will only
be hit when there is no aes. When there is no aes, the pattern will need vaes
and avx512vl both or we could not use this pattern. avx512vl here is just like
a placeholder.

BRs,
Haochen

> Similar for below patterns.
> Others LGTM.
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesenclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESENCLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesenclast\t{%2, %0|%0, %2}
> > +   vaesenclast\t{%2, %1, %0|%0, %1, %2}
> > vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdec"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESDEC))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesdec\t{%2, %0|%0, %2}
> > +   vaesdec\t{%2, %1, %0|%0, %1, %2}
> > vaesdec\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr "prefix" "orig,vex")
> > -   (set_attr "btver2_decode" "double,double")
> > +   (set_attr "prefix" "orig,vex,evex")
> > +   (set_attr "btver2_decode" "double,double,double")
> > (set_attr "mode" "TI")])
> >
> >  (define_insn "aesdeclast"
> > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > -   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > -  (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > +  (match_operand:V2DI 2 "vector_operand"
> > + "xBm,xm,vm")]
> >   UNSPEC_AESDECLAST))]
> > -  "TARGET_AES"
> > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> >"@
> > aesdeclast\t{%2, %0|%0, %2}
> > +   vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> > vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "isa" "noavx,avx")
> > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > (set_attr "type" "sselog1")
> > (set_attr "prefix_extra" "1")
> > -   (set_attr

[r13-5971 Regression] FAIL: gcc.target/i386/pr108774.c (test for excess errors) on Linux/x86_64

2023-02-14 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

a33e3dcbd15e73603796e30b5eeec11a0c8bacec is the first bad commit commit 
a33e3dcbd15e73603796e30b5eeec11a0c8bacec
Author: Vladimir N. Makarov 
Date:   Mon Feb 13 16:05:04 2023 -0500

RA: Clear reg equiv caller_save_p flag when clearing defined_p flag

caused

FAIL: gcc.target/i386/pr108774.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5971/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108774.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr108774.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)

[r13-5318 Regression] FAIL: g++.dg/init/new51.C -std=c++98 (test for excess errors) on Linux/x86_64

2023-01-25 Thread Jiang, Haochen via Gcc-patches

This is the recent regression on gcc trunk.

Seems it got fixed. If that is the case, plz ignore that.

As mentioned in previous thread, before the mail system got fixed on server, I 
will manually forward this email.

BRs,
Haochen

> -Original Message-
> From: haochen.jiang 
> Sent: Wednesday, January 25, 2023 7:27 AM
> To: ja...@redhat.com; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> Jiang, Haochen 
> Subject: [r13-5318 Regression] FAIL: g++.dg/init/new51.C -std=c++98 (test for
> excess errors) on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 049a52909075117f5112971cc83952af2a818bc1 is the first bad commit commit
> 049a52909075117f5112971cc83952af2a818bc1
> Author: Jason Merrill 
> Date:   Mon Jan 23 16:25:07 2023 -0500
> 
> c++: TARGET_EXPR collapsing [PR107303]
> 
> caused
> 
> FAIL: g++.dg/init/new51.C  -std=c++14 (test for excess errors)
> FAIL: g++.dg/init/new51.C  -std=c++17 (test for excess errors)
> FAIL: g++.dg/init/new51.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/init/new51.C  -std=c++98 (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> bisect/master/master/r13-5318/usr --enable-clocale=gnu --with-system-zlib --
> with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/init/new51.C --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/init/new51.C --target_board='unix{-m32\ -
> march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at
> haochen dot jiang at intel.com)

[r13-5244 Regression] FAIL: gcc.dg/analyzer/SARD-tc841-basic-00182-min.c (test for excess errors) on Linux/x86_64

2023-01-18 Thread Jiang, Haochen via Gcc-patches

The mail system is still broken on that machine, still sending this manually. 
Before that mail down, I will keep check the script daily to see if there is 
new regression.

BTW, since there is a Bugzilla for r13-5202 regression, not resending that 
report

On Linux/x86_64,

c6a09bfa038ccbfc9f123ede14a3d6237fab is the first bad commit
commit c6a09bfa038ccbfc9f123ede14a3d6237fab
Author: David Malcolm dmalc...@redhat.com
Date:   Wed Jan 18 11:41:47 2023 -0500

analyzer: add SARD testsuite 81

caused

FAIL: gcc.dg/analyzer/SARD-tc841-basic-00182-min.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5244/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/SARD-tc841-basic-00182-min.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)

[r13-5092 Regression] FAIL: gcc.dg/tree-ssa/ssa-dse-46.c (test for excess errors) on Linux/x86_64

2023-01-10 Thread Jiang, Haochen via Gcc-patches

Hi all,



This is the bisect result for the latest regression which fail to send to 
mailing list.



It seems that the mail command in s-nail went down after my machine got 
upgraded, still investigating why.



On Linux/x86_64,



4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1 is the first bad commit

commit 4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1

Author: Richard Biener mailto:rguent...@suse.de>>

Date:   Tue Jan 10 13:48:51 2023 +0100



tree-optimization/106293 - missed DSE with virtual LC PHI



caused



FAIL: gcc.dg/tree-ssa/ssa-dse-46.c (test for excess errors)



with GCC configured with



../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5092/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap



To reproduce:



$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-dse-46.c 
--target_board='unix{-m32}'"

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-dse-46.c 
--target_board='unix{-m32\ -march=cascadelake}'"



BRs,

Haochen

RE: [wwwdocs] gcc-13: Mention Intel new ISA and march support.

2022-11-15 Thread Jiang, Haochen via Gcc-patches

Hi Gerald,

I will remove "to GCC" here if there is no more comment from others on Thursday.
For me it is reasonable.

Thx,
Haochen

> -Original Message-
> From: Gerald Pfeifer 
> Sent: Monday, November 14, 2022 9:56 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [wwwdocs] gcc-13: Mention Intel new ISA and march support.
> 
> On Thu, 10 Nov 2022, Haochen Jiang via Gcc-patches wrote:
> > +  New ISA extension support for Intel AVX-IFMA was added to GCC.
> 
> Here and in the other cases I'd skip "to GCC". This is clear from the context
> (this being the GCC release notes :-) and makes it shorter.
> 
> Gerald

RE: [PATCH] Support Intel CMPccXADD

2022-10-24 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Gcc-patches  bounces+haochen.jiang=intel@gcc.gnu.org> On Behalf Of Haochen Jiang
> via Gcc-patches
> Sent: Monday, October 24, 2022 5:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao 
> Subject: [PATCH] Support Intel CMPccXADD
> 
> Hi all,
> 
> I just refined CMPccXADD patch to make the enum in order intrin file aligned
> with how opcode does.
> 

I just found a testcase issue not fixed for the enum, will send the fixed patch
soon.

> Ok for trunk?
> 
> BRs,
> Haochen
> 
> gcc/ChangeLog:
> 
> * common/config/i386/cpuinfo.h (get_available_features):
>   Detect cmpccxadd.
>   * common/config/i386/i386-common.cc
>   (OPTION_MASK_ISA2_CMPCCXADD_SET,
>   OPTION_MASK_ISA2_CMPCCXADD_UNSET): New.
>   (ix86_handle_option): Handle -mcmpccxadd, unset cmpccxadd when
> avx2
>   is disabled.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
>   Add FEATURE_CMPCCXADD.
> * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
>   cmpccxadd.
>   * config.gcc: Add cmpccxaddintrin.h.
>   * config/i386/cpuid.h (bit_CMPCCXADD): New.
>   * config/i386/i386-builtin-types.def:
>   Add DEF_FUNCTION_TYPE(INT, PINT, INT, INT, INT)
>   and DEF_FUNCTION_TYPE(LONGLONG, PLONGLONG, LONGLONG,
> LONGLONG, INT).
>   * config/i386/i386-builtin.def (BDESC): Add new builtins.
>   * config/i386/i386-c.cc (ix86_target_macros_internal): Define
>   __CMPCCXADD__.
>   * config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
>   Add new parameter to indicate constant position.
>   Handle INT_FTYPE_PINT_INT_INT_INT
>   and LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT.
>   * config/i386/i386-isa.def (CMPCCXADD): Add DEF_PTA(CMPCCXADD).
>   * config/i386/i386-options.cc (isa2_opts): Add -mcmpccxadd.
>   (ix86_valid_target_attribute_inner_p): Handle cmpccxadd.
>   * config/i386/i386.opt: Add option -mcmpccxadd.
>   * config/i386/sync.md (cmpccxadd_): New define insn.
>   * config/i386/x86gprintrin.h: Include cmpccxaddintrin.h.
>   * doc/extend.texi: Document cmpccxadd.
>   * doc/invoke.texi: Document -mcmpccxadd.
>   * doc/sourcebuild.texi: Document target cmpccxadd.
>   * config/i386/cmpccxaddintrin.h: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/other/i386-2.C: Add -mcmpccxadd.
>   * g++.dg/other/i386-3.C: Ditto.
>   * gcc.target/i386/avx-1.c: Add builtin define for enum.
>   * gcc.target/i386/funcspec-56.inc: Add new target attribute.
>   * gcc.target/i386/sse-13.c: Add builtin define for enum.
>   * gcc.target/i386/sse-23.c: Ditto.
>   * gcc.target/i386/x86gprintrin-1.c: Add -mcmpccxadd for 64 bit target.
>   * gcc.target/i386/x86gprintrin-2.c: Add -mcmpccxadd for 64 bit target.
>   Add builtin define for enum.
>   * gcc.target/i386/x86gprintrin-3.c: Add -mcmpccxadd for 64 bit target.
>   * gcc.target/i386/x86gprintrin-4.c: Add mcmpccxadd for 64 bit target.
>   * gcc.target/i386/x86gprintrin-5.c: Add mcpmccxadd for 64 bit target.
>   Add builtin define for enum.
>   * gcc.target/i386/cmpccxadd-1.c: New test.
>   * gcc.target/i386/cmpccxadd-2.c: New test.
> ---
>  gcc/common/config/i386/cpuinfo.h  |   2 +
>  gcc/common/config/i386/i386-common.cc |  15 ++
>  gcc/common/config/i386/i386-cpuinfo.h |   1 +
>  gcc/common/config/i386/i386-isas.h|   1 +
>  gcc/config.gcc|   3 +-
>  gcc/config/i386/cmpccxaddintrin.h |  89 +++
>  gcc/config/i386/cpuid.h   |   1 +
>  gcc/config/i386/i386-builtin-types.def|   4 +
>  gcc/config/i386/i386-builtin.def  |   4 +
>  gcc/config/i386/i386-c.cc |   2 +
>  gcc/config/i386/i386-expand.cc|  22 ++-
>  gcc/config/i386/i386-isa.def  |   1 +
>  gcc/config/i386/i386-options.cc   |   4 +-
>  gcc/config/i386/i386.opt  |   5 +
>  gcc/config/i386/sync.md   |  42 ++
>  gcc/config/i386/x86gprintrin.h|   2 +
>  gcc/doc/extend.texi   |   5 +
>  gcc/doc/invoke.texi   |  10 +-
>  gcc/doc/sourcebuild.texi  |   3 +
>  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
>  gcc/testsuite/gcc.target/i386/cmpccxadd-1.c   |  61 
>  gcc/testsuite/gcc.target/i386/cmpccxadd-2.c   | 138 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/sse-13.c|   6 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c|   6 +-
>  .../gcc.target/i386/x86gprintrin-1.c  |   2 +-
>  .../gcc.target/i386/x86gprintrin-2.c  |   6 +-
>

RE: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-20 Thread Jiang, Haochen via Gcc-patches




> -Original Message-
> From: Segher Boessenkool 
> Sent: Friday, October 21, 2022 2:54 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; Liu, Hongtao
> ; ubiz...@gmail.com; richard.earns...@arm.com;
> richard.sandif...@arm.com; marcus.shawcr...@arm.com;
> kyrylo.tkac...@arm.com; r...@gcc.gnu.org; g...@amylaar.uk;
> claz...@synopsys.com; ni...@redhat.com; ramana.radhakrish...@arm.com;
> aol...@gcc.gnu.org; hubi...@ucw.cz; mfort...@gmail.com;
> dje@gmail.com; li...@gcc.gnu.org; uweig...@de.ibm.com;
> kreb...@linux.ibm.com; olege...@gcc.gnu.org; da...@redhat.com;
> ebotca...@libertysurf.fr; jeffreya...@gmail.com; dave.ang...@bell.net
> Subject: Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch
> to align with LLVM
> 
> On Thu, Oct 20, 2022 at 07:34:13AM +, Jiang, Haochen wrote:
> > > > +  /* Argument 3 must be either zero or one.  */
> > > > +  if (INTVAL (op3) != 0 && INTVAL (op3) != 1)
> > > > +{
> > > > +  warning (0, "invalid fourth argument to %<__builtin_prefetch%>;"
> > > > +   " using one");
> > >
> > > "using 1" makes sense maybe, but "using one" reads as "using an
> > > argument", not very sane.
> > >
> > > An error would be better here anyway?
> >
> > Will change to 1 to avoid confusion in that. The reason why this is a 
> > warning
> > is because previous ones related to constant arguments out of range in
> prefetch
> > are also using warning.
> 
> Please don't repeat historical mistakes.  You might not want to fix the
> existing code (since that can in theory break existing user code), but
> that is not a reason to punish users of a new feature as well ;-)
> 
> > > Please use a separate pattern for this, and leave prefetch to mean data
> > > prefetch, as documented!  Documentation you didn't change btw.  Call
> the
> > > new one instruction_prefetch or something equally boring maybe :-)
> >
> > Actually I changed documentation for prefetch but it is flooded in the patch
> > (Sorry for that).
> 
> Oh huh, I looked for it but didn't find it.  Another argument for making
> better patch series ;-)
> 
> > 1. Previously we are using parameter to indicate r/w and locality in 
> > prefetch.
> I
> > suppose it is quite similar in this case. Since the pattern is already 
> > there, I
> prefer
> > reusing them.
> 
> You can use the data prefetch RTL code for all data loads just as well,
> it is more closely related than this -- but most people would call that
> insanity!

Maybe you got me here. I suppose I will write another patch for a new RTL to see
which implementation is better.

Thx,
Haochen

> 
> 
> Segher

RE: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-20 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Segher Boessenkool 
> Sent: Thursday, October 20, 2022 5:07 AM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; Liu, Hongtao
> ; ubiz...@gmail.com; richard.earns...@arm.com;
> richard.sandif...@arm.com; marcus.shawcr...@arm.com;
> kyrylo.tkac...@arm.com; r...@gcc.gnu.org; g...@amylaar.uk;
> claz...@synopsys.com; ni...@redhat.com; ramana.radhakrish...@arm.com;
> aol...@gcc.gnu.org; hubi...@ucw.cz; mfort...@gmail.com;
> dje@gmail.com; li...@gcc.gnu.org; uweig...@de.ibm.com;
> kreb...@linux.ibm.com; olege...@gcc.gnu.org; da...@redhat.com;
> ebotca...@libertysurf.fr; jeffreya...@gmail.com; dave.ang...@bell.net
> Subject: Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch
> to align with LLVM
> 
> On Fri, Oct 14, 2022 at 04:34:05PM +0800, Haochen Jiang wrote:
> > * config/s390/s390.cc (s390_expand_cpymem): Generate fourth
> parameter for
> 
> (Many too long lines here, this is the first one.  Changelog lines are
> max. 80 positions; a tab is eight).

I will change that in next patch.

> 
> > +  /* Argument 3 must be either zero or one.  */
> > +  if (INTVAL (op3) != 0 && INTVAL (op3) != 1)
> > +{
> > +  warning (0, "invalid fourth argument to %<__builtin_prefetch%>;"
> > +   " using one");
> 
> "using 1" makes sense maybe, but "using one" reads as "using an
> argument", not very sane.
> 
> An error would be better here anyway?

Will change to 1 to avoid confusion in that. The reason why this is a warning
is because previous ones related to constant arguments out of range in prefetch
are also using warning.

/* Argument 2 must be 0, 1, 2, or 3.  */
  if (INTVAL (op2) < 0 || INTVAL (op2) > 3)
{
  warning (0, "invalid third argument to %<__builtin_prefetch%>; using 
zero");
  op2 = const0_rtx;
}

Therefore I use warning to align with them.

> 
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -14060,10 +14060,25 @@
> >DONE;
> >  })
> >
> > -(define_insn "prefetch"
> > +(define_expand "prefetch"
> > +  [(prefetch (match_operand 0 "indexed_or_indirect_address")
> > +(match_operand:SI 1 "const_int_operand")
> > +(match_operand:SI 2 "const_int_operand")
> > +(match_operand:SI 3 "const_int_operand"))]
> > +  ""
> > +{
> > +  if (INTVAL (operands[3]) == 0)
> > +  {
> 
> Broken indentation.

I will fix that in updated patch.

> 
> > +warning (0, "instruction prefetch is not supported; using data 
> > prefetch");
> 
> Please use a separate pattern for this, and leave prefetch to mean data
> prefetch, as documented!  Documentation you didn't change btw.  Call the
> new one instruction_prefetch or something equally boring maybe :-)
> 

Actually I changed documentation for prefetch but it is flooded in the patch
(Sorry for that).

In gcc/doc/rtl.texi

-@item (prefetch:@var{m} @var{addr} @var{rw} @var{locality})
+@item (prefetch:@var{m} @var{addr} @var{rw} @var{locality} @var{cache})
 
+Operand @var{cache} is 1 if the prefetch is prefetching data, 0 for prefetching
+instruction;
+targets that do not support instruction prefetch should treat all as data
+prefetch.
 
And for the implementation on the instruction prefetch, actually I have thought
of that way previously. But I chose the way how patch current goes for the
following reasons.

1. Previously we are using parameter to indicate r/w and locality in prefetch. I
suppose it is quite similar in this case. Since the pattern is already there, I 
prefer
reusing them.

2. It will be more natural for developers to extend their prefetch in future.

If anyone have points, welcome further discussion on that.

> When you send an updated patch, please split it up better?  Generic
> changes and documentation in one patch, target changes in a separate
> patch or patches, and testsuite is distinct as well.  It isn't nice to
> have to scroll through thousands of lines to see if there is anything
> relevant to you.

Really sorry for that. Hongtao has explained the reason for why we arrange
this patch and I will split the testcase to another patch.

Also if the change on testsuites on this patch change to minimal change,
the patch will be much smaller than current one.

BRs,
Haochen

> 
> Thanks,
> 
> 
> Segher

RE: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-19 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Segher Boessenkool 
> Sent: Thursday, October 20, 2022 5:14 AM
> To: Andrew Pinski 
> Cc: Jiang, Haochen ; gcc-patches@gcc.gnu.org;
> aol...@gcc.gnu.org; richard.sandif...@arm.com; uweig...@de.ibm.com;
> li...@gcc.gnu.org; g...@amylaar.uk; dje@gmail.com;
> olege...@gcc.gnu.org; claz...@synopsys.com; mfort...@gmail.com;
> da...@redhat.com; dave.ang...@bell.net; hubi...@ucw.cz;
> richard.earns...@arm.com; rguent...@suse.de;
> marcus.shawcr...@arm.com; ramana.radhakrish...@arm.com; Liu, Hongtao
> 
> Subject: Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch
> to align with LLVM
> 
> On Wed, Oct 19, 2022 at 10:14:28AM -0700, Andrew Pinski wrote:
> > Do the testcases really need to be changed rather than adding new
> testcases?
> > Usually it is better if the testcases not change unless really needed
> > to be. That is do these testcases pass without being changed? If not
> > this seems not backwards compatible change and is not something which
> > we should do.  Otherwise you should just add new testcases instead.
> 
> Yes, that is another reason why adding parameters to random builtins is not a
> good idea :-)  s/random/only vaguely related/, if you want.
> 
> This also makes all existing code using these builtins invalid.  If you need 
> such
> testcase changes, that is a red flag.
> 

Maybe the testcase change cause some misunderstanding and concern.

Actually, the patch did not disrupt the previous builtins, as the 
builtin_prefetch
uses vargs. I set the default value of the new parameter as data prefetch, which
means that if we are not using the fourth parameter, just like how we use
prefetch previously, it is still what it is.

The reason why I did the most of the testcase change is to make it looks more
completed at the parameter side. I could take back that change on adding
parameter in current testcases just add tests related to new parameter, which
is a minimal change to current test I suppose.

BRs,
Haochen

> 
> Segher

RE: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 on Linux/x86_64

2022-10-17 Thread Jiang, Haochen via Gcc-patches

Yes, the mail service on script machine was down previously after expected 
reboot
and it just recovered but still ran into some problems when sending previously 
email.

That is why this is the only stuck mail got sent and sorry for the disturb.

> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, October 17, 2022 4:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; gcc-regress...@gcc.gnu.org;
> andre.simoesdiasvie...@arm.com
> Subject: Re: [r13-3219 Regression] FAIL: gcc.target/i386/pr92658-sse4.c scan-
> assembler-times pmovzxwq 2 on Linux/x86_64
> 
> This should be already fixed.
> 
> On Mon, Oct 17, 2022 at 4:34 PM haochen.jiang via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > On Linux/x86_64,
> >
> > 25413fdb2ac24933214123e24ba165026452a6f2 is the first bad commit
> > commit 25413fdb2ac24933214123e24ba165026452a6f2
> > Author: Andre Vieira 
> > Date:   Tue Oct 11 10:49:27 2022 +0100
> >
> > vect: Teach vectorizer how to handle bitfield accesses
> >
> > caused
> >
> > FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
> > FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times
> > pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times
> pmovzxbw
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times
> pmovzxwd
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times
> pmovzxwq
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times
> > vpmovwb 3
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
> > FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[
> > \t]*%xmm 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[
> > \t]*%ymm 1
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd
> > 2
> > FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw
> > 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
> > FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
> > FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
> > FAIL: gcc.target/i386/pr92658-sse4.c sca

RE: [PATCH 2/6] Support Intel AVX-VNNI-INT8

2022-10-17 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, October 17, 2022 12:05 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH 2/6] Support Intel AVX-VNNI-INT8
> 
> On Fri, Oct 14, 2022 at 3:57 PM Haochen Jiang via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > From: Kong Lingling 
> >
> > gcc/ChangeLog
> >
> > * common/config/i386/cpuinfo.h (get_available_features): Detect
> > avxvnniint8.
> > * common/config/i386/i386-common.cc
> > (OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
> > (OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
> > (ix86_handle_option): Handle -mavxvnniint8.
> > * common/config/i386/i386-cpuinfo.h (enum processor_features):
> > Add FEATURE_AVXVNNIINT8.
> > * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> > avxvnniint8.
> > * config.gcc: Add avxvnniint8intrin.h.
> > * config/i386/avxvnniint8intrin.h: New file.
> > * config/i386/cpuid.h (bit_AVXVNNIINT8): New.
> > * config/i386/i386-builtin.def: Add new builtins.
> > * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> > __AVXVNNIINT8__.
> > * config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
> > (ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
> > * config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8) New..
> > * config/i386/i386.opt: Add option -mavxvnniint8.
> > * config/i386/immintrin.h: Include avxvnniint8intrin.h.
> > * config/i386/sse.md
> > (vpdp_): New define_insn.
> > * doc/extend.texi: Document avxvnniint8.
> > * doc/invoke.texi: Document -mavxvnniint8.
> > * doc/sourcebuild.texi: Document target avxvnniint8.
> >
> > gcc/testsuite/ChangeLog
> >
> > * g++.dg/other/i386-2.C: Add -mavxvnniint8.
> > * g++.dg/other/i386-3.C: Ditto.
> > * gcc.target/i386/avx-check.h: Add avxvnniint8 check.
> > * gcc.target/i386/sse-12.c: Add -mavxvnniint8.
> > * gcc.target/i386/sse-13.c: Ditto.
> > * gcc.target/i386/sse-14.c: Ditto.
> > * gcc.target/i386/sse-22.c: Ditto.
> > * gcc.target/i386/sse-23.c: Ditto.
> > * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > * lib/target-supports.exp
> > (check_effective_target_avxvnniint8): New.
> > * gcc.target/i386/avxvnniint8-1.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto.
> > * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto.
> >
> > Co-authored-by: Hongyu Wang 
> > Co-authored-by: Haochen Jiang 
> > ---
> >  gcc/common/config/i386/cpuinfo.h  |   2 +
> >  gcc/common/config/i386/i386-common.cc |  22 ++-
> >  gcc/common/config/i386/i386-cpuinfo.h |   1 +
> >  gcc/common/config/i386/i386-isas.h|   2 +
> >  gcc/config.gcc|   2 +-
> >  gcc/config/i386/avxvnniint8intrin.h   | 138 ++
> >  gcc/config/i386/cpuid.h   |   1 +
> >  gcc/config/i386/i386-builtin.def  |  14 ++
> >  gcc/config/i386/i386-c.cc |   2 +
> >  gcc/config/i386/i386-isa.def  |   1 +
> >  gcc/config/i386/i386-options.cc   |   4 +-
> >  gcc/config/i386/i386.opt  |   5 +
> >  gcc/config/i386/immintrin.h   |   2 +
> >  gcc/config/i386/sse.md|  31 
> >  gcc/doc/extend.texi   |   5 +
> >  gcc/doc/invoke.texi   |   9 +-
> >  gcc/doc/sourcebuild.texi  |   3 +
> >  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
> >  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
> >  gcc/testsuite/gcc.target/i386/avx-check.h |   3 +
> >  gcc/testsuite/gcc.target/i386/avxvnniint8-1.c |  43 ++
> > .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c  |  72 +
> > .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c |  72 +

RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-17 Thread Jiang, Haochen via Gcc-patches

Hi Rozenfeld,

I just checkout to your commit and the test still got failed.

It is reporting like this:
xgcc: error: 
/export/users2/haochenj/src/gcc/master/./libgomp/testsuite/libgomp.oacc-c++/../libgomp.oacc-c-c++-common/kernels-loop-g.c:
 '-fcompare-debug' failure (length)

Also fix a typo in manually sending, should be this to reproduce

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"

BRs,
Haochen

From: Jiang, Haochen
Sent: Monday, October 17, 2022 1:41 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

If that has been fixed, just ignore that mail.

It is run through by a script and got the result few days ago. However, the 
sendmail
service was down on that machine and I just noticed that issue. So I sent that 
result
manually today in case that is not fixed.

Sorry for the disturb!

BRs,
Haochen

From: Eugene Rozenfeld 
mailto:eugene.rozenf...@microsoft.com>>
Sent: Monday, October 17, 2022 1:23 PM
To: Jiang, Haochen mailto:haochen.ji...@intel.com>>; 
gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>; 
gcc-regress...@gcc.gnu.org<mailto:gcc-regress...@gcc.gnu.org>
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen mailto:haochen.ji...@intel.com>>
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>; Eugene Rozenfeld 
mailto:eugene.rozenf...@microsoft.com>>; Jiang, 
Haochen mailto:haochen.ji...@intel.com>>; 
gcc-regress...@gcc.gnu.org<mailto:gcc-regress...@gcc.gnu.org>
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

You don't often get email from 
haochen.ji...@intel.com<mailto:haochen.ji...@intel.com>. Learn why this is 
important<https://aka.ms/LearnAboutSenderIdentification>
On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld mailto:ero...@microsoft.com>>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"

RE: [r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches

If that has been fixed, just ignore that mail.

It is run through by a script and got the result few days ago. However, the 
sendmail
service was down on that machine and I just noticed that issue. So I sent that 
result
manually today in case that is not fixed.

Sorry for the disturb!

BRs,
Haochen

From: Eugene Rozenfeld 
Sent: Monday, October 17, 2022 1:23 PM
To: Jiang, Haochen ; gcc-patches@gcc.gnu.org; 
gcc-regress...@gcc.gnu.org
Subject: RE: [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

That commit had a bug that was fixed in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=80f414e6d73f9f1683f93d83ce63a6a482e54bee

Was that fix included in your GCC build?

From: Jiang, Haochen mailto:haochen.ji...@intel.com>>
Sent: Sunday, October 16, 2022 8:09 PM
To: gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>; Eugene Rozenfeld 
mailto:eugene.rozenf...@microsoft.com>>; Jiang, 
Haochen mailto:haochen.ji...@intel.com>>; 
gcc-regress...@gcc.gnu.org<mailto:gcc-regress...@gcc.gnu.org>
Subject: [EXTERNAL] [r13-3172 Regression] 
FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for 
excess errors) on Linux/x86_64

You don't often get email from 
haochen.ji...@intel.com<mailto:haochen.ji...@intel.com>. Learn why this is 
important<https://aka.ms/LearnAboutSenderIdentification>
On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld mailto:ero...@microsoft.com>>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"

[r13-3212 Regression] FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR. on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

b88adba751da635c6f0c353c5bc51bbe2ecf4c89 is the first bad commit
commit b88adba751da635c6f0c353c5bc51bbe2ecf4c89
Author: Liwei Xu liwei...@intel.com
Date:   Fri Sep 23 13:46:02 2022 +0800

Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

caused

FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 .VEC_PERM_EXPR.

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-3212/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/forwprop-19.c 
--target_board='unix{-m64\ -march=cascadelake}'"

[r13-3172 Regression] FAIL:libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) on Linux/x86_64

2022-10-16 Thread Jiang, Haochen via Gcc-patches

On Linux/x86_64,

f30e9fd33e56a5a721346ea6140722e1b193db42 is the first bad commit
commit f30e9fd33e56a5a721346ea6140722e1b193db42
Author: Eugene Rozenfeld mailto:ero...@microsoft.com>>
Date:   Thu Apr 21 16:43:24 2022 -0700

Set discriminators for call stmts on the same line within the same basic 
block.

caused

FAIL: libgomp.oacc-c../../libgomp.oacc-c-c..-common/kernels-loop-g.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2288/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="c++.exp=libgomp.oacc-c-c++-common/kernels-loop-g.c 
--target_board='unix{-m64\ -march=cascadelake}'"

RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-09-22 Thread Jiang, Haochen via Gcc-patches

Hi all,

I would like to backport this patch to GCC 12 release branch as machines with 
the version of default GCC
is 12.x (which is always using newer kernels), if the patch is not backported, 
the amx tests will always fail.

Ok for backport?

BRs,
Haochen

> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, June 21, 2022 10:53 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Tue, Jun 21, 2022 at 9:41 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Tuesday, June 21, 2022 3:06 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > kernels
> > >
> > > On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen
> > > 
> > > wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Uros Bizjak 
> > > > > Sent: Monday, June 20, 2022 10:54 PM
> > > > > To: Jiang, Haochen 
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > 
> > > > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > > > kernels
> > > > >
> > > > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > > > 
> > > > > wrote:
> > > > > >
> > > > > > From: "Jiang, Haochen" 
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We need syscall to enable AMX for kernels>=5.4. It is missing
> > > > > > in current amx tests, which will cause test fail.
> > > > >
> > > > > So this new code is only valid for linux & co?
> > > >
> > > > Thanks for reminding me for that, I only test on linux since the
> > > > header file is
> > > only in linux.
> > > >
> > > > Just updated a patch wrapping with a macro not to change the
> > > > behavior on
> > > windows.
> > >
> > > I think you want __linux__ there, not __unix__.
> >
> > Fixed with __linux__.
> 
> OK.
> 
> Thanks,
> Uros.
> 
> >
> > Thx,
> > Haochen
> >
> > >
> > > Uros.
> > >
> > > >
> > > > Regtested on x86_64-pc-linux-gnu.
> > > >
> > > > Thx,
> > > > Haochen
> > > > >
> > > > > Uros.
> > > > >
> > > > > >
> > > > > > This patch aims to add them to fix this bug.
> > > > > >
> > > > > > BRs,
> > > > > > Haochen
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > > > New function to check if AMX is usable and enable AMX.
> > > > > > (main): Run test if AMX is usable.
> > > > > > ---
> > > > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > > > +++
> > > > > >  1 file changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > index 434b0e59703..92ed8669304 100644
> > > > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > > > @@ -4,11 +4,22 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > > +#include 
> > > > > >  #ifdef DEBUG
> > > > > >  #include 
> > > > > >  #endif
> > > > > >  #include "cpuid.h"
> > > > > >
> > > > > > +#define XFEATURE_XTILECFG  17
> > > > > > +#define XFEATURE_XTILEDATA 18
> > > > > > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > > > > > +#define XFEATURE_MASK_XTILEDATA(1 << XFEATURE_XTILEDATA)
> > > > > > +#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILECFG |
> > > > > XFEATURE_MASK_XTILEDATA)
> > > > > > +
> > > > > > +#define ARCH_GET_XCOMP_PERM0x1022
> > > > > > +#define ARCH_REQ_XCOMP_PERM0x1023
> > > > > > +
> > > > > >  /* TODO: The tmm emulation is temporary for current
> > > > > > AMX implementation with no tmm regclass, should
> > > > > > be changed in the future. */ @@ -44,6 +55,18 @@ typedef
> > > > > > struct __tile
> > > > > >  /* Stride (colum width in byte) used for tileload/store */
> > > > > > #define _STRIDE 64
> > > > > >
> > > > > > +/* We need syscall to use amx functions */ int
> > > > > > +request_perm_xtile_data() {
> > > > > > +  unsigned long bitmask;
> > > > > > +
> > > > > > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> > > > > XFEATURE_XTILEDATA) ||
> > > > > > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, ))
> > > > > > +return 0;
> > > > > > +
> > > > > > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > > > > > +
> > > > > >  /* Initialize tile config by setting all tmm size to 16x64 */
> > > > > > void init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7
> > > > > > @@ main () #ifdef AMX_BF16
> > > > > >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > > > > > +  && request_perm_xtile_data ()
> > > > > >)
> > > > > >  {
> > > > > >DO_TEST ();
> > > > > > --
> > > > > > 2.18.2
> > > > > >

RE: [PATCH][pushed] MAINTAINERS: fix alphabetic sorting

2022-07-04 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Martin Liška 
> Sent: Monday, July 4, 2022 6:17 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jiang, Haochen 
> Subject: [PATCH][pushed] MAINTAINERS: fix alphabetic sorting
> 
> ChangeLog:
> 
>   * MAINTAINERS: fix sorting of names
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f4a11cdc755..7d9aab76dd9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -463,8 +463,8 @@ Andreas Jaeger
>   
>  Harsha Jagasia   
>  Fariborz Jahanian
>  Surya Kumari Jangala 
> -Qian Jianhua 
>  Haochen Jiang
> 
> +Qian Jianhua 

Sorry for misordering that g and h in alphabet table.

Maybe time to go back to kindergarten to have a review on that. Thanks for 
fixing!

Haochen

>  Janis Johnson
>   
>  Teresa Johnson
>   
>  Kean Johnston
> --
> 2.36.1

RE: [PATCH] i386: Extend cvtps2pd to memory

2022-07-03 Thread Jiang, Haochen via Gcc-patches

Hi all,

I revised my patch according to all your reviews.

Regtested on x86_64-pc-linux-gnu.

BRs,
Haochen

> -Original Message-
> From: Liu, Hongtao 
> Sent: Thursday, June 30, 2022 4:57 PM
> To: Uros Bizjak ; Jiang, Haochen
> 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] i386: Extend cvtps2pd to memory
> 
> 
> 
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Thursday, June 30, 2022 4:53 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> >
> > On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen
> 
> > wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Uros Bizjak 
> > > > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > > > To: Jiang, Haochen 
> > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > 
> > > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang
> > > > > > 
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This patch aims to fix the cvtps2pd insn, which should also
> > > > > > > work on memory operand but currently does not. After this fix,
> > > > > > > when loop == 2, it will eliminate movq instruction.
> > > > > > >
> > > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > > > >
> > > > > > > BRs,
> > > > > > > Haochen
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > PR target/43618
> > > > > > > * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > > > > (sse2_cvtps2pd_load): Rename
> extendvsdfv2df2.
> > >
> > > Rename FROM ...
> > >
> > > Please also mention change to sse2_cvtps2pd.
> > >
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > PR target/43618
> > > > > > > * gcc.target/i386/pr43618-1.c: New test.
> > > > > >
> > > > > > This patch could be as simple as:
> > > > > >
> > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > index 8cd0f617bf3..c331445cb2d 100644
> > > > > > --- a/gcc/config/i386/sse.md
> > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > @@ -9195,7 +9195,7 @@
> > > > > > (define_insn "extendv2sfv2df2"
> > > > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > > > >(float_extend:V2DF
> > > > > > - (match_operand:V2SF 1 "register_operand" "v")))]
> > > > > > + (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > > > >   "TARGET_MMX_WITH_SSE"
> > > > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > > > >   [(set_attr "type" "ssecvt")
> > > > >
> > > > > We also tested on this version, it is ok.
> > > > >
> > > > > The reason why the patch looks like this is because in the
> > > > > previous insn sse2_cvtps2pd, the constraint vm and
> > > > > vector_operand actually does not match the actual instruction.
> > > > > Memory operand is V2SF, not V4SF.
> > > > >
> > > > > Therefore, we changed the constraint in that insn. Then it caused
> another
> > issue.
> > > > > For memory operand, it seems that we cannot generate those mask
> > instructions.
> > > > > So I change the pattern to how extendv2hfv2df2 works.
> > > >
> > > > If you want to change the memory access in
> sse2_cvtps2pd,
> > > > then please see how e.g. v2hiv2di is handled in sse.md. In
> > >

RE: [PATCH] i386: Extend cvtps2pd to memory

2022-06-30 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, June 30, 2022 2:20 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> 
> On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > This patch aims to fix the cvtps2pd insn, which should also work on
> > memory operand but currently does not. After this fix, when loop == 2,
> > it will eliminate movq instruction.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > PR target/43618
> > * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > (sse2_cvtps2pd_load): Rename extendvsdfv2df2.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/43618
> > * gcc.target/i386/pr43618-1.c: New test.
> 
> This patch could be as simple as:
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> 8cd0f617bf3..c331445cb2d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -9195,7 +9195,7 @@
> (define_insn "extendv2sfv2df2"
>   [(set (match_operand:V2DF 0 "register_operand" "=v")
>(float_extend:V2DF
> - (match_operand:V2SF 1 "register_operand" "v")))]
> + (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
>   "TARGET_MMX_WITH_SSE"
>   "%vcvtps2pd\t{%1, %0|%0, %1}"
>   [(set_attr "type" "ssecvt")

We also tested on this version, it is ok.

The reason why the patch looks like this is because in the previous insn
sse2_cvtps2pd, the constraint vm and vector_operand
actually does not match the actual instruction. Memory operand is V2SF,
not V4SF.

Therefore, we changed the constraint in that insn. Then it caused another issue.
For memory operand, it seems that we cannot generate those mask instructions.
So I change the pattern to how extendv2hfv2df2 works.

Haochen

> Uros.

RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-06-21 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, June 21, 2022 3:06 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Tue, Jun 21, 2022 at 4:23 AM Jiang, Haochen 
> wrote:
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Monday, June 20, 2022 10:54 PM
> > > To: Jiang, Haochen 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest
> > > kernels
> > >
> > > On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang
> > > 
> > > wrote:
> > > >
> > > > From: "Jiang, Haochen" 
> > > >
> > > > Hi all,
> > > >
> > > > We need syscall to enable AMX for kernels>=5.4. It is missing in
> > > > current amx tests, which will cause test fail.
> > >
> > > So this new code is only valid for linux & co?
> >
> > Thanks for reminding me for that, I only test on linux since the header 
> > file is
> only in linux.
> >
> > Just updated a patch wrapping with a macro not to change the behavior on
> windows.
> 
> I think you want __linux__ there, not __unix__.

Fixed with __linux__.

Thx,
Haochen

> 
> Uros.
> 
> >
> > Regtested on x86_64-pc-linux-gnu.
> >
> > Thx,
> > Haochen
> > >
> > > Uros.
> > >
> > > >
> > > > This patch aims to add them to fix this bug.
> > > >
> > > > BRs,
> > > > Haochen
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > > > New function to check if AMX is usable and enable AMX.
> > > > (main): Run test if AMX is usable.
> > > > ---
> > > >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > > > +++
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > index 434b0e59703..92ed8669304 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > > > @@ -4,11 +4,22 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > > +#include 
> > > >  #ifdef DEBUG
> > > >  #include 
> > > >  #endif
> > > >  #include "cpuid.h"
> > > >
> > > > +#define XFEATURE_XTILECFG  17
> > > > +#define XFEATURE_XTILEDATA 18
> > > > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > > > +#define XFEATURE_MASK_XTILEDATA(1 << XFEATURE_XTILEDATA)
> > > > +#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILECFG |
> > > XFEATURE_MASK_XTILEDATA)
> > > > +
> > > > +#define ARCH_GET_XCOMP_PERM0x1022
> > > > +#define ARCH_REQ_XCOMP_PERM0x1023
> > > > +
> > > >  /* TODO: The tmm emulation is temporary for current
> > > > AMX implementation with no tmm regclass, should
> > > > be changed in the future. */
> > > > @@ -44,6 +55,18 @@ typedef struct __tile
> > > >  /* Stride (colum width in byte) used for tileload/store */
> > > > #define _STRIDE 64
> > > >
> > > > +/* We need syscall to use amx functions */ int
> > > > +request_perm_xtile_data() {
> > > > +  unsigned long bitmask;
> > > > +
> > > > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> > > XFEATURE_XTILEDATA) ||
> > > > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, ))
> > > > +return 0;
> > > > +
> > > > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > > > +
> > > >  /* Initialize tile config by setting all tmm size to 16x64 */
> > > > void init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7 @@
> > > > main () #ifdef AMX_BF16
> > > >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > > > +  && request_perm_xtile_data ()
> > > >)
> > > >  {
> > > >DO_TEST ();
> > > > --
> > > > 2.18.2
> > > >


0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch
Description: 0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch

RE: [PATCH] i386: Add syscall to enable AMX for latest kernels

2022-06-20 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Uros Bizjak 
> Sent: Monday, June 20, 2022 10:54 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Add syscall to enable AMX for latest kernels
> 
> On Mon, Jun 20, 2022 at 10:04 AM Haochen Jiang 
> wrote:
> >
> > From: "Jiang, Haochen" 
> >
> > Hi all,
> >
> > We need syscall to enable AMX for kernels>=5.4. It is missing in
> > current amx tests, which will cause test fail.
> 
> So this new code is only valid for linux & co?

Thanks for reminding me for that, I only test on linux since the header file is 
only in linux.

Just updated a patch wrapping with a macro not to change the behavior on 
windows.

Regtested on x86_64-pc-linux-gnu.

Thx,
Haochen
> 
> Uros.
> 
> >
> > This patch aims to add them to fix this bug.
> >
> > BRs,
> > Haochen
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/amx-check.h (request_perm_xtile_data):
> > New function to check if AMX is usable and enable AMX.
> > (main): Run test if AMX is usable.
> > ---
> >  gcc/testsuite/gcc.target/i386/amx-check.h | 24
> > +++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/amx-check.h
> > b/gcc/testsuite/gcc.target/i386/amx-check.h
> > index 434b0e59703..92ed8669304 100644
> > --- a/gcc/testsuite/gcc.target/i386/amx-check.h
> > +++ b/gcc/testsuite/gcc.target/i386/amx-check.h
> > @@ -4,11 +4,22 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  #ifdef DEBUG
> >  #include 
> >  #endif
> >  #include "cpuid.h"
> >
> > +#define XFEATURE_XTILECFG  17
> > +#define XFEATURE_XTILEDATA 18
> > +#define XFEATURE_MASK_XTILECFG (1 << XFEATURE_XTILECFG)
> > +#define XFEATURE_MASK_XTILEDATA(1 << XFEATURE_XTILEDATA)
> > +#define XFEATURE_MASK_XTILE(XFEATURE_MASK_XTILECFG |
> XFEATURE_MASK_XTILEDATA)
> > +
> > +#define ARCH_GET_XCOMP_PERM0x1022
> > +#define ARCH_REQ_XCOMP_PERM0x1023
> > +
> >  /* TODO: The tmm emulation is temporary for current
> > AMX implementation with no tmm regclass, should
> > be changed in the future. */
> > @@ -44,6 +55,18 @@ typedef struct __tile
> >  /* Stride (colum width in byte) used for tileload/store */  #define
> > _STRIDE 64
> >
> > +/* We need syscall to use amx functions */ int
> > +request_perm_xtile_data() {
> > +  unsigned long bitmask;
> > +
> > +  if (syscall (SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
> XFEATURE_XTILEDATA) ||
> > +  syscall (SYS_arch_prctl, ARCH_GET_XCOMP_PERM, ))
> > +return 0;
> > +
> > +  return (bitmask & XFEATURE_MASK_XTILE) != 0; }
> > +
> >  /* Initialize tile config by setting all tmm size to 16x64 */  void
> > init_tile_config (__tilecfg_u *dst)  { @@ -186,6 +209,7 @@ main ()
> > #ifdef AMX_BF16
> >&& __builtin_cpu_supports ("amx-bf16")  #endif
> > +  && request_perm_xtile_data ()
> >)
> >  {
> >DO_TEST ();
> > --
> > 2.18.2
> >


0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch
Description: 0001-i386-Add-syscall-to-enable-AMX-for-latest-kernels.patch

RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-12 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, May 12, 2022 5:12 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Hongyu
> Wang 
> Subject: Re: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> 
> On Thu, May 12, 2022 at 5:01 AM Jiang, Haochen 
> wrote:
> >
> > Hi all,
> >
> > I just refined this patch with more explanation in commit message.
> 
> The ChangeLog entry should in fact read as:
> 
> PR target/104371
> * config/i386/sse.md (vi1avx2const): New define_mode_attr.
> (pxor/pcmpeqb/pmovmskb/cmp 0x to ptest splitter):
> New define_split pattern.

Fixed ChangeLog with this.

Thx,
Haochen

> 
> Please see  [1].
> 
> [1] https://www.gnu.org/prep/standards/html_node/Change-
> Logs.html#Change-Logs
> 
> OK with the fixed ChangeLog.
> 
> Uros.
> 
> > No code change compare to last change, which removed ix86_match_ccmode.
> >
> > Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > > -Original Message-
> > > From: Jiang, Haochen
> > > Sent: Saturday, May 7, 2022 9:55 AM
> > > To: Uros Bizjak 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > Subject: RE: [PATCH] [i386]Add combine splitter to transform
> > > pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Uros Bizjak 
> > > > Sent: Friday, May 6, 2022 4:59 PM
> > > > To: Jiang, Haochen 
> > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > > > Subject: Re: [PATCH] [i386]Add combine splitter to transform
> > > > pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > > >
> > > > On Fri, May 6, 2022 at 10:01 AM Haochen Jiang
> > > > 
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch aims to add a combine splitter to transform
> > > pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > > > >
> > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > >
> > > > > BRs,
> > > > > Haochen
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/104371
> > > > > * config/i386/sse.md: Add new define_mode_attr and 
> > > > > define_split.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR target/104371
> > > > > * gcc.target/i386/pr104371-1.c: New test.
> > > > > * gcc.target/i386/pr104371-2.c: Ditto.
> > > > > ---
> > > > >  gcc/config/i386/sse.md | 19 +++
> > > > >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> > > > > gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> > > > >  3 files changed, 47 insertions(+)  create mode 100644
> > > > > gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > > >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> > > > >
> > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > index 7b791def542..71afda73c8f 100644
> > > > > --- a/gcc/config/i386/sse.md
> > > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -20083,6 +20083,25 @@
> > > > > (set_attr "prefix" "maybe_vex")
> > > > > (set_attr "mode" "SI")])
> > > > >
> > > > > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > > > > +(define_mode_attr vi1avx2const
> > > > > +  [(V32QI "0x") (V16QI "0x")])
> > > > > +
> > > > > +(define_split
> > > > > +  [(set (reg:CCZ FLAGS_REG)
> > > > > +   (compare:CCZ (unspec:SI
> > > > > +   [(eq:VI1_AVX2
> > > > > +   (match_operand:VI1_AVX2 0 
> > > > > "vector_operand")
> > > > > +   (match_operand:VI1_AVX2 1 
> > > > > "const0_operand"))]
> > > > > +   UNSPEC_MOVMSK)
> > > > > +(match_operand 2 "const_int_operand")))]
> > &

RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-11 Thread Jiang, Haochen via Gcc-patches

Hi all,

I just refined this patch with more explanation in commit message.

No code change compare to last change, which removed ix86_match_ccmode.

Ok for trunk?

BRs,
Haochen

> -Original Message-
> From: Jiang, Haochen
> Sent: Saturday, May 7, 2022 9:55 AM
> To: Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: RE: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> 
> 
> 
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Friday, May 6, 2022 4:59 PM
> > To: Jiang, Haochen 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] [i386]Add combine splitter to transform
> > pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> >
> > On Fri, May 6, 2022 at 10:01 AM Haochen Jiang 
> > wrote:
> > >
> > > Hi all,
> > >
> > > This patch aims to add a combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > >
> > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > >
> > > BRs,
> > > Haochen
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/104371
> > > * config/i386/sse.md: Add new define_mode_attr and define_split.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/104371
> > > * gcc.target/i386/pr104371-1.c: New test.
> > > * gcc.target/i386/pr104371-2.c: Ditto.
> > > ---
> > >  gcc/config/i386/sse.md | 19 +++
> > >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> > > gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> > >  3 files changed, 47 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104371-1.c
> > >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > 7b791def542..71afda73c8f 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -20083,6 +20083,25 @@
> > > (set_attr "prefix" "maybe_vex")
> > > (set_attr "mode" "SI")])
> > >
> > > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > > +(define_mode_attr vi1avx2const
> > > +  [(V32QI "0x") (V16QI "0x")])
> > > +
> > > +(define_split
> > > +  [(set (reg:CCZ FLAGS_REG)
> > > +   (compare:CCZ (unspec:SI
> > > +   [(eq:VI1_AVX2
> > > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > > +   UNSPEC_MOVMSK)
> > > +(match_operand 2 "const_int_operand")))]
> > > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> >
> > No need to use ix86_match_ccmode here, the pattern is already limited to
> > CCZmode,
> >
> > Uros.
> >
> 
> Removed this condition in my new patch, also make the testcase change
> according to
> Hongyu's review.
> 
> Is the patch Ok for trunk?
> 
> Haochen
> 
> > > +  && (INTVAL (operands[2]) == (int) ())"
> > > +  [(set (reg:CC FLAGS_REG)
> > > +   (unspec:CC [(match_dup 0)
> > > +   (match_dup 0)]
> > > +  UNSPEC_PTEST))])
> > > +
> > >  (define_expand "sse2_maskmovdqu"
> > >[(set (match_operand:V16QI 0 "memory_operand")
> > > (unspec:V16QI [(match_operand:V16QI 1 "register_operand") diff
> > > --git a/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > new file mode 100644
> > > index 000..df7c0b074e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -msse4" } */
> > > +/* { dg-final { scan-assembler "ptest\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "pxor\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "pcmpeqb\[ \\t\]" } } */
> > > +/* { dg-final { scan-assembler-not "pmovmskb\[ \\t\]" } } */
> > > +
> > > +#include 
> > > +#include 
> > > +

RE: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

2022-05-09 Thread Jiang, Haochen via Gcc-patches

That make sense to me. Thx!

> -Original Message-
> From: Uros Bizjak 
> Sent: Saturday, May 7, 2022 5:04 PM
> To: Jiang, Haochen 
> Cc: Hongyu Wang ; Liu, Hongtao
> ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports
> 
> On Sat, May 7, 2022 at 3:20 AM Jiang, Haochen 
> wrote:
> >
> > Hi Uros,
> >
> > I understand that we always keep the old testcases there. It is always safe 
> > to
> do that.
> >
> > But I have another question, if we add something new in one of the
> > existing files in the future, should we use __builtin_cpu_supports to keep 
> > the
> code clearer or stick to cpuids?
> 
> We should use __builtin_cpu_supports.
> 
> > I believe __builtin_cpu_supports will be a clearer way for a coder to
> understand under current circumstance.
> > So if we use that in future use, why don't we change everything to the same
> way?
> 
> Because we test the old and the new approach this way.
> 
> Uros.
> 
> > BRs,
> > Haochen
> >
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Friday, May 6, 2022 5:17 PM
> > To: Hongyu Wang 
> > Cc: Jiang, Haochen ; Liu, Hongtao
> > ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Reconstruct i386 testsuite with
> > __builtin_cpu_supports
> >
> > On Fri, May 6, 2022 at 11:00 AM Hongyu Wang 
> wrote:
> > >
> > > > I don't think *_os_support calls should be removed. IIRC,
> > > > __builtin_cpu_supports function checks if the feature is supported
> > > > by CPU, whereas *_os_supports calls check via xgetbv if OS
> > > > supports handling of new registers.
> > >
> > > avx_os_support is like
> > >
> > > avx_os_support (void)
> > > {
> > >   unsigned int eax, edx;
> > >   unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
> > >
> > >   __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
> > >
> > >   return (eax & (XSTATE_SSE | XSTATE_YMM)) == (XSTATE_SSE |
> > > XSTATE_YMM); }
> > >
> > > While in get_avaliable_features we have
> > >
> > > #define XCR_AVX_ENABLED_MASK \
> > >   (XSTATE_SSE | XSTATE_YMM)
> > >   if ((ecx & bit_OSXSAVE))
> > > {
> > >   /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
> > > ZMM16-ZMM31 states are supported by OSXSAVE.  */
> > >   unsigned int xcrlow;
> > >   unsigned int xcrhigh;
> > >   __asm__ (".byte 0x0f, 0x01, 0xd0" /* xgetbv  */
> > >: "=a" (xcrlow), "=d" (xcrhigh)
> > >: "c" (XCR_XFEATURE_ENABLED_MASK));
> > >   if ((xcrlow & XCR_AVX_ENABLED_MASK) == XCR_AVX_ENABLED_MASK) {
> > >   avx_usable = 1;
> > >
> > > So __builtin_cpu_supports already inherits same check
> >
> > Indeed, thanks for the explanation.
> >
> > OTOH, we don't change the existing tests (perhaps only dg- directives
> > when infrastructure improves), so I would leave the existing testcases
> > as they are. In future, new helper functions should be implemented
> > with __builtin_cpu_supports, but let's leave existing ones as they
> > are.
> >
> > Uros.
> >
> > > Uros Bizjak via Gcc-patches  于2022年5月6
> 日周五
> > > 16:27写道：
> > > >
> > > > On Fri, May 6, 2022 at 9:57 AM Haochen Jiang 
> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > There are some check files in i386 testsuite are written before the
> function __builtin_cpu_supports is introduced. All of them are using
> __get_cpuid_count. This patch aims to reconstruct the i386 testsuite with
> __builtin_cpu_supports so that we can have a much clearer code.
> > > > >
> > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > >
> > > > I don't think *_os_support calls should be removed. IIRC,
> > > > __builtin_cpu_supports function checks if the feature is supported
> > > > by CPU, whereas *_os_supports calls check via xgetbv if OS
> > > > supports handling of new registers.
> > > >
> > > > Uros.
> > > >
> > > > >
> > > > > Also when writting this patch, I also find some files in testsuite 
> > > > > that might
> be useless currently. For example, in the file 
> gcc/testsuite/gcc.target/i386/sse-
> os-su

RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-06 Thread Jiang, Haochen via Gcc-patches



> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, May 6, 2022 4:59 PM
> To: Jiang, Haochen 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> 
> On Fri, May 6, 2022 at 10:01 AM Haochen Jiang 
> wrote:
> >
> > Hi all,
> >
> > This patch aims to add a combine splitter to transform 
> > pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > PR target/104371
> > * config/i386/sse.md: Add new define_mode_attr and define_split.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/104371
> > * gcc.target/i386/pr104371-1.c: New test.
> > * gcc.target/i386/pr104371-2.c: Ditto.
> > ---
> >  gcc/config/i386/sse.md | 19 +++
> >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> > gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> >  3 files changed, 47 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104371-1.c
> >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 7b791def542..71afda73c8f 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -20083,6 +20083,25 @@
> > (set_attr "prefix" "maybe_vex")
> > (set_attr "mode" "SI")])
> >
> > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > +(define_mode_attr vi1avx2const
> > +  [(V32QI "0x") (V16QI "0x")])
> > +
> > +(define_split
> > +  [(set (reg:CCZ FLAGS_REG)
> > +   (compare:CCZ (unspec:SI
> > +   [(eq:VI1_AVX2
> > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > +   UNSPEC_MOVMSK)
> > +(match_operand 2 "const_int_operand")))]
> > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> 
> No need to use ix86_match_ccmode here, the pattern is already limited to
> CCZmode,
> 
> Uros.
> 

Removed this condition in my new patch, also make the testcase change according 
to
Hongyu's review.

Is the patch Ok for trunk?

Haochen

> > +  && (INTVAL (operands[2]) == (int) ())"
> > +  [(set (reg:CC FLAGS_REG)
> > +   (unspec:CC [(match_dup 0)
> > +   (match_dup 0)]
> > +  UNSPEC_PTEST))])
> > +
> >  (define_expand "sse2_maskmovdqu"
> >[(set (match_operand:V16QI 0 "memory_operand")
> > (unspec:V16QI [(match_operand:V16QI 1 "register_operand") diff
> > --git a/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > new file mode 100644
> > index 000..df7c0b074e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse4" } */
> > +/* { dg-final { scan-assembler "ptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pmovmskb\[ \\t\]" } } */
> > +
> > +#include 
> > +#include 
> > +
> > +bool is_zero(__m128i x)
> > +{
> > +  return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128()))
> ==
> > +0x; }
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > new file mode 100755
> > index 000..f0d0afd5897
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2" } */
> > +/* { dg-final { scan-assembler "vptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpmovmskb\[ \\t\]" } } */
> > +
> > +#include 
> > +#include 
> > +
> > +bool is_zero256(__m256i x)
> > +{
> > +  return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x,
> > +_mm256_setzero_si256())) == 0x; }
> > --
> > 2.18.1
> >


0001-i386-Add-combine-splitter-to-transform-pxor-pcmpeqb-.patch
Description: 0001-i386-Add-combine-splitter-to-transform-pxor-pcmpeqb-.patch

RE: [PATCH] [i386]Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

2022-05-06 Thread Jiang, Haochen via Gcc-patches

> -Original Message-
> From: Hongyu Wang 
> Sent: Friday, May 6, 2022 4:50 PM
> To: Jiang, Haochen 
> Cc: GCC Patches ; Liu, Hongtao
> 
> Subject: Re: [PATCH] [i386]Add combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> 
> > +(define_split
> > +  [(set (reg:CCZ FLAGS_REG)
> > +   (compare:CCZ (unspec:SI
> > +   [(eq:VI1_AVX2
> > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > +   UNSPEC_MOVMSK)
> > +(match_operand 2 "const_int_operand")))]
> > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> 
> It looks like set_src and set_dst are all CCZmode, do we really need
> ix86_match_ccmode?
> 
> > +  && (INTVAL (operands[2]) == (int) ())"
> 
> I think (int) convert is not needed for const, and INTVAL actually
> returns HOST_WIDE_INT

It should be int convert here, because we need 0xfff become -1 in this 
compare.

Haochen.

> 
> > +#include 
> > +
> > +bool is_zero(__m128i x)
> 
> bool is not necessary here, we can use int and drop stdbool.
> 
> Haochen Jiang via Gcc-patches  于2022年5月6
> 日周五 16:01写道：
> >
> > Hi all,
> >
> > This patch aims to add a combine splitter to transform
> pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > PR target/104371
> > * config/i386/sse.md: Add new define_mode_attr and define_split.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/104371
> > * gcc.target/i386/pr104371-1.c: New test.
> > * gcc.target/i386/pr104371-2.c: Ditto.
> > ---
> >  gcc/config/i386/sse.md | 19 +++
> >  gcc/testsuite/gcc.target/i386/pr104371-1.c | 14 ++
> >  gcc/testsuite/gcc.target/i386/pr104371-2.c | 14 ++
> >  3 files changed, 47 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104371-1.c
> >  create mode 100755 gcc/testsuite/gcc.target/i386/pr104371-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 7b791def542..71afda73c8f 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -20083,6 +20083,25 @@
> > (set_attr "prefix" "maybe_vex")
> > (set_attr "mode" "SI")])
> >
> > +;; Optimize pxor/pcmpeqb/pmovmskb/cmp 0x to ptest.
> > +(define_mode_attr vi1avx2const
> > +  [(V32QI "0x") (V16QI "0x")])
> > +
> > +(define_split
> > +  [(set (reg:CCZ FLAGS_REG)
> > +   (compare:CCZ (unspec:SI
> > +   [(eq:VI1_AVX2
> > +   (match_operand:VI1_AVX2 0 "vector_operand")
> > +   (match_operand:VI1_AVX2 1 "const0_operand"))]
> > +   UNSPEC_MOVMSK)
> > +(match_operand 2 "const_int_operand")))]
> > +  "TARGET_SSE4_1 && ix86_match_ccmode (insn, CCmode)
> > +  && (INTVAL (operands[2]) == (int) ())"
> > +  [(set (reg:CC FLAGS_REG)
> > +   (unspec:CC [(match_dup 0)
> > +   (match_dup 0)]
> > +  UNSPEC_PTEST))])
> > +
> >  (define_expand "sse2_maskmovdqu"
> >[(set (match_operand:V16QI 0 "memory_operand")
> > (unspec:V16QI [(match_operand:V16QI 1 "register_operand")
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104371-1.c
> b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > new file mode 100644
> > index 000..df7c0b074e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-1.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse4" } */
> > +/* { dg-final { scan-assembler "ptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "pmovmskb\[ \\t\]" } } */
> > +
> > +#include 
> > +#include 
> > +
> > +bool is_zero(__m128i x)
> > +{
> > +  return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128()))
> == 0x;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/pr104371-2.c
> b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > new file mode 100755
> > index 000..f0d0afd5897
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr104371-2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx2" } */
> > +/* { dg-final { scan-assembler "vptest\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpxor\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpcmpeqb\[ \\t\]" } } */
> > +/* { dg-final { scan-assembler-not "vpmovmskb\[ \\t\]" } } */
> > +
> > +#include 
> > +#include 
> > +
> > +bool is_zero256(__m256i x)
> > +{
> > +  return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x,
> _mm256_setzero_si256())) == 0x;
> > +}
> > --
> > 2.18.1
> >

RE: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

2022-05-06 Thread Jiang, Haochen via Gcc-patches

Hi Uros,

I understand that we always keep the old testcases there. It is always safe to 
do that.

But I have another question, if we add something new in one of the existing 
files in the future,
should we use __builtin_cpu_supports to keep the code clearer or stick to 
cpuids?

I believe __builtin_cpu_supports will be a clearer way for a coder to 
understand under current circumstance.
So if we use that in future use, why don't we change everything to the same way?

BRs,
Haochen 

-Original Message-
From: Uros Bizjak  
Sent: Friday, May 6, 2022 5:17 PM
To: Hongyu Wang 
Cc: Jiang, Haochen ; Liu, Hongtao 
; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Reconstruct i386 testsuite with __builtin_cpu_supports

On Fri, May 6, 2022 at 11:00 AM Hongyu Wang  wrote:
>
> > I don't think *_os_support calls should be removed. IIRC,
> > __builtin_cpu_supports function checks if the feature is supported by
> > CPU, whereas *_os_supports calls check via xgetbv if OS supports
> > handling of new registers.
>
> avx_os_support is like
>
> avx_os_support (void)
> {
>   unsigned int eax, edx;
>   unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
>
>   __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
>
>   return (eax & (XSTATE_SSE | XSTATE_YMM)) == (XSTATE_SSE | XSTATE_YMM);
> }
>
> While in get_avaliable_features we have
>
> #define XCR_AVX_ENABLED_MASK \
>   (XSTATE_SSE | XSTATE_YMM)
>   if ((ecx & bit_OSXSAVE))
> {
>   /* Check if XMM, YMM, OPMASK, upper 256 bits of ZMM0-ZMM15 and
> ZMM16-ZMM31 states are supported by OSXSAVE.  */
>   unsigned int xcrlow;
>   unsigned int xcrhigh;
>   __asm__ (".byte 0x0f, 0x01, 0xd0" /* xgetbv  */
>: "=a" (xcrlow), "=d" (xcrhigh)
>: "c" (XCR_XFEATURE_ENABLED_MASK));
>   if ((xcrlow & XCR_AVX_ENABLED_MASK) == XCR_AVX_ENABLED_MASK)
> {
>   avx_usable = 1;
>
> So __builtin_cpu_supports already inherits same check

Indeed, thanks for the explanation.

OTOH, we don't change the existing tests (perhaps only dg- directives
when infrastructure improves), so I would leave the existing testcases
as they are. In future, new helper functions should be implemented
with __builtin_cpu_supports, but let's leave existing ones as they
are.

Uros.

> Uros Bizjak via Gcc-patches  于2022年5月6日周五 16:27写道：
> >
> > On Fri, May 6, 2022 at 9:57 AM Haochen Jiang  
> > wrote:
> > >
> > > Hi all,
> > >
> > > There are some check files in i386 testsuite are written before the 
> > > function __builtin_cpu_supports is introduced. All of them are using 
> > > __get_cpuid_count. This patch aims to reconstruct the i386 testsuite with 
> > > __builtin_cpu_supports so that we can have a much clearer code.
> > >
> > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > I don't think *_os_support calls should be removed. IIRC,
> > __builtin_cpu_supports function checks if the feature is supported by
> > CPU, whereas *_os_supports calls check via xgetbv if OS supports
> > handling of new registers.
> >
> > Uros.
> >
> > >
> > > Also when writting this patch, I also find some files in testsuite that 
> > > might be useless currently. For example, in the file 
> > > gcc/testsuite/gcc.target/i386/sse-os-support.h, it always return 1. And 
> > > there are also some files will no longer be included at all with this 
> > > patch. Should we remove those files when we have time?
> > >
> > > BRs,
> > > Haochen
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/adx-check.h: Change bit check to
> > > __builtin_cpu_supports.
> > > * gcc.target/i386/aes-avx-check.h: Ditto.
> > > * gcc.target/i386/aes-check.h: Ditto.
> > > * gcc.target/i386/avx-check.h: Ditto.
> > > * gcc.target/i386/avx2-check.h: Ditto.
> > > * gcc.target/i386/avx512-check.h: Ditto.
> > > * gcc.target/i386/bmi-check.h: Ditto.
> > > * gcc.target/i386/bmi2-check.h: Ditto.
> > > * gcc.target/i386/f16c-check.h: Ditto.
> > > * gcc.target/i386/fma-check.h: Ditto.
> > > * gcc.target/i386/fma4-check.h: Ditto.
> > > * gcc.target/i386/lzcnt-check.h: Ditto.
> > > * gcc.target/i386/mmx-3dnow-check.h: Ditto.
> > > * gcc.target/i386/mmx-check.h: Ditto.
> > > * gcc.target/i386/pclmul-avx-check.h: Ditto.
> > > * gcc.target/i386/pclmul-check.h: Ditto.
> > >

RE: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask).

2022-01-12 Thread Jiang, Haochen via Gcc-patches

Hi Uros,

Has fixed that format issue with this new patch. Ok for trunk?

Thx,
Haochen

-Original Message-
From: Uros Bizjak  
Sent: Thursday, January 13, 2022 3:22 AM
To: Jiang, Haochen 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & 
mask).

On Wed, Jan 12, 2022 at 9:11 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This patch targets PR94790, which change the instruction selection under the 
> following circumstance.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?

Please also test with -m32, e.g.:

make -j 12 -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"

OK (with an it below), if new testcases do not FAIL with -m32.

Thanks,
Uros.

>
> BRs,
> Haochen
>
> From the perspective of the pipeline, `andn + and + ior` version take
> 2 cycles(AND and ANDN doesn't have dependence), but xor + and + xor 
> will take 3 cycles.
>
> -   xorl%edi, %esi
> andl%edx, %esi
> -   movl%esi, %eax
> -   xorl%edi, %eax
> +   andn%edi, %edx, %eax
> +   orl %esi, %eax
>
> gcc/ChangeLog:
>
> PR taeget/94790
> * config/i386/i386.md (*xor2andn): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> PR taeget/94790
> * gcc.target/i386/pr94790-1.c: New test.
> * gcc.target/i386/pr94790-2.c: Ditto.
> ---
>  gcc/config/i386/i386.md   | 39 +++
>  gcc/testsuite/gcc.target/i386/pr94790-1.c | 14   
> gcc/testsuite/gcc.target/i386/pr94790-2.c |  9 ++
>  3 files changed, 62 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-1.c
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
> 9b424a3935b..38efc6d5837 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10452,6 +10452,45 @@
> (set_attr "znver1_decode" "double")
> (set_attr "mode" "DI")])
>
> +;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b 
> +& mask) (define_insn_and_split "*xor2andn"
> +  [(set (match_operand:SWI248 0 "nonimmediate_operand")
> +   (xor:SWI248
> + (and:SWI248
> +   (xor:SWI248
> + (match_operand:SWI248 1 "nonimmediate_operand")
> + (match_operand:SWI248 2 "nonimmediate_operand"))
> +   (match_operand:SWI248 3 "nonimmediate_operand"))
> + (match_dup 1)))
> +(clobber (reg:CC FLAGS_REG))]
> +  "(TARGET_BMI || TARGET_AVX512BW)
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(parallel [(set (match_dup 4)
> +   (and:SWI248
> + (not:SWI248
> +   (match_dup 3))
> + (match_dup 1)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 5)
> +   (and:SWI248
> + (match_dup 2)
> + (match_dup 3)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 0)
> +   (ior:SWI248
> + (match_dup 4)
> + (match_dup 5)))
> + (clobber (reg:CC FLAGS_REG))])]
> +  {
> +operands[1] = force_reg (mode, operands[1]);
> +operands[3] = force_reg (mode, operands[3]);
> +operands[4] = gen_reg_rtx (mode);
> +operands[5] = gen_reg_rtx (mode);
> +  }
> +)

Please put brace just after the curved brace, see numerous examples in .md 
files.

> +
>  ;; See comment for addsi_1_zext why we do use nonimmediate_operand  
> (define_insn "*si_1_zext"
>[(set (match_operand:DI 0 "register_operand" "=r") diff --git 
> a/gcc/testsuite/gcc.target/i386/pr94790-1.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> new file mode 100755
> index 000..6ebbec15cfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-times "andn\[ \\t\]" 2 } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \\t\]" } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask);
> +}
> +
> +unsigned r2(unsigned a, unsigned b, unsigned mask) {
> +  return (~mask & a) | (b & mask);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr94790-2.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> new file mode 100755
> index 000..d7b0eec5bef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-not "andn\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-times "xorl\[ \\t\]" 2 } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask) + (a ^ b); }
> --
> 2.18.1
>


0001-i386-Optimize-a-a-b-mask-to-mask-a-b-mask.patch
Description: 0001-i386-Optimize-a-a-b-mask-to-mask-a-b-mask.patch

RE: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask).

2022-01-12 Thread Jiang, Haochen via Gcc-patches

Hi Uros,

I have also tested on -m32. They do not fail.

Thx,
Haochen

-Original Message-
From: Uros Bizjak  
Sent: Thursday, January 13, 2022 3:22 AM
To: Jiang, Haochen 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] [i386] Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & 
mask).

On Wed, Jan 12, 2022 at 9:11 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This patch targets PR94790, which change the instruction selection under the 
> following circumstance.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?

Please also test with -m32, e.g.:

make -j 12 -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"

OK (with an it below), if new testcases do not FAIL with -m32.

Thanks,
Uros.

>
> BRs,
> Haochen
>
> From the perspective of the pipeline, `andn + and + ior` version take
> 2 cycles(AND and ANDN doesn't have dependence), but xor + and + xor 
> will take 3 cycles.
>
> -   xorl%edi, %esi
> andl%edx, %esi
> -   movl%esi, %eax
> -   xorl%edi, %eax
> +   andn%edi, %edx, %eax
> +   orl %esi, %eax
>
> gcc/ChangeLog:
>
> PR taeget/94790
> * config/i386/i386.md (*xor2andn): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> PR taeget/94790
> * gcc.target/i386/pr94790-1.c: New test.
> * gcc.target/i386/pr94790-2.c: Ditto.
> ---
>  gcc/config/i386/i386.md   | 39 +++
>  gcc/testsuite/gcc.target/i386/pr94790-1.c | 14   
> gcc/testsuite/gcc.target/i386/pr94790-2.c |  9 ++
>  3 files changed, 62 insertions(+)
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-1.c
>  create mode 100755 gcc/testsuite/gcc.target/i386/pr94790-2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
> 9b424a3935b..38efc6d5837 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10452,6 +10452,45 @@
> (set_attr "znver1_decode" "double")
> (set_attr "mode" "DI")])
>
> +;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b 
> +& mask) (define_insn_and_split "*xor2andn"
> +  [(set (match_operand:SWI248 0 "nonimmediate_operand")
> +   (xor:SWI248
> + (and:SWI248
> +   (xor:SWI248
> + (match_operand:SWI248 1 "nonimmediate_operand")
> + (match_operand:SWI248 2 "nonimmediate_operand"))
> +   (match_operand:SWI248 3 "nonimmediate_operand"))
> + (match_dup 1)))
> +(clobber (reg:CC FLAGS_REG))]
> +  "(TARGET_BMI || TARGET_AVX512BW)
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(parallel [(set (match_dup 4)
> +   (and:SWI248
> + (not:SWI248
> +   (match_dup 3))
> + (match_dup 1)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 5)
> +   (and:SWI248
> + (match_dup 2)
> + (match_dup 3)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 0)
> +   (ior:SWI248
> + (match_dup 4)
> + (match_dup 5)))
> + (clobber (reg:CC FLAGS_REG))])]
> +  {
> +operands[1] = force_reg (mode, operands[1]);
> +operands[3] = force_reg (mode, operands[3]);
> +operands[4] = gen_reg_rtx (mode);
> +operands[5] = gen_reg_rtx (mode);
> +  }
> +)

Please put brace just after the curved brace, see numerous examples in .md 
files.

> +
>  ;; See comment for addsi_1_zext why we do use nonimmediate_operand  
> (define_insn "*si_1_zext"
>[(set (match_operand:DI 0 "register_operand" "=r") diff --git 
> a/gcc/testsuite/gcc.target/i386/pr94790-1.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> new file mode 100755
> index 000..6ebbec15cfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-times "andn\[ \\t\]" 2 } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \\t\]" } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask);
> +}
> +
> +unsigned r2(unsigned a, unsigned b, unsigned mask) {
> +  return (~mask & a) | (b & mask);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr94790-2.c 
> b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> new file mode 100755
> index 000..d7b0eec5bef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94790-2.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-not "andn\[ \\t\]" } } */
> +/* { dg-final { scan-assembler-times "xorl\[ \\t\]" 2 } } */
> +
> +unsigned r1(unsigned a, unsigned b, unsigned mask) {
> +  return a ^ ((a ^ b) & mask) + (a ^ b); }
> --
> 2.18.1
>

RE: [PATCH] [i386] Remove register restriction on operands for andnot insn

2022-01-09 Thread Jiang, Haochen via Gcc-patches

Hi Hongtao,

I have changed that message in this patch. Ok for trunk?

Thx,
Haochen

-Original Message-
From: Hongtao Liu  
Sent: Monday, January 10, 2022 3:25 PM
To: Jiang, Haochen 
Cc: GCC Patches ; Liu, Hongtao 
Subject: Re: [PATCH] [i386] Remove register restriction on operands for andnot 
insn

On Mon, Jan 10, 2022 at 2:23 PM Haochen Jiang via Gcc-patches 
 wrote:
>
> Hi all,
>
> This patch removes the register restriction on operands for andnot insn so 
> that it can be used from memory.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/53652
> * config/i386/sse.md (*andnot3): Remove register restriction.
It should be "Extend predicate of operands[1] from register_operand to 
vector_operand".
Similar for you commit message.
>
> gcc/testsuite/ChangeLog:
>
> PR target/53652
> * gcc.target/i386/pr53652-1.c: New test.
> ---
>  gcc/config/i386/sse.md|  2 +-
>  gcc/testsuite/gcc.target/i386/pr53652-1.c | 16 
>  2 files changed, 17 insertions(+), 1 deletion(-)  create mode 100644 
> gcc/testsuite/gcc.target/i386/pr53652-1.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
> 0997d9edf9d..4448b875d35 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16630,7 +16630,7 @@
>  (define_insn "*andnot3"
>[(set (match_operand:VI 0 "register_operand" "=x,x,v")
> (and:VI
> - (not:VI (match_operand:VI 1 "register_operand" "0,x,v"))
> + (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
>   (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
>"TARGET_SSE"
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/pr53652-1.c 
> b/gcc/testsuite/gcc.target/i386/pr53652-1.c
> new file mode 100644
> index 000..bd07ee29f4d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr53652-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +/* { dg-final { scan-assembler-times "pandn\[ \\t\]" 2 } } */
> +/* { dg-final { scan-assembler-not "vpternlogq\[ \\t\]" } } */
> +
> +typedef unsigned long long vec __attribute__((vector_size (16))); vec 
> +g; vec f1 (vec a, vec b) {
> +  return ~a
> +}
> +vec f2 (vec a, vec b)
> +{
> +  return ~g
> +}
> +
> --
> 2.18.1
>


-- 
BR,
Hongtao


0001-i386-Extend-predicate-of-operands-1-from-register_op.patch
Description: 0001-i386-Extend-predicate-of-operands-1-from-register_op.patch

RE: [PATCH] [i386]Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

2021-12-07 Thread Jiang, Haochen via Gcc-patches

Hi Uros,

I have fixed that in this patch attached for checking in. Is that ok for trunk?

Regtested on x86_64-pc-linux-gnu.

Thx,
Haochen

-Original Message-
From: Uros Bizjak  
Sent: Wednesday, December 8, 2021 12:14 AM
To: Jiang, Haochen 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH] [i386]Add combine splitter to transform 
vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

On Tue, Dec 7, 2021 at 3:10 AM Haochen Jiang via Gcc-patches 
 wrote:
>
> This patch adds combine splitter to transform vpcmpeqd/vpxor/vblendvps to 
> vblendvps for ~op0.
>
> OK for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/100738
> * config/i386/sse.md 
> (*_blendv_not_ltint):
> Add new define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100738
> * g++.target/i386/pr100738-1.C: New test.

OK with a change below.

Thanks,
Uros.

>
> ---
>  gcc/config/i386/sse.md | 28 ++
>  gcc/testsuite/g++.target/i386/pr100738-1.C | 19 +++
>  2 files changed, 47 insertions(+)
>  create mode 100755 gcc/testsuite/g++.target/i386/pr100738-1.C
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
> 08bdcddc111..db3506c78d7 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -20659,6 +20659,34 @@
> (set_attr "btver2_decode" "vector,vector,vector")
> (set_attr "mode" "")])
>
> +;; PR target/100738: Transform vpcmpeqd + vpxor + vblendvps to 
> +vblendvps for inverted mask; (define_insn_and_split 
> "*_blendv_not_ltint"
> +  [(set (match_operand: 0 "register_operand")
> +   (unspec:
> + [(match_operand: 1 "register_operand")
> +  (match_operand: 2 "vector_operand")
> +  (subreg:
> +(lt:VI48_AVX
> +  (subreg:VI48_AVX
> +  (not:
> +(match_operand: 3 "register_operand")) 0)
> +  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> + UNSPEC_BLENDV))]
> +  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +   (unspec:
> +[(match_dup 2) (match_dup 1) (match_dup 3)] UNSPEC_BLENDV))] 
> +{
> +  operands[0] = gen_lowpart (mode, operands[0]);
> +  operands[1] = gen_lowpart (mode, operands[1]);
> +  operands[2] = gen_lowpart (mode, operands[2]);
> +  operands[3] = gen_lowpart (mode, operands[3]);
> +  if (MEM_P (operands[2]))
> +operands[2] = force_reg (mode, operands[2]);

You don't need to check for MEM_P, force_reg will do it for you.

> +})
> +
>  (define_insn "_dp"
>[(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> (unspec:VF_128_256
> diff --git a/gcc/testsuite/g++.target/i386/pr100738-1.C 
> b/gcc/testsuite/g++.target/i386/pr100738-1.C
> new file mode 100755
> index 000..5a04c5b031f
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr100738-1.C
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mavx2" } */
> +/* { dg-final {scan-assembler-times "vblendvps\[ \\t\]" 2 } } */
> +/* { dg-final {scan-assembler-not "vpcmpeqd\[ \\t\]" } } */
> +/* { dg-final {scan-assembler-not "vpxor\[ \\t\]" } } */
> +
> +typedef int v4si __attribute__((vector_size(16))); typedef char v16qi 
> +__attribute__((vector_size(16)));
> +v4si
> +foo_1 (v16qi a, v4si b, v4si c, v4si d) {
> +  return ((v4si)~a) < 0 ? c : d;
> +}
> +
> +v4si
> +foo_2 (v16qi a, v4si b, v4si c, v4si d) {
> +  return ((v4si)~a) >= 0 ? c : d;
> +}
> --
> 2.18.1
>


0001-i386-Add-combine-splitter-to-transform-vpcmpeqd-vpxo.patch
Description: 0001-i386-Add-combine-splitter-to-transform-vpcmpeqd-vpxo.patch

83 matches

Mail list logo