Re: [PATCH] Enable GCC support for AMX

2020-09-28 Thread Kirill Yukhin via Gcc-patches
Hello,

On 12 Sep 01:00, Hongyu Wang wrote:
> Hi
> 
> Thanks for your review, and sorry for the late reply. It took a while
> to finish the runtime test.

Thanks for your fixes! The patch is OK for trunk.

--
Thanks, K


Re: [PATCH] Enable GCC support for AMX

2020-09-04 Thread Kirill Yukhin via Gcc-patches
Hello,

On 03 Sep 08:17, H.J. Lu wrote:
> On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
>  wrote:
> >
> > Hello,
> >
> > On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > > Hi:
> > >
> > > This patch is about to support Intel Advanced Matrix Extensions (AMX)
> > > which will be enabled in GLC.
> > >
> > > > AMX is a new 64-bit programming paradigm consisting of two
> > > > components: a set of 2-dimensional registers (tiles) representing
> > > sub-arrays from a larger 2-dimensional memory image,
> > > and an accelerator able to operate on tiles
> > >
> > > Supported instructions are
> > >
> > > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > AMX-BF16:tdpbf16ps
> > >
> > > The intrinsics adopts constant tile register number as its input 
> > > parameters.
> > >
> > > For detailed information, please refer to
> > > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > >
> > > Bootstrap ok, regression test on i386/x86 backend is ok.
> > >
> > > OK for master?
> >
> > I was trying to apply your patch to recent master and got
> > compilation error:
> >
> > g++ -std=gnu++11  -fno-PIE -c   -g -O2 -DIN_GCC -fno-exceptions -fno-rtti
> > -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
> > -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic
> > -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common
> > -DHAVE_CONFIG_H -I. -I. -I/export/kyukhin/gcc/src/gcc
> > -I/export/kyukhin/gcc/src/gcc/. -I/export/kyukhin/gcc/src/gcc/../include
> > -I/export/kyukhin/gcc/src/gcc/../libcpp/include
> > -I/export/kyukhin/gcc/src/gcc/../libdecnumber
> > -I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber
> > -I/export/kyukhin/gcc/src/gcc/../libbacktrace   -o i386-options.o
> > -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_options*)’:
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41: error: ‘PTA_AMX_TILE’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
> >  ^
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41: error: ‘PTA_AMX_INT8’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
> >  ^
> > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41: error: ‘PTA_AMX_BF16’ was not declared in this scope
> >   if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
> >
> > Could you please fix that?
> 
> Here is the rebased patch against
> 
> commit 3c219134152f645103f2fcd50735b177ccd76cde
> Author: Jonathan Wakely 
> Date:   Thu Sep 3 12:38:50 2020 +0100
> 
> libstdc++: Optimise GCD algorithms
> 
> Thanks.
> 
> -- 
> H.J.

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 797f0ad5edd..d0e59e86a5c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -412,7 +412,7 @@ i[34567]86-*-*)
>  waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
>  avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
>  avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> -tsxldtrkintrin.h"
> +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h 
> amxbf16intrin.h"

Line more than 80 chars.

>   ;;
>  x86_64-*-*)
>   cpu_type=i386
> @@ -447,7 +447,7 @@ x86_64-*-*)
>  waitpkgintrin.h cldemoteintrin.h avx512bf16vlintrin.h
>  avx512bf16intrin.h enqcmdintrin.h serializeintrin.h
>  avx512vp2intersectintrin.h avx512vp2intersectvlintrin.h
> -tsxldtrkintrin.h"
> +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h 
> amxbf16intrin.h"

Ditto.

> diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
> new file mode 100644
> index 000..df0e2262d50
> --- /dev/null
> +++ b/gcc/config/i386/amxbf16intrin.h
> @@ -0,0 +1,25 @@
>

Re: [PATCH] Enable GCC support for AMX

2020-09-03 Thread Kirill Yukhin via Gcc-patches
Hello,

On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> Hi:
> 
> This patch is about to support Intel Advanced Matrix Extensions (AMX)
> which will be enabled in GLC.
> 
> AMX is a new 64-bit programming paradigm consisting of two
> components: a set of 2-dimensional registers (tiles) representing
> sub-arrays from a larger 2-dimensional memory image,
> and an accelerator able to operate on tiles
> 
> Supported instructions are
> 
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> AMX-BF16:tdpbf16ps
> 
> The intrinsics adopts constant tile register number as its input parameters.
> 
> For detailed information, please refer to
> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Bootstrap ok, regression test on i386/x86 backend is ok.
> 
> OK for master?

I was trying to apply your patch to recent master and got
compilation error:

g++ -std=gnu++11  -fno-PIE -c   -g -O2 -DIN_GCC -fno-exceptions -fno-rtti
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common
-DHAVE_CONFIG_H -I. -I. -I/export/kyukhin/gcc/src/gcc
-I/export/kyukhin/gcc/src/gcc/. -I/export/kyukhin/gcc/src/gcc/../include
-I/export/kyukhin/gcc/src/gcc/../libcpp/include
-I/export/kyukhin/gcc/src/gcc/../libdecnumber
-I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber
-I/export/kyukhin/gcc/src/gcc/../libbacktrace   -o i386-options.o
-MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_options*)’:
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41: error: ‘PTA_AMX_TILE’ was not declared in this scope
  if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
 ^
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41: error: ‘PTA_AMX_INT8’ was not declared in this scope
  if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
 ^
/export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41: error: ‘PTA_AMX_BF16’ was not declared in this scope
  if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)

Could you please fix that?


--
K

PS: Please excuse me for late response.


Re: [PATCH] x86: Detect Rocket Lake and Alder Lake

2020-08-19 Thread Kirill Yukhin via Gcc-patches
Hello,

On 16 Aug 06:17, H.J. Lu via Gcc-patches wrote:
> From arch/x86/include/asm/intel-family.h on Linux kernel master branch:
> 
>  #define INTEL_FAM6_ROCKETLAKE   0xA7
>  #define INTEL_FAM6_ALDERLAKE    0x97
> 
>   * common/config/i386/cpuinfo.h (get_intel_cpu): Detect Rocket
>   Lake and Alder Lake.

Your patch is OK for trunk.

--
K
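
Note on the model numbers quoted in the message above: here is a minimal sketch, in plain C, of how a detection routine might map them to CPU names. Only the two model numbers come from the quoted kernel header. The helper name intel_fam6_name and the returned strings are hypothetical and are not taken from GCC's get_intel_cpu.

```c
/* Model numbers as quoted from arch/x86/include/asm/intel-family.h.  */
#define INTEL_FAM6_ROCKETLAKE 0xA7
#define INTEL_FAM6_ALDERLAKE  0x97

/* Hypothetical helper (not GCC's get_intel_cpu): map a CPUID
   family/model pair to a short name, or "unknown".  */
static const char *
intel_fam6_name (unsigned int family, unsigned int model)
{
  if (family != 6)
    return "unknown";

  switch (model)
    {
    case INTEL_FAM6_ROCKETLAKE:
      return "rocketlake";
    case INTEL_FAM6_ALDERLAKE:
      return "alderlake";
    default:
      return "unknown";
    }
}
```

Calling intel_fam6_name (6, 0x97) would select the Alder Lake case; any other family falls through to "unknown".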


Re: [PATCH][AVX512][PR96246] Merge two define_insn: _blendm, _load_mask.

2020-08-12 Thread Kirill Yukhin via Gcc-patches
Hello,

On 22 Jul 12:59, Hongtao Liu via Gcc-patches wrote:
>   Those two define_insns have the same pattern, and
> _load_mask would always be matched since it shows up
> earlier in the md file. This may lose some opportunities in
> pass_reload, since _load_mask only has constraint "0C"
> for operand2, so the "v" constraint in _vblendm would never
> be matched.
> 
> 2020-07-21  Hongtao Liu  
> 
> gcc/
>PR target/96246
> * config/i386/sse.md (_load_mask,
> _load_mask): Extend to generate blendm
> instructions.
> (_blendm, _blendm): Change
> define_insn to define_expand.
> 
> gcc/testsuite/
> * gcc.target/i386/avx512bw-pr96246-1.c: New test.
> * gcc.target/i386/avx512bw-pr96246-2.c: New test.
> * gcc.target/i386/avx512vl-pr96246-1.c: New test.
> * gcc.target/i386/avx512vl-pr96246-2.c: New test.
> * gcc.target/i386/avx512bw-vmovdqu16-1.c: New test.
> * gcc.target/i386/avx512bw-vmovdqu8-1.c: New test.
> * gcc.target/i386/avx512f-vmovapd-1.c: New test.
> * gcc.target/i386/avx512f-vmovaps-1.c: New test.
> * gcc.target/i386/avx512f-vmovdqa32-1.c: New test.
> * gcc.target/i386/avx512f-vmovdqa64-1.c: New test.
> * gcc.target/i386/avx512vl-pr92686-movcc-1.c: New test.
> * gcc.target/i386/avx512vl-pr96246-1.c: New test.
> * gcc.target/i386/avx512vl-pr96246-2.c: New test.
> * gcc.target/i386/avx512vl-vmovapd-1.c: New test.
> * gcc.target/i386/avx512vl-vmovaps-1.c: New test.
> * gcc.target/i386/avx512vl-vmovdqa32-1.c: New test.
> * gcc.target/i386/avx512vl-vmovdqa64-1.c: New test.

Your patch is OK for trunk.

--
K


Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-07 Thread Kirill Yukhin via Gcc-patches
Hello,

On 05 Aug 09:29, Hongtao Liu wrote:
> On Tue, Aug 4, 2020 at 6:28 PM Kirill Yukhin  wrote:
> >
> > On 04 Aug 13:26, Kirill Yukhin wrote:
> > > Could you please clarify how your patch is related to [1]?
> > > I see from the bug that it describes perf issue w.r.t. scalar
> > > operations.
> >
> Sorry for the typo, it's pr96243.

Please don't forget to update the ChangeLog entry.

It's a pity that we don't support vector comparisons in CSE;
I hope we will fix that in the future.

Patch LGTM.

--
K


Re: [PATCH] Enable GCC support for AMX

2020-08-04 Thread Kirill Yukhin via Gcc-patches
Hello,

On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> Hi:
> 
> This patch is about to support Intel Advanced Matrix Extensions (AMX)
> which will be enabled in GLC.
> 
> AMX is a new 64-bit programming paradigm consisting of two
> components: a set of 2-dimensional registers (tiles) representing
> sub-arrays from a larger 2-dimensional memory image,
> and an accelerator able to operate on tiles
> 
> Supported instructions are
> 
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> AMX-BF16:tdpbf16ps
> 
> The intrinsics adopts constant tile register number as its input parameters.

I didn't go into the patch deeply, but why did you use inline asm for the
intrinsics definitions? Are you going to introduce register classes for those
new tmm registers and instruction definitions for the new insns in the machine
description?

--
K
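
Note on the AMX-INT8 instructions listed in the quoted description: as far as I read Intel's programming reference, tdpbssd accumulates dot products of groups of four signed bytes into 32-bit elements of the destination tile. Below is a plain-C reference model of that arithmetic only. It does not use the intrinsics from the patch, needs no AMX hardware, and the function name tile_dpbssd_model and the small ROWS/KQUADS/COLS sizes are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative tile shape, much smaller than real AMX tiles.  */
enum { ROWS = 2, KQUADS = 2, COLS = 2 };

/* Reference model of a signed-by-signed INT8 tile dot product:
   each 32-bit element c[m][n] accumulates, for every k, the dot
   product of the 4 bytes a[m][4k..4k+3] with the 4 bytes
   b[k][4n..4n+3].  */
static void
tile_dpbssd_model (int32_t c[ROWS][COLS],
                   int8_t a[ROWS][KQUADS * 4],
                   int8_t b[KQUADS][COLS * 4])
{
  for (int m = 0; m < ROWS; m++)
    for (int k = 0; k < KQUADS; k++)
      for (int n = 0; n < COLS; n++)
        for (int i = 0; i < 4; i++)
          c[m][n] += (int32_t) a[m][4 * k + i] * (int32_t) b[k][4 * n + i];
}
```

In the real instruction the three operands are tile register numbers, which is why the intrinsics take compile-time constants rather than pointers as in this model.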


Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-04 Thread Kirill Yukhin via Gcc-patches
On 04 Aug 13:26, Kirill Yukhin wrote:
> Could you please clarify how your patch is related to [1]?
> I see from the bug that it describes perf issue w.r.t. scalar
> operations.

[1] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96226

> 
> --
> Regards, Kirill Yukhin


Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-04 Thread Kirill Yukhin via Gcc-patches
Hello,

On 20 Jul 13:46, Hongtao Liu wrote:
> Hi:
>   For rtx like (eq:HI (V8SI 90) (V8SI 91)), cse will take it as a
> boolean value and try to do some optimization. But it is not true for
> vector compare, also other places in rtl passes hold the same
> assumption.
> 
> Bootstrap is ok, regression test is ok for i386 backend.
> 
> 2020-07-20  Hongtao Liu  
> 
> gcc/
> PR target/96226

Could you please clarify how your patch is related to [1]?
I see from the bug that it describes perf issue w.r.t. scalar
operations.

--
Regards, Kirill Yukhin
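
Note on the quoted point about (eq:HI (V8SI 90) (V8SI 91)): the result of such a compare is a per-lane bit mask, not a single true/false value, so treating it as a boolean is wrong. A minimal plain-C model of that behaviour follows. The function name cmpeq_v8si_mask is hypothetical, and the model only assumes that bit i of the mask reflects lane i.

```c
#include <stdint.h>

/* Model of an 8-lane 32-bit equality compare into a mask:
   bit i of the result is set when a[i] == b[i].  The result is
   held in 16 bits to mirror the HI mode in the quoted rtx; only
   the low 8 bits are ever set for 8 lanes.  */
static uint16_t
cmpeq_v8si_mask (const int32_t a[8], const int32_t b[8])
{
  uint16_t mask = 0;

  for (int i = 0; i < 8; i++)
    if (a[i] == b[i])
      mask |= (uint16_t) (1u << i);

  return mask;
}
```

With lanes matching only at even positions the mask is 0x55, which is neither 0 nor 1, so an optimization that folds the result as if it were a scalar boolean would change its meaning.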


Re: [PATCH v2 1/2] i386-tdep: Fix naming in zmm and ymm type descriptions.

2020-07-28 Thread Kirill Yukhin via Gcc-patches
Hello,

On 24 Jul 10:59, Felix Willgerodt via Gcc-patches wrote:
> gdb/Changelog:
> 2020-07-02  Felix Willgerodt  
> 
>   * i386-tdep.c (i386_zmm_type): Fix field names.
>   (i386_ymm_type): Fix field names.

I guess this is the wrong mailing list.

--
Regards, Kirill Yukhin


Re: [PATCH] x86: Enable FMA in rsqrt2 expander

2020-07-09 Thread Kirill Yukhin via Gcc-patches
On 07 Jul 09:06, H.J. Lu wrote:
> On Tue, Jul 7, 2020 at 8:56 AM Kirill Yukhin  wrote:
> >
> > Hello HJ,
> >
> > On 28 Jun 07:19, H.J. Lu via Gcc-patches wrote:
> > > Enable FMA in rsqrt2 expander and fold rsqrtv16sf2 expander into
> > > rsqrt2 expander which expands to UNSPEC_RSQRT28 for TARGET_AVX512ER.
> > > Although it doesn't show performance change in our workloads, FMA can
> > > improve other workloads.
> > >
> > > gcc/
> > >
> > >   PR target/88713
> > >   * config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
> > >   * config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
> > >   (rsqrt2): Replace VF1_128_256 with VF_AVX512VL_VF1_128_256.
> > >   (rsqrtv16sf2): Removed.
> > >
> > > gcc/testsuite/
> > >
> > >   PR target/88713
> > >   * gcc.target/i386/pr88713-1.c: New test.
> > >   * gcc.target/i386/pr88713-2.c: Likewise.
> >
> > So, you've introduced new rsqrt expanders for DF vectors and relaxed
> > the condition for V16SF. What I didn't get is why you changed the unspec
> > type from RSQRT to RSQRT28 for the V16SF expander.
> >
> 
> UNSPEC in define_expand is meaningless when the pattern is fully
> expanded by ix86_emit_swsqrtsf.  I believe that UNSPEC in rsqrt2
> expander can be removed.

Agree.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88713-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Ofast -mno-avx512f -mfma" } */

I guess -O2 is useless here (and in the -2.c test).

Otherwise LGTM.

--
K

> 
> -- 
> H.J.


Re: [PATCH] x86: Enable FMA in rsqrt2 expander

2020-07-07 Thread Kirill Yukhin via Gcc-patches
Hello HJ,

On 28 Jun 07:19, H.J. Lu via Gcc-patches wrote:
> Enable FMA in rsqrt2 expander and fold rsqrtv16sf2 expander into
> rsqrt2 expander which expands to UNSPEC_RSQRT28 for TARGET_AVX512ER.
> Although it doesn't show performance change in our workloads, FMA can
> improve other workloads.
> 
> gcc/
> 
>   PR target/88713
>   * config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
>   * config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
>   (rsqrt2): Replace VF1_128_256 with VF_AVX512VL_VF1_128_256.
>   (rsqrtv16sf2): Removed.
> 
> gcc/testsuite/
> 
>   PR target/88713
>   * gcc.target/i386/pr88713-1.c: New test.
>   * gcc.target/i386/pr88713-2.c: Likewise.

So, you've introduced new rsqrt expanders for DF vectors and relaxed
the condition for V16SF. What I didn't get is why you changed the unspec
type from RSQRT to RSQRT28 for the V16SF expander.

--
K
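
Note on why FMA matters for the software rsqrt sequence discussed above: as I understand it, the expansion refines a hardware estimate x0 with one Newton-Raphson step, rsqrt(a) = -0.5 * x0 * (a * x0 * x0 - 3.0), and the inner multiply-subtract is a natural fused multiply-add. The sketch below is plain C, not the GCC expander. The function names are hypothetical, and rsqrt_estimate uses the well-known integer bit trick purely as a stand-in for the hardware estimate instruction.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware reciprocal-square-root estimate
   (assumption: the real sequence uses an rsqrt instruction here).  */
static float
rsqrt_estimate (float a)
{
  uint32_t bits;
  float x0;

  memcpy (&bits, &a, sizeof bits);
  bits = UINT32_C (0x5f3759df) - (bits >> 1);
  memcpy (&x0, &bits, sizeof x0);
  return x0;
}

/* One Newton-Raphson refinement step:
   x1 = -0.5 * x0 * (a * x0 * x0 - 3.0).  */
static float
rsqrt_refined (float a)
{
  float x0 = rsqrt_estimate (a);
  float e = a * x0;
  /* e * x0 - 3.0f has multiply-add shape; a compiler targeting FMA
     may contract it into a single fused instruction.  */
  float t = e * x0 - 3.0f;

  return -0.5f * x0 * t;
}
```

For a = 4.0f the refined value lands close to 0.5; the exact bits depend on whether the compiler fuses the multiply-add.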