Re: How to generate AVX512 instructions now (just to look at them).

2014-01-29 Thread Kirill Yukhin
Hello,
I think it's time to remove `XPASS' from the corresponding tests.
On 03 Jan 22:11, Jakub Jelinek wrote:
> Hi!
>
> On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
> > I don't doubt that would work, what I'm interested in, is (cat verintlin.f):
>
> Well, you need gather loads for that and there you hit PR target/59617.

testsuite/
PR target/59617
* gcc.target/i386/avx512f-gather-2.c: Remove XPASS
* gcc.target/i386/avx512f-gather-5.c: Ditto.

Patch at the bottom. Updated tests pass.
Is it ok for trunk?

--
Thanks, K

 gcc/testsuite/gcc.target/i386/avx512f-gather-2.c | 8 ++++----
 gcc/testsuite/gcc.target/i386/avx512f-gather-5.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c b/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c
index 8664192..f20d3db 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c
@@ -3,9 +3,9 @@
 
 #include "avx512f-gather-1.c"
 
-/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" { xfail { *-*-* } } } } */  /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*ymm" { xfail { *-*-* } } } } */  /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*xmm" { xfail { *-*-* } } } } */  /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*xmm" { xfail { lp64 } } } } */  /* PR59617 */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*ymm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*xmm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*xmm" } } */
 /* { dg-final { scan-tree-dump-times "note: vectorized 1 loops in function" 16 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c b/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c
index 5edd446..d2237da 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c
@@ -3,8 +3,8 @@
 
 #include "avx512f-gather-4.c"
 
-/* { dg-final { scan-assembler "gather\[^\n\]*zmm" { xfail { *-*-* } } } } */ /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" { xfail { *-*-* } } } } */ /* PR59617 */
+/* { dg-final { scan-assembler "gather\[^\n\]*zmm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" } } */
 /* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*ymm" } } */
 /* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*xmm" } } */
 /* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*xmm" } } */


Re: How to generate AVX512 instructions now (just to look at them).

2014-01-29 Thread Jakub Jelinek
On Wed, Jan 29, 2014 at 06:33:21PM +0300, Kirill Yukhin wrote:
> I think it's time to remove `XPASS' from the corresponding tests.
> On 03 Jan 22:11, Jakub Jelinek wrote:
> > Hi!
> >
> > On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
> > > I don't doubt that would work, what I'm interested in, is (cat verintlin.f):
> >
> > Well, you need gather loads for that and there you hit PR target/59617.
>
> testsuite/
>   PR target/59617
>   * gcc.target/i386/avx512f-gather-2.c: Remove XPASS
>   * gcc.target/i386/avx512f-gather-5.c: Ditto.
>
> Patch at the bottom. Updated tests pass.
> Is it ok for trunk?

Ok, thanks.  Sorry for not removing those myself.

Jakub


Re: How to generate AVX512 instructions now (just to look at them).

2014-01-05 Thread Toon Moene

On 01/03/2014 10:11 PM, Jakub Jelinek wrote:


> Hi!
>
> On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
>
> > I don't doubt that would work, what I'm interested in, is (cat verintlin.f):
>
> Well, you need gather loads for that and there you hit PR target/59617.


I tried your patch, and the effect on the most heavily used loop in the 
full routine (not the part that I quoted before):


      DO JY = KLAT1,KLAT2
      DO JX = KLON1,KLON2
         IDX  = KP(JX,JY)
         IDY  = KQ(JX,JY)
         ILEV = KR(JX,JY)
...
     +    + PBETA(JX,JY,4)*( PALFA(JX,JY,1)*PARG(IDX-2,IDY+1,ILEV+1)
     +                     + PALFA(JX,JY,2)*PARG(IDX-1,IDY+1,ILEV+1)
     +                     + PALFA(JX,JY,3)*PARG(IDX  ,IDY+1,ILEV+1)
     +                     + PALFA(JX,JY,4)*PARG(IDX+1,IDY+1,ILEV+1) ) )
      ENDDO
      ENDDO

is (just counting assembler lines, i.e., instructions):

-Ofast -mavx2 -mfma:   627 lines in the .s file.

-Ofast -mavx2 -mfma -mavx512f: 588 lines in the .s file.

However, this routine is clearly memory bound (as the vectorization with 
the gather instruction, needed for the indirect addressing via IDX = 
KP(JX,JY), etc., didn't bring any speed improvement).


The number of instructions accessing memory:

-Ofast -mavx2 -mfma:   364 lines in the .s file.

-Ofast -mavx2 -mfma -mavx512f: 221 lines in the .s file.

So there might be a clear improvement here ...

Thanks !

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: How to generate AVX512 instructions now (just to look at them).

2014-01-03 Thread Jakub Jelinek
Hi!

On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
> I don't doubt that would work, what I'm interested in, is (cat verintlin.f):

Well, you need gather loads for that and there you hit PR target/59617.

Completely untested patch that lets your testcase be vectorized
using 64-byte vectors.  vectorizable_mask_load_store still punts, but I
guess the first step there is to teach it about non-gather MASK_LOAD
and MASK_STORE, which aren't handled for the AVX512F modes either
(I think V8DI/V8DF/V16SI/V16SF modes should be possible to handle right now),
and then to move on to handling the gathers similarly.

2014-01-03  Jakub Jelinek  ja...@redhat.com

PR target/59617
* config/i386/i386.c (ix86_vectorize_builtin_gather): Uncomment
AVX512F gather builtins.
* tree-vect-stmts.c (vectorizable_mask_load_store): For now punt
on gather decls with INTEGER_TYPE masktype.
(vectorizable_load): For INTEGER_TYPE masktype, put the INTEGER_CST
directly into the builtin rather than hoisting it before loop.

--- gcc/config/i386/i386.c.jj   2014-01-03 13:19:14.0 +0100
+++ gcc/config/i386/i386.c  2014-01-03 21:12:23.630145609 +0100
@@ -36527,9 +36527,6 @@ ix86_vectorize_builtin_gather (const_tre
 case V8SImode:
   code = si ? IX86_BUILTIN_GATHERSIV8SI : IX86_BUILTIN_GATHERALTDIV8SI;
   break;
-#if 0
-/*  FIXME: Commented until vectorizer can work with (mask_type != src_type)
-   PR59617.   */
 case V8DFmode:
   if (TARGET_AVX512F)
code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF;
@@ -36554,7 +36551,6 @@ ix86_vectorize_builtin_gather (const_tre
   else
return NULL_TREE;
   break;
-#endif
 default:
   return NULL_TREE;
 }
--- gcc/tree-vect-stmts.c.jj   2014-01-03 11:41:01.0 +0100
+++ gcc/tree-vect-stmts.c   2014-01-03 21:29:47.595911084 +0100
@@ -1813,6 +1813,17 @@ vectorizable_mask_load_store (gimple stm
 "gather index use not simple.");
  return false;
}
+
+  tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl));
+  tree masktype
+   = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TREE_CHAIN (arglist))));
+  if (TREE_CODE (masktype) == INTEGER_TYPE)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"masked gather with integer mask not supported.");
+ return false;
+   }
 }
   else if (tree_int_cst_compare (nested_in_vect_loop
 ? STMT_VINFO_DR_STEP (stmt_info)
@@ -5761,6 +5772,7 @@ vectorizable_load (gimple stmt, gimple_s
{
  mask = build_int_cst (TREE_TYPE (masktype), -1);
  mask = build_vector_from_val (masktype, mask);
+ mask = vect_init_vector (stmt, mask, masktype, NULL);
}
   else if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (masktype)))
{
@@ -5771,10 +5783,10 @@ vectorizable_load (gimple stmt, gimple_s
  real_from_target (r, tmp, TYPE_MODE (TREE_TYPE (masktype)));
  mask = build_real (TREE_TYPE (masktype), r);
  mask = build_vector_from_val (masktype, mask);
+ mask = vect_init_vector (stmt, mask, masktype, NULL);
}
   else
gcc_unreachable ();
-  mask = vect_init_vector (stmt, mask, masktype, NULL);
 
   scale = build_int_cst (scaletype, gather_scale);
 


Jakub