Re: How to generate AVX512 instructions now (just to look at them).
Hello,
I think it's time to remove `XPASS' from the corresponding tests.

On 03 Jan 22:11, Jakub Jelinek wrote:
> Hi!
>
> On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
> > I don't doubt that would work, what I'm interested in, is (cat verintlin.f):
>
> Well, you need gather loads for that and there you hit PR target/59617.

testsuite/
        PR target/59617
        * gcc.target/i386/avx512f-gather-2.c: Remove XPASS
        * gcc.target/i386/avx512f-gather-5.c: Ditto.

Patch at the bottom.

Updated tests pass. Is it ok for trunk?

--
Thanks, K

 gcc/testsuite/gcc.target/i386/avx512f-gather-2.c | 8 ++++----
 gcc/testsuite/gcc.target/i386/avx512f-gather-5.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c b/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c
index 8664192..f20d3db 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-gather-2.c
@@ -3,9 +3,9 @@
 
 #include "avx512f-gather-1.c"
 
-/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" { xfail { *-*-* } } } } */ /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*ymm" { xfail { *-*-* } } } } */ /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*xmm" { xfail { *-*-* } } } } */ /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*xmm" { xfail { lp64 } } } } */ /* PR59617 */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*ymm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*xmm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*xmm" } } */
 /* { dg-final { scan-tree-dump-times "note: vectorized 1 loops in function" 16 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c b/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c
index 5edd446..d2237da 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-gather-5.c
@@ -3,8 +3,8 @@
 
 #include "avx512f-gather-4.c"
 
-/* { dg-final { scan-assembler "gather\[^\n\]*zmm" { xfail { *-*-* } } } } */ /* PR59617 */
-/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" { xfail { *-*-* } } } } */ /* PR59617 */
+/* { dg-final { scan-assembler "gather\[^\n\]*zmm" } } */
+/* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*ymm" } } */
 /* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*ymm" } } */
 /* { dg-final { scan-assembler-not "gather\[^\n\]*ymm\[^\n\]*xmm" } } */
 /* { dg-final { scan-assembler-not "gather\[^\n\]*xmm\[^\n\]*xmm" } } */
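
For readers unfamiliar with the directives: the scan-assembler patterns in these tests ask whether any single line of the generated assembler contains "gather" followed later on the same line by particular register names. The following is only a rough sketch of that check in plain C, not how DejaGnu implements it. The function names line_has_sequence and find_after are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search [begin, end) for needle and return a
   pointer just past the first match, or NULL if there is none.  */
static const char *find_after(const char *begin, const char *end,
                              const char *needle)
{
    size_t len = strlen(needle);
    for (const char *p = begin; (size_t)(end - p) >= len; p++)
        if (memcmp(p, needle, len) == 0)
            return p + len;
    return NULL;
}

/* Return 1 if some single line of text contains all of parts[0..nparts-1]
   in that order, 0 otherwise.  This mimics a pattern such as
   gather[^\n]*ymm[^\n]*ymm, where nothing may cross a newline.  */
int line_has_sequence(const char *text, const char *const *parts,
                      size_t nparts)
{
    while (*text) {
        const char *end = strchr(text, '\n');
        if (!end)
            end = text + strlen(text);

        /* Greedily take the leftmost match of each part in turn.  */
        const char *p = text;
        size_t i = 0;
        while (i < nparts && (p = find_after(p, end, parts[i])) != NULL)
            i++;
        if (i == nparts)
            return 1;

        text = (*end == '\n') ? end + 1 : end;
    }
    return 0;
}
```

With parts {"gather", "zmm"} this corresponds to the scan-assembler check in avx512f-gather-5.c, and with {"gather", "ymm", "ymm"} to the first scan-assembler-not check, which passes when the function returns 0.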
Re: How to generate AVX512 instructions now (just to look at them).
On Wed, Jan 29, 2014 at 06:33:21PM +0300, Kirill Yukhin wrote:
> I think it's time to remove `XPASS' from the corresponding tests.
>
> On 03 Jan 22:11, Jakub Jelinek wrote:
> > Hi!
> >
> > On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
> > > I don't doubt that would work, what I'm interested in, is (cat verintlin.f):
> >
> > Well, you need gather loads for that and there you hit PR target/59617.
>
> testsuite/
>         PR target/59617
>         * gcc.target/i386/avx512f-gather-2.c: Remove XPASS
>         * gcc.target/i386/avx512f-gather-5.c: Ditto.
>
> Patch at the bottom.
>
> Updated tests pass. Is it ok for trunk?

Ok, thanks. Sorry for not removing those myself.

        Jakub
Re: How to generate AVX512 instructions now (just to look at them).
On 01/03/2014 10:11 PM, Jakub Jelinek wrote:
> Hi!
>
> On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
> > I don't doubt that would work, what I'm interested in, is (cat verintlin.f):
>
> Well, you need gather loads for that and there you hit PR target/59617.

I tried your patch, and the effect on the most heavily used loop in the full routine (not the part that I quoted before):

      DO JY = KLAT1,KLAT2
         DO JX = KLON1,KLON2
            IDX  = KP(JX,JY)
            IDY  = KQ(JX,JY)
            ILEV = KR(JX,JY)
 ...
     + + PBETA(JX,JY,4)*( PALFA(JX,JY,1)*PARG(IDX-2,IDY+1,ILEV+1)
     +                  + PALFA(JX,JY,2)*PARG(IDX-1,IDY+1,ILEV+1)
     +                  + PALFA(JX,JY,3)*PARG(IDX  ,IDY+1,ILEV+1)
     +                  + PALFA(JX,JY,4)*PARG(IDX+1,IDY+1,ILEV+1) ) )
         ENDDO
      ENDDO

is (just counting assembler lines, i.e., instructions):

  -Ofast -mavx2 -mfma:           627 lines in the .s file.
  -Ofast -mavx2 -mfma -mavx512f: 588 lines in the .s file.

However, this routine is clearly memory bound (as the vectorization with the gather instruction, needed for the indirect addressing via IDX = KP(JX,JY), etc., didn't bring any speed improvement).

The number of instructions accessing memory:

  -Ofast -mavx2 -mfma:           364 lines in the .s file.
  -Ofast -mavx2 -mfma -mavx512f: 221 lines in the .s file.

So there might be a clear improvement here ...

Thanks !

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
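
For anyone who wants to look at the generated gather instructions without the full routine, the indirect-addressing pattern can be reduced to a few lines of C. This is only an illustrative sketch, not the original code: the function name interp4_indirect, the one-dimensional layout, and the four weights per point are all assumptions. What it keeps is the essential property that the index into the data array is itself loaded from memory, so a vectorized version needs gather loads.

```c
#include <stddef.h>

/* Hypothetical reduction of the loop above: a 4-point weighted sum
   around an index that is loaded from memory.  Names and data layout
   are illustrative only.  The caller must guarantee idx[i] >= 2 and
   idx[i] + 1 < (number of elements in parg).  */
void interp4_indirect(size_t n, const int *idx, const double *alfa,
                      const double *parg, double *restrict out)
{
    for (size_t i = 0; i < n; i++) {
        const double *w = &alfa[4 * i];  /* four weights per output point */
        int k = idx[i];                  /* data-dependent base index */

        /* parg is addressed through k, so consecutive iterations touch
           unrelated locations; this is the access a gather load covers.  */
        out[i] = w[0] * parg[k - 2]
               + w[1] * parg[k - 1]
               + w[2] * parg[k]
               + w[3] * parg[k + 1];
    }
}
```

Compiling such a file with -S and each of the two option sets quoted above should allow the same kind of comparison of the .s files. Whether the loop is actually vectorized with gathers depends on the compiler version and on the patch discussed in this thread.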
Re: How to generate AVX512 instructions now (just to look at them).
Hi!

On Fri, Jan 03, 2014 at 08:58:30PM +0100, Toon Moene wrote:
> I don't doubt that would work, what I'm interested in, is (cat verintlin.f):

Well, you need gather loads for that and there you hit PR target/59617.

Completely untested patch that lets your testcase be vectorized using 64-byte vectors. For vectorizable_mask_load_store it still punts, but I guess the steps there are first to teach it about non-gather MASK_LOAD and MASK_STORE, which aren't handled for the AVX512F modes either (I think V8DI/V8DF/V16SI/V16SF modes should be possible to handle right now), and then move on to handle the gathers similarly.

2014-01-03  Jakub Jelinek  <ja...@redhat.com>

        PR target/59617
        * config/i386/i386.c (ix86_vectorize_builtin_gather): Uncomment
        AVX512F gather builtins.
        * tree-vect-stmts.c (vectorizable_mask_load_store): For now punt
        on gather decls with INTEGER_TYPE masktype.
        (vectorizable_load): For INTEGER_TYPE masktype, put the INTEGER_CST
        directly into the builtin rather than hoisting it before loop.

--- gcc/config/i386/i386.c.jj 2014-01-03 13:19:14.0 +0100
+++ gcc/config/i386/i386.c 2014-01-03 21:12:23.630145609 +0100
@@ -36527,9 +36527,6 @@ ix86_vectorize_builtin_gather (const_tre
     case V8SImode:
       code = si ? IX86_BUILTIN_GATHERSIV8SI : IX86_BUILTIN_GATHERALTDIV8SI;
       break;
-#if 0
-/* FIXME: Commented until vectorizer can work with (mask_type != src_type)
-   PR59617. */
     case V8DFmode:
       if (TARGET_AVX512F)
         code = si ? IX86_BUILTIN_GATHER3ALTSIV8DF : IX86_BUILTIN_GATHER3DIV8DF;
@@ -36554,7 +36551,6 @@ ix86_vectorize_builtin_gather (const_tre
       else
         return NULL_TREE;
       break;
-#endif
     default:
       return NULL_TREE;
     }
--- gcc/tree-vect-stmts.c.jj 2014-01-03 11:41:01.0 +0100
+++ gcc/tree-vect-stmts.c 2014-01-03 21:29:47.595911084 +0100
@@ -1813,6 +1813,17 @@ vectorizable_mask_load_store (gimple stm
                              "gather index use not simple.");
           return false;
         }
+
+      tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl));
+      tree masktype
+        = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TREE_CHAIN (arglist))));
+      if (TREE_CODE (masktype) == INTEGER_TYPE)
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "masked gather with integer mask not supported.");
+          return false;
+        }
     }
   else if (tree_int_cst_compare (nested_in_vect_loop
                                  ? STMT_VINFO_DR_STEP (stmt_info)
@@ -5761,6 +5772,7 @@ vectorizable_load (gimple stmt, gimple_s
             {
               mask = build_int_cst (TREE_TYPE (masktype), -1);
               mask = build_vector_from_val (masktype, mask);
+              mask = vect_init_vector (stmt, mask, masktype, NULL);
             }
           else if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (masktype)))
             {
@@ -5771,10 +5783,10 @@ vectorizable_load (gimple stmt, gimple_s
               real_from_target (&r, tmp, TYPE_MODE (TREE_TYPE (masktype)));
               mask = build_real (TREE_TYPE (masktype), r);
               mask = build_vector_from_val (masktype, mask);
+              mask = vect_init_vector (stmt, mask, masktype, NULL);
             }
           else
             gcc_unreachable ();
-          mask = vect_init_vector (stmt, mask, masktype, NULL);
 
           scale = build_int_cst (scaletype, gather_scale);

        Jakub
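
As background to the masktype distinction in the patch: the AVX2 gather instructions take a vector mask in which the most significant bit of each element enables the corresponding lane, which is why an all-ones element (integer -1, or the same bit pattern viewed as a float) selects everything. The AVX512F gathers instead take an integer mask with one bit per lane. The following is a small stand-alone C sketch of the two encodings. It is not GCC code, and all three function names are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* An all-ones 64-bit pattern reinterpreted as a double, i.e. one
   element of a "select everything" vector mask for double lanes.  */
double all_ones_double(void)
{
    uint64_t bits = UINT64_MAX;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* With a vector mask only the sign (most significant) bit of each
   element decides whether the lane is loaded.  */
int lane_enabled(double mask_elem)
{
    uint64_t bits;
    memcpy(&bits, &mask_elem, sizeof bits);
    return (int)(bits >> 63);
}

/* Integer-mask style: bit i enables lane i.  Returns a mask with the
   low `lanes` bits set (at most 16 lanes in this sketch).  */
uint16_t all_lanes_mask(unsigned lanes)
{
    return lanes >= 16 ? UINT16_MAX : (uint16_t)((1u << lanes) - 1u);
}
```

For eight double lanes, all_lanes_mask(8) yields the constant 0xff. That is the sort of small integer constant that can be passed to the builtin directly, as the ChangeLog entry for vectorizable_load describes.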