Re: [PATCH][vect] Disable vectorization of epilogues for loops with SIMDUID set

2019-11-07 Thread Jakub Jelinek
On Thu, Nov 07, 2019 at 02:26:29PM +, Andre Vieira (lists) wrote:
> 2019-11-07  Andre Vieira  
> 
> * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
> vectorization for loops with SIMDUID set.  Enable epilogue
> vectorization for loops with SIMDLEN set after finding a main
> loop with a VF that matches it.
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 
> dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a
>  100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>   delete loop_vinfo;
>  
> /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
> -  enabled, this is not a simd loop and it is the innermost loop.  */
> -   vect_epilogues = (!loop->simdlen
> +  enabled, SIMDUID is not set, it is the innermost loop and we have
> +  either already found the loop's SIMDLEN or there was no SIMDLEN to
> +  begin with.
> +  TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
> +   vect_epilogues = (!simdlen
>   && loop->inner == NULL
>   && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
>   && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
> + && !loop->simduid
>   /* For now only allow one epilogue loop.  */
>   && first_loop_vinfo->epilogue_vinfos.is_empty ());

LGTM.

Jakub



Re: [PATCH][vect] Disable vectorization of epilogues for loops with SIMDUID set

2019-11-07 Thread Andre Vieira (lists)

Hi,

Rebased the patch on top of Richard Sandiford's patches, with his fixes 
I can now allow for vectorization of epilogues after we match simdlen. 
This will however not turn on epilogue vectorization in cases where we 
specify a desired simdlen that is never matched.  This would require 
more work as before simdlen is matched we would need to analyze each 
vector_size after creating a "first_loop_vinfo" twice: once as an 
epilogue (for in the case we never match simdlen) and once as a main 
loop (in case simdlen would match its VF).  Maybe there is a different 
way of doing it but I don't see it right now.


Bootstrapped and regression tested (also ran libgomp tests) for x86_64 
and aarch64. Currently libgomp has 5 failures for aarch64, these are all 
openacc tests. The first one I looked at is due to a reduction seemingly 
performing too many iterations when defining '$acc parallel 
vector_length(vl)' I am looking into it.


Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-07  Andre Vieira  

* tree-vect-loop.c (vect_analyze_loop): Disable epilogue
vectorization for loops with SIMDUID set.  Enable epilogue
vectorization for loops with SIMDLEN set after finding a main
loop with a VF that matches it.

On 05/11/2019 07:16, Jakub Jelinek wrote:

On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote:

I was using loop->simdlen to detect whether it was a SIMD loop and I don't
believe that was correct, as can be witnessed by the mass failures in libgomp.
My apologies for not running this, didn't think of it!

I found that these were failing because we do not handle vectorization of
epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable
epilogue vectorization for loops where SIMDUID is set until we have fixed
this. See further comments inline.

I bootstrapped it on aarch64 and x86_64, ran libgomp on both.

This OK for trunk?


OK.  Can you remove the simdlen == 0 check as a followup?


Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is.
All a non-zero simdlen says is a user assertion that certain inter-loop
depencencies don't exist.

Jakub

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	delete loop_vinfo;
 
 	  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
-	 enabled, this is not a simd loop and it is the innermost loop.  */
-	  vect_epilogues = (!loop->simdlen
+	 enabled, SIMDUID is not set, it is the innermost loop and we have
+	 either already found the loop's SIMDLEN or there was no SIMDLEN to
+	 begin with.
+	 TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
+	  vect_epilogues = (!simdlen
 			&& loop->inner == NULL
 			&& PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
 			&& LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
+			&& !loop->simduid
 			/* For now only allow one epilogue loop.  */
 			&& first_loop_vinfo->epilogue_vinfos.is_empty ());
 


Re: [PATCH][vect] Disable vectorization of epilogues for loops with SIMDUID set

2019-11-04 Thread Jakub Jelinek
On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote:
> > I was using loop->simdlen to detect whether it was a SIMD loop and I don't
> > believe that was correct, as can be witnessed by the mass failures in 
> > libgomp.
> > My apologies for not running this, didn't think of it!
> > 
> > I found that these were failing because we do not handle vectorization of
> > epilogues correctly when SIMDUID is set. For now Jakub and I agreed to 
> > disable
> > epilogue vectorization for loops where SIMDUID is set until we have fixed
> > this. See further comments inline.
> > 
> > I bootstrapped it on aarch64 and x86_64, ran libgomp on both.
> > 
> > This OK for trunk?
> 
> OK.  Can you remove the simdlen == 0 check as a followup?

Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is.
All a non-zero simdlen says is a user assertion that certain inter-loop
depencencies don't exist.

Jakub



Re: [PATCH][vect] Disable vectorization of epilogues for loops with SIMDUID set

2019-11-04 Thread Richard Biener
On Mon, 4 Nov 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> I was using loop->simdlen to detect whether it was a SIMD loop and I don't
> believe that was correct, as can be witnessed by the mass failures in libgomp.
> My apologies for not running this, didn't think of it!
> 
> I found that these were failing because we do not handle vectorization of
> epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable
> epilogue vectorization for loops where SIMDUID is set until we have fixed
> this. See further comments inline.
> 
> I bootstrapped it on aarch64 and x86_64, ran libgomp on both.
> 
> This OK for trunk?

OK.  Can you remove the simdlen == 0 check as a followup?

Thanks,
Richard.

> Cheers,
> Andre
> 
> gcc/ChangeLog:
> 2019-11-04  Andre Vieira  
> 
> * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
> vectorization if loop->simduid is non null.
> 
> On 31/10/2019 16:58, Jakub Jelinek wrote:
> 
> > FAIL: libgomp.c/../libgomp.c-c++-common/loop-1.c execution test
> > FAIL: libgomp.c/examples-4/simd-3.c execution test
> > FAIL: libgomp.c/pr58392.c execution test
> > FAIL: libgomp.c/scan-13.c execution test
> > FAIL: libgomp.c/scan-17.c execution test
> > FAIL: libgomp.c/scan-19.c execution test
> > FAIL: libgomp.c/scan-20.c execution test
> > FAIL: libgomp.c/simd-10.c execution test
> > FAIL: libgomp.c/simd-12.c execution test
> > FAIL: libgomp.c/simd-13.c execution test
> > FAIL: libgomp.c/simd-6.c execution test
> > FAIL: libgomp.c++/../libgomp.c-c++-common/loop-1.c execution test
> > FAIL: libgomp.c++/simd-8.C execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer
> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> > FAIL: test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O1  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O2  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O3 -fomit-frame-pointer
> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> > FAIL: test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O3 -g  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -Os  execution test
> 
> These should go away now, but we should revisit SIMDUID and epilogue
> vectorization later.  I tried to look into it, but I am afraid I know very
> little about SIMD loops to figure out how to make this work.
> > 
> > On i686-linux, I also see newly
> > FAIL: gcc.dg/vect/vect-epilogues.c -flto -ffat-lto-objects  scan-tree-dump
> > FAIL: vect "LOOP EPILOGUE VECTORIZED"
> > FAIL: gcc.dg/vect/vect-epilogues.c scan-tree-dump vect "LOOP EPILOGUE
> > FAIL: VECTORIZED"
> > and in libgomp just
> 
> These, just like for arm should be skipped for i686, I suspect it doesn't know
> how to vectorize for lower VF's.  Could someone add the appropriate skip
> target triple for i686?
> 
> > FAIL: libgomp.c/examples-4/simd-3.c execution test
> > FAIL: libgomp.c/scan-13.c execution test
> > FAIL: libgomp.c/scan-17.c execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer
> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> > FAIL: test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test
> 
> Same as the other libgomp tests. Should go away now.
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[PATCH][vect] Disable vectorization of epilogues for loops with SIMDUID set

2019-11-04 Thread Andre Vieira (lists)

Hi,

I was using loop->simdlen to detect whether it was a SIMD loop and I 
don't believe that was correct, as can be witnessed by the mass failures 
in libgomp. My apologies for not running this, didn't think of it!


I found that these were failing because we do not handle vectorization 
of epilogues correctly when SIMDUID is set. For now Jakub and I agreed 
to disable epilogue vectorization for loops where SIMDUID is set until 
we have fixed this. See further comments inline.


I bootstrapped it on aarch64 and x86_64, ran libgomp on both.

This OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-04  Andre Vieira  

* tree-vect-loop.c (vect_analyze_loop): Disable epilogue
vectorization if loop->simduid is non null.

On 31/10/2019 16:58, Jakub Jelinek wrote:


FAIL: libgomp.c/../libgomp.c-c++-common/loop-1.c execution test
FAIL: libgomp.c/examples-4/simd-3.c execution test
FAIL: libgomp.c/pr58392.c execution test
FAIL: libgomp.c/scan-13.c execution test
FAIL: libgomp.c/scan-17.c execution test
FAIL: libgomp.c/scan-19.c execution test
FAIL: libgomp.c/scan-20.c execution test
FAIL: libgomp.c/simd-10.c execution test
FAIL: libgomp.c/simd-12.c execution test
FAIL: libgomp.c/simd-13.c execution test
FAIL: libgomp.c/simd-6.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/loop-1.c execution test
FAIL: libgomp.c++/simd-8.C execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test
FAIL: libgomp.fortran/nestedfn5.f90   -O1  execution test
FAIL: libgomp.fortran/nestedfn5.f90   -O2  execution test
FAIL: libgomp.fortran/nestedfn5.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: libgomp.fortran/nestedfn5.f90   -O3 -g  execution test
FAIL: libgomp.fortran/nestedfn5.f90   -Os  execution test


These should go away now, but we should revisit SIMDUID and epilogue 
vectorization later.  I tried to look into it, but I am afraid I know 
very little about SIMD loops to figure out how to make this work.


On i686-linux, I also see newly
FAIL: gcc.dg/vect/vect-epilogues.c -flto -ffat-lto-objects  scan-tree-dump vect 
"LOOP EPILOGUE VECTORIZED"
FAIL: gcc.dg/vect/vect-epilogues.c scan-tree-dump vect "LOOP EPILOGUE 
VECTORIZED"
and in libgomp just


These, just like for arm should be skipped for i686, I suspect it 
doesn't know how to vectorize for lower VF's.  Could someone add the 
appropriate skip target triple for i686?



FAIL: libgomp.c/examples-4/simd-3.c execution test
FAIL: libgomp.c/scan-13.c execution test
FAIL: libgomp.c/scan-17.c execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test


Same as the other libgomp tests. Should go away now.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fa873e9b435037e5a81dda6615cab809d2d4de48..d3a0fa015332dbcccf84bc68531dc1e49550cc19 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2383,10 +2383,11 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
   poly_uint64 lowest_th = 0;
   unsigned vectorized_loops = 0;
 
-  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled, this
- is not a simd loop and it is the most inner loop.  */
+  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled,
+ SIMDLEN is not set, SIMDUID is not set and this is the most inner loop.
+ TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
   bool vect_epilogues
-= !loop->simdlen && loop->inner == NULL
+= loop->simdlen == 0 && loop->simduid == NULL_TREE && loop->inner == NULL
   && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK);
   while (1)
 {