Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-21 Thread Yuri Rumyantsev
Sorry,

I put wrong test - fix it here.

2016-12-21 13:12 GMT+03:00 Yuri Rumyantsev :
> Hi Richard,
>
> I occasionally found out a bug in my patch related to epilogue
> vectorization without masking : need to put label before
> initialization.
>
> Could you please review and integrate it to trunk. Test-case is also attached.
>
>
> Thanks ahead.
> Yuri.
>
> ChangeLog:
> 2016-12-21  Yuri Rumyantsev  
>
> * tree-vectorizer.c (vectorize_loops): Put label before initialization
> of loop_vectorized_call.
>
> gcc/testsuite/
>
> * gcc.dg/vect/vect-tail-nomask-2.c: New test.
>
> 2016-12-13 16:59 GMT+03:00 Richard Biener :
>> On Mon, 12 Dec 2016, Yuri Rumyantsev wrote:
>>
>>> Richard,
>>>
>>> Could you please review cost model patch before to include it to
>>> epilogue masking patch and add masking cost estimation as you
>>> requested.
>>
>> That's just the middle-end / target changes.  I was not 100% happy
>> with them but well, the vectorizer cost modeling needs work
>> (aka another rewrite).
>>
>> From below...
>>
>>> Thanks.
>>>
>>> Patch and ChangeLog are attached.
>>>
>>> 2016-12-12 15:47 GMT+03:00 Yuri Rumyantsev :
>>> > Hi Richard,
>>> >
>>> > You asked me about performance of spec2006 on AVX2 machine with new 
>>> > feature.
>>> >
>>> > I tried the following on Haswell using original patch designed by Ilya.
>>> > 1. Masking low trip count loops  only 6 benchmarks are affected and
>>> > performance is almost the same
>>> > 464.h264ref 63.900064. +0.15%
>>> > 416.gamess  42.900042.9000 +0%
>>> > 435.gromacs 32.800032.7000 -0.30%
>>> > 447.dealII  68.500068.3000 -0.29%
>>> > 453.povray  61.900062.1000 +0.32%
>>> > 454.calculix39.800039.8000 +0%
>>> > 465.tonto   29.900029.9000 +0%
>>> >
>>> > 2. epilogue vectorization without masking (use less vf) (3 benchmarks
>>> > are not affected)
>>> > 400.perlbench 47.200046.5000 -1.48%
>>> > 401.bzip2 29.900029.9000 +0%
>>> > 403.gcc   41.800041.6000 -0.47%
>>> > 456.hmmer 32.32. +0%
>>> > 462.libquantum81.500082. +0.61%
>>> > 464.h264ref   65.65.5000 +0.76%
>>> > 471.omnetpp   27.800028.2000 +1.43%
>>> > 473.astar 28.700028.6000 -0.34%
>>> > 483.xalancbmk 48.700048.6000 -0.20%
>>> > 410.bwaves95.300095.3000 +0%
>>> > 416.gamess42.900042.8000 -0.23%
>>> > 433.milc  38.800038.8000 +0%
>>> > 434.zeusmp51.700051.4000 -0.58%
>>> > 435.gromacs   32.800032.8000 +0%
>>> > 436.cactusADM 85.83. -2.35%
>>> > 437.leslie3d  55.500055.5000 +0%
>>> > 444.namd  31.300031.3000 +0%
>>> > 447.dealII68.700068.9000 +0.29%
>>> > 450.soplex47.300047.4000 +0.21%
>>> > 453.povray62.100061.4000 -1.12%
>>> > 454.calculix  39.700039.3000 -1.00%
>>> > 459.GemsFDTD  44.900045. +0.22%
>>> > 465.tonto 29.800029.8000 +0%
>>> > 481.wrf   51.51.2000 +0.39%
>>> > 482.sphinx3   69.800071.2000 +2.00%
>>
>> I see 471.omnetpp and 482.sphinx3 are in a similar ballpark and it
>> would be nice to catch the relevant case(s) with a cost model for
>> epilogue vectorization without masking first (to get rid of
>> --param vect-epilogues-nomask).
>>
>> As said elsewhere any non-conservative cost modeling (if the
>> number of scalar iterations is not statically constant) might
>> require versioning of the loop into a non-vectorized,
>> short-trip vectorized and regular vectorized case (the Intel
>> compiler does way more aggressive versioning IIRC).
>>
>> Richard.
>>
>>> > 3. epilogue vectorization using masking (4 benchmarks are not affected):
>>> > 400.perlbench 47.500046.8000 -1.47%
>>> > 401.bzip2 30.29.9000 -0.33%
>>> > 403.gcc   42.300042.3000 +0%
>>> > 445.gobmk 32.100032.8000 +2.18%
>>> > 456.hmmer 32.32. +0%
>>> > 458.sjeng 36.100035.5000 -1.66%
>>> > 462.libquantum81.100081.1000 +0%
>>> > 464.h264ref   65.400065. -0.61%
>>> > 483.xalancbmk 49.400049.3000 -0.20%
>>> > 410.bwaves95.900095.5000 -0.41%
>>> > 416.gamess42.800042.6000 -0.46%
>>> > 433.milc  38.800039.1000 +0.77%
>>> > 434.zeusmp52.100051.3000 -1.53%
>>> > 435.gromacs   32.900032.9000 +0%
>>> > 436.cactusADM 78.800085.3000 +8.24%
>>> > 437.leslie3d  55.400055.4000 +0%
>>> > 444.namd  31.300031.3000 +0%
>>> > 447.dealII69.69.2000 +0.28%
>>> > 450.soplex47.700047.6000 -0.20%
>>> > 453.povray62.200061.7000 -0.80%
>>> > 454.calculix  39.700038.2000 -3.77%
>>> > 459.GemsFDTD  44.900045. +0.22%
>>> > 465.tonto 29.800029.9000 +0.33%
>>> > 481.wrf   51.200051.6000 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-21 Thread Yuri Rumyantsev
Hi Richard,

I occasionally found out a bug in my patch related to epilogue
vectorization without masking : need to put label before
initialization.

Could you please review and integrate it to trunk. Test-case is also attached.


Thanks ahead.
Yuri.

ChangeLog:
2016-12-21  Yuri Rumyantsev  

* tree-vectorizer.c (vectorize_loops): Put label before initialization
of loop_vectorized_call.

gcc/testsuite/

* gcc.dg/vect/vect-tail-nomask-2.c: New test.

2016-12-13 16:59 GMT+03:00 Richard Biener :
> On Mon, 12 Dec 2016, Yuri Rumyantsev wrote:
>
>> Richard,
>>
>> Could you please review cost model patch before to include it to
>> epilogue masking patch and add masking cost estimation as you
>> requested.
>
> That's just the middle-end / target changes.  I was not 100% happy
> with them but well, the vectorizer cost modeling needs work
> (aka another rewrite).
>
> From below...
>
>> Thanks.
>>
>> Patch and ChangeLog are attached.
>>
>> 2016-12-12 15:47 GMT+03:00 Yuri Rumyantsev :
>> > Hi Richard,
>> >
>> > You asked me about performance of spec2006 on AVX2 machine with new 
>> > feature.
>> >
>> > I tried the following on Haswell using original patch designed by Ilya.
>> > 1. Masking low trip count loops  only 6 benchmarks are affected and
>> > performance is almost the same
>> > 464.h264ref 63.900064. +0.15%
>> > 416.gamess  42.900042.9000 +0%
>> > 435.gromacs 32.800032.7000 -0.30%
>> > 447.dealII  68.500068.3000 -0.29%
>> > 453.povray  61.900062.1000 +0.32%
>> > 454.calculix39.800039.8000 +0%
>> > 465.tonto   29.900029.9000 +0%
>> >
>> > 2. epilogue vectorization without masking (use less vf) (3 benchmarks
>> > are not affected)
>> > 400.perlbench 47.200046.5000 -1.48%
>> > 401.bzip2 29.900029.9000 +0%
>> > 403.gcc   41.800041.6000 -0.47%
>> > 456.hmmer 32.32. +0%
>> > 462.libquantum81.500082. +0.61%
>> > 464.h264ref   65.65.5000 +0.76%
>> > 471.omnetpp   27.800028.2000 +1.43%
>> > 473.astar 28.700028.6000 -0.34%
>> > 483.xalancbmk 48.700048.6000 -0.20%
>> > 410.bwaves95.300095.3000 +0%
>> > 416.gamess42.900042.8000 -0.23%
>> > 433.milc  38.800038.8000 +0%
>> > 434.zeusmp51.700051.4000 -0.58%
>> > 435.gromacs   32.800032.8000 +0%
>> > 436.cactusADM 85.83. -2.35%
>> > 437.leslie3d  55.500055.5000 +0%
>> > 444.namd  31.300031.3000 +0%
>> > 447.dealII68.700068.9000 +0.29%
>> > 450.soplex47.300047.4000 +0.21%
>> > 453.povray62.100061.4000 -1.12%
>> > 454.calculix  39.700039.3000 -1.00%
>> > 459.GemsFDTD  44.900045. +0.22%
>> > 465.tonto 29.800029.8000 +0%
>> > 481.wrf   51.51.2000 +0.39%
>> > 482.sphinx3   69.800071.2000 +2.00%
>
> I see 471.omnetpp and 482.sphinx3 are in a similar ballpark and it
> would be nice to catch the relevant case(s) with a cost model for
> epilogue vectorization without masking first (to get rid of
> --param vect-epilogues-nomask).
>
> As said elsewhere any non-conservative cost modeling (if the
> number of scalar iterations is not statically constant) might
> require versioning of the loop into a non-vectorized,
> short-trip vectorized and regular vectorized case (the Intel
> compiler does way more aggressive versioning IIRC).
>
> Richard.
>
>> > 3. epilogue vectorization using masking (4 benchmarks are not affected):
>> > 400.perlbench 47.500046.8000 -1.47%
>> > 401.bzip2 30.29.9000 -0.33%
>> > 403.gcc   42.300042.3000 +0%
>> > 445.gobmk 32.100032.8000 +2.18%
>> > 456.hmmer 32.32. +0%
>> > 458.sjeng 36.100035.5000 -1.66%
>> > 462.libquantum81.100081.1000 +0%
>> > 464.h264ref   65.400065. -0.61%
>> > 483.xalancbmk 49.400049.3000 -0.20%
>> > 410.bwaves95.900095.5000 -0.41%
>> > 416.gamess42.800042.6000 -0.46%
>> > 433.milc  38.800039.1000 +0.77%
>> > 434.zeusmp52.100051.3000 -1.53%
>> > 435.gromacs   32.900032.9000 +0%
>> > 436.cactusADM 78.800085.3000 +8.24%
>> > 437.leslie3d  55.400055.4000 +0%
>> > 444.namd  31.300031.3000 +0%
>> > 447.dealII69.69.2000 +0.28%
>> > 450.soplex47.700047.6000 -0.20%
>> > 453.povray62.200061.7000 -0.80%
>> > 454.calculix  39.700038.2000 -3.77%
>> > 459.GemsFDTD  44.900045. +0.22%
>> > 465.tonto 29.800029.9000 +0.33%
>> > 481.wrf   51.200051.6000 +0.78%
>> > 482.sphinx3   70.300065.4000 -6.97%
>> >
>> > There is a good speed-up for 436 but there is essential slow0down on 482, 
>> > 454.
>> >
>> > So In general we don't have any advantages for AVX2.
>> >
>> > Best regards.
>> > 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Yuri Rumyantsev wrote:

> Thanks Richard for your comments.
> 
> You asked me about possible performance improvements for AVX2 machines
> - we did not see any visible speed-up for spec2k with any method of

Spec 2000?  Can you check with SPEC 2006 or CPUv6?

Did you see performance degradation?  What about compile-time and
binary size effects?

> masking, including epilogue masking and combining, only on AVX512
> machine aka knl.

I see.

Note that as said in the initial review patch the cost model I
saw therein looked flawed.  In the end I'd expect a sensible
approach would be to do

 if (n < scalar-most-profitable-niter)
   {
 no vectorization
   }
 else if (n < masking-more-profitable-than-not-masking-plus-epilogue)
   {
 do masked vectorization
   }
 else
   {
 do unmasked vectorization (with epilogue, eventually vectorized)
   }

where for short trip loops the else path would never be taken
(statically).

And yes, that means masking will only be useful for short-trip loops
which in the end means an overall performance benfit is unlikely
unless we have a lot of short-trip loops that are slow because of
the overhead of main unmasked loop plus epilogue.

Richard.

> I will answer on your question later.
> 
> Best regards.
> Yuri
> 
> 2016-12-01 14:33 GMT+03:00 Richard Biener :
> > On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Richard!
> >>
> >> I attached vect dump for hte part of attached test-case which
> >> illustrated how vectorization of epilogues works through masking:
> >> #define SIZE 1023
> >> #define ALIGN 64
> >>
> >> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
> >> __SIZE_TYPE__ size) __attribute__((weak));
> >> extern void free (void *);
> >>
> >> void __attribute__((noinline))
> >> test_citer (int * __restrict__ a,
> >>int * __restrict__ b,
> >>int * __restrict__ c)
> >> {
> >>   int i;
> >>
> >>   a = (int *)__builtin_assume_aligned (a, ALIGN);
> >>   b = (int *)__builtin_assume_aligned (b, ALIGN);
> >>   c = (int *)__builtin_assume_aligned (c, ALIGN);
> >>
> >>   for (i = 0; i < SIZE; i++)
> >> c[i] = a[i] + b[i];
> >> }
> >>
> >> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
> >>
> >> I did not include in this patch vectorization of low trip-count loops
> >> since in the original patch additional parameter was introduced:
> >> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
> >> +  "vect-short-loops",
> >> +  "Enable vectorization of low trip count loops using masking.",
> >> +  0, 0, 1)
> >>
> >> I assume that this ability can be included very quickly but it
> >> requires cost model enhancements also.
> >
> > Comments on the patch itself (as I'm having a closer look again,
> > I know how it vectorizes the above but I wondered why epilogue
> > and short-trip loops are not basically the same code path).
> >
> > Btw, I don't like that the features are behind a --param paywall.
> > That just means a) nobody will use it, b) it will bit-rot quickly,
> > c) bugs are well-hidden.
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +  && integer_zerop (nested_in_vect_loop
> > +   ? STMT_VINFO_DR_STEP (stmt_info)
> > +   : DR_STEP (dr)))
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +"allow invariant load for masked loop.\n");
> > +}
> >
> > this can test memory_access_type == VMAT_INVARIANT.  Please put
> > all the checks in a common
> >
> >   if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> > {
> >if (memory_access_type == VMAT_INVARIANT)
> >  {
> >  }
> >else if (...)
> >  {
> > LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> >  }
> >else if (..)
> > ...
> > }
> >
> > @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt,
> > gimple_stmt_iterator *gsi, gimple **vec_stmt,
> >gcc_assert (!nested_in_vect_loop);
> >gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> > +   {
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"cannot be masked: grouped access is not"
> > +" supported.");
> > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> > +  }
> > +
> >
> > isn't this already handled by the above?  Or rather the general
> > disallowance of SLP?
> >
> > @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt,
> > gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > _access_type, _info))
> >  return false;
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +  && memory_access_type != VMAT_CONTIGUOUS)
> > +{
> > +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> > +  if 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Yuri Rumyantsev
Thanks Richard for your comments.

You asked me about possible performance improvements for AVX2 machines
- we did not see any visible speed-up for spec2k with any method of
masking, including epilogue masking and combining, only on AVX512
machine aka knl.

I will answer on your question later.

Best regards.
Yuri

2016-12-01 14:33 GMT+03:00 Richard Biener :
> On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard!
>>
>> I attached vect dump for hte part of attached test-case which
>> illustrated how vectorization of epilogues works through masking:
>> #define SIZE 1023
>> #define ALIGN 64
>>
>> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
>> __SIZE_TYPE__ size) __attribute__((weak));
>> extern void free (void *);
>>
>> void __attribute__((noinline))
>> test_citer (int * __restrict__ a,
>>int * __restrict__ b,
>>int * __restrict__ c)
>> {
>>   int i;
>>
>>   a = (int *)__builtin_assume_aligned (a, ALIGN);
>>   b = (int *)__builtin_assume_aligned (b, ALIGN);
>>   c = (int *)__builtin_assume_aligned (c, ALIGN);
>>
>>   for (i = 0; i < SIZE; i++)
>> c[i] = a[i] + b[i];
>> }
>>
>> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
>>
>> I did not include in this patch vectorization of low trip-count loops
>> since in the original patch additional parameter was introduced:
>> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
>> +  "vect-short-loops",
>> +  "Enable vectorization of low trip count loops using masking.",
>> +  0, 0, 1)
>>
>> I assume that this ability can be included very quickly but it
>> requires cost model enhancements also.
>
> Comments on the patch itself (as I'm having a closer look again,
> I know how it vectorizes the above but I wondered why epilogue
> and short-trip loops are not basically the same code path).
>
> Btw, I don't like that the features are behind a --param paywall.
> That just means a) nobody will use it, b) it will bit-rot quickly,
> c) bugs are well-hidden.
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && integer_zerop (nested_in_vect_loop
> +   ? STMT_VINFO_DR_STEP (stmt_info)
> +   : DR_STEP (dr)))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"allow invariant load for masked loop.\n");
> +}
>
> this can test memory_access_type == VMAT_INVARIANT.  Please put
> all the checks in a common
>
>   if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> {
>if (memory_access_type == VMAT_INVARIANT)
>  {
>  }
>else if (...)
>  {
> LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>  }
>else if (..)
> ...
> }
>
> @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt,
> gimple_stmt_iterator *gsi, gimple **vec_stmt,
>gcc_assert (!nested_in_vect_loop);
>gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: grouped access is not"
> +" supported.");
> + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  }
> +
>
> isn't this already handled by the above?  Or rather the general
> disallowance of SLP?
>
> @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt,
> gimple_stmt_iterator *gsi, gimple **vec_stmt,
> _access_type, _info))
>  return false;
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && memory_access_type != VMAT_CONTIGUOUS)
> +{
> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: unsupported memory access
> type.\n");
> +}
> +
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && !can_mask_load_store (stmt))
> +{
> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: unsupported masked store.\n");
> +}
> +
>
> likewise please combine the ifs.
>
> @@ -2354,7 +2401,10 @@ vectorizable_mask_load_store (gimple *stmt,
> gimple_stmt_iterator *gsi,
>   ptr, vec_mask, vec_rhs);
>   vect_finish_stmt_generation (stmt, new_stmt, gsi);
>   if (i == 0)
> -   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +   {
> + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> + STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true;
> +   }
>   else
> STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
>   

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Richard Biener
On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:

> Richard!
> 
> I attached vect dump for hte part of attached test-case which
> illustrated how vectorization of epilogues works through masking:
> #define SIZE 1023
> #define ALIGN 64
> 
> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
> __SIZE_TYPE__ size) __attribute__((weak));
> extern void free (void *);
> 
> void __attribute__((noinline))
> test_citer (int * __restrict__ a,
>int * __restrict__ b,
>int * __restrict__ c)
> {
>   int i;
> 
>   a = (int *)__builtin_assume_aligned (a, ALIGN);
>   b = (int *)__builtin_assume_aligned (b, ALIGN);
>   c = (int *)__builtin_assume_aligned (c, ALIGN);
> 
>   for (i = 0; i < SIZE; i++)
> c[i] = a[i] + b[i];
> }
> 
> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
> 
> I did not include in this patch vectorization of low trip-count loops
> since in the original patch additional parameter was introduced:
> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
> +  "vect-short-loops",
> +  "Enable vectorization of low trip count loops using masking.",
> +  0, 0, 1)
> 
> I assume that this ability can be included very quickly but it
> requires cost model enhancements also.

Comments on the patch itself (as I'm having a closer look again,
I know how it vectorizes the above but I wondered why epilogue
and short-trip loops are not basically the same code path).

Btw, I don't like that the features are behind a --param paywall.
That just means a) nobody will use it, b) it will bit-rot quickly,
c) bugs are well-hidden.

+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+  && integer_zerop (nested_in_vect_loop
+   ? STMT_VINFO_DR_STEP (stmt_info)
+   : DR_STEP (dr)))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"allow invariant load for masked loop.\n");
+}

this can test memory_access_type == VMAT_INVARIANT.  Please put
all the checks in a common

  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
{
   if (memory_access_type == VMAT_INVARIANT)
 {
 }
   else if (...)
 {
LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
 }
   else if (..)
...
}

@@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt, 
gimple_stmt_iterator *gsi, gimple **vec_stmt,
   gcc_assert (!nested_in_vect_loop);
   gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));

+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: grouped access is not"
+" supported.");
+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  }
+

isn't this already handled by the above?  Or rather the general
disallowance of SLP?

@@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt, 
gimple_stmt_iterator *gsi, gimple **vec_stmt,
_access_type, _info))
 return false;

+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+  && memory_access_type != VMAT_CONTIGUOUS)
+{
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: unsupported memory access 
type.\n");
+}
+
+  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+  && !can_mask_load_store (stmt))
+{
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: unsupported masked store.\n");
+}
+

likewise please combine the ifs.

@@ -2354,7 +2401,10 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
  ptr, vec_mask, vec_rhs);
  vect_finish_stmt_generation (stmt, new_stmt, gsi);
  if (i == 0)
-   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+   {
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true;
+   }
  else
STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
  prev_stmt_info = vinfo_for_stmt (new_stmt);

here you only set the flag, elsewhere you copy DR and VECTYPE as well.

@@ -2113,6 +2146,20 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   && !useless_type_conversion_p (vectype, rhs_vectype)))
 return false;

+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+{
+  /* Check that mask conjuction is supported.  */
+  optab tab;
+  tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
+  if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == 
CODE_FOR_nothing)
+ 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-29 Thread Christophe Lyon
On 18 November 2016 at 16:54, Christophe Lyon
 wrote:
> On 18 November 2016 at 16:46, Yuri Rumyantsev  wrote:
>> It is very strange that this test failed on arm, since it requires
>> target avx2 to check vectorizer dumps:
>>
>> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
>> target avx2_runtime } } } */
>> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
>> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>>
>> Could you please clarify what is the reason of the failure?
>
> It's not the scan-dumps that fail, but the execution.
> The test calls abort() for some reason.
>
> It will take me a while to rebuild the test manually in the right
> debug environment to provide you with more traces.
>
>
Sorry for the delay... This problem is not directly related to your patch.

The tests in gcc.dg/vect are compiled with -mfpu=neon
-mfloat-abi=softfp -march=armv7-a
and thus cannot be executed on older versions of the architecture.

This is another instance of what I discussed with Jakub several months ago:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00666.html
but the thread died.

Basically, check_vect_support_and_set_flags sets set
dg-do-what-default compile, but
some tests in gcc.dg/vect have dg-do run hardcoded.

Jakub was not happy with my patch that was removing all these dg-do
run directives :-)

Christophe


>
>>
>> Thanks.
>>
>> 2016-11-18 16:20 GMT+03:00 Christophe Lyon :
>>> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
 Hi All,

 Here is patch for non-masked epilogue vectoriziation.

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?

 Thanks.
 Changelog:

 2016-11-15  Yuri Rumyantsev  

 * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
 * tree-if-conv.c (tree_if_conversion): Make public.
 * * tree-if-conv.h: New file.
 * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
 dynamic alias checks for epilogues.
 * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
 * tree-vect-loop.c: include tree-if-conv.h.
 (new_loop_vec_info): Add zeroing orig_loop_info field.
 (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
 (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
 if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
 using passed argument.
 (vect_transform_loop): Check if created epilogue should be returned
 for further vectorization with less vf.  If-convert epilogue if
 required. Print vectorization success for epilogue.
 * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
 if it is required, pass loop_vinfo produced during vectorization of
 loop body to vect_analyze_loop.
 * tree-vectorizer.h (struct _loop_vec_info): Add new field
 orig_loop_info.
 (LOOP_VINFO_ORIG_LOOP_INFO): New.
 (LOOP_VINFO_EPILOGUE_P): New.
 (LOOP_VINFO_ORIG_VECT_FACTOR): New.
 (vect_do_peeling): Change prototype to return epilogue.
 (vect_analyze_loop): Add argument of loop_vec_info type.
 (vect_transform_loop): Return created loop.

 gcc/testsuite/

 * lib/target-supports.exp (check_avx2_hw_available): New.
 (check_effective_target_avx2_runtime): New.
 * gcc.dg/vect/vect-tail-nomask-1.c: New test.

>>>
>>> Hi,
>>>
>>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>>>
>>> It does pass on the same target if configured --with-cpu=cortex-a9.
>>>
>>> Christophe
>>>
>>>
>>>

 2016-11-14 20:04 GMT+03:00 Richard Biener :
> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>  wrote:
>>Richard,
>>
>>I checked one of the tests designed for epilogue vectorization using
>>patches 1 - 3 and found out that build compiler performs vectorization
>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>
>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>t1.new-nomask.s -fdump-tree-vect-details
>>$ grep VECTORIZED -c t1.c.156t.vect
>>4
>> Without param only 2 loops are vectorized.
>>
>>Should I simply add a part of tests related to this feature or I must
>>delete all not necessary changes also?
>
> Please remove all not necessary changes.
>
> Richard.
>
>>Thanks.
>>Yuri.
>>
>>2016-11-14 16:40 GMT+03:00 Richard Biener :
>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>
 Richard,

 In my previous patch I forgot to remove couple lines related to aux
>>field.
 Here is the correct updated patch.

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-28 Thread Yuri Rumyantsev
Richard!

I attached vect dump for hte part of attached test-case which
illustrated how vectorization of epilogues works through masking:
#define SIZE 1023
#define ALIGN 64

extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
__SIZE_TYPE__ size) __attribute__((weak));
extern void free (void *);

void __attribute__((noinline))
test_citer (int * __restrict__ a,
   int * __restrict__ b,
   int * __restrict__ c)
{
  int i;

  a = (int *)__builtin_assume_aligned (a, ALIGN);
  b = (int *)__builtin_assume_aligned (b, ALIGN);
  c = (int *)__builtin_assume_aligned (c, ALIGN);

  for (i = 0; i < SIZE; i++)
c[i] = a[i] + b[i];
}

It was compiled with -mavx2 --param vect-epilogues-mask=1 options.

I did not include in this patch vectorization of low trip-count loops
since in the original patch additional parameter was introduced:
+DEFPARAM (PARAM_VECT_SHORT_LOOPS,
+  "vect-short-loops",
+  "Enable vectorization of low trip count loops using masking.",
+  0, 0, 1)

I assume that this ability can be included very quickly but it
requires cost model enhancements also.

Best regards.
Yuri.


2016-11-28 17:39 GMT+03:00 Richard Biener :
> On Thu, 24 Nov 2016, Yuri Rumyantsev wrote:
>
>> Hi All,
>>
>> Here is the second patch which supports epilogue vectorization using
>> masking without cost model. Currently it is possible
>> only with passing parameter "--param vect-epilogues-mask=1".
>>
>> Bootstrapping and regression testing did not show any new regression.
>>
>> Any comments will be appreciated.
>
> Going over the patch the main question is one how it works -- it looks
> like the decision whether to vectorize & mask the epilogue is made
> when vectorizing the loop that generates the epilogue rather than
> in the epilogue vectorization path?
>
> That is, I'd have expected to see this handling low-trip count loops
> by masking?  And thus masking the epilogue simply by it being
> low-trip count?
>
> Richard.
>
>> ChangeLog:
>> 2016-11-24  Yuri Rumyantsev  
>>
>> * params.def (PARAM_VECT_EPILOGUES_MASK): New.
>> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
>> * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h.
>> (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and
>> required_mask fields.
>> (vect_check_required_masks_widening): New.
>> (vect_check_required_masks_narrowing): New.
>> (vect_get_masking_iv_elems): New.
>> (vect_get_masking_iv_type): New.
>> (vect_get_extreme_masks): New.
>> (vect_check_required_masks): New.
>> (vect_analyze_loop_operations): Call vect_check_required_masks if all
>> statements can be masked.
>> (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound.
>> Add check that epilogue can be masked with the same vf with issue
>> fail notes.  Allow epilogue vectorization through masking of low trip
>> loops. Set to true can_be_masked field before loop operation analysis.
>> Do not set-up min_scalar_loop_bound for epilogue vectorization through
>> masking.  Do not peeling for epilogue masking.  Reset can_be_masked
>> field before repeat analysis.
>> (vect_estimate_min_profitable_iters): Do not compute profitability
>> for epilogue masking.  Set up mask_loop filed to true if parameter
>> PARAM_VECT_EPILOGUES_MASK is non-zero.
>> (vectorizable_reduction): Add check that statement can be masked.
>> (vectorizable_induction): Do not support masking for induction.
>> (vect_gen_ivs_for_masking): New.
>> (vect_get_mask_index_for_elems): New.
>> (vect_get_mask_index_for_type): New.
>> (vect_create_narrowed_masks): New.
>> (vect_create_widened_masks): New.
>> (vect_gen_loop_masks): New.
>> (vect_mask_reduction_stmt): New.
>> (vect_mask_mask_load_store_stmt): New.
>> (vect_mask_load_store_stmt): New.
>> (vect_mask_loop): New.
>> (vect_transform_loop): Invoke vect_mask_loop if required.
>> Use div_ceil to recompute upper bounds for masked loops.  Issue
>> statistics for epilogue vectorization through masking. Do not reduce
>> vf for masking epilogue.
>> * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h.
>> (can_mask_load_store): New.
>> (vectorizable_mask_load_store): Check that mask conjuction is
>> supported.  Set-up first_copy_p field of stmt_vinfo.
>> (vectorizable_simd_clone_call): Check that simd clone can not be
>> masked.
>> (vectorizable_store): Check that store can be masked. Mark the first
>> copy of generated vector stores and provide it with vectype and the
>> original data reference.
>> (vectorizable_load): Check that load can be masked.
>> (vect_stmt_should_be_masked_for_epilogue): New.
>> (vect_add_required_mask_for_stmt): New.
>> (vect_analyze_stmt): Add check on unsupported statements for masking
>> with printing message.
>> * tree-vectorizer.h (struct _loop_vec_info): Add new fields
>> can_be_maske, required_masks, masl_loop.
>> (LOOP_VINFO_CAN_BE_MASKED): New.
>> (LOOP_VINFO_REQUIRED_MASKS): New.
>> (LOOP_VINFO_MASK_LOOP): New.
>> (struct _stmt_vec_info): Add first_copy_p field.
>> 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-28 Thread Richard Biener
On Thu, 24 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> Here is the second patch which supports epilogue vectorization using
> masking without cost model. Currently it is possible
> only with passing parameter "--param vect-epilogues-mask=1".
> 
> Bootstrapping and regression testing did not show any new regression.
> 
> Any comments will be appreciated.

Going over the patch the main question is one how it works -- it looks
like the decision whether to vectorize & mask the epilogue is made
when vectorizing the loop that generates the epilogue rather than
in the epilogue vectorization path?

That is, I'd have expected to see this handling low-trip count loops
by masking?  And thus masking the epilogue simply by it being
low-trip count?

Richard.

> ChangeLog:
> 2016-11-24  Yuri Rumyantsev  
> 
> * params.def (PARAM_VECT_EPILOGUES_MASK): New.
> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
> * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h.
> (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and
> required_mask fields.
> (vect_check_required_masks_widening): New.
> (vect_check_required_masks_narrowing): New.
> (vect_get_masking_iv_elems): New.
> (vect_get_masking_iv_type): New.
> (vect_get_extreme_masks): New.
> (vect_check_required_masks): New.
> (vect_analyze_loop_operations): Call vect_check_required_masks if all
> statements can be masked.
> (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound.
> Add check that epilogue can be masked with the same vf with issue
> fail notes.  Allow epilogue vectorization through masking of low trip
> loops. Set to true can_be_masked field before loop operation analysis.
> Do not set-up min_scalar_loop_bound for epilogue vectorization through
> masking.  Do not peeling for epilogue masking.  Reset can_be_masked
> field before repeat analysis.
> (vect_estimate_min_profitable_iters): Do not compute profitability
> for epilogue masking.  Set up mask_loop filed to true if parameter
> PARAM_VECT_EPILOGUES_MASK is non-zero.
> (vectorizable_reduction): Add check that statement can be masked.
> (vectorizable_induction): Do not support masking for induction.
> (vect_gen_ivs_for_masking): New.
> (vect_get_mask_index_for_elems): New.
> (vect_get_mask_index_for_type): New.
> (vect_create_narrowed_masks): New.
> (vect_create_widened_masks): New.
> (vect_gen_loop_masks): New.
> (vect_mask_reduction_stmt): New.
> (vect_mask_mask_load_store_stmt): New.
> (vect_mask_load_store_stmt): New.
> (vect_mask_loop): New.
> (vect_transform_loop): Invoke vect_mask_loop if required.
> Use div_ceil to recompute upper bounds for masked loops.  Issue
> statistics for epilogue vectorization through masking. Do not reduce
> vf for masking epilogue.
> * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h.
> (can_mask_load_store): New.
> (vectorizable_mask_load_store): Check that mask conjuction is
> supported.  Set-up first_copy_p field of stmt_vinfo.
> (vectorizable_simd_clone_call): Check that simd clone can not be
> masked.
> (vectorizable_store): Check that store can be masked. Mark the first
> copy of generated vector stores and provide it with vectype and the
> original data reference.
> (vectorizable_load): Check that load can be masked.
> (vect_stmt_should_be_masked_for_epilogue): New.
> (vect_add_required_mask_for_stmt): New.
> (vect_analyze_stmt): Add check on unsupported statements for masking
> with printing message.
> * tree-vectorizer.h (struct _loop_vec_info): Add new fields
> can_be_maske, required_masks, masl_loop.
> (LOOP_VINFO_CAN_BE_MASKED): New.
> (LOOP_VINFO_REQUIRED_MASKS): New.
> (LOOP_VINFO_MASK_LOOP): New.
> (struct _stmt_vec_info): Add first_copy_p field.
> (STMT_VINFO_FIRST_COPY_P): New.
> 
> gcc/testsuite/
> 
> * gcc.dg/vect/vect-tail-mask-1.c: New test.
> 
> 2016-11-18 18:54 GMT+03:00 Christophe Lyon :
> > On 18 November 2016 at 16:46, Yuri Rumyantsev  wrote:
> >> It is very strange that this test failed on arm, since it requires
> >> target avx2 to check vectorizer dumps:
> >>
> >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
> >> target avx2_runtime } } } */
> >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
> >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
> >>
> >> Could you please clarify what is the reason of the failure?
> >
> > It's not the scan-dumps that fail, but the execution.
> > The test calls abort() for some reason.
> >
> > It will take me a while to rebuild the test manually in the right
> > debug environment to provide you with more traces.
> >
> >
> >
> >>
> >> Thanks.
> >>
> >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon :
> >>> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
>  Hi All,
> 
>  Here is patch for non-masked epilogue vectoriziation.
> 
>  Bootstrap and regression testing did not show any new 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-24 Thread Yuri Rumyantsev
Hi All,

Here is the second patch which supports epilogue vectorization using
masking without cost model. Currently it is possible
only with passing parameter "--param vect-epilogues-mask=1".

Bootstrapping and regression testing did not show any new regression.

Any comments will be appreciated.

ChangeLog:
2016-11-24  Yuri Rumyantsev  

* params.def (PARAM_VECT_EPILOGUES_MASK): New.
* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
* tree-vect-loop.c: Include insn-config.h, recog.h and alias.h.
(new_loop_vec_info): Add zeroing can_be_masked, mask_loop and
required_mask fields.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Call vect_check_required_masks if all
statements can be masked.
(vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound.
Add check that epilogue can be masked with the same vf with issue
fail notes.  Allow epilogue vectorization through masking of low trip
loops. Set to true can_be_masked field before loop operation analysis.
Do not set-up min_scalar_loop_bound for epilogue vectorization through
masking.  Do not peeling for epilogue masking.  Reset can_be_masked
field before repeat analysis.
(vect_estimate_min_profitable_iters): Do not compute profitability
for epilogue masking.  Set up mask_loop filed to true if parameter
PARAM_VECT_EPILOGUES_MASK is non-zero.
(vectorizable_reduction): Add check that statement can be masked.
(vectorizable_induction): Do not support masking for induction.
(vect_gen_ivs_for_masking): New.
(vect_get_mask_index_for_elems): New.
(vect_get_mask_index_for_type): New.
(vect_create_narrowed_masks): New.
(vect_create_widened_masks): New.
(vect_gen_loop_masks): New.
(vect_mask_reduction_stmt): New.
(vect_mask_mask_load_store_stmt): New.
(vect_mask_load_store_stmt): New.
(vect_mask_loop): New.
(vect_transform_loop): Invoke vect_mask_loop if required.
Use div_ceil to recompute upper bounds for masked loops.  Issue
statistics for epilogue vectorization through masking. Do not reduce
vf for masking epilogue.
* tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h.
(can_mask_load_store): New.
(vectorizable_mask_load_store): Check that mask conjuction is
supported.  Set-up first_copy_p field of stmt_vinfo.
(vectorizable_simd_clone_call): Check that simd clone can not be
masked.
(vectorizable_store): Check that store can be masked. Mark the first
copy of generated vector stores and provide it with vectype and the
original data reference.
(vectorizable_load): Check that load can be masked.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Add check on unsupported statements for masking
with printing message.
* tree-vectorizer.h (struct _loop_vec_info): Add new fields
can_be_maske, required_masks, masl_loop.
(LOOP_VINFO_CAN_BE_MASKED): New.
(LOOP_VINFO_REQUIRED_MASKS): New.
(LOOP_VINFO_MASK_LOOP): New.
(struct _stmt_vec_info): Add first_copy_p field.
(STMT_VINFO_FIRST_COPY_P): New.

gcc/testsuite/

* gcc.dg/vect/vect-tail-mask-1.c: New test.

2016-11-18 18:54 GMT+03:00 Christophe Lyon :
> On 18 November 2016 at 16:46, Yuri Rumyantsev  wrote:
>> It is very strange that this test failed on arm, since it requires
>> target avx2 to check vectorizer dumps:
>>
>> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
>> target avx2_runtime } } } */
>> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
>> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>>
>> Could you please clarify what is the reason of the failure?
>
> It's not the scan-dumps that fail, but the execution.
> The test calls abort() for some reason.
>
> It will take me a while to rebuild the test manually in the right
> debug environment to provide you with more traces.
>
>
>
>>
>> Thanks.
>>
>> 2016-11-18 16:20 GMT+03:00 Christophe Lyon :
>>> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
 Hi All,

 Here is patch for non-masked epilogue vectoriziation.

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?

 Thanks.
 Changelog:

 2016-11-15  Yuri Rumyantsev  

 * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
 * tree-if-conv.c (tree_if_conversion): Make public.
 * * tree-if-conv.h: New file.
 * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
 dynamic alias checks for epilogues.
 * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
 * tree-vect-loop.c: include tree-if-conv.h.
 (new_loop_vec_info): Add zeroing orig_loop_info field.
 (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-18 Thread Christophe Lyon
On 18 November 2016 at 16:46, Yuri Rumyantsev  wrote:
> It is very strange that this test failed on arm, since it requires
> target avx2 to check vectorizer dumps:
>
> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
> target avx2_runtime } } } */
> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>
> Could you please clarify what is the reason of the failure?

It's not the scan-dumps that fail, but the execution.
The test calls abort() for some reason.

It will take me a while to rebuild the test manually in the right
debug environment to provide you with more traces.



>
> Thanks.
>
> 2016-11-18 16:20 GMT+03:00 Christophe Lyon :
>> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Here is patch for non-masked epilogue vectoriziation.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>>
>>> Thanks.
>>> Changelog:
>>>
>>> 2016-11-15  Yuri Rumyantsev  
>>>
>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>> * * tree-if-conv.h: New file.
>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>>> dynamic alias checks for epilogues.
>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>>> * tree-vect-loop.c: include tree-if-conv.h.
>>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>>> using passed argument.
>>> (vect_transform_loop): Check if created epilogue should be returned
>>> for further vectorization with less vf.  If-convert epilogue if
>>> required. Print vectorization success for epilogue.
>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>>> if it is required, pass loop_vinfo produced during vectorization of
>>> loop body to vect_analyze_loop.
>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>>> orig_loop_info.
>>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>>> (LOOP_VINFO_EPILOGUE_P): New.
>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>>> (vect_do_peeling): Change prototype to return epilogue.
>>> (vect_analyze_loop): Add argument of loop_vec_info type.
>>> (vect_transform_loop): Return created loop.
>>>
>>> gcc/testsuite/
>>>
>>> * lib/target-supports.exp (check_avx2_hw_available): New.
>>> (check_effective_target_avx2_runtime): New.
>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>>
>>
>> Hi,
>>
>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>>
>> It does pass on the same target if configured --with-cpu=cortex-a9.
>>
>> Christophe
>>
>>
>>
>>>
>>> 2016-11-14 20:04 GMT+03:00 Richard Biener :
 On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
  wrote:
>Richard,
>
>I checked one of the tests designed for epilogue vectorization using
>patches 1 - 3 and found out that build compiler performs vectorization
>of epilogues with --param vect-epilogues-nomask=1 passed:
>
>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>t1.new-nomask.s -fdump-tree-vect-details
>$ grep VECTORIZED -c t1.c.156t.vect
>4
> Without param only 2 loops are vectorized.
>
>Should I simply add a part of tests related to this feature or I must
>delete all not necessary changes also?

 Please remove all not necessary changes.

 Richard.

>Thanks.
>Yuri.
>
>2016-11-14 16:40 GMT+03:00 Richard Biener :
>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>
>>> Richard,
>>>
>>> In my previous patch I forgot to remove couple lines related to aux
>field.
>>> Here is the correct updated patch.
>>
>> Yeah, I noticed.  This patch would be ok for trunk (together with
>> necessary parts from 1 and 2) if all not required parts are removed
>> (and you'd add the testcases covering non-masked tail vect).
>>
>> Thus, can you please produce a single complete patch containing only
>> non-masked epilogue vectoriziation?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-14 15:51 GMT+03:00 Richard Biener :
>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>> >
>>> >> Richard,
>>> >>
>>> >> I prepare updated 3 patch with passing additional argument to
>>> >> vect_analyze_loop as you proposed (untested).
>>> >>
>>> >> You wrote:
>>> >> tw, I wonder if you can produce a single patch containing 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-18 Thread Yuri Rumyantsev
It is very strange that this test failed on arm, since it requires
target avx2 to check vectorizer dumps:

/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
target avx2_runtime } } } */
/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
\\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */

Could you please clarify what is the reason of the failure?

Thanks.

2016-11-18 16:20 GMT+03:00 Christophe Lyon :
> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> Here is patch for non-masked epilogue vectoriziation.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>>
>> Thanks.
>> Changelog:
>>
>> 2016-11-15  Yuri Rumyantsev  
>>
>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>> * tree-if-conv.c (tree_if_conversion): Make public.
>> * * tree-if-conv.h: New file.
>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>> dynamic alias checks for epilogues.
>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>> * tree-vect-loop.c: include tree-if-conv.h.
>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>> using passed argument.
>> (vect_transform_loop): Check if created epilogue should be returned
>> for further vectorization with less vf.  If-convert epilogue if
>> required. Print vectorization success for epilogue.
>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>> if it is required, pass loop_vinfo produced during vectorization of
>> loop body to vect_analyze_loop.
>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>> orig_loop_info.
>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>> (LOOP_VINFO_EPILOGUE_P): New.
>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>> (vect_do_peeling): Change prototype to return epilogue.
>> (vect_analyze_loop): Add argument of loop_vec_info type.
>> (vect_transform_loop): Return created loop.
>>
>> gcc/testsuite/
>>
>> * lib/target-supports.exp (check_avx2_hw_available): New.
>> (check_effective_target_avx2_runtime): New.
>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>
>
> Hi,
>
> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>
> It does pass on the same target if configured --with-cpu=cortex-a9.
>
> Christophe
>
>
>
>>
>> 2016-11-14 20:04 GMT+03:00 Richard Biener :
>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>>>  wrote:
Richard,

I checked one of the tests designed for epilogue vectorization using
patches 1 - 3 and found out that build compiler performs vectorization
of epilogues with --param vect-epilogues-nomask=1 passed:

$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
t1.new-nomask.s -fdump-tree-vect-details
$ grep VECTORIZED -c t1.c.156t.vect
4
 Without param only 2 loops are vectorized.

Should I simply add a part of tests related to this feature or I must
delete all not necessary changes also?
>>>
>>> Please remove all not necessary changes.
>>>
>>> Richard.
>>>
Thanks.
Yuri.

2016-11-14 16:40 GMT+03:00 Richard Biener :
> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard,
>>
>> In my previous patch I forgot to remove couple lines related to aux
field.
>> Here is the correct updated patch.
>
> Yeah, I noticed.  This patch would be ok for trunk (together with
> necessary parts from 1 and 2) if all not required parts are removed
> (and you'd add the testcases covering non-masked tail vect).
>
> Thus, can you please produce a single complete patch containing only
> non-masked epilogue vectoriziation?
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-14 15:51 GMT+03:00 Richard Biener :
>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>> >
>> >> Richard,
>> >>
>> >> I prepare updated 3 patch with passing additional argument to
>> >> vect_analyze_loop as you proposed (untested).
>> >>
>> >> You wrote:
>> >> tw, I wonder if you can produce a single patch containing just
>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>> >> changes only needed by later patches?
>> >>
>> >> Did you mean that I exclude all support for vectorization
epilogues,
>> >> i.e. exclude from 2-nd patch all non-related changes
>> >> like
>> >>
>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> >> index 11863af..32011c1 100644
>> >> --- 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-18 Thread Christophe Lyon
On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is patch for non-masked epilogue vectoriziation.
>
> Bootstrap and regression testing did not show any new failures.
>
> Is it OK for trunk?
>
> Thanks.
> Changelog:
>
> 2016-11-15  Yuri Rumyantsev  
>
> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> * tree-if-conv.c (tree_if_conversion): Make public.
> * * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> dynamic alias checks for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> * tree-vect-loop.c: include tree-if-conv.h.
> (new_loop_vec_info): Add zeroing orig_loop_info field.
> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> using passed argument.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization with less vf.  If-convert epilogue if
> required. Print vectorization success for epilogue.
> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> if it is required, pass loop_vinfo produced during vectorization of
> loop body to vect_analyze_loop.
> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> orig_loop_info.
> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> (LOOP_VINFO_EPILOGUE_P): New.
> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> (vect_do_peeling): Change prototype to return epilogue.
> (vect_analyze_loop): Add argument of loop_vec_info type.
> (vect_transform_loop): Return created loop.
>
> gcc/testsuite/
>
> * lib/target-supports.exp (check_avx2_hw_available): New.
> (check_effective_target_avx2_runtime): New.
> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>

Hi,

This new test fails on arm-none-eabi (using default cpu/fpu/mode):
  gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-tail-nomask-1.c execution test

It does pass on the same target if configured --with-cpu=cortex-a9.

Christophe



>
> 2016-11-14 20:04 GMT+03:00 Richard Biener :
>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>>  wrote:
>>>Richard,
>>>
>>>I checked one of the tests designed for epilogue vectorization using
>>>patches 1 - 3 and found out that build compiler performs vectorization
>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>>
>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>>t1.new-nomask.s -fdump-tree-vect-details
>>>$ grep VECTORIZED -c t1.c.156t.vect
>>>4
>>> Without param only 2 loops are vectorized.
>>>
>>>Should I simply add a part of tests related to this feature or I must
>>>delete all not necessary changes also?
>>
>> Please remove all not necessary changes.
>>
>> Richard.
>>
>>>Thanks.
>>>Yuri.
>>>
>>>2016-11-14 16:40 GMT+03:00 Richard Biener :
 On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
>
> In my previous patch I forgot to remove couple lines related to aux
>>>field.
> Here is the correct updated patch.

 Yeah, I noticed.  This patch would be ok for trunk (together with
 necessary parts from 1 and 2) if all not required parts are removed
 (and you'd add the testcases covering non-masked tail vect).

 Thus, can you please produce a single complete patch containing only
 non-masked epilogue vectoriziation?

 Thanks,
 Richard.

> Thanks.
> Yuri.
>
> 2016-11-14 15:51 GMT+03:00 Richard Biener :
> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Richard,
> >>
> >> I prepare updated 3 patch with passing additional argument to
> >> vect_analyze_loop as you proposed (untested).
> >>
> >> You wrote:
> >> tw, I wonder if you can produce a single patch containing just
> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >> changes only needed by later patches?
> >>
> >> Did you mean that I exclude all support for vectorization
>>>epilogues,
> >> i.e. exclude from 2-nd patch all non-related changes
> >> like
> >>
> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >> index 11863af..32011c1 100644
> >> --- a/gcc/tree-vect-loop.c
> >> +++ b/gcc/tree-vect-loop.c
> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-16 Thread Richard Biener
On Tue, 15 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> Here is patch for non-masked epilogue vectoriziation.
> 
> Bootstrap and regression testing did not show any new failures.
> 
> Is it OK for trunk?

Ok for trunk.

I believe we ultimatively want to remove the new
--param and enable this by default with a better cost model.
What immediately comes to my mind when seeing

+  if (epilogue)
+{
+   unsigned int vector_sizes
+ = targetm.vectorize.autovectorize_vector_sizes ();
+   vector_sizes &= current_vector_size - 1;
+
+   if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK))
+ epilogue = NULL;
+   else if (!vector_sizes)
+ epilogue = NULL;
+   else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+&& LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+ {
+   int smallest_vec_size = 1 << ctz_hwi (vector_sizes);
+   int ratio = current_vector_size / smallest_vec_size;
+   int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
+ - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+   eiters = eiters % vf;
+
+   epilogue->nb_iterations_upper_bound = eiters - 1;
+
+   if (eiters < vf / ratio)
+ epilogue = NULL;
+   }

is that we have targetm.vectorize.preferred_simd_mode which
for example with -mprefer-avx128 will first try with 128bit
vectorization.  So if a ! prefered vector size ends up
creating the epilogue we know vectorizing with the prefered
size will fail.  The above also does not take into account
peeling for gaps in which case we know the epilogue runs at
least VF times (but the vectorized epilogue might also need
an epilogue itself if not masked).

The natural next step is the masked epilogue support, after
that the merged masked epilogue one.

Thanks,
Richard.

> Thanks.
> Changelog:
> 
> 2016-11-15  Yuri Rumyantsev  
> 
> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> * tree-if-conv.c (tree_if_conversion): Make public.
> * * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> dynamic alias checks for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> * tree-vect-loop.c: include tree-if-conv.h.
> (new_loop_vec_info): Add zeroing orig_loop_info field.
> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> using passed argument.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization with less vf.  If-convert epilogue if
> required. Print vectorization success for epilogue.
> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> if it is required, pass loop_vinfo produced during vectorization of
> loop body to vect_analyze_loop.
> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> orig_loop_info.
> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> (LOOP_VINFO_EPILOGUE_P): New.
> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> (vect_do_peeling): Change prototype to return epilogue.
> (vect_analyze_loop): Add argument of loop_vec_info type.
> (vect_transform_loop): Return created loop.
> 
> gcc/testsuite/
> 
> * lib/target-supports.exp (check_avx2_hw_available): New.
> (check_effective_target_avx2_runtime): New.
> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
> 
> 
> 2016-11-14 20:04 GMT+03:00 Richard Biener :
> > On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
> >  wrote:
> >>Richard,
> >>
> >>I checked one of the tests designed for epilogue vectorization using
> >>patches 1 - 3 and found out that build compiler performs vectorization
> >>of epilogues with --param vect-epilogues-nomask=1 passed:
> >>
> >>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
> >>t1.new-nomask.s -fdump-tree-vect-details
> >>$ grep VECTORIZED -c t1.c.156t.vect
> >>4
> >> Without param only 2 loops are vectorized.
> >>
> >>Should I simply add a part of tests related to this feature or I must
> >>delete all not necessary changes also?
> >
> > Please remove all not necessary changes.
> >
> > Richard.
> >
> >>Thanks.
> >>Yuri.
> >>
> >>2016-11-14 16:40 GMT+03:00 Richard Biener :
> >>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
> >>>
>  Richard,
> 
>  In my previous patch I forgot to remove couple lines related to aux
> >>field.
>  Here is the correct updated patch.
> >>>
> >>> Yeah, I noticed.  This patch would be ok for trunk (together with
> >>> necessary parts from 1 and 2) if all not required parts are removed
> >>> (and you'd add the testcases covering non-masked tail vect).
> >>>
> >>> Thus, can you please produce a single complete patch containing only
> >>> non-masked epilogue vectoriziation?
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
>  Thanks.
>  Yuri.
> 
>  2016-11-14 15:51 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-15 Thread Yuri Rumyantsev
Hi All,

Here is patch for non-masked epilogue vectoriziation.

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?

Thanks.
Changelog:

2016-11-15  Yuri Rumyantsev  

* params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
* tree-if-conv.c (tree_if_conversion): Make public.
* * tree-if-conv.h: New file.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
dynamic alias checks for epilogues.
* tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
* tree-vect-loop.c: include tree-if-conv.h.
(new_loop_vec_info): Add zeroing orig_loop_info field.
(vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
(vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
using passed argument.
(vect_transform_loop): Check if created epilogue should be returned
for further vectorization with less vf.  If-convert epilogue if
required. Print vectorization success for epilogue.
* tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
if it is required, pass loop_vinfo produced during vectorization of
loop body to vect_analyze_loop.
* tree-vectorizer.h (struct _loop_vec_info): Add new field
orig_loop_info.
(LOOP_VINFO_ORIG_LOOP_INFO): New.
(LOOP_VINFO_EPILOGUE_P): New.
(LOOP_VINFO_ORIG_VECT_FACTOR): New.
(vect_do_peeling): Change prototype to return epilogue.
(vect_analyze_loop): Add argument of loop_vec_info type.
(vect_transform_loop): Return created loop.

gcc/testsuite/

* lib/target-supports.exp (check_avx2_hw_available): New.
(check_effective_target_avx2_runtime): New.
* gcc.dg/vect/vect-tail-nomask-1.c: New test.


2016-11-14 20:04 GMT+03:00 Richard Biener :
> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>  wrote:
>>Richard,
>>
>>I checked one of the tests designed for epilogue vectorization using
>>patches 1 - 3 and found out that build compiler performs vectorization
>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>
>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>t1.new-nomask.s -fdump-tree-vect-details
>>$ grep VECTORIZED -c t1.c.156t.vect
>>4
>> Without param only 2 loops are vectorized.
>>
>>Should I simply add a part of tests related to this feature or I must
>>delete all not necessary changes also?
>
> Please remove all not necessary changes.
>
> Richard.
>
>>Thanks.
>>Yuri.
>>
>>2016-11-14 16:40 GMT+03:00 Richard Biener :
>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>
 Richard,

 In my previous patch I forgot to remove couple lines related to aux
>>field.
 Here is the correct updated patch.
>>>
>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>> necessary parts from 1 and 2) if all not required parts are removed
>>> (and you'd add the testcases covering non-masked tail vect).
>>>
>>> Thus, can you please produce a single complete patch containing only
>>> non-masked epilogue vectoriziation?
>>>
>>> Thanks,
>>> Richard.
>>>
 Thanks.
 Yuri.

 2016-11-14 15:51 GMT+03:00 Richard Biener :
 > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
 >
 >> Richard,
 >>
 >> I prepare updated 3 patch with passing additional argument to
 >> vect_analyze_loop as you proposed (untested).
 >>
 >> You wrote:
 >> tw, I wonder if you can produce a single patch containing just
 >> epilogue vectorization, that is combine patches 1-3 but rip out
 >> changes only needed by later patches?
 >>
 >> Did you mean that I exclude all support for vectorization
>>epilogues,
 >> i.e. exclude from 2-nd patch all non-related changes
 >> like
 >>
 >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
 >> index 11863af..32011c1 100644
 >> --- a/gcc/tree-vect-loop.c
 >> +++ b/gcc/tree-vect-loop.c
 >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
 >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
 >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
 >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
 >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
 >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
 >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
 >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
 >> +  LOOP_VINFO_NEED_MASKING (res) = false;
 >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
 >
 > Yes.
 >
 >> Did you mean also that new combined patch must be working patch,
>>i.e.
 >> can be integrated without other patches?
 >
 > Yes.
 >
 >> Could you please look at updated patch?
 >
 > Will do.
 >
 > Thanks,
 > Richard.
 >
 >> Thanks.
 >> Yuri.
 >>
 >> 2016-11-10 15:36 GMT+03:00 Richard Biener :
 >> > On Thu, 10 Nov 2016, Richard Biener wrote:
 >> >
 >> >> On Tue, 8 Nov 2016, Yuri 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-14 Thread Richard Biener
On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev  
wrote:
>Richard,
>
>I checked one of the tests designed for epilogue vectorization using
>patches 1 - 3 and found out that build compiler performs vectorization
>of epilogues with --param vect-epilogues-nomask=1 passed:
>
>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>t1.new-nomask.s -fdump-tree-vect-details
>$ grep VECTORIZED -c t1.c.156t.vect
>4
> Without param only 2 loops are vectorized.
>
>Should I simply add a part of tests related to this feature or I must
>delete all not necessary changes also?

Please remove all not necessary changes.

Richard.

>Thanks.
>Yuri.
>
>2016-11-14 16:40 GMT+03:00 Richard Biener :
>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>
>>> Richard,
>>>
>>> In my previous patch I forgot to remove couple lines related to aux
>field.
>>> Here is the correct updated patch.
>>
>> Yeah, I noticed.  This patch would be ok for trunk (together with
>> necessary parts from 1 and 2) if all not required parts are removed
>> (and you'd add the testcases covering non-masked tail vect).
>>
>> Thus, can you please produce a single complete patch containing only
>> non-masked epilogue vectoriziation?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-14 15:51 GMT+03:00 Richard Biener :
>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>> >
>>> >> Richard,
>>> >>
>>> >> I prepare updated 3 patch with passing additional argument to
>>> >> vect_analyze_loop as you proposed (untested).
>>> >>
>>> >> You wrote:
>>> >> tw, I wonder if you can produce a single patch containing just
>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>> >> changes only needed by later patches?
>>> >>
>>> >> Did you mean that I exclude all support for vectorization
>epilogues,
>>> >> i.e. exclude from 2-nd patch all non-related changes
>>> >> like
>>> >>
>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>> >> index 11863af..32011c1 100644
>>> >> --- a/gcc/tree-vect-loop.c
>>> >> +++ b/gcc/tree-vect-loop.c
>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>> >
>>> > Yes.
>>> >
>>> >> Did you mean also that new combined patch must be working patch,
>i.e.
>>> >> can be integrated without other patches?
>>> >
>>> > Yes.
>>> >
>>> >> Could you please look at updated patch?
>>> >
>>> > Will do.
>>> >
>>> > Thanks,
>>> > Richard.
>>> >
>>> >> Thanks.
>>> >> Yuri.
>>> >>
>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener :
>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>> >> >
>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>> >> >>
>>> >> >> > Richard,
>>> >> >> >
>>> >> >> > Here is updated 3 patch.
>>> >> >> >
>>> >> >> > I checked that all new tests related to epilogue
>vectorization passed with it.
>>> >> >> >
>>> >> >> > Your comments will be appreciated.
>>> >> >>
>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>> >> >> original vectorization factor?  So we can pass down an
>(optional)
>>> >> >> forced vectorization factor as well?
>>> >> >
>>> >> > Btw, I wonder if you can produce a single patch containing just
>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>> >> > changes only needed by later patches?
>>> >> >
>>> >> > Thanks,
>>> >> > Richard.
>>> >> >
>>> >> >> Richard.
>>> >> >>
>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>:
>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>> >> >> > >
>>> >> >> > >> Hi Richard,
>>> >> >> > >>
>>> >> >> > >> I did not understand your last remark:
>>> >> >> > >>
>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>> >> >> > >> >
>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>> >> >> > >> >   && dump_enabled_p ())
>>> >> >> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>vect_location,
>>> >> >> > >> >"loop vectorized\n");
>>> >> >> > >> > -   vect_transform_loop (loop_vinfo);
>>> >> >> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>>> >> >> > >> > num_vectorized_loops++;
>>> >> >> > >> >/* Now that the loop has been vectorized, allow
>it to be unrolled
>>> >> >> > >> >   etc.  */

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-14 Thread Yuri Rumyantsev
Richard,

I checked one of the tests designed for epilogue vectorization using
patches 1 - 3 and found out that build compiler performs vectorization
of epilogues with --param vect-epilogues-nomask=1 passed:

$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
t1.new-nomask.s -fdump-tree-vect-details
$ grep VECTORIZED -c t1.c.156t.vect
4
 Without param only 2 loops are vectorized.

Should I simply add a part of tests related to this feature or I must
delete all not necessary changes also?

Thanks.
Yuri.

2016-11-14 16:40 GMT+03:00 Richard Biener :
> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard,
>>
>> In my previous patch I forgot to remove couple lines related to aux field.
>> Here is the correct updated patch.
>
> Yeah, I noticed.  This patch would be ok for trunk (together with
> necessary parts from 1 and 2) if all not required parts are removed
> (and you'd add the testcases covering non-masked tail vect).
>
> Thus, can you please produce a single complete patch containing only
> non-masked epilogue vectoriziation?
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-14 15:51 GMT+03:00 Richard Biener :
>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>> >
>> >> Richard,
>> >>
>> >> I prepare updated 3 patch with passing additional argument to
>> >> vect_analyze_loop as you proposed (untested).
>> >>
>> >> You wrote:
>> >> tw, I wonder if you can produce a single patch containing just
>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>> >> changes only needed by later patches?
>> >>
>> >> Did you mean that I exclude all support for vectorization epilogues,
>> >> i.e. exclude from 2-nd patch all non-related changes
>> >> like
>> >>
>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> >> index 11863af..32011c1 100644
>> >> --- a/gcc/tree-vect-loop.c
>> >> +++ b/gcc/tree-vect-loop.c
>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>> >
>> > Yes.
>> >
>> >> Did you mean also that new combined patch must be working patch, i.e.
>> >> can be integrated without other patches?
>> >
>> > Yes.
>> >
>> >> Could you please look at updated patch?
>> >
>> > Will do.
>> >
>> > Thanks,
>> > Richard.
>> >
>> >> Thanks.
>> >> Yuri.
>> >>
>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener :
>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>> >> >
>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>> >> >>
>> >> >> > Richard,
>> >> >> >
>> >> >> > Here is updated 3 patch.
>> >> >> >
>> >> >> > I checked that all new tests related to epilogue vectorization 
>> >> >> > passed with it.
>> >> >> >
>> >> >> > Your comments will be appreciated.
>> >> >>
>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> >> >> original vectorization factor?  So we can pass down an (optional)
>> >> >> forced vectorization factor as well?
>> >> >
>> >> > Btw, I wonder if you can produce a single patch containing just
>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>> >> > changes only needed by later patches?
>> >> >
>> >> > Thanks,
>> >> > Richard.
>> >> >
>> >> >> Richard.
>> >> >>
>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> >> >> > >
>> >> >> > >> Hi Richard,
>> >> >> > >>
>> >> >> > >> I did not understand your last remark:
>> >> >> > >>
>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >> >> > >> >
>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >> >> > >> >   && dump_enabled_p ())
>> >> >> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, 
>> >> >> > >> > vect_location,
>> >> >> > >> >"loop vectorized\n");
>> >> >> > >> > -   vect_transform_loop (loop_vinfo);
>> >> >> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>> >> >> > >> > num_vectorized_loops++;
>> >> >> > >> >/* Now that the loop has been vectorized, allow it to be 
>> >> >> > >> > unrolled
>> >> >> > >> >   etc.  */
>> >> >> > >> >  loop->force_vectorize = false;
>> >> >> > >> >
>> >> >> > >> > +   /* Add new loop to a processing queue.  To make it 
>> >> >> > >> > easier
>> >> >> > >> > +  to match loop and its epilogue 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-14 Thread Richard Biener
On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> In my previous patch I forgot to remove couple lines related to aux field.
> Here is the correct updated patch.

Yeah, I noticed.  This patch would be ok for trunk (together with
necessary parts from 1 and 2) if all not required parts are removed
(and you'd add the testcases covering non-masked tail vect).

Thus, can you please produce a single complete patch containing only
non-masked epilogue vectoriziation?

Thanks,
Richard.

> Thanks.
> Yuri.
> 
> 2016-11-14 15:51 GMT+03:00 Richard Biener :
> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Richard,
> >>
> >> I prepare updated 3 patch with passing additional argument to
> >> vect_analyze_loop as you proposed (untested).
> >>
> >> You wrote:
> >> tw, I wonder if you can produce a single patch containing just
> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >> changes only needed by later patches?
> >>
> >> Did you mean that I exclude all support for vectorization epilogues,
> >> i.e. exclude from 2-nd patch all non-related changes
> >> like
> >>
> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >> index 11863af..32011c1 100644
> >> --- a/gcc/tree-vect-loop.c
> >> +++ b/gcc/tree-vect-loop.c
> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
> >
> > Yes.
> >
> >> Did you mean also that new combined patch must be working patch, i.e.
> >> can be integrated without other patches?
> >
> > Yes.
> >
> >> Could you please look at updated patch?
> >
> > Will do.
> >
> > Thanks,
> > Richard.
> >
> >> Thanks.
> >> Yuri.
> >>
> >> 2016-11-10 15:36 GMT+03:00 Richard Biener :
> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
> >> >
> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >> >>
> >> >> > Richard,
> >> >> >
> >> >> > Here is updated 3 patch.
> >> >> >
> >> >> > I checked that all new tests related to epilogue vectorization passed 
> >> >> > with it.
> >> >> >
> >> >> > Your comments will be appreciated.
> >> >>
> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
> >> >> optional argument (if non-NULL we analyze the epilogue of that
> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >> >> original vectorization factor?  So we can pass down an (optional)
> >> >> forced vectorization factor as well?
> >> >
> >> > Btw, I wonder if you can produce a single patch containing just
> >> > epilogue vectorization, that is combine patches 1-3 but rip out
> >> > changes only needed by later patches?
> >> >
> >> > Thanks,
> >> > Richard.
> >> >
> >> >> Richard.
> >> >>
> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >> >> > >
> >> >> > >> Hi Richard,
> >> >> > >>
> >> >> > >> I did not understand your last remark:
> >> >> > >>
> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >> > >> >
> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> >> > >> >   && dump_enabled_p ())
> >> >> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, 
> >> >> > >> > vect_location,
> >> >> > >> >"loop vectorized\n");
> >> >> > >> > -   vect_transform_loop (loop_vinfo);
> >> >> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> >> > >> > num_vectorized_loops++;
> >> >> > >> >/* Now that the loop has been vectorized, allow it to be 
> >> >> > >> > unrolled
> >> >> > >> >   etc.  */
> >> >> > >> >  loop->force_vectorize = false;
> >> >> > >> >
> >> >> > >> > +   /* Add new loop to a processing queue.  To make it easier
> >> >> > >> > +  to match loop and its epilogue vectorization in dumps
> >> >> > >> > +  put new loop as the next loop to process.  */
> >> >> > >> > +   if (new_loop)
> >> >> > >> > + {
> >> >> > >> > +   loops.safe_insert (i + 1, new_loop->num);
> >> >> > >> > +   vect_loops_num = number_of_loops (cfun);
> >> >> > >> > + }
> >> >> > >> >
> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> >> > >> f> unction which will set up stuff properly (and also perform
> >> >> > >> > the if-conversion of the epilogue there).
> >> >> > >> >
> >> >> > >> > That said, if we can get in non-masked epilogue vectorization
> >> >> > >> > separately that would be great.
> >> >> > >>
> >> >> > >> Could you please clarify 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-14 Thread Yuri Rumyantsev
Richard,

In my previous patch I forgot to remove couple lines related to aux field.
Here is the correct updated patch.

Thanks.
Yuri.

2016-11-14 15:51 GMT+03:00 Richard Biener :
> On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard,
>>
>> I prepare updated 3 patch with passing additional argument to
>> vect_analyze_loop as you proposed (untested).
>>
>> You wrote:
>> tw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Did you mean that I exclude all support for vectorization epilogues,
>> i.e. exclude from 2-nd patch all non-related changes
>> like
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 11863af..32011c1 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>
> Yes.
>
>> Did you mean also that new combined patch must be working patch, i.e.
>> can be integrated without other patches?
>
> Yes.
>
>> Could you please look at updated patch?
>
> Will do.
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-10 15:36 GMT+03:00 Richard Biener :
>> > On Thu, 10 Nov 2016, Richard Biener wrote:
>> >
>> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>> >>
>> >> > Richard,
>> >> >
>> >> > Here is updated 3 patch.
>> >> >
>> >> > I checked that all new tests related to epilogue vectorization passed 
>> >> > with it.
>> >> >
>> >> > Your comments will be appreciated.
>> >>
>> >> A lot better now.  Instead of the ->aux dance I now prefer to
>> >> pass the original loops loop_vinfo to vect_analyze_loop as
>> >> optional argument (if non-NULL we analyze the epilogue of that
>> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> >> original vectorization factor?  So we can pass down an (optional)
>> >> forced vectorization factor as well?
>> >
>> > Btw, I wonder if you can produce a single patch containing just
>> > epilogue vectorization, that is combine patches 1-3 but rip out
>> > changes only needed by later patches?
>> >
>> > Thanks,
>> > Richard.
>> >
>> >> Richard.
>> >>
>> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
>> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> >> > >
>> >> > >> Hi Richard,
>> >> > >>
>> >> > >> I did not understand your last remark:
>> >> > >>
>> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >> > >> >
>> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >> > >> >   && dump_enabled_p ())
>> >> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >> > >> >"loop vectorized\n");
>> >> > >> > -   vect_transform_loop (loop_vinfo);
>> >> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>> >> > >> > num_vectorized_loops++;
>> >> > >> >/* Now that the loop has been vectorized, allow it to be 
>> >> > >> > unrolled
>> >> > >> >   etc.  */
>> >> > >> >  loop->force_vectorize = false;
>> >> > >> >
>> >> > >> > +   /* Add new loop to a processing queue.  To make it easier
>> >> > >> > +  to match loop and its epilogue vectorization in dumps
>> >> > >> > +  put new loop as the next loop to process.  */
>> >> > >> > +   if (new_loop)
>> >> > >> > + {
>> >> > >> > +   loops.safe_insert (i + 1, new_loop->num);
>> >> > >> > +   vect_loops_num = number_of_loops (cfun);
>> >> > >> > + }
>> >> > >> >
>> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> >> > >> f> unction which will set up stuff properly (and also perform
>> >> > >> > the if-conversion of the epilogue there).
>> >> > >> >
>> >> > >> > That said, if we can get in non-masked epilogue vectorization
>> >> > >> > separately that would be great.
>> >> > >>
>> >> > >> Could you please clarify your proposal.
>> >> > >
>> >> > > When a loop was vectorized set things up to immediately vectorize
>> >> > > its epilogue, avoiding changing the loop iteration and avoiding
>> >> > > the re-use of ->aux.
>> >> > >
>> >> > > Richard.
>> >> > >
>> >> > >> Thanks.
>> >> > >> Yuri.
>> >> > >>
>> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
>> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> >> > >> >
>> >> > >> >> Hi All,
>> >> > >> >>
>> >> > >> >> I re-send all patches sent by Ilya earlier for review which 
>> >> > >> >> support
>> >> > >> >> vectorization of loop epilogues and loops with 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-14 Thread Richard Biener
On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> Here is fixed version of updated patch 3.
> 
> Any comments will be appreciated.

Looks good apart from

+  if (epilogue)
+{
+  epilogue->force_vectorize = loop->force_vectorize;
+  epilogue->safelen = loop->safelen;
+  epilogue->dont_vectorize = false;
+
+  /* We may need to if-convert epilogue to vectorize it.  */
+  if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
+   tree_if_conversion (epilogue);
+
+  gcc_assert (!epilogue->aux);
+  epilogue->aux = loop_vinfo;

where the last two lines should now no longer be necessary?

Thanks,
Richard.

> Thanks.
> Yuri.
> 
> 2016-11-11 17:15 GMT+03:00 Yuri Rumyantsev :
> > Richard,
> >
> > Sorry for confusion but my updated patch  does not work properly, so I
> > need to fix it.
> >
> > Yuri.
> >
> > 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev :
> >> Richard,
> >>
> >> I prepare updated 3 patch with passing additional argument to
> >> vect_analyze_loop as you proposed (untested).
> >>
> >> You wrote:
> >> tw, I wonder if you can produce a single patch containing just
> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >> changes only needed by later patches?
> >>
> >> Did you mean that I exclude all support for vectorization epilogues,
> >> i.e. exclude from 2-nd patch all non-related changes
> >> like
> >>
> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >> index 11863af..32011c1 100644
> >> --- a/gcc/tree-vect-loop.c
> >> +++ b/gcc/tree-vect-loop.c
> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
> >>
> >> Did you mean also that new combined patch must be working patch, i.e.
> >> can be integrated without other patches?
> >>
> >> Could you please look at updated patch?
> >>
> >> Thanks.
> >> Yuri.
> >>
> >> 2016-11-10 15:36 GMT+03:00 Richard Biener :
> >>> On Thu, 10 Nov 2016, Richard Biener wrote:
> >>>
>  On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> 
>  > Richard,
>  >
>  > Here is updated 3 patch.
>  >
>  > I checked that all new tests related to epilogue vectorization passed 
>  > with it.
>  >
>  > Your comments will be appreciated.
> 
>  A lot better now.  Instead of the ->aux dance I now prefer to
>  pass the original loops loop_vinfo to vect_analyze_loop as
>  optional argument (if non-NULL we analyze the epilogue of that
>  loop_vinfo).  OTOH I remember we mainly use it to get at the
>  original vectorization factor?  So we can pass down an (optional)
>  forced vectorization factor as well?
> >>>
> >>> Btw, I wonder if you can produce a single patch containing just
> >>> epilogue vectorization, that is combine patches 1-3 but rip out
> >>> changes only needed by later patches?
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
>  Richard.
> 
>  > 2016-11-08 15:38 GMT+03:00 Richard Biener :
>  > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>  > >
>  > >> Hi Richard,
>  > >>
>  > >> I did not understand your last remark:
>  > >>
>  > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>  > >> >
>  > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>  > >> >   && dump_enabled_p ())
>  > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>  > >> >"loop vectorized\n");
>  > >> > -   vect_transform_loop (loop_vinfo);
>  > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>  > >> > num_vectorized_loops++;
>  > >> >/* Now that the loop has been vectorized, allow it to be 
>  > >> > unrolled
>  > >> >   etc.  */
>  > >> >  loop->force_vectorize = false;
>  > >> >
>  > >> > +   /* Add new loop to a processing queue.  To make it easier
>  > >> > +  to match loop and its epilogue vectorization in dumps
>  > >> > +  put new loop as the next loop to process.  */
>  > >> > +   if (new_loop)
>  > >> > + {
>  > >> > +   loops.safe_insert (i + 1, new_loop->num);
>  > >> > +   vect_loops_num = number_of_loops (cfun);
>  > >> > + }
>  > >> >
>  > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>  > >> f> unction which will set up stuff properly (and also perform
>  > >> > the if-conversion of the epilogue there).
>  > >> >
>  > >> > That said, if we 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-14 Thread Richard Biener
On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> I prepare updated 3 patch with passing additional argument to
> vect_analyze_loop as you proposed (untested).
> 
> You wrote:
> tw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
> 
> Did you mean that I exclude all support for vectorization epilogues,
> i.e. exclude from 2-nd patch all non-related changes
> like
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 11863af..32011c1 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> +  LOOP_VINFO_NEED_MASKING (res) = false;
> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;

Yes.
 
> Did you mean also that new combined patch must be working patch, i.e.
> can be integrated without other patches?

Yes.

> Could you please look at updated patch?

Will do.

Thanks,
Richard.

> Thanks.
> Yuri.
> 
> 2016-11-10 15:36 GMT+03:00 Richard Biener :
> > On Thu, 10 Nov 2016, Richard Biener wrote:
> >
> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >>
> >> > Richard,
> >> >
> >> > Here is updated 3 patch.
> >> >
> >> > I checked that all new tests related to epilogue vectorization passed 
> >> > with it.
> >> >
> >> > Your comments will be appreciated.
> >>
> >> A lot better now.  Instead of the ->aux dance I now prefer to
> >> pass the original loops loop_vinfo to vect_analyze_loop as
> >> optional argument (if non-NULL we analyze the epilogue of that
> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >> original vectorization factor?  So we can pass down an (optional)
> >> forced vectorization factor as well?
> >
> > Btw, I wonder if you can produce a single patch containing just
> > epilogue vectorization, that is combine patches 1-3 but rip out
> > changes only needed by later patches?
> >
> > Thanks,
> > Richard.
> >
> >> Richard.
> >>
> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >> > >
> >> > >> Hi Richard,
> >> > >>
> >> > >> I did not understand your last remark:
> >> > >>
> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> > >> >
> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> > >> >   && dump_enabled_p ())
> >> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> > >> >"loop vectorized\n");
> >> > >> > -   vect_transform_loop (loop_vinfo);
> >> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> > >> > num_vectorized_loops++;
> >> > >> >/* Now that the loop has been vectorized, allow it to be 
> >> > >> > unrolled
> >> > >> >   etc.  */
> >> > >> >  loop->force_vectorize = false;
> >> > >> >
> >> > >> > +   /* Add new loop to a processing queue.  To make it easier
> >> > >> > +  to match loop and its epilogue vectorization in dumps
> >> > >> > +  put new loop as the next loop to process.  */
> >> > >> > +   if (new_loop)
> >> > >> > + {
> >> > >> > +   loops.safe_insert (i + 1, new_loop->num);
> >> > >> > +   vect_loops_num = number_of_loops (cfun);
> >> > >> > + }
> >> > >> >
> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> > >> f> unction which will set up stuff properly (and also perform
> >> > >> > the if-conversion of the epilogue there).
> >> > >> >
> >> > >> > That said, if we can get in non-masked epilogue vectorization
> >> > >> > separately that would be great.
> >> > >>
> >> > >> Could you please clarify your proposal.
> >> > >
> >> > > When a loop was vectorized set things up to immediately vectorize
> >> > > its epilogue, avoiding changing the loop iteration and avoiding
> >> > > the re-use of ->aux.
> >> > >
> >> > > Richard.
> >> > >
> >> > >> Thanks.
> >> > >> Yuri.
> >> > >>
> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >> > >> >
> >> > >> >> Hi All,
> >> > >> >>
> >> > >> >> I re-send all patches sent by Ilya earlier for review which support
> >> > >> >> vectorization of loop epilogues and loops with low trip count. We
> >> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was 
> >> > >> >> not
> >> > >> >> approved by Jeff.
> >> > >> >>
> >> > >> >> I did re-base of all patches and performed bootstrapping and
> >> > >> >> regression testing that did not show any new failures. Also all
> >> > >> >> changes related to new 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-11 Thread Yuri Rumyantsev
Richard,

Here is fixed version of updated patch 3.

Any comments will be appreciated.

Thanks.
Yuri.

2016-11-11 17:15 GMT+03:00 Yuri Rumyantsev :
> Richard,
>
> Sorry for confusion but my updated patch  does not work properly, so I
> need to fix it.
>
> Yuri.
>
> 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev :
>> Richard,
>>
>> I prepare updated 3 patch with passing additional argument to
>> vect_analyze_loop as you proposed (untested).
>>
>> You wrote:
>> tw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Did you mean that I exclude all support for vectorization epilogues,
>> i.e. exclude from 2-nd patch all non-related changes
>> like
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 11863af..32011c1 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>
>> Did you mean also that new combined patch must be working patch, i.e.
>> can be integrated without other patches?
>>
>> Could you please look at updated patch?
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-10 15:36 GMT+03:00 Richard Biener :
>>> On Thu, 10 Nov 2016, Richard Biener wrote:
>>>
 On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:

 > Richard,
 >
 > Here is updated 3 patch.
 >
 > I checked that all new tests related to epilogue vectorization passed 
 > with it.
 >
 > Your comments will be appreciated.

 A lot better now.  Instead of the ->aux dance I now prefer to
 pass the original loops loop_vinfo to vect_analyze_loop as
 optional argument (if non-NULL we analyze the epilogue of that
 loop_vinfo).  OTOH I remember we mainly use it to get at the
 original vectorization factor?  So we can pass down an (optional)
 forced vectorization factor as well?
>>>
>>> Btw, I wonder if you can produce a single patch containing just
>>> epilogue vectorization, that is combine patches 1-3 but rip out
>>> changes only needed by later patches?
>>>
>>> Thanks,
>>> Richard.
>>>
 Richard.

 > 2016-11-08 15:38 GMT+03:00 Richard Biener :
 > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
 > >
 > >> Hi Richard,
 > >>
 > >> I did not understand your last remark:
 > >>
 > >> > That is, here (and avoid the FOR_EACH_LOOP change):
 > >> >
 > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
 > >> >   && dump_enabled_p ())
 > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
 > >> >"loop vectorized\n");
 > >> > -   vect_transform_loop (loop_vinfo);
 > >> > +   new_loop = vect_transform_loop (loop_vinfo);
 > >> > num_vectorized_loops++;
 > >> >/* Now that the loop has been vectorized, allow it to be 
 > >> > unrolled
 > >> >   etc.  */
 > >> >  loop->force_vectorize = false;
 > >> >
 > >> > +   /* Add new loop to a processing queue.  To make it easier
 > >> > +  to match loop and its epilogue vectorization in dumps
 > >> > +  put new loop as the next loop to process.  */
 > >> > +   if (new_loop)
 > >> > + {
 > >> > +   loops.safe_insert (i + 1, new_loop->num);
 > >> > +   vect_loops_num = number_of_loops (cfun);
 > >> > + }
 > >> >
 > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
 > >> f> unction which will set up stuff properly (and also perform
 > >> > the if-conversion of the epilogue there).
 > >> >
 > >> > That said, if we can get in non-masked epilogue vectorization
 > >> > separately that would be great.
 > >>
 > >> Could you please clarify your proposal.
 > >
 > > When a loop was vectorized set things up to immediately vectorize
 > > its epilogue, avoiding changing the loop iteration and avoiding
 > > the re-use of ->aux.
 > >
 > > Richard.
 > >
 > >> Thanks.
 > >> Yuri.
 > >>
 > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
 > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
 > >> >
 > >> >> Hi All,
 > >> >>
 > >> >> I re-send all patches sent by Ilya earlier for review which support
 > >> >> vectorization of loop epilogues and loops with low trip count. We
 > >> >> assume that the 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-11 Thread Yuri Rumyantsev
Richard,

Sorry for confusion but my updated patch  does not work properly, so I
need to fix it.

Yuri.

2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev :
> Richard,
>
> I prepare updated 3 patch with passing additional argument to
> vect_analyze_loop as you proposed (untested).
>
> You wrote:
> tw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
>
> Did you mean that I exclude all support for vectorization epilogues,
> i.e. exclude from 2-nd patch all non-related changes
> like
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 11863af..32011c1 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> +  LOOP_VINFO_NEED_MASKING (res) = false;
> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>
> Did you mean also that new combined patch must be working patch, i.e.
> can be integrated without other patches?
>
> Could you please look at updated patch?
>
> Thanks.
> Yuri.
>
> 2016-11-10 15:36 GMT+03:00 Richard Biener :
>> On Thu, 10 Nov 2016, Richard Biener wrote:
>>
>>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>
>>> > Richard,
>>> >
>>> > Here is updated 3 patch.
>>> >
>>> > I checked that all new tests related to epilogue vectorization passed 
>>> > with it.
>>> >
>>> > Your comments will be appreciated.
>>>
>>> A lot better now.  Instead of the ->aux dance I now prefer to
>>> pass the original loops loop_vinfo to vect_analyze_loop as
>>> optional argument (if non-NULL we analyze the epilogue of that
>>> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>> original vectorization factor?  So we can pass down an (optional)
>>> forced vectorization factor as well?
>>
>> Btw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Thanks,
>> Richard.
>>
>>> Richard.
>>>
>>> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
>>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>> > >
>>> > >> Hi Richard,
>>> > >>
>>> > >> I did not understand your last remark:
>>> > >>
>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>> > >> >
>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>> > >> >   && dump_enabled_p ())
>>> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>>> > >> >"loop vectorized\n");
>>> > >> > -   vect_transform_loop (loop_vinfo);
>>> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>>> > >> > num_vectorized_loops++;
>>> > >> >/* Now that the loop has been vectorized, allow it to be 
>>> > >> > unrolled
>>> > >> >   etc.  */
>>> > >> >  loop->force_vectorize = false;
>>> > >> >
>>> > >> > +   /* Add new loop to a processing queue.  To make it easier
>>> > >> > +  to match loop and its epilogue vectorization in dumps
>>> > >> > +  put new loop as the next loop to process.  */
>>> > >> > +   if (new_loop)
>>> > >> > + {
>>> > >> > +   loops.safe_insert (i + 1, new_loop->num);
>>> > >> > +   vect_loops_num = number_of_loops (cfun);
>>> > >> > + }
>>> > >> >
>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>>> > >> f> unction which will set up stuff properly (and also perform
>>> > >> > the if-conversion of the epilogue there).
>>> > >> >
>>> > >> > That said, if we can get in non-masked epilogue vectorization
>>> > >> > separately that would be great.
>>> > >>
>>> > >> Could you please clarify your proposal.
>>> > >
>>> > > When a loop was vectorized set things up to immediately vectorize
>>> > > its epilogue, avoiding changing the loop iteration and avoiding
>>> > > the re-use of ->aux.
>>> > >
>>> > > Richard.
>>> > >
>>> > >> Thanks.
>>> > >> Yuri.
>>> > >>
>>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
>>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>> > >> >
>>> > >> >> Hi All,
>>> > >> >>
>>> > >> >> I re-send all patches sent by Ilya earlier for review which support
>>> > >> >> vectorization of loop epilogues and loops with low trip count. We
>>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was 
>>> > >> >> not
>>> > >> >> approved by Jeff.
>>> > >> >>
>>> > >> >> I did re-base of all patches and performed bootstrapping and
>>> > >> >> regression testing that did not show any new failures. Also all
>>> > >> >> changes related to new 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-11 Thread Yuri Rumyantsev
Richard,

I prepare updated 3 patch with passing additional argument to
vect_analyze_loop as you proposed (untested).

You wrote:
tw, I wonder if you can produce a single patch containing just
epilogue vectorization, that is combine patches 1-3 but rip out
changes only needed by later patches?

Did you mean that I exclude all support for vectorization epilogues,
i.e. exclude from 2-nd patch all non-related changes
like

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 11863af..32011c1 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
   LOOP_VINFO_PEELING_FOR_NITER (res) = false;
   LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
+  LOOP_VINFO_CAN_BE_MASKED (res) = false;
+  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
+  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
+  LOOP_VINFO_MASK_EPILOGUE (res) = false;
+  LOOP_VINFO_NEED_MASKING (res) = false;
+  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;

Did you mean also that new combined patch must be working patch, i.e.
can be integrated without other patches?

Could you please look at updated patch?

Thanks.
Yuri.

2016-11-10 15:36 GMT+03:00 Richard Biener :
> On Thu, 10 Nov 2016, Richard Biener wrote:
>
>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>
>> > Richard,
>> >
>> > Here is updated 3 patch.
>> >
>> > I checked that all new tests related to epilogue vectorization passed with 
>> > it.
>> >
>> > Your comments will be appreciated.
>>
>> A lot better now.  Instead of the ->aux dance I now prefer to
>> pass the original loops loop_vinfo to vect_analyze_loop as
>> optional argument (if non-NULL we analyze the epilogue of that
>> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> original vectorization factor?  So we can pass down an (optional)
>> forced vectorization factor as well?
>
> Btw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
>
> Thanks,
> Richard.
>
>> Richard.
>>
>> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> > >
>> > >> Hi Richard,
>> > >>
>> > >> I did not understand your last remark:
>> > >>
>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> > >> >
>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> > >> >   && dump_enabled_p ())
>> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> > >> >"loop vectorized\n");
>> > >> > -   vect_transform_loop (loop_vinfo);
>> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
>> > >> > num_vectorized_loops++;
>> > >> >/* Now that the loop has been vectorized, allow it to be 
>> > >> > unrolled
>> > >> >   etc.  */
>> > >> >  loop->force_vectorize = false;
>> > >> >
>> > >> > +   /* Add new loop to a processing queue.  To make it easier
>> > >> > +  to match loop and its epilogue vectorization in dumps
>> > >> > +  put new loop as the next loop to process.  */
>> > >> > +   if (new_loop)
>> > >> > + {
>> > >> > +   loops.safe_insert (i + 1, new_loop->num);
>> > >> > +   vect_loops_num = number_of_loops (cfun);
>> > >> > + }
>> > >> >
>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> > >> f> unction which will set up stuff properly (and also perform
>> > >> > the if-conversion of the epilogue there).
>> > >> >
>> > >> > That said, if we can get in non-masked epilogue vectorization
>> > >> > separately that would be great.
>> > >>
>> > >> Could you please clarify your proposal.
>> > >
>> > > When a loop was vectorized set things up to immediately vectorize
>> > > its epilogue, avoiding changing the loop iteration and avoiding
>> > > the re-use of ->aux.
>> > >
>> > > Richard.
>> > >
>> > >> Thanks.
>> > >> Yuri.
>> > >>
>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> > >> >
>> > >> >> Hi All,
>> > >> >>
>> > >> >> I re-send all patches sent by Ilya earlier for review which support
>> > >> >> vectorization of loop epilogues and loops with low trip count. We
>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was 
>> > >> >> not
>> > >> >> approved by Jeff.
>> > >> >>
>> > >> >> I did re-base of all patches and performed bootstrapping and
>> > >> >> regression testing that did not show any new failures. Also all
>> > >> >> changes related to new vect_do_peeling algorithm have been changed
>> > >> >> accordingly.
>> > >> >>
>> > >> >> Is it OK for trunk?
>> > >> >
>> > >> > I would have prefered that the series up to -03-nomask-tails would
>> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
>> > >> > the patchset is oddly separated.
>> > >> >
>> > >> > 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-10 Thread Richard Biener
On Thu, 10 Nov 2016, Richard Biener wrote:

> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> 
> > Richard,
> > 
> > Here is updated 3 patch.
> > 
> > I checked that all new tests related to epilogue vectorization passed with 
> > it.
> > 
> > Your comments will be appreciated.
> 
> A lot better now.  Instead of the ->aux dance I now prefer to
> pass the original loops loop_vinfo to vect_analyze_loop as
> optional argument (if non-NULL we analyze the epilogue of that 
> loop_vinfo).  OTOH I remember we mainly use it to get at the
> original vectorization factor?  So we can pass down an (optional)
> forced vectorization factor as well?

Btw, I wonder if you can produce a single patch containing just
epilogue vectorization, that is combine patches 1-3 but rip out
changes only needed by later patches?

Thanks,
Richard.

> Richard.
> 
> > 2016-11-08 15:38 GMT+03:00 Richard Biener :
> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> > >
> > >> Hi Richard,
> > >>
> > >> I did not understand your last remark:
> > >>
> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> > >> >
> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> > >> >   && dump_enabled_p ())
> > >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> > >> >"loop vectorized\n");
> > >> > -   vect_transform_loop (loop_vinfo);
> > >> > +   new_loop = vect_transform_loop (loop_vinfo);
> > >> > num_vectorized_loops++;
> > >> >/* Now that the loop has been vectorized, allow it to be 
> > >> > unrolled
> > >> >   etc.  */
> > >> >  loop->force_vectorize = false;
> > >> >
> > >> > +   /* Add new loop to a processing queue.  To make it easier
> > >> > +  to match loop and its epilogue vectorization in dumps
> > >> > +  put new loop as the next loop to process.  */
> > >> > +   if (new_loop)
> > >> > + {
> > >> > +   loops.safe_insert (i + 1, new_loop->num);
> > >> > +   vect_loops_num = number_of_loops (cfun);
> > >> > + }
> > >> >
> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> > >> f> unction which will set up stuff properly (and also perform
> > >> > the if-conversion of the epilogue there).
> > >> >
> > >> > That said, if we can get in non-masked epilogue vectorization
> > >> > separately that would be great.
> > >>
> > >> Could you please clarify your proposal.
> > >
> > > When a loop was vectorized set things up to immediately vectorize
> > > its epilogue, avoiding changing the loop iteration and avoiding
> > > the re-use of ->aux.
> > >
> > > Richard.
> > >
> > >> Thanks.
> > >> Yuri.
> > >>
> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> > >> >
> > >> >> Hi All,
> > >> >>
> > >> >> I re-send all patches sent by Ilya earlier for review which support
> > >> >> vectorization of loop epilogues and loops with low trip count. We
> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> > >> >> approved by Jeff.
> > >> >>
> > >> >> I did re-base of all patches and performed bootstrapping and
> > >> >> regression testing that did not show any new failures. Also all
> > >> >> changes related to new vect_do_peeling algorithm have been changed
> > >> >> accordingly.
> > >> >>
> > >> >> Is it OK for trunk?
> > >> >
> > >> > I would have prefered that the series up to -03-nomask-tails would
> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
> > >> > the patchset is oddly separated.
> > >> >
> > >> > I have a comment on that part nevertheless:
> > >> >
> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > >> > loop_vinfo)
> > >> >/* Check if we can possibly peel the loop.  */
> > >> >if (!vect_can_advance_ivs_p (loop_vinfo)
> > >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> > >> > -  || loop->inner)
> > >> > +  || loop->inner
> > >> > +  /* Required peeling was performed in prologue and
> > >> > +is not required for epilogue.  */
> > >> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> > >> >  do_peeling = false;
> > >> >
> > >> >if (do_peeling
> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > >> > loop_vinfo)
> > >> >
> > >> >do_versioning =
> > >> > optimize_loop_nest_for_speed_p (loop)
> > >> > -   && (!loop->inner); /* FORNOW */
> > >> > +   && (!loop->inner) /* FORNOW */
> > >> > +/* Required versioning was performed for the
> > >> > +  original loop and is not required for epilogue.  */
> > >> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> > >> >
> > >> >if (do_versioning)
> > >> >  {
> > >> >
> > >> > please do that check in the single caller of this function.
> > >> >
> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
> > >> > 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-10 Thread Richard Biener
On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> Here is updated 3 patch.
> 
> I checked that all new tests related to epilogue vectorization passed with it.
> 
> Your comments will be appreciated.

A lot better now.  Instead of the ->aux dance I now prefer to
pass the original loops loop_vinfo to vect_analyze_loop as
optional argument (if non-NULL we analyze the epilogue of that 
loop_vinfo).  OTOH I remember we mainly use it to get at the
original vectorization factor?  So we can pass down an (optional)
forced vectorization factor as well?

Richard.

> 2016-11-08 15:38 GMT+03:00 Richard Biener :
> > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Hi Richard,
> >>
> >> I did not understand your last remark:
> >>
> >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >
> >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> >   && dump_enabled_p ())
> >> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> >"loop vectorized\n");
> >> > -   vect_transform_loop (loop_vinfo);
> >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> > num_vectorized_loops++;
> >> >/* Now that the loop has been vectorized, allow it to be unrolled
> >> >   etc.  */
> >> >  loop->force_vectorize = false;
> >> >
> >> > +   /* Add new loop to a processing queue.  To make it easier
> >> > +  to match loop and its epilogue vectorization in dumps
> >> > +  put new loop as the next loop to process.  */
> >> > +   if (new_loop)
> >> > + {
> >> > +   loops.safe_insert (i + 1, new_loop->num);
> >> > +   vect_loops_num = number_of_loops (cfun);
> >> > + }
> >> >
> >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> f> unction which will set up stuff properly (and also perform
> >> > the if-conversion of the epilogue there).
> >> >
> >> > That said, if we can get in non-masked epilogue vectorization
> >> > separately that would be great.
> >>
> >> Could you please clarify your proposal.
> >
> > When a loop was vectorized set things up to immediately vectorize
> > its epilogue, avoiding changing the loop iteration and avoiding
> > the re-use of ->aux.
> >
> > Richard.
> >
> >> Thanks.
> >> Yuri.
> >>
> >> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> I re-send all patches sent by Ilya earlier for review which support
> >> >> vectorization of loop epilogues and loops with low trip count. We
> >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> >> approved by Jeff.
> >> >>
> >> >> I did re-base of all patches and performed bootstrapping and
> >> >> regression testing that did not show any new failures. Also all
> >> >> changes related to new vect_do_peeling algorithm have been changed
> >> >> accordingly.
> >> >>
> >> >> Is it OK for trunk?
> >> >
> >> > I would have prefered that the series up to -03-nomask-tails would
> >> > _only_ contain epilogue loop vectorization changes but unfortunately
> >> > the patchset is oddly separated.
> >> >
> >> > I have a comment on that part nevertheless:
> >> >
> >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > loop_vinfo)
> >> >/* Check if we can possibly peel the loop.  */
> >> >if (!vect_can_advance_ivs_p (loop_vinfo)
> >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> >> > -  || loop->inner)
> >> > +  || loop->inner
> >> > +  /* Required peeling was performed in prologue and
> >> > +is not required for epilogue.  */
> >> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >> >  do_peeling = false;
> >> >
> >> >if (do_peeling
> >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > loop_vinfo)
> >> >
> >> >do_versioning =
> >> > optimize_loop_nest_for_speed_p (loop)
> >> > -   && (!loop->inner); /* FORNOW */
> >> > +   && (!loop->inner) /* FORNOW */
> >> > +/* Required versioning was performed for the
> >> > +  original loop and is not required for epilogue.  */
> >> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >> >
> >> >if (do_versioning)
> >> >  {
> >> >
> >> > please do that check in the single caller of this function.
> >> >
> >> > Otherwise I still dislike the new ->aux use and I believe that simply
> >> > passing down info from the processed parent would be _much_ cleaner.
> >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >
> >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> > && dump_enabled_p ())
> >> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> > "loop vectorized\n");
> >> > -   vect_transform_loop (loop_vinfo);
> >> > +   new_loop = vect_transform_loop (loop_vinfo);
> >> > 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Richard Biener
On Wed, 9 Nov 2016, Yuri Rumyantsev wrote:

> Thanks Richard for your comments.
> Your proposed to handle epilogue loop just like normal short-trip loop
> but this is not exactly truth since e.g. epilogue must not be peeled
> for alignment.

But if we know the epilogue data-refs are aligned we should have reflected
that in the code so the vectorizer wouldn't even try to peel for
alignment.  OTOH peeling for alignment already knows that peeling
a short-trip loop is not likely profitable (maybe the condition needs
to be hardened to also work for VF/2).

> It is not clear for me what are my next steps? Should I re-design the
> patch completely or simply decompose the whole patch to different
> parts? But it means that we must start review process from beginning
> but release is closed to its end.

What I disliked about the series from the beginning is that it does
everything at once rather than first introducing vectorizing of
epilogues as an independent patch.  Lumping all in together makes
it hard to decipher the conditions each feature is enabled.

I'm mostly concerned about the predication part and thus if we can
get the other parts separated and committed that would be a much
smaller patch to look at and experiment.

Note that only stage1 is at its end, we usually still accept patches
that were posted before stage1 end during stage3.

> Note also that i work for Intel till the end of year and have not idea
> who will continue working on this project.

Noted.

Richard.

> Any help will be appreciated.
>
> Thanks.
> Yuri.
> 
> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
> > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
> >> Hi All,
> >>
> >> I re-send all patches sent by Ilya earlier for review which support
> >> vectorization of loop epilogues and loops with low trip count. We
> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> approved by Jeff.
> >>
> >> I did re-base of all patches and performed bootstrapping and
> >> regression testing that did not show any new failures. Also all
> >> changes related to new vect_do_peeling algorithm have been changed
> >> accordingly.
> >>
> >> Is it OK for trunk?
> >
> > Hi,
> > I can't approve patches, but had some comments after going through the
> > implementation.
> >
> > One confusing part is cost model change, as well as the way it's used
> > to decide how epilogue loop should be vectorized.  Given vect-tail is
> > disabled at the moment and the cost change needs further tuning, is it
> > reasonable to split this part out and get vectorization part
> > reviewed/submitted independently?  For example, let user specified
> > parameters make the decision for now.  Cost and target dependent
> > changes should go in at last, this could make the patch easier to
> > read.
> >
> > The implementation computes/shares quite amount information from main
> > loop to epilogue loop vectorization.  Furthermore, variables/fields
> > for such information are somehow named in a misleading way.  For
> > example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
> > flag controlling whether epilogue loop should be vectorized with
> > masking.  However, it's actually controlled by exactly the same flag
> > as whether epilogue loop should be combined into the main loop with
> > masking:
> > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
> >
> >slpeel_make_loop_iterate_ntimes (loop, niters_vector);
> >
> > +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
> > +vect_combine_loop_epilogue (loop_vinfo);
> > +
> >/* Reduce loop iterations by the vectorization factor.  */
> >scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
> >expected_iterations / vf);
> >
> > IMHO, we should decouple main loop vectorization and epilogue
> > vectorization as much as possible by sharing as few information as we
> > can.  The general idea is to handle epilogue loop just like normal
> > short-trip loop.  For example, we can rename
> > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
> > else), and we don't differentiate its meaning between main and
> > epilogue(short-trip) loop.  It only indicates the current loop should
> > be vectorized with masking no matter it's a main loop or epilogue
> > loop, and it works just like the current implementation.
> >
> > After this change, we can refine vectorization and make it more
> > general for normal loop and epilogue(short trip) loop.  For example,
> > this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
> > loop and use it to control how it should be vectorized:
> > +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > +{
> > +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
> > +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
> > +}
> > +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +   && min_profitable_combine_iters >= 0)
> > +{
> >
> > This works, 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 12:12 PM, Yuri Rumyantsev  wrote:
> I am familiar with SVE extension and understand that implemented
> approach might be not suitable for ARM. The main point is that only
> load/store instructions are masked but all other calculations are not
> (we did special conversion for reduction statements to implement
> merging predication semantic). For SVE peeling for niters is not
> required but it is not true for x86 -  we must determine what
> vectorization scheme is more profitable: loop combining (the only
> essential for SVE) or separate epilogue vectorization using masking or
> less vectorization factor. So I'd like to have the full list of
> required changes to our implementation to try to remove them.
Hmm, sorry that my comment gave impression that I was trying to hold
back the patch, it's not what I meant by any means.  Also it's not
related to SVE, As a matter of fact, I haven't read any document about
SVE yet.  Sorry again for the false impression conveyed by previous
messages.

Thanks,
bin
>
> Thanks.
> Yuri.
>
> 2016-11-09 14:46 GMT+03:00 Bin.Cheng :
>> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
>>> Thanks Richard for your comments.
>>> Your proposed to handle epilogue loop just like normal short-trip loop
>>> but this is not exactly truth since e.g. epilogue must not be peeled
>>> for alignment.
>> Yes there must be some differences, my motivation is to minimize that
>> so we don't need to specially check normal/epilogue loops at too many
>> places.
>> Of course it's just my feeling when going through the patch set, and
>> could be wrong.
>>
>> Thanks,
>> bin
>>>
>>> It is not clear for me what are my next steps? Should I re-design the
>>> patch completely or simply decompose the whole patch to different
>>> parts? But it means that we must start review process from beginning
>>> but release is closed to its end.
>>> Note also that i work for Intel till the end of year and have not idea
>>> who will continue working on this project.
>>>
>>> Any help will be appreciated.
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
 On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  
 wrote:
> Hi All,
>
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
>
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
>
> Is it OK for trunk?

 Hi,
 I can't approve patches, but had some comments after going through the
 implementation.

 One confusing part is cost model change, as well as the way it's used
 to decide how epilogue loop should be vectorized.  Given vect-tail is
 disabled at the moment and the cost change needs further tuning, is it
 reasonable to split this part out and get vectorization part
 reviewed/submitted independently?  For example, let user specified
 parameters make the decision for now.  Cost and target dependent
 changes should go in at last, this could make the patch easier to
 read.

 The implementation computes/shares quite amount information from main
 loop to epilogue loop vectorization.  Furthermore, variables/fields
 for such information are somehow named in a misleading way.  For
 example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
 flag controlling whether epilogue loop should be vectorized with
 masking.  However, it's actually controlled by exactly the same flag
 as whether epilogue loop should be combined into the main loop with
 masking:
 @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)

slpeel_make_loop_iterate_ntimes (loop, niters_vector);

 +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
 +vect_combine_loop_epilogue (loop_vinfo);
 +
/* Reduce loop iterations by the vectorization factor.  */
scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
expected_iterations / vf);

 IMHO, we should decouple main loop vectorization and epilogue
 vectorization as much as possible by sharing as few information as we
 can.  The general idea is to handle epilogue loop just like normal
 short-trip loop.  For example, we can rename
 LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
 else), and we don't differentiate its meaning between main and
 epilogue(short-trip) loop.  It only indicates the current loop should
 be vectorized with masking no matter it's a main loop or epilogue
 loop, and 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Yuri Rumyantsev
I am familiar with SVE extension and understand that implemented
approach might be not suitable for ARM. The main point is that only
load/store instructions are masked but all other calculations are not
(we did special conversion for reduction statements to implement
merging predication semantic). For SVE peeling for niters is not
required but it is not true for x86 -  we must determine what
vectorization scheme is more profitable: loop combining (the only
essential for SVE) or separate epilogue vectorization using masking or
less vectorization factor. So I'd like to have the full list of
required changes to our implementation to try to remove them.

Thanks.
Yuri.

2016-11-09 14:46 GMT+03:00 Bin.Cheng :
> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
>> Thanks Richard for your comments.
>> Your proposed to handle epilogue loop just like normal short-trip loop
>> but this is not exactly truth since e.g. epilogue must not be peeled
>> for alignment.
> Yes there must be some differences, my motivation is to minimize that
> so we don't need to specially check normal/epilogue loops at too many
> places.
> Of course it's just my feeling when going through the patch set, and
> could be wrong.
>
> Thanks,
> bin
>>
>> It is not clear for me what are my next steps? Should I re-design the
>> patch completely or simply decompose the whole patch to different
>> parts? But it means that we must start review process from beginning
>> but release is closed to its end.
>> Note also that i work for Intel till the end of year and have not idea
>> who will continue working on this project.
>>
>> Any help will be appreciated.
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
>>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
 Hi All,

 I re-send all patches sent by Ilya earlier for review which support
 vectorization of loop epilogues and loops with low trip count. We
 assume that the only patch - vec-tails-07-combine-tail.patch - was not
 approved by Jeff.

 I did re-base of all patches and performed bootstrapping and
 regression testing that did not show any new failures. Also all
 changes related to new vect_do_peeling algorithm have been changed
 accordingly.

 Is it OK for trunk?
>>>
>>> Hi,
>>> I can't approve patches, but had some comments after going through the
>>> implementation.
>>>
>>> One confusing part is cost model change, as well as the way it's used
>>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>>> disabled at the moment and the cost change needs further tuning, is it
>>> reasonable to split this part out and get vectorization part
>>> reviewed/submitted independently?  For example, let user specified
>>> parameters make the decision for now.  Cost and target dependent
>>> changes should go in at last, this could make the patch easier to
>>> read.
>>>
>>> The implementation computes/shares quite amount information from main
>>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>>> for such information are somehow named in a misleading way.  For
>>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>>> flag controlling whether epilogue loop should be vectorized with
>>> masking.  However, it's actually controlled by exactly the same flag
>>> as whether epilogue loop should be combined into the main loop with
>>> masking:
>>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>>
>>>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>
>>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>>> +vect_combine_loop_epilogue (loop_vinfo);
>>> +
>>>/* Reduce loop iterations by the vectorization factor.  */
>>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>>expected_iterations / vf);
>>>
>>> IMHO, we should decouple main loop vectorization and epilogue
>>> vectorization as much as possible by sharing as few information as we
>>> can.  The general idea is to handle epilogue loop just like normal
>>> short-trip loop.  For example, we can rename
>>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>>> else), and we don't differentiate its meaning between main and
>>> epilogue(short-trip) loop.  It only indicates the current loop should
>>> be vectorized with masking no matter it's a main loop or epilogue
>>> loop, and it works just like the current implementation.
>>>
>>> After this change, we can refine vectorization and make it more
>>> general for normal loop and epilogue(short trip) loop.  For example,
>>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>>> loop and use it to control how it should be vectorized:
>>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>>> +{
>>> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>>> +  LOOP_VINFO_COMBINE_EPILOGUE 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
> Thanks Richard for your comments.
> Your proposed to handle epilogue loop just like normal short-trip loop
> but this is not exactly truth since e.g. epilogue must not be peeled
> for alignment.
Yes there must be some differences, my motivation is to minimize that
so we don't need to specially check normal/epilogue loops at too many
places.
Of course it's just my feeling when going through the patch set, and
could be wrong.

Thanks,
bin
>
> It is not clear for me what are my next steps? Should I re-design the
> patch completely or simply decompose the whole patch to different
> parts? But it means that we must start review process from beginning
> but release is closed to its end.
> Note also that i work for Intel till the end of year and have not idea
> who will continue working on this project.
>
> Any help will be appreciated.
>
> Thanks.
> Yuri.
>
> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> I re-send all patches sent by Ilya earlier for review which support
>>> vectorization of loop epilogues and loops with low trip count. We
>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>> approved by Jeff.
>>>
>>> I did re-base of all patches and performed bootstrapping and
>>> regression testing that did not show any new failures. Also all
>>> changes related to new vect_do_peeling algorithm have been changed
>>> accordingly.
>>>
>>> Is it OK for trunk?
>>
>> Hi,
>> I can't approve patches, but had some comments after going through the
>> implementation.
>>
>> One confusing part is cost model change, as well as the way it's used
>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>> disabled at the moment and the cost change needs further tuning, is it
>> reasonable to split this part out and get vectorization part
>> reviewed/submitted independently?  For example, let user specified
>> parameters make the decision for now.  Cost and target dependent
>> changes should go in at last, this could make the patch easier to
>> read.
>>
>> The implementation computes/shares quite amount information from main
>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>> for such information are somehow named in a misleading way.  For
>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>> flag controlling whether epilogue loop should be vectorized with
>> masking.  However, it's actually controlled by exactly the same flag
>> as whether epilogue loop should be combined into the main loop with
>> masking:
>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>
>>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +vect_combine_loop_epilogue (loop_vinfo);
>> +
>>/* Reduce loop iterations by the vectorization factor.  */
>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>expected_iterations / vf);
>>
>> IMHO, we should decouple main loop vectorization and epilogue
>> vectorization as much as possible by sharing as few information as we
>> can.  The general idea is to handle epilogue loop just like normal
>> short-trip loop.  For example, we can rename
>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>> else), and we don't differentiate its meaning between main and
>> epilogue(short-trip) loop.  It only indicates the current loop should
>> be vectorized with masking no matter it's a main loop or epilogue
>> loop, and it works just like the current implementation.
>>
>> After this change, we can refine vectorization and make it more
>> general for normal loop and epilogue(short trip) loop.  For example,
>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>> loop and use it to control how it should be vectorized:
>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>> +{
>> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>> +}
>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>> +   && min_profitable_combine_iters >= 0)
>> +{
>>
>> This works, but not that good for understanding or maintaining.
>>
>> Thanks,
>> bin


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Yuri Rumyantsev
Thanks Richard for your comments.
Your proposed to handle epilogue loop just like normal short-trip loop
but this is not exactly truth since e.g. epilogue must not be peeled
for alignment.

It is not clear for me what are my next steps? Should I re-design the
patch completely or simply decompose the whole patch to different
parts? But it means that we must start review process from beginning
but release is closed to its end.
Note also that i work for Intel till the end of year and have not idea
who will continue working on this project.

Any help will be appreciated.

Thanks.
Yuri.

2016-11-09 13:37 GMT+03:00 Bin.Cheng :
> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> I re-send all patches sent by Ilya earlier for review which support
>> vectorization of loop epilogues and loops with low trip count. We
>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> approved by Jeff.
>>
>> I did re-base of all patches and performed bootstrapping and
>> regression testing that did not show any new failures. Also all
>> changes related to new vect_do_peeling algorithm have been changed
>> accordingly.
>>
>> Is it OK for trunk?
>
> Hi,
> I can't approve patches, but had some comments after going through the
> implementation.
>
> One confusing part is cost model change, as well as the way it's used
> to decide how epilogue loop should be vectorized.  Given vect-tail is
> disabled at the moment and the cost change needs further tuning, is it
> reasonable to split this part out and get vectorization part
> reviewed/submitted independently?  For example, let user specified
> parameters make the decision for now.  Cost and target dependent
> changes should go in at last, this could make the patch easier to
> read.
>
> The implementation computes/shares quite amount information from main
> loop to epilogue loop vectorization.  Furthermore, variables/fields
> for such information are somehow named in a misleading way.  For
> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
> flag controlling whether epilogue loop should be vectorized with
> masking.  However, it's actually controlled by exactly the same flag
> as whether epilogue loop should be combined into the main loop with
> masking:
> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>
>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>
> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
> +vect_combine_loop_epilogue (loop_vinfo);
> +
>/* Reduce loop iterations by the vectorization factor.  */
>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>expected_iterations / vf);
>
> IMHO, we should decouple main loop vectorization and epilogue
> vectorization as much as possible by sharing as few information as we
> can.  The general idea is to handle epilogue loop just like normal
> short-trip loop.  For example, we can rename
> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
> else), and we don't differentiate its meaning between main and
> epilogue(short-trip) loop.  It only indicates the current loop should
> be vectorized with masking no matter it's a main loop or epilogue
> loop, and it works just like the current implementation.
>
> After this change, we can refine vectorization and make it more
> general for normal loop and epilogue(short trip) loop.  For example,
> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
> loop and use it to control how it should be vectorized:
> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> +{
> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
> +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
> +}
> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +   && min_profitable_combine_iters >= 0)
> +{
>
> This works, but not that good for understanding or maintaining.
>
> Thanks,
> bin


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
>
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
>
> Is it OK for trunk?

Hi,
I can't approve patches, but had some comments after going through the
implementation.

One confusing part is cost model change, as well as the way it's used
to decide how epilogue loop should be vectorized.  Given vect-tail is
disabled at the moment and the cost change needs further tuning, is it
reasonable to split this part out and get vectorization part
reviewed/submitted independently?  For example, let user specified
parameters make the decision for now.  Cost and target dependent
changes should go in at last, this could make the patch easier to
read.

The implementation computes/shares quite amount information from main
loop to epilogue loop vectorization.  Furthermore, variables/fields
for such information are somehow named in a misleading way.  For
example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
flag controlling whether epilogue loop should be vectorized with
masking.  However, it's actually controlled by exactly the same flag
as whether epilogue loop should be combined into the main loop with
masking:
@@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)

   slpeel_make_loop_iterate_ntimes (loop, niters_vector);

+  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+vect_combine_loop_epilogue (loop_vinfo);
+
   /* Reduce loop iterations by the vectorization factor.  */
   scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
   expected_iterations / vf);

IMHO, we should decouple main loop vectorization and epilogue
vectorization as much as possible by sharing as few information as we
can.  The general idea is to handle epilogue loop just like normal
short-trip loop.  For example, we can rename
LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
else), and we don't differentiate its meaning between main and
epilogue(short-trip) loop.  It only indicates the current loop should
be vectorized with masking no matter it's a main loop or epilogue
loop, and it works just like the current implementation.

After this change, we can refine vectorization and make it more
general for normal loop and epilogue(short trip) loop.  For example,
this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
loop and use it to control how it should be vectorized:
+  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+{
+  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
+  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
+}
+  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+   && min_profitable_combine_iters >= 0)
+{

This works, but not that good for understanding or maintaining.

Thanks,
bin


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-08 Thread Yuri Rumyantsev
Richard,

Here is updated 3 patch.

I checked that all new tests related to epilogue vectorization passed with it.

Your comments will be appreciated.

2016-11-08 15:38 GMT+03:00 Richard Biener :
> On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>
>> Hi Richard,
>>
>> I did not understand your last remark:
>>
>> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >
>> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >   && dump_enabled_p ())
>> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >"loop vectorized\n");
>> > -   vect_transform_loop (loop_vinfo);
>> > +   new_loop = vect_transform_loop (loop_vinfo);
>> > num_vectorized_loops++;
>> >/* Now that the loop has been vectorized, allow it to be unrolled
>> >   etc.  */
>> >  loop->force_vectorize = false;
>> >
>> > +   /* Add new loop to a processing queue.  To make it easier
>> > +  to match loop and its epilogue vectorization in dumps
>> > +  put new loop as the next loop to process.  */
>> > +   if (new_loop)
>> > + {
>> > +   loops.safe_insert (i + 1, new_loop->num);
>> > +   vect_loops_num = number_of_loops (cfun);
>> > + }
>> >
>> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> f> unction which will set up stuff properly (and also perform
>> > the if-conversion of the epilogue there).
>> >
>> > That said, if we can get in non-masked epilogue vectorization
>> > separately that would be great.
>>
>> Could you please clarify your proposal.
>
> When a loop was vectorized set things up to immediately vectorize
> its epilogue, avoiding changing the loop iteration and avoiding
> the re-use of ->aux.
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-02 15:27 GMT+03:00 Richard Biener :
>> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> >
>> >> Hi All,
>> >>
>> >> I re-send all patches sent by Ilya earlier for review which support
>> >> vectorization of loop epilogues and loops with low trip count. We
>> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> >> approved by Jeff.
>> >>
>> >> I did re-base of all patches and performed bootstrapping and
>> >> regression testing that did not show any new failures. Also all
>> >> changes related to new vect_do_peeling algorithm have been changed
>> >> accordingly.
>> >>
>> >> Is it OK for trunk?
>> >
>> > I would have prefered that the series up to -03-nomask-tails would
>> > _only_ contain epilogue loop vectorization changes but unfortunately
>> > the patchset is oddly separated.
>> >
>> > I have a comment on that part nevertheless:
>> >
>> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> > loop_vinfo)
>> >/* Check if we can possibly peel the loop.  */
>> >if (!vect_can_advance_ivs_p (loop_vinfo)
>> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
>> > -  || loop->inner)
>> > +  || loop->inner
>> > +  /* Required peeling was performed in prologue and
>> > +is not required for epilogue.  */
>> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> >  do_peeling = false;
>> >
>> >if (do_peeling
>> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> > loop_vinfo)
>> >
>> >do_versioning =
>> > optimize_loop_nest_for_speed_p (loop)
>> > -   && (!loop->inner); /* FORNOW */
>> > +   && (!loop->inner) /* FORNOW */
>> > +/* Required versioning was performed for the
>> > +  original loop and is not required for epilogue.  */
>> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>> >
>> >if (do_versioning)
>> >  {
>> >
>> > please do that check in the single caller of this function.
>> >
>> > Otherwise I still dislike the new ->aux use and I believe that simply
>> > passing down info from the processed parent would be _much_ cleaner.
>> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >
>> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> > && dump_enabled_p ())
>> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> > "loop vectorized\n");
>> > -   vect_transform_loop (loop_vinfo);
>> > +   new_loop = vect_transform_loop (loop_vinfo);
>> > num_vectorized_loops++;
>> > /* Now that the loop has been vectorized, allow it to be unrolled
>> >etc.  */
>> > loop->force_vectorize = false;
>> >
>> > +   /* Add new loop to a processing queue.  To make it easier
>> > +  to match loop and its epilogue vectorization in dumps
>> > +  put new loop as the next loop to process.  */
>> > +   if (new_loop)
>> > + {
>> > +   loops.safe_insert (i + 1, new_loop->num);
>> > +   vect_loops_num = number_of_loops (cfun);
>> > + }
>> >
>> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> > 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-08 Thread Richard Biener
On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:

> Hi Richard,
> 
> I did not understand your last remark:
> 
> > That is, here (and avoid the FOR_EACH_LOOP change):
> >
> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >   && dump_enabled_p ())
> >   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >"loop vectorized\n");
> > -   vect_transform_loop (loop_vinfo);
> > +   new_loop = vect_transform_loop (loop_vinfo);
> > num_vectorized_loops++;
> >/* Now that the loop has been vectorized, allow it to be unrolled
> >   etc.  */
> >  loop->force_vectorize = false;
> >
> > +   /* Add new loop to a processing queue.  To make it easier
> > +  to match loop and its epilogue vectorization in dumps
> > +  put new loop as the next loop to process.  */
> > +   if (new_loop)
> > + {
> > +   loops.safe_insert (i + 1, new_loop->num);
> > +   vect_loops_num = number_of_loops (cfun);
> > + }
> >
> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> f> unction which will set up stuff properly (and also perform
> > the if-conversion of the epilogue there).
> >
> > That said, if we can get in non-masked epilogue vectorization
> > separately that would be great.
> 
> Could you please clarify your proposal.

When a loop was vectorized set things up to immediately vectorize
its epilogue, avoiding changing the loop iteration and avoiding
the re-use of ->aux.

Richard.

> Thanks.
> Yuri.
> 
> 2016-11-02 15:27 GMT+03:00 Richard Biener :
> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Hi All,
> >>
> >> I re-send all patches sent by Ilya earlier for review which support
> >> vectorization of loop epilogues and loops with low trip count. We
> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> approved by Jeff.
> >>
> >> I did re-base of all patches and performed bootstrapping and
> >> regression testing that did not show any new failures. Also all
> >> changes related to new vect_do_peeling algorithm have been changed
> >> accordingly.
> >>
> >> Is it OK for trunk?
> >
> > I would have prefered that the series up to -03-nomask-tails would
> > _only_ contain epilogue loop vectorization changes but unfortunately
> > the patchset is oddly separated.
> >
> > I have a comment on that part nevertheless:
> >
> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> >/* Check if we can possibly peel the loop.  */
> >if (!vect_can_advance_ivs_p (loop_vinfo)
> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> > -  || loop->inner)
> > +  || loop->inner
> > +  /* Required peeling was performed in prologue and
> > +is not required for epilogue.  */
> > +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >  do_peeling = false;
> >
> >if (do_peeling
> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> >
> >do_versioning =
> > optimize_loop_nest_for_speed_p (loop)
> > -   && (!loop->inner); /* FORNOW */
> > +   && (!loop->inner) /* FORNOW */
> > +/* Required versioning was performed for the
> > +  original loop and is not required for epilogue.  */
> > +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >
> >if (do_versioning)
> >  {
> >
> > please do that check in the single caller of this function.
> >
> > Otherwise I still dislike the new ->aux use and I believe that simply
> > passing down info from the processed parent would be _much_ cleaner.
> > That is, here (and avoid the FOR_EACH_LOOP change):
> >
> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> > && dump_enabled_p ())
> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> > "loop vectorized\n");
> > -   vect_transform_loop (loop_vinfo);
> > +   new_loop = vect_transform_loop (loop_vinfo);
> > num_vectorized_loops++;
> > /* Now that the loop has been vectorized, allow it to be unrolled
> >etc.  */
> > loop->force_vectorize = false;
> >
> > +   /* Add new loop to a processing queue.  To make it easier
> > +  to match loop and its epilogue vectorization in dumps
> > +  put new loop as the next loop to process.  */
> > +   if (new_loop)
> > + {
> > +   loops.safe_insert (i + 1, new_loop->num);
> > +   vect_loops_num = number_of_loops (cfun);
> > + }
> >
> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> > function which will set up stuff properly (and also perform
> > the if-conversion of the epilogue there).
> >
> > That said, if we can get in non-masked epilogue vectorization
> > separately that would be great.
> >
> > I'm still torn about all the rest of the stuff and question its
> > usability (esp. merging the epilogue with the main 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-06 Thread Richard Biener
On November 5, 2016 3:40:04 AM GMT+01:00, Jeff Law  wrote:
>On 11/02/2016 06:27 AM, Richard Biener wrote:
>> I'm still torn about all the rest of the stuff and question its
>> usability (esp. merging the epilogue with the main vector loop).
>> But it has already been approved ... oh well.
>Note that merging of the epilogue with the main vector loop may well be
>
>useful for SVE as well.  I'm hoping Richard S. will chime in on how to 
>best exploit SVE.

Possibly, but full exploitation of SVE requires a full vectorizer rewrite.  The 
only thing I can see us implementing is making the vector size fixed via 
versioning, that is, let the user specify -mvecsize=N and if that's not the 
case at runtime trap or execute a scalar variant.

Richard.

>Jeff




Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-05 Thread Jeff Law

On 11/02/2016 06:27 AM, Richard Biener wrote:

I'm still torn about all the rest of the stuff and question its
usability (esp. merging the epilogue with the main vector loop).
But it has already been approved ... oh well.
Note that merging of the epilogue with the main vector loop may well be 
useful for SVE as well.  I'm hoping Richard S. will chime in on how to 
best exploit SVE.


Jeff


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-03 Thread Yuri Rumyantsev
Hi Richard,

I did not understand your last remark:

> That is, here (and avoid the FOR_EACH_LOOP change):
>
> @@ -580,12 +586,21 @@ vectorize_loops (void)
>   && dump_enabled_p ())
>   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>"loop vectorized\n");
> -   vect_transform_loop (loop_vinfo);
> +   new_loop = vect_transform_loop (loop_vinfo);
> num_vectorized_loops++;
>/* Now that the loop has been vectorized, allow it to be unrolled
>   etc.  */
>  loop->force_vectorize = false;
>
> +   /* Add new loop to a processing queue.  To make it easier
> +  to match loop and its epilogue vectorization in dumps
> +  put new loop as the next loop to process.  */
> +   if (new_loop)
> + {
> +   loops.safe_insert (i + 1, new_loop->num);
> +   vect_loops_num = number_of_loops (cfun);
> + }
>
> simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
f> unction which will set up stuff properly (and also perform
> the if-conversion of the epilogue there).
>
> That said, if we can get in non-masked epilogue vectorization
> separately that would be great.

Could you please clarify your proposal.

Thanks.
Yuri.

2016-11-02 15:27 GMT+03:00 Richard Biener :
> On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>
>> Hi All,
>>
>> I re-send all patches sent by Ilya earlier for review which support
>> vectorization of loop epilogues and loops with low trip count. We
>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> approved by Jeff.
>>
>> I did re-base of all patches and performed bootstrapping and
>> regression testing that did not show any new failures. Also all
>> changes related to new vect_do_peeling algorithm have been changed
>> accordingly.
>>
>> Is it OK for trunk?
>
> I would have prefered that the series up to -03-nomask-tails would
> _only_ contain epilogue loop vectorization changes but unfortunately
> the patchset is oddly separated.
>
> I have a comment on that part nevertheless:
>
> @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> loop_vinfo)
>/* Check if we can possibly peel the loop.  */
>if (!vect_can_advance_ivs_p (loop_vinfo)
>|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> -  || loop->inner)
> +  || loop->inner
> +  /* Required peeling was performed in prologue and
> +is not required for epilogue.  */
> +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>  do_peeling = false;
>
>if (do_peeling
> @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> loop_vinfo)
>
>do_versioning =
> optimize_loop_nest_for_speed_p (loop)
> -   && (!loop->inner); /* FORNOW */
> +   && (!loop->inner) /* FORNOW */
> +/* Required versioning was performed for the
> +  original loop and is not required for epilogue.  */
> +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>
>if (do_versioning)
>  {
>
> please do that check in the single caller of this function.
>
> Otherwise I still dislike the new ->aux use and I believe that simply
> passing down info from the processed parent would be _much_ cleaner.
> That is, here (and avoid the FOR_EACH_LOOP change):
>
> @@ -580,12 +586,21 @@ vectorize_loops (void)
> && dump_enabled_p ())
>dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> "loop vectorized\n");
> -   vect_transform_loop (loop_vinfo);
> +   new_loop = vect_transform_loop (loop_vinfo);
> num_vectorized_loops++;
> /* Now that the loop has been vectorized, allow it to be unrolled
>etc.  */
> loop->force_vectorize = false;
>
> +   /* Add new loop to a processing queue.  To make it easier
> +  to match loop and its epilogue vectorization in dumps
> +  put new loop as the next loop to process.  */
> +   if (new_loop)
> + {
> +   loops.safe_insert (i + 1, new_loop->num);
> +   vect_loops_num = number_of_loops (cfun);
> + }
>
> simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> function which will set up stuff properly (and also perform
> the if-conversion of the epilogue there).
>
> That said, if we can get in non-masked epilogue vectorization
> separately that would be great.
>
> I'm still torn about all the rest of the stuff and question its
> usability (esp. merging the epilogue with the main vector loop).
> But it has already been approved ... oh well.
>
> Thanks,
> Richard.


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-02 Thread Richard Biener
On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
> 
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
> 
> Is it OK for trunk?

I would have prefered that the series up to -03-nomask-tails would
_only_ contain epilogue loop vectorization changes but unfortunately
the patchset is oddly separated.

I have a comment on that part nevertheless:

@@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
   || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
-  || loop->inner)
+  || loop->inner
+  /* Required peeling was performed in prologue and
+is not required for epilogue.  */
+  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
 do_peeling = false;

   if (do_peeling
@@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)

   do_versioning =
optimize_loop_nest_for_speed_p (loop)
-   && (!loop->inner); /* FORNOW */
+   && (!loop->inner) /* FORNOW */
+/* Required versioning was performed for the
+  original loop and is not required for epilogue.  */
+   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);

   if (do_versioning)
 {

please do that check in the single caller of this function.

Otherwise I still dislike the new ->aux use and I believe that simply
passing down info from the processed parent would be _much_ cleaner.
That is, here (and avoid the FOR_EACH_LOOP change):

@@ -580,12 +586,21 @@ vectorize_loops (void)
&& dump_enabled_p ())
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
"loop vectorized\n");
-   vect_transform_loop (loop_vinfo);
+   new_loop = vect_transform_loop (loop_vinfo);
num_vectorized_loops++;
/* Now that the loop has been vectorized, allow it to be unrolled
   etc.  */
loop->force_vectorize = false;

+   /* Add new loop to a processing queue.  To make it easier
+  to match loop and its epilogue vectorization in dumps
+  put new loop as the next loop to process.  */
+   if (new_loop)
+ {
+   loops.safe_insert (i + 1, new_loop->num);
+   vect_loops_num = number_of_loops (cfun);
+ }

simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
function which will set up stuff properly (and also perform
the if-conversion of the epilogue there).

That said, if we can get in non-masked epilogue vectorization
separately that would be great.

I'm still torn about all the rest of the stuff and question its
usability (esp. merging the epilogue with the main vector loop).
But it has already been approved ... oh well.

Thanks,
Richard.