Re: [PATCH, vec-tails] Support loop epilogue vectorization
Sorry, I put wrong test - fix it here. 2016-12-21 13:12 GMT+03:00 Yuri Rumyantsev: > Hi Richard, > > I occasionally found out a bug in my patch related to epilogue > vectorization without masking : need to put label before > initialization. > > Could you please review and integrate it to trunk. Test-case is also attached. > > > Thanks ahead. > Yuri. > > ChangeLog: > 2016-12-21 Yuri Rumyantsev > > * tree-vectorizer.c (vectorize_loops): Put label before initialization > of loop_vectorized_call. > > gcc/testsuite/ > > * gcc.dg/vect/vect-tail-nomask-2.c: New test. > > 2016-12-13 16:59 GMT+03:00 Richard Biener : >> On Mon, 12 Dec 2016, Yuri Rumyantsev wrote: >> >>> Richard, >>> >>> Could you please review cost model patch before to include it to >>> epilogue masking patch and add masking cost estimation as you >>> requested. >> >> That's just the middle-end / target changes. I was not 100% happy >> with them but well, the vectorizer cost modeling needs work >> (aka another rewrite). >> >> From below... >> >>> Thanks. >>> >>> Patch and ChangeLog are attached. >>> >>> 2016-12-12 15:47 GMT+03:00 Yuri Rumyantsev : >>> > Hi Richard, >>> > >>> > You asked me about performance of spec2006 on AVX2 machine with new >>> > feature. >>> > >>> > I tried the following on Haswell using original patch designed by Ilya. >>> > 1. Masking low trip count loops only 6 benchmarks are affected and >>> > performance is almost the same >>> > 464.h264ref 63.900064. +0.15% >>> > 416.gamess 42.900042.9000 +0% >>> > 435.gromacs 32.800032.7000 -0.30% >>> > 447.dealII 68.500068.3000 -0.29% >>> > 453.povray 61.900062.1000 +0.32% >>> > 454.calculix39.800039.8000 +0% >>> > 465.tonto 29.900029.9000 +0% >>> > >>> > 2. epilogue vectorization without masking (use less vf) (3 benchmarks >>> > are not affected) >>> > 400.perlbench 47.200046.5000 -1.48% >>> > 401.bzip2 29.900029.9000 +0% >>> > 403.gcc 41.800041.6000 -0.47% >>> > 456.hmmer 32.32. +0% >>> > 462.libquantum81.500082. +0.61% >>> > 464.h264ref 65.65.5000 +0.76% >>> > 471.omnetpp 27.800028.2000 +1.43% >>> > 473.astar 28.700028.6000 -0.34% >>> > 483.xalancbmk 48.700048.6000 -0.20% >>> > 410.bwaves95.300095.3000 +0% >>> > 416.gamess42.900042.8000 -0.23% >>> > 433.milc 38.800038.8000 +0% >>> > 434.zeusmp51.700051.4000 -0.58% >>> > 435.gromacs 32.800032.8000 +0% >>> > 436.cactusADM 85.83. -2.35% >>> > 437.leslie3d 55.500055.5000 +0% >>> > 444.namd 31.300031.3000 +0% >>> > 447.dealII68.700068.9000 +0.29% >>> > 450.soplex47.300047.4000 +0.21% >>> > 453.povray62.100061.4000 -1.12% >>> > 454.calculix 39.700039.3000 -1.00% >>> > 459.GemsFDTD 44.900045. +0.22% >>> > 465.tonto 29.800029.8000 +0% >>> > 481.wrf 51.51.2000 +0.39% >>> > 482.sphinx3 69.800071.2000 +2.00% >> >> I see 471.omnetpp and 482.sphinx3 are in a similar ballpark and it >> would be nice to catch the relevant case(s) with a cost model for >> epilogue vectorization without masking first (to get rid of >> --param vect-epilogues-nomask). >> >> As said elsewhere any non-conservative cost modeling (if the >> number of scalar iterations is not statically constant) might >> require versioning of the loop into a non-vectorized, >> short-trip vectorized and regular vectorized case (the Intel >> compiler does way more aggressive versioning IIRC). >> >> Richard. >> >>> > 3. epilogue vectorization using masking (4 benchmarks are not affected): >>> > 400.perlbench 47.500046.8000 -1.47% >>> > 401.bzip2 30.29.9000 -0.33% >>> > 403.gcc 42.300042.3000 +0% >>> > 445.gobmk 32.100032.8000 +2.18% >>> > 456.hmmer 32.32. +0% >>> > 458.sjeng 36.100035.5000 -1.66% >>> > 462.libquantum81.100081.1000 +0% >>> > 464.h264ref 65.400065. -0.61% >>> > 483.xalancbmk 49.400049.3000 -0.20% >>> > 410.bwaves95.900095.5000 -0.41% >>> > 416.gamess42.800042.6000 -0.46% >>> > 433.milc 38.800039.1000 +0.77% >>> > 434.zeusmp52.100051.3000 -1.53% >>> > 435.gromacs 32.900032.9000 +0% >>> > 436.cactusADM 78.800085.3000 +8.24% >>> > 437.leslie3d 55.400055.4000 +0% >>> > 444.namd 31.300031.3000 +0% >>> > 447.dealII69.69.2000 +0.28% >>> > 450.soplex47.700047.6000 -0.20% >>> > 453.povray62.200061.7000 -0.80% >>> > 454.calculix 39.700038.2000 -3.77% >>> > 459.GemsFDTD 44.900045. +0.22% >>> > 465.tonto 29.800029.9000 +0.33% >>> > 481.wrf 51.200051.6000
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Hi Richard, I occasionally found out a bug in my patch related to epilogue vectorization without masking : need to put label before initialization. Could you please review and integrate it to trunk. Test-case is also attached. Thanks ahead. Yuri. ChangeLog: 2016-12-21 Yuri Rumyantsev* tree-vectorizer.c (vectorize_loops): Put label before initialization of loop_vectorized_call. gcc/testsuite/ * gcc.dg/vect/vect-tail-nomask-2.c: New test. 2016-12-13 16:59 GMT+03:00 Richard Biener : > On Mon, 12 Dec 2016, Yuri Rumyantsev wrote: > >> Richard, >> >> Could you please review cost model patch before to include it to >> epilogue masking patch and add masking cost estimation as you >> requested. > > That's just the middle-end / target changes. I was not 100% happy > with them but well, the vectorizer cost modeling needs work > (aka another rewrite). > > From below... > >> Thanks. >> >> Patch and ChangeLog are attached. >> >> 2016-12-12 15:47 GMT+03:00 Yuri Rumyantsev : >> > Hi Richard, >> > >> > You asked me about performance of spec2006 on AVX2 machine with new >> > feature. >> > >> > I tried the following on Haswell using original patch designed by Ilya. >> > 1. Masking low trip count loops only 6 benchmarks are affected and >> > performance is almost the same >> > 464.h264ref 63.900064. +0.15% >> > 416.gamess 42.900042.9000 +0% >> > 435.gromacs 32.800032.7000 -0.30% >> > 447.dealII 68.500068.3000 -0.29% >> > 453.povray 61.900062.1000 +0.32% >> > 454.calculix39.800039.8000 +0% >> > 465.tonto 29.900029.9000 +0% >> > >> > 2. epilogue vectorization without masking (use less vf) (3 benchmarks >> > are not affected) >> > 400.perlbench 47.200046.5000 -1.48% >> > 401.bzip2 29.900029.9000 +0% >> > 403.gcc 41.800041.6000 -0.47% >> > 456.hmmer 32.32. +0% >> > 462.libquantum81.500082. +0.61% >> > 464.h264ref 65.65.5000 +0.76% >> > 471.omnetpp 27.800028.2000 +1.43% >> > 473.astar 28.700028.6000 -0.34% >> > 483.xalancbmk 48.700048.6000 -0.20% >> > 410.bwaves95.300095.3000 +0% >> > 416.gamess42.900042.8000 -0.23% >> > 433.milc 38.800038.8000 +0% >> > 434.zeusmp51.700051.4000 -0.58% >> > 435.gromacs 32.800032.8000 +0% >> > 436.cactusADM 85.83. -2.35% >> > 437.leslie3d 55.500055.5000 +0% >> > 444.namd 31.300031.3000 +0% >> > 447.dealII68.700068.9000 +0.29% >> > 450.soplex47.300047.4000 +0.21% >> > 453.povray62.100061.4000 -1.12% >> > 454.calculix 39.700039.3000 -1.00% >> > 459.GemsFDTD 44.900045. +0.22% >> > 465.tonto 29.800029.8000 +0% >> > 481.wrf 51.51.2000 +0.39% >> > 482.sphinx3 69.800071.2000 +2.00% > > I see 471.omnetpp and 482.sphinx3 are in a similar ballpark and it > would be nice to catch the relevant case(s) with a cost model for > epilogue vectorization without masking first (to get rid of > --param vect-epilogues-nomask). > > As said elsewhere any non-conservative cost modeling (if the > number of scalar iterations is not statically constant) might > require versioning of the loop into a non-vectorized, > short-trip vectorized and regular vectorized case (the Intel > compiler does way more aggressive versioning IIRC). > > Richard. > >> > 3. epilogue vectorization using masking (4 benchmarks are not affected): >> > 400.perlbench 47.500046.8000 -1.47% >> > 401.bzip2 30.29.9000 -0.33% >> > 403.gcc 42.300042.3000 +0% >> > 445.gobmk 32.100032.8000 +2.18% >> > 456.hmmer 32.32. +0% >> > 458.sjeng 36.100035.5000 -1.66% >> > 462.libquantum81.100081.1000 +0% >> > 464.h264ref 65.400065. -0.61% >> > 483.xalancbmk 49.400049.3000 -0.20% >> > 410.bwaves95.900095.5000 -0.41% >> > 416.gamess42.800042.6000 -0.46% >> > 433.milc 38.800039.1000 +0.77% >> > 434.zeusmp52.100051.3000 -1.53% >> > 435.gromacs 32.900032.9000 +0% >> > 436.cactusADM 78.800085.3000 +8.24% >> > 437.leslie3d 55.400055.4000 +0% >> > 444.namd 31.300031.3000 +0% >> > 447.dealII69.69.2000 +0.28% >> > 450.soplex47.700047.6000 -0.20% >> > 453.povray62.200061.7000 -0.80% >> > 454.calculix 39.700038.2000 -3.77% >> > 459.GemsFDTD 44.900045. +0.22% >> > 465.tonto 29.800029.9000 +0.33% >> > 481.wrf 51.200051.6000 +0.78% >> > 482.sphinx3 70.300065.4000 -6.97% >> > >> > There is a good speed-up for 436 but there is essential slow0down on 482, >> > 454. >> > >> > So In general we don't have any advantages for AVX2. >> > >> > Best regards. >> >
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Thu, 1 Dec 2016, Yuri Rumyantsev wrote: > Thanks Richard for your comments. > > You asked me about possible performance improvements for AVX2 machines > - we did not see any visible speed-up for spec2k with any method of Spec 2000? Can you check with SPEC 2006 or CPUv6? Did you see performance degradation? What about compile-time and binary size effects? > masking, including epilogue masking and combining, only on AVX512 > machine aka knl. I see. Note that as said in the initial review patch the cost model I saw therein looked flawed. In the end I'd expect a sensible approach would be to do if (n < scalar-most-profitable-niter) { no vectorization } else if (n < masking-more-profitable-than-not-masking-plus-epilogue) { do masked vectorization } else { do unmasked vectorization (with epilogue, eventually vectorized) } where for short trip loops the else path would never be taken (statically). And yes, that means masking will only be useful for short-trip loops which in the end means an overall performance benfit is unlikely unless we have a lot of short-trip loops that are slow because of the overhead of main unmasked loop plus epilogue. Richard. > I will answer on your question later. > > Best regards. > Yuri > > 2016-12-01 14:33 GMT+03:00 Richard Biener: > > On Mon, 28 Nov 2016, Yuri Rumyantsev wrote: > > > >> Richard! > >> > >> I attached vect dump for hte part of attached test-case which > >> illustrated how vectorization of epilogues works through masking: > >> #define SIZE 1023 > >> #define ALIGN 64 > >> > >> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment, > >> __SIZE_TYPE__ size) __attribute__((weak)); > >> extern void free (void *); > >> > >> void __attribute__((noinline)) > >> test_citer (int * __restrict__ a, > >>int * __restrict__ b, > >>int * __restrict__ c) > >> { > >> int i; > >> > >> a = (int *)__builtin_assume_aligned (a, ALIGN); > >> b = (int *)__builtin_assume_aligned (b, ALIGN); > >> c = (int *)__builtin_assume_aligned (c, ALIGN); > >> > >> for (i = 0; i < SIZE; i++) > >> c[i] = a[i] + b[i]; > >> } > >> > >> It was compiled with -mavx2 --param vect-epilogues-mask=1 options. > >> > >> I did not include in this patch vectorization of low trip-count loops > >> since in the original patch additional parameter was introduced: > >> +DEFPARAM (PARAM_VECT_SHORT_LOOPS, > >> + "vect-short-loops", > >> + "Enable vectorization of low trip count loops using masking.", > >> + 0, 0, 1) > >> > >> I assume that this ability can be included very quickly but it > >> requires cost model enhancements also. > > > > Comments on the patch itself (as I'm having a closer look again, > > I know how it vectorizes the above but I wondered why epilogue > > and short-trip loops are not basically the same code path). > > > > Btw, I don't like that the features are behind a --param paywall. > > That just means a) nobody will use it, b) it will bit-rot quickly, > > c) bugs are well-hidden. > > > > + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > > + && integer_zerop (nested_in_vect_loop > > + ? STMT_VINFO_DR_STEP (stmt_info) > > + : DR_STEP (dr))) > > +{ > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > +"allow invariant load for masked loop.\n"); > > +} > > > > this can test memory_access_type == VMAT_INVARIANT. Please put > > all the checks in a common > > > > if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) > > { > >if (memory_access_type == VMAT_INVARIANT) > > { > > } > >else if (...) > > { > > LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > > } > >else if (..) > > ... > > } > > > > @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt, > > gimple_stmt_iterator *gsi, gimple **vec_stmt, > >gcc_assert (!nested_in_vect_loop); > >gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)); > > > > + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > +"cannot be masked: grouped access is not" > > +" supported."); > > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > > + } > > + > > > > isn't this already handled by the above? Or rather the general > > disallowance of SLP? > > > > @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt, > > gimple_stmt_iterator *gsi, gimple **vec_stmt, > > _access_type, _info)) > > return false; > > > > + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > > + && memory_access_type != VMAT_CONTIGUOUS) > > +{ > > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > > + if
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Thanks Richard for your comments. You asked me about possible performance improvements for AVX2 machines - we did not see any visible speed-up for spec2k with any method of masking, including epilogue masking and combining, only on AVX512 machine aka knl. I will answer on your question later. Best regards. Yuri 2016-12-01 14:33 GMT+03:00 Richard Biener: > On Mon, 28 Nov 2016, Yuri Rumyantsev wrote: > >> Richard! >> >> I attached vect dump for hte part of attached test-case which >> illustrated how vectorization of epilogues works through masking: >> #define SIZE 1023 >> #define ALIGN 64 >> >> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment, >> __SIZE_TYPE__ size) __attribute__((weak)); >> extern void free (void *); >> >> void __attribute__((noinline)) >> test_citer (int * __restrict__ a, >>int * __restrict__ b, >>int * __restrict__ c) >> { >> int i; >> >> a = (int *)__builtin_assume_aligned (a, ALIGN); >> b = (int *)__builtin_assume_aligned (b, ALIGN); >> c = (int *)__builtin_assume_aligned (c, ALIGN); >> >> for (i = 0; i < SIZE; i++) >> c[i] = a[i] + b[i]; >> } >> >> It was compiled with -mavx2 --param vect-epilogues-mask=1 options. >> >> I did not include in this patch vectorization of low trip-count loops >> since in the original patch additional parameter was introduced: >> +DEFPARAM (PARAM_VECT_SHORT_LOOPS, >> + "vect-short-loops", >> + "Enable vectorization of low trip count loops using masking.", >> + 0, 0, 1) >> >> I assume that this ability can be included very quickly but it >> requires cost model enhancements also. > > Comments on the patch itself (as I'm having a closer look again, > I know how it vectorizes the above but I wondered why epilogue > and short-trip loops are not basically the same code path). > > Btw, I don't like that the features are behind a --param paywall. > That just means a) nobody will use it, b) it will bit-rot quickly, > c) bugs are well-hidden. > > + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > + && integer_zerop (nested_in_vect_loop > + ? STMT_VINFO_DR_STEP (stmt_info) > + : DR_STEP (dr))) > +{ > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > +"allow invariant load for masked loop.\n"); > +} > > this can test memory_access_type == VMAT_INVARIANT. Please put > all the checks in a common > > if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) > { >if (memory_access_type == VMAT_INVARIANT) > { > } >else if (...) > { > LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > } >else if (..) > ... > } > > @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt, > gimple_stmt_iterator *gsi, gimple **vec_stmt, >gcc_assert (!nested_in_vect_loop); >gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)); > > + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > +"cannot be masked: grouped access is not" > +" supported."); > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > + } > + > > isn't this already handled by the above? Or rather the general > disallowance of SLP? > > @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt, > gimple_stmt_iterator *gsi, gimple **vec_stmt, > _access_type, _info)) > return false; > > + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > + && memory_access_type != VMAT_CONTIGUOUS) > +{ > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > +"cannot be masked: unsupported memory access > type.\n"); > +} > + > + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > + && !can_mask_load_store (stmt)) > +{ > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > +"cannot be masked: unsupported masked store.\n"); > +} > + > > likewise please combine the ifs. > > @@ -2354,7 +2401,10 @@ vectorizable_mask_load_store (gimple *stmt, > gimple_stmt_iterator *gsi, > ptr, vec_mask, vec_rhs); > vect_finish_stmt_generation (stmt, new_stmt, gsi); > if (i == 0) > - STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; > + { > + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; > + STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true; > + } > else > STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; >
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Mon, 28 Nov 2016, Yuri Rumyantsev wrote: > Richard! > > I attached vect dump for hte part of attached test-case which > illustrated how vectorization of epilogues works through masking: > #define SIZE 1023 > #define ALIGN 64 > > extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment, > __SIZE_TYPE__ size) __attribute__((weak)); > extern void free (void *); > > void __attribute__((noinline)) > test_citer (int * __restrict__ a, >int * __restrict__ b, >int * __restrict__ c) > { > int i; > > a = (int *)__builtin_assume_aligned (a, ALIGN); > b = (int *)__builtin_assume_aligned (b, ALIGN); > c = (int *)__builtin_assume_aligned (c, ALIGN); > > for (i = 0; i < SIZE; i++) > c[i] = a[i] + b[i]; > } > > It was compiled with -mavx2 --param vect-epilogues-mask=1 options. > > I did not include in this patch vectorization of low trip-count loops > since in the original patch additional parameter was introduced: > +DEFPARAM (PARAM_VECT_SHORT_LOOPS, > + "vect-short-loops", > + "Enable vectorization of low trip count loops using masking.", > + 0, 0, 1) > > I assume that this ability can be included very quickly but it > requires cost model enhancements also. Comments on the patch itself (as I'm having a closer look again, I know how it vectorizes the above but I wondered why epilogue and short-trip loops are not basically the same code path). Btw, I don't like that the features are behind a --param paywall. That just means a) nobody will use it, b) it will bit-rot quickly, c) bugs are well-hidden. + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) + && integer_zerop (nested_in_vect_loop + ? STMT_VINFO_DR_STEP (stmt_info) + : DR_STEP (dr))) +{ + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, +"allow invariant load for masked loop.\n"); +} this can test memory_access_type == VMAT_INVARIANT. Please put all the checks in a common if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) { if (memory_access_type == VMAT_INVARIANT) { } else if (...) { LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; } else if (..) ... } @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt, gcc_assert (!nested_in_vect_loop); gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)); + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"cannot be masked: grouped access is not" +" supported."); + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; + } + isn't this already handled by the above? Or rather the general disallowance of SLP? @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt, _access_type, _info)) return false; + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) + && memory_access_type != VMAT_CONTIGUOUS) +{ + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"cannot be masked: unsupported memory access type.\n"); +} + + if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) + && !can_mask_load_store (stmt)) +{ + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false; + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, +"cannot be masked: unsupported masked store.\n"); +} + likewise please combine the ifs. @@ -2354,7 +2401,10 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi, ptr, vec_mask, vec_rhs); vect_finish_stmt_generation (stmt, new_stmt, gsi); if (i == 0) - STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + { + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true; + } else STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; prev_stmt_info = vinfo_for_stmt (new_stmt); here you only set the flag, elsewhere you copy DR and VECTYPE as well. @@ -2113,6 +2146,20 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi, && !useless_type_conversion_p (vectype, rhs_vectype))) return false; + if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)) +{ + /* Check that mask conjuction is supported. */ + optab tab; + tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default); + if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == CODE_FOR_nothing) +
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 18 November 2016 at 16:54, Christophe Lyonwrote: > On 18 November 2016 at 16:46, Yuri Rumyantsev wrote: >> It is very strange that this test failed on arm, since it requires >> target avx2 to check vectorizer dumps: >> >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { >> target avx2_runtime } } } */ >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ >> >> Could you please clarify what is the reason of the failure? > > It's not the scan-dumps that fail, but the execution. > The test calls abort() for some reason. > > It will take me a while to rebuild the test manually in the right > debug environment to provide you with more traces. > > Sorry for the delay... This problem is not directly related to your patch. The tests in gcc.dg/vect are compiled with -mfpu=neon -mfloat-abi=softfp -march=armv7-a and thus cannot be executed on older versions of the architecture. This is another instance of what I discussed with Jakub several months ago: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00666.html but the thread died. Basically, check_vect_support_and_set_flags sets set dg-do-what-default compile, but some tests in gcc.dg/vect have dg-do run hardcoded. Jakub was not happy with my patch that was removing all these dg-do run directives :-) Christophe > >> >> Thanks. >> >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon : >>> On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: Hi All, Here is patch for non-masked epilogue vectoriziation. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? Thanks. Changelog: 2016-11-15 Yuri Rumyantsev * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. * tree-if-conv.c (tree_if_conversion): Make public. * * tree-if-conv.h: New file. * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid dynamic alias checks for epilogues. * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. * tree-vect-loop.c: include tree-if-conv.h. (new_loop_vec_info): Add zeroing orig_loop_info field. (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL if epilogue is vectorized, set up orig_loop_info field of loop_vinfo using passed argument. (vect_transform_loop): Check if created epilogue should be returned for further vectorization with less vf. If-convert epilogue if required. Print vectorization success for epilogue. * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization if it is required, pass loop_vinfo produced during vectorization of loop body to vect_analyze_loop. * tree-vectorizer.h (struct _loop_vec_info): Add new field orig_loop_info. (LOOP_VINFO_ORIG_LOOP_INFO): New. (LOOP_VINFO_EPILOGUE_P): New. (LOOP_VINFO_ORIG_VECT_FACTOR): New. (vect_do_peeling): Change prototype to return epilogue. (vect_analyze_loop): Add argument of loop_vec_info type. (vect_transform_loop): Return created loop. gcc/testsuite/ * lib/target-supports.exp (check_avx2_hw_available): New. (check_effective_target_avx2_runtime): New. * gcc.dg/vect/vect-tail-nomask-1.c: New test. >>> >>> Hi, >>> >>> This new test fails on arm-none-eabi (using default cpu/fpu/mode): >>> gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test >>> gcc.dg/vect/vect-tail-nomask-1.c execution test >>> >>> It does pass on the same target if configured --with-cpu=cortex-a9. >>> >>> Christophe >>> >>> >>> 2016-11-14 20:04 GMT+03:00 Richard Biener : > On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev > wrote: >>Richard, >> >>I checked one of the tests designed for epilogue vectorization using >>patches 1 - 3 and found out that build compiler performs vectorization >>of epilogues with --param vect-epilogues-nomask=1 passed: >> >>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>t1.new-nomask.s -fdump-tree-vect-details >>$ grep VECTORIZED -c t1.c.156t.vect >>4 >> Without param only 2 loops are vectorized. >> >>Should I simply add a part of tests related to this feature or I must >>delete all not necessary changes also? > > Please remove all not necessary changes. > > Richard. > >>Thanks. >>Yuri. >> >>2016-11-14 16:40 GMT+03:00 Richard Biener : >>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >>> Richard, In my previous patch I forgot to remove couple lines related to aux >>field. Here is the correct updated patch.
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Richard! I attached vect dump for hte part of attached test-case which illustrated how vectorization of epilogues works through masking: #define SIZE 1023 #define ALIGN 64 extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment, __SIZE_TYPE__ size) __attribute__((weak)); extern void free (void *); void __attribute__((noinline)) test_citer (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c) { int i; a = (int *)__builtin_assume_aligned (a, ALIGN); b = (int *)__builtin_assume_aligned (b, ALIGN); c = (int *)__builtin_assume_aligned (c, ALIGN); for (i = 0; i < SIZE; i++) c[i] = a[i] + b[i]; } It was compiled with -mavx2 --param vect-epilogues-mask=1 options. I did not include in this patch vectorization of low trip-count loops since in the original patch additional parameter was introduced: +DEFPARAM (PARAM_VECT_SHORT_LOOPS, + "vect-short-loops", + "Enable vectorization of low trip count loops using masking.", + 0, 0, 1) I assume that this ability can be included very quickly but it requires cost model enhancements also. Best regards. Yuri. 2016-11-28 17:39 GMT+03:00 Richard Biener: > On Thu, 24 Nov 2016, Yuri Rumyantsev wrote: > >> Hi All, >> >> Here is the second patch which supports epilogue vectorization using >> masking without cost model. Currently it is possible >> only with passing parameter "--param vect-epilogues-mask=1". >> >> Bootstrapping and regression testing did not show any new regression. >> >> Any comments will be appreciated. > > Going over the patch the main question is one how it works -- it looks > like the decision whether to vectorize & mask the epilogue is made > when vectorizing the loop that generates the epilogue rather than > in the epilogue vectorization path? > > That is, I'd have expected to see this handling low-trip count loops > by masking? And thus masking the epilogue simply by it being > low-trip count? > > Richard. > >> ChangeLog: >> 2016-11-24 Yuri Rumyantsev >> >> * params.def (PARAM_VECT_EPILOGUES_MASK): New. >> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var. >> * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h. >> (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and >> required_mask fields. >> (vect_check_required_masks_widening): New. >> (vect_check_required_masks_narrowing): New. >> (vect_get_masking_iv_elems): New. >> (vect_get_masking_iv_type): New. >> (vect_get_extreme_masks): New. >> (vect_check_required_masks): New. >> (vect_analyze_loop_operations): Call vect_check_required_masks if all >> statements can be masked. >> (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound. >> Add check that epilogue can be masked with the same vf with issue >> fail notes. Allow epilogue vectorization through masking of low trip >> loops. Set to true can_be_masked field before loop operation analysis. >> Do not set-up min_scalar_loop_bound for epilogue vectorization through >> masking. Do not peeling for epilogue masking. Reset can_be_masked >> field before repeat analysis. >> (vect_estimate_min_profitable_iters): Do not compute profitability >> for epilogue masking. Set up mask_loop filed to true if parameter >> PARAM_VECT_EPILOGUES_MASK is non-zero. >> (vectorizable_reduction): Add check that statement can be masked. >> (vectorizable_induction): Do not support masking for induction. >> (vect_gen_ivs_for_masking): New. >> (vect_get_mask_index_for_elems): New. >> (vect_get_mask_index_for_type): New. >> (vect_create_narrowed_masks): New. >> (vect_create_widened_masks): New. >> (vect_gen_loop_masks): New. >> (vect_mask_reduction_stmt): New. >> (vect_mask_mask_load_store_stmt): New. >> (vect_mask_load_store_stmt): New. >> (vect_mask_loop): New. >> (vect_transform_loop): Invoke vect_mask_loop if required. >> Use div_ceil to recompute upper bounds for masked loops. Issue >> statistics for epilogue vectorization through masking. Do not reduce >> vf for masking epilogue. >> * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h. >> (can_mask_load_store): New. >> (vectorizable_mask_load_store): Check that mask conjuction is >> supported. Set-up first_copy_p field of stmt_vinfo. >> (vectorizable_simd_clone_call): Check that simd clone can not be >> masked. >> (vectorizable_store): Check that store can be masked. Mark the first >> copy of generated vector stores and provide it with vectype and the >> original data reference. >> (vectorizable_load): Check that load can be masked. >> (vect_stmt_should_be_masked_for_epilogue): New. >> (vect_add_required_mask_for_stmt): New. >> (vect_analyze_stmt): Add check on unsupported statements for masking >> with printing message. >> * tree-vectorizer.h (struct _loop_vec_info): Add new fields >> can_be_maske, required_masks, masl_loop. >> (LOOP_VINFO_CAN_BE_MASKED): New. >> (LOOP_VINFO_REQUIRED_MASKS): New. >> (LOOP_VINFO_MASK_LOOP): New. >> (struct _stmt_vec_info): Add first_copy_p field. >>
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Thu, 24 Nov 2016, Yuri Rumyantsev wrote: > Hi All, > > Here is the second patch which supports epilogue vectorization using > masking without cost model. Currently it is possible > only with passing parameter "--param vect-epilogues-mask=1". > > Bootstrapping and regression testing did not show any new regression. > > Any comments will be appreciated. Going over the patch the main question is one how it works -- it looks like the decision whether to vectorize & mask the epilogue is made when vectorizing the loop that generates the epilogue rather than in the epilogue vectorization path? That is, I'd have expected to see this handling low-trip count loops by masking? And thus masking the epilogue simply by it being low-trip count? Richard. > ChangeLog: > 2016-11-24 Yuri Rumyantsev> > * params.def (PARAM_VECT_EPILOGUES_MASK): New. > * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var. > * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h. > (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and > required_mask fields. > (vect_check_required_masks_widening): New. > (vect_check_required_masks_narrowing): New. > (vect_get_masking_iv_elems): New. > (vect_get_masking_iv_type): New. > (vect_get_extreme_masks): New. > (vect_check_required_masks): New. > (vect_analyze_loop_operations): Call vect_check_required_masks if all > statements can be masked. > (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound. > Add check that epilogue can be masked with the same vf with issue > fail notes. Allow epilogue vectorization through masking of low trip > loops. Set to true can_be_masked field before loop operation analysis. > Do not set-up min_scalar_loop_bound for epilogue vectorization through > masking. Do not peeling for epilogue masking. Reset can_be_masked > field before repeat analysis. > (vect_estimate_min_profitable_iters): Do not compute profitability > for epilogue masking. Set up mask_loop filed to true if parameter > PARAM_VECT_EPILOGUES_MASK is non-zero. > (vectorizable_reduction): Add check that statement can be masked. > (vectorizable_induction): Do not support masking for induction. > (vect_gen_ivs_for_masking): New. > (vect_get_mask_index_for_elems): New. > (vect_get_mask_index_for_type): New. > (vect_create_narrowed_masks): New. > (vect_create_widened_masks): New. > (vect_gen_loop_masks): New. > (vect_mask_reduction_stmt): New. > (vect_mask_mask_load_store_stmt): New. > (vect_mask_load_store_stmt): New. > (vect_mask_loop): New. > (vect_transform_loop): Invoke vect_mask_loop if required. > Use div_ceil to recompute upper bounds for masked loops. Issue > statistics for epilogue vectorization through masking. Do not reduce > vf for masking epilogue. > * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h. > (can_mask_load_store): New. > (vectorizable_mask_load_store): Check that mask conjuction is > supported. Set-up first_copy_p field of stmt_vinfo. > (vectorizable_simd_clone_call): Check that simd clone can not be > masked. > (vectorizable_store): Check that store can be masked. Mark the first > copy of generated vector stores and provide it with vectype and the > original data reference. > (vectorizable_load): Check that load can be masked. > (vect_stmt_should_be_masked_for_epilogue): New. > (vect_add_required_mask_for_stmt): New. > (vect_analyze_stmt): Add check on unsupported statements for masking > with printing message. > * tree-vectorizer.h (struct _loop_vec_info): Add new fields > can_be_maske, required_masks, masl_loop. > (LOOP_VINFO_CAN_BE_MASKED): New. > (LOOP_VINFO_REQUIRED_MASKS): New. > (LOOP_VINFO_MASK_LOOP): New. > (struct _stmt_vec_info): Add first_copy_p field. > (STMT_VINFO_FIRST_COPY_P): New. > > gcc/testsuite/ > > * gcc.dg/vect/vect-tail-mask-1.c: New test. > > 2016-11-18 18:54 GMT+03:00 Christophe Lyon : > > On 18 November 2016 at 16:46, Yuri Rumyantsev wrote: > >> It is very strange that this test failed on arm, since it requires > >> target avx2 to check vectorizer dumps: > >> > >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { > >> target avx2_runtime } } } */ > >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED > >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ > >> > >> Could you please clarify what is the reason of the failure? > > > > It's not the scan-dumps that fail, but the execution. > > The test calls abort() for some reason. > > > > It will take me a while to rebuild the test manually in the right > > debug environment to provide you with more traces. > > > > > > > >> > >> Thanks. > >> > >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon : > >>> On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: > Hi All, > > Here is patch for non-masked epilogue vectoriziation. > > Bootstrap and regression testing did not show any new
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Hi All, Here is the second patch which supports epilogue vectorization using masking without cost model. Currently it is possible only with passing parameter "--param vect-epilogues-mask=1". Bootstrapping and regression testing did not show any new regression. Any comments will be appreciated. ChangeLog: 2016-11-24 Yuri Rumyantsev* params.def (PARAM_VECT_EPILOGUES_MASK): New. * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var. * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h. (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and required_mask fields. (vect_check_required_masks_widening): New. (vect_check_required_masks_narrowing): New. (vect_get_masking_iv_elems): New. (vect_get_masking_iv_type): New. (vect_get_extreme_masks): New. (vect_check_required_masks): New. (vect_analyze_loop_operations): Call vect_check_required_masks if all statements can be masked. (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound. Add check that epilogue can be masked with the same vf with issue fail notes. Allow epilogue vectorization through masking of low trip loops. Set to true can_be_masked field before loop operation analysis. Do not set-up min_scalar_loop_bound for epilogue vectorization through masking. Do not peeling for epilogue masking. Reset can_be_masked field before repeat analysis. (vect_estimate_min_profitable_iters): Do not compute profitability for epilogue masking. Set up mask_loop filed to true if parameter PARAM_VECT_EPILOGUES_MASK is non-zero. (vectorizable_reduction): Add check that statement can be masked. (vectorizable_induction): Do not support masking for induction. (vect_gen_ivs_for_masking): New. (vect_get_mask_index_for_elems): New. (vect_get_mask_index_for_type): New. (vect_create_narrowed_masks): New. (vect_create_widened_masks): New. (vect_gen_loop_masks): New. (vect_mask_reduction_stmt): New. (vect_mask_mask_load_store_stmt): New. (vect_mask_load_store_stmt): New. (vect_mask_loop): New. (vect_transform_loop): Invoke vect_mask_loop if required. Use div_ceil to recompute upper bounds for masked loops. Issue statistics for epilogue vectorization through masking. Do not reduce vf for masking epilogue. * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h. (can_mask_load_store): New. (vectorizable_mask_load_store): Check that mask conjuction is supported. Set-up first_copy_p field of stmt_vinfo. (vectorizable_simd_clone_call): Check that simd clone can not be masked. (vectorizable_store): Check that store can be masked. Mark the first copy of generated vector stores and provide it with vectype and the original data reference. (vectorizable_load): Check that load can be masked. (vect_stmt_should_be_masked_for_epilogue): New. (vect_add_required_mask_for_stmt): New. (vect_analyze_stmt): Add check on unsupported statements for masking with printing message. * tree-vectorizer.h (struct _loop_vec_info): Add new fields can_be_maske, required_masks, masl_loop. (LOOP_VINFO_CAN_BE_MASKED): New. (LOOP_VINFO_REQUIRED_MASKS): New. (LOOP_VINFO_MASK_LOOP): New. (struct _stmt_vec_info): Add first_copy_p field. (STMT_VINFO_FIRST_COPY_P): New. gcc/testsuite/ * gcc.dg/vect/vect-tail-mask-1.c: New test. 2016-11-18 18:54 GMT+03:00 Christophe Lyon : > On 18 November 2016 at 16:46, Yuri Rumyantsev wrote: >> It is very strange that this test failed on arm, since it requires >> target avx2 to check vectorizer dumps: >> >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { >> target avx2_runtime } } } */ >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ >> >> Could you please clarify what is the reason of the failure? > > It's not the scan-dumps that fail, but the execution. > The test calls abort() for some reason. > > It will take me a while to rebuild the test manually in the right > debug environment to provide you with more traces. > > > >> >> Thanks. >> >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon : >>> On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: Hi All, Here is patch for non-masked epilogue vectoriziation. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? Thanks. Changelog: 2016-11-15 Yuri Rumyantsev * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. * tree-if-conv.c (tree_if_conversion): Make public. * * tree-if-conv.h: New file. * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid dynamic alias checks for epilogues. * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. * tree-vect-loop.c: include tree-if-conv.h. (new_loop_vec_info): Add zeroing orig_loop_info field. (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 18 November 2016 at 16:46, Yuri Rumyantsevwrote: > It is very strange that this test failed on arm, since it requires > target avx2 to check vectorizer dumps: > > /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { > target avx2_runtime } } } */ > /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED > \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ > > Could you please clarify what is the reason of the failure? It's not the scan-dumps that fail, but the execution. The test calls abort() for some reason. It will take me a while to rebuild the test manually in the right debug environment to provide you with more traces. > > Thanks. > > 2016-11-18 16:20 GMT+03:00 Christophe Lyon : >> On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: >>> Hi All, >>> >>> Here is patch for non-masked epilogue vectoriziation. >>> >>> Bootstrap and regression testing did not show any new failures. >>> >>> Is it OK for trunk? >>> >>> Thanks. >>> Changelog: >>> >>> 2016-11-15 Yuri Rumyantsev >>> >>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >>> * tree-if-conv.c (tree_if_conversion): Make public. >>> * * tree-if-conv.h: New file. >>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >>> dynamic alias checks for epilogues. >>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >>> * tree-vect-loop.c: include tree-if-conv.h. >>> (new_loop_vec_info): Add zeroing orig_loop_info field. >>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >>> using passed argument. >>> (vect_transform_loop): Check if created epilogue should be returned >>> for further vectorization with less vf. If-convert epilogue if >>> required. Print vectorization success for epilogue. >>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >>> if it is required, pass loop_vinfo produced during vectorization of >>> loop body to vect_analyze_loop. >>> * tree-vectorizer.h (struct _loop_vec_info): Add new field >>> orig_loop_info. >>> (LOOP_VINFO_ORIG_LOOP_INFO): New. >>> (LOOP_VINFO_EPILOGUE_P): New. >>> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >>> (vect_do_peeling): Change prototype to return epilogue. >>> (vect_analyze_loop): Add argument of loop_vec_info type. >>> (vect_transform_loop): Return created loop. >>> >>> gcc/testsuite/ >>> >>> * lib/target-supports.exp (check_avx2_hw_available): New. >>> (check_effective_target_avx2_runtime): New. >>> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >>> >> >> Hi, >> >> This new test fails on arm-none-eabi (using default cpu/fpu/mode): >> gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/vect-tail-nomask-1.c execution test >> >> It does pass on the same target if configured --with-cpu=cortex-a9. >> >> Christophe >> >> >> >>> >>> 2016-11-14 20:04 GMT+03:00 Richard Biener : On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev wrote: >Richard, > >I checked one of the tests designed for epilogue vectorization using >patches 1 - 3 and found out that build compiler performs vectorization >of epilogues with --param vect-epilogues-nomask=1 passed: > >$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >t1.new-nomask.s -fdump-tree-vect-details >$ grep VECTORIZED -c t1.c.156t.vect >4 > Without param only 2 loops are vectorized. > >Should I simply add a part of tests related to this feature or I must >delete all not necessary changes also? Please remove all not necessary changes. Richard. >Thanks. >Yuri. > >2016-11-14 16:40 GMT+03:00 Richard Biener : >> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >> >>> Richard, >>> >>> In my previous patch I forgot to remove couple lines related to aux >field. >>> Here is the correct updated patch. >> >> Yeah, I noticed. This patch would be ok for trunk (together with >> necessary parts from 1 and 2) if all not required parts are removed >> (and you'd add the testcases covering non-masked tail vect). >> >> Thus, can you please produce a single complete patch containing only >> non-masked epilogue vectoriziation? >> >> Thanks, >> Richard. >> >>> Thanks. >>> Yuri. >>> >>> 2016-11-14 15:51 GMT+03:00 Richard Biener : >>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>> > >>> >> Richard, >>> >> >>> >> I prepare updated 3 patch with passing additional argument to >>> >> vect_analyze_loop as you proposed (untested). >>> >> >>> >> You wrote: >>> >> tw, I wonder if you can produce a single patch containing
Re: [PATCH, vec-tails] Support loop epilogue vectorization
It is very strange that this test failed on arm, since it requires target avx2 to check vectorizer dumps: /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target avx2_runtime } } } */ /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ Could you please clarify what is the reason of the failure? Thanks. 2016-11-18 16:20 GMT+03:00 Christophe Lyon: > On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: >> Hi All, >> >> Here is patch for non-masked epilogue vectoriziation. >> >> Bootstrap and regression testing did not show any new failures. >> >> Is it OK for trunk? >> >> Thanks. >> Changelog: >> >> 2016-11-15 Yuri Rumyantsev >> >> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >> * tree-if-conv.c (tree_if_conversion): Make public. >> * * tree-if-conv.h: New file. >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >> dynamic alias checks for epilogues. >> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >> * tree-vect-loop.c: include tree-if-conv.h. >> (new_loop_vec_info): Add zeroing orig_loop_info field. >> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >> using passed argument. >> (vect_transform_loop): Check if created epilogue should be returned >> for further vectorization with less vf. If-convert epilogue if >> required. Print vectorization success for epilogue. >> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >> if it is required, pass loop_vinfo produced during vectorization of >> loop body to vect_analyze_loop. >> * tree-vectorizer.h (struct _loop_vec_info): Add new field >> orig_loop_info. >> (LOOP_VINFO_ORIG_LOOP_INFO): New. >> (LOOP_VINFO_EPILOGUE_P): New. >> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >> (vect_do_peeling): Change prototype to return epilogue. >> (vect_analyze_loop): Add argument of loop_vec_info type. >> (vect_transform_loop): Return created loop. >> >> gcc/testsuite/ >> >> * lib/target-supports.exp (check_avx2_hw_available): New. >> (check_effective_target_avx2_runtime): New. >> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >> > > Hi, > > This new test fails on arm-none-eabi (using default cpu/fpu/mode): > gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-tail-nomask-1.c execution test > > It does pass on the same target if configured --with-cpu=cortex-a9. > > Christophe > > > >> >> 2016-11-14 20:04 GMT+03:00 Richard Biener : >>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >>> wrote: Richard, I checked one of the tests designed for epilogue vectorization using patches 1 - 3 and found out that build compiler performs vectorization of epilogues with --param vect-epilogues-nomask=1 passed: $ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o t1.new-nomask.s -fdump-tree-vect-details $ grep VECTORIZED -c t1.c.156t.vect 4 Without param only 2 loops are vectorized. Should I simply add a part of tests related to this feature or I must delete all not necessary changes also? >>> >>> Please remove all not necessary changes. >>> >>> Richard. >>> Thanks. Yuri. 2016-11-14 16:40 GMT+03:00 Richard Biener : > On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > >> Richard, >> >> In my previous patch I forgot to remove couple lines related to aux field. >> Here is the correct updated patch. > > Yeah, I noticed. This patch would be ok for trunk (together with > necessary parts from 1 and 2) if all not required parts are removed > (and you'd add the testcases covering non-masked tail vect). > > Thus, can you please produce a single complete patch containing only > non-masked epilogue vectoriziation? > > Thanks, > Richard. > >> Thanks. >> Yuri. >> >> 2016-11-14 15:51 GMT+03:00 Richard Biener : >> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >> > >> >> Richard, >> >> >> >> I prepare updated 3 patch with passing additional argument to >> >> vect_analyze_loop as you proposed (untested). >> >> >> >> You wrote: >> >> tw, I wonder if you can produce a single patch containing just >> >> epilogue vectorization, that is combine patches 1-3 but rip out >> >> changes only needed by later patches? >> >> >> >> Did you mean that I exclude all support for vectorization epilogues, >> >> i.e. exclude from 2-nd patch all non-related changes >> >> like >> >> >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> >> index 11863af..32011c1 100644 >> >> ---
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 15 November 2016 at 15:41, Yuri Rumyantsevwrote: > Hi All, > > Here is patch for non-masked epilogue vectoriziation. > > Bootstrap and regression testing did not show any new failures. > > Is it OK for trunk? > > Thanks. > Changelog: > > 2016-11-15 Yuri Rumyantsev > > * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. > * tree-if-conv.c (tree_if_conversion): Make public. > * * tree-if-conv.h: New file. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid > dynamic alias checks for epilogues. > * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. > * tree-vect-loop.c: include tree-if-conv.h. > (new_loop_vec_info): Add zeroing orig_loop_info field. > (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. > (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL > if epilogue is vectorized, set up orig_loop_info field of loop_vinfo > using passed argument. > (vect_transform_loop): Check if created epilogue should be returned > for further vectorization with less vf. If-convert epilogue if > required. Print vectorization success for epilogue. > * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization > if it is required, pass loop_vinfo produced during vectorization of > loop body to vect_analyze_loop. > * tree-vectorizer.h (struct _loop_vec_info): Add new field > orig_loop_info. > (LOOP_VINFO_ORIG_LOOP_INFO): New. > (LOOP_VINFO_EPILOGUE_P): New. > (LOOP_VINFO_ORIG_VECT_FACTOR): New. > (vect_do_peeling): Change prototype to return epilogue. > (vect_analyze_loop): Add argument of loop_vec_info type. > (vect_transform_loop): Return created loop. > > gcc/testsuite/ > > * lib/target-supports.exp (check_avx2_hw_available): New. > (check_effective_target_avx2_runtime): New. > * gcc.dg/vect/vect-tail-nomask-1.c: New test. > Hi, This new test fails on arm-none-eabi (using default cpu/fpu/mode): gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test gcc.dg/vect/vect-tail-nomask-1.c execution test It does pass on the same target if configured --with-cpu=cortex-a9. Christophe > > 2016-11-14 20:04 GMT+03:00 Richard Biener : >> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >> wrote: >>>Richard, >>> >>>I checked one of the tests designed for epilogue vectorization using >>>patches 1 - 3 and found out that build compiler performs vectorization >>>of epilogues with --param vect-epilogues-nomask=1 passed: >>> >>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>>t1.new-nomask.s -fdump-tree-vect-details >>>$ grep VECTORIZED -c t1.c.156t.vect >>>4 >>> Without param only 2 loops are vectorized. >>> >>>Should I simply add a part of tests related to this feature or I must >>>delete all not necessary changes also? >> >> Please remove all not necessary changes. >> >> Richard. >> >>>Thanks. >>>Yuri. >>> >>>2016-11-14 16:40 GMT+03:00 Richard Biener : On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > In my previous patch I forgot to remove couple lines related to aux >>>field. > Here is the correct updated patch. Yeah, I noticed. This patch would be ok for trunk (together with necessary parts from 1 and 2) if all not required parts are removed (and you'd add the testcases covering non-masked tail vect). Thus, can you please produce a single complete patch containing only non-masked epilogue vectoriziation? Thanks, Richard. > Thanks. > Yuri. > > 2016-11-14 15:51 GMT+03:00 Richard Biener : > > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > > > >> Richard, > >> > >> I prepare updated 3 patch with passing additional argument to > >> vect_analyze_loop as you proposed (untested). > >> > >> You wrote: > >> tw, I wonder if you can produce a single patch containing just > >> epilogue vectorization, that is combine patches 1-3 but rip out > >> changes only needed by later patches? > >> > >> Did you mean that I exclude all support for vectorization >>>epilogues, > >> i.e. exclude from 2-nd patch all non-related changes > >> like > >> > >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > >> index 11863af..32011c1 100644 > >> --- a/gcc/tree-vect-loop.c > >> +++ b/gcc/tree-vect-loop.c > >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) > >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; > >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; > >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; > >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; > >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; > >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; > >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; > >> + LOOP_VINFO_NEED_MASKING (res) = false; > >> + LOOP_VINFO_ORIG_LOOP_INFO (res) =
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Tue, 15 Nov 2016, Yuri Rumyantsev wrote: > Hi All, > > Here is patch for non-masked epilogue vectoriziation. > > Bootstrap and regression testing did not show any new failures. > > Is it OK for trunk? Ok for trunk. I believe we ultimatively want to remove the new --param and enable this by default with a better cost model. What immediately comes to my mind when seeing + if (epilogue) +{ + unsigned int vector_sizes + = targetm.vectorize.autovectorize_vector_sizes (); + vector_sizes &= current_vector_size - 1; + + if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)) + epilogue = NULL; + else if (!vector_sizes) + epilogue = NULL; + else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) +&& LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0) + { + int smallest_vec_size = 1 << ctz_hwi (vector_sizes); + int ratio = current_vector_size / smallest_vec_size; + int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo) + - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo); + eiters = eiters % vf; + + epilogue->nb_iterations_upper_bound = eiters - 1; + + if (eiters < vf / ratio) + epilogue = NULL; + } is that we have targetm.vectorize.preferred_simd_mode which for example with -mprefer-avx128 will first try with 128bit vectorization. So if a ! prefered vector size ends up creating the epilogue we know vectorizing with the prefered size will fail. The above also does not take into account peeling for gaps in which case we know the epilogue runs at least VF times (but the vectorized epilogue might also need an epilogue itself if not masked). The natural next step is the masked epilogue support, after that the merged masked epilogue one. Thanks, Richard. > Thanks. > Changelog: > > 2016-11-15 Yuri Rumyantsev> > * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. > * tree-if-conv.c (tree_if_conversion): Make public. > * * tree-if-conv.h: New file. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid > dynamic alias checks for epilogues. > * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. > * tree-vect-loop.c: include tree-if-conv.h. > (new_loop_vec_info): Add zeroing orig_loop_info field. > (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. > (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL > if epilogue is vectorized, set up orig_loop_info field of loop_vinfo > using passed argument. > (vect_transform_loop): Check if created epilogue should be returned > for further vectorization with less vf. If-convert epilogue if > required. Print vectorization success for epilogue. > * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization > if it is required, pass loop_vinfo produced during vectorization of > loop body to vect_analyze_loop. > * tree-vectorizer.h (struct _loop_vec_info): Add new field > orig_loop_info. > (LOOP_VINFO_ORIG_LOOP_INFO): New. > (LOOP_VINFO_EPILOGUE_P): New. > (LOOP_VINFO_ORIG_VECT_FACTOR): New. > (vect_do_peeling): Change prototype to return epilogue. > (vect_analyze_loop): Add argument of loop_vec_info type. > (vect_transform_loop): Return created loop. > > gcc/testsuite/ > > * lib/target-supports.exp (check_avx2_hw_available): New. > (check_effective_target_avx2_runtime): New. > * gcc.dg/vect/vect-tail-nomask-1.c: New test. > > > 2016-11-14 20:04 GMT+03:00 Richard Biener : > > On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev > > wrote: > >>Richard, > >> > >>I checked one of the tests designed for epilogue vectorization using > >>patches 1 - 3 and found out that build compiler performs vectorization > >>of epilogues with --param vect-epilogues-nomask=1 passed: > >> > >>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o > >>t1.new-nomask.s -fdump-tree-vect-details > >>$ grep VECTORIZED -c t1.c.156t.vect > >>4 > >> Without param only 2 loops are vectorized. > >> > >>Should I simply add a part of tests related to this feature or I must > >>delete all not necessary changes also? > > > > Please remove all not necessary changes. > > > > Richard. > > > >>Thanks. > >>Yuri. > >> > >>2016-11-14 16:40 GMT+03:00 Richard Biener : > >>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > >>> > Richard, > > In my previous patch I forgot to remove couple lines related to aux > >>field. > Here is the correct updated patch. > >>> > >>> Yeah, I noticed. This patch would be ok for trunk (together with > >>> necessary parts from 1 and 2) if all not required parts are removed > >>> (and you'd add the testcases covering non-masked tail vect). > >>> > >>> Thus, can you please produce a single complete patch containing only > >>> non-masked epilogue vectoriziation? > >>> > >>> Thanks, > >>> Richard. > >>> > Thanks. > Yuri. > > 2016-11-14 15:51
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Hi All, Here is patch for non-masked epilogue vectoriziation. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? Thanks. Changelog: 2016-11-15 Yuri Rumyantsev* params.def (PARAM_VECT_EPILOGUES_NOMASK): New. * tree-if-conv.c (tree_if_conversion): Make public. * * tree-if-conv.h: New file. * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid dynamic alias checks for epilogues. * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. * tree-vect-loop.c: include tree-if-conv.h. (new_loop_vec_info): Add zeroing orig_loop_info field. (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL if epilogue is vectorized, set up orig_loop_info field of loop_vinfo using passed argument. (vect_transform_loop): Check if created epilogue should be returned for further vectorization with less vf. If-convert epilogue if required. Print vectorization success for epilogue. * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization if it is required, pass loop_vinfo produced during vectorization of loop body to vect_analyze_loop. * tree-vectorizer.h (struct _loop_vec_info): Add new field orig_loop_info. (LOOP_VINFO_ORIG_LOOP_INFO): New. (LOOP_VINFO_EPILOGUE_P): New. (LOOP_VINFO_ORIG_VECT_FACTOR): New. (vect_do_peeling): Change prototype to return epilogue. (vect_analyze_loop): Add argument of loop_vec_info type. (vect_transform_loop): Return created loop. gcc/testsuite/ * lib/target-supports.exp (check_avx2_hw_available): New. (check_effective_target_avx2_runtime): New. * gcc.dg/vect/vect-tail-nomask-1.c: New test. 2016-11-14 20:04 GMT+03:00 Richard Biener : > On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev > wrote: >>Richard, >> >>I checked one of the tests designed for epilogue vectorization using >>patches 1 - 3 and found out that build compiler performs vectorization >>of epilogues with --param vect-epilogues-nomask=1 passed: >> >>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>t1.new-nomask.s -fdump-tree-vect-details >>$ grep VECTORIZED -c t1.c.156t.vect >>4 >> Without param only 2 loops are vectorized. >> >>Should I simply add a part of tests related to this feature or I must >>delete all not necessary changes also? > > Please remove all not necessary changes. > > Richard. > >>Thanks. >>Yuri. >> >>2016-11-14 16:40 GMT+03:00 Richard Biener : >>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >>> Richard, In my previous patch I forgot to remove couple lines related to aux >>field. Here is the correct updated patch. >>> >>> Yeah, I noticed. This patch would be ok for trunk (together with >>> necessary parts from 1 and 2) if all not required parts are removed >>> (and you'd add the testcases covering non-masked tail vect). >>> >>> Thus, can you please produce a single complete patch containing only >>> non-masked epilogue vectoriziation? >>> >>> Thanks, >>> Richard. >>> Thanks. Yuri. 2016-11-14 15:51 GMT+03:00 Richard Biener : > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > >> Richard, >> >> I prepare updated 3 patch with passing additional argument to >> vect_analyze_loop as you proposed (untested). >> >> You wrote: >> tw, I wonder if you can produce a single patch containing just >> epilogue vectorization, that is combine patches 1-3 but rip out >> changes only needed by later patches? >> >> Did you mean that I exclude all support for vectorization >>epilogues, >> i.e. exclude from 2-nd patch all non-related changes >> like >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index 11863af..32011c1 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >> + LOOP_VINFO_NEED_MASKING (res) = false; >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; > > Yes. > >> Did you mean also that new combined patch must be working patch, >>i.e. >> can be integrated without other patches? > > Yes. > >> Could you please look at updated patch? > > Will do. > > Thanks, > Richard. > >> Thanks. >> Yuri. >> >> 2016-11-10 15:36 GMT+03:00 Richard Biener : >> > On Thu, 10 Nov 2016, Richard Biener wrote: >> > >> >> On Tue, 8 Nov 2016, Yuri
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsevwrote: >Richard, > >I checked one of the tests designed for epilogue vectorization using >patches 1 - 3 and found out that build compiler performs vectorization >of epilogues with --param vect-epilogues-nomask=1 passed: > >$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >t1.new-nomask.s -fdump-tree-vect-details >$ grep VECTORIZED -c t1.c.156t.vect >4 > Without param only 2 loops are vectorized. > >Should I simply add a part of tests related to this feature or I must >delete all not necessary changes also? Please remove all not necessary changes. Richard. >Thanks. >Yuri. > >2016-11-14 16:40 GMT+03:00 Richard Biener : >> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >> >>> Richard, >>> >>> In my previous patch I forgot to remove couple lines related to aux >field. >>> Here is the correct updated patch. >> >> Yeah, I noticed. This patch would be ok for trunk (together with >> necessary parts from 1 and 2) if all not required parts are removed >> (and you'd add the testcases covering non-masked tail vect). >> >> Thus, can you please produce a single complete patch containing only >> non-masked epilogue vectoriziation? >> >> Thanks, >> Richard. >> >>> Thanks. >>> Yuri. >>> >>> 2016-11-14 15:51 GMT+03:00 Richard Biener : >>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>> > >>> >> Richard, >>> >> >>> >> I prepare updated 3 patch with passing additional argument to >>> >> vect_analyze_loop as you proposed (untested). >>> >> >>> >> You wrote: >>> >> tw, I wonder if you can produce a single patch containing just >>> >> epilogue vectorization, that is combine patches 1-3 but rip out >>> >> changes only needed by later patches? >>> >> >>> >> Did you mean that I exclude all support for vectorization >epilogues, >>> >> i.e. exclude from 2-nd patch all non-related changes >>> >> like >>> >> >>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >>> >> index 11863af..32011c1 100644 >>> >> --- a/gcc/tree-vect-loop.c >>> >> +++ b/gcc/tree-vect-loop.c >>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >>> >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >>> >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >>> >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >>> >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >>> >> + LOOP_VINFO_NEED_MASKING (res) = false; >>> >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >>> > >>> > Yes. >>> > >>> >> Did you mean also that new combined patch must be working patch, >i.e. >>> >> can be integrated without other patches? >>> > >>> > Yes. >>> > >>> >> Could you please look at updated patch? >>> > >>> > Will do. >>> > >>> > Thanks, >>> > Richard. >>> > >>> >> Thanks. >>> >> Yuri. >>> >> >>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener : >>> >> > On Thu, 10 Nov 2016, Richard Biener wrote: >>> >> > >>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >>> >> >> >>> >> >> > Richard, >>> >> >> > >>> >> >> > Here is updated 3 patch. >>> >> >> > >>> >> >> > I checked that all new tests related to epilogue >vectorization passed with it. >>> >> >> > >>> >> >> > Your comments will be appreciated. >>> >> >> >>> >> >> A lot better now. Instead of the ->aux dance I now prefer to >>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as >>> >> >> optional argument (if non-NULL we analyze the epilogue of that >>> >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >>> >> >> original vectorization factor? So we can pass down an >(optional) >>> >> >> forced vectorization factor as well? >>> >> > >>> >> > Btw, I wonder if you can produce a single patch containing just >>> >> > epilogue vectorization, that is combine patches 1-3 but rip out >>> >> > changes only needed by later patches? >>> >> > >>> >> > Thanks, >>> >> > Richard. >>> >> > >>> >> >> Richard. >>> >> >> >>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener > : >>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >>> >> >> > > >>> >> >> > >> Hi Richard, >>> >> >> > >> >>> >> >> > >> I did not understand your last remark: >>> >> >> > >> >>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>> >> >> > >> > >>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>> >> >> > >> > && dump_enabled_p ()) >>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >vect_location, >>> >> >> > >> >"loop vectorized\n"); >>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>> >> >> > >> > num_vectorized_loops++; >>> >> >> > >> >/* Now that the loop has been vectorized, allow >it to be unrolled >>> >> >> > >> > etc. */
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Richard, I checked one of the tests designed for epilogue vectorization using patches 1 - 3 and found out that build compiler performs vectorization of epilogues with --param vect-epilogues-nomask=1 passed: $ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o t1.new-nomask.s -fdump-tree-vect-details $ grep VECTORIZED -c t1.c.156t.vect 4 Without param only 2 loops are vectorized. Should I simply add a part of tests related to this feature or I must delete all not necessary changes also? Thanks. Yuri. 2016-11-14 16:40 GMT+03:00 Richard Biener: > On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > >> Richard, >> >> In my previous patch I forgot to remove couple lines related to aux field. >> Here is the correct updated patch. > > Yeah, I noticed. This patch would be ok for trunk (together with > necessary parts from 1 and 2) if all not required parts are removed > (and you'd add the testcases covering non-masked tail vect). > > Thus, can you please produce a single complete patch containing only > non-masked epilogue vectoriziation? > > Thanks, > Richard. > >> Thanks. >> Yuri. >> >> 2016-11-14 15:51 GMT+03:00 Richard Biener : >> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >> > >> >> Richard, >> >> >> >> I prepare updated 3 patch with passing additional argument to >> >> vect_analyze_loop as you proposed (untested). >> >> >> >> You wrote: >> >> tw, I wonder if you can produce a single patch containing just >> >> epilogue vectorization, that is combine patches 1-3 but rip out >> >> changes only needed by later patches? >> >> >> >> Did you mean that I exclude all support for vectorization epilogues, >> >> i.e. exclude from 2-nd patch all non-related changes >> >> like >> >> >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> >> index 11863af..32011c1 100644 >> >> --- a/gcc/tree-vect-loop.c >> >> +++ b/gcc/tree-vect-loop.c >> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; >> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >> >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >> >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >> >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >> >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >> >> + LOOP_VINFO_NEED_MASKING (res) = false; >> >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >> > >> > Yes. >> > >> >> Did you mean also that new combined patch must be working patch, i.e. >> >> can be integrated without other patches? >> > >> > Yes. >> > >> >> Could you please look at updated patch? >> > >> > Will do. >> > >> > Thanks, >> > Richard. >> > >> >> Thanks. >> >> Yuri. >> >> >> >> 2016-11-10 15:36 GMT+03:00 Richard Biener : >> >> > On Thu, 10 Nov 2016, Richard Biener wrote: >> >> > >> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >> >> >> >> >> >> > Richard, >> >> >> > >> >> >> > Here is updated 3 patch. >> >> >> > >> >> >> > I checked that all new tests related to epilogue vectorization >> >> >> > passed with it. >> >> >> > >> >> >> > Your comments will be appreciated. >> >> >> >> >> >> A lot better now. Instead of the ->aux dance I now prefer to >> >> >> pass the original loops loop_vinfo to vect_analyze_loop as >> >> >> optional argument (if non-NULL we analyze the epilogue of that >> >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >> >> >> original vectorization factor? So we can pass down an (optional) >> >> >> forced vectorization factor as well? >> >> > >> >> > Btw, I wonder if you can produce a single patch containing just >> >> > epilogue vectorization, that is combine patches 1-3 but rip out >> >> > changes only needed by later patches? >> >> > >> >> > Thanks, >> >> > Richard. >> >> > >> >> >> Richard. >> >> >> >> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener : >> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >> >> >> > > >> >> >> > >> Hi Richard, >> >> >> > >> >> >> >> > >> I did not understand your last remark: >> >> >> > >> >> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >> >> >> > >> > >> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >> >> >> > >> > && dump_enabled_p ()) >> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >> >> >> > >> > vect_location, >> >> >> > >> >"loop vectorized\n"); >> >> >> > >> > - vect_transform_loop (loop_vinfo); >> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >> >> >> > >> > num_vectorized_loops++; >> >> >> > >> >/* Now that the loop has been vectorized, allow it to be >> >> >> > >> > unrolled >> >> >> > >> > etc. */ >> >> >> > >> > loop->force_vectorize = false; >> >> >> > >> > >> >> >> > >> > + /* Add new loop to a processing queue. To make it >> >> >> > >> > easier >> >> >> > >> > + to match loop and its epilogue
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > In my previous patch I forgot to remove couple lines related to aux field. > Here is the correct updated patch. Yeah, I noticed. This patch would be ok for trunk (together with necessary parts from 1 and 2) if all not required parts are removed (and you'd add the testcases covering non-masked tail vect). Thus, can you please produce a single complete patch containing only non-masked epilogue vectoriziation? Thanks, Richard. > Thanks. > Yuri. > > 2016-11-14 15:51 GMT+03:00 Richard Biener: > > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > > > >> Richard, > >> > >> I prepare updated 3 patch with passing additional argument to > >> vect_analyze_loop as you proposed (untested). > >> > >> You wrote: > >> tw, I wonder if you can produce a single patch containing just > >> epilogue vectorization, that is combine patches 1-3 but rip out > >> changes only needed by later patches? > >> > >> Did you mean that I exclude all support for vectorization epilogues, > >> i.e. exclude from 2-nd patch all non-related changes > >> like > >> > >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > >> index 11863af..32011c1 100644 > >> --- a/gcc/tree-vect-loop.c > >> +++ b/gcc/tree-vect-loop.c > >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) > >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; > >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; > >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; > >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; > >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; > >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; > >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; > >> + LOOP_VINFO_NEED_MASKING (res) = false; > >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; > > > > Yes. > > > >> Did you mean also that new combined patch must be working patch, i.e. > >> can be integrated without other patches? > > > > Yes. > > > >> Could you please look at updated patch? > > > > Will do. > > > > Thanks, > > Richard. > > > >> Thanks. > >> Yuri. > >> > >> 2016-11-10 15:36 GMT+03:00 Richard Biener : > >> > On Thu, 10 Nov 2016, Richard Biener wrote: > >> > > >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > >> >> > >> >> > Richard, > >> >> > > >> >> > Here is updated 3 patch. > >> >> > > >> >> > I checked that all new tests related to epilogue vectorization passed > >> >> > with it. > >> >> > > >> >> > Your comments will be appreciated. > >> >> > >> >> A lot better now. Instead of the ->aux dance I now prefer to > >> >> pass the original loops loop_vinfo to vect_analyze_loop as > >> >> optional argument (if non-NULL we analyze the epilogue of that > >> >> loop_vinfo). OTOH I remember we mainly use it to get at the > >> >> original vectorization factor? So we can pass down an (optional) > >> >> forced vectorization factor as well? > >> > > >> > Btw, I wonder if you can produce a single patch containing just > >> > epilogue vectorization, that is combine patches 1-3 but rip out > >> > changes only needed by later patches? > >> > > >> > Thanks, > >> > Richard. > >> > > >> >> Richard. > >> >> > >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener : > >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > >> >> > > > >> >> > >> Hi Richard, > >> >> > >> > >> >> > >> I did not understand your last remark: > >> >> > >> > >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): > >> >> > >> > > >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > >> >> > >> > && dump_enabled_p ()) > >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, > >> >> > >> > vect_location, > >> >> > >> >"loop vectorized\n"); > >> >> > >> > - vect_transform_loop (loop_vinfo); > >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); > >> >> > >> > num_vectorized_loops++; > >> >> > >> >/* Now that the loop has been vectorized, allow it to be > >> >> > >> > unrolled > >> >> > >> > etc. */ > >> >> > >> > loop->force_vectorize = false; > >> >> > >> > > >> >> > >> > + /* Add new loop to a processing queue. To make it easier > >> >> > >> > + to match loop and its epilogue vectorization in dumps > >> >> > >> > + put new loop as the next loop to process. */ > >> >> > >> > + if (new_loop) > >> >> > >> > + { > >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); > >> >> > >> > + vect_loops_num = number_of_loops (cfun); > >> >> > >> > + } > >> >> > >> > > >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > >> >> > >> f> unction which will set up stuff properly (and also perform > >> >> > >> > the if-conversion of the epilogue there). > >> >> > >> > > >> >> > >> > That said, if we can get in non-masked epilogue vectorization > >> >> > >> > separately that would be great. > >> >> > >> > >> >> > >> Could you please clarify
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Richard, In my previous patch I forgot to remove couple lines related to aux field. Here is the correct updated patch. Thanks. Yuri. 2016-11-14 15:51 GMT+03:00 Richard Biener: > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > >> Richard, >> >> I prepare updated 3 patch with passing additional argument to >> vect_analyze_loop as you proposed (untested). >> >> You wrote: >> tw, I wonder if you can produce a single patch containing just >> epilogue vectorization, that is combine patches 1-3 but rip out >> changes only needed by later patches? >> >> Did you mean that I exclude all support for vectorization epilogues, >> i.e. exclude from 2-nd patch all non-related changes >> like >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index 11863af..32011c1 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >> + LOOP_VINFO_NEED_MASKING (res) = false; >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; > > Yes. > >> Did you mean also that new combined patch must be working patch, i.e. >> can be integrated without other patches? > > Yes. > >> Could you please look at updated patch? > > Will do. > > Thanks, > Richard. > >> Thanks. >> Yuri. >> >> 2016-11-10 15:36 GMT+03:00 Richard Biener : >> > On Thu, 10 Nov 2016, Richard Biener wrote: >> > >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >> >> >> >> > Richard, >> >> > >> >> > Here is updated 3 patch. >> >> > >> >> > I checked that all new tests related to epilogue vectorization passed >> >> > with it. >> >> > >> >> > Your comments will be appreciated. >> >> >> >> A lot better now. Instead of the ->aux dance I now prefer to >> >> pass the original loops loop_vinfo to vect_analyze_loop as >> >> optional argument (if non-NULL we analyze the epilogue of that >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >> >> original vectorization factor? So we can pass down an (optional) >> >> forced vectorization factor as well? >> > >> > Btw, I wonder if you can produce a single patch containing just >> > epilogue vectorization, that is combine patches 1-3 but rip out >> > changes only needed by later patches? >> > >> > Thanks, >> > Richard. >> > >> >> Richard. >> >> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener : >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >> >> > > >> >> > >> Hi Richard, >> >> > >> >> >> > >> I did not understand your last remark: >> >> > >> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >> >> > >> > >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >> >> > >> > && dump_enabled_p ()) >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, >> >> > >> >"loop vectorized\n"); >> >> > >> > - vect_transform_loop (loop_vinfo); >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >> >> > >> > num_vectorized_loops++; >> >> > >> >/* Now that the loop has been vectorized, allow it to be >> >> > >> > unrolled >> >> > >> > etc. */ >> >> > >> > loop->force_vectorize = false; >> >> > >> > >> >> > >> > + /* Add new loop to a processing queue. To make it easier >> >> > >> > + to match loop and its epilogue vectorization in dumps >> >> > >> > + put new loop as the next loop to process. */ >> >> > >> > + if (new_loop) >> >> > >> > + { >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >> >> > >> > + vect_loops_num = number_of_loops (cfun); >> >> > >> > + } >> >> > >> > >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) >> >> > >> f> unction which will set up stuff properly (and also perform >> >> > >> > the if-conversion of the epilogue there). >> >> > >> > >> >> > >> > That said, if we can get in non-masked epilogue vectorization >> >> > >> > separately that would be great. >> >> > >> >> >> > >> Could you please clarify your proposal. >> >> > > >> >> > > When a loop was vectorized set things up to immediately vectorize >> >> > > its epilogue, avoiding changing the loop iteration and avoiding >> >> > > the re-use of ->aux. >> >> > > >> >> > > Richard. >> >> > > >> >> > >> Thanks. >> >> > >> Yuri. >> >> > >> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >> >> > >> > >> >> > >> >> Hi All, >> >> > >> >> >> >> > >> >> I re-send all patches sent by Ilya earlier for review which >> >> > >> >> support >> >> > >> >> vectorization of loop epilogues and loops with
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > Here is fixed version of updated patch 3. > > Any comments will be appreciated. Looks good apart from + if (epilogue) +{ + epilogue->force_vectorize = loop->force_vectorize; + epilogue->safelen = loop->safelen; + epilogue->dont_vectorize = false; + + /* We may need to if-convert epilogue to vectorize it. */ + if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)) + tree_if_conversion (epilogue); + + gcc_assert (!epilogue->aux); + epilogue->aux = loop_vinfo; where the last two lines should now no longer be necessary? Thanks, Richard. > Thanks. > Yuri. > > 2016-11-11 17:15 GMT+03:00 Yuri Rumyantsev: > > Richard, > > > > Sorry for confusion but my updated patch does not work properly, so I > > need to fix it. > > > > Yuri. > > > > 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev : > >> Richard, > >> > >> I prepare updated 3 patch with passing additional argument to > >> vect_analyze_loop as you proposed (untested). > >> > >> You wrote: > >> tw, I wonder if you can produce a single patch containing just > >> epilogue vectorization, that is combine patches 1-3 but rip out > >> changes only needed by later patches? > >> > >> Did you mean that I exclude all support for vectorization epilogues, > >> i.e. exclude from 2-nd patch all non-related changes > >> like > >> > >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > >> index 11863af..32011c1 100644 > >> --- a/gcc/tree-vect-loop.c > >> +++ b/gcc/tree-vect-loop.c > >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) > >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; > >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; > >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; > >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; > >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; > >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; > >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; > >> + LOOP_VINFO_NEED_MASKING (res) = false; > >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; > >> > >> Did you mean also that new combined patch must be working patch, i.e. > >> can be integrated without other patches? > >> > >> Could you please look at updated patch? > >> > >> Thanks. > >> Yuri. > >> > >> 2016-11-10 15:36 GMT+03:00 Richard Biener : > >>> On Thu, 10 Nov 2016, Richard Biener wrote: > >>> > On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > > > Richard, > > > > Here is updated 3 patch. > > > > I checked that all new tests related to epilogue vectorization passed > > with it. > > > > Your comments will be appreciated. > > A lot better now. Instead of the ->aux dance I now prefer to > pass the original loops loop_vinfo to vect_analyze_loop as > optional argument (if non-NULL we analyze the epilogue of that > loop_vinfo). OTOH I remember we mainly use it to get at the > original vectorization factor? So we can pass down an (optional) > forced vectorization factor as well? > >>> > >>> Btw, I wonder if you can produce a single patch containing just > >>> epilogue vectorization, that is combine patches 1-3 but rip out > >>> changes only needed by later patches? > >>> > >>> Thanks, > >>> Richard. > >>> > Richard. > > > 2016-11-08 15:38 GMT+03:00 Richard Biener : > > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > > > > > >> Hi Richard, > > >> > > >> I did not understand your last remark: > > >> > > >> > That is, here (and avoid the FOR_EACH_LOOP change): > > >> > > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > > >> > && dump_enabled_p ()) > > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > > >> >"loop vectorized\n"); > > >> > - vect_transform_loop (loop_vinfo); > > >> > + new_loop = vect_transform_loop (loop_vinfo); > > >> > num_vectorized_loops++; > > >> >/* Now that the loop has been vectorized, allow it to be > > >> > unrolled > > >> > etc. */ > > >> > loop->force_vectorize = false; > > >> > > > >> > + /* Add new loop to a processing queue. To make it easier > > >> > + to match loop and its epilogue vectorization in dumps > > >> > + put new loop as the next loop to process. */ > > >> > + if (new_loop) > > >> > + { > > >> > + loops.safe_insert (i + 1, new_loop->num); > > >> > + vect_loops_num = number_of_loops (cfun); > > >> > + } > > >> > > > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > > >> f> unction which will set up stuff properly (and also perform > > >> > the if-conversion of the epilogue there). > > >> > > > >> > That said, if we
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > I prepare updated 3 patch with passing additional argument to > vect_analyze_loop as you proposed (untested). > > You wrote: > tw, I wonder if you can produce a single patch containing just > epilogue vectorization, that is combine patches 1-3 but rip out > changes only needed by later patches? > > Did you mean that I exclude all support for vectorization epilogues, > i.e. exclude from 2-nd patch all non-related changes > like > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 11863af..32011c1 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >LOOP_VINFO_PEELING_FOR_NITER (res) = false; >LOOP_VINFO_OPERANDS_SWAPPED (res) = false; > + LOOP_VINFO_CAN_BE_MASKED (res) = false; > + LOOP_VINFO_REQUIRED_MASKS (res) = 0; > + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; > + LOOP_VINFO_MASK_EPILOGUE (res) = false; > + LOOP_VINFO_NEED_MASKING (res) = false; > + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; Yes. > Did you mean also that new combined patch must be working patch, i.e. > can be integrated without other patches? Yes. > Could you please look at updated patch? Will do. Thanks, Richard. > Thanks. > Yuri. > > 2016-11-10 15:36 GMT+03:00 Richard Biener: > > On Thu, 10 Nov 2016, Richard Biener wrote: > > > >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > >> > >> > Richard, > >> > > >> > Here is updated 3 patch. > >> > > >> > I checked that all new tests related to epilogue vectorization passed > >> > with it. > >> > > >> > Your comments will be appreciated. > >> > >> A lot better now. Instead of the ->aux dance I now prefer to > >> pass the original loops loop_vinfo to vect_analyze_loop as > >> optional argument (if non-NULL we analyze the epilogue of that > >> loop_vinfo). OTOH I remember we mainly use it to get at the > >> original vectorization factor? So we can pass down an (optional) > >> forced vectorization factor as well? > > > > Btw, I wonder if you can produce a single patch containing just > > epilogue vectorization, that is combine patches 1-3 but rip out > > changes only needed by later patches? > > > > Thanks, > > Richard. > > > >> Richard. > >> > >> > 2016-11-08 15:38 GMT+03:00 Richard Biener : > >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > >> > > > >> > >> Hi Richard, > >> > >> > >> > >> I did not understand your last remark: > >> > >> > >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): > >> > >> > > >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > >> > >> > && dump_enabled_p ()) > >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > >> > >> >"loop vectorized\n"); > >> > >> > - vect_transform_loop (loop_vinfo); > >> > >> > + new_loop = vect_transform_loop (loop_vinfo); > >> > >> > num_vectorized_loops++; > >> > >> >/* Now that the loop has been vectorized, allow it to be > >> > >> > unrolled > >> > >> > etc. */ > >> > >> > loop->force_vectorize = false; > >> > >> > > >> > >> > + /* Add new loop to a processing queue. To make it easier > >> > >> > + to match loop and its epilogue vectorization in dumps > >> > >> > + put new loop as the next loop to process. */ > >> > >> > + if (new_loop) > >> > >> > + { > >> > >> > + loops.safe_insert (i + 1, new_loop->num); > >> > >> > + vect_loops_num = number_of_loops (cfun); > >> > >> > + } > >> > >> > > >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > >> > >> f> unction which will set up stuff properly (and also perform > >> > >> > the if-conversion of the epilogue there). > >> > >> > > >> > >> > That said, if we can get in non-masked epilogue vectorization > >> > >> > separately that would be great. > >> > >> > >> > >> Could you please clarify your proposal. > >> > > > >> > > When a loop was vectorized set things up to immediately vectorize > >> > > its epilogue, avoiding changing the loop iteration and avoiding > >> > > the re-use of ->aux. > >> > > > >> > > Richard. > >> > > > >> > >> Thanks. > >> > >> Yuri. > >> > >> > >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : > >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > >> > >> > > >> > >> >> Hi All, > >> > >> >> > >> > >> >> I re-send all patches sent by Ilya earlier for review which support > >> > >> >> vectorization of loop epilogues and loops with low trip count. We > >> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was > >> > >> >> not > >> > >> >> approved by Jeff. > >> > >> >> > >> > >> >> I did re-base of all patches and performed bootstrapping and > >> > >> >> regression testing that did not show any new failures. Also all > >> > >> >> changes related to new
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Richard, Here is fixed version of updated patch 3. Any comments will be appreciated. Thanks. Yuri. 2016-11-11 17:15 GMT+03:00 Yuri Rumyantsev: > Richard, > > Sorry for confusion but my updated patch does not work properly, so I > need to fix it. > > Yuri. > > 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev : >> Richard, >> >> I prepare updated 3 patch with passing additional argument to >> vect_analyze_loop as you proposed (untested). >> >> You wrote: >> tw, I wonder if you can produce a single patch containing just >> epilogue vectorization, that is combine patches 1-3 but rip out >> changes only needed by later patches? >> >> Did you mean that I exclude all support for vectorization epilogues, >> i.e. exclude from 2-nd patch all non-related changes >> like >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index 11863af..32011c1 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >> + LOOP_VINFO_NEED_MASKING (res) = false; >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >> >> Did you mean also that new combined patch must be working patch, i.e. >> can be integrated without other patches? >> >> Could you please look at updated patch? >> >> Thanks. >> Yuri. >> >> 2016-11-10 15:36 GMT+03:00 Richard Biener : >>> On Thu, 10 Nov 2016, Richard Biener wrote: >>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > Here is updated 3 patch. > > I checked that all new tests related to epilogue vectorization passed > with it. > > Your comments will be appreciated. A lot better now. Instead of the ->aux dance I now prefer to pass the original loops loop_vinfo to vect_analyze_loop as optional argument (if non-NULL we analyze the epilogue of that loop_vinfo). OTOH I remember we mainly use it to get at the original vectorization factor? So we can pass down an (optional) forced vectorization factor as well? >>> >>> Btw, I wonder if you can produce a single patch containing just >>> epilogue vectorization, that is combine patches 1-3 but rip out >>> changes only needed by later patches? >>> >>> Thanks, >>> Richard. >>> Richard. > 2016-11-08 15:38 GMT+03:00 Richard Biener : > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > > > >> Hi Richard, > >> > >> I did not understand your last remark: > >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): > >> > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > >> > && dump_enabled_p ()) > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > >> >"loop vectorized\n"); > >> > - vect_transform_loop (loop_vinfo); > >> > + new_loop = vect_transform_loop (loop_vinfo); > >> > num_vectorized_loops++; > >> >/* Now that the loop has been vectorized, allow it to be > >> > unrolled > >> > etc. */ > >> > loop->force_vectorize = false; > >> > > >> > + /* Add new loop to a processing queue. To make it easier > >> > + to match loop and its epilogue vectorization in dumps > >> > + put new loop as the next loop to process. */ > >> > + if (new_loop) > >> > + { > >> > + loops.safe_insert (i + 1, new_loop->num); > >> > + vect_loops_num = number_of_loops (cfun); > >> > + } > >> > > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > >> f> unction which will set up stuff properly (and also perform > >> > the if-conversion of the epilogue there). > >> > > >> > That said, if we can get in non-masked epilogue vectorization > >> > separately that would be great. > >> > >> Could you please clarify your proposal. > > > > When a loop was vectorized set things up to immediately vectorize > > its epilogue, avoiding changing the loop iteration and avoiding > > the re-use of ->aux. > > > > Richard. > > > >> Thanks. > >> Yuri. > >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > >> > > >> >> Hi All, > >> >> > >> >> I re-send all patches sent by Ilya earlier for review which support > >> >> vectorization of loop epilogues and loops with low trip count. We > >> >> assume that the
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Richard, Sorry for confusion but my updated patch does not work properly, so I need to fix it. Yuri. 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev: > Richard, > > I prepare updated 3 patch with passing additional argument to > vect_analyze_loop as you proposed (untested). > > You wrote: > tw, I wonder if you can produce a single patch containing just > epilogue vectorization, that is combine patches 1-3 but rip out > changes only needed by later patches? > > Did you mean that I exclude all support for vectorization epilogues, > i.e. exclude from 2-nd patch all non-related changes > like > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 11863af..32011c1 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >LOOP_VINFO_PEELING_FOR_NITER (res) = false; >LOOP_VINFO_OPERANDS_SWAPPED (res) = false; > + LOOP_VINFO_CAN_BE_MASKED (res) = false; > + LOOP_VINFO_REQUIRED_MASKS (res) = 0; > + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; > + LOOP_VINFO_MASK_EPILOGUE (res) = false; > + LOOP_VINFO_NEED_MASKING (res) = false; > + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; > > Did you mean also that new combined patch must be working patch, i.e. > can be integrated without other patches? > > Could you please look at updated patch? > > Thanks. > Yuri. > > 2016-11-10 15:36 GMT+03:00 Richard Biener : >> On Thu, 10 Nov 2016, Richard Biener wrote: >> >>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >>> >>> > Richard, >>> > >>> > Here is updated 3 patch. >>> > >>> > I checked that all new tests related to epilogue vectorization passed >>> > with it. >>> > >>> > Your comments will be appreciated. >>> >>> A lot better now. Instead of the ->aux dance I now prefer to >>> pass the original loops loop_vinfo to vect_analyze_loop as >>> optional argument (if non-NULL we analyze the epilogue of that >>> loop_vinfo). OTOH I remember we mainly use it to get at the >>> original vectorization factor? So we can pass down an (optional) >>> forced vectorization factor as well? >> >> Btw, I wonder if you can produce a single patch containing just >> epilogue vectorization, that is combine patches 1-3 but rip out >> changes only needed by later patches? >> >> Thanks, >> Richard. >> >>> Richard. >>> >>> > 2016-11-08 15:38 GMT+03:00 Richard Biener : >>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >>> > > >>> > >> Hi Richard, >>> > >> >>> > >> I did not understand your last remark: >>> > >> >>> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>> > >> > >>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>> > >> > && dump_enabled_p ()) >>> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, >>> > >> >"loop vectorized\n"); >>> > >> > - vect_transform_loop (loop_vinfo); >>> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>> > >> > num_vectorized_loops++; >>> > >> >/* Now that the loop has been vectorized, allow it to be >>> > >> > unrolled >>> > >> > etc. */ >>> > >> > loop->force_vectorize = false; >>> > >> > >>> > >> > + /* Add new loop to a processing queue. To make it easier >>> > >> > + to match loop and its epilogue vectorization in dumps >>> > >> > + put new loop as the next loop to process. */ >>> > >> > + if (new_loop) >>> > >> > + { >>> > >> > + loops.safe_insert (i + 1, new_loop->num); >>> > >> > + vect_loops_num = number_of_loops (cfun); >>> > >> > + } >>> > >> > >>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) >>> > >> f> unction which will set up stuff properly (and also perform >>> > >> > the if-conversion of the epilogue there). >>> > >> > >>> > >> > That said, if we can get in non-masked epilogue vectorization >>> > >> > separately that would be great. >>> > >> >>> > >> Could you please clarify your proposal. >>> > > >>> > > When a loop was vectorized set things up to immediately vectorize >>> > > its epilogue, avoiding changing the loop iteration and avoiding >>> > > the re-use of ->aux. >>> > > >>> > > Richard. >>> > > >>> > >> Thanks. >>> > >> Yuri. >>> > >> >>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : >>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >>> > >> > >>> > >> >> Hi All, >>> > >> >> >>> > >> >> I re-send all patches sent by Ilya earlier for review which support >>> > >> >> vectorization of loop epilogues and loops with low trip count. We >>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was >>> > >> >> not >>> > >> >> approved by Jeff. >>> > >> >> >>> > >> >> I did re-base of all patches and performed bootstrapping and >>> > >> >> regression testing that did not show any new failures. Also all >>> > >> >> changes related to new
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Richard, I prepare updated 3 patch with passing additional argument to vect_analyze_loop as you proposed (untested). You wrote: tw, I wonder if you can produce a single patch containing just epilogue vectorization, that is combine patches 1-3 but rip out changes only needed by later patches? Did you mean that I exclude all support for vectorization epilogues, i.e. exclude from 2-nd patch all non-related changes like diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 11863af..32011c1 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) LOOP_VINFO_PEELING_FOR_GAPS (res) = false; LOOP_VINFO_PEELING_FOR_NITER (res) = false; LOOP_VINFO_OPERANDS_SWAPPED (res) = false; + LOOP_VINFO_CAN_BE_MASKED (res) = false; + LOOP_VINFO_REQUIRED_MASKS (res) = 0; + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; + LOOP_VINFO_MASK_EPILOGUE (res) = false; + LOOP_VINFO_NEED_MASKING (res) = false; + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; Did you mean also that new combined patch must be working patch, i.e. can be integrated without other patches? Could you please look at updated patch? Thanks. Yuri. 2016-11-10 15:36 GMT+03:00 Richard Biener: > On Thu, 10 Nov 2016, Richard Biener wrote: > >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >> >> > Richard, >> > >> > Here is updated 3 patch. >> > >> > I checked that all new tests related to epilogue vectorization passed with >> > it. >> > >> > Your comments will be appreciated. >> >> A lot better now. Instead of the ->aux dance I now prefer to >> pass the original loops loop_vinfo to vect_analyze_loop as >> optional argument (if non-NULL we analyze the epilogue of that >> loop_vinfo). OTOH I remember we mainly use it to get at the >> original vectorization factor? So we can pass down an (optional) >> forced vectorization factor as well? > > Btw, I wonder if you can produce a single patch containing just > epilogue vectorization, that is combine patches 1-3 but rip out > changes only needed by later patches? > > Thanks, > Richard. > >> Richard. >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener : >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >> > > >> > >> Hi Richard, >> > >> >> > >> I did not understand your last remark: >> > >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >> > >> > >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >> > >> > && dump_enabled_p ()) >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, >> > >> >"loop vectorized\n"); >> > >> > - vect_transform_loop (loop_vinfo); >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >> > >> > num_vectorized_loops++; >> > >> >/* Now that the loop has been vectorized, allow it to be >> > >> > unrolled >> > >> > etc. */ >> > >> > loop->force_vectorize = false; >> > >> > >> > >> > + /* Add new loop to a processing queue. To make it easier >> > >> > + to match loop and its epilogue vectorization in dumps >> > >> > + put new loop as the next loop to process. */ >> > >> > + if (new_loop) >> > >> > + { >> > >> > + loops.safe_insert (i + 1, new_loop->num); >> > >> > + vect_loops_num = number_of_loops (cfun); >> > >> > + } >> > >> > >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) >> > >> f> unction which will set up stuff properly (and also perform >> > >> > the if-conversion of the epilogue there). >> > >> > >> > >> > That said, if we can get in non-masked epilogue vectorization >> > >> > separately that would be great. >> > >> >> > >> Could you please clarify your proposal. >> > > >> > > When a loop was vectorized set things up to immediately vectorize >> > > its epilogue, avoiding changing the loop iteration and avoiding >> > > the re-use of ->aux. >> > > >> > > Richard. >> > > >> > >> Thanks. >> > >> Yuri. >> > >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >> > >> > >> > >> >> Hi All, >> > >> >> >> > >> >> I re-send all patches sent by Ilya earlier for review which support >> > >> >> vectorization of loop epilogues and loops with low trip count. We >> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was >> > >> >> not >> > >> >> approved by Jeff. >> > >> >> >> > >> >> I did re-base of all patches and performed bootstrapping and >> > >> >> regression testing that did not show any new failures. Also all >> > >> >> changes related to new vect_do_peeling algorithm have been changed >> > >> >> accordingly. >> > >> >> >> > >> >> Is it OK for trunk? >> > >> > >> > >> > I would have prefered that the series up to -03-nomask-tails would >> > >> > _only_ contain epilogue loop vectorization changes but unfortunately >> > >> > the patchset is oddly separated. >> > >> > >> > >> >
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Thu, 10 Nov 2016, Richard Biener wrote: > On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > > > Richard, > > > > Here is updated 3 patch. > > > > I checked that all new tests related to epilogue vectorization passed with > > it. > > > > Your comments will be appreciated. > > A lot better now. Instead of the ->aux dance I now prefer to > pass the original loops loop_vinfo to vect_analyze_loop as > optional argument (if non-NULL we analyze the epilogue of that > loop_vinfo). OTOH I remember we mainly use it to get at the > original vectorization factor? So we can pass down an (optional) > forced vectorization factor as well? Btw, I wonder if you can produce a single patch containing just epilogue vectorization, that is combine patches 1-3 but rip out changes only needed by later patches? Thanks, Richard. > Richard. > > > 2016-11-08 15:38 GMT+03:00 Richard Biener: > > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > > > > > >> Hi Richard, > > >> > > >> I did not understand your last remark: > > >> > > >> > That is, here (and avoid the FOR_EACH_LOOP change): > > >> > > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > > >> > && dump_enabled_p ()) > > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > > >> >"loop vectorized\n"); > > >> > - vect_transform_loop (loop_vinfo); > > >> > + new_loop = vect_transform_loop (loop_vinfo); > > >> > num_vectorized_loops++; > > >> >/* Now that the loop has been vectorized, allow it to be > > >> > unrolled > > >> > etc. */ > > >> > loop->force_vectorize = false; > > >> > > > >> > + /* Add new loop to a processing queue. To make it easier > > >> > + to match loop and its epilogue vectorization in dumps > > >> > + put new loop as the next loop to process. */ > > >> > + if (new_loop) > > >> > + { > > >> > + loops.safe_insert (i + 1, new_loop->num); > > >> > + vect_loops_num = number_of_loops (cfun); > > >> > + } > > >> > > > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > > >> f> unction which will set up stuff properly (and also perform > > >> > the if-conversion of the epilogue there). > > >> > > > >> > That said, if we can get in non-masked epilogue vectorization > > >> > separately that would be great. > > >> > > >> Could you please clarify your proposal. > > > > > > When a loop was vectorized set things up to immediately vectorize > > > its epilogue, avoiding changing the loop iteration and avoiding > > > the re-use of ->aux. > > > > > > Richard. > > > > > >> Thanks. > > >> Yuri. > > >> > > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : > > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > > >> > > > >> >> Hi All, > > >> >> > > >> >> I re-send all patches sent by Ilya earlier for review which support > > >> >> vectorization of loop epilogues and loops with low trip count. We > > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > > >> >> approved by Jeff. > > >> >> > > >> >> I did re-base of all patches and performed bootstrapping and > > >> >> regression testing that did not show any new failures. Also all > > >> >> changes related to new vect_do_peeling algorithm have been changed > > >> >> accordingly. > > >> >> > > >> >> Is it OK for trunk? > > >> > > > >> > I would have prefered that the series up to -03-nomask-tails would > > >> > _only_ contain epilogue loop vectorization changes but unfortunately > > >> > the patchset is oddly separated. > > >> > > > >> > I have a comment on that part nevertheless: > > >> > > > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > > >> > loop_vinfo) > > >> >/* Check if we can possibly peel the loop. */ > > >> >if (!vect_can_advance_ivs_p (loop_vinfo) > > >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) > > >> > - || loop->inner) > > >> > + || loop->inner > > >> > + /* Required peeling was performed in prologue and > > >> > +is not required for epilogue. */ > > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > > >> > do_peeling = false; > > >> > > > >> >if (do_peeling > > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > > >> > loop_vinfo) > > >> > > > >> >do_versioning = > > >> > optimize_loop_nest_for_speed_p (loop) > > >> > - && (!loop->inner); /* FORNOW */ > > >> > + && (!loop->inner) /* FORNOW */ > > >> > +/* Required versioning was performed for the > > >> > + original loop and is not required for epilogue. */ > > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); > > >> > > > >> >if (do_versioning) > > >> > { > > >> > > > >> > please do that check in the single caller of this function. > > >> > > > >> > Otherwise I still dislike the new ->aux use and I believe that simply > > >> >
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > Here is updated 3 patch. > > I checked that all new tests related to epilogue vectorization passed with it. > > Your comments will be appreciated. A lot better now. Instead of the ->aux dance I now prefer to pass the original loops loop_vinfo to vect_analyze_loop as optional argument (if non-NULL we analyze the epilogue of that loop_vinfo). OTOH I remember we mainly use it to get at the original vectorization factor? So we can pass down an (optional) forced vectorization factor as well? Richard. > 2016-11-08 15:38 GMT+03:00 Richard Biener: > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > > > >> Hi Richard, > >> > >> I did not understand your last remark: > >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): > >> > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > >> > && dump_enabled_p ()) > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > >> >"loop vectorized\n"); > >> > - vect_transform_loop (loop_vinfo); > >> > + new_loop = vect_transform_loop (loop_vinfo); > >> > num_vectorized_loops++; > >> >/* Now that the loop has been vectorized, allow it to be unrolled > >> > etc. */ > >> > loop->force_vectorize = false; > >> > > >> > + /* Add new loop to a processing queue. To make it easier > >> > + to match loop and its epilogue vectorization in dumps > >> > + put new loop as the next loop to process. */ > >> > + if (new_loop) > >> > + { > >> > + loops.safe_insert (i + 1, new_loop->num); > >> > + vect_loops_num = number_of_loops (cfun); > >> > + } > >> > > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > >> f> unction which will set up stuff properly (and also perform > >> > the if-conversion of the epilogue there). > >> > > >> > That said, if we can get in non-masked epilogue vectorization > >> > separately that would be great. > >> > >> Could you please clarify your proposal. > > > > When a loop was vectorized set things up to immediately vectorize > > its epilogue, avoiding changing the loop iteration and avoiding > > the re-use of ->aux. > > > > Richard. > > > >> Thanks. > >> Yuri. > >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener : > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > >> > > >> >> Hi All, > >> >> > >> >> I re-send all patches sent by Ilya earlier for review which support > >> >> vectorization of loop epilogues and loops with low trip count. We > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > >> >> approved by Jeff. > >> >> > >> >> I did re-base of all patches and performed bootstrapping and > >> >> regression testing that did not show any new failures. Also all > >> >> changes related to new vect_do_peeling algorithm have been changed > >> >> accordingly. > >> >> > >> >> Is it OK for trunk? > >> > > >> > I would have prefered that the series up to -03-nomask-tails would > >> > _only_ contain epilogue loop vectorization changes but unfortunately > >> > the patchset is oddly separated. > >> > > >> > I have a comment on that part nevertheless: > >> > > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > >> > loop_vinfo) > >> >/* Check if we can possibly peel the loop. */ > >> >if (!vect_can_advance_ivs_p (loop_vinfo) > >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) > >> > - || loop->inner) > >> > + || loop->inner > >> > + /* Required peeling was performed in prologue and > >> > +is not required for epilogue. */ > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > >> > do_peeling = false; > >> > > >> >if (do_peeling > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > >> > loop_vinfo) > >> > > >> >do_versioning = > >> > optimize_loop_nest_for_speed_p (loop) > >> > - && (!loop->inner); /* FORNOW */ > >> > + && (!loop->inner) /* FORNOW */ > >> > +/* Required versioning was performed for the > >> > + original loop and is not required for epilogue. */ > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); > >> > > >> >if (do_versioning) > >> > { > >> > > >> > please do that check in the single caller of this function. > >> > > >> > Otherwise I still dislike the new ->aux use and I believe that simply > >> > passing down info from the processed parent would be _much_ cleaner. > >> > That is, here (and avoid the FOR_EACH_LOOP change): > >> > > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) > >> > && dump_enabled_p ()) > >> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > >> > "loop vectorized\n"); > >> > - vect_transform_loop (loop_vinfo); > >> > + new_loop = vect_transform_loop (loop_vinfo); > >> >
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Wed, 9 Nov 2016, Yuri Rumyantsev wrote: > Thanks Richard for your comments. > Your proposed to handle epilogue loop just like normal short-trip loop > but this is not exactly truth since e.g. epilogue must not be peeled > for alignment. But if we know the epilogue data-refs are aligned we should have reflected that in the code so the vectorizer wouldn't even try to peel for alignment. OTOH peeling for alignment already knows that peeling a short-trip loop is not likely profitable (maybe the condition needs to be hardened to also work for VF/2). > It is not clear for me what are my next steps? Should I re-design the > patch completely or simply decompose the whole patch to different > parts? But it means that we must start review process from beginning > but release is closed to its end. What I disliked about the series from the beginning is that it does everything at once rather than first introducing vectorizing of epilogues as an independent patch. Lumping all in together makes it hard to decipher the conditions each feature is enabled. I'm mostly concerned about the predication part and thus if we can get the other parts separated and committed that would be a much smaller patch to look at and experiment. Note that only stage1 is at its end, we usually still accept patches that were posted before stage1 end during stage3. > Note also that i work for Intel till the end of year and have not idea > who will continue working on this project. Noted. Richard. > Any help will be appreciated. > > Thanks. > Yuri. > > 2016-11-09 13:37 GMT+03:00 Bin.Cheng: > > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: > >> Hi All, > >> > >> I re-send all patches sent by Ilya earlier for review which support > >> vectorization of loop epilogues and loops with low trip count. We > >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > >> approved by Jeff. > >> > >> I did re-base of all patches and performed bootstrapping and > >> regression testing that did not show any new failures. Also all > >> changes related to new vect_do_peeling algorithm have been changed > >> accordingly. > >> > >> Is it OK for trunk? > > > > Hi, > > I can't approve patches, but had some comments after going through the > > implementation. > > > > One confusing part is cost model change, as well as the way it's used > > to decide how epilogue loop should be vectorized. Given vect-tail is > > disabled at the moment and the cost change needs further tuning, is it > > reasonable to split this part out and get vectorization part > > reviewed/submitted independently? For example, let user specified > > parameters make the decision for now. Cost and target dependent > > changes should go in at last, this could make the patch easier to > > read. > > > > The implementation computes/shares quite amount information from main > > loop to epilogue loop vectorization. Furthermore, variables/fields > > for such information are somehow named in a misleading way. For > > example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the > > flag controlling whether epilogue loop should be vectorized with > > masking. However, it's actually controlled by exactly the same flag > > as whether epilogue loop should be combined into the main loop with > > masking: > > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) > > > >slpeel_make_loop_iterate_ntimes (loop, niters_vector); > > > > + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) > > +vect_combine_loop_epilogue (loop_vinfo); > > + > >/* Reduce loop iterations by the vectorization factor. */ > >scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), > >expected_iterations / vf); > > > > IMHO, we should decouple main loop vectorization and epilogue > > vectorization as much as possible by sharing as few information as we > > can. The general idea is to handle epilogue loop just like normal > > short-trip loop. For example, we can rename > > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something > > else), and we don't differentiate its meaning between main and > > epilogue(short-trip) loop. It only indicates the current loop should > > be vectorized with masking no matter it's a main loop or epilogue > > loop, and it works just like the current implementation. > > > > After this change, we can refine vectorization and make it more > > general for normal loop and epilogue(short trip) loop. For example, > > this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue > > loop and use it to control how it should be vectorized: > > + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) > > +{ > > + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; > > + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; > > +} > > + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > > + && min_profitable_combine_iters >= 0) > > +{ > > > > This works,
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Wed, Nov 9, 2016 at 12:12 PM, Yuri Rumyantsevwrote: > I am familiar with SVE extension and understand that implemented > approach might be not suitable for ARM. The main point is that only > load/store instructions are masked but all other calculations are not > (we did special conversion for reduction statements to implement > merging predication semantic). For SVE peeling for niters is not > required but it is not true for x86 - we must determine what > vectorization scheme is more profitable: loop combining (the only > essential for SVE) or separate epilogue vectorization using masking or > less vectorization factor. So I'd like to have the full list of > required changes to our implementation to try to remove them. Hmm, sorry that my comment gave impression that I was trying to hold back the patch, it's not what I meant by any means. Also it's not related to SVE, As a matter of fact, I haven't read any document about SVE yet. Sorry again for the false impression conveyed by previous messages. Thanks, bin > > Thanks. > Yuri. > > 2016-11-09 14:46 GMT+03:00 Bin.Cheng : >> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev wrote: >>> Thanks Richard for your comments. >>> Your proposed to handle epilogue loop just like normal short-trip loop >>> but this is not exactly truth since e.g. epilogue must not be peeled >>> for alignment. >> Yes there must be some differences, my motivation is to minimize that >> so we don't need to specially check normal/epilogue loops at too many >> places. >> Of course it's just my feeling when going through the patch set, and >> could be wrong. >> >> Thanks, >> bin >>> >>> It is not clear for me what are my next steps? Should I re-design the >>> patch completely or simply decompose the whole patch to different >>> parts? But it means that we must start review process from beginning >>> but release is closed to its end. >>> Note also that i work for Intel till the end of year and have not idea >>> who will continue working on this project. >>> >>> Any help will be appreciated. >>> >>> Thanks. >>> Yuri. >>> >>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng : On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: > Hi All, > > I re-send all patches sent by Ilya earlier for review which support > vectorization of loop epilogues and loops with low trip count. We > assume that the only patch - vec-tails-07-combine-tail.patch - was not > approved by Jeff. > > I did re-base of all patches and performed bootstrapping and > regression testing that did not show any new failures. Also all > changes related to new vect_do_peeling algorithm have been changed > accordingly. > > Is it OK for trunk? Hi, I can't approve patches, but had some comments after going through the implementation. One confusing part is cost model change, as well as the way it's used to decide how epilogue loop should be vectorized. Given vect-tail is disabled at the moment and the cost change needs further tuning, is it reasonable to split this part out and get vectorization part reviewed/submitted independently? For example, let user specified parameters make the decision for now. Cost and target dependent changes should go in at last, this could make the patch easier to read. The implementation computes/shares quite amount information from main loop to epilogue loop vectorization. Furthermore, variables/fields for such information are somehow named in a misleading way. For example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the flag controlling whether epilogue loop should be vectorized with masking. However, it's actually controlled by exactly the same flag as whether epilogue loop should be combined into the main loop with masking: @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) slpeel_make_loop_iterate_ntimes (loop, niters_vector); + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) +vect_combine_loop_epilogue (loop_vinfo); + /* Reduce loop iterations by the vectorization factor. */ scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), expected_iterations / vf); IMHO, we should decouple main loop vectorization and epilogue vectorization as much as possible by sharing as few information as we can. The general idea is to handle epilogue loop just like normal short-trip loop. For example, we can rename LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something else), and we don't differentiate its meaning between main and epilogue(short-trip) loop. It only indicates the current loop should be vectorized with masking no matter it's a main loop or epilogue loop, and
Re: [PATCH, vec-tails] Support loop epilogue vectorization
I am familiar with SVE extension and understand that implemented approach might be not suitable for ARM. The main point is that only load/store instructions are masked but all other calculations are not (we did special conversion for reduction statements to implement merging predication semantic). For SVE peeling for niters is not required but it is not true for x86 - we must determine what vectorization scheme is more profitable: loop combining (the only essential for SVE) or separate epilogue vectorization using masking or less vectorization factor. So I'd like to have the full list of required changes to our implementation to try to remove them. Thanks. Yuri. 2016-11-09 14:46 GMT+03:00 Bin.Cheng: > On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev wrote: >> Thanks Richard for your comments. >> Your proposed to handle epilogue loop just like normal short-trip loop >> but this is not exactly truth since e.g. epilogue must not be peeled >> for alignment. > Yes there must be some differences, my motivation is to minimize that > so we don't need to specially check normal/epilogue loops at too many > places. > Of course it's just my feeling when going through the patch set, and > could be wrong. > > Thanks, > bin >> >> It is not clear for me what are my next steps? Should I re-design the >> patch completely or simply decompose the whole patch to different >> parts? But it means that we must start review process from beginning >> but release is closed to its end. >> Note also that i work for Intel till the end of year and have not idea >> who will continue working on this project. >> >> Any help will be appreciated. >> >> Thanks. >> Yuri. >> >> 2016-11-09 13:37 GMT+03:00 Bin.Cheng : >>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: Hi All, I re-send all patches sent by Ilya earlier for review which support vectorization of loop epilogues and loops with low trip count. We assume that the only patch - vec-tails-07-combine-tail.patch - was not approved by Jeff. I did re-base of all patches and performed bootstrapping and regression testing that did not show any new failures. Also all changes related to new vect_do_peeling algorithm have been changed accordingly. Is it OK for trunk? >>> >>> Hi, >>> I can't approve patches, but had some comments after going through the >>> implementation. >>> >>> One confusing part is cost model change, as well as the way it's used >>> to decide how epilogue loop should be vectorized. Given vect-tail is >>> disabled at the moment and the cost change needs further tuning, is it >>> reasonable to split this part out and get vectorization part >>> reviewed/submitted independently? For example, let user specified >>> parameters make the decision for now. Cost and target dependent >>> changes should go in at last, this could make the patch easier to >>> read. >>> >>> The implementation computes/shares quite amount information from main >>> loop to epilogue loop vectorization. Furthermore, variables/fields >>> for such information are somehow named in a misleading way. For >>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the >>> flag controlling whether epilogue loop should be vectorized with >>> masking. However, it's actually controlled by exactly the same flag >>> as whether epilogue loop should be combined into the main loop with >>> masking: >>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) >>> >>>slpeel_make_loop_iterate_ntimes (loop, niters_vector); >>> >>> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) >>> +vect_combine_loop_epilogue (loop_vinfo); >>> + >>>/* Reduce loop iterations by the vectorization factor. */ >>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >>>expected_iterations / vf); >>> >>> IMHO, we should decouple main loop vectorization and epilogue >>> vectorization as much as possible by sharing as few information as we >>> can. The general idea is to handle epilogue loop just like normal >>> short-trip loop. For example, we can rename >>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something >>> else), and we don't differentiate its meaning between main and >>> epilogue(short-trip) loop. It only indicates the current loop should >>> be vectorized with masking no matter it's a main loop or epilogue >>> loop, and it works just like the current implementation. >>> >>> After this change, we can refine vectorization and make it more >>> general for normal loop and epilogue(short trip) loop. For example, >>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue >>> loop and use it to control how it should be vectorized: >>> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) >>> +{ >>> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; >>> + LOOP_VINFO_COMBINE_EPILOGUE
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsevwrote: > Thanks Richard for your comments. > Your proposed to handle epilogue loop just like normal short-trip loop > but this is not exactly truth since e.g. epilogue must not be peeled > for alignment. Yes there must be some differences, my motivation is to minimize that so we don't need to specially check normal/epilogue loops at too many places. Of course it's just my feeling when going through the patch set, and could be wrong. Thanks, bin > > It is not clear for me what are my next steps? Should I re-design the > patch completely or simply decompose the whole patch to different > parts? But it means that we must start review process from beginning > but release is closed to its end. > Note also that i work for Intel till the end of year and have not idea > who will continue working on this project. > > Any help will be appreciated. > > Thanks. > Yuri. > > 2016-11-09 13:37 GMT+03:00 Bin.Cheng : >> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: >>> Hi All, >>> >>> I re-send all patches sent by Ilya earlier for review which support >>> vectorization of loop epilogues and loops with low trip count. We >>> assume that the only patch - vec-tails-07-combine-tail.patch - was not >>> approved by Jeff. >>> >>> I did re-base of all patches and performed bootstrapping and >>> regression testing that did not show any new failures. Also all >>> changes related to new vect_do_peeling algorithm have been changed >>> accordingly. >>> >>> Is it OK for trunk? >> >> Hi, >> I can't approve patches, but had some comments after going through the >> implementation. >> >> One confusing part is cost model change, as well as the way it's used >> to decide how epilogue loop should be vectorized. Given vect-tail is >> disabled at the moment and the cost change needs further tuning, is it >> reasonable to split this part out and get vectorization part >> reviewed/submitted independently? For example, let user specified >> parameters make the decision for now. Cost and target dependent >> changes should go in at last, this could make the patch easier to >> read. >> >> The implementation computes/shares quite amount information from main >> loop to epilogue loop vectorization. Furthermore, variables/fields >> for such information are somehow named in a misleading way. For >> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the >> flag controlling whether epilogue loop should be vectorized with >> masking. However, it's actually controlled by exactly the same flag >> as whether epilogue loop should be combined into the main loop with >> masking: >> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) >> >>slpeel_make_loop_iterate_ntimes (loop, niters_vector); >> >> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) >> +vect_combine_loop_epilogue (loop_vinfo); >> + >>/* Reduce loop iterations by the vectorization factor. */ >>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >>expected_iterations / vf); >> >> IMHO, we should decouple main loop vectorization and epilogue >> vectorization as much as possible by sharing as few information as we >> can. The general idea is to handle epilogue loop just like normal >> short-trip loop. For example, we can rename >> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something >> else), and we don't differentiate its meaning between main and >> epilogue(short-trip) loop. It only indicates the current loop should >> be vectorized with masking no matter it's a main loop or epilogue >> loop, and it works just like the current implementation. >> >> After this change, we can refine vectorization and make it more >> general for normal loop and epilogue(short trip) loop. For example, >> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue >> loop and use it to control how it should be vectorized: >> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) >> +{ >> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; >> + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; >> +} >> + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) >> + && min_profitable_combine_iters >= 0) >> +{ >> >> This works, but not that good for understanding or maintaining. >> >> Thanks, >> bin
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Thanks Richard for your comments. Your proposed to handle epilogue loop just like normal short-trip loop but this is not exactly truth since e.g. epilogue must not be peeled for alignment. It is not clear for me what are my next steps? Should I re-design the patch completely or simply decompose the whole patch to different parts? But it means that we must start review process from beginning but release is closed to its end. Note also that i work for Intel till the end of year and have not idea who will continue working on this project. Any help will be appreciated. Thanks. Yuri. 2016-11-09 13:37 GMT+03:00 Bin.Cheng: > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: >> Hi All, >> >> I re-send all patches sent by Ilya earlier for review which support >> vectorization of loop epilogues and loops with low trip count. We >> assume that the only patch - vec-tails-07-combine-tail.patch - was not >> approved by Jeff. >> >> I did re-base of all patches and performed bootstrapping and >> regression testing that did not show any new failures. Also all >> changes related to new vect_do_peeling algorithm have been changed >> accordingly. >> >> Is it OK for trunk? > > Hi, > I can't approve patches, but had some comments after going through the > implementation. > > One confusing part is cost model change, as well as the way it's used > to decide how epilogue loop should be vectorized. Given vect-tail is > disabled at the moment and the cost change needs further tuning, is it > reasonable to split this part out and get vectorization part > reviewed/submitted independently? For example, let user specified > parameters make the decision for now. Cost and target dependent > changes should go in at last, this could make the patch easier to > read. > > The implementation computes/shares quite amount information from main > loop to epilogue loop vectorization. Furthermore, variables/fields > for such information are somehow named in a misleading way. For > example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the > flag controlling whether epilogue loop should be vectorized with > masking. However, it's actually controlled by exactly the same flag > as whether epilogue loop should be combined into the main loop with > masking: > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) > >slpeel_make_loop_iterate_ntimes (loop, niters_vector); > > + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) > +vect_combine_loop_epilogue (loop_vinfo); > + >/* Reduce loop iterations by the vectorization factor. */ >scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >expected_iterations / vf); > > IMHO, we should decouple main loop vectorization and epilogue > vectorization as much as possible by sharing as few information as we > can. The general idea is to handle epilogue loop just like normal > short-trip loop. For example, we can rename > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something > else), and we don't differentiate its meaning between main and > epilogue(short-trip) loop. It only indicates the current loop should > be vectorized with masking no matter it's a main loop or epilogue > loop, and it works just like the current implementation. > > After this change, we can refine vectorization and make it more > general for normal loop and epilogue(short trip) loop. For example, > this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue > loop and use it to control how it should be vectorized: > + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) > +{ > + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; > + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; > +} > + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > + && min_profitable_combine_iters >= 0) > +{ > > This works, but not that good for understanding or maintaining. > > Thanks, > bin
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsevwrote: > Hi All, > > I re-send all patches sent by Ilya earlier for review which support > vectorization of loop epilogues and loops with low trip count. We > assume that the only patch - vec-tails-07-combine-tail.patch - was not > approved by Jeff. > > I did re-base of all patches and performed bootstrapping and > regression testing that did not show any new failures. Also all > changes related to new vect_do_peeling algorithm have been changed > accordingly. > > Is it OK for trunk? Hi, I can't approve patches, but had some comments after going through the implementation. One confusing part is cost model change, as well as the way it's used to decide how epilogue loop should be vectorized. Given vect-tail is disabled at the moment and the cost change needs further tuning, is it reasonable to split this part out and get vectorization part reviewed/submitted independently? For example, let user specified parameters make the decision for now. Cost and target dependent changes should go in at last, this could make the patch easier to read. The implementation computes/shares quite amount information from main loop to epilogue loop vectorization. Furthermore, variables/fields for such information are somehow named in a misleading way. For example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the flag controlling whether epilogue loop should be vectorized with masking. However, it's actually controlled by exactly the same flag as whether epilogue loop should be combined into the main loop with masking: @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) slpeel_make_loop_iterate_ntimes (loop, niters_vector); + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) +vect_combine_loop_epilogue (loop_vinfo); + /* Reduce loop iterations by the vectorization factor. */ scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), expected_iterations / vf); IMHO, we should decouple main loop vectorization and epilogue vectorization as much as possible by sharing as few information as we can. The general idea is to handle epilogue loop just like normal short-trip loop. For example, we can rename LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something else), and we don't differentiate its meaning between main and epilogue(short-trip) loop. It only indicates the current loop should be vectorized with masking no matter it's a main loop or epilogue loop, and it works just like the current implementation. After this change, we can refine vectorization and make it more general for normal loop and epilogue(short trip) loop. For example, this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue loop and use it to control how it should be vectorized: + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) +{ + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; +} + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) + && min_profitable_combine_iters >= 0) +{ This works, but not that good for understanding or maintaining. Thanks, bin
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Richard, Here is updated 3 patch. I checked that all new tests related to epilogue vectorization passed with it. Your comments will be appreciated. 2016-11-08 15:38 GMT+03:00 Richard Biener: > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > >> Hi Richard, >> >> I did not understand your last remark: >> >> > That is, here (and avoid the FOR_EACH_LOOP change): >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >> > && dump_enabled_p ()) >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, >> >"loop vectorized\n"); >> > - vect_transform_loop (loop_vinfo); >> > + new_loop = vect_transform_loop (loop_vinfo); >> > num_vectorized_loops++; >> >/* Now that the loop has been vectorized, allow it to be unrolled >> > etc. */ >> > loop->force_vectorize = false; >> > >> > + /* Add new loop to a processing queue. To make it easier >> > + to match loop and its epilogue vectorization in dumps >> > + put new loop as the next loop to process. */ >> > + if (new_loop) >> > + { >> > + loops.safe_insert (i + 1, new_loop->num); >> > + vect_loops_num = number_of_loops (cfun); >> > + } >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) >> f> unction which will set up stuff properly (and also perform >> > the if-conversion of the epilogue there). >> > >> > That said, if we can get in non-masked epilogue vectorization >> > separately that would be great. >> >> Could you please clarify your proposal. > > When a loop was vectorized set things up to immediately vectorize > its epilogue, avoiding changing the loop iteration and avoiding > the re-use of ->aux. > > Richard. > >> Thanks. >> Yuri. >> >> 2016-11-02 15:27 GMT+03:00 Richard Biener : >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >> > >> >> Hi All, >> >> >> >> I re-send all patches sent by Ilya earlier for review which support >> >> vectorization of loop epilogues and loops with low trip count. We >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not >> >> approved by Jeff. >> >> >> >> I did re-base of all patches and performed bootstrapping and >> >> regression testing that did not show any new failures. Also all >> >> changes related to new vect_do_peeling algorithm have been changed >> >> accordingly. >> >> >> >> Is it OK for trunk? >> > >> > I would have prefered that the series up to -03-nomask-tails would >> > _only_ contain epilogue loop vectorization changes but unfortunately >> > the patchset is oddly separated. >> > >> > I have a comment on that part nevertheless: >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info >> > loop_vinfo) >> >/* Check if we can possibly peel the loop. */ >> >if (!vect_can_advance_ivs_p (loop_vinfo) >> >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) >> > - || loop->inner) >> > + || loop->inner >> > + /* Required peeling was performed in prologue and >> > +is not required for epilogue. */ >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) >> > do_peeling = false; >> > >> >if (do_peeling >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info >> > loop_vinfo) >> > >> >do_versioning = >> > optimize_loop_nest_for_speed_p (loop) >> > - && (!loop->inner); /* FORNOW */ >> > + && (!loop->inner) /* FORNOW */ >> > +/* Required versioning was performed for the >> > + original loop and is not required for epilogue. */ >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); >> > >> >if (do_versioning) >> > { >> > >> > please do that check in the single caller of this function. >> > >> > Otherwise I still dislike the new ->aux use and I believe that simply >> > passing down info from the processed parent would be _much_ cleaner. >> > That is, here (and avoid the FOR_EACH_LOOP change): >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >> > && dump_enabled_p ()) >> >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, >> > "loop vectorized\n"); >> > - vect_transform_loop (loop_vinfo); >> > + new_loop = vect_transform_loop (loop_vinfo); >> > num_vectorized_loops++; >> > /* Now that the loop has been vectorized, allow it to be unrolled >> >etc. */ >> > loop->force_vectorize = false; >> > >> > + /* Add new loop to a processing queue. To make it easier >> > + to match loop and its epilogue vectorization in dumps >> > + put new loop as the next loop to process. */ >> > + if (new_loop) >> > + { >> > + loops.safe_insert (i + 1, new_loop->num); >> > + vect_loops_num = number_of_loops (cfun); >> > + } >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) >> >
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: > Hi Richard, > > I did not understand your last remark: > > > That is, here (and avoid the FOR_EACH_LOOP change): > > > > @@ -580,12 +586,21 @@ vectorize_loops (void) > > && dump_enabled_p ()) > > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > >"loop vectorized\n"); > > - vect_transform_loop (loop_vinfo); > > + new_loop = vect_transform_loop (loop_vinfo); > > num_vectorized_loops++; > >/* Now that the loop has been vectorized, allow it to be unrolled > > etc. */ > > loop->force_vectorize = false; > > > > + /* Add new loop to a processing queue. To make it easier > > + to match loop and its epilogue vectorization in dumps > > + put new loop as the next loop to process. */ > > + if (new_loop) > > + { > > + loops.safe_insert (i + 1, new_loop->num); > > + vect_loops_num = number_of_loops (cfun); > > + } > > > > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > f> unction which will set up stuff properly (and also perform > > the if-conversion of the epilogue there). > > > > That said, if we can get in non-masked epilogue vectorization > > separately that would be great. > > Could you please clarify your proposal. When a loop was vectorized set things up to immediately vectorize its epilogue, avoiding changing the loop iteration and avoiding the re-use of ->aux. Richard. > Thanks. > Yuri. > > 2016-11-02 15:27 GMT+03:00 Richard Biener: > > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > > > >> Hi All, > >> > >> I re-send all patches sent by Ilya earlier for review which support > >> vectorization of loop epilogues and loops with low trip count. We > >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > >> approved by Jeff. > >> > >> I did re-base of all patches and performed bootstrapping and > >> regression testing that did not show any new failures. Also all > >> changes related to new vect_do_peeling algorithm have been changed > >> accordingly. > >> > >> Is it OK for trunk? > > > > I would have prefered that the series up to -03-nomask-tails would > > _only_ contain epilogue loop vectorization changes but unfortunately > > the patchset is oddly separated. > > > > I have a comment on that part nevertheless: > > > > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > > loop_vinfo) > >/* Check if we can possibly peel the loop. */ > >if (!vect_can_advance_ivs_p (loop_vinfo) > >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) > > - || loop->inner) > > + || loop->inner > > + /* Required peeling was performed in prologue and > > +is not required for epilogue. */ > > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > > do_peeling = false; > > > >if (do_peeling > > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > > loop_vinfo) > > > >do_versioning = > > optimize_loop_nest_for_speed_p (loop) > > - && (!loop->inner); /* FORNOW */ > > + && (!loop->inner) /* FORNOW */ > > +/* Required versioning was performed for the > > + original loop and is not required for epilogue. */ > > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); > > > >if (do_versioning) > > { > > > > please do that check in the single caller of this function. > > > > Otherwise I still dislike the new ->aux use and I believe that simply > > passing down info from the processed parent would be _much_ cleaner. > > That is, here (and avoid the FOR_EACH_LOOP change): > > > > @@ -580,12 +586,21 @@ vectorize_loops (void) > > && dump_enabled_p ()) > >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > > "loop vectorized\n"); > > - vect_transform_loop (loop_vinfo); > > + new_loop = vect_transform_loop (loop_vinfo); > > num_vectorized_loops++; > > /* Now that the loop has been vectorized, allow it to be unrolled > >etc. */ > > loop->force_vectorize = false; > > > > + /* Add new loop to a processing queue. To make it easier > > + to match loop and its epilogue vectorization in dumps > > + put new loop as the next loop to process. */ > > + if (new_loop) > > + { > > + loops.safe_insert (i + 1, new_loop->num); > > + vect_loops_num = number_of_loops (cfun); > > + } > > > > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > > function which will set up stuff properly (and also perform > > the if-conversion of the epilogue there). > > > > That said, if we can get in non-masked epilogue vectorization > > separately that would be great. > > > > I'm still torn about all the rest of the stuff and question its > > usability (esp. merging the epilogue with the main
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On November 5, 2016 3:40:04 AM GMT+01:00, Jeff Lawwrote: >On 11/02/2016 06:27 AM, Richard Biener wrote: >> I'm still torn about all the rest of the stuff and question its >> usability (esp. merging the epilogue with the main vector loop). >> But it has already been approved ... oh well. >Note that merging of the epilogue with the main vector loop may well be > >useful for SVE as well. I'm hoping Richard S. will chime in on how to >best exploit SVE. Possibly, but full exploitation of SVE requires a full vectorizer rewrite. The only thing I can see us implementing is making the vector size fixed via versioning, that is, let the user specify -mvecsize=N and if that's not the case at runtime trap or execute a scalar variant. Richard. >Jeff
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 11/02/2016 06:27 AM, Richard Biener wrote: I'm still torn about all the rest of the stuff and question its usability (esp. merging the epilogue with the main vector loop). But it has already been approved ... oh well. Note that merging of the epilogue with the main vector loop may well be useful for SVE as well. I'm hoping Richard S. will chime in on how to best exploit SVE. Jeff
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Hi Richard, I did not understand your last remark: > That is, here (and avoid the FOR_EACH_LOOP change): > > @@ -580,12 +586,21 @@ vectorize_loops (void) > && dump_enabled_p ()) > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, >"loop vectorized\n"); > - vect_transform_loop (loop_vinfo); > + new_loop = vect_transform_loop (loop_vinfo); > num_vectorized_loops++; >/* Now that the loop has been vectorized, allow it to be unrolled > etc. */ > loop->force_vectorize = false; > > + /* Add new loop to a processing queue. To make it easier > + to match loop and its epilogue vectorization in dumps > + put new loop as the next loop to process. */ > + if (new_loop) > + { > + loops.safe_insert (i + 1, new_loop->num); > + vect_loops_num = number_of_loops (cfun); > + } > > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) f> unction which will set up stuff properly (and also perform > the if-conversion of the epilogue there). > > That said, if we can get in non-masked epilogue vectorization > separately that would be great. Could you please clarify your proposal. Thanks. Yuri. 2016-11-02 15:27 GMT+03:00 Richard Biener: > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > >> Hi All, >> >> I re-send all patches sent by Ilya earlier for review which support >> vectorization of loop epilogues and loops with low trip count. We >> assume that the only patch - vec-tails-07-combine-tail.patch - was not >> approved by Jeff. >> >> I did re-base of all patches and performed bootstrapping and >> regression testing that did not show any new failures. Also all >> changes related to new vect_do_peeling algorithm have been changed >> accordingly. >> >> Is it OK for trunk? > > I would have prefered that the series up to -03-nomask-tails would > _only_ contain epilogue loop vectorization changes but unfortunately > the patchset is oddly separated. > > I have a comment on that part nevertheless: > > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > loop_vinfo) >/* Check if we can possibly peel the loop. */ >if (!vect_can_advance_ivs_p (loop_vinfo) >|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) > - || loop->inner) > + || loop->inner > + /* Required peeling was performed in prologue and > +is not required for epilogue. */ > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > do_peeling = false; > >if (do_peeling > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info > loop_vinfo) > >do_versioning = > optimize_loop_nest_for_speed_p (loop) > - && (!loop->inner); /* FORNOW */ > + && (!loop->inner) /* FORNOW */ > +/* Required versioning was performed for the > + original loop and is not required for epilogue. */ > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); > >if (do_versioning) > { > > please do that check in the single caller of this function. > > Otherwise I still dislike the new ->aux use and I believe that simply > passing down info from the processed parent would be _much_ cleaner. > That is, here (and avoid the FOR_EACH_LOOP change): > > @@ -580,12 +586,21 @@ vectorize_loops (void) > && dump_enabled_p ()) >dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > "loop vectorized\n"); > - vect_transform_loop (loop_vinfo); > + new_loop = vect_transform_loop (loop_vinfo); > num_vectorized_loops++; > /* Now that the loop has been vectorized, allow it to be unrolled >etc. */ > loop->force_vectorize = false; > > + /* Add new loop to a processing queue. To make it easier > + to match loop and its epilogue vectorization in dumps > + put new loop as the next loop to process. */ > + if (new_loop) > + { > + loops.safe_insert (i + 1, new_loop->num); > + vect_loops_num = number_of_loops (cfun); > + } > > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) > function which will set up stuff properly (and also perform > the if-conversion of the epilogue there). > > That said, if we can get in non-masked epilogue vectorization > separately that would be great. > > I'm still torn about all the rest of the stuff and question its > usability (esp. merging the epilogue with the main vector loop). > But it has already been approved ... oh well. > > Thanks, > Richard.
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > Hi All, > > I re-send all patches sent by Ilya earlier for review which support > vectorization of loop epilogues and loops with low trip count. We > assume that the only patch - vec-tails-07-combine-tail.patch - was not > approved by Jeff. > > I did re-base of all patches and performed bootstrapping and > regression testing that did not show any new failures. Also all > changes related to new vect_do_peeling algorithm have been changed > accordingly. > > Is it OK for trunk? I would have prefered that the series up to -03-nomask-tails would _only_ contain epilogue loop vectorization changes but unfortunately the patchset is oddly separated. I have a comment on that part nevertheless: @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) /* Check if we can possibly peel the loop. */ if (!vect_can_advance_ivs_p (loop_vinfo) || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) - || loop->inner) + || loop->inner + /* Required peeling was performed in prologue and +is not required for epilogue. */ + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) do_peeling = false; if (do_peeling @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) do_versioning = optimize_loop_nest_for_speed_p (loop) - && (!loop->inner); /* FORNOW */ + && (!loop->inner) /* FORNOW */ +/* Required versioning was performed for the + original loop and is not required for epilogue. */ + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); if (do_versioning) { please do that check in the single caller of this function. Otherwise I still dislike the new ->aux use and I believe that simply passing down info from the processed parent would be _much_ cleaner. That is, here (and avoid the FOR_EACH_LOOP change): @@ -580,12 +586,21 @@ vectorize_loops (void) && dump_enabled_p ()) dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, "loop vectorized\n"); - vect_transform_loop (loop_vinfo); + new_loop = vect_transform_loop (loop_vinfo); num_vectorized_loops++; /* Now that the loop has been vectorized, allow it to be unrolled etc. */ loop->force_vectorize = false; + /* Add new loop to a processing queue. To make it easier + to match loop and its epilogue vectorization in dumps + put new loop as the next loop to process. */ + if (new_loop) + { + loops.safe_insert (i + 1, new_loop->num); + vect_loops_num = number_of_loops (cfun); + } simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) function which will set up stuff properly (and also perform the if-conversion of the epilogue there). That said, if we can get in non-masked epilogue vectorization separately that would be great. I'm still torn about all the rest of the stuff and question its usability (esp. merging the epilogue with the main vector loop). But it has already been approved ... oh well. Thanks, Richard.