Re: [PATCH] Avoid useless work in loop vectorization
On Wed, 18 Nov 2015, Alan Lawrence wrote:

> On 13/11/15 08:41, Richard Biener wrote:
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > Richard.
> >
> > 2015-11-13  Richard Biener
> >
> >         * tree-vect-loop.c (vect_analyze_loop_2): Add fatal parameter.
> >         Signal fatal failure if early checks fail.
> >         (vect_analyze_loop): If vect_analyze_loop_2 fails fatally
> >         do not bother testing further vector sizes.
>
> It seems that on AArch64 this causes:
>
> FAIL: gcc.dg/vect/vect-outer-1-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1-big-array.c scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1.c scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1a-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1a-big-array.c scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1a.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1a.c scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1b-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1b-big-array.c scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1b.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-1b.c scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-2b.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-2b.c scan-tree-dump-times vect "grouped access in outer loop" 2
> FAIL: gcc.dg/vect/vect-outer-3b.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 4
> FAIL: gcc.dg/vect/vect-outer-3b.c scan-tree-dump-times vect "grouped access in outer loop" 4
>
> Still there on r230556, I haven't dug any further yet.

Probably a testsuite issue as we have

/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 1 "vect" { target { ! vect_multiple_sizes } } } } */
/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 2 "vect" { target vect_multiple_sizes } } } */

and now possibly terminate early before considering the other vector
size(s).

Richard.
Re: [PATCH] Avoid useless work in loop vectorization
On 13/11/15 08:41, Richard Biener wrote:
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2015-11-13  Richard Biener
>
>         * tree-vect-loop.c (vect_analyze_loop_2): Add fatal parameter.
>         Signal fatal failure if early checks fail.
>         (vect_analyze_loop): If vect_analyze_loop_2 fails fatally
>         do not bother testing further vector sizes.

It seems that on AArch64 this causes:

FAIL: gcc.dg/vect/vect-outer-1-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1-big-array.c scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1.c scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1a-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1a-big-array.c scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1a.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1a.c scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1b-big-array.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1b-big-array.c scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1b.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-1b.c scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-2b.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-2b.c scan-tree-dump-times vect "grouped access in outer loop" 2
FAIL: gcc.dg/vect/vect-outer-3b.c -flto -ffat-lto-objects scan-tree-dump-times vect "grouped access in outer loop" 4
FAIL: gcc.dg/vect/vect-outer-3b.c scan-tree-dump-times vect "grouped access in outer loop" 4

Still there on r230556, I haven't dug any further yet.

Thanks, Alan
[PATCH] Avoid useless work in loop vectorization
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-13  Richard Biener

        * tree-vect-loop.c (vect_analyze_loop_2): Add fatal parameter.
        Signal fatal failure if early checks fail.
        (vect_analyze_loop): If vect_analyze_loop_2 fails fatally
        do not bother testing further vector sizes.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c        (revision 230260)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -1709,13 +1709,16 @@ vect_analyze_loop_operations (loop_vec_i
    for it.  The different analyses will record information in the
    loop_vec_info struct.  */
 static bool
-vect_analyze_loop_2 (loop_vec_info loop_vinfo)
+vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal)
 {
   bool ok;
   int max_vf = MAX_VECTORIZATION_FACTOR;
   int min_vf = 2;
   unsigned int n_stmts = 0;
 
+  /* The first group of checks is independent of the vector size.  */
+  fatal = true;
+
   /* Find all data references in the loop (which correspond to vdefs/vuses)
      and analyze their evolution in the loop.  */
 
@@ -1795,7 +1798,6 @@ vect_analyze_loop_2 (loop_vec_info loop_
 
   /* Classify all cross-iteration scalar data-flow cycles.
      Cross-iteration cycles caused by virtual phis are analyzed separately.  */
-
   vect_analyze_scalar_cycles (loop_vinfo);
 
   vect_pattern_recog (loop_vinfo);
@@ -1825,6 +1827,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
       return false;
     }
 
+  /* While the rest of the analysis below depends on it in some way.  */
+  fatal = false;
+
   /* Analyze data dependences between the data-refs in the loop
      and adjust the maximum vectorization factor according to
      the dependences.
@@ -2118,7 +2169,8 @@ vect_analyze_loop (struct loop *loop)
           return NULL;
         }
 
-      if (vect_analyze_loop_2 (loop_vinfo))
+      bool fatal = false;
+      if (vect_analyze_loop_2 (loop_vinfo, fatal))
         {
           LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
 
@@ -2128,7 +2180,8 @@ vect_analyze_loop (struct loop *loop)
 
       destroy_loop_vec_info (loop_vinfo, true);
 
       vector_sizes &= ~current_vector_size;
-      if (vector_sizes == 0
+      if (fatal
+          || vector_sizes == 0
           || current_vector_size == 0)
         return NULL;
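
For readers who want to see the control flow outside of GCC, here is a minimal standalone sketch of the retry loop with the new fatal flag. It is not the GCC code: analyze_once and analyze_all_sizes are hypothetical stand-ins for vect_analyze_loop_2 and vect_analyze_loop, the stop_on_fatal switch exists only to compare behaviour before and after the patch, and trying the largest remaining size first is an assumption of the sketch. The printf stands in for a dump message emitted by a size-independent check, which is the kind of message the scan-tree-dump-times tests count.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for vect_analyze_loop_2.  The first group of checks
// does not depend on the vector size; the remaining analysis does.
static bool analyze_once(bool early_checks_pass, unsigned vector_size,
                         unsigned supported_size, bool &fatal)
{
    // A failure here would repeat for every vector size, so it is fatal.
    fatal = true;
    if (!early_checks_pass) {
        std::printf("early check failed (tried size %u)\n", vector_size);
        return false;
    }
    // From here on a failure may be specific to this vector size.
    fatal = false;
    return vector_size == supported_size;
}

// Hypothetical stand-in for the loop in vect_analyze_loop.  vector_sizes is a
// bitmask of candidate sizes.  Returns the number of analysis attempts and
// stores the size that succeeded (or 0) in chosen.
static int analyze_all_sizes(unsigned vector_sizes, bool early_checks_pass,
                             unsigned supported_size, bool stop_on_fatal,
                             unsigned &chosen)
{
    int attempts = 0;
    chosen = 0;
    while (vector_sizes != 0) {
        // Assumption of this sketch: take the highest set bit first.
        unsigned current = 1u;
        for (unsigned rest = vector_sizes >> 1; rest != 0; rest >>= 1)
            current <<= 1;
        assert((vector_sizes & current) != 0);

        ++attempts;
        bool fatal = false;
        if (analyze_once(early_checks_pass, current, supported_size, fatal)) {
            chosen = current;
            return attempts;
        }
        vector_sizes &= ~current;
        // With the patch, a fatal failure ends the search immediately.
        if (stop_on_fatal && fatal)
            break;
    }
    return attempts;
}
```

With two candidate sizes (16 | 8) and a failing early check, analyze_all_sizes makes two attempts when stop_on_fatal is false and one when it is true, so the stand-in message is printed twice versus once. That would be consistent with the "2" expectation for vect_multiple_sizes targets no longer being met, as suggested in the reply, though the sketch does not prove that this is what happens on AArch64.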