Re: [PATCH] Fix PR68707
On 10 January 2016 at 13:15, Thomas Schwingewrote: > Hi! > > On Fri, 8 Jan 2016 11:30:16 +, Alan Lawrence > wrote: >> Here's an alternative patch, [...] > >> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use >> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ > > For all these, you're missing to provide the "suffix" in the > 'scan-tree-dump "note: [...]' directives, which results in: > > UNRESOLVED: gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects > scan-tree-dump target { vect_perm && vect_load_lanes } "note: Built SLP > cancelled: can use load/store-lanes" > > The target selector doesn't trigger in my x86_64 GNU/Linux testing, but > to get rid of the UNRESOLVEDs, in r232197 I fixed at least the syntax as > follows: > Not sure if I need to mention that these updated tests fail on armeb; I guess that falls into the known problems with vectorization on armeb. Christophe. > commit 880ed4be84bfc9cec83d7c9718fd4f87a9ca8f39 > Author: tschwinge > Date: Sun Jan 10 12:12:38 2016 + > > Fix scan-tree-dump syntax > > gcc/testsuite/ > * gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax. > * gcc.dg/vect/slp-perm-2.c: Likewise. > * gcc.dg/vect/slp-perm-3.c: Likewise. > * gcc.dg/vect/slp-perm-5.c: Likewise. > * gcc.dg/vect/slp-perm-6.c: Likewise. > * gcc.dg/vect/slp-perm-7.c: Likewise. > * gcc.dg/vect/slp-perm-8.c: Likewise. > > git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232197 > 138bc75d-0d04-0410-961f-82ee72b054a4 > --- > gcc/testsuite/ChangeLog| 10 ++ > gcc/testsuite/gcc.dg/vect/slp-perm-1.c |2 +- > gcc/testsuite/gcc.dg/vect/slp-perm-2.c |2 +- > gcc/testsuite/gcc.dg/vect/slp-perm-3.c |2 +- > gcc/testsuite/gcc.dg/vect/slp-perm-5.c |2 +- > gcc/testsuite/gcc.dg/vect/slp-perm-6.c |2 +- > gcc/testsuite/gcc.dg/vect/slp-perm-7.c |2 +- > gcc/testsuite/gcc.dg/vect/slp-perm-8.c |2 +- > 8 files changed, 17 insertions(+), 7 deletions(-) > > diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog > index 198f26c..3005c9f 100644 > --- gcc/testsuite/ChangeLog > +++ gcc/testsuite/ChangeLog > @@ -1,3 +1,13 @@ > +2016-01-10 Thomas Schwinge > + > + * gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax. > + * gcc.dg/vect/slp-perm-2.c: Likewise. > + * gcc.dg/vect/slp-perm-3.c: Likewise. > + * gcc.dg/vect/slp-perm-5.c: Likewise. > + * gcc.dg/vect/slp-perm-6.c: Likewise. > + * gcc.dg/vect/slp-perm-7.c: Likewise. > + * gcc.dg/vect/slp-perm-8.c: Likewise. > + > 2016-01-10 Tom de Vries > > PR tree-optimization/69039 > diff --git gcc/testsuite/gcc.dg/vect/slp-perm-1.c > gcc/testsuite/gcc.dg/vect/slp-perm-1.c > index 6d0d66f..ee211f2 100644 > --- gcc/testsuite/gcc.dg/vect/slp-perm-1.c > +++ gcc/testsuite/gcc.dg/vect/slp-perm-1.c > @@ -60,7 +60,7 @@ int main (int argc, const char* argv[]) > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > vect_perm } } } */ > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target { vect_perm && {! vect_load_lanes } } } } } */ > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" > { target vect_load_lanes } } } */ > -/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use > load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ > +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use > load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */ > /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes > } } } */ > /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes > } } } */ > > diff --git gcc/testsuite/gcc.dg/vect/slp-perm-2.c > gcc/testsuite/gcc.dg/vect/slp-perm-2.c > index 34b413d..4bab348 100644 > --- gcc/testsuite/gcc.dg/vect/slp-perm-2.c > +++ gcc/testsuite/gcc.dg/vect/slp-perm-2.c > @@ -56,6 +56,6 @@ int main (int argc, const char* argv[]) > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > vect_perm } } } */ > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target { vect_perm && {! vect_load_lanes } } } } } */ > /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" > { target vect_load_lanes } } } */ > -/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use > load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ > +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use > load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */ > /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes > } } } */ > /* { dg-final { scan-tree-dump
Re: [PATCH] Fix PR68707
Hi! On Fri, 8 Jan 2016 11:30:16 +, Alan Lawrencewrote: > Here's an alternative patch, [...] > +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use > load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ For all these, you're missing to provide the "suffix" in the 'scan-tree-dump "note: [...]' directives, which results in: UNRESOLVED: gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects scan-tree-dump target { vect_perm && vect_load_lanes } "note: Built SLP cancelled: can use load/store-lanes" The target selector doesn't trigger in my x86_64 GNU/Linux testing, but to get rid of the UNRESOLVEDs, in r232197 I fixed at least the syntax as follows: commit 880ed4be84bfc9cec83d7c9718fd4f87a9ca8f39 Author: tschwinge Date: Sun Jan 10 12:12:38 2016 + Fix scan-tree-dump syntax gcc/testsuite/ * gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax. * gcc.dg/vect/slp-perm-2.c: Likewise. * gcc.dg/vect/slp-perm-3.c: Likewise. * gcc.dg/vect/slp-perm-5.c: Likewise. * gcc.dg/vect/slp-perm-6.c: Likewise. * gcc.dg/vect/slp-perm-7.c: Likewise. * gcc.dg/vect/slp-perm-8.c: Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232197 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/testsuite/ChangeLog| 10 ++ gcc/testsuite/gcc.dg/vect/slp-perm-1.c |2 +- gcc/testsuite/gcc.dg/vect/slp-perm-2.c |2 +- gcc/testsuite/gcc.dg/vect/slp-perm-3.c |2 +- gcc/testsuite/gcc.dg/vect/slp-perm-5.c |2 +- gcc/testsuite/gcc.dg/vect/slp-perm-6.c |2 +- gcc/testsuite/gcc.dg/vect/slp-perm-7.c |2 +- gcc/testsuite/gcc.dg/vect/slp-perm-8.c |2 +- 8 files changed, 17 insertions(+), 7 deletions(-) diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog index 198f26c..3005c9f 100644 --- gcc/testsuite/ChangeLog +++ gcc/testsuite/ChangeLog @@ -1,3 +1,13 @@ +2016-01-10 Thomas Schwinge + + * gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax. + * gcc.dg/vect/slp-perm-2.c: Likewise. + * gcc.dg/vect/slp-perm-3.c: Likewise. + * gcc.dg/vect/slp-perm-5.c: Likewise. + * gcc.dg/vect/slp-perm-6.c: Likewise. + * gcc.dg/vect/slp-perm-7.c: Likewise. + * gcc.dg/vect/slp-perm-8.c: Likewise. + 2016-01-10 Tom de Vries PR tree-optimization/69039 diff --git gcc/testsuite/gcc.dg/vect/slp-perm-1.c gcc/testsuite/gcc.dg/vect/slp-perm-1.c index 6d0d66f..ee211f2 100644 --- gcc/testsuite/gcc.dg/vect/slp-perm-1.c +++ gcc/testsuite/gcc.dg/vect/slp-perm-1.c @@ -60,7 +60,7 @@ int main (int argc, const char* argv[]) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ -/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git gcc/testsuite/gcc.dg/vect/slp-perm-2.c gcc/testsuite/gcc.dg/vect/slp-perm-2.c index 34b413d..4bab348 100644 --- gcc/testsuite/gcc.dg/vect/slp-perm-2.c +++ gcc/testsuite/gcc.dg/vect/slp-perm-2.c @@ -56,6 +56,6 @@ int main (int argc, const char* argv[]) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm && {! vect_load_lanes } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */ -/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */ /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } } } */ /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } } } */ diff --git gcc/testsuite/gcc.dg/vect/slp-perm-3.c gcc/testsuite/gcc.dg/vect/slp-perm-3.c index ec13e0f..568e400 100644 --- gcc/testsuite/gcc.dg/vect/slp-perm-3.c +++ gcc/testsuite/gcc.dg/vect/slp-perm-3.c @@ -69,7 +69,7 @@ int main (int argc, const char* argv[]) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_perm } } } */ /* {
Re: [PATCH] Fix PR68707
On Fri, 8 Jan 2016, Alan Lawrence wrote: > Here's an alternative patch, using the hunk from > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707#c16, which 'improves' (heh, > we think) on the previous by allowing SLP to proceed where the loads are > strided, for example, slp-perm-11.c. Also I fix the testsuite failures > (looking > for SLP where we've now decided to do load/store-lanes instead). I used simple > scan-tree-dump (rather than -times) because load_lanes typically appears many > times in the log for each instance output, and it felt fragile to look for a > specific number. > > Note, this doesn't fix PR67323 - that's an example where unpermuted SLP is > still (a bit) worse than load-lanes, as it needs unrolling * 2. The > previously-posted patch doesn't fix this either, however > (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01620.html). > > Bootstrapped + check-gcc + check-g++ on x86_64 (no changes), arm and aarch64 > (various new scan-tree-dump tests all passing, and PASS->FAIL for > O3-pr36098.c). Looks good to me. Ok if arm maintainers agree. Thanks, Richard. > gcc/ChangeLog: > > 2016-01-XX Alan Lawrence> Richard Biener > > * tree-vect-slp.c (vect_analyze_slp_instance): Cancel permuted SLP > instances that can be handled via vect_load_lanes. > > gcc/testsuite/ChangeLog: > > * lib/target-supports.exp (check_effective_target_vect_load_lanes): New. > * gcc.dg/vect/slp-perm-1.c: Look for vect_load_lanes instead of SLP > on platforms supporting it. > * gcc.dg/vect/slp-perm-2.c: Likewise. > * gcc.dg/vect/slp-perm-3.c: Likewise. > * gcc.dg/vect/slp-perm-5.c: Likewise. > * gcc.dg/vect/slp-perm-7.c: Likewise. > * gcc.dg/vect/slp-perm-8.c: Likewise. > * gcc.dg/vect/slp-perm-6.c: Look for vect_load_lanes in addition to SLP > on platforms supporting it. > --- > gcc/testsuite/gcc.dg/vect/slp-perm-1.c | 6 +- > gcc/testsuite/gcc.dg/vect/slp-perm-2.c | 7 +-- > gcc/testsuite/gcc.dg/vect/slp-perm-3.c | 7 +-- > gcc/testsuite/gcc.dg/vect/slp-perm-5.c | 7 +-- > gcc/testsuite/gcc.dg/vect/slp-perm-6.c | 7 +-- > gcc/testsuite/gcc.dg/vect/slp-perm-7.c | 8 +--- > gcc/testsuite/gcc.dg/vect/slp-perm-8.c | 7 +-- > gcc/testsuite/lib/target-supports.exp | 19 +++ > gcc/tree-vect-slp.c| 30 ++ > 9 files changed, 84 insertions(+), 14 deletions(-) > > diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c > b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c > index 2b6c134..6d0d66f 100644 > --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c > +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c > @@ -58,5 +58,9 @@ int main (int argc, const char* argv[]) > } > > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > vect_perm } } } */ > -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target vect_perm } } } */ > +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target { vect_perm && {! vect_load_lanes } } } } } */ > +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" > { target vect_load_lanes } } } */ > +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use > load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ > +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes > } } } */ > +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes > } } } */ > > diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c > b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c > index da50e16..34b413d 100644 > --- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c > +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c > @@ -54,5 +54,8 @@ int main (int argc, const char* argv[]) > } > > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target > vect_perm } } } */ > -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target vect_perm } } } */ > - > +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" > { target { vect_perm && {! vect_load_lanes } } } } } */ > +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" > { target vect_load_lanes } } } */ > +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use > load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ > +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes > } } } */ > +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes > } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c > b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c > index 1d33f9b..ec13e0f 100644 > --- a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c > +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c > @@ -67,6 +67,9 @@ int main (int argc, const char* argv[]) > } > > /* {
Re: [PATCH] Fix PR68707
On 08/01/16 11:45, Richard Biener wrote: > On Fri, 8 Jan 2016, Alan Lawrence wrote: > >> Here's an alternative patch, using the hunk from >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707#c16, which 'improves' >> (heh, >> we think) on the previous by allowing SLP to proceed where the loads are >> strided, for example, slp-perm-11.c. Also I fix the testsuite failures >> (looking >> for SLP where we've now decided to do load/store-lanes instead). I used >> simple >> scan-tree-dump (rather than -times) because load_lanes typically appears many >> times in the log for each instance output, and it felt fragile to look for a >> specific number. >> >> Note, this doesn't fix PR67323 - that's an example where unpermuted SLP is >> still (a bit) worse than load-lanes, as it needs unrolling * 2. The >> previously-posted patch doesn't fix this either, however >> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01620.html). >> >> Bootstrapped + check-gcc + check-g++ on x86_64 (no changes), arm and aarch64 >> (various new scan-tree-dump tests all passing, and PASS->FAIL for >> O3-pr36098.c). > > Looks good to me. Ok if arm maintainers agree. > OK by me. R. > Thanks, > Richard. > >> gcc/ChangeLog: >> >> 2016-01-XX Alan Lawrence>> Richard Biener >> >> * tree-vect-slp.c (vect_analyze_slp_instance): Cancel permuted SLP >> instances that can be handled via vect_load_lanes. >> >> gcc/testsuite/ChangeLog: >> >> * lib/target-supports.exp (check_effective_target_vect_load_lanes): New. >> * gcc.dg/vect/slp-perm-1.c: Look for vect_load_lanes instead of SLP >> on platforms supporting it. >> * gcc.dg/vect/slp-perm-2.c: Likewise. >> * gcc.dg/vect/slp-perm-3.c: Likewise. >> * gcc.dg/vect/slp-perm-5.c: Likewise. >> * gcc.dg/vect/slp-perm-7.c: Likewise. >> * gcc.dg/vect/slp-perm-8.c: Likewise. >> * gcc.dg/vect/slp-perm-6.c: Look for vect_load_lanes in addition to SLP >> on platforms supporting it. >> --- >> gcc/testsuite/gcc.dg/vect/slp-perm-1.c | 6 +- >> gcc/testsuite/gcc.dg/vect/slp-perm-2.c | 7 +-- >> gcc/testsuite/gcc.dg/vect/slp-perm-3.c | 7 +-- >> gcc/testsuite/gcc.dg/vect/slp-perm-5.c | 7 +-- >> gcc/testsuite/gcc.dg/vect/slp-perm-6.c | 7 +-- >> gcc/testsuite/gcc.dg/vect/slp-perm-7.c | 8 +--- >> gcc/testsuite/gcc.dg/vect/slp-perm-8.c | 7 +-- >> gcc/testsuite/lib/target-supports.exp | 19 +++ >> gcc/tree-vect-slp.c| 30 ++ >> 9 files changed, 84 insertions(+), 14 deletions(-) >> >> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c >> b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c >> index 2b6c134..6d0d66f 100644 >> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c >> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c >> @@ -58,5 +58,9 @@ int main (int argc, const char* argv[]) >> } >> >> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { >> target vect_perm } } } */ >> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" >> { target vect_perm } } } */ >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" >> { target { vect_perm && {! vect_load_lanes } } } } } */ >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" >> { target vect_load_lanes } } } */ >> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use >> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ >> +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes >> } } } */ >> +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target >> vect_load_lanes } } } */ >> >> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c >> b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c >> index da50e16..34b413d 100644 >> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c >> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c >> @@ -54,5 +54,8 @@ int main (int argc, const char* argv[]) >> } >> >> /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { >> target vect_perm } } } */ >> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" >> { target vect_perm } } } */ >> - >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" >> { target { vect_perm && {! vect_load_lanes } } } } } */ >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" >> { target vect_load_lanes } } } */ >> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use >> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */ >> +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes >> } } } */ >> +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target >> vect_load_lanes } } } */ >> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c >> b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c >> index 1d33f9b..ec13e0f 100644 >> ---
Re: [PATCH] Fix PR68707, 67323
On 16/12/15 15:01, Richard Biener wrote: The following patch adds a heuristic to prefer store/load-lanes over SLP when vectorizing. Compared to the variant attached to the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching what you've tested). Not sure I follow this. Compared to the variant attached to the PR - we will now attempt to use load-lanes, if (say) all of the loads are strided, even if we know we don't support load-lanes (for any of them). That sounds the wrong way around and I think rather different to what you proposed earlier? (At the least, the debug message "can use load/store lanes" is potentially misleading, that's not necessarily the case!) There are arguments that we want to do less SLP, generally, on ARM/AArch64 but I think Wilco's permute cost patch https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01469.html is a better way of achieving that? Just my gut feeling at this point - I haven't evaluated this version of the patch on any benchmarks etc... Thanks, Alan
Re: [PATCH] Fix PR68707, 67323
On 17/12/15 10:46, Richard Biener wrote: On Thu, 17 Dec 2015, Alan Lawrence wrote: On 16/12/15 15:01, Richard Biener wrote: The following patch adds a heuristic to prefer store/load-lanes over SLP when vectorizing. Compared to the variant attached to the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching what you've tested). Not sure I follow this. Compared to the variant attached to the PR - we will now attempt to use load-lanes, if (say) all of the loads are strided, even if we know we don't support load-lanes (for any of them). That sounds the wrong way around and I think rather different to what you proposed earlier? (At the least, the debug message "can use load/store lanes" is potentially misleading, that's not necessarily the case!) Ah, indeed. Note that the whole thing is still guarded by the check that we can use store-lanes for the store. I can also do it the other way around (as previously proposed) which would change outcome for slp-perm-11.c. That proposal would not reject the SLP if there were any strided grouped loads involved. Indeed; the STMT_VINFO_STRIDED_P || !vect_load_lanes_supported approach (as on PR68707) vectorizes slp-perm-11.c with SLP, which works much better than the !STMT_VINFO_STRIDED_P && !vect_load_lanes_supported, which tries to use st2 (and only sort-of works - you get an st2 output, but no ld2, and lots of faff). I think I move for the patch from PR68707, therefore. (Ramana - any thoughts?) Btw, another option is to push the decision past full SLP analysis and thus make the decision globally for all SLP instances - currently SLP instances are cancelled one a one-by-one basis meaning we might do SLP plus load/store-lanes in the same loop. I don't see anything inherently wrong with doing both in the same loop. On simple loops, I suspect we'll do better committing to one strategy or the other (tho really it's only the VF required I think?), but then, on such simple loops, there are probably not very many SLP instances! Maybe we have to go all the way to implementing a better vectorization cost hook just for the permutations - the SLP path in theory knows exactly which ones it will generate. Yes, I think this sounds like a good plan for GCC 7. It doesn't require constructing an entire stmt (if you are concerned about the cost of that), and on most targets, probably integrates fairly easily with the expand_vec_perm_const hooks. --Alan
Re: [PATCH] Fix PR68707, 67323
On Thu, 17 Dec 2015, Alan Lawrence wrote: > On 16/12/15 15:01, Richard Biener wrote: > > > > The following patch adds a heuristic to prefer store/load-lanes > > over SLP when vectorizing. Compared to the variant attached to > > the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching > > what you've tested). > > Not sure I follow this. Compared to the variant attached to the PR - we will > now attempt to use load-lanes, if (say) all of the loads are strided, even if > we know we don't support load-lanes (for any of them). That sounds the wrong > way around and I think rather different to what you proposed earlier? (At the > least, the debug message "can use load/store lanes" is potentially misleading, > that's not necessarily the case!) Ah, indeed. Note that the whole thing is still guarded by the check that we can use store-lanes for the store. I can also do it the other way around (as previously proposed) which would change outcome for slp-perm-11.c. That proposal would not reject the SLP if there were any strided grouped loads involved. > There are arguments that we want to do less SLP, generally, on ARM/AArch64 but > I think Wilco's permute cost patch > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01469.html is a better way of > achieving that? Maybe, but it's also a heuristic. At least if we _statically_ fail to SLP due to cost issues then we re-try with interleaving. > Just my gut feeling at this point - I haven't evaluated this version of the > patch on any benchmarks etc... Btw, another option is to push the decision past full SLP analysis and thus make the decision globally for all SLP instances - currently SLP instances are cancelled one a one-by-one basis meaning we might do SLP plus load/store-lanes in the same loop. Maybe we have to go all the way to implementing a better vectorization cost hook just for the permutations - the SLP path in theory knows exactly which ones it will generate. Richard.