Re: [PATCH] Fix PR68707

2016-01-11 Thread Christophe Lyon
On 10 January 2016 at 13:15, Thomas Schwinge  wrote:
> Hi!
>
> On Fri,  8 Jan 2016 11:30:16 +, Alan Lawrence  
> wrote:
>> Here's an alternative patch, [...]
>
>> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
>> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
>
> For all these, you're missing to provide the "suffix" in the
> 'scan-tree-dump "note: [...]' directives, which results in:
>
> UNRESOLVED: gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects  
> scan-tree-dump  target { vect_perm && vect_load_lanes }  "note: Built SLP 
> cancelled: can use load/store-lanes"
>
> The target selector doesn't trigger in my x86_64 GNU/Linux testing, but
> to get rid of the UNRESOLVEDs, in r232197 I fixed at least the syntax as
> follows:
>

Not sure if I need to mention that these updated tests fail on armeb;
I guess that falls into the known problems with vectorization on
armeb.

Christophe.

> commit 880ed4be84bfc9cec83d7c9718fd4f87a9ca8f39
> Author: tschwinge 
> Date:   Sun Jan 10 12:12:38 2016 +
>
> Fix scan-tree-dump syntax
>
> gcc/testsuite/
> * gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax.
> * gcc.dg/vect/slp-perm-2.c: Likewise.
> * gcc.dg/vect/slp-perm-3.c: Likewise.
> * gcc.dg/vect/slp-perm-5.c: Likewise.
> * gcc.dg/vect/slp-perm-6.c: Likewise.
> * gcc.dg/vect/slp-perm-7.c: Likewise.
> * gcc.dg/vect/slp-perm-8.c: Likewise.
>
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232197 
> 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/testsuite/ChangeLog|   10 ++
>  gcc/testsuite/gcc.dg/vect/slp-perm-1.c |2 +-
>  gcc/testsuite/gcc.dg/vect/slp-perm-2.c |2 +-
>  gcc/testsuite/gcc.dg/vect/slp-perm-3.c |2 +-
>  gcc/testsuite/gcc.dg/vect/slp-perm-5.c |2 +-
>  gcc/testsuite/gcc.dg/vect/slp-perm-6.c |2 +-
>  gcc/testsuite/gcc.dg/vect/slp-perm-7.c |2 +-
>  gcc/testsuite/gcc.dg/vect/slp-perm-8.c |2 +-
>  8 files changed, 17 insertions(+), 7 deletions(-)
>
> diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
> index 198f26c..3005c9f 100644
> --- gcc/testsuite/ChangeLog
> +++ gcc/testsuite/ChangeLog
> @@ -1,3 +1,13 @@
> +2016-01-10  Thomas Schwinge  
> +
> +   * gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax.
> +   * gcc.dg/vect/slp-perm-2.c: Likewise.
> +   * gcc.dg/vect/slp-perm-3.c: Likewise.
> +   * gcc.dg/vect/slp-perm-5.c: Likewise.
> +   * gcc.dg/vect/slp-perm-6.c: Likewise.
> +   * gcc.dg/vect/slp-perm-7.c: Likewise.
> +   * gcc.dg/vect/slp-perm-8.c: Likewise.
> +
>  2016-01-10  Tom de Vries  
>
> PR tree-optimization/69039
> diff --git gcc/testsuite/gcc.dg/vect/slp-perm-1.c 
> gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> index 6d0d66f..ee211f2 100644
> --- gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> +++ gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> @@ -60,7 +60,7 @@ int main (int argc, const char* argv[])
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_perm } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { vect_perm && {! vect_load_lanes } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target vect_load_lanes } } } */
> -/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */
>  /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes 
> } } } */
>  /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes 
> } } } */
>
> diff --git gcc/testsuite/gcc.dg/vect/slp-perm-2.c 
> gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> index 34b413d..4bab348 100644
> --- gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> +++ gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> @@ -56,6 +56,6 @@ int main (int argc, const char* argv[])
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_perm } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { vect_perm && {! vect_load_lanes } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target vect_load_lanes } } } */
> -/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */
>  /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes 
> } } } */
>  /* { dg-final { scan-tree-dump 

Re: [PATCH] Fix PR68707

2016-01-10 Thread Thomas Schwinge
Hi!

On Fri,  8 Jan 2016 11:30:16 +, Alan Lawrence  wrote:
> Here's an alternative patch, [...]

> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */

For all these, you're missing to provide the "suffix" in the
'scan-tree-dump "note: [...]' directives, which results in:

UNRESOLVED: gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects  
scan-tree-dump  target { vect_perm && vect_load_lanes }  "note: Built SLP 
cancelled: can use load/store-lanes"

The target selector doesn't trigger in my x86_64 GNU/Linux testing, but
to get rid of the UNRESOLVEDs, in r232197 I fixed at least the syntax as
follows:

commit 880ed4be84bfc9cec83d7c9718fd4f87a9ca8f39
Author: tschwinge 
Date:   Sun Jan 10 12:12:38 2016 +

Fix scan-tree-dump syntax

gcc/testsuite/
* gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax.
* gcc.dg/vect/slp-perm-2.c: Likewise.
* gcc.dg/vect/slp-perm-3.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-perm-8.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232197 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog|   10 ++
 gcc/testsuite/gcc.dg/vect/slp-perm-1.c |2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-2.c |2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-3.c |2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-5.c |2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-6.c |2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-7.c |2 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-8.c |2 +-
 8 files changed, 17 insertions(+), 7 deletions(-)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 198f26c..3005c9f 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,13 @@
+2016-01-10  Thomas Schwinge  
+
+   * gcc.dg/vect/slp-perm-1.c: Fix scan-tree-dump syntax.
+   * gcc.dg/vect/slp-perm-2.c: Likewise.
+   * gcc.dg/vect/slp-perm-3.c: Likewise.
+   * gcc.dg/vect/slp-perm-5.c: Likewise.
+   * gcc.dg/vect/slp-perm-6.c: Likewise.
+   * gcc.dg/vect/slp-perm-7.c: Likewise.
+   * gcc.dg/vect/slp-perm-8.c: Likewise.
+
 2016-01-10  Tom de Vries  
 
PR tree-optimization/69039
diff --git gcc/testsuite/gcc.dg/vect/slp-perm-1.c 
gcc/testsuite/gcc.dg/vect/slp-perm-1.c
index 6d0d66f..ee211f2 100644
--- gcc/testsuite/gcc.dg/vect/slp-perm-1.c
+++ gcc/testsuite/gcc.dg/vect/slp-perm-1.c
@@ -60,7 +60,7 @@ int main (int argc, const char* argv[])
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
vect_perm } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { vect_perm && {! vect_load_lanes } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target vect_load_lanes } } } */
-/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } 
} } */
 /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } 
} } */
 
diff --git gcc/testsuite/gcc.dg/vect/slp-perm-2.c 
gcc/testsuite/gcc.dg/vect/slp-perm-2.c
index 34b413d..4bab348 100644
--- gcc/testsuite/gcc.dg/vect/slp-perm-2.c
+++ gcc/testsuite/gcc.dg/vect/slp-perm-2.c
@@ -56,6 +56,6 @@ int main (int argc, const char* argv[])
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
vect_perm } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { vect_perm && {! vect_load_lanes } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target vect_load_lanes } } } */
-/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
load/store-lanes" "vect" { target { vect_perm && vect_load_lanes } } } } */
 /* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes } 
} } */
 /* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes } 
} } */
diff --git gcc/testsuite/gcc.dg/vect/slp-perm-3.c 
gcc/testsuite/gcc.dg/vect/slp-perm-3.c
index ec13e0f..568e400 100644
--- gcc/testsuite/gcc.dg/vect/slp-perm-3.c
+++ gcc/testsuite/gcc.dg/vect/slp-perm-3.c
@@ -69,7 +69,7 @@ int main (int argc, const char* argv[])
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
vect_perm } } } */
 /* { 

Re: [PATCH] Fix PR68707

2016-01-08 Thread Richard Biener
On Fri, 8 Jan 2016, Alan Lawrence wrote:

> Here's an alternative patch, using the hunk from
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707#c16, which 'improves' (heh,
> we think) on the previous by allowing SLP to proceed where the loads are
> strided, for example, slp-perm-11.c. Also I fix the testsuite failures 
> (looking
> for SLP where we've now decided to do load/store-lanes instead). I used simple
> scan-tree-dump (rather than -times) because load_lanes typically appears many
> times in the log for each instance output, and it felt fragile to look for a
> specific number.
> 
> Note, this doesn't fix PR67323 - that's an example where unpermuted SLP is
> still (a bit) worse than load-lanes, as it needs unrolling * 2. The
> previously-posted patch doesn't fix this either, however
> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01620.html).
> 
> Bootstrapped + check-gcc + check-g++ on x86_64 (no changes), arm and aarch64
> (various new scan-tree-dump tests all passing, and PASS->FAIL for 
> O3-pr36098.c).

Looks good to me.  Ok if arm maintainers agree.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 2016-01-XX  Alan Lawrence  
>   Richard Biener  
> 
>   * tree-vect-slp.c (vect_analyze_slp_instance): Cancel permuted SLP
>   instances that can be handled via vect_load_lanes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp (check_effective_target_vect_load_lanes): New.
>   * gcc.dg/vect/slp-perm-1.c: Look for vect_load_lanes instead of SLP
>   on platforms supporting it.
>   * gcc.dg/vect/slp-perm-2.c: Likewise.
>   * gcc.dg/vect/slp-perm-3.c: Likewise.
>   * gcc.dg/vect/slp-perm-5.c: Likewise.
>   * gcc.dg/vect/slp-perm-7.c: Likewise.
>   * gcc.dg/vect/slp-perm-8.c: Likewise.
>   * gcc.dg/vect/slp-perm-6.c: Look for vect_load_lanes in addition to SLP
>   on platforms supporting it.
> ---
>  gcc/testsuite/gcc.dg/vect/slp-perm-1.c |  6 +-
>  gcc/testsuite/gcc.dg/vect/slp-perm-2.c |  7 +--
>  gcc/testsuite/gcc.dg/vect/slp-perm-3.c |  7 +--
>  gcc/testsuite/gcc.dg/vect/slp-perm-5.c |  7 +--
>  gcc/testsuite/gcc.dg/vect/slp-perm-6.c |  7 +--
>  gcc/testsuite/gcc.dg/vect/slp-perm-7.c |  8 +---
>  gcc/testsuite/gcc.dg/vect/slp-perm-8.c |  7 +--
>  gcc/testsuite/lib/target-supports.exp  | 19 +++
>  gcc/tree-vect-slp.c| 30 ++
>  9 files changed, 84 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c 
> b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> index 2b6c134..6d0d66f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> @@ -58,5 +58,9 @@ int main (int argc, const char* argv[])
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_perm } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target vect_perm } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { vect_perm && {! vect_load_lanes } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target vect_load_lanes } } } */
> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes 
> } } } */
> +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes 
> } } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c 
> b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> index da50e16..34b413d 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> @@ -54,5 +54,8 @@ int main (int argc, const char* argv[])
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_perm } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target vect_perm } } } */
> -
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { vect_perm && {! vect_load_lanes } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target vect_load_lanes } } } */
> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes 
> } } } */
> +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target vect_load_lanes 
> } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c 
> b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
> index 1d33f9b..ec13e0f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
> @@ -67,6 +67,9 @@ int main (int argc, const char* argv[])
>  }
>  
>  /* { 

Re: [PATCH] Fix PR68707

2016-01-08 Thread Richard Earnshaw (lists)
On 08/01/16 11:45, Richard Biener wrote:
> On Fri, 8 Jan 2016, Alan Lawrence wrote:
> 
>> Here's an alternative patch, using the hunk from
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68707#c16, which 'improves' 
>> (heh,
>> we think) on the previous by allowing SLP to proceed where the loads are
>> strided, for example, slp-perm-11.c. Also I fix the testsuite failures 
>> (looking
>> for SLP where we've now decided to do load/store-lanes instead). I used 
>> simple
>> scan-tree-dump (rather than -times) because load_lanes typically appears many
>> times in the log for each instance output, and it felt fragile to look for a
>> specific number.
>>
>> Note, this doesn't fix PR67323 - that's an example where unpermuted SLP is
>> still (a bit) worse than load-lanes, as it needs unrolling * 2. The
>> previously-posted patch doesn't fix this either, however
>> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01620.html).
>>
>> Bootstrapped + check-gcc + check-g++ on x86_64 (no changes), arm and aarch64
>> (various new scan-tree-dump tests all passing, and PASS->FAIL for 
>> O3-pr36098.c).
> 
> Looks good to me.  Ok if arm maintainers agree.
> 

OK by me.

R.

> Thanks,
> Richard.
> 
>> gcc/ChangeLog:
>>
>> 2016-01-XX  Alan Lawrence  
>>  Richard Biener  
>>
>>  * tree-vect-slp.c (vect_analyze_slp_instance): Cancel permuted SLP
>>  instances that can be handled via vect_load_lanes.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * lib/target-supports.exp (check_effective_target_vect_load_lanes): New.
>>  * gcc.dg/vect/slp-perm-1.c: Look for vect_load_lanes instead of SLP
>>  on platforms supporting it.
>>  * gcc.dg/vect/slp-perm-2.c: Likewise.
>>  * gcc.dg/vect/slp-perm-3.c: Likewise.
>>  * gcc.dg/vect/slp-perm-5.c: Likewise.
>>  * gcc.dg/vect/slp-perm-7.c: Likewise.
>>  * gcc.dg/vect/slp-perm-8.c: Likewise.
>>  * gcc.dg/vect/slp-perm-6.c: Look for vect_load_lanes in addition to SLP
>>  on platforms supporting it.
>> ---
>>  gcc/testsuite/gcc.dg/vect/slp-perm-1.c |  6 +-
>>  gcc/testsuite/gcc.dg/vect/slp-perm-2.c |  7 +--
>>  gcc/testsuite/gcc.dg/vect/slp-perm-3.c |  7 +--
>>  gcc/testsuite/gcc.dg/vect/slp-perm-5.c |  7 +--
>>  gcc/testsuite/gcc.dg/vect/slp-perm-6.c |  7 +--
>>  gcc/testsuite/gcc.dg/vect/slp-perm-7.c |  8 +---
>>  gcc/testsuite/gcc.dg/vect/slp-perm-8.c |  7 +--
>>  gcc/testsuite/lib/target-supports.exp  | 19 +++
>>  gcc/tree-vect-slp.c| 30 ++
>>  9 files changed, 84 insertions(+), 14 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c 
>> b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
>> index 2b6c134..6d0d66f 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
>> @@ -58,5 +58,9 @@ int main (int argc, const char* argv[])
>>  }
>>  
>>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { 
>> target vect_perm } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
>> { target vect_perm } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
>> { target { vect_perm && {! vect_load_lanes } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
>> { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
>> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes 
>> } } } */
>> +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target 
>> vect_load_lanes } } } */
>>  
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c 
>> b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
>> index da50e16..34b413d 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
>> @@ -54,5 +54,8 @@ int main (int argc, const char* argv[])
>>  }
>>  
>>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { 
>> target vect_perm } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
>> { target vect_perm } } } */
>> -
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
>> { target { vect_perm && {! vect_load_lanes } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
>> { target vect_load_lanes } } } */
>> +/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
>> load/store-lanes" { target { vect_perm && vect_load_lanes } } } } */
>> +/* { dg-final { scan-tree-dump "LOAD_LANES" "vect" { target vect_load_lanes 
>> } } } */
>> +/* { dg-final { scan-tree-dump "STORE_LANES" "vect" { target 
>> vect_load_lanes } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c 
>> b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
>> index 1d33f9b..ec13e0f 100644
>> --- 

Re: [PATCH] Fix PR68707, 67323

2015-12-17 Thread Alan Lawrence

On 16/12/15 15:01, Richard Biener wrote:


The following patch adds a heuristic to prefer store/load-lanes
over SLP when vectorizing.  Compared to the variant attached to
the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
what you've tested).


Not sure I follow this. Compared to the variant attached to the PR - we will now 
attempt to use load-lanes, if (say) all of the loads are strided, even if we 
know we don't support load-lanes (for any of them). That sounds the wrong way 
around and I think rather different to what you proposed earlier? (At the least, 
the debug message "can use load/store lanes" is potentially misleading, that's 
not necessarily the case!)


There are arguments that we want to do less SLP, generally, on ARM/AArch64 but I 
think Wilco's permute cost patch 
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01469.html is a better way of 
achieving that?


Just my gut feeling at this point - I haven't evaluated this version of the 
patch on any benchmarks etc...


Thanks, Alan



Re: [PATCH] Fix PR68707, 67323

2015-12-17 Thread Alan Lawrence

On 17/12/15 10:46, Richard Biener wrote:

On Thu, 17 Dec 2015, Alan Lawrence wrote:


On 16/12/15 15:01, Richard Biener wrote:


The following patch adds a heuristic to prefer store/load-lanes
over SLP when vectorizing.  Compared to the variant attached to
the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
what you've tested).


Not sure I follow this. Compared to the variant attached to the PR - we will
now attempt to use load-lanes, if (say) all of the loads are strided, even if
we know we don't support load-lanes (for any of them). That sounds the wrong
way around and I think rather different to what you proposed earlier? (At the
least, the debug message "can use load/store lanes" is potentially misleading,
that's not necessarily the case!)


Ah, indeed.  Note that the whole thing is still guarded by the check
that we can use store-lanes for the store.

I can also do it the other way around (as previously proposed) which
would change outcome for slp-perm-11.c.  That proposal would not reject
the SLP if there were any strided grouped loads involved.


Indeed; the STMT_VINFO_STRIDED_P || !vect_load_lanes_supported approach (as on 
PR68707) vectorizes slp-perm-11.c with SLP, which works much better than the 
!STMT_VINFO_STRIDED_P && !vect_load_lanes_supported, which tries to use st2 (and 
only sort-of works - you get an st2 output, but no ld2, and lots of faff).


I think I move for the patch from PR68707, therefore. (Ramana - any thoughts?)


Btw, another option is to push the decision past full SLP analysis
and thus make the decision globally for all SLP instances - currently
SLP instances are cancelled one a one-by-one basis meaning we might
do SLP plus load/store-lanes in the same loop.


I don't see anything inherently wrong with doing both in the same loop. On 
simple loops, I suspect we'll do better committing to one strategy or the other 
(tho really it's only the VF required I think?), but then, on such simple loops, 
there are probably not very many SLP instances!



Maybe we have to go all the way to implementing a better vectorization
cost hook just for the permutations - the SLP path in theory knows
exactly which ones it will generate.


Yes, I think this sounds like a good plan for GCC 7. It doesn't require 
constructing an entire stmt (if you are concerned about the cost of that), and 
on most targets, probably integrates fairly easily with the 
expand_vec_perm_const hooks.


--Alan


Re: [PATCH] Fix PR68707, 67323

2015-12-17 Thread Richard Biener
On Thu, 17 Dec 2015, Alan Lawrence wrote:

> On 16/12/15 15:01, Richard Biener wrote:
> > 
> > The following patch adds a heuristic to prefer store/load-lanes
> > over SLP when vectorizing.  Compared to the variant attached to
> > the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
> > what you've tested).
> 
> Not sure I follow this. Compared to the variant attached to the PR - we will
> now attempt to use load-lanes, if (say) all of the loads are strided, even if
> we know we don't support load-lanes (for any of them). That sounds the wrong
> way around and I think rather different to what you proposed earlier? (At the
> least, the debug message "can use load/store lanes" is potentially misleading,
> that's not necessarily the case!)

Ah, indeed.  Note that the whole thing is still guarded by the check
that we can use store-lanes for the store.

I can also do it the other way around (as previously proposed) which
would change outcome for slp-perm-11.c.  That proposal would not reject
the SLP if there were any strided grouped loads involved.

> There are arguments that we want to do less SLP, generally, on ARM/AArch64 but
> I think Wilco's permute cost patch
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01469.html is a better way of
> achieving that?

Maybe, but it's also a heuristic.  At least if we _statically_ fail to 
SLP due to cost issues then we re-try with interleaving.

> Just my gut feeling at this point - I haven't evaluated this version of the
> patch on any benchmarks etc...

Btw, another option is to push the decision past full SLP analysis
and thus make the decision globally for all SLP instances - currently
SLP instances are cancelled one a one-by-one basis meaning we might
do SLP plus load/store-lanes in the same loop.

Maybe we have to go all the way to implementing a better vectorization
cost hook just for the permutations - the SLP path in theory knows
exactly which ones it will generate.

Richard.