[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-24 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

amker at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from amker at gcc dot gnu.org ---
Fixed, I think.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-16 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #16 from amker at gcc dot gnu.org ---
Author: amker
Date: Tue Aug 16 13:09:40 2016
New Revision: 239502

URL: https://gcc.gnu.org/viewcvs?rev=239502&root=gcc&view=rev
Log:
PR tree-optimization/69848
* config/aarch64/aarch64-simd.md (vcond): Invert NE
and switch operands to avoid additional NOT instruction.
(vcond): Ditto.
(vcondu, vcondu): Ditto.

gcc/testsuite
* gcc.target/aarch64/simd/vcond-ne-bit.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/aarch64/simd/vcond-ne-bit.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64-simd.md
trunk/gcc/testsuite/ChangeLog
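
The identity behind "invert NE and switch operands" can be sketched on a single lane. This is only a scalar model under stated assumptions: the helper names (lane_cmp_eq, lane_bit_select, select_ne_with_not, select_ne_swapped) are hypothetical and are not the patterns in aarch64-simd.md. It shows that (a != b ? x : y) equals (a == b ? y : x), so the mask inversion can be dropped.

```c
#include <stdint.h>

/* All-ones when the lanes are equal, all-zeros otherwise
   (a scalar stand-in for a vector compare-equal).  */
static uint32_t lane_cmp_eq(int32_t a, int32_t b)
{
  return a == b ? UINT32_MAX : 0u;
}

/* Bitwise select: bits of x where mask is set, bits of y elsewhere.  */
static int32_t lane_bit_select(uint32_t mask, int32_t x, int32_t y)
{
  return (int32_t) (((uint32_t) x & mask) | ((uint32_t) y & ~mask));
}

/* a != b ? x : y, computed as compare-equal, then NOT, then select.  */
int32_t select_ne_with_not(int32_t a, int32_t b, int32_t x, int32_t y)
{
  uint32_t ne_mask = ~lane_cmp_eq(a, b);
  return lane_bit_select(ne_mask, x, y);
}

/* Same result from the compare-equal mask with x and y swapped;
   no NOT is needed.  */
int32_t select_ne_swapped(int32_t a, int32_t b, int32_t x, int32_t y)
{
  return lane_bit_select(lane_cmp_eq(a, b), y, x);
}
```

Both functions return x when a and b differ and y when they are equal; the second one simply does it with one operation fewer.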

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-12 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #15 from amker at gcc dot gnu.org ---
Author: amker
Date: Fri Aug 12 14:58:20 2016
New Revision: 239416

URL: https://gcc.gnu.org/viewcvs?rev=239416&root=gcc&view=rev
Log:
PR tree-optimization/69848
* tree-vectorizer.h (enum vect_def_type): New condition reduction
type CONST_COND_REDUCTION.
* tree-vect-loop.c (vectorizable_reduction): Support new condition
reduction type CONST_COND_REDUCTION.

gcc/testsuite
PR tree-optimization/69848
* gcc.dg/vect/vect-pr69848.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/vect/vect-pr69848.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-loop.c
trunk/gcc/tree-vectorizer.h
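
A rough sketch of why a condition reduction that assigns a constant is cheaper than the general case. It assumes the loop shape "if (a[i] == 0) c = 0" with c starting at 1, and models four lanes with a plain array; the function names and the lane count are hypothetical, and this is not a description of what vectorizable_reduction actually emits. Because the assigned constant is smaller than the initial value, a MIN across lanes recovers the scalar result without tracking indexes.

```c
#include <stddef.h>

/* Scalar form: c starts at 1 and is overwritten with the constant 0.  */
int flag_scalar(const int *a, size_t n)
{
  int c = 1;
  for (size_t i = 0; i < n; i++)
    if (a[i] == 0)
      c = 0;
  return c;
}

/* Hypothetical 4-lane model of the same loop.  */
int flag_lanes(const int *a, size_t n)
{
  int lane[4] = { 1, 1, 1, 1 };
  size_t i = 0;

  /* Main loop: each lane does its own select against the constant.  */
  for (; i + 4 <= n; i += 4)
    for (size_t l = 0; l < 4; l++)
      lane[l] = (a[i + l] == 0) ? 0 : lane[l];

  /* Epilogue: MIN across lanes, valid because 0 < initial value 1.  */
  int c = lane[0];
  for (size_t l = 1; l < 4; l++)
    if (lane[l] < c)
      c = lane[l];

  /* Scalar tail for the remaining elements.  */
  for (; i < n; i++)
    if (a[i] == 0)
      c = 0;
  return c;
}
```

If the assigned constant were larger than the initial value, the same sketch would use a MAX in the epilogue instead.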

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-03 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #14 from amker at gcc dot gnu.org ---
(In reply to Jim Wilson from comment #13)
> I think it was poc_ref_pic_reorder() in slice.c that triggered the ICE.  I
> don't know if the original code shows the vectorization reduction problem. 
> That might only be present in the reduced testcase.

Thanks very much, that's all I wanted to know.  I found a suspicious reduction
case, but I am not sure it is the one in question.  I will look into the details.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-03 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #13 from Jim Wilson  ---
I think it was poc_ref_pic_reorder() in slice.c that triggered the ICE.  I
don't know if the original code shows the vectorization reduction problem. 
That might only be present in the reduced testcase.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-03 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #12 from amker at gcc dot gnu.org ---
Hi Jim,
May I ask which function in h264ref also shows this issue?  I instrumented GCC
and could not find a case in it.  Thanks.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-02 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #11 from amker at gcc dot gnu.org ---
I am also investigating as Alan suggested in comment #3 to see how to fix the
reduction issue.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-08-02 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #10 from amker at gcc dot gnu.org ---
The patches at https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00058.html and
https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00059.html implement the
vcond_mask/vec_cmp/vcond patterns on AArch64 and fix the target-dependent
problem.  With them, the number of instructions inside the loop is 8.

The target-independent problem remains and needs a different fix.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-05-19 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #9 from amker at gcc dot gnu.org ---
Author: amker
Date: Thu May 19 09:03:36 2016
New Revision: 236447

URL: https://gcc.gnu.org/viewcvs?rev=236447&root=gcc&view=rev
Log:
PR tree-optimization/69848
* tree-vect-loop.c (vectorizable_reduction): Don't factor
comparison expr out of VEC_COND_EXPR for COND_REDUCTION.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-loop.c

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-05-12 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #8 from amker at gcc dot gnu.org ---
(In reply to amker from comment #7)
> (In reply to Jim Wilson from comment #6)
> > Testing the vcond_mask* patch with make check gave 6 regressions for both
> > armhf and aarch64.
> > 
> > FAIL: gcc.dg/vect/pr65947-10.c (internal compiler error)
> > FAIL: gcc.dg/vect/pr65947-10.c (test for excess errors)
> > FAIL: gcc.dg/vect/pr65947-10.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
> > FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (internal compiler error)
> > FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (test for excess errors)
> > FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
> > 
> > The problem here looks like a flaw in the vcond* patterns.  They support int
> > and fp compare operands, but only int selection operands.  E.g. for 
> >   (A op B ? X : Y)
> > A and B can be either int or fp, but X and Y can only be int.  Adding the
> > vcond_mask* patterns apparently causes gcc to call vcond* in ways it didn't
> > before, and that exposes the problem.
> > 
> > The x86 port is the only port with vcond and vcond_mask patterns, and it
> > supports all four combinations of int/fp compare/select operands, so it
> > appears that aarch64 should also.
> > 
> > I will need time to figure out how to fix the vcond* problems before I can
> > formally submit the vcond_mask* patch.
> 
> Hi Jim,
> We have a patch which supports all vcond/vcondu patterns (AArch64 only so
> far), including the missing ones.  The patch also introduces vec_cmp and
> vcond_mask because it re-implements vcond/vcondu using these two patterns.
> It will be ready for review shortly, but this issue itself needs a
> vectorizer fix, I think.

Hmm, supporting vcond_mask can save one cmlt instruction, because that
instruction is introduced in expand_vec_cond_expr when the input op0 is not a
comparison.

Propagating _20 into both VEC_COND_EXPRs below can save the "not" and "cmlt"
instructions:

  _20 = vect__1.6_8 == { 0, 0, 0, 0 };
  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-05-11 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

amker at gcc dot gnu.org changed:

   What|Removed |Added

 CC||amker at gcc dot gnu.org

--- Comment #7 from amker at gcc dot gnu.org ---
(In reply to Jim Wilson from comment #6)
> Testing the vcond_mask* patch with make check gave 6 regressions for both
> armhf and aarch64.
> 
> FAIL: gcc.dg/vect/pr65947-10.c (internal compiler error)
> FAIL: gcc.dg/vect/pr65947-10.c (test for excess errors)
> FAIL: gcc.dg/vect/pr65947-10.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
> FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (internal compiler error)
> FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (test for excess errors)
> FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
> 
> The problem here looks like a flaw in the vcond* patterns.  They support int
> and fp compare operands, but only int selection operands.  E.g. for 
>   (A op B ? X : Y)
> A and B can be either int or fp, but X and Y can only be int.  Adding the
> vcond_mask* patterns apparently causes gcc to call vcond* in ways it didn't
> before, and that exposes the problem.
> 
> The x86 port is the only port with vcond and vcond_mask patterns, and it
> supports all four combinations of int/fp compare/select operands, so it
> appears that aarch64 should also.
> 
> I will need time to figure out how to fix the vcond* problems before I can
> formally submit the vcond_mask* patch.

Hi Jim,
We have a patch which supports all vcond/vcondu patterns (AArch64 only so far),
including the missing ones.  The patch also introduces vec_cmp and vcond_mask
because it re-implements vcond/vcondu using these two patterns.  It will be
ready for review shortly, but this issue itself needs a vectorizer fix, I think.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-05-05 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #6 from Jim Wilson  ---
Testing the vcond_mask* patch with make check gave 6 regressions for both armhf
and aarch64.

FAIL: gcc.dg/vect/pr65947-10.c (internal compiler error)
FAIL: gcc.dg/vect/pr65947-10.c (test for excess errors)
FAIL: gcc.dg/vect/pr65947-10.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (internal compiler error)
FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2

The problem here looks like a flaw in the vcond* patterns.  They support int
and fp compare operands, but only int selection operands.  E.g. for 
  (A op B ? X : Y)
A and B can be either int or fp, but X and Y can only be int.  Adding the
vcond_mask* patterns apparently causes gcc to call vcond* in ways it didn't
before, and that exposes the problem.

The x86 port is the only port with vcond and vcond_mask patterns, and it
supports all four combinations of int/fp compare/select operands, so it appears
that aarch64 should also.

I will need time to figure out how to fix the vcond* problems before I can
formally submit the vcond_mask* patch.
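
For illustration, the four int/fp compare/select combinations of (A op B ? X : Y) can be written as the following scalar loops. This is a sketch only: the function names are hypothetical, "<" stands in for an arbitrary comparison, and nothing here claims which of these loops a given GCC version actually vectorizes. Going by the description above, the last two shapes (fp selection operands) are the ones the aarch64 vcond* patterns did not cover.

```c
#include <stddef.h>

/* int compare, int select */
void sel_int_cmp_int(const int *a, const int *b, const int *x,
                     const int *y, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] < b[i] ? x[i] : y[i];
}

/* fp compare, int select */
void sel_int_cmp_fp(const float *a, const float *b, const int *x,
                    const int *y, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] < b[i] ? x[i] : y[i];
}

/* int compare, fp select */
void sel_fp_cmp_int(const int *a, const int *b, const float *x,
                    const float *y, float *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] < b[i] ? x[i] : y[i];
}

/* fp compare, fp select */
void sel_fp_cmp_fp(const float *a, const float *b, const float *x,
                   const float *y, float *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] < b[i] ? x[i] : y[i];
}
```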

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-02-18 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #5 from Jim Wilson  ---
Created attachment 37737
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37737&action=edit
Patch to add missing vcond_mask* patterns.

Tested with the subset of CPU2006 that currently works at -O3 on aarch64 and
arm.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-02-18 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

Ramana Radhakrishnan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-18
 CC||ramana at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #4 from Ramana Radhakrishnan  ---
Confirmed.

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-02-17 Thread alahay01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #3 from alahay01 at gcc dot gnu.org ---
The standard way of dealing with condition reductions like this is to ignore
the contents of the "if" statement and produce a lot of code to deal with the
general case (it creates two vectors - one full of indexes and one full of
results). In the code, this is where STMT_VINFO_VEC_REDUCTION_TYPE is set to
COND_REDUCTION in tree-vect-loop.c.

We have an optimisation of this for when the code is "if (a[b]) c=b" which
bypasses most of the code produced by the general case. In the code, this is
where STMT_VINFO_VEC_REDUCTION_TYPE is set to INTEGER_INDUC_COND_REDUCTION in
tree-vect-loop.c.

I haven't figured out what the generated asm should look like for this issue,
but I think we'll need a further vect_reduction_type case (CONST_COND_REDUCTION
??) which is checked for at the same point as INTEGER_INDUC_COND_REDUCTION
(just after the "If we have a condition reduction, see if we can simplify it
further." comment).
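
The three loop shapes discussed above can be sketched in C as follows. The function names and the specific conditions are hypothetical, and how GCC classifies any particular loop depends on the version and on details not shown here; the point is only to contrast what is assigned inside the "if": arbitrary data, the induction variable, or a constant.

```c
#include <stddef.h>

/* General case: the assigned value is arbitrary data, so both the
   matching index and the result have to be tracked.  */
int last_match_value(const int *a, const int *b, size_t n)
{
  int c = -1;
  for (size_t i = 0; i < n; i++)
    if (a[i] < 0)
      c = b[i];
  return c;
}

/* "if (a[b]) c=b": the assigned value is the loop index itself.  */
int last_match_index(const int *a, size_t n)
{
  int c = -1;
  for (size_t i = 0; i < n; i++)
    if (a[i])
      c = (int) i;
  return c;
}

/* The assigned value is a constant, as in the proposed
   CONST_COND_REDUCTION case.  */
int clear_on_zero(const int *a, size_t n)
{
  int c = 1;
  for (size_t i = 0; i < n; i++)
    if (a[i] == 0)
      c = 0;
  return c;
}
```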

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-02-16 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #2 from Jim Wilson  ---
Created attachment 37717
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37717&action=edit
better code from hand optimizing the gcc output

[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref

2016-02-16 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

--- Comment #1 from Jim Wilson  ---
Created attachment 37716
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37716&action=edit
code generated by -O2 -ftree-vectorize