[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

amker at gcc dot gnu.org changed:

           What    |Removed |Added
         Status    |NEW     |RESOLVED
     Resolution    |---     |FIXED

--- Comment #17 from amker at gcc dot gnu.org ---
Fixed, I think.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #16 from amker at gcc dot gnu.org --- Author: amker Date: Tue Aug 16 13:09:40 2016 New Revision: 239502 URL: https://gcc.gnu.org/viewcvs?rev=239502&root=gcc&view=rev Log: PR tree-optimization/69848 * config/aarch64/aarch64-simd.md (vcond): Invert NE and switch operands to avoid additional NOT instruction. (vcond): Ditto. (vcondu, vcondu): Ditto. gcc/testsuite * gcc.target/aarch64/simd/vcond-ne-bit.c: New test. Added: trunk/gcc/testsuite/gcc.target/aarch64/simd/vcond-ne-bit.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64-simd.md trunk/gcc/testsuite/ChangeLog
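The idea behind "invert NE and switch operands" can be sketched in plain C. This is only an illustration of the identity the log entry relies on, not the aarch64-simd.md pattern itself: selecting on `a != b` gives the same result as selecting on `a == b` with the two selected values swapped, so a separate NOT of the comparison mask is not needed. The function names `select_ne` and `select_eq_swapped` are hypothetical.

```c
#include <stddef.h>

/* Direct form: out = (a != b) ? x : y.
   Computed naively on a target with only an EQ vector compare, this
   would need the EQ mask to be inverted before the select. */
void select_ne(const int *a, const int *b, const int *x, const int *y,
               int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] != b[i]) ? x[i] : y[i];
}

/* Equivalent form: compare for equality and swap the selected values,
   so no inversion of the mask is required. */
void select_eq_swapped(const int *a, const int *b, const int *x,
                       const int *y, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] == b[i]) ? y[i] : x[i];
}
```

Both functions produce identical output for any input, which is the property the operand swap depends on.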
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #15 from amker at gcc dot gnu.org --- Author: amker Date: Fri Aug 12 14:58:20 2016 New Revision: 239416 URL: https://gcc.gnu.org/viewcvs?rev=239416&root=gcc&view=rev Log: PR tree-optimization/69848 * tree-vectorizer.h (enum vect_def_type): New condition reduction type CONST_COND_REDUCTION. * tree-vect-loop.c (vectorizable_reduction): Support new condition reduction type CONST_COND_REDUCTION. gcc/testsuite PR tree-optimization/69848 * gcc.dg/vect/vect-pr69848.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/vect-pr69848.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-loop.c trunk/gcc/tree-vectorizer.h
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #14 from amker at gcc dot gnu.org --- (In reply to Jim Wilson from comment #13) > I think it was poc_ref_pic_reorder() in slice.c that triggered the ICE. I > don't know if the original code shows the vectorization reduction problem. > That might only be present in the reduced testcase. Thanks very much, that's all I wanted to know. I found a suspicious reduction case in but not sure if it's the case. Will look into details.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #13 from Jim Wilson --- I think it was poc_ref_pic_reorder() in slice.c that triggered the ICE. I don't know if the original code shows the vectorization reduction problem. That might only be present in the reduced testcase.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #12 from amker at gcc dot gnu.org --- Hi Jim, May I ask which function in h264ref also shows this issue? I instrumented GCC and could not find a case in it. Thanks.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #11 from amker at gcc dot gnu.org --- I am also investigating as Alan suggested in comment #3 to see how to fix the reduction issue.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #10 from amker at gcc dot gnu.org --- The patches at https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00058.html and https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00059.html implement the vcond_mask/vec_cmp/vcond stuff on AArch64 and fix the target-dependent problem. With them, the number of instructions inside the loop is 8. The target-independent problem remains and needs a different fix.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #9 from amker at gcc dot gnu.org --- Author: amker Date: Thu May 19 09:03:36 2016 New Revision: 236447 URL: https://gcc.gnu.org/viewcvs?rev=236447&root=gcc&view=rev Log: PR tree-optimization/69848 * tree-vect-loop.c (vectorizable_reduction): Don't factor comparison expr out of VEC_COND_EXPR for COND_REDUCTION. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-vect-loop.c
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #8 from amker at gcc dot gnu.org --- (In reply to amker from comment #7) > We have a patch which supports all vcond/vcondu patterns (AArch64 yet) > including missing ones. The patch also introduces vec_cmp&vcond_mask > because it re-implements vcond/vcondu using these two patterns. It will be > ready for review shortly, but this issue itself needs vectorizer fix I think. Hmm, supporting vcond_mask can save one cmlt instruction, because that instruction is introduced in expand_vec_cond_expr when the input op0 is not a comparison.
Propagating _20 to both VEC_COND_EXPRs below can save the "not" and "cmlt" instructions:
    _20 = vect__1.6_8 == { 0, 0, 0, 0 };
    vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
    _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
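A lane-wise emulation in C may make the shape of this code easier to follow. It is a rough sketch under stated assumptions, not the vectorizer's actual output: the scalar loop is assumed to be `if (a[i] == 0) c = 0;`, the vector width is fixed at four lanes, `n` is assumed to be a multiple of four, and the name `cond_reduction_emulated` is hypothetical. One comparison mask per lane feeds both selects, one updating the result vector and one updating the index vector, which mirrors how `_20` is shared by the two VEC_COND_EXPRs.

```c
#include <stddef.h>

#define LANES 4

/* Emulates a vectorized condition reduction for the assumed scalar loop
       for (i = 0; i < n; i++) if (a[i] == 0) c = 0;
   n must be a multiple of LANES in this sketch. */
int cond_reduction_emulated(const int *a, size_t n, int init)
{
    int val[LANES];     /* per-lane running result */
    size_t idx[LANES];  /* per-lane 1-based iteration of last update, 0 = none */

    for (size_t l = 0; l < LANES; l++) {
        val[l] = init;
        idx[l] = 0;
    }

    for (size_t i = 0; i < n; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            /* The comparison is computed once and reused by both selects. */
            int mask = (a[i + l] == 0);
            val[l] = mask ? 0 : val[l];
            idx[l] = mask ? i + l + 1 : idx[l];
        }
    }

    /* Epilogue: the lane with the largest index saw the latest update. */
    int result = init;
    size_t best = 0;
    for (size_t l = 0; l < LANES; l++) {
        if (idx[l] > best) {
            best = idx[l];
            result = val[l];
        }
    }
    return result;
}
```

The function returns `init` when no element is zero and 0 otherwise, matching the assumed scalar loop.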
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 amker at gcc dot gnu.org changed: What|Removed |Added CC||amker at gcc dot gnu.org --- Comment #7 from amker at gcc dot gnu.org --- (In reply to Jim Wilson from comment #6) > The problem here looks like a flaw in the vcond* patterns. They support int > and fp compare operands, but only int selection operands. > > I will need time to figure out how to fix the vcond* problems before I can > formally submit the vcond_mask* patch. Hi Jim, We have a patch which supports all vcond/vcondu patterns (AArch64 yet) including missing ones. The patch also introduces vec_cmp&vcond_mask because it re-implements vcond/vcondu using these two patterns. It will be ready for review shortly, but this issue itself needs a vectorizer fix, I think.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #6 from Jim Wilson --- Testing the vcond_mask* patch with make check gave 6 regressions for both armhf and aarch64.
FAIL: gcc.dg/vect/pr65947-10.c (internal compiler error)
FAIL: gcc.dg/vect/pr65947-10.c (test for excess errors)
FAIL: gcc.dg/vect/pr65947-10.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (internal compiler error)
FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects scan-tree-dump-times vect "LOOP VECTORIZED" 2
The problem here looks like a flaw in the vcond* patterns. They support int and fp compare operands, but only int selection operands. E.g. for (A op B ? X : Y) A and B can be either int or fp, but X and Y can only be int. Adding the vcond_mask* patterns apparently causes gcc to call vcond* in ways it didn't before, and that exposes the problem. The x86 port is the only port with vcond and vcond_mask patterns, and it supports all four combinations of int/fp compare/select operands, so it appears that aarch64 should also. I will need time to figure out how to fix the vcond* problems before I can formally submit the vcond_mask* patch.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #5 from Jim Wilson --- Created attachment 37737 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37737&action=edit Patch to add missing vcond_mask* patterns. Tested with the subset of CPU2006 that currently works at -O3 on aarch64 and arm.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848

Ramana Radhakrishnan changed:

           What    |Removed     |Added
         Status    |UNCONFIRMED |NEW
 Last reconfirmed  |            |2016-02-18
             CC    |            |ramana at gcc dot gnu.org
 Ever confirmed    |0           |1

--- Comment #4 from Ramana Radhakrishnan ---
Confirmed.
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #3 from alahay01 at gcc dot gnu.org --- The standard way of dealing with condition reductions like this is to ignore the contents of the "if" statement and produce a lot of code to deal with the general case (it creates two vectors - one full of indexes and one full of results). In the code, this is where STMT_VINFO_VEC_REDUCTION_TYPE is set to COND_REDUCTION in tree-vect-loop.c. We have an optimisation of this for when the code is "if (a[b]) c=b" which bypasses most of the code produced by the general case. In the code, this is where STMT_VINFO_VEC_REDUCTION_TYPE is set to INTEGER_INDUC_COND_REDUCTION in tree-vect-loop.c. I haven't figured out what the generated asm should look like for this issue, but I think we'll need a further vect_reduction_type case (CONST_COND_REDUCTION ??) which is checked for at the same point as INTEGER_INDUC_COND_REDUCTION (just after the "If we have a condition reduction, see if we can simplify it further." comment).
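For readers unfamiliar with the three cases, the scalar loop shapes might look roughly like the following C. These loops are illustrative assumptions, not the h264ref source or the attached testcase: the function names are hypothetical, and the third loop reflects a reading of the proposed name CONST_COND_REDUCTION as "the conditionally assigned value is a constant".

```c
#include <stddef.h>

/* General condition reduction: the assigned value is arbitrary loaded
   data, so a vectorized version would need to track both results and
   indexes to know which lane updated last. */
int last_selected_value(const int *a, const int *v, size_t n, int init)
{
    int c = init;
    for (size_t i = 0; i < n; i++)
        if (a[i])
            c = v[i];
    return c;
}

/* The "if (a[b]) c = b" shape: the assigned value is the integer
   induction variable itself. */
int last_selected_index(const int *a, int n, int init)
{
    int c = init;
    for (int b = 0; b < n; b++)
        if (a[b])
            c = b;
    return c;
}

/* Assumed shape for the proposed constant case: the assigned value is
   a constant, so only "did the condition ever hold" matters. */
int reset_if_any_zero(const int *a, size_t n, int init)
{
    int c = init;
    for (size_t i = 0; i < n; i++)
        if (a[i] == 0)
            c = 0;
    return c;
}
```

In the third loop the final value is either `init` or the constant, which suggests why a simpler scheme than the general two-vector approach could be enough there.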
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #2 from Jim Wilson --- Created attachment 37717 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37717&action=edit better code from hand optimizing the gcc output
[Bug tree-optimization/69848] poor vectorization of a loop from SPEC2006 464.h264ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69848 --- Comment #1 from Jim Wilson --- Created attachment 37716 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37716&action=edit code generated by -O2 -ftree-vectorize