[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||7.3.1, 8.1.0
 Resolution|--- |FIXED

--- Comment #15 from Richard Biener  ---
Fixed.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #14 from Richard Biener  ---
Author: rguenth
Date: Tue Nov 20 14:47:49 2018
New Revision: 266318

URL: https://gcc.gnu.org/viewcvs?rev=266318=gcc=rev
Log:
2018-11-20  Richard Biener  

Backport from mainline
2018-03-12  Richard Biener  

PR tree-optimization/84777
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p): For
force-vectorize loops ignore whether we are optimizing for size.

2018-01-26  Richard Biener  

PR rtl-optimization/84003
* dse.c (record_store): Only record redundant stores when
the earlier store aliases at least all accesses the later one does.

* g++.dg/torture/pr77745.C: Mark foo noinline to trigger
latent bug in DSE if NOINLINE is appropriately defined.
* g++.dg/torture/pr77745-2.C: New testcase including pr77745.C
and defining NOINLINE.

Added:
branches/gcc-7-branch/gcc/testsuite/g++.dg/torture/pr77745-2.C
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/dse.c
branches/gcc-7-branch/gcc/testsuite/ChangeLog
branches/gcc-7-branch/gcc/testsuite/g++.dg/torture/pr77745.C
branches/gcc-7-branch/gcc/tree-ssa-loop-ch.c

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #13 from Richard Biener  ---
I will backport.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-11-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-11-20
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Ever confirmed|0   |1

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-11-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org

--- Comment #12 from Martin Liška  ---
Richi: Planning to backport or should we close it?

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #11 from Richard Biener  ---
Author: rguenth
Date: Mon Mar 12 08:45:54 2018
New Revision: 258444

URL: https://gcc.gnu.org/viewcvs?rev=258444=gcc=rev
Log:
2018-03-12  Richard Biener  

PR tree-optimization/84777
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p): For
force-vectorize loops ignore whether we are optimizing for size.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-loop-ch.c

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #10 from Richard Biener  ---
GCC 8 has the patch now.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #9 from rguenther at suse dot de  ---
On Fri, 9 Mar 2018, linux at carewolf dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777
> 
> --- Comment #8 from Allan Jensen  ---
> Yes, those I say are missing are compared to -O2. I was investigating this in
> relation to Qt. We either build these files with -O3, or with -Os for customer
> that are binary size sensitive. Since some of the image handling routines are
> quite heavy and have been written for auto-vectorization I was just checking 
> if
> I could get it to work and the results with your patch are quite good:
> 
> Normal sizes of qdrawhelper.o with -O3/-O2/-Os: 
> 277704 / 198984 / 168440
> 
> With -O2 -ftree-vectorize: 242224
> With -O2 -fopenmp: 219536
> With -Os -ftree-loop-vectorize: 168440 (no change)
> With -Os -fopenmp: 177144 (with your patch)
> 
> So most of the -Os benefit and still many of the central draw loops
> auto-vectorized.

That looks indeed good.  We have enough infrastructure already to
support a #pragma GCC vectorize as well (it was added for Ada) just
nobody bothered to add C/C++ support.

> Haven't benchmarked it yet though.

I will test and post the patch, I think it makes sense.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #8 from Allan Jensen  ---
Yes, those I say are missing are compared to -O2. I was investigating this in
relation to Qt. We either build these files with -O3, or with -Os for customer
that are binary size sensitive. Since some of the image handling routines are
quite heavy and have been written for auto-vectorization I was just checking if
I could get it to work and the results with your patch are quite good:

Normal sizes of qdrawhelper.o with -O3/-O2/-Os: 
277704 / 198984 / 168440

With -O2 -ftree-vectorize: 242224
With -O2 -fopenmp: 219536
With -Os -ftree-loop-vectorize: 168440 (no change)
With -Os -fopenmp: 177144 (with your patch)

So most of the -Os benefit and still many of the central draw loops
auto-vectorized.

Haven't benchmarked it yet though.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #7 from rguenther at suse dot de  ---
On Fri, 9 Mar 2018, linux at carewolf dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777
> 
> --- Comment #6 from Allan Jensen  ---
> Great. Your patch worked with 90% of the marked loops!

Good!

> The remaining report things like this with -fopt-info-vec-missed:
> 
> note: not vectorized: relevant stmt not supported: idisty.872_437 = (unsigned
> int) idisty_386;
> note: bad operation or unsupported loop bound.
> 
> But the result is already pretty good for -fopenmp with manually marked loops.

So is it any better if you use -O2 rather than -Os?  Do you really need
-Os?  GCCs -O2 isn't as excessive code-size wise as competitors like ICC.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #6 from Allan Jensen  ---
Great. Your patch worked with 90% of the marked loops!

The remaining report things like this with -fopt-info-vec-missed:

note: not vectorized: relevant stmt not supported: idisty.872_437 = (unsigned
int) idisty_386;
note: bad operation or unsupported loop bound.

But the result is already pretty good for -fopenmp with manually marked loops.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #5 from rguenther at suse dot de  ---
On Fri, 9 Mar 2018, linux at carewolf dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777
> 
> --- Comment #4 from Allan Jensen  ---
> I will try the patch. I just tried -fopt-info-vec-missed and the message
> reported for every loop was:
> 
> note: not vectorized: latch block not empty.
> note: bad loop form.

Yeah, that's the effect for while () / for () style loops that haven't
been transformed to do {} while () style by loop header copying.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #4 from Allan Jensen  ---
I will try the patch. I just tried -fopt-info-vec-missed and the message
reported for every loop was:

note: not vectorized: latch block not empty.
note: bad loop form.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #3 from Richard Biener  ---
FDO might also help given important loops should show up as hot.

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #2 from Richard Biener  ---
Hmm, patch can't help.  Instead try the following which should make the omp
simd
case work.

Index: gcc/tree-ssa-loop-ch.c
===
--- gcc/tree-ssa-loop-ch.c  (revision 258380)
+++ gcc/tree-ssa-loop-ch.c  (working copy)
@@ -57,7 +57,8 @@ should_duplicate_loop_header_p (basic_bl
  be true, since quite often it is possible to verify that the condition is
  satisfied in the first iteration and therefore to eliminate it.  Jump
  threading handles these cases now.  */
-  if (optimize_loop_for_size_p (loop))
+  if (optimize_loop_for_size_p (loop)
+  && !loop->force_vectorize)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,

[Bug tree-optimization/84777] -Os inhibits all vectorization

2018-03-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
IIRC the issue is that -Os inhibits most loop-header copying.

Can you provide a testcase that shows the issue please?

Can you check if the following patch fixes things for you?

Index: gcc/tree-ssa-loop-ch.c
===
--- gcc/tree-ssa-loop-ch.c  (revision 258380)
+++ gcc/tree-ssa-loop-ch.c  (working copy)
@@ -257,8 +257,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fun)
   {
-return flag_tree_ch != 0
-  && (flag_tree_loop_vectorize != 0 ||
fun->has_force_vectorize_loops);
+return flag_tree_loop_vectorize != 0 || fun->has_force_vectorize_loops;
   }

   /* Just copy headers, no initialization/finalization of loop structures.  */