[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #16 from Jakub Jelinek  ---
Author: jakub
Date: Fri Dec 20 23:51:15 2019
New Revision: 279687

URL: https://gcc.gnu.org/viewcvs?rev=279687&root=gcc&view=rev
Log:
PR middle-end/91512
PR fortran/92738
* lang.opt (-finline-arg-packing): Add trailing dot to help text.

Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/lang.opt
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-20 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

Thomas Koenig  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Thomas Koenig  ---
The user can now circumvent this with -fno-inline-arg-packing.

Closing as FIXED.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-20 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #14 from Thomas Koenig  ---
Author: tkoenig
Date: Fri Dec 20 11:51:05 2019
New Revision: 279639

URL: https://gcc.gnu.org/viewcvs?rev=279639&root=gcc&view=rev
Log:
Introduce -finline-arg-packing.

2019-12-20  Thomas Koenig  

PR middle-end/91512
PR fortran/92738
* invoke.texi: Document -finline-arg-packing.
* lang.opt: Add -finline-arg-packing.
* options.c (gfc_post_options): Handle -finline-arg-packing.
* trans-array.c (gfc_conv_array_parameter): Use
flag_inline_arg_packing instead of checking for optimize and
optimize_size.

2019-12-20  Thomas Koenig  

PR middle-end/91512
PR fortran/92738
* gfortran.dg/internal_pack_25.f90: New test.


Added:
trunk/gcc/testsuite/gfortran.dg/internal_pack_25.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/invoke.texi
trunk/gcc/fortran/lang.opt
trunk/gcc/fortran/options.c
trunk/gcc/fortran/trans-array.c
trunk/gcc/testsuite/ChangeLog
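
The log above says the new flag replaces a direct check of optimize and
optimize_size, and comment #12 proposed enabling it at all -O levels except
-Os. A minimal sketch of that defaulting rule follows. The struct and both
function names are hypothetical stand-ins, not the actual code in options.c
or trans-array.c, and the use of -1 for "not given on the command line" is
an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the compiler's option state.  */
struct opts {
    int optimize;            /* 0 for -O0, 1..3 for -O1..-O3 (-Os also sets 2) */
    bool optimize_size;      /* true for -Os */
    int inline_arg_packing;  /* -1 unset (assumed), 0 = -fno-..., 1 = -f... */
};

/* Fill in the default only when the user did not choose explicitly:
   on when optimizing, off at -O0 and at -Os.  */
static void resolve_inline_arg_packing(struct opts *o)
{
    assert(o->inline_arg_packing >= -1 && o->inline_arg_packing <= 1);
    if (o->inline_arg_packing == -1)
        o->inline_arg_packing = (o->optimize > 0 && !o->optimize_size) ? 1 : 0;
}

/* The code generator then consults the flag alone.  */
static bool pack_inline(const struct opts *o)
{
    return o->inline_arg_packing == 1;
}
```

Under these assumptions an explicit -fno-inline-arg-packing wins over -O2,
which is how a user would opt out for a single file.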

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-04 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #13 from Thomas Koenig  ---
(In reply to Thomas Koenig from comment #12)
> People who have problems can then enable

I meant disable, of course.

> that option for
> the specific files they have the problems with.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-04 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #12 from Thomas Koenig  ---
(In reply to Wilco from comment #11)

> Would using -frepack-arrays solve this issue? I proposed making that the
> default a while back. It would do any repacking that is necessary at call
> sites rather than creating multiple copies of all loops in every function.

I'm not convinced that that is the answer - this would penalize (at runtime)
programs which use non-contiguous memory when _not_ passing them to
an explicit-size or assumed-size parameter. For example, all the optimizations
for passing a transposed argument would then no longer work.

What we could do instead is to introduce another option, -frepack-inline
(or whatever we want to call it) and enable this by default at all -O
except at -Os. People who have problems can then enable that option for
the specific files they have the problems with.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-02 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #11 from Wilco  ---
(In reply to Thomas Koenig from comment #10)
> (In reply to Martin Liška from comment #6)
> > So wrf grew starting with r271377, size (w/o debug info) goes from 20164464B
> > to 23674792.
> 
> I think we've had this discussion before, although I cannot offhand
> recall the PR number.  PR91512 is closely related.
> 
> Since r271377, arguments which may be contiguous are now (conditionally)
> packed and unpacked inline (see PR88821).
> 
> This was done so that the middle end can look into these conversions
> and possibly eliminate them (if it can be determined via inlining
> or LTO that the argument is contiguous anyway). This can lead to an
> extremely large performance boost for some test cases (*10 or more),
> but will, in general, lead to a size increase.
> 
> Now, wrf has an extremely strange (and rare) programming style where they
> pass a ton of assumed shape arguments (where it is not clear, at
> compile-time,
> if they need packing/unpacking) to an old-style array argument. This
> causes considerable code size increase.
> 
> So, it's a tradeoff, which we discussed at the time. This is why this
> is not done at -Os.
> 
> Should we "fix" this? I think not, the style of wrf is just too horrid,
> and pessimizing other programs for the sake of one benchmark makes little
> sense to me.

Would using -frepack-arrays solve this issue? I proposed making that the
default a while back. It would do any repacking that is necessary at call sites
rather than creating multiple copies of all loops in every function.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-02 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #10 from Thomas Koenig  ---
(In reply to Martin Liška from comment #6)
> So wrf grew starting with r271377, size (w/o debug info) goes from 20164464B
> to 23674792.

I think we've had this discussion before, although I cannot offhand
recall the PR number.  PR91512 is closely related.

Since r271377, arguments which may be contiguous are now (conditionally)
packed and unpacked inline (see PR88821).

This was done so that the middle end can look into these conversions
and possibly eliminate them (if it can be determined via inlining
or LTO that the argument is contiguous anyway). This can lead to an
extremely large performance boost for some test cases (*10 or more),
but will, in general, lead to a size increase.

Now, wrf has an extremely strange (and rare) programming style where they pass
a ton of assumed shape arguments (where it is not clear, at compile-time,
if they need packing/unpacking) to an old-style array argument. This
causes considerable code size increase.

So, it's a tradeoff, which we discussed at the time. This is why this
is not done at -Os.

Should we "fix" this? I think not, the style of wrf is just too horrid,
and pessimizing other programs for the sake of one benchmark makes little
sense to me.
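
As a rough illustration of the conditional inline packing described above,
here is a sketch in C of what a call site conceptually expands to when it is
unknown at compile time whether the argument is contiguous. The descriptor
type and all function names are hypothetical simplifications, not gfortran's
real array descriptor or generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified 1-D array descriptor.  */
struct desc {
    double *base;
    ptrdiff_t stride;  /* distance between elements, in elements */
    size_t n;          /* number of elements */
};

/* Old-style callee: expects n contiguous elements.  */
static void scale_contiguous(double *a, size_t n, double f)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= f;
}

/* Call site with the pack/unpack emitted inline.  Both paths exist in the
   caller, which is where the extra code size comes from.  */
static void call_with_inline_packing(struct desc d, double f)
{
    if (d.n == 0)
        return;
    if (d.stride == 1) {
        /* Fast path: already contiguous, pass the data directly.  */
        scale_contiguous(d.base, d.n, f);
        return;
    }
    /* Slow path: pack into a temporary, call, then copy the results back.  */
    double *tmp = malloc(d.n * sizeof *tmp);
    assert(tmp != NULL);
    for (size_t i = 0; i < d.n; i++)
        tmp[i] = d.base[(ptrdiff_t)i * d.stride];
    scale_contiguous(tmp, d.n, f);
    for (size_t i = 0; i < d.n; i++)
        d.base[(ptrdiff_t)i * d.stride] = tmp[i];
    free(tmp);
}
```

If inlining or LTO proves that stride is 1, the slow path is dead code and
can be removed, which is the performance case. A routine that passes many
such arguments to old-style dummies carries one copy of both paths per
argument, which is the size cost seen in wrf.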

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-02 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #9 from Wilco  ---
(In reply to Martin Liška from comment #8)
> (In reply to Wilco from comment #7)
> > (In reply to Martin Liška from comment #6)
> > > So wrf grew starting with r271377, size (w/o debug info) goes from
> > > 20164464B to 23674792.
> > 
> > Also check the build time of wrf. Looking at my logs trunk takes 2x as long
> > to build it since June.
> 
> Maybe related to:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91509
> ?

I think not, this is plain -Ofast, no LTO or prefetching. The same slowdown
happens with -O2.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #8 from Martin Liška  ---
(In reply to Wilco from comment #7)
> (In reply to Martin Liška from comment #6)
> > So wrf grew starting with r271377, size (w/o debug info) goes from 20164464B
> > to 23674792.
> 
> Also check the build time of wrf. Looking at my logs trunk takes 2x as long
> to build it since June.

Maybe related to:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91509
?

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-02 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

Wilco  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #7 from Wilco  ---
(In reply to Martin Liška from comment #6)
> So wrf grew starting with r271377, size (w/o debug info) goes from 20164464B
> to 23674792.

Also check the build time of wrf. Looking at my logs trunk takes 2x as long to
build it since June.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org

--- Comment #6 from Martin Liška  ---
So wrf grew starting with r271377, size (w/o debug info) goes from 20164464B to
23674792.
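
For scale, the two sizes quoted above differ by 3510328 bytes, or roughly
17.4%. The helper below is only an illustration of that arithmetic; its name
is made up for this note.

```c
#include <assert.h>

/* Relative growth from `before` to `after`, in percent.  */
static double growth_percent(long before, long after)
{
    assert(before > 0);
    return 100.0 * (double)(after - before) / (double)before;
}
```

Calling growth_percent(20164464L, 23674792L) gives a value a little above
17.4.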

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #5 from Martin Liška  ---
Ok, I've just updated LNT filter, and one can see it better with:
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch?sorting=gcc-9%2Cgcc-trunk_elf_detail_stats=on

I'm going to bisect the WRF size bump.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization,
                   |                            |needs-bisection

--- Comment #4 from Richard Biener  ---
There is

2019-05-23  Richard Biener  

PR tree-optimization/88440
* opts.c (default_options_table): Enable
-ftree-loop-distribute-patterns at -O[2s]+.
* tree-loop-distribution.c (generate_memset_builtin): Fold the
generated call.
(generate_memcpy_builtin): Likewise.
(distribute_loop): Pass in whether to only distribute patterns.
(prepare_perfect_loop_nest): Also allow size optimization.
(pass_loop_distribution::execute): When optimizing a loop
nest for size allow pattern replacement.

but that should cause code-size shrinking... (just try
-fno-tree-loop-distribute-patterns and see if fixed)
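
To illustrate why pattern distribution would be expected to shrink code
rather than grow it: the option lets the compiler recognize loops such as
the first function below and replace them with a library call like the
second. This is a hand-written sketch of the equivalence, not compiler
output, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A zeroing loop of the kind the pattern matcher can recognize.  */
static void clear_loop(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 0;
}

/* What the loop can be replaced with: a single memset call.  */
static void clear_memset(int *a, size_t n)
{
    if (n == 0)
        return;
    assert(a != NULL);
    memset(a, 0, n * sizeof *a);
}
```

Both functions leave the array in the same state; the second is a single
call instead of an inline (and possibly unrolled or vectorized) loop.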

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-01 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #3 from Martin Liška  ---
One of the big changes that caused that:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=21.264.4

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-12-01 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-12-01
                 CC|                            |marxin at gcc dot gnu.org
      Known to work|                            |9.2.0
             Blocks|                            |26163
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1
      Known to fail|                            |10.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-11-30 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #2 from Jan Hubicka  ---
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=10.542.4&highlight_run=7354
shows shorter range
+2019-05-24  Jakub Jelinek  
+
+   * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__CONDTEMP_.
+   * tree.h (OMP_CLAUSE_DECL): Use OMP_CLAUSE__CONDTEMP_ instead of
+   OMP_CLAUSE__REDUCTEMP_.
+   * tree.c (omp_clause_num_ops, omp_clause_code_name): Add
+   OMP_CLAUSE__CONDTEMP_.

+2019-05-19  Segher Boessenkool  
+
+   * config/rs6000/constraints.md (define_register_constraint "wo"):
+   Delete.
+   * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Delete
+   RS6000_CONSTRAINT_wo.
+   * config/rs6000/rs6000.c (rs6000_debug_reg_global): Adjust.
+   (rs6000_init_hard_regno_mode_ok): Adjust.
+   * config/rs6000/rs6000.md: Replace "wo" constraint by "wa" with "p9v".
+   * config/rs6000/altivec.md: Ditto.
+   * doc/md.texi (Machine Constraints): Adjust.
+
 2019-05-18  Iain Sandoe  

It may be easy to bisect.

[Bug tree-optimization/92738] [10 regression] Large code size growth for -O2 binaries between 2019-05-19...2019-05-29

2019-11-30 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92738

--- Comment #1 from Jan Hubicka  ---
This is seen on
https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=7361&plot.0=31.574.4