[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #47 from CVS Commits  ---
The master branch has been updated by Jeff Law :

https://gcc.gnu.org/g:14162197fd4cf0e3a848d32bd9385876e1b1f249

commit r10-7609-g14162197fd4cf0e3a848d32bd9385876e1b1f249
Author: Jeff Law 
Date:   Tue Apr 7 17:55:00 2020 -0600

Fix a variety of testsuite failures on the H8 after recent cselib changes

PR rtl-optimization/92264
* config/h8300/h8300.md (mov;add peephole2): Avoid applying when
the destination is the stack pointer.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-03 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #46 from CVS Commits  ---
The master branch has been updated by Jeff Law :

https://gcc.gnu.org/g:7f26e60c2600f0a676fc9370390ebf0a3c78f31c

commit r10-7548-g7f26e60c2600f0a676fc9370390ebf0a3c78f31c
Author: Jeff Law 
Date:   Fri Apr 3 17:47:18 2020 -0600

Fix stdarg-3 regression on xstormy16 port

PR rtl-optimization/92264
* config/stormy16/stormy16.c (xstormy16_preferred_reload_class):
Handle
reloading of auto-increment addressing modes.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-03 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #45 from CVS Commits  ---
The master branch has been updated by Jeff Law :

https://gcc.gnu.org/g:b949f8e2acb49273b2f08ecaa3bc7128baaad850

commit r10-7544-gb949f8e2acb49273b2f08ecaa3bc7128baaad850
Author: Jeff Law 
Date:   Fri Apr 3 12:46:13 2020 -0600

Fix va-arg-22.c at -O1 on m32r.

PR rtl-optimization/92264
* config/m32r/m32r.c (m32r_output_block_move): Properly account for
post-increment addressing of source operands as well as residuals
when computing any adjustments to the input pointer.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #44 from Jakub Jelinek  ---
Fixed then.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #43 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #42)
> Is that good enough to mark this PR as resolved?  In #c0 you said before
> Richard's change it took ~200s, which is more than 2m21s, though it is
> unclear if those 141s are with checking compiler or not.

These 140 are with default checking for current master which is checking
enabled. That said, I would close this.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #42 from Jakub Jelinek  ---
Is that good enough to mark this PR as resolved?  In #c0 you said before
Richard's change it took ~200s, which is more than 2m21s, though it is unclear
if those 141s are with checking compiler or not.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #41 from Martin Liška  ---
The current master does:

$ time gfortran module_configure.fppized.f90 -c -march=znver2 -std=legacy
-fconvert=big-endian -fno-openmp -Ofast -march=znver2 -g
...
real2m21.190s
user2m20.487s
sys 0m0.647s

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-02 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #40 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:86c924113208f58fdda24078c9cc9285ee8000cd

commit r10-7516-g86c924113208f58fdda24078c9cc9285ee8000cd
Author: Jakub Jelinek 
Date:   Thu Apr 2 14:34:42 2020 +0200

params: Decrease -param=max-find-base-term-values= default [PR92264]

For the PR in question, my proposal would be to also lower
-param=max-find-base-term-values=
default from 2000 to 200 after this, at least in the above 4
bootstraps/regtests there is nothing that would ever result in
find_base_term returning non-NULL with more than 200 VALUEs being
processed.

2020-04-02  Jakub Jelinek  

PR rtl-optimization/92264
* params.opt (-param=max-find-base-term-values=): Decrease default
from 2000 to 200.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-04-02 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #39 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:2c0fa3ecf70d199af18785702e9e0548fd3ab793

commit r10-7515-g2c0fa3ecf70d199af18785702e9e0548fd3ab793
Author: Jakub Jelinek 
Date:   Thu Apr 2 14:28:14 2020 +0200

cselib: Reuse VALUEs on sp adjustments [PR92264]

As discussed in the PR, if !ACCUMULATE_OUTGOING_ARGS on large functions we
can have hundreds of thousands of stack pointer adjustments and cselib
creates a new VALUE after each sp adjustment, which form extremely deep
VALUE chains, which is very harmful e.g. for find_base_term.
E.g. if we have
sp -= 4
sp -= 4
sp += 4
sp += 4
sp -= 4
sp += 4
that means 7 VALUEs, one for the sp at beginning (val1), than val2 = val1 -
4, then val3 = val2 - 4, then val4 = val3 + 4, then val5 = val4 + 4, then
val6 = val5 - 4, then val7 = val6 + 4.
This patch tweaks cselib, so that it is smarter about sp adjustments.
When cselib_lookup (stack_pointer_rtx, Pmode, 1, VOIDmode) and we know
nothing about sp yet (this happens at the start of the function, for
non-var-tracking also after cselib_reset_table and for var-tracking after
processing fp_setter insn where we forget about former sp values because
that is now hfp related while everything after it is sp related), we
look it up normally, but in addition to what we have been doing before
we mark the VALUE as SP_DERIVED_VALUE_P.  Further lookups of sp + offset
are then special cased, so that it is canonicalized to that
SP_DERIVED_VALUE_P VALUE + CONST_INT (if possible).  So, for the above,
we get val1 with SP_DERIVED_VALUE_P set, then val2 = val1 - 4, val3 = val1
-
8 (note, no longer val2 - 4!), then we get val2 again, val1 again, val2
again, val1 again.
In the find_base_term visited_vals.length () > 100 find_base_term
statistics during combined x86_64-linux and i686-linux bootstrap+regtest
cycle, without the patch I see:
find_base_term > 100
returning NULL  returning non-NULL
32-bit compilations 4229178 407
64-bit compilations 217523  0
with largest visited_vals.length () when returning non-NULL being 206.
With the patch the same numbers are:
32-bit compilations 1249588 135
64-bit compilations 35100
with largest visited_vals.length () when returning non-NULL being 173.
This shows significant reduction of the deep VALUE chains.
On powerpc64{,le}-linux, these stats didn't change at all, we have
10080
for all of -m32, -m64 and little-endian -m64, just the
gcc.dg/pr85180.c and gcc.dg/pr87985.c testcases which are unrelated to sp.

My earlier version of the patch, which contained just the rtl.h and
cselib.c
changes, regressed some tests:
gcc.dg/guality/{pr36728-{1,3},pr68860-{1,2}}.c
gcc.target/i386/{pr88416,sse-{13,23,24,25,26}}.c
The problem with the former tests was worse debug info, where with -m32
where arg7 was passed in a stack slot we though a push later on might have
invalidated it, when it couldn't.  This is something I've solved with the
var-tracking.c (vt_initialize) changes.  In those problematic functions, we
create a cfa_base VALUE (argp) and want to record that at the start of
the function the argp VALUE is sp + off and also record that current sp
VALUE is argp's VALUE - off.  The second permanent equivalence didn't make
it after the patch though, because cselib_add_permanent_equiv will
cselib_lookup the value of the expression it wants to add as the
equivalence
and if it is the same VALUE as we are calling it on, it doesn't do
anything;
and due to the cselib changes for sp based accesses that is exactly what
happened.  By reversing the order of the cselib_add_permanent_equiv calls
we
get both equivalences though and thus are able to canonicalize the sp based
accesses in var-tracking to the cfa_base value + offset.
The i386 FAILs were all ICEs, where we had pushf instruction pushing flags
and then pop pseudo reading that value again.  With the cselib changes,
cselib during RTL DSE is able to see through the sp adjustment and wanted
to replace_read what was done pushf, by moving the flags register into a
pseudo and replace the memory read in the pop with that pseudo.  That is
wrong for two reasons: one is that the backend doesn't have an instruction
to move the flags hard register into some other register, but replace_read
has been validating just the mem -> pseudo replacement and not the insns
emitted by copy_to_mode_reg.  And the second issue is that it is obviously
wrong to replace a stack pop which contains stack post-increment by a copy
of pseudo into destination.  

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #38 from Jakub Jelinek  ---
No, far from it.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-26 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #37 from Martin Liška  ---
@Jakub: Can we close it? Or do you plan any other patch for it?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-26 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #36 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:9708ca2be40399d6266bc85c99e085e3fe27a809

commit r10-7392-g9708ca2be40399d6266bc85c99e085e3fe27a809
Author: Jakub Jelinek 
Date:   Thu Mar 26 09:15:39 2020 +0100

var-tracking: Mark as sp based more VALUEs [PR92264]

With this simple patch, on i686-linux and x86_64-linux with -m32 (no change
for -m64), the find_base_term visited_vals.length () > 100 find_base_term
statistics changed (fbt is before this patch, fbt2 with this patch):
for k in '' '1$'; do for i in 32 64; do for j in fbt fbt2; do \
echo -n "$j $i $k "; LC_ALL=C grep ^$i.*"$k" $j | wc -l; done; done; done
fbt 32  5313355
fbt2 32  4229854
fbt 64  217523
fbt2 64  217523
fbt 32 1$ 1296
fbt2 32 1$ 407
fbt 64 1$ 0
fbt2 64 1$ 0
For frame_pointer_needed functions, we need to wait until we see the
fp_setter insn in the prologue at which point we disassociate the fp based
VALUEs from sp based VALUEs, but for !frame_pointer_needed functions,
we IMHO don't need to wait anything.  For ACCUMULATE_OUTGOING_ARGS it isn't
IMHO worth doing anything, as there is a single sp adjustment and so there
is no risk of many thousands deep VALUE chains, but for
!ACCUMULATE_OUTGOING_ARGS the sp keeps changing constantly.

2020-03-26  Jakub Jelinek  

PR rtl-optimization/92264
* var-tracking.c (add_stores): Call cselib_set_value_sp_based even
for sp based values in !frame_pointer_needed
&& !ACCUMULATE_OUTGOING_ARGS functions.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-23 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #35 from Martin Liška  ---
The suggested patch helped to reduce compilation to:
2m54s

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-23 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #34 from Jakub Jelinek  ---
Created attachment 48081
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48081=edit
gcc10-pr92264-wip.patch

Further updated patch, this one passes bootstrap on both x86_64-linux and
i686-linux and according to the gathered statistics results in significantly
fewer cases of visited_vals.length () > 100 in find_base_term, together with
the var-tracking.c small change posted to gcc-patches from 5530878 to 1253101
cases.
But, so far it causes a few regressions:
gcc.dg/guality/pr36728-{1,3}.c and gcc.dg/guality/pr68860-{1,2}.c at various
optimization levels and ICEs on gcc.target/i386/pr88416.c and
gcc.target/i386/sse-{13,23,24,25,26}.c.  So far I've looked at pr88416.c and
the problem is that DSE now on:
(insn 10 9 11 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])
(reg:DI 17 flags)) "include/ia32intrin.h":262:10 56 {*pushfldi2}
 (expr_list:REG_DEAD (reg:DI 17 flags)
(nil)))
(insn 11 10 29 2 (set (reg:DI 85)
(mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0  S8 A8]))
"include/ia32intrin.h":262:10 52 {*popdi1}
 (nil))
"optimizes" it into:
(insn 31 9 10 2 (set (reg:DI 89)
(reg:DI 17 flags)) "include/ia32intrin.h":262:10 -1
 (nil))
(insn 10 31 11 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])
(reg:DI 17 flags)) "include/ia32intrin.h":262:10 56 {*pushfldi2}
 (expr_list:REG_DEAD (reg:DI 17 flags)
(nil)))
(insn 11 10 29 2 (set (reg:DI 85)
(reg:DI 89)) "include/ia32intrin.h":262:10 66 {*movdi_internal}
 (nil))
This is wrong for 2 reasons.  The newly added insn 31 isn't recognized and DSE
doesn't bother trying to recognize it, and when the pop is optimized away into
just reg to reg assignment, the side-effect (post-increment of sp) is lost.
Wonder if DSE shouldn't punt for autoinc (don't remove stores that have them)
or say try to replace the autoinc with register update, and whether it
shouldn't try to recog new insns it creates.

As for the guality failures, I've so far briefly looked only at the first one,
where we end up with an extra
+(note 84 53 54 2 (var_location arg7 (nil)) NOTE_INSN_VAR_LOCATION)
that wasn't there before.  Will try to understand this one.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #33 from Jakub Jelinek  ---
Created attachment 48075
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48075=edit
gcc10-pr92264-wip.patch

Updated patch, which doesn't ICE anymore, and creates 10500 instead of 12000
VALUEs during var-tracking on diagnostic-show-location.c, which should be a
good thing.  But the .s file is smaller and in the *.vartrack dump I see some
NOTE_INSN_VAR_LOCATIONs missing, so apparently it does have bad impact on debug
info quality.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #32 from Jakub Jelinek  ---
Some incremental progress, but still ICEs...
--- gcc/cselib.c2020-03-20 17:42:02.333023994 +0100
+++ gcc/cselib.c2020-03-20 19:23:33.506622424 +0100
@@ -58,6 +58,16 @@
 static void cselib_invalidate_mem (rtx);
 static void cselib_record_set (rtx, cselib_val *, cselib_val *);
 static void cselib_record_sets (rtx_insn *);
+static rtx autoinc_split (rtx, rtx *, machine_mode);
+
+#define PRESERVED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
+
+#define SP_BASED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
+
+#define SP_DERIVED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1 ("SP_DERIVED_VALUE_P", (RTX), VALUE)->call)

 struct expand_value_data
 {
@@ -122,6 +132,13 @@
   if (GET_CODE (x) == VALUE)
 return x == v->val_rtx;

+  if (SP_DERIVED_VALUE_P (v->val_rtx) && GET_MODE (x) == Pmode)
+{
+  rtx xoff = NULL;
+  if (autoinc_split (x, , memmode) == v->val_rtx && xoff == NULL_RTX)
+   return true;
+}
+
   /* We don't guarantee that distinct rtx's have different hash values,
  so we need to do a comparison.  */
   for (l = v->locs; l; l = l->next)
@@ -256,15 +273,6 @@
 void (*cselib_record_sets_hook) (rtx_insn *insn, struct cselib_set *sets,
 int n_sets);

-#define PRESERVED_VALUE_P(RTX) \
-  (RTL_FLAG_CHECK1 ("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
-
-#define SP_BASED_VALUE_P(RTX) \
-  (RTL_FLAG_CHECK1 ("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
-
-#define SP_DERIVED_VALUE_P(RTX) \
-  (RTL_FLAG_CHECK1 ("SP_DERIVED_VALUE_P", (RTX), VALUE)->call)
-



 /* Allocate a struct elt_list and fill in its two elements with the
@@ -497,7 +505,7 @@
   };
   cselib_val **slot
= cselib_preserved_hash_table->find_slot_with_hash (,
-  v->hash, INSERT);
+   v->hash, INSERT);
   gcc_assert (!*slot);
   *slot = v;
 }
@@ -535,6 +543,28 @@
   max_value_regs
= hard_regno_nregs (regno,
GET_MODE (cfa_base_preserved_val->locs->loc));
+
+  /* If cfa_base is sp + const_int, need to preserve also the
+SP_DERIVED_VALUE_P value.  */
+  for (struct elt_loc_list *l = cfa_base_preserved_val->locs;
+  l; l = l->next)
+   if (GET_CODE (l->loc) == PLUS
+   && GET_CODE (XEXP (l->loc, 0)) == VALUE
+   && SP_DERIVED_VALUE_P (XEXP (l->loc, 0))
+   && CONST_INT_P (XEXP (l->loc, 1)))
+ {
+   if (! invariant_or_equiv_p (CSELIB_VAL_PTR (XEXP (l->loc, 0
+ {
+   rtx val = cfa_base_preserved_val->val_rtx;
+   rtx_insn *save_cselib_current_insn = cselib_current_insn;
+   cselib_current_insn = l->setting_insn;
+   new_elt_loc_list (CSELIB_VAL_PTR (XEXP (l->loc, 0)),
+ plus_constant (Pmode, val,
+-UINTVAL (XEXP (l->loc, 1;
+   cselib_current_insn = save_cselib_current_insn;
+ }
+   break;
+ }
 }
   else
 {
@@ -544,8 +574,7 @@
 }

   if (cselib_preserve_constants)
-cselib_hash_table->traverse 
-  (NULL);
+cselib_hash_table->traverse 
(NULL);
   else
 {
   cselib_hash_table->empty ();

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #31 from Jakub Jelinek  ---
Created attachment 48073
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48073=edit
gcc10-pr92264-wip.patch

WIP patch to try special casing cselib handling of stack pointer VALUEs.
The intent is that when we first cselib_lookup the stack pointer (e.g. at the
start of a bb), we mark the created VALUE as SP_DERIVED_VALUE_P and when we
lookup sp + CONST_INT, we try to canonicalize that lookup into lookup of that
VALUE + adjusted offset.  Grepping e.g. -fdump-rtl-postreload-cselib or
-fdump-rtl-dse2-cselib for ^cselib.value shows that with -O2 -m32 we create
significantly fewer new VALUEs as we can share the VALUEs that are that
SP_DERIVED_VALUE_P VALUE + offset.
Unfortunately, this ICEs during var-tracking due to VALUE canonicalization and
PRESERVED_VALUE_P.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #30 from Jakub Jelinek  ---
Or perhaps just do:
--- gcc/var-tracking.c.jj   2020-01-12 11:54:38.532381439 +0100
+++ gcc/var-tracking.c  2020-03-19 15:49:19.457340470 +0100
@@ -6112,7 +6112,8 @@ add_stores (rtx loc, const_rtx expr, voi
 }

   if (loc == stack_pointer_rtx
-  && maybe_ne (hard_frame_pointer_adjustment, -1)
+  && (maybe_ne (hard_frame_pointer_adjustment, -1)
+ || (!frame_pointer_needed && !ACCUMULATE_OUTGOING_ARGS))
   && preserve)
 cselib_set_value_sp_based (v);

and with that, retry the statistics gathering to see if now everything > 100 is
NULL?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #29 from Jakub Jelinek  ---
So, the reason why the values aren't being marked as sp based is given in
PR54796 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54796#c2
In these functions, there is no fp_setter insn, so
hard_frame_pointer_adjustment is -1, all we have there is
(insn/f:TI 2695 2874 2696 2 (set (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [0  S4
A8])
(reg:SI 6 bp)) "../../gcc/diagnostic-show-locus.c":3861:1 43 {*pushsi2}
 (expr_list:REG_DEAD (reg:SI 6 bp)
(nil)))
some pushes to preserve call saved regs
(insn/f:TI 2699 2698 2700 2 (parallel [
(set (reg/f:SI 7 sp)
(plus:SI (reg/f:SI 7 sp)
(const_int -508 [0xfe04])))
(clobber (reg:CC 17 flags))
(clobber (mem:BLK (scratch) [0  A8]))
]) "../../gcc/diagnostic-show-locus.c":3861:1 1058
{pro_epilogue_adjust_stack_add_si}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 7 sp)
(plus:SI (reg/f:SI 7 sp)
(const_int -508 [0xfe04])))
(nil
(note 2700 2699 17 2 NOTE_INSN_PROLOGUE_END)

Now, the reason we don't mark preserved values sp based unconditionally if they
are sp based is the fear about the prologue and pre-prologue spots, as we need
to disambiguate between sp based and hfp based VALUEs and they normally start
the same as sp based.

Not sure what we can do, perhaps also remember if we've seen in the prologue
some important hfp related instruction, at that point somehow force using
different VALUEs for hfp vs. sp based accesses (dunno if by cselib invalidation
of hfp or of sp) and from that point on mark the clearly sp based VALUEs as
sp_based.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #28 from Martin Liška  ---
(In reply to Richard Biener from comment #25)
> Created attachment 48064 [details]
> more localized caching
> 
> Updated and simplified patch.  Maybe it does help depending on how we have
> shared locs for multiple values ...
> 
> The update is to not bother updating cache values with non-NULL found ones.

So this patch is running for the problematic wrf testcase for more that 2h30m..

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #27 from Jakub Jelinek  ---
So, I've looked at two cases from those where visited_vals.length () > 100
returned non-NULL, in particular Wsizeof-pointer-memaccess1.c and
diagnostic-show-locus.c.  In both all the cases where it returned non-NULL and
length () > 100 were those where it returns
static_reg_base_value[STACK_POINTER_REGNUM].
Which seems exactly like the case for which cselib_sp_based_value_p has been
invented, except it doesn't seem to work in this case the way it should.
Will try to have a look why.  Anyway, if that is fixed, perhaps we can just
lower the param to 100?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #26 from Richard Biener  ---
(In reply to Martin Liška from comment #24)
> (In reply to Richard Biener from comment #23)
> > (In reply to Richard Biener from comment #22)
> > > Created attachment 48063 [details]
> > > more localized caching
> > > 
> > > Like this.  Martin, can you also check the effect on this one?
> > 
> > We can actually simplify since we won't ever use non-NULL cached vals which
> > also means this patch is a no-op and should be worse than the existing
> > state :/
> 
> So no testing is needed?

would be still interesting (but surprising) if it helps (doesn't matter
whether the original or the updated patch, the update is just constant
time improvement)

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

Richard Biener  changed:

   What|Removed |Added

  Attachment #48063|0   |1
is obsolete||

--- Comment #25 from Richard Biener  ---
Created attachment 48064
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48064=edit
more localized caching

Updated and simplified patch.  Maybe it does help depending on how we have
shared locs for multiple values ...

The update is to not bother updating cache values with non-NULL found ones.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #24 from Martin Liška  ---
(In reply to Richard Biener from comment #23)
> (In reply to Richard Biener from comment #22)
> > Created attachment 48063 [details]
> > more localized caching
> > 
> > Like this.  Martin, can you also check the effect on this one?
> 
> We can actually simplify since we won't ever use non-NULL cached vals which
> also means this patch is a no-op and should be worse than the existing
> state :/

So no testing is needed?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #23 from Richard Biener  ---
(In reply to Richard Biener from comment #22)
> Created attachment 48063 [details]
> more localized caching
> 
> Like this.  Martin, can you also check the effect on this one?

We can actually simplify since we won't ever use non-NULL cached vals which
also means this patch is a no-op and should be worse than the existing
state :/

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #22 from Richard Biener  ---
Created attachment 48063
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48063=edit
more localized caching

Like this.  Martin, can you also check the effect on this one?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #21 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #19)
> I think caching is problematic, for a couple of reasons:
> 1) for non-cselib_preserved_value_p, the loc list is dynamic and keeps
> changing, locs are added and removed as we go through the basic blocks

Sure, but if you look at my patch I'm caching on individual locs, not on
the whole list.  That then still doesn't avoid traversing the whole list
if we don't find a base but we'd at least elide further recursion.

The base we pick also depends on the ordering of the list currently
if there ever are two possible choices.

> 2) because of the recursion prevention, doesn't it matter from exactly what
> VALUE we start walking (in case we have a cycle or cycles)?

I think the outcome is essentially random anyways since we pick the first
base we find.  I'm quite sure that if we collected "all" bases we'd find
they are not equivalent in some cases.

> 3) plus the new param on visited_vals, if we reach it, caching is unreliable

Sure, but does anybody care?  Note what I'd really propose would be
find_base_term-local caching, eliding both the recursion prevention and
the param.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #20 from Martin Liška  ---
(In reply to Richard Biener from comment #17)
> Created attachment 48061 [details]
> cache base term
> 
> I wonder if we could simply cache the base terms in elt_loc_list?  Does that
> make a difference?

Yes:

$ time gfortran module_configure.fppized.f90 -c -march=znver2 -std=legacy
-fconvert=big-endian -fno-openmp -Ofast -march=znver2 -g
real1m43.067s
user1m42.410s

$ perf report --stdio | head -n20
...
 6.26%  f951  f951 [.] get_ref_base_and_extent
 5.85%  f951  f951 [.] bitmap_set_bit
 5.72%  f951  f951 [.] find_base_term
 3.73%  f951  f951 [.]
pre_expr_reaches_here_p_work
 2.77%  f951  f951 [.] fsm_find_thread_path
 2.62%  f951  f951 [.] rtx_equal_for_memref_p
 2.32%  f951  f951 [.] build_object_conflicts
 2.17%  f951  f951 [.] memrefs_conflict_p
 1.88%  f951  f951 [.] ira_build_conflicts

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #19 from Jakub Jelinek  ---
I think caching is problematic, for a couple of reasons:
1) for non-cselib_preserved_value_p, the loc list is dynamic and keeps
changing, locs are added and removed as we go through the basic blocks
2) because of the recursion prevention, doesn't it matter from exactly what
VALUE we start walking (in case we have a cycle or cycles)?
3) plus the new param on visited_vals, if we reach it, caching is unreliable

I've performed {x86_64,i686}-linux bootstraps+regtests overnight with the above
mentioned statistics gathering and the end result is 5530878 cases where
find_base_term had visited_vals.length () > 100 on return, but only 1296 cases
out of that where with visited_vals.length () > 100 we actually returned
non-NULL, and no cases of visited_vals.length () > 500 where we returned
non-NULL.

I'm listing just the cases where it returned non-NULL, as can be seen, all are
for -m32 and guess it would be nice to have a look at them what we actually
manage to return and how the VALUE chains look like.

$ grep -v '0$' fbt | sort | uniq -c | sort -r -k +5 -n
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 484 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 482 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 481 1
  3 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 480 1
  2 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 479 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 477 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 476 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 473 1
  3 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 472 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 469 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 468 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 467 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 464 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 463 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 462 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 458 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 456 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 455 1
  3 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 454 1
  2 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 453 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 451 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 450 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 447 1
  3 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 446 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 443 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 442 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 439 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 438 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 437 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 433 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 431 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 430 1
  3 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 429 1
  2 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 428 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 426 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 425 1
  1 32 ../../gcc/diagnostic-show-locus.c
selftest::test_diagnostic_show_locus_one_liner_utf8 422 1
  3 32 

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #18 from Richard Biener  ---
Note also that param_max_find_base_term_values limits recursion depth but not
width (the loc list traversals).  The original visited_vals thing was to
prevent infinite recursion only.  If the global caching works a safer approach
would be to turn that visited_vals things into a local cache and see if that's
enough as well.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #17 from Richard Biener  ---
Created attachment 48061
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48061=edit
cache base term

I wonder if we could simply cache the base terms in elt_loc_list?  Does that
make a difference?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #16 from Martin Liška  ---
PR88440 is also slightly related where enablement of
-ftree-loop-distribute-patterns caused longer compilation:
521.wrf_r: 310 -> 346s

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #15 from Richard Biener  ---
WRF initial_config has very very very many (nested) loops to initialize
globals.
IIRC there's a related bug running into the very same issue when prefetching
is enabled.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-18 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #14 from Martin Liška  ---
For firefox with LTO we get:

$ wc -l /tmp/fbt
1645 /tmp/fbt
$ sort /tmp/fbt | uniq -c | sort -n | tac | head -n20
 10 64 /tmp/libxul.so.J1HwqB.ltrans17.o CollectReports 103 0
 10 64 /tmp/libxul.so.J1HwqB.ltrans17.o CollectReports 101 0
  6 64 /tmp/libxul.so.J1HwqB.ltrans17.o CollectReports 106 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 997 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 996 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 995 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 994 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 993 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 992 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 991 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 988 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 987 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 986 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 985 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 984 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 983 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 982 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 979 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 978 0
  2 64 /tmp/libxul.so.J1HwqB.ltrans55.o GetFormat 977 0

there are no bigger values than 1000.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-18 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #13 from Martin Liška  ---
So for the problematic wrf file we get:

$ gfortran module_configure.fppized.f90 -c -march=znver2 -std=legacy
-fconvert=big-endian -fno-openmp -Ofast -march=znver2 -g
...
$ wc -l /tmp/fbt 
26273112 /tmp/fbt
$ sort /tmp/fbt | uniq -c | sort -n | tac | head -n10
24666359 64 module_configure.fppized.f90 initial_config 2001 0
 482996 64 module_configure.fppized.f90 initial_config 757 0
 336377 64 module_configure.fppized.f90 initial_config 620 0
 321717 64 module_configure.fppized.f90 initial_config 607 0
 288735 64 module_configure.fppized.f90 initial_config 571 0
 101785 64 module_configure.fppized.f90 initial_config 317 0
  23376 64 module_configure.fppized.f90 initial_config 127 0
  22014 64 module_configure.fppized.f90 initial_config 120 0
  18135 64 module_configure.fppized.f90 initial_config 107 0
 41 64 module_configure.fppized.f90 initial_config 115 0

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #12 from Jakub Jelinek  ---
I guess we can consider lowering the default value of the param and/or having
separate param for var-tracking vs. for normal RTL optimizations.
But before doing that I think we want to gather some statistics to help us
decide on which value we want, like:
--- gcc/alias.c 2020-03-09 13:38:04.534284063 +0100
+++ gcc/alias.c 2020-03-18 14:20:03.669278109 +0100
@@ -2116,6 +2116,14 @@ find_base_term (rtx x)
   rtx res = find_base_term (x, visited_vals);
   for (unsigned i = 0; i < visited_vals.length (); ++i)
 visited_vals[i].first->locs = visited_vals[i].second;
+  if (visited_vals.length () > 100)
+{
+  FILE *f = fopen ("/tmp/fbt", "a");
+  fprintf (f, "%d %s %s %d %d\n", (int) BITS_PER_WORD,
+  main_input_filename ? main_input_filename : "-",
+  current_function_name (), visited_vals.length (), res !=
NULL_RTX);
+  fclose (f);
+}
   return res;
 }

and use it on both x86_64-linux and i686-linux bootstrap/regtest (perhaps other
targets too) and perhaps on some other packages too (firefox, libreoffice,
...).

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-17 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #11 from Martin Liška  ---
So it finishes in 50m16s.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #10 from Jakub Jelinek  ---
(In reply to Martin Liška from comment #9)
> With --param max-find-base-term-values=100 it takes 4m24s.
> With --param max-find-base-term-values=1 it takes 2m22s.

So, if you e.g. bump the timeout to 30 minutes, does it complete even with the
default setting and how long does it take?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-17 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #9 from Martin Liška  ---
With --param max-find-base-term-values=100 it takes 4m24s.
With --param max-find-base-term-values=1 it takes 2m22s.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-17 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #8 from Martin Liška  ---
With --param max-find-base-term-values=10 it takes 2m34s.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-17 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #7 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #6)
> In r10-7086-g2e94d3ee47be0742df843d95e3d1bf1da11e4796 I've added a param
> controlled cap on the number of VALUEs walked by a single toplevel
> find_base_term.
> Does it make a difference on this?

The current master still fails with a timeout of 5 minutes. Perf looks a bit
lower for find_base_term:

46.64%  f951  f951  [.] find_base_term
 8.91%  f951  f951  [.] ix86_find_base_term
 2.19%  f951  libc-2.31.so  [.] _int_malloc
 2.15%  f951  f951  [.] get_ref_base_and_extent
 2.11%  f951  f951  [.] bitmap_set_bit
 2.03%  f951  f951  [.] find_base_term
 1.75%  f951  f951  [.] cselib_sp_based_value_p
 1.52%  f951  f951  [.] pre_expr_reaches_here_p_work
 1.51%  f951  f951  [.] fsm_find_thread_path
 1.20%  f951  f951  [.] build_object_conflicts

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
In r10-7086-g2e94d3ee47be0742df843d95e3d1bf1da11e4796 I've added a param
controlled cap on the number of VALUEs walked by a single toplevel
find_base_term.
Does it make a difference on this?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-01-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2019-11-19 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-11-19
   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #5 from rsandifo at gcc dot gnu.org  
---
Mine.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2019-11-04 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #4 from Martin Liška  ---
> If you have handy access to the reproducer, is it -g that makes
> the difference?

Yes.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2019-11-04 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #3 from rsandifo at gcc dot gnu.org  
---
Similar to what Richard says, this sounds like a latent bug.  One of
the effects of that rev was to prevent unnecessary invalidation of
equivalences based on the stack pointer and frame pointer (which stay
the same after a call).  It sounds like this has triggered something
that could have happened anyway if there weren't any calls in the way.

If you have handy access to the reproducer, is it -g that makes
the difference?  var-tracking can generate very large cselib tables
and I'm wondering if it comes from there.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2019-10-29 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #2 from Martin Liška  ---
Just for the record, the compilation takes now ~2:30 hours.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2019-10-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-linux-gnu
   Host|x86_64-linux-gnu|
   Target Milestone|--- |10.0

--- Comment #1 from Richard Biener  ---
Note find_base_term is known to be quadratic ... possibly we were able to
short-cut this much more often before the rev.