[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-11 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #43 from rsandifo at gcc dot gnu.org ---
(In reply to hubicka from comment #42)
> I also see a 6.69% regression on x64 with -Ofast -march=native -flto
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.377.0
I can reproduce this with -Ofast -flto -march=znver3 (but not running
on a Zen 3).  It looks like it's due to g:d3ff7420e94 instead though
(sorry Andre!).  With a 3-iteration run, I see a 6.2% regression after
that revision compared with before it.

It would be great if someone more familiar with x86 than I am
could confirm the bisection though.

> and perhaps 3-5% on sphinx
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=476.280.0
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=227.280.0
I'll look at these next.

> For non-SPEC benchmarks there is a regression on nbench
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=26.645.1
> There are also large changes in tsvc
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report
> It may be noise since the kernels are tiny, but for example s293 reproduces
> on both Kaby Lake and Zen with a regression of about 80-90%, which may be
> easy to track down (the kernel is included in the testsuite). The same
> regression is not seen on Zen 3, so it may be ISA specific or similar.
To summarise what we discussed on irc (for the record): it looks like
the s293 regression is in the noise, like you say.  I can't convince
GCC to generate different code before and after the IRA patches for that.
I haven't looked at the other tsvc tests yet.

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-11 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #42 from hubicka at kam dot mff.cuni.cz ---
On Zen 2 and Zen 3 with -flto the speedup seems to be about 12% for both -O2
and -Ofast -march=native, which is very nice!
Zen 1 for some reason sees a smaller improvement, about 6%.
With PGO it is 3.8%.

Overall it seems to be a win, but there are a few noteworthy issues.

I also see a 6.69% regression on x64 with -Ofast -march=native -flto
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.377.0
and perhaps 3-5% on sphinx
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=476.280.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=227.280.0

For non-SPEC benchmarks there is a regression on nbench
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=26.645.1
There are also large changes in tsvc
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report
It may be noise since the kernels are tiny, but for example s293 reproduces
on both Kaby Lake and Zen with a regression of about 80-90%, which may be
easy to track down (the kernel is included in the testsuite). The same
regression is not seen on Zen 3, so it may be ISA specific or similar.

Finally, there seem to be relatively large code size savings on the Polyhedron
benchmarks today (8% on capacita,

Thanks a lot!

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-10 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

rsandifo at gcc dot gnu.org  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
            Summary|[11/12 Regression] Bad      |[11 Regression] Bad
                   |interaction between IPA     |interaction between IPA
                   |frequences and IRA          |frequences and IRA
                   |resulting in spills due to  |resulting in spills due to
                   |changes in BB frequencies   |changes in BB frequencies
             Status|ASSIGNED                    |RESOLVED

--- Comment #41 from rsandifo at gcc dot gnu.org ---
Hopefully fixed on trunk, much too invasive to backport.

Thanks Vlad for the reviews.

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-02-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.0                        |12.0
           Priority|P3                          |P2

--- Comment #5 from Richard Biener  ---
Re-targeting to GCC 12.

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-02-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-02-22 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #4 from Jiangning Liu  ---
Hi Honza,

Do you see any other real-world problems if the patch
g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 is not applied?

If exchange2 is the only benchmark affected by this patch so far, and given
that we have observed a big performance regression there, it sounds like we
need to provide an IRA fix along with this patch to avoid an unexpected
performance degradation in the GCC 11 release compared with GCC 10.

Thanks,
-Jiangning

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-02-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
            Summary|IRA artificially creating   |[11 Regression] Bad
                   |spills due to BB            |interaction between IPA
                   |frequencies                 |frequences and IRA
                   |                            |resulting in spills due to
                   |                            |changes in BB frequencies

--- Comment #3 from Tamar Christina  ---
Hi,

Since we are in stage 4, I'd like to put all our ducks in a row and see what
the options are at this point.

IRA, as you can imagine, is huge and quite complex, and the more I investigate
the problem the more I realize that there isn't a spot fix for this issue.  It
will require a lot more work in IRA, and understanding parts of it that I don't
fully understand yet.

But one thing is clear: there is a severe interaction between IPA predictions
and IRA under conditions where there is high register pressure *and* a
function call.
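
For concreteness, here is a minimal made-up kernel (my own illustration, not
taken from exchange2 or the testsuite; the names "pressure" and "leaf" are
invented) showing the kind of situation I mean: many values loaded before a
call stay live across it, so IRA has to choose between caller-save spills
around the call and tying up callee-saved registers:

/* Illustrative only: more values are live across the call than there are
   call-clobbered registers available, so IRA must either spill them around
   the call or keep them in callee-saved registers.  */
extern int leaf (int);

int
pressure (int *a)
{
  int t0 = a[0], t1 = a[1], t2 = a[2], t3 = a[3];
  int t4 = a[4], t5 = a[5], t6 = a[6], t7 = a[7];
  int t8 = a[8], t9 = a[9], t10 = a[10], t11 = a[11];
  int c = leaf (t0);   /* every tN is still live across this call */
  return c + t0 + t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8 + t9 + t10 + t11;
}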

The problem is that the changes introduced in
g:1118a3ff9d3ad6a64bba25dc01e7703325e23d92 are local changes, i.e. they affect
only some BBs and not others, whereas any spot fix in IRA would be globally
scoped.

I was investigating whether the issue could be solved by having IRA treat the
recursively inlined function in exchange2 as one region instead of doing live
range splitting.  And yes, using -fira-region=one does make a difference, but
only a small one, recovering about 33% of the regression.  However, doing this
has a disadvantage: regions that before would not count towards the live range
of the call are now counted, so spilling regresses in those cases.  This is
why the flag can only recover 33% of the regression; it introduces some of its
own.

The second alternative I tried as a spot fix is to allow specifying a weight
for CALL_FREQ, for use in situations of high register pressure and
call-crossing live ranges.  The "hack" looks like this:

diff --git a/gcc/caller-save.c b/gcc/caller-save.c
index 4fe019b2367..674e6ca7a48 100644
--- a/gcc/caller-save.c
+++ b/gcc/caller-save.c
@@ -425,6 +425,7 @@ setup_save_areas (void)
           || find_reg_note (insn, REG_NORETURN, NULL))
         continue;
       freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
+      freq = freq * (param_ira_call_freq_weight / 100.f);
       REG_SET_TO_HARD_REG_SET (hard_regs_to_save,
                                &chain->live_throughout);
       used_regs = insn_callee_abi (insn).full_reg_clobbers ();
diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c
index 4ba29dcadf4..6e2699e5a7d 100644
--- a/gcc/ira-lives.c
+++ b/gcc/ira-lives.c
@@ -1392,7 +1392,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
                  it was saved on previous call in the same basic
                  block and the hard register was not mentioned
                  between the two calls.  */
-              ALLOCNO_CALL_FREQ (a) += freq / 3;
+              ALLOCNO_CALL_FREQ (a) += (freq * (param_ira_call_freq_weight / 100.0f));
diff --git a/gcc/params.opt b/gcc/params.opt
index cfed980a4d2..39d5cae9f31 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -321,6 +321,11 @@ Max size of conflict table in MB.
 Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization
 Max loops number for regional RA.
 
+-param=ira-call-freq-weight=
+Common Joined UInteger Var(param_ira_call_freq_weight) Init(100) Param Optimization
+Scale to be applied to the weighting of the frequencies of allocations live
+across a call.
+
 -param=iv-always-prune-cand-set-bound=
 Common Joined UInteger Var(param_iv_always_prune_cand_set_bound) Init(10) Param Optimization
 If number of candidates in the set is smaller, we always try to remove unused
 ivs during its optimization.

And if we look at the changes in frequency between the good and the bad case,
the prediction changes by approximately 40%.  So using --param
ira-call-freq-weight=40 recovers about 60% of the regression.  The issue this
global change introduces, however, is that IRA starts to prefer callee-saved
registers.  That is not in itself a problem, but at the boundary of a region
it will then emit moves from temporaries into callee-saved registers to carry
live values to the next region.
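
For the record, a quick worked example of the scaling in the hack (the
frequency value here is made up, not measured from exchange2):

/* Worked example only: how ira-call-freq-weight scales the cost a call
   contributes, using an invented block frequency.  */
#include <stdio.h>

int
main (void)
{
  int freq = 3000;   /* hypothetical REG_FREQ of the call's basic block */
  int weight = 40;   /* --param ira-call-freq-weight=40 */

  /* The hack scales the contribution by weight / 100,
     i.e. 3000 * 0.40 = 1200 here.  */
  int scaled = (int) (freq * (weight / 100.0f));

  printf ("unscaled freq %d, scaled contribution %d\n", freq, scaled);
  return 0;
}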

These moves are completely unneeded; enabling the late register renaming pass
(-frename-registers) removes the superfluous moves, and with that we recover
66% of the regression.  But this is just a big hack.  The obvious disadvantage
here, since again it is a global change, is that it pushes all caller-saved
values to be spilled before the function call.  And indeed, before the
recursive call there is now a massive amount of spilling happening.

But it is something that would be "safe" to do at this point in the GCC
development cycle.

The last and preferred approach, if you agree