[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #14 from Segher Boessenkool  ---
I committed as r270601, on gcc-9-branch

2019-04-26  Segher Boessenkool  

PR other/90257
Revert the revert:
2019-04-21  H.J. Lu  

PR target/90178
Revert:
2018-11-21  Uros Bizjak  

Revert the revert:
2013-10-26  Vladimir Makarov  

Revert:
2013-10-25  Vladimir Makarov  

* lra-spills.c (lra_final_code_change): Remove useless move insns.


so this is okay for rs6000 on GCC 9 for now.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #13 from rguenther at suse dot de  ---
On April 26, 2019 4:37:24 PM GMT+02:00, rguenther at suse dot de
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257
>
>--- Comment #12 from rguenther at suse dot de ---
>On April 26, 2019 4:18:03 PM GMT+02:00, "jakub at gcc dot gnu.org"
> wrote:
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257
>>
>>--- Comment #11 from Jakub Jelinek  ---
>>Created attachment 46253
>>  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46253&action=edit
>>gcc9-pr90257.patch
>>
>>Untested patch that fixes PR90178 even when the reversion of reversion
>>of
>>reversion in lra-spills.c is reverted.
>
>Any reason why this heuristic is good? It looks arbitrary to solve the
>particular testcase? 

In particular we'd keep a chain of 16 forwarders unmerged with your change? 

>>For the trunk, we could as well replace the lra-spills.c change with
>>richi's
>>dce change or whatever else.  Just it seems to be wrong to rely on
>>unoptimal IL
>>to perform proper optimizations.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #12 from rguenther at suse dot de  ---
On April 26, 2019 4:18:03 PM GMT+02:00, "jakub at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257
>
>--- Comment #11 from Jakub Jelinek  ---
>Created attachment 46253
>  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46253&action=edit
>gcc9-pr90257.patch
>
>Untested patch that fixes PR90178 even when the reversion of reversion
>of
>reversion in lra-spills.c is reverted.

Any reason why this heuristic is good? It looks arbitrary to solve the
particular testcase? 

>For the trunk, we could as well replace the lra-spills.c change with
>richi's
>dce change or whatever else.  Just it seems to be wrong to rely on
>unoptimal IL
>to perform proper optimizations.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #11 from Jakub Jelinek  ---
Created attachment 46253
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46253&action=edit
gcc9-pr90257.patch

Untested patch that fixes PR90178 even when the reversion of reversion of
reversion in lra-spills.c is reverted.

For the trunk, we could as well replace the lra-spills.c change with richi's
dce change or whatever else.  Just it seems to be wrong to rely on unoptimal IL
to perform proper optimizations.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #10 from Pat Haugen  ---
(In reply to Richard Biener from comment #3)
> Created attachment 46250 [details]
> run_fast_dce also for LRA
> 
> Sth like this could fix it.

Yes, that restored the performance.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #9 from Jakub Jelinek  ---
I believe the difference is caused by cfg cleanup, which without the noop move
considers the
(code_label 34 6 37 9 1 (nil) [2 uses])
(note 37 34 36 9 [bb 9] NOTE_INSN_BASIC_BLOCK)
(insn 36 37 53 9 (use (reg/i:DI 0 ax)) "pr90178.c":8:1 -1
 (nil))
basic block a forwarder block (that seems correct), while with the noop move it
isn't one, and thus cfg cleanup doesn't try to optimize it.
The condition in try_forward_edges is:
  if (FORWARDER_BLOCK_P (target)
  && single_succ (target) != EXIT_BLOCK_PTR_FOR_FN (cfun))
and so it already tries to avoid changing forwarders to exit that way.
But the single_succ of target is not EXIT, but an empty forwarder block to
EXIT:
(note 53 36 51 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
created by mode-switching.c (create_pre_exit).

I wonder if the correct fix for PR90257 isn't to extend the above test to also
consider further forwarders to exit.
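
To make the idea above concrete, here is a minimal sketch on a toy CFG model. It is not GCC's cfgcleanup code and not the attached patch: `struct toy_bb`, `direct_test`, `extended_test` and the `max_hops` bound are all hypothetical names introduced only for illustration. `direct_test` mirrors the quoted condition (the single successor must not be EXIT), and `extended_test` additionally looks through further forwarder blocks on the way to EXIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a basic block; NOT GCC's basic_block type.  */
struct toy_bb {
    bool is_exit;          /* models EXIT_BLOCK_PTR_FOR_FN (cfun) */
    bool is_forwarder;     /* models FORWARDER_BLOCK_P (bb) */
    struct toy_bb *succ;   /* the single successor, or NULL */
};

/* Models the quoted test: TARGET is a forwarder whose direct
   successor is not the exit block.  */
static bool direct_test(const struct toy_bb *target)
{
    assert(target != NULL);
    return target->is_forwarder
           && target->succ != NULL
           && !target->succ->is_exit;
}

/* Hypothetical helper: starting at BB's successor, follow forwarder
   blocks and report whether the chain ends in the exit block.
   MAX_HOPS bounds the walk so a cycle of forwarders cannot loop.  */
static bool reaches_exit_via_forwarders(const struct toy_bb *bb, int max_hops)
{
    assert(bb != NULL);
    const struct toy_bb *cur = bb->succ;
    for (int hops = 0; cur != NULL && hops < max_hops; hops++) {
        if (cur->is_exit)
            return true;
        if (!cur->is_forwarder)
            return false;
        cur = cur->succ;
    }
    return false;
}

/* Extended test: also refuse to forward when the successor is merely
   another forwarder on the way to exit.  */
static bool extended_test(const struct toy_bb *target, int max_hops)
{
    assert(target != NULL);
    return target->is_forwarder
           && target->succ != NULL
           && !reaches_exit_via_forwarders(target, max_hops);
}
```

With bb 9 -> bb 10 -> EXIT modelled as three `toy_bb` objects (bb 9 and bb 10 both marked as forwarders), `direct_test` accepts bb 9 because its successor is bb 10 rather than EXIT, while `extended_test` rejects it. How far such a walk should go is a design question the sketch leaves open.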

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #8 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #3)
> Created attachment 46250 [details]
> run_fast_dce also for LRA
> 
> Sth like this could fix it.

I've verified this patch breaks PR90178 again as well.
I think we need to debug why cfg cleanup makes a different decision during the
vzeroupper pass on PR90178.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread pthaugen at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #7 from Pat Haugen  ---
Overall 'perf' cycle counts and hot functions.

r270483
---

# Overhead  Samples  Command          Shared Object
# ........  .......  ...............  ....................
#
    91.17%   721643  gcc_base.gcc_hu  gcc_base.gcc_hunt_64
     8.82%    69840  gcc_base.gcc_hu  libc-2.17.so

# Overhead  Samples  Command          Shared Object         Symbol
# ........  .......  ...............  ....................  ......
#
     6.22%    49295  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] reg_is_remote_constant_p.isra.0.part.0
     6.18%    48897  gcc_base.gcc_hu  libc-2.17.so          [.] __memset_power8
     6.05%    47651  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] bitmap_operation
     5.92%    46695  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] htab_traverse
     3.66%    28957  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] canon_rtx
     3.59%    28440  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] compute_transp
     3.35%    26372  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] bitmap_element_allocate
     2.18%    17151  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] compute_dominance_frontiers_1
     2.00%    15841  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] ggc_set_mark
     1.77%    13974  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] fixup_var_refs_1
     1.69%    13391  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] ggc_mark_rtx_children_1
     1.54%    12236  gcc_base.gcc_hu  gcc_base.gcc_hunt_64  [.] single_set_2.part.0



r270484
---

# Overhead  Samples  Command          Shared Object
# ........  .......  ...............  ................
#
    92.08%   814297  gcc_base.base_6  gcc_base.base_64
     7.91%    70063  gcc_base.base_6  libc-2.17.so

# Overhead  Samples  Command          Shared Object     Symbol
# ........  .......  ...............  ................  ......
#
     8.14%    71642  gcc_base.base_6  gcc_base.base_64  [.] bitmap_operation
     6.92%    60863  gcc_base.base_6  gcc_base.base_64  [.] bitmap_element_allocate
     6.01%    53281  gcc_base.base_6  gcc_base.base_64  [.] reg_is_remote_constant_p.isra.0.part.0
     5.68%    50081  gcc_base.base_6  gcc_base.base_64  [.] htab_traverse
     5.53%    48967  gcc_base.base_6  libc-2.17.so      [.] __memset_power8
     3.82%    33850  gcc_base.base_6  gcc_base.base_64  [.] compute_transp
     3.30%    29142  gcc_base.base_6  gcc_base.base_64  [.] canon_rtx
     1.95%    17155  gcc_base.base_6  gcc_base.base_64  [.] compute_dominance_frontiers_1
     1.81%    16023  gcc_base.base_6  gcc_base.base_64  [.] ggc_set_mark
     1.69%    14989  gcc_base.base_6  gcc_base.base_64  [.] ggc_mark_rtx_children_1
     1.57%    13832  gcc_base.base_6  gcc_base.base_64  [.] fixup_var_refs_1
     1.48%    13144  gcc_base.base_6  gcc_base.base_64  [.] single_set_2.part.0
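
For comparing the two profiles, a small sketch like the following computes the per-symbol sample ratio between r270483 and r270484. The sample counts are copied from the listings above; the names `sym_samples`, `top_symbols`, `sample_ratio` and `print_ratios` are made up for this example, and it assumes the raw Samples columns of the two runs are directly comparable (same workload, same sampling setup), which the comment does not state explicitly.

```c
#include <assert.h>
#include <stdio.h>

/* One row per symbol, with sample counts taken from the two
   perf listings.  */
struct sym_samples {
    const char *symbol;
    long before;   /* samples at r270483 */
    long after;    /* samples at r270484 */
};

static const struct sym_samples top_symbols[] = {
    { "bitmap_operation",                       47651, 71642 },
    { "bitmap_element_allocate",                26372, 60863 },
    { "reg_is_remote_constant_p.isra.0.part.0", 49295, 53281 },
    { "htab_traverse",                          46695, 50081 },
    { "__memset_power8",                        48897, 48967 },
    { "compute_transp",                         28440, 33850 },
    { "canon_rtx",                              28957, 29142 },
};

/* Ratio of samples after the change to samples before it.  */
static double sample_ratio(long before, long after)
{
    assert(before > 0);
    return (double) after / (double) before;
}

/* Print one line per symbol: counts and the after/before ratio.  */
static void print_ratios(void)
{
    size_t n = sizeof top_symbols / sizeof top_symbols[0];
    for (size_t i = 0; i < n; i++)
        printf("%-40s %6ld -> %6ld  x%.2f\n",
               top_symbols[i].symbol,
               top_symbols[i].before,
               top_symbols[i].after,
               sample_ratio(top_symbols[i].before, top_symbols[i].after));
}
```

Calling `print_ratios()` from a `main` lists the ratios; the largest growth among the listed symbols is in `bitmap_element_allocate` and `bitmap_operation`, while `__memset_power8` and `canon_rtx` stay close to unchanged.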

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2019-04-26
     Ever confirmed|0                           |1

--- Comment #6 from Richard Biener  ---
Thus, ppc folks - did you see an 8% performance increase at the

2018-11-21  Uros Bizjak  

Revert the revert:
2013-10-26  Vladimir Makarov  

Revert:
2013-10-25  Vladimir Makarov  

* lra-spills.c (lra_final_code_change): Remove useless move insns.

revision?

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #5 from Richard Biener  ---
So - is the regression of 8% compared to GCC 8?  If only to some development
branch revision then it doesn't count.  As I read it the removed code in
question only got added during GCC 9 stage3?

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #4 from Richard Biener  ---
Let's revert the offending commit on the branch but keep it on trunk for
further investigation.  PR90178 was only a missed optimization.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

--- Comment #3 from Richard Biener  ---
Created attachment 46250
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46250&action=edit
run_fast_dce also for LRA

Sth like this could fix it.

[Bug other/90257] [9/10 Regression] 8% degradation on cpu2006 403.gcc starting with r270484

2019-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90257

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |26163
   Target Milestone|---                         |9.2
            Summary|8% degradation on cpu2006   |[9/10 Regression] 8%
                   |403.gcc starting with       |degradation on cpu2006
                   |r270484                     |403.gcc starting with
                   |                            |r270484


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)