[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-03-30 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

Martin Liška  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|NEW |RESOLVED

--- Comment #24 from Martin Liška  ---
Sure, let's close it.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-03-29 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #23 from Rama Malladi  ---
(In reply to Rama Malladi from comment #22)
> I will close this issue as we were unable to reproduce the perf drop going
> from gcc-7 to gcc-8 on a Graviton2 based instance. The performance of
> 519.lbm_r built with gcc-7.4 was same as that with gcc-8.5.

Can someone from the GCC dev/ regression team close this issue as I am unable
to find an option for the same? Thanks

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-03-29 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #22 from Rama Malladi  ---
I will close this issue as we were unable to reproduce the perf drop going from
gcc-7 to gcc-8 on a Graviton2 based instance. The performance of 519.lbm_r
built with gcc-7.4 was same as that with gcc-8.5.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-02-24 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #21 from Rama Malladi  ---
I did another triage for perf loss on Graviton 2 processor (neoverse-n1) based
instance and found this commit: `a9a4edf0e71bbac9f1b5dcecdcf9250111d16889` to
be the reason. As I had indicated in my earlier reply, I was doing a triage of
perf loss going from gcc-7 to gcc-10.

The perf of 519.libm_r 1-copy run improved 1.08x with the revert of commit:
`a9a4edf0e71bbac9f1b5dcecdcf9250111d16889` on gcc-mainline (
`2f1691be517fcdcabae9cd671ab511eb0e08b1d5`).

I am guessing that we don't see it on LNT/ Altra CPUs.

So, please look into this issue fix. Let me know if you have any queries.
Thanks.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-02-20 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #20 from Rama Malladi  ---
@Martin J and @Sebastian P, Let me walk you through the perf data and my
triage.

First, my triage has been on Graviton 3 (neoverse-v1) processor based
instances. Next, I was looking for perf delta going from gcc-7 to gcc-10. I
found 2 issues: One was reported in 107413
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413) and fixed (the perf delta
between gcc-7 and gcc-8 -- 215s vs. 266s); Another one is the issue reported in
here.

I did another triage and landed at the same commit that I reported earlier.

# first bad commit: [a9a4edf0e71bbac9f1b5dcecdcf9250111d16889] Update
max_bb_count in execute_fixup_cfg

Please let me know any further info/ studies you would like to see on this
report.

Thank you.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-02-02 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #19 from Rama Malladi  ---
Thanks @Sebastian and @Martin J. I will get another bisect between GCC 7-and-8.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-02-02 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org

--- Comment #18 from Sebastian Pop  ---
A new 5% regression happened in gcc-trunk more recently and may be due to
another patch.

Rama was bisecting a 15% perf regression on lbm when updating gcc-7 to gcc-10.
The regression can be seen on the LNT graph link from comment#3 

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=633.477.0=683.477.0=664.477.0=648.477.0=618.477.0=605.477.0=759.477.0=584.477.0

gcc-6 has execution time of 213 seconds
gcc-7 is at 215 seconds
gcc-8 is at 266
gcc-9 at 259
gcc-10 at 260

Honza's patch seems to be unrelated as it was committed to trunk before gcc-10
release on May 7, 2020:

commit a9a4edf0e71bbac9f1b5dcecdcf9250111d16889
Author: Jan Hubicka 
Date:   Sat Nov 30 22:25:24 2019 +0100

Update max_bb_count in execute_fixup_cfg


We need to git-bisect between gcc-7 and gcc-8.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-01-30 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

Martin Jambor  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #17 from Martin Jambor  ---
(In reply to Martin Liška from comment #3)
> Note I can't see it for Altra aarch64 CPU:

I think LNT can see it very well and it has appeared around the reported time:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=759.477.0=584.477.0;

But I don't think the commit given in the bug summary is the culprit, this
clearly happened in GCC 13 development cycle.  Rama, what is your reasoning to
suggest  reverting this particular commit?  If you bisected the issue, can you
double check you arrived at the correct one?

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-01-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #16 from Martin Liška  ---
@Honza: ???

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-01-08 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #15 from Rama Malladi  ---
Hi, Can we review this issue and suggest next steps/ action please? Thanks.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2022-12-12 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #14 from Rama Malladi  ---
(In reply to Martin Liška from comment #13)
> Note the mentioned revision is a fix and yes, sometimes these revisions can
> end up with a regression as profile estimation is a complex guess.

Yes, possibly. So, from my understanding, the update_max_bb_count() tracks the
max basic block count and takes a decision to inline or not in this case/ run.
That is likely why we see a larger instruction count w this function executed/
enabled.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2022-12-09 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #13 from Martin Liška  ---
Note the mentioned revision is a fix and yes, sometimes these revisions can end
up with a regression as profile estimation is a complex guess.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2022-12-09 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #12 from Rama Malladi  ---
I found difference in dumps at various stages of the compilation for the
mainline GCC and with update_max_bb_count() commented. Here are the details:

Mainline: Commit ID: 63a42ffc0833553fbcb84b50cf0fd2d867b8a92f

There was difference in the dumps for these 2 stages:
"einline" and "earlydebug"

Since we use LTO for this build of 519.lbm_r build, I found these differences
in these stages of the link-time optimizer:
"vect", "slp1", "ivopts", "earlydebug", "debug"

Also, this perf drop of 5%-6% with update_max_bb_count() code was observed only
on ARM64 instances (Graviton3) and not on x86_64 instances (Intel Xeon).

I ran the other SPEC cpu2017_fprate benchmarks on ARM64 with this code
commented on GCC mainline and I haven't observed any perf regression. So, maybe
worth a fix.

Thank you.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2022-12-08 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #11 from Rama Malladi  ---
(In reply to Martin Liška from comment #10)
> @Honza ?

Just checking if this can be fixed/ implemented. Thanks.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2022-12-01 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-12-01
 Ever confirmed|0   |1

--- Comment #10 from Martin Liška  ---
@Honza ?

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2022-11-30 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #9 from Rama Malladi  ---
(In reply to Martin Liška from comment #3)
> Can you please share perf-profile before and after the revision?
> 
> Note I can't see it for Altra aarch64 CPU:
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=633.477.0.
> 1=683.477.0=664.477.0=648.477.0=618.477.0=605.
> 477.0=759.477.0=584.477.0&
> 
> However, there are huge changes in between GCC 6/7 and a newer releases.
> Note the benchmark is pretty small and very sensitive to instruction caches.

Hi, I got IPC data for baseline version of compiler and with this patch
reverted.

This is on Graviton3 processor machine, executing 1-copy rate run of 519.lbm_r.

Baseline: Compiler commit ID: f896c13489d22b30d01257bc8316ab97b3359d1c
Cycles:148,489,372,938
Instructions:  382,748,379,257
IPC:   2.58

Baseline with code change in a9a4edf0e71bbac9f1b5dcecdcf9250111d16889 reverted.

$ git diff gcc/tree-cfg.cc
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index d982988048f..736432713fe 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -9984,7 +9984,7 @@ execute_fixup_cfg (void)
 }
   if (scale)
 {
-  update_max_bb_count ();
+//  update_max_bb_count ();
   compute_function_frequency ();
 }

Cycles:140,937,228,769
Instructions:  368,881,714,982
IPC:   2.62

>From the above, I do see the instructions executed are higher for the baseline
compiler code-gen vs. the one with patch reverted. Can you please look into the
code-gen and let me know if you find some perf opportunity with this patch
revert? Thank you.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #8 from Rama Malladi  ---
(In reply to Mark Wielaard from comment #7)
> The content of attachment 53773 [details] has been deleted for the following
> reason:
> 
> https://sourceware.org/pipermail/overseers/2022q4/019048.html

Thank you.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread mark at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #7 from Mark Wielaard  ---
The content of attachment 53773 has been deleted for the following reason:

https://sourceware.org/pipermail/overseers/2022q4/019048.html

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #6 from Rama Malladi  ---
(In reply to Martin Liška from comment #5)
> Please try writing here: overse...@sourceware.org

I have asked for deletion. Thanks

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #5 from Martin Liška  ---
Please try writing here: overse...@sourceware.org

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #4 from Rama Malladi  ---
Hi Martin,
Thanks for the guidance. Can we delete the attachment from this bug report?

Regards,
Rama

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #3 from Martin Liška  ---
Can you please share perf-profile before and after the revision?

Note I can't see it for Altra aarch64 CPU:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=633.477.0=683.477.0=664.477.0=648.477.0=618.477.0=605.477.0=759.477.0=584.477.0;

However, there are huge changes in between GCC 6/7 and a newer releases. Note
the benchmark is pretty small and very sensitive to instruction caches.

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org
 Blocks||26163

--- Comment #2 from Martin Liška  ---
(In reply to Rama Malladi from comment #0)
> Created attachment 53773 [details]
> Input and source files.
> 

Note you should not share publicly source files of the entire SPEC benchmark.
That very likely violates their license rules!


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-26 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #1 from Rama Malladi  ---
$ /home/ubuntu/gccfixissue1/bin/gcc  -v
Using built-in specs.
COLLECT_GCC=/home/ubuntu/gccfixissue1/bin/gcc
COLLECT_LTO_WRAPPER=/home/ubuntu/gccfixissue1/libexec/gcc/aarch64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/ubuntu/gccfixissue1
--enable-languages=c,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221021 (experimental) (GCC)