[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-12-01 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #19 from Rama Malladi  ---
(In reply to Wilco from comment #17)
> (In reply to Rama Malladi from comment #16)
> > (In reply to Wilco from comment #15)
> > > (In reply to Rama Malladi from comment #14)
> > > > This fix also improved performance of 538.imagick_r by 15%. Did you 
> > > > have a
> > > > similar observation? Thank you.
> > > 
> > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the 
> > > overall
> > > FP score?
> > 
> > I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are
> > the scores I got (relative gains of latest mainline vs. an earlier 
> > mainline).
> > 
> > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
> 
> Right that's about 3 weeks of changes, I think
> 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.
> 
> > geomean 1.03
> 
> That's a nice gain in 3 weeks!

Hi Wilco, could you backport the change to the active release branches? Thanks.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-12-01 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #18 from Rama Malladi  ---
(In reply to Wilco from comment #17)
> (In reply to Rama Malladi from comment #16)
> > (In reply to Wilco from comment #15)
> > > (In reply to Rama Malladi from comment #14)
> > > > This fix also improved performance of 538.imagick_r by 15%. Did you 
> > > > have a
> > > > similar observation? Thank you.
> > > 
> > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the 
> > > overall
> > > FP score?
> > 
> > I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are
> > the scores I got (relative gains of latest mainline vs. an earlier 
> > mainline).
> > 
> > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c
> 
> Right that's about 3 weeks of changes, I think
> 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.
> 
> > geomean 1.03
> 
> That's a nice gain in 3 weeks!

Yes, indeed :-) ... Thank you.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-12-01 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

Wilco  changed:

 What            |Removed      |Added

 Resolution      |---          |FIXED
 Status          |ASSIGNED     |RESOLVED

--- Comment #17 from Wilco  ---
(In reply to Rama Malladi from comment #16)
> (In reply to Wilco from comment #15)
> > (In reply to Rama Malladi from comment #14)
> > > This fix also improved performance of 538.imagick_r by 15%. Did you have a
> > > similar observation? Thank you.
> > 
> > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall
> > FP score?
> 
> I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are
> the scores I got (relative gains of latest mainline vs. an earlier mainline).
> 
> Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
> Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c

Right, that's about 3 weeks of changes. I think
1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r.

> geomean   1.03

That's a nice gain in 3 weeks!

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-29 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #16 from Rama Malladi  ---
(In reply to Wilco from comment #15)
> (In reply to Rama Malladi from comment #14)
> > This fix also improved performance of 538.imagick_r by 15%. Did you have a
> > similar observation? Thank you.
> 
> No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall
> FP score?

I was using -mcpu=native and ran on a Neoverse V1 CPU (Graviton3). Here are
the scores I got (relative gains of the latest mainline vs. an earlier mainline).

Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0
Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c

Benchmark (fp 1-copy rate)   Ratio
503.bwaves_r                 0.98
507.cactuBSSN_r              1.00
508.namd_r                   0.97
510.parest_r                 NA
511.povray_r                 NA
519.lbm_r                    1.16
521.wrf_r                    1.00
526.blender_r                0.99
527.cam4_r                   NA
538.imagick_r                1.17
544.nab_r                    1.01
549.fotonik3d_r              NA
554.roms_r                   1.00
geomean                      1.03

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-29 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #15 from Wilco  ---
(In reply to Rama Malladi from comment #14)
> This fix also improved performance of 538.imagick_r by 15%. Did you have a
> similar observation? Thank you.

No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
-mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall FP
score?

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-29 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #14 from Rama Malladi  ---
This fix also improved performance of 538.imagick_r by 15%. Did you have a
similar observation? Thank you.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-28 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #13 from Rama Malladi  ---
(In reply to CVS Commits from comment #12)
> The master branch has been updated by Wilco Dijkstra :
> 
> https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff90032
> 
> commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff90032
> Author: Wilco Dijkstra 
> Date:   Wed Nov 23 17:27:19 2022 +
> 
> AArch64: Add fma_reassoc_width [PR107413]
> 
> Add a reassociation width for FMA in per-CPU tuning structures. Keep
> the existing setting of 1 for cores with 2 FMA pipes (this disables
> reassociation), and use 4 for cores with 4 FMA pipes.  This improves
> SPECFP2017 on Neoverse V1 by ~1.5%.
> 
> gcc/
> PR tree-optimization/107413
> * config/aarch64/aarch64.cc (struct tune_params): Add
> fma_reassoc_width to all CPU tuning structures.
> (aarch64_reassociation_width): Use fma_reassoc_width.
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> fma_reassoc_width.

Thank you for this code change/fix. I will attempt a run with this change.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Wilco Dijkstra :

https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff90032

commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff90032
Author: Wilco Dijkstra 
Date:   Wed Nov 23 17:27:19 2022 +

AArch64: Add fma_reassoc_width [PR107413]

Add a reassociation width for FMA in per-CPU tuning structures. Keep
the existing setting of 1 for cores with 2 FMA pipes (this disables
reassociation), and use 4 for cores with 4 FMA pipes.  This improves
SPECFP2017 on Neoverse V1 by ~1.5%.

gcc/
PR tree-optimization/107413
* config/aarch64/aarch64.cc (struct tune_params): Add
fma_reassoc_width to all CPU tuning structures.
(aarch64_reassociation_width): Use fma_reassoc_width.
* config/aarch64/aarch64-protos.h (struct tune_params): Add
fma_reassoc_width.
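
As a rough illustration of the idea (a standalone model, not the actual GCC
source; the struct layout, the tuning values shown and the helper name
reassociation_width are assumptions for illustration only), the new knob
amounts to carrying a separate width for FMA chains in the per-CPU tuning data
and consulting it instead of the generic FP width:

    // Standalone model (not GCC code) of a per-CPU FMA reassociation width.
    // Values are illustrative: width 1 keeps FMA chains serial (no split),
    // width 4 lets the reassociation pass build 4 parallel chains.
    #include <cstdio>

    struct tune_params {
      int fp_reassoc_width;   // width for ordinary FP reassociation
      int fma_reassoc_width;  // width used for chains that would become FMAs
    };

    // A core with 2 FMA pipes keeps 1; a core with 4 FMA pipes can use 4.
    static const tune_params two_pipe_core  = { 4, 1 };
    static const tune_params four_pipe_core = { 4, 4 };

    static int reassociation_width(const tune_params &t, bool chain_is_fma) {
      return chain_is_fma ? t.fma_reassoc_width : t.fp_reassoc_width;
    }

    int main() {
      std::printf("2-pipe core, FMA chain width: %d\n",
                  reassociation_width(two_pipe_core, true));
      std::printf("4-pipe core, FMA chain width: %d\n",
                  reassociation_width(four_pipe_core, true));
      return 0;
    }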

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-06 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #11 from Rama Malladi  ---
(In reply to Wilco from comment #10)
> I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
> post a patch that allows per-CPU settings for FMA reassociation, so you'll
> get good performance with -mcpu=native. However reassociation really needs
> to be taught about the existence of FMAs.

Thank you very much Wilco.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-04 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

Wilco  changed:

 What              |Removed                        |Added

 Ever confirmed    |0                              |1
 Last reconfirmed  |                               |2022-11-04
 Status            |UNCONFIRMED                    |ASSIGNED
 Assignee          |unassigned at gcc dot gnu.org  |wilco at gcc dot gnu.org

--- Comment #10 from Wilco  ---
(In reply to Rama Malladi from comment #9)
> (In reply to Rama Malladi from comment #8)
> > (In reply to Wilco from comment #7)
> > > The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> > > reassociation pass is still splitting FMAs into separate MUL and ADD 
> > > (which
> > > is bad for narrow cores).
> > 
> > Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> > the perf results I had? Any other experiments/ tests I can do to help on
> > this filing? Thanks again for the debug/ fix.
> 
> I ran SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
> using option 'neoverse-n1' on the Graviton 3 processor (which has support
> for SVE). The performance was up by 0.4%, primary contributor being
> 519.lbm_r which was up 13%.

I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
post a patch that allows per-CPU settings for FMA reassociation, so you'll get
good performance with -mcpu=native. However reassociation really needs to be
taught about the existence of FMAs.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-02 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #9 from Rama Malladi  ---
(In reply to Rama Malladi from comment #8)
> (In reply to Wilco from comment #7)
> > The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> > reassociation pass is still splitting FMAs into separate MUL and ADD (which
> > is bad for narrow cores).
> 
> Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> the perf results I had? Any other experiments/ tests I can do to help on
> this filing? Thanks again for the debug/ fix.

I ran the SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
with -mcpu=neoverse-n1 on the Graviton3 processor (which supports SVE).
Performance was up by 0.4%, the primary contributor being 519.lbm_r, which was
up 13%.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-01 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #8 from Rama Malladi  ---
(In reply to Wilco from comment #7)
> The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> reassociation pass is still splitting FMAs into separate MUL and ADD (which
> is bad for narrow cores).

Thank you for checking on N1. Did you happen to check on V1 too, to reproduce
the perf results I had? Are there any other experiments/tests I can do to help
with this filing? Thanks again for the debugging/fix.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-01 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #7 from Wilco  ---
(In reply to Rama Malladi from comment #5)

> So, looks like we aren't impacted much with this commit revert.
> 
> I haven't yet tried fp_reassoc_width. Will try shortly.

The revert results in about 0.5% loss on Neoverse N1, so it looks like the
reassociation pass is still splitting FMAs into separate MUL and ADD (which is
bad for narrow cores).
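
For illustration (this loop is not taken from 519.lbm_r; it is just a minimal
example of the kind of multiply-accumulate reduction at stake): under -Ofast, a
reassociation width of 1 keeps the sum as one serial chain that maps cleanly
onto fused multiply-add instructions, while a larger width lets the pass split
it into parallel partial sums, which on some cores comes out as separate
fmul + fadd:

    // Minimal multiply-accumulate reduction; with -Ofast the reassociation
    // width decides whether this stays a single FMA chain or is split into
    // several parallel accumulators.
    #include <cstddef>
    #include <cstdio>

    double dot(const double *a, const double *b, std::size_t n) {
      double acc = 0.0;
      for (std::size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];   // candidate for a fused multiply-add
      return acc;
    }

    int main() {
      double a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
      std::printf("%f\n", dot(a, b, 4));
      return 0;
    }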

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-28 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #6 from Rama Malladi  ---
The compilation options were: -Ofast -mcpu=native -flto

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-28 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #5 from Rama Malladi  ---
(In reply to Wilco from comment #2)
> That's interesting - if the reassociation pass has become a bit smarter in
> the last 5 years, we might no longer need this workaround. What is the
> effect on the overall SPECFP score? Did you try other values like
> fp_reassoc_width = 2 or 3?

Here is SPEC cpu2017 fprate perf data for a 1-copy rate run. The runs were done
on a c7g.16xlarge AWS cloud instance.

Benchmark        Ratio (with fix)
---------------------------------
503.bwaves_r     0.98
507.cactuBSSN_r  NA
508.namd_r       0.97
510.parest_r     NA
511.povray_r     1.01
519.lbm_r        1.16
521.wrf_r        1.00
526.blender_r    NA
527.cam4_r       1.00
538.imagick_r    0.99
544.nab_r        1.00
549.fotonik3d_r  NA
554.roms_r       1.00
geomean          1.01

The baseline was the gcc 12.2.0 (GCC) compiler. The fix was a revert of the
code change in commit b5b33e113434be909e8a6d7b93824196fb6925c0.

So it looks like we aren't impacted much by this commit revert.

I haven't yet tried fp_reassoc_width. Will try shortly.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-27 Thread mark at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #4 from Mark Wielaard  ---
The content of attachment 53775 has been deleted for the following reason:

https://sourceware.org/pipermail/overseers/2022q4/019048.html

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-26 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #3 from Rama Malladi  ---
I will measure the effect of this revert on the overall SPEC FP score. I haven't
tried experimenting with fp_reassoc_width values yet; will try it and update.

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-26 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

Wilco  changed:

 What    |Removed  |Added

 CC      |         |wilco at gcc dot gnu.org

--- Comment #2 from Wilco  ---
That's interesting - if the reassociation pass has become a bit smarter in the
last 5 years, we might no longer need this workaround. What is the effect on
the overall SPECFP score? Did you try other values like fp_reassoc_width = 2 or
3?
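
To make the width values concrete (a hand-written sketch; this is roughly the
transformation the reassociation pass applies itself, not code anyone needs to
write): fp_reassoc_width = 2 corresponds to keeping two independent partial
sums that can issue in parallel and are combined at the end, e.g.:

    // Hand-written equivalent of a reassociation width of 2 for an FP
    // reduction under -Ofast: two independent chains, combined at the end.
    #include <cstddef>
    #include <cstdio>

    double dot_width2(const double *a, const double *b, std::size_t n) {
      double acc0 = 0.0, acc1 = 0.0;
      std::size_t i = 0;
      for (; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];          // chain 0
        acc1 += a[i + 1] * b[i + 1];  // chain 1
      }
      for (; i < n; ++i)              // leftover element, if any
        acc0 += a[i] * b[i];
      return acc0 + acc1;
    }

    int main() {
      double a[5] = {1, 2, 3, 4, 5}, b[5] = {6, 7, 8, 9, 10};
      std::printf("%f\n", dot_width2(a, b, 5));
      return 0;
    }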

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark

2022-10-26 Thread rvmallad at amazon dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #1 from Rama Malladi  ---
$ /home/ubuntu/gccfixissue2/bin/gcc  -v
Using built-in specs.
COLLECT_GCC=/home/ubuntu/gccfixissue2/bin/gcc
COLLECT_LTO_WRAPPER=/home/ubuntu/gccfixissue2/libexec/gcc/aarch64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/ubuntu/gccfixissue2
--enable-languages=c,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221021 (experimental) (GCC)