[Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

Rama Malladi changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rvmallad at amazon dot com

--- Comment #8 from Rama Malladi ---
Created attachment 57898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57898&action=edit
Updated patch for `-finline-functions-aggressive` GCC option.

This is an updated patch to include a new GCC option: `-finline-functions-aggressive`. It has the `-O3` inlining heuristics replaced with an entry that implies `OPT_finline_functions_aggressive` is enabled. It also has an entry in `invoke.texi` for documentation stating that this option selects the same inlining heuristics as `-O3`.
[Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

--- Comment #7 from Rama Malladi ---
(In reply to Rama Malladi from comment #5)
> (In reply to Andrew Pinski from comment #3)
> > Also do you have numbers with lto enabled? Or is these without lto?
> >
> > Does LTO improve the situation for Envoy too?
>
> These numbers are without lto. I haven't tried it but I can try and post an
> update.

I checked and found the Envoy run was w/o LTO but SPEC cpu2017 intrate was w LTO. I tried a build of Envoy w LTO and it failed. I need to debug that issue further. Below are perf results w/o LTO.

gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).

copies=8           -O2      -Ofast   Gain w    -O2 + inlining   Gain w
                   noLTO    noLTO    Ofast     noLTO            inlining
500.perlbench_r    33.7     33.3      98.8%    33.2              98.5%
502.gcc_r          45.2     46.9     103.8%    46.3             102.4%
505.mcf_r          44.7     44.3      99.1%    44.6              99.8%
520.omnetpp_r      21.4     24.4     114.0%    21.3              99.5%
523.xalancbmk_r    41.6     45.5     109.4%    44               105.8%
525.x264_r         44.2     89       201.4%    43.9              99.3%
531.deepsjeng_r    32.8     32.8     100.0%    33.1             100.9%
541.leela_r        28.6     30.5     106.6%    30.3             105.9%
548.exchange2_r    64.1     64.6     100.8%    64.1             100.0%
557.xz_r           20.3     20.4     100.5%    20.3             100.0%
SPECrate..base     35.6     39.4     110.7%    36               101.1%
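For readers who want to re-derive the "Gain" columns in the comment above: they appear to be each score expressed as a percentage of the `-O2` score for the same benchmark. The snippet below is only a sketch of that arithmetic, using three rows copied from the table; the function name `gain_percent` is made up for illustration.

```python
# Scores copied from the copies=8, no-LTO results above (SPECrate, higher is better).
# Tuple layout: (-O2, -Ofast, -O2 + inline params).
scores = {
    "520.omnetpp_r": (21.4, 24.4, 21.3),
    "523.xalancbmk_r": (41.6, 45.5, 44.0),
    "541.leela_r": (28.6, 30.5, 30.3),
}


def gain_percent(baseline: float, candidate: float) -> float:
    """Return the candidate score as a percentage of the baseline score."""
    return 100.0 * candidate / baseline


for name, (o2, ofast, o2_inline) in scores.items():
    print(
        f"{name:16s} Ofast {gain_percent(o2, ofast):6.1f}%  "
        f"O2+inline {gain_percent(o2, o2_inline):6.1f}%"
    )
```

A value above 100% means the configuration scored higher than plain `-O2`; a value below 100% means it scored lower.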
[Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531 --- Comment #5 from Rama Malladi --- (In reply to Andrew Pinski from comment #3) > Also do you have numbers with lto enabled? Or is these without lto? > > Does LTO improve the situation for Envoy too? These numbers are without lto. I haven't tried it but I can try and post an update.
[Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531 --- Comment #4 from Rama Malladi --- (In reply to Andrew Pinski from comment #1) > Maybe we should figure out why the increase of the limits help and add extra > code to get better heuristics rather than just tweaking the limits. > > I know that there was some improvements for gcc 14 already for the > heuristics for c++ code. interesting... thank you.
[Bug driver/114531] New: Feature proposal for an `-finline-functions-aggressive` compiler option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531

            Bug ID: 114531
           Summary: Feature proposal for an `-finline-functions-aggressive` compiler option
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: driver
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rvmallad at amazon dot com
                CC: rsandifo at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57837
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57837&action=edit
patch to implement -finline-functions-aggressive option in GCC

This is a proposal for a user-visible GCC compiler option for aggressive inlining that is currently only available at -O3 as internal inline parameters (--param=early-inlining-insns=14 --param=inline-heuristics-hint-percent=600 --param=inline-min-speedup=15 --param=max-inline-insns-auto=30 --param=max-inline-insns-single=200).

I got some perf data for Envoy (https://github.com/envoyproxy/envoy) and SPEC CPU2017 intrate benchmarks on C7g.2xlarge w Ubuntu22 + gcc-11.4.0. We see perf gains (2% - 5%) using these aggressive inline parameters (at -O2). Attached is a patch for this change.

We do not want to add these inline limits at ‘-O2’ itself, because one of the SPEC CPU tests got slower with them. Also, more inline tuning at -O2 would make some symbols unavailable for probing/debugging that are available when these aggressive inline params are not used.

---
Envoy load_balancer_benchmark - using only 1 CPU - Iterations, higher is better

$ bazel run -c opt //test/common/upstream:load_balancer_benchmark

bazel-envoy/external/local_config_cc/BUILD can be changed for adding inline parameters/options.
                                                              Iterations
Benchmark                                                     Baseline   O2+ Inline Params   Gain
benchmarkRoundRobinLoadBalancerBuild/500/50/50                1518       1596                1.05x
benchmarkLeastRequestLoadBalancerChooseHost/100/3/1000        1465       1514                1.03x
benchmarkRingHashLoadBalancerChooseHost/100/65536/10          33         34                  1.03x
benchmarkMaglevLoadBalancerWeighted/500/95/75/25/1            69         72                  1.04x

copies=8           "-O2"    "-Ofast"   Gain w    "-O2 +       Gain w
                                       Ofast     inlining"    inlining
500.perlbench_r    36.5     34.3        94.0%    34.4          94.2%
502.gcc_r          45.4     47.6       104.8%    47.5         104.6%
505.mcf_r          44.6     48.2       108.1%    44.3          99.3%
520.omnetpp_r      22.1     24.9       112.7%    21.9          99.1%
523.xalancbmk_r    43.8     46.3       105.7%    45.4         103.7%
525.x264_r         44.3     89         200.9%    43.8          98.9%
531.deepsjeng_r    36       37.3       103.6%    37.5         104.2%
541.leela_r        33.5     33.9       101.2%    34.2         102.1%
548.exchange2_r    65.4     76.6       117.1%    65.3          99.8%
557.xz_r           19.8     19.9       100.5%    19.9         100.5%
SPECrate..base     37.1     41.6       112.1%    37.3         100.5%
---
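Until an option such as the one proposed here exists, the experiment in this report can be approximated by passing the listed `--param` values on top of `-O2`. The snippet below is only a convenience sketch for assembling that flag string; the parameter names and values are the ones quoted in the report, while the helper name `aggressive_inline_flags` is made up for illustration.

```python
import shlex

# The -O3 inliner parameters quoted in the report, to be applied on top of -O2.
INLINE_PARAMS = {
    "early-inlining-insns": 14,
    "inline-heuristics-hint-percent": 600,
    "inline-min-speedup": 15,
    "max-inline-insns-auto": 30,
    "max-inline-insns-single": 200,
}


def aggressive_inline_flags(base_level="-O2"):
    """Return the base optimization level followed by one --param flag per entry."""
    flags = [base_level]
    flags += [f"--param={name}={value}" for name, value in INLINE_PARAMS.items()]
    return flags


# Print a shell-safe string that can be pasted into CFLAGS/CXXFLAGS.
print(shlex.join(aggressive_inline_flags()))
```

Whether these values still match the `-O3` defaults depends on the GCC version in use, so they should be checked against that version's parameter defaults before drawing conclusions.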
[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696 --- Comment #5 from Rama Malladi --- Thank you Richard for this patch/ fix.
[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696 Rama Malladi changed: What|Removed |Added CC||rvmallad at amazon dot com --- Comment #2 from Rama Malladi --- Hi, Can this be actioned/ fixed? We had a related issue and would like this fixed. https://github.com/numpy/numpy/issues/25556 Thank you. Rama
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #23 from Rama Malladi --- (In reply to Rama Malladi from comment #22) > I will close this issue as we were unable to reproduce the perf drop going > from gcc-7 to gcc-8 on a Graviton2 based instance. The performance of > 519.lbm_r built with gcc-7.4 was same as that with gcc-8.5. Can someone from the GCC dev/ regression team close this issue as I am unable to find an option for the same? Thanks
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #22 from Rama Malladi --- I will close this issue as we were unable to reproduce the perf drop going from gcc-7 to gcc-8 on a Graviton2 based instance. The performance of 519.lbm_r built with gcc-7.4 was same as that with gcc-8.5.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #21 from Rama Malladi ---
I did another triage for the perf loss on a Graviton 2 processor (neoverse-n1) based instance and found this commit: `a9a4edf0e71bbac9f1b5dcecdcf9250111d16889` to be the reason. As I had indicated in my earlier reply, I was doing a triage of the perf loss going from gcc-7 to gcc-10.

The perf of the 519.lbm_r 1-copy run improved 1.08x with the revert of commit: `a9a4edf0e71bbac9f1b5dcecdcf9250111d16889` on gcc-mainline (`2f1691be517fcdcabae9cd671ab511eb0e08b1d5`).

I am guessing that we don't see it on LNT/Altra CPUs. So, please look into a fix for this issue. Let me know if you have any queries. Thanks.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #20 from Rama Malladi --- @Martin J and @Sebastian P, Let me walk you through the perf data and my triage. First, my triage has been on Graviton 3 (neoverse-v1) processor based instances. Next, I was looking for perf delta going from gcc-7 to gcc-10. I found 2 issues: One was reported in 107413 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413) and fixed (the perf delta between gcc-7 and gcc-8 -- 215s vs. 266s); Another one is the issue reported in here. I did another triage and landed at the same commit that I reported earlier. # first bad commit: [a9a4edf0e71bbac9f1b5dcecdcf9250111d16889] Update max_bb_count in execute_fixup_cfg Please let me know any further info/ studies you would like to see on this report. Thank you.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #19 from Rama Malladi --- Thanks @Sebastian and @Martin J. I will get another bisect between GCC 7-and-8.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #15 from Rama Malladi --- Hi, Can we review this issue and suggest next steps/ action please? Thanks.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #14 from Rama Malladi --- (In reply to Martin Liška from comment #13) > Note the mentioned revision is a fix and yes, sometimes these revisions can > end up with a regression as profile estimation is a complex guess. Yes, possibly. So, from my understanding, the update_max_bb_count() tracks the max basic block count and takes a decision to inline or not in this case/ run. That is likely why we see a larger instruction count w this function executed/ enabled.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #12 from Rama Malladi --- I found difference in dumps at various stages of the compilation for the mainline GCC and with update_max_bb_count() commented. Here are the details: Mainline: Commit ID: 63a42ffc0833553fbcb84b50cf0fd2d867b8a92f There was difference in the dumps for these 2 stages: "einline" and "earlydebug" Since we use LTO for this build of 519.lbm_r build, I found these differences in these stages of the link-time optimizer: "vect", "slp1", "ivopts", "earlydebug", "debug" Also, this perf drop of 5%-6% with update_max_bb_count() code was observed only on ARM64 instances (Graviton3) and not on x86_64 instances (Intel Xeon). I ran the other SPEC cpu2017_fprate benchmarks on ARM64 with this code commented on GCC mainline and I haven't observed any perf regression. So, maybe worth a fix. Thank you.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #11 from Rama Malladi --- (In reply to Martin Liška from comment #10) > @Honza ? Just checking if this can be fixed/ implemented. Thanks.
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #19 from Rama Malladi --- (In reply to Wilco from comment #17) > (In reply to Rama Malladi from comment #16) > > (In reply to Wilco from comment #15) > > > (In reply to Rama Malladi from comment #14) > > > > This fix also improved performance of 538.imagick_r by 15%. Did you > > > > have a > > > > similar observation? Thank you. > > > > > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible > > > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the > > > overall > > > FP score? > > > > I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are > > the scores I got (relative gains of latest mainline vs. an earlier > > mainline). > > > > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0 > > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c > > Right that's about 3 weeks of changes, I think > 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r. > > > geomean 1.03 > > That's a nice gain in 3 weeks! Hi Wilco, Could you backport the change to active release branches? Thanks.
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #18 from Rama Malladi --- (In reply to Wilco from comment #17) > (In reply to Rama Malladi from comment #16) > > (In reply to Wilco from comment #15) > > > (In reply to Rama Malladi from comment #14) > > > > This fix also improved performance of 538.imagick_r by 15%. Did you > > > > have a > > > > similar observation? Thank you. > > > > > > No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible > > > -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the > > > overall > > > FP score? > > > > I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are > > the scores I got (relative gains of latest mainline vs. an earlier > > mainline). > > > > Latest mainline: 0976b012d89e3d819d83cdaf0dab05925b3eb3a0 > > Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c > > Right that's about 3 weeks of changes, I think > 1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a has improved imagick_r. > > > geomean 1.03 > > That's a nice gain in 3 weeks! Yes, indeed :-) ... Thank you.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #9 from Rama Malladi ---
(In reply to Martin Liška from comment #3)
> Can you please share perf-profile before and after the revision?
>
> Note I can't see it for Altra aarch64 CPU:
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=633.477.0.
> 1=683.477.0=664.477.0=648.477.0=618.477.0=605.
> 477.0=759.477.0=584.477.0&
>
> However, there are huge changes in between GCC 6/7 and a newer releases.
> Note the benchmark is pretty small and very sensitive to instruction caches.

Hi, I got IPC data for baseline version of compiler and with this patch reverted. This is on Graviton3 processor machine, executing 1-copy rate run of 519.lbm_r.

Baseline:
Compiler commit ID: f896c13489d22b30d01257bc8316ab97b3359d1c
Cycles:       148,489,372,938
Instructions: 382,748,379,257
IPC:          2.58

Baseline with code change in a9a4edf0e71bbac9f1b5dcecdcf9250111d16889 reverted.

$ git diff gcc/tree-cfg.cc
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index d982988048f..736432713fe 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -9984,7 +9984,7 @@ execute_fixup_cfg (void)
     }
   if (scale)
     {
-      update_max_bb_count ();
+//      update_max_bb_count ();
       compute_function_frequency ();
     }

Cycles:       140,937,228,769
Instructions: 368,881,714,982
IPC:          2.62

From the above, I do see the instructions executed are higher for the baseline compiler code-gen vs. the one with patch reverted. Can you please look into the code-gen and let me know if you find some perf opportunity with this patch revert? Thank you.
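The IPC figures in the comment above follow directly from the reported counters (IPC = instructions / cycles). The sketch below just redoes that division and also reports how much each counter changes with the revert; the counter values are copied from the comment, and the helper names are illustrative.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle."""
    return instructions / cycles


def percent_reduction(before: int, after: int) -> float:
    """How much smaller `after` is than `before`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


# Counters from the 1-copy 519.lbm_r run on Graviton3.
baseline = {"cycles": 148_489_372_938, "instructions": 382_748_379_257}
reverted = {"cycles": 140_937_228_769, "instructions": 368_881_714_982}

print(f"baseline IPC: {ipc(baseline['instructions'], baseline['cycles']):.2f}")
print(f"reverted IPC: {ipc(reverted['instructions'], reverted['cycles']):.2f}")
print(
    "instructions reduced by "
    f"{percent_reduction(baseline['instructions'], reverted['instructions']):.1f}%"
)
print(
    "cycles reduced by "
    f"{percent_reduction(baseline['cycles'], reverted['cycles']):.1f}%"
)
```

Both counters drop with the revert, and cycles drop by a larger fraction than instructions, which is why the IPC goes up slightly.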
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #16 from Rama Malladi ---
(In reply to Wilco from comment #15)
> (In reply to Rama Malladi from comment #14)
> > This fix also improved performance of 538.imagick_r by 15%. Did you have a
> > similar observation? Thank you.
>
> No, but I was using -mcpu=neoverse-n1 as my baseline. It's possible
> -mcpu=neoverse-v1 shows larger speedups, what gain do you get on the overall
> FP score?

I was using -mcpu=native and run on a Neoverse V1 arch (Graviton3). Here are the scores I got (relative gains of latest mainline vs. an earlier mainline).

Latest mainline:  0976b012d89e3d819d83cdaf0dab05925b3eb3a0
Earlier mainline: f896c13489d22b30d01257bc8316ab97b3359d1c

fp 1-copy rate     Ratio
503.bwaves_r       0.98
507.cactuBSSN_r    1.00
508.namd_r         0.97
510.parest_r       NA
511.povray_r       NA
519.lbm_r          1.16
521.wrf_r          1.00
526.blender_r      0.99
527.cam4_r         NA
538.imagick_r      1.17
544.nab_r          1.01
549.fotonik3d_r    NA
554.roms_r         1.00
geomean            1.03
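As a cross-check of the geomean reported above, the sketch below computes a geometric mean over the per-benchmark ratios while skipping the NA entries. Skipping NA rows is an assumption about how the figure was produced (it happens to reproduce the reported value); the ratios are copied from the comment.

```python
import math

# Ratios of latest mainline vs. earlier mainline; None marks an NA entry.
ratios = {
    "503.bwaves_r": 0.98,
    "507.cactuBSSN_r": 1.00,
    "508.namd_r": 0.97,
    "510.parest_r": None,
    "511.povray_r": None,
    "519.lbm_r": 1.16,
    "521.wrf_r": 1.00,
    "526.blender_r": 0.99,
    "527.cam4_r": None,
    "538.imagick_r": 1.17,
    "544.nab_r": 1.01,
    "549.fotonik3d_r": None,
    "554.roms_r": 1.00,
}


def geomean(values):
    """Geometric mean of the non-None values, computed via logarithms."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no values to average")
    return math.exp(sum(math.log(v) for v in present) / len(present))


print(f"geomean over {sum(v is not None for v in ratios.values())} benchmarks: "
      f"{geomean(ratios.values()):.2f}")
```

Because the mean is geometric, the two large gains (519.lbm_r and 538.imagick_r) are diluted across all nine measured benchmarks rather than dominating the result.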
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #14 from Rama Malladi --- This fix also improved performance of 538.imagick_r by 15%. Did you have a similar observation? Thank you.
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #13 from Rama Malladi ---
(In reply to CVS Commits from comment #12)
> The master branch has been updated by Wilco Dijkstra :
>
> https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff90032
>
> commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff90032
> Author: Wilco Dijkstra
> Date: Wed Nov 23 17:27:19 2022 +
>
> AArch64: Add fma_reassoc_width [PR107413]
>
> Add a reassocation width for FMA in per-CPU tuning structures. Keep
> the existing setting of 1 for cores with 2 FMA pipes (this disables
> reassociation), and use 4 for cores with 4 FMA pipes. This improves
> SPECFP2017 on Neoverse V1 by ~1.5%.
>
> gcc/
> PR tree-optimization/107413
> * config/aarch64/aarch64.cc (struct tune_params): Add
> fma_reassoc_width to all CPU tuning structures.
> (aarch64_reassociation_width): Use fma_reassoc_width.
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> fma_reassoc_width.

Thank you for this code change/ fix. I will attempt a run with this change.
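To make the idea of a reassociation width more concrete, here is a toy model, not GCC's implementation: a dot product accumulated into `width` independent partial sums. With width 1 every multiply-add depends on the previous one; with width 4 there are four shorter, independent chains that a core with more FMA pipes can overlap. The width-per-pipe mapping mirrors the commit message quoted above; all names in the snippet are hypothetical.

```python
import math

# From the quoted commit message: cores with 2 FMA pipes keep width 1
# (reassociation disabled), cores with 4 FMA pipes use width 4.
FMA_REASSOC_WIDTH_BY_PIPES = {2: 1, 4: 4}


def dot_with_width(xs, ys, width):
    """Dot product using `width` independent accumulators (round-robin)."""
    if width < 1:
        raise ValueError("width must be at least 1")
    accumulators = [0.0] * width
    for i, (x, y) in enumerate(zip(xs, ys)):
        # One multiply-add per element; element i only depends on the
        # previous value of accumulator i % width.
        accumulators[i % width] += x * y
    return sum(accumulators)


def longest_chain(n_terms, width):
    """Length of the longest chain of dependent multiply-adds."""
    return math.ceil(n_terms / width)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0] * 8
for pipes, width in FMA_REASSOC_WIDTH_BY_PIPES.items():
    print(
        f"{pipes} FMA pipes -> width {width}: "
        f"result {dot_with_width(xs, ys, width)}, "
        f"longest dependent chain {longest_chain(len(xs), width)}"
    )
```

In real floating-point code, changing the accumulation order can change rounding, which is one reason this kind of reassociation is tied to options such as `-Ofast`; the small integer-valued inputs here sidestep that.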
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #11 from Rama Malladi --- (In reply to Wilco from comment #10) > I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll > post a patch that allows per-CPU settings for FMA reassociation, so you'll > get good performance with -mcpu=native. However reassociation really needs > to be taught about the existence of FMAs. Thank you very much Wilco.
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #9 from Rama Malladi --- (In reply to Rama Malladi from comment #8) > (In reply to Wilco from comment #7) > > The revert results in about 0.5% loss on Neoverse N1, so it looks like the > > reassociation pass is still splitting FMAs into separate MUL and ADD (which > > is bad for narrow cores). > > Thank you for checking on N1. Did you happen to check on V1 too to reproduce > the perf results I had? Any other experiments/ tests I can do to help on > this filing? Thanks again for the debug/ fix. I ran SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and using option 'neoverse-n1' on the Graviton 3 processor (which has support for SVE). The performance was up by 0.4%, primary contributor being 519.lbm_r which was up 13%.
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #8 from Rama Malladi --- (In reply to Wilco from comment #7) > The revert results in about 0.5% loss on Neoverse N1, so it looks like the > reassociation pass is still splitting FMAs into separate MUL and ADD (which > is bad for narrow cores). Thank you for checking on N1. Did you happen to check on V1 too to reproduce the perf results I had? Any other experiments/ tests I can do to help on this filing? Thanks again for the debug/ fix.
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #6 from Rama Malladi --- The compilation options were: -Ofast -mcpu=native -flto
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #5 from Rama Malladi ---
(In reply to Wilco from comment #2)
> That's interesting - if the reassociation pass has become a bit smarter in
> the last 5 years, we might no longer need this workaround. What is the
> effect on the overall SPECFP score? Did you try other values like
> fp_reassoc_width = 2 or 3?

Here is SPEC cpu2017 fprate perf data for 1-copy rate run. The runs were run on a c7g.16xlarge AWS cloud instance.

Benchmark          w fix
--
503.bwaves_r       0.98
507.cactuBSSN_r    NA
508.namd_r         0.97
510.parest_r       NA
511.povray_r       1.01
519.lbm_r          1.16
521.wrf_r          1.00
526.blender_r      NA
527.cam4_r         1.00
538.imagick_r      0.99
544.nab_r          1.00
549.fotonik3d_r    NA
554.roms_r         1.00
geomean            1.01

The baseline was gcc version 12.2.0 (GCC) compiler. Fix was revert of code change in commit: b5b33e113434be909e8a6d7b93824196fb6925c0.

So, looks like we aren't impacted much with this commit revert. I haven't yet tried fp_reassoc_width. Will try shortly.
[Bug c++/107433] 510.parest_r, call of overloaded 'back_interpolate' is ambiguous
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107433 --- Comment #2 from Rama Malladi --- (In reply to Martin Liška from comment #1) > As mentioned slightly here: > https://www.spec.org/cpu2017/Docs/benchmarks/510.parest_r.html > please use -std=c++98 or something < c++17. Thank you. I had it for C compiler. Will add it to C++ compiler command-line too.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #8 from Rama Malladi --- (In reply to Mark Wielaard from comment #7) > The content of attachment 53773 [details] has been deleted for the following > reason: > > https://sourceware.org/pipermail/overseers/2022q4/019048.html Thank you.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #6 from Rama Malladi --- (In reply to Martin Liška from comment #5) > Please try writing here: overse...@sourceware.org I have asked for deletion. Thanks
[Bug c/107433] New: 510.parest_r, call of overloaded 'back_interpolate' is ambiguous
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107433

            Bug ID: 107433
           Summary: 510.parest_r, call of overloaded 'back_interpolate' is ambiguous
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rvmallad at amazon dot com
  Target Milestone: ---

$ g++ -mabi=lp64 -c -o source/fe/fe_tools.o -DSPEC -DNDEBUG -Iinclude -I. -DSPEC_AUTO_SUPPRESS_OPENMP -g -O3 -mcpu=native -fpermissive -DSPEC_LP64 source/fe/fe_tools.cc
source/fe/fe_tools.cc:1301:21: error: call of overloaded 'back_interpolate(const dealii::DoFHandler<3, 3>&, const dealii::BlockVector&, const dealii::FiniteElement<3, 3>&, dealii::BlockVector&)' is ambiguous
 1301 | back_interpolate(dof1, u1, dof2.get_fe(), u1_interpolated);
      | ^~

$ /home/ubuntu/gccmainline/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/ubuntu/gccmainline/bin/g++
COLLECT_LTO_WRAPPER=/home/ubuntu/gccmainline/libexec/gcc/aarch64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/ubuntu/gccmainline --enable-languages=c,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221026 (experimental) (GCC)
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409 --- Comment #4 from Rama Malladi --- Hi Martin, Thanks for the guidance. Can we delete the attachment from this bug report? Regards, Rama
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413 --- Comment #3 from Rama Malladi --- I will get the effect of this revert for the overall SPEC FP score. I haven't tried experimenting with fp_reassoc_width values. Will try it and update.
[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #1 from Rama Malladi ---
$ /home/ubuntu/gccfixissue2/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/home/ubuntu/gccfixissue2/bin/gcc
COLLECT_LTO_WRAPPER=/home/ubuntu/gccfixissue2/libexec/gcc/aarch64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/ubuntu/gccfixissue2 --enable-languages=c,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221021 (experimental) (GCC)
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

--- Comment #1 from Rama Malladi ---
$ /home/ubuntu/gccfixissue1/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/home/ubuntu/gccfixissue1/bin/gcc
COLLECT_LTO_WRAPPER=/home/ubuntu/gccfixissue1/libexec/gcc/aarch64-unknown-linux-gnu/13.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/ubuntu/gccfixissue1 --enable-languages=c,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221021 (experimental) (GCC)
[Bug tree-optimization/107413] New: Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

            Bug ID: 107413
           Summary: Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rvmallad at amazon dot com
  Target Milestone: ---

Created attachment 53775
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53775&action=edit
Input and source files.

Below is some perf data executing the 519.lbm_r benchmark on aarch64 architecture (Graviton 3 processor). I have comparison of the baseline perf (mainline commit ID: f56d48b2471c388401174029324e1f4c4b84fcdb) vs. a fix for the same (revert the code change in commit ID: b5b33e113434be909e8a6d7b93824196fb6925c0).

Steps to compile:
$ gcc -std=c99 -mabi=lp64 -g -Ofast -mcpu=native lbm.i main.i -lm -flto -o 519_lbm_r_base

$ time ./519_lbm_r_base 3000 reference.dat 0 0 100_100_130_ldc.of
real 2m50.946s

Reverting the code changes in commit ID: b5b33e113434be909e8a6d7b93824196fb6925c0
$ time ./519_lbm_r_fix 3000 reference.dat 0 0 100_100_130_ldc.of
real 2m27.157s

The code change reverted was:
[AArch64] PR84114: Avoid reassociating FMA
Author: Wilco Dijkstra
Date: Mon Mar 5 14:40:55 2018 +

Please find attached the files to reproduce this issue and the fix.
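The "~14%" in the summary can be checked against the two `real` timings in the report. The sketch below converts the `time` output to seconds and expresses the time saved by the revert as a fraction of the baseline run time; this is one plausible reading of how the figure was derived, not something the report states, and the helper name is illustrative.

```python
import re


def parse_real_time(text: str) -> float:
    """Convert a shell `time` value such as '2m50.946s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


baseline_s = parse_real_time("2m50.946s")  # mainline build
fixed_s = parse_real_time("2m27.157s")     # build with the commit reverted

# Extra time spent by the baseline, relative to the baseline run time.
loss_percent = 100.0 * (baseline_s - fixed_s) / baseline_s

print(f"baseline: {baseline_s:.3f}s, with revert: {fixed_s:.3f}s")
print(f"perf loss of baseline: {loss_percent:.1f}%")
```

Measuring the same gap relative to the faster run instead (baseline time divided by reverted time) gives a larger number, so it is worth stating which denominator is used when comparing against other reports.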
[Bug tree-optimization/107409] New: Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

            Bug ID: 107409
           Summary: Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rvmallad at amazon dot com
  Target Milestone: ---

Created attachment 53773
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53773&action=edit
Input and source files.

Below is some perf data executing the 519.lbm_r benchmark on aarch64 architecture (Graviton 3 processor). I have comparison of the baseline perf (mainline commit ID: f56d48b2471c388401174029324e1f4c4b84fcdb) vs. a fix for the same (revert the code change in commit ID: a9a4edf0e71bbac9f1b5dcecdcf9250111d16889).

Steps to compile:
$ gcc -std=c99 -mabi=lp64 -g -Ofast -mcpu=native lbm.i main.i -lm -flto -o 519_lbm_r_base

$ time ./519_lbm_r_base 3000 reference.dat 0 0 100_100_130_ldc.of
real 2m50.946s

Reverting the code changes in commit ID: a9a4edf0e71bbac9f1b5dcecdcf9250111d16889
$ time ./519_lbm_r_fix 3000 reference.dat 0 0 100_100_130_ldc.of
real 2m42.091s

The code change reverted was in the following file:
* tree-cfg.c (execute_fixup_cfg): Update also max_bb_count when scaling happen.
Author: Jan Hubicka
Date: Sat Nov 30 22:25:24 2019 +0100

Please find attached the files to reproduce this issue and the fix.