[Bug rtl-optimization/117801] [15 regression] aarch64: 20% regression in TSVC s278 since r15-3509-gd34cda72098867
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

--- Comment #14 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:dc0dea98c96e02c6b24060170bc88da8d4931bc2

commit r15-5943-gdc0dea98c96e02c6b24060170bc88da8d4931bc2
Author: Richard Biener
Date:   Wed Nov 27 13:36:19 2024 +0100

    middle-end/117801 - failed register coalescing due to GIMPLE schedule

    For a TSVC testcase we see failed register coalescing due to a
    different schedule of GIMPLE .FMA and stores fed by it. This can be
    mitigated by making direct internal functions participate in TER -
    given we're using more and more of such functions to expose target
    capabilities it seems to be a natural thing to not exempt those.

    Unfortunately the internal function expanding API doesn't match what
    we usually have - passing in a target and returning an RTX but
    instead the LHS of the call is expanded and written to. This makes
    the TER expansion of a call SSA def a bit unwieldy.

    Bootstrapped and tested on x86_64-unknown-linux-gnu.

    The ccmp changes have likely not seen any coverage, the debug stmt
    changes might not be optimal, we might end up losing on replaceable
    calls.

            PR middle-end/117801
            * tree-outof-ssa.cc (ssa_is_replaceable_p): Make
            direct internal function calls replaceable.
            * expr.cc (get_def_for_expr): Handle replacements with calls.
            (get_def_for_expr_class): Likewise.
            (optimize_bitfield_assignment_op): Likewise.
            (expand_expr_real_1): Likewise. Properly expand direct
            internal function defs.
            * cfgexpand.cc (expand_call_stmt): Handle replacements with calls.
            (avoid_deep_ter_for_debug): Likewise, always create a debug temp
            for calls.
            (expand_debug_expr): Likewise, give up for calls.
            (expand_gimple_basic_block): Likewise.
            * ccmp.cc (ccmp_candidate_p): Likewise.
            (get_compare_parts): Likewise.
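TER (temporary expression replacement) lets expansion emit a single-use definition at its use instead of at its own position in the block. The C++ sketch below models only that effect, under heavily simplified assumptions: ToyStmt, toy_is_replaceable, toy_emit and toy_expand are hypothetical names, not GCC functions; statements are plain strings; and TER's other conditions (for example the "may throw" and volatile-operand checks visible in the diff quoted in comment #7) are omitted. It compares the emission order with direct internal function calls such as .FMA excluded (as before the patch) and included (as after it).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a GIMPLE statement.
// This is NOT GCC's gimple class; it only records what the toy needs.
struct ToyStmt {
  std::string def;                  // SSA name defined; empty for a store
  std::string text;                 // printable form of the statement
  std::vector<std::string> uses;    // SSA names (or other operands) read
  bool is_direct_ifn_call = false;  // e.g. a direct-optab .FMA call
};

// Toy replaceability rule: the def must have exactly one use, and a direct
// internal fn call only qualifies when allow_direct_ifn is set (mimicking
// the patch). All other TER conditions are deliberately left out.
static bool toy_is_replaceable(const ToyStmt &s,
                               const std::map<std::string, int> &use_count,
                               bool allow_direct_ifn) {
  if (s.def.empty())
    return false;
  auto it = use_count.find(s.def);
  if (it == use_count.end() || it->second != 1)
    return false;
  return !s.is_direct_ifn_call || allow_direct_ifn;
}

// Emit S, first emitting any replaced (deferred) operand definitions so
// that each lands immediately before its single user.
static void toy_emit(const ToyStmt &s,
                     const std::map<std::string, const ToyStmt *> &deferred,
                     std::vector<std::string> &out) {
  for (const std::string &u : s.uses) {
    auto it = deferred.find(u);
    if (it != deferred.end())
      toy_emit(*it->second, deferred, out);
  }
  out.push_back(s.text);
}

// Return the order in which the statements of one block are "expanded".
std::vector<std::string> toy_expand(const std::vector<ToyStmt> &bb,
                                    bool allow_direct_ifn) {
  std::map<std::string, int> use_count;
  for (const ToyStmt &s : bb)
    for (const std::string &u : s.uses)
      ++use_count[u];

  // Replaceable defs are skipped at their definition point.
  std::map<std::string, const ToyStmt *> deferred;
  for (const ToyStmt &s : bb)
    if (toy_is_replaceable(s, use_count, allow_direct_ifn))
      deferred[s.def] = &s;

  std::vector<std::string> out;
  for (const ToyStmt &s : bb)
    if (s.def.empty() || deferred.count(s.def) == 0)
      toy_emit(s, deferred, out);

  // Every statement is emitted exactly once, either in place or at its use.
  assert(out.size() == bb.size());
  return out;
}
```

For the block _1 = .FMA (a, b, c); _2 = .FMA (d, e, f); MEM[p] = _1; MEM[q] = _2, toy_expand (bb, false) keeps the GIMPLE order, so _1 stays live across the second .FMA, while toy_expand (bb, true) yields .FMA, store, .FMA, store. This only illustrates the kind of schedule difference the commit message describes; it is not the s278 loop and models nothing about the actual register allocation.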
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #15 from Richard Biener ---
Fixed then. Will be reverted in case of bigger problems though.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #13 from Richard Biener ---
I have posted the patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

--- Comment #12 from Dhruv Chawla ---
(In reply to Richard Biener from comment #11)
> Created attachment 59722 [details]
> patch
>
> This patch passed bootstrap & regtest on x86_64-unknown-linux-gnu. I did
> not check whether it solves the aarch64 issue with the TSVC test.

Hi, the patch does fix the issue on AArch64. Thanks!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #59721|0                           |1
        is obsolete|                            |

--- Comment #11 from Richard Biener ---
Created attachment 59722
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59722&action=edit
patch

This patch passed bootstrap & regtest on x86_64-unknown-linux-gnu. I did not
check whether it solves the aarch64 issue with the TSVC test.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

--- Comment #9 from Richard Biener ---
Hmm, yeah - SSA_NAME expansion will then not expand this stmt, so
get_gimple_for_ssa_name has to return a gcall if it was replaceable and thus
we have to adjust all users.
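Comments #8 and #9 turn on what happens once TER has skipped a statement at its definition point. The sketch below is a hypothetical toy: ToyDef, ToyReplacements, toy_get_gimple_for_ssa_name, toy_expand_ssa_name and toy_get_assign_def are not GCC APIs, and real expansion produces RTL, not strings. With hide_calls set it imitates the prototype's approach of returning null for replaced calls, so the consumer falls back to a register that nothing ever set; this is one reading of comment #9, not a claim about the exact miscompilation. Returning the call instead means every user that expects an assignment has to filter, as toy_get_assign_def does in the spirit of get_def_for_expr.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model of the issue in comments #8 and #9; none of
// these names are GCC APIs.
enum class ToyKind { Assign, Call };

struct ToyDef {
  ToyKind kind;
  std::string rhs;  // printable right-hand side, e.g. ".FMA (a, b, c)"
};

// Stand-in for TER's table: SSA names whose defining statement was NOT
// expanded at its definition point and must be expanded at its use.
using ToyReplacements = std::map<std::string, ToyDef>;

// Stand-in for get_gimple_for_ssa_name. With hide_calls set it mimics
// the prototype that returned null for replaced calls.
const ToyDef *toy_get_gimple_for_ssa_name(const ToyReplacements &repl,
                                          const std::string &name,
                                          bool hide_calls) {
  auto it = repl.find(name);
  if (it == repl.end())
    return nullptr;
  if (hide_calls && it->second.kind == ToyKind::Call)
    return nullptr;
  return &it->second;
}

// Stand-in for SSA_NAME expansion: a replaced definition is expanded in
// place; otherwise the value is assumed to sit in a pseudo register that
// was set when the definition itself was expanded.
std::string toy_expand_ssa_name(const ToyReplacements &repl,
                                const std::string &name, bool hide_calls) {
  if (const ToyDef *d = toy_get_gimple_for_ssa_name(repl, name, hide_calls))
    return d->rhs;
  return "reg(" + name + ")";
}

// A user that only wants assignments: once calls can be returned, it has
// to reject them explicitly.
const ToyDef *toy_get_assign_def(const ToyReplacements &repl,
                                 const std::string &name) {
  const ToyDef *d = toy_get_gimple_for_ssa_name(repl, name, false);
  if (!d || d->kind != ToyKind::Assign)
    return nullptr;
  return d;
}
```

For example, with _1 recorded as the replaced call .FMA (a, b, c), toy_expand_ssa_name (repl, "_1", true) yields reg(_1), a value the toy never computed, while passing false yields the call itself.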
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #59720|0                           |1
        is obsolete|                            |

--- Comment #10 from Richard Biener ---
Created attachment 59721
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59721&action=edit
better patch

This one works better, but it's a bit ugly due to the unusual API for internal
function RTL expansion.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

--- Comment #8 from Richard Biener ---
Created attachment 59720
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59720&action=edit
prototype

Like this. Note it seems to miscompile some cases, so more investigation is
needed (maybe we can't just return NULL from get_gimple_for_ssa_name).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

--- Comment #7 from Richard Biener ---
(In reply to Richard Biener from comment #6)
> It's a property that would perfectly match a register "pressure" (it's not
> really about pressure but register coalescing sensitive) sched1.
>
> For the specific case I wonder why TER doesn't come to rescue here? I
> suppose we never TER internal function calls even when direct-optab?
> (not that I really want to suggest to expand what TER does, but this might
> be an acceptable knob to get back the performance for GCC 15)

Indeed.

bool
ssa_is_replaceable_p (gimple *stmt)
{
  use_operand_p use_p;
  tree def;
  gimple *use_stmt;

  /* Only consider modify stmts.  */
  if (!is_gimple_assign (stmt))
    return false;
...
  /* No function calls can be replaced.  */
  if (is_gimple_call (stmt))
    return false;

For consistency (we're doing more and more direct-optab IFNs replacing
gimple assign sequences) we want to handle those as replaceable.

The following might work:

diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
index 3df8054a729..e01523cb4cc 100644
--- a/gcc/tree-outof-ssa.cc
+++ b/gcc/tree-outof-ssa.cc
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-coalesce.h"
 #include "tree-outof-ssa.h"
 #include "dojump.h"
+#include "internal-fn.h"
 
 /* FIXME: A lot of code here deals with expanding to RTL.  All that code
    should be in cfgexpand.cc.  */
@@ -60,8 +61,11 @@ ssa_is_replaceable_p (gimple *stmt)
   tree def;
   gimple *use_stmt;
 
-  /* Only consider modify stmts.  */
-  if (!is_gimple_assign (stmt))
+  /* Only consider modify stmts and direct internal fn calls.  */
+  if (!is_gimple_assign (stmt)
+      && (!is_gimple_call (stmt)
+          || !gimple_call_internal_p (stmt)
+          || !direct_internal_fn_p (gimple_call_internal_fn (stmt))))
     return false;
 
   /* If the statement may throw an exception, it cannot be replaced.  */
@@ -96,10 +100,6 @@ ssa_is_replaceable_p (gimple *stmt)
       && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)))
     return false;
 
-  /* No function calls can be replaced.  */
-  if (is_gimple_call (stmt))
-    return false;
-
   /* Leave any stmt with volatile operands alone as well.  */
   if (gimple_has_volatile_ops (stmt))
     return false;

but it surely needs adjustments for the TER helpers to expect calls (a quick
check shows get_def_for_expr assumes an assignment); the easiest way might be
to make get_gimple_for_ssa_name return NULL for non-assigns.
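The entry filter changed by the diff above can be modeled in isolation. The sketch below flattens the properties ssa_is_replaceable_p queries into a hypothetical ToyStmtInfo struct (the real function takes a gimple * and queries gimple_call_internal_p, direct_internal_fn_p and the other predicates itself), and models only the checks visible in the quoted diff. It also makes explicit that, before the patch, the later "No function calls can be replaced" test could never fire, because the first check had already rejected every non-assignment; that is why the patch can simply delete it.

```cpp
#include <cassert>

// Hypothetical flattened view of what ssa_is_replaceable_p inspects;
// not GCC's data structures.
enum class ToyCode { Assign, Call, Other };

struct ToyStmtInfo {
  ToyCode code;
  bool internal_call = false;       // stands in for gimple_call_internal_p
  bool direct_internal_fn = false;  // stands in for direct_internal_fn_p
  bool may_throw = false;           // the "may throw an exception" check
  bool volatile_ops = false;        // stands in for gimple_has_volatile_ops
};

// Entry filter before the patch: only assignments pass.
bool toy_replaceable_before(const ToyStmtInfo &s) {
  if (s.code != ToyCode::Assign)
    return false;
  if (s.may_throw)
    return false;
  // Unreachable in practice: calls were already rejected above.
  if (s.code == ToyCode::Call)
    return false;
  if (s.volatile_ops)
    return false;
  return true;  // the real function's remaining checks are omitted
}

// Entry filter after the patch: assignments and direct internal fn calls.
bool toy_replaceable_after(const ToyStmtInfo &s) {
  if (s.code != ToyCode::Assign
      && (s.code != ToyCode::Call || !s.internal_call
          || !s.direct_internal_fn))
    return false;
  if (s.may_throw)
    return false;
  if (s.volatile_ops)
    return false;
  return true;  // the real function's remaining checks are omitted
}
```

With these definitions an assignment passes both filters, a direct internal function call such as .FMA passes only the second, and ordinary calls or non-direct internal calls are still rejected, matching the intent of the diff.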
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |rtl-optimization

--- Comment #6 from Richard Biener ---
It's a property that would perfectly match a register "pressure" (it's not
really about pressure but register coalescing sensitive) sched1.

For the specific case I wonder why TER doesn't come to rescue here? I suppose
we never TER internal function calls even when direct-optab? (not that I
really want to suggest to expand what TER does, but this might be an
acceptable knob to get back the performance for GCC 15)