[Bug rtl-optimization/117801] [15 regression] aarch64: 20% regression in TSVC s278 since r15-3509-gd34cda72098867

2024-12-05 cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

--- Comment #14 from GCC Commits  ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:dc0dea98c96e02c6b24060170bc88da8d4931bc2

commit r15-5943-gdc0dea98c96e02c6b24060170bc88da8d4931bc2
Author: Richard Biener 
Date:   Wed Nov 27 13:36:19 2024 +0100

middle-end/117801 - failed register coalescing due to GIMPLE schedule

For a TSVC testcase we see failed register coalescing due to a
different schedule of the GIMPLE .FMA calls and the stores they feed.
This can be mitigated by making direct internal functions participate
in TER: given that we use more and more such functions to expose
target capabilities, it seems natural not to exempt them.
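To make the pattern concrete, here is a C++ (C-compatible) sketch of a loop in the style of TSVC s278. It is reconstructed from memory rather than copied from the TSVC sources, so treat the details as an assumption; `s278_like` is a hypothetical name. Each product feeds an addition whose result is stored. On a target with fused multiply-add, and with floating-point contraction allowed, GCC may turn these into `.FMA` internal calls that feed the stores, which is the situation the commit describes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel loosely modeled on TSVC s278 (reconstructed from
// memory, not copied from the benchmark).  Every multiply feeds an add whose
// result is stored: the shape that may become .FMA calls feeding stores.
void s278_like(float *a, float *b, float *c, const float *d, const float *e,
               std::size_t n)
{
    assert(a && b && c && d && e);
    for (std::size_t i = 0; i < n; i++) {
        if (a[i] > 0.0f)
            c[i] = -c[i] + d[i] * e[i];  // multiply-add candidate
        else
            b[i] = -b[i] + d[i] * e[i];  // multiply-add candidate
        a[i] = b[i] + c[i] * d[i];       // multiply-add candidate
    }
}
```

With small integer inputs the results are exact whether or not the multiply-adds get fused, so the observable values do not depend on contraction.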

Unfortunately, the internal function expansion API doesn't match
what we usually have (passing in a target and returning an RTX);
instead, the LHS of the call is expanded and written to.  This
makes the TER expansion of a call SSA def a bit unwieldy.
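The mismatch can be pictured with a toy model. The usual expansion shape takes an optional target and returns where the value ended up, while internal-function expansion writes its result into the call's LHS. To use the latter from a TER-style "expand this def as a value" path, an adapter has to give the call a destination first and then hand that destination back. Everything below (`Reg`, `Call`, `toy_expand_mult`, `toy_expand_ifn_into_lhs`, `toy_expand_call_def_as_value`) is a hypothetical stand-in, not the signature of GCC's `expand_expr` or its internal-function expanders.

```cpp
#include <cassert>
#include <map>

// Hypothetical toy "register file": pseudo-register number -> value.
using Reg = int;
static std::map<Reg, double> regs;
static Reg next_pseudo = 100;

Reg toy_new_pseudo() { return next_pseudo++; }

// Usual shape: the caller may suggest a target (-1 = no preference) and the
// expander returns the register that actually holds the result.
Reg toy_expand_mult(double x, double y, Reg target)
{
    Reg r = target >= 0 ? target : toy_new_pseudo();
    regs[r] = x * y;
    return r;
}

// Internal-function shape: nothing is returned; the result is written to
// whatever destination the call's LHS was expanded to.
struct Call {
    double a, b, c;  // operands of a toy fma, a * b + c
    Reg lhs;         // destination, must be set before expansion
};

void toy_expand_ifn_into_lhs(const Call &call)
{
    assert(call.lhs >= 0 && "LHS destination must be set");
    regs[call.lhs] = call.a * call.b + call.c;
}

// Adapter for a TER-style path that needs a value back: give the call a
// destination (the suggested target or a fresh pseudo), expand, return it.
Reg toy_expand_call_def_as_value(Call call, Reg target)
{
    call.lhs = target >= 0 ? target : toy_new_pseudo();
    toy_expand_ifn_into_lhs(call);
    return call.lhs;
}
```

The adapter is where the awkwardness shows: the caller has to pick a destination up front instead of simply asking for a value.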

Bootstrapped and tested on x86_64-unknown-linux-gnu.

The ccmp changes have likely not seen any coverage, and the debug stmt
changes might not be optimal: we might end up losing on replaceable
calls.

PR middle-end/117801
* tree-outof-ssa.cc (ssa_is_replaceable_p): Make
direct internal function calls replaceable.
* expr.cc (get_def_for_expr): Handle replacements with calls.
(get_def_for_expr_class): Likewise.
(optimize_bitfield_assignment_op): Likewise.
(expand_expr_real_1): Likewise.  Properly expand direct
internal function defs.
* cfgexpand.cc (expand_call_stmt): Handle replacements with calls.
(avoid_deep_ter_for_debug): Likewise, always create a debug temp
for calls.
(expand_debug_expr): Likewise, give up for calls.
(expand_gimple_basic_block): Likewise.
* ccmp.cc (ccmp_candidate_p): Likewise.
(get_compare_parts): Likewise.

2024-12-05 rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

               What|Removed                       |Added
------------------------------------------------------------------------------
         Resolution|---                           |FIXED
             Status|ASSIGNED                      |RESOLVED

--- Comment #15 from Richard Biener  ---
Fixed then.  Will be reverted in case of bigger problems though.

2024-11-29 rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

               What|Removed                       |Added
------------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #13 from Richard Biener  ---
I have posted the patch.

2024-11-28 dhruvc at nvidia dot com via Gcc-bugs

--- Comment #12 from Dhruv Chawla  ---
(In reply to Richard Biener from comment #11)
> Created attachment 59722 [details]
> patch
> 
> This patch passed bootstrap & regtest on x86_64-unknown-linux-gnu.  I did
> not check whether it solves the aarch64 issue with the TSVC test.

Hi, the patch does fix the issue on AArch64. Thanks!

2024-11-27 rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

               What|Removed                       |Added
------------------------------------------------------------------------------
  Attachment #59721|0                             |1
        is obsolete|                              |

--- Comment #11 from Richard Biener  ---
Created attachment 59722
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59722&action=edit
patch

This patch passed bootstrap & regtest on x86_64-unknown-linux-gnu.  I did not
check whether it solves the aarch64 issue with the TSVC test.

2024-11-27 rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #9 from Richard Biener  ---
Hmm, yeah: SSA_NAME expansion will then not expand this stmt, so
get_gimple_for_ssa_name has to return the gcall if it was replaceable,
and thus we have to adjust all of its users.
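A toy model of what "adjust all of its users" means: once the lookup for a replaceable SSA name can return either an assignment or a call, every helper that used to read an assignment's RHS code has to check the statement kind first. `Kind`, `Stmt`, `replaceable_defs`, `toy_lookup_def`, and `toy_def_for_expr` are hypothetical and only mirror the shape of `get_gimple_for_ssa_name` and `get_def_for_expr`; they are not those functions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for an assignment vs. a call statement.
enum class Kind { Assign, Call };

struct Stmt {
    Kind kind;
    std::string code;  // RHS code for an assign, callee name for a call
};

// Toy TER table: SSA name -> its replaceable defining statement.
static std::map<std::string, Stmt> replaceable_defs;

// The lookup may now return a call as well as an assignment.
const Stmt *toy_lookup_def(const std::string &name)
{
    auto it = replaceable_defs.find(name);
    return it == replaceable_defs.end() ? nullptr : &it->second;
}

// A user that only understands assignments must now reject calls itself
// before looking at the RHS code.
const Stmt *toy_def_for_expr(const std::string &name, const std::string &code)
{
    const Stmt *def = toy_lookup_def(name);
    if (!def || def->kind != Kind::Assign)
        return nullptr;
    return def->code == code ? def : nullptr;
}
```

The expansion path itself still sees the call through `toy_lookup_def`, so it can expand it at the use; only the assignment-only helpers filter it out.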

2024-11-27 rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

               What|Removed                       |Added
------------------------------------------------------------------------------
  Attachment #59720|0                             |1
        is obsolete|                              |

--- Comment #10 from Richard Biener  ---
Created attachment 59721
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59721&action=edit
better patch

This one works better, but it's a bit ugly due to the unusual API for internal
function RTL expansion.

2024-11-27 rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #8 from Richard Biener  ---
Created attachment 59720
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59720&action=edit
prototype

Like this.

Note it seems to miscompile some cases, so more investigation is needed
(maybe we can't just return NULL from get_gimple_for_ssa_name).

2024-11-27 rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #7 from Richard Biener  ---
(In reply to Richard Biener from comment #6)
> It's a property that a register-"pressure"-sensitive sched1 would match
> perfectly (it's not really about pressure, but about sensitivity to register
> coalescing).
> 
> For the specific case, I wonder why TER doesn't come to the rescue here.
> I suppose we never TER internal function calls, even when they are
> direct-optab?  (Not that I really want to suggest expanding what TER does,
> but this might be an acceptable knob to get back the performance for GCC 15.)

Indeed.

bool
ssa_is_replaceable_p (gimple *stmt)
{
  use_operand_p use_p;
  tree def;
  gimple *use_stmt;

  /* Only consider modify stmts.  */
  if (!is_gimple_assign (stmt))
    return false;

...

  /* No function calls can be replaced.  */
  if (is_gimple_call (stmt))
    return false;

For consistency (we are using more and more direct-optab IFNs in place of
GIMPLE assign sequences), we want to handle those as replaceable.
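A toy model of the proposed change to the predicate: previously only assignments qualified and calls were rejected outright, while the proposal also admits internal calls that map directly to an optab. The `StmtInfo` fields stand in for `is_gimple_assign`, `is_gimple_call`, `gimple_call_internal_p`, and `direct_internal_fn_p`; all the remaining checks of the real `ssa_is_replaceable_p` (throwing statements, single use, volatile operands, and so on) are collapsed into one hypothetical `other_checks_pass` flag.

```cpp
#include <cassert>

// Hypothetical summary of the statement properties the predicate inspects.
struct StmtInfo {
    bool is_assign;
    bool is_call;
    bool is_internal_call;   // call to an internal function (IFN)
    bool is_direct_ifn;      // IFN that maps directly to an optab
    bool other_checks_pass;  // stand-in for all the remaining conditions
};

// Before: only assignments; calls are never replaceable.
bool replaceable_old(const StmtInfo &s)
{
    if (!s.is_assign)
        return false;
    if (s.is_call)
        return false;
    return s.other_checks_pass;
}

// Proposed: assignments or direct internal function calls.
bool replaceable_new(const StmtInfo &s)
{
    if (!s.is_assign
        && (!s.is_call || !s.is_internal_call || !s.is_direct_ifn))
        return false;
    return s.other_checks_pass;
}
```

A direct-optab call such as `.FMA` flips from "not replaceable" to "replaceable", while ordinary calls and assignments keep their old answer.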

The following might work:

diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
index 3df8054a729..e01523cb4cc 100644
--- a/gcc/tree-outof-ssa.cc
+++ b/gcc/tree-outof-ssa.cc
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-coalesce.h"
 #include "tree-outof-ssa.h"
 #include "dojump.h"
+#include "internal-fn.h"

 /* FIXME: A lot of code here deals with expanding to RTL.  All that code
    should be in cfgexpand.cc.  */
@@ -60,8 +61,11 @@ ssa_is_replaceable_p (gimple *stmt)
   tree def;
   gimple *use_stmt;

-  /* Only consider modify stmts.  */
-  if (!is_gimple_assign (stmt))
+  /* Only consider modify stmts and direct internal fn calls.  */
+  if (!is_gimple_assign (stmt)
+      && (!is_gimple_call (stmt)
+          || !gimple_call_internal_p (stmt)
+          || !direct_internal_fn_p (gimple_call_internal_fn (stmt))))
     return false;

   /* If the statement may throw an exception, it cannot be replaced.  */
@@ -96,10 +100,6 @@ ssa_is_replaceable_p (gimple *stmt)
       && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)))
     return false;

-  /* No function calls can be replaced.  */
-  if (is_gimple_call (stmt))
-    return false;
-
   /* Leave any stmt with volatile operands alone as well.  */
   if (gimple_has_volatile_ops (stmt))
     return false;

but it surely needs adjustments so that the TER helpers expect calls
(a quick check shows that get_def_for_expr assumes an assignment).  The
easiest way might be to make get_gimple_for_ssa_name return NULL
for non-assigns.
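A toy model of the suggested shortcut, and of the catch that comment #9 points out: a replaceable def is skipped at its own position because it is supposed to be expanded at its use, so if the lookup hides calls, a replaced call is never expanded anywhere. The toy IR, `toy_lookup_null_for_calls`, and `toy_expand_use` are hypothetical and heavily simplified; this is not how GCC's expander is structured, only an illustration of the reasoning.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy IR: each statement defines one SSA name.
struct Stmt {
    std::string lhs;
    bool is_call;      // e.g. a direct internal call such as .FMA
    double value;      // what the statement computes (kept trivial)
    bool replaceable;  // marked by the TER-like analysis
};

// Suggested shortcut: the lookup pretends calls have no replaceable def.
const Stmt *toy_lookup_null_for_calls(const std::vector<Stmt> &bb,
                                      const std::string &name)
{
    for (const Stmt &s : bb)
        if (s.lhs == name && s.replaceable && !s.is_call)
            return &s;
    return nullptr;
}

// Toy expander: replaceable defs are skipped where they stand and are
// expected to be expanded at their use through the lookup.
std::optional<double> toy_expand_use(const std::vector<Stmt> &bb,
                                     const std::string &name)
{
    std::map<std::string, double> regs;
    for (const Stmt &s : bb)
        if (!s.replaceable)
            regs[s.lhs] = s.value;  // ordinary expansion writes a register
    if (const Stmt *def = toy_lookup_null_for_calls(bb, name))
        return def->value;          // TER: expand the def at the use
    auto it = regs.find(name);
    if (it == regs.end())
        return std::nullopt;        // the value was never computed
    return it->second;
}
```

In the toy, a replaceable call (`_2` in the check below) yields no value at its use, which mirrors why the prototype needed the lookup to return the call instead.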

2024-11-27 rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

               What|Removed                       |Added
------------------------------------------------------------------------------
          Component|middle-end                    |rtl-optimization

--- Comment #6 from Richard Biener  ---
It's a property that a register-"pressure"-sensitive sched1 would match
perfectly (it's not really about pressure, but about sensitivity to register
coalescing).

For the specific case, I wonder why TER doesn't come to the rescue here.
I suppose we never TER internal function calls, even when they are
direct-optab?  (Not that I really want to suggest expanding what TER does,
but this might be an acceptable knob to get back the performance for GCC 15.)
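For context, TER (temporary expression replacement) lets out-of-SSA forward a single-use SSA definition into its only use, so RTL expansion sees one larger expression instead of a separate temporary. The toy below models only that single-use forwarding on token lists; the real analysis applies many more conditions (see `ssa_is_replaceable_p`, quoted in comment #7). `Stmt` and `toy_ter_forward` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy statement: lhs = concatenation of rhs tokens.
struct Stmt {
    std::string lhs;
    std::vector<std::string> rhs;  // tokens: names, literals, operators
};

// Returns the statements that still get emitted, with single-use
// temporaries forwarded (parenthesized) into their only use.
std::map<std::string, std::string> toy_ter_forward(const std::vector<Stmt> &bb)
{
    std::map<std::string, int> uses;
    for (const Stmt &s : bb)
        for (const std::string &tok : s.rhs)
            uses[tok]++;

    std::map<std::string, std::string> text;  // name -> its (forwarded) RHS
    std::map<std::string, std::string> emitted;
    for (const Stmt &s : bb) {
        std::string rhs;
        for (const std::string &tok : s.rhs) {
            auto def = text.find(tok);
            bool forward = def != text.end() && uses[tok] == 1;
            rhs += forward ? "(" + def->second + ")" : tok;
        }
        text[s.lhs] = rhs;
        if (uses[s.lhs] != 1)  // zero or several uses: keep the statement
            emitted[s.lhs] = rhs;
    }
    return emitted;
}
```

A temporary used twice is not forwarded, so it stays a separate statement and needs its own register, which is where coalescing and scheduling start to matter.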