[Bug target/87627] GCC generates rube-goldberg machine for trivial tail call on 32-bit x86

2021-08-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87627

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2018-10-17 00:00:00         |2021-8-19

--- Comment #7 from Andrew Pinski ---
clang and MSVC get this "correct".

2018-10-29 Thread amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov ---
FWIW the following CSE enhancement cleans this up, but I'm unhappy with this
patch because it's too narrowly targeted; in particular, it won't clean up:

void g(int a, int *b, int c);
void f(int a, int *b, int c)
{
  if (a) *b = c;
  g(a, b, c);
}

I had to special-case PARM_DECL because, in general, automatic variables that
are not address-taken in GIMPLE can become addressable in RTL when the ABI
requires passing a large argument by reference.
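
To illustrate the point about large arguments, here is a minimal sketch. The
names struct big, sum_big and make_and_sum are hypothetical and do not come
from the bug report. On ABIs that pass large aggregates by reference (AArch64
AAPCS64 and Windows x64 are two examples), the call in make_and_sum receives a
pointer to stack memory holding the value of local, even though the source
never takes its address.

```c
#include <stddef.h>

/* Hypothetical aggregate, large enough that some ABIs pass it by
   reference to a caller-made copy instead of in registers.  */
struct big { long v[8]; };

/* Hypothetical callee; takes the aggregate by value at the source level.  */
long sum_big (struct big b)
{
  long total = 0;
  for (size_t i = 0; i < sizeof b.v / sizeof b.v[0]; i++)
    total += b.v[i];
  return total;
}

/* 'local' is never address-taken in the source, yet on a by-reference ABI
   the call below passes the callee a pointer to memory holding its value.  */
long make_and_sum (void)
{
  struct big local = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
  return sum_big (local);
}
```

Whether the compiler hands over the variable's own stack slot or a separate
copy depends on the target and on the optimizer, so this only shows the shape
of the situation, not a guaranteed outcome.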

--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -2232,6 +2232,15 @@ hash_rtx_string (const char *ps)
   return hash;
 }
 
+static bool
+mem_escapes_p (const_rtx x)
+{
+  tree decl = MEM_EXPR (x);
+  if (!decl || TREE_CODE (decl) != PARM_DECL)
+    return true;
+  return may_be_aliased (decl);
+}
+
 /* Same as hash_rtx, but call CB on each rtx if it is not NULL.
    When the callback returns true, we continue with the new rtx.  */
 
@@ -2421,7 +2430,8 @@ hash_rtx_cb (const_rtx x, machine_mode mode,
           return 0;
         }
       if (hash_arg_in_memory_p && !MEM_READONLY_P (x))
-        *hash_arg_in_memory_p = 1;
+        if (*hash_arg_in_memory_p != 1)
+          *hash_arg_in_memory_p = mem_escapes_p (x) ? 1 : 2;
 
       /* Now that we have already found this special case,
          might as well speed it up as much as possible.  */
@@ -6127,7 +6137,7 @@ invalidate_memory (void)
     for (p = table[i]; p; p = next)
       {
         next = p->next_same_hash;
-        if (p->in_memory)
+        if (p->in_memory == 1)
           remove_from_table (p, i);
       }
 }
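
The patch effectively turns in_memory into a three-state value: 0 for
expressions that do not read memory, 1 for memory that may escape, and 2 for
memory that does not escape. The following standalone toy model sketches that
idea outside of GCC. The names enum mem_kind, struct toy_entry, toy_note_mem
and toy_invalidate_memory are hypothetical and exist only for illustration;
they are not GCC interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three values the patch stores in in_memory.  */
enum mem_kind
{
  NOT_IN_MEMORY = 0,
  MEM_MAY_ESCAPE = 1,
  MEM_NON_ESCAPING = 2
};

/* Hypothetical, heavily simplified stand-in for a CSE table element.  */
struct toy_entry
{
  const char *name;
  enum mem_kind in_memory;
  bool valid;
};

/* Mirrors the hash_rtx_cb hunk: once the flag is 1 it stays 1; otherwise
   the MEM just seen decides between 1 (escapes) and 2 (does not).  */
void toy_note_mem (int *flag, bool escapes)
{
  assert (flag != NULL);
  if (*flag != 1)
    *flag = escapes ? 1 : 2;
}

/* Mirrors the invalidate_memory hunk: only entries in state 1 are dropped.
   Returns the number of entries invalidated.  */
size_t toy_invalidate_memory (struct toy_entry *table, size_t n)
{
  size_t removed = 0;
  for (size_t i = 0; i < n; i++)
    if (table[i].valid && table[i].in_memory == MEM_MAY_ESCAPE)
      {
        table[i].valid = false;
        removed++;
      }
  return removed;
}
```

In this model, an entry for an incoming argument slot marked MEM_NON_ESCAPING
stays valid when toy_invalidate_memory runs, which corresponds to the patched
invalidate_memory skipping entries whose in_memory is not 1.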

2018-10-22 Thread amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov ---
I've spent some time looking at this again, and I couldn't find a way to
preserve REG_EQUIV notes (it's actually unclear what REG_EQUIV means
precisely).

What I think could help in simple cases like this one, and might also be
helpful in other situations, is to have mem_attrs indicate that memory does not
escape. RTL CSE would not need to invalidate such MEMs when processing a call.

2018-10-17 Thread bugdal at aerifal dot cx

--- Comment #4 from Rich Felker ---
Thanks, that's helpful!

For 64-bit what I mean is that it emits:

  pushq %r12
  movl %edx, %r12d
  pushq %rbp
  movl %esi, %ebp
  pushq %rbx
  movl %edi, %ebx
  call bar
  movl %r12d, %edx
  movl %ebp, %esi
  movl %ebx, %edi
  popq %rbx
  popq %rbp
  popq %r12
  jmp bah

whereas it would be much more efficient to do:

  pushq %rdx
  pushq %rsi
  pushq %rdi
  call bar
  popq %rdi
  popq %rsi
  popq %rdx
  jmp bah
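
The test case itself is not quoted in this part of the thread. The following
is a guess at a C source consistent with the two listings above: three 32-bit
integer arguments, a call to bar, then a tail call to bah. The name foo, the
argument types, the empty parameter list of bar, and the stub bodies (added
only so the sketch links on its own) are all assumptions.

```c
/* Hypothetical bookkeeping so the stubs have an observable effect.  */
static int bar_calls;
static int last_sum;

/* Stub callees; noinline keeps them as real calls in the generated code.  */
__attribute__ ((noinline)) void bar (void)
{
  bar_calls++;
}

__attribute__ ((noinline)) void bah (int a, int b, int c)
{
  last_sum = a + b + c;
}

/* The arguments must survive the call to bar and are then forwarded
   unchanged to bah in tail position.  */
void foo (int a, int b, int c)
{
  bar ();
  bah (a, b, c);
}
```

Compiling something like this with optimization enabled and inspecting the
assembly for foo is one way to compare against the listings; the exact output
depends on the GCC version and flags.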

2018-10-17 Thread rguenth at gcc dot gnu.org

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-10-17
            Version|unknown                     |9.0
     Ever confirmed|0                           |1

2018-10-17 Thread amonakov at gcc dot gnu.org

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov ---
Noticed this back when working on -fno-plt patches:
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00229.html

Emitting a tailcall on RTL drops REG_EQUIV notes (perhaps because in the
general case equivalences might not hold just before the sibcall when the new
arguments are being prepared), and this penalizes code generation for the whole
function.

I'm not sure why you say "Results are similarly bad for 64-bit". There's
nothing to improve in this example: all three arguments are in registers and
thus need to be saved and restored somehow anyway.

2018-10-16 Thread bugdal at aerifal dot cx

--- Comment #2 from Rich Felker ---
Further trial-and-error shows that it seems to be the sibcall itself that
causes the mess. My first guess is that something in the RTL considers the
whole argument area as clobbered/belonging to the sibcallee as soon as it
starts setting up for the sibcall, thereby forcing the arguments to be backed
up somewhere else and restored, but I'm not sure why that wouldn't affect the
case where there's no intervening call.

2018-10-16 Thread bugdal at aerifal dot cx

--- Comment #1 from Rich Felker ---
Results are similarly bad for 64-bit, except at -Os where it effectively just
pushes/pops the argument registers around the call to bar rather than
allocating call-saved registers for them. Using -Os on 32-bit does not help.
-O0 does suppress the register shuffling but also suppresses the tail call.