[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh

2023-11-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:a99f6bb142bc4506dcb8aa2b7722310ad92e4528

commit r14-5294-ga99f6bb142bc4506dcb8aa2b7722310ad92e4528
Author: Vladimir N. Makarov 
Date:   Thu Nov 9 08:51:15 2023 -0500

[IRA]: Fixing conflict calculation from region landing pads.

The following patch fixes conflict calculation from exception landing
pads.  The previous patch processed only one newly created landing pad.
Besides being wrong, it also resulted in large memory consumption by IRA.

gcc/ChangeLog:

PR rtl-optimization/110215
* ira-lives.cc (add_conflict_from_region_landing_pads): New
function.
(process_bb_node_lives): Use it.

[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh

2023-06-26 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #6 from Hongyu Wang  ---
Thanks for the fix. With it, the main loop of the attached test no longer
contains any loads.

There is a remaining issue: the loop epilogue still loads from the stack and
from the constant pool:

.L9:
        movslq  %edx, %rax
        movss   72(%rsp), %xmm5
        salq    $2, %rax
        leaq    (%rbx,%rax), %rcx
        movaps  %xmm5, %xmm1
        subss   (%rcx), %xmm1
        andps   .LC4(%rip), %xmm1
        movss   %xmm1, (%rcx)
        leal    1(%rdx), %ecx
        addss   %xmm1, %xmm0
        cmpl    %ecx, %r12d
        jle     .L8

The IRA dump shows that the pseudos have no conflicts, yet they still fail to
be allocated a register. This issue does not exist on aarch64.

[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh

2023-06-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:154c69039571c66b3a6d16ecfa9e6ff22942f59f

commit r14-1891-g154c69039571c66b3a6d16ecfa9e6ff22942f59f
Author: Vladimir N. Makarov 
Date:   Fri Jun 16 11:12:32 2023 -0400

RA: Ignore conflicts for some pseudos from insns throwing a final exception

IRA adds conflicts to the pseudos from insns that can throw exceptions
internally even if the exception code is final for the function and
the pseudo value is not used in the exception code.  This results in
spilling a pseudo in a loop (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).

The following patch fixes the problem.

PR rtl-optimization/110215

gcc/ChangeLog:

* ira-lives.cc: Include except.h.
(process_bb_node_lives): Ignore conflicts from cleanup exceptions
when the pseudo does not live at the exception landing pad.
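
The rule described in this ChangeLog entry can be pictured with a toy model.
This is only a sketch, not IRA's code: the struct ThrowingInsn, its fields and
the function needs_eh_conflict are hypothetical names made up for illustration,
and pseudos are represented as plain integers.

```cpp
#include <set>

// Hypothetical, simplified model; none of these names exist in GCC.
struct ThrowingInsn {
    bool can_throw_internal;            // control may reach a landing pad in this function
    bool handler_is_final_cleanup;      // assumption: the handler only cleans up and then
                                        // leaves the function
    std::set<int> live_at_landing_pad;  // pseudo numbers live on entry to the landing pad
};

// Returns true when the toy allocator should still record an EH-related
// conflict for PSEUDO at INSN.
bool needs_eh_conflict(const ThrowingInsn &insn, int pseudo)
{
    if (!insn.can_throw_internal)
        return false;  // no exception edge, so nothing to record
    if (insn.handler_is_final_cleanup
        && insn.live_at_landing_pad.count(pseudo) == 0)
        return false;  // value is never used by the cleanup code: ignore
    return true;       // otherwise stay conservative
}
```

In this model a loop-invariant pseudo that is not live at the cleanup landing
pad no longer picks up the conflict, which is the situation the bug report
describes.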

[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh

2023-06-14 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #4 from Vladimir Makarov  ---
(In reply to Richard Biener from comment #3)
> 
> 
> We don't have any pass after reload that would perform loop invariant motion,
> I'm not sure how this situation is handled in general in RA - is a post-RA
> pass optimizing the spill/reload placement "globally" usually done?

LRA does not do placement of reload insns.  Global RA is supposed to do this
when it forms regions for the allocation.

I've been working on this issue.  I hope the fix will be ready this week.

[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh

2023-06-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
           Keywords|EH                          |

--- Comment #3 from Richard Biener  ---
The issue is that we fail to sink

 d_29 = {t_28, t_28, t_28, t_28};

we compute a good place in select_best_block but then since it is at the
same loop depth as the original place we apply

  /* If BEST_BB is at the same nesting level, then require it to have
     significantly lower execution frequency to avoid gratuitous movement.  */
  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
      /* If result of comparison is unknown, prefer EARLY_BB.
         Thus use !(...>=..) rather than (...<...)  */
      && !(best_bb->count * 100 >= early_bb->count * threshold))
    return best_bb;

and fail to sink.  I'm not exactly sure why we do the above - we probably
should when best_bb post-dominates early_bb, also if the sunk stmt
possibly (or provably) will enlarge lifetime of its uses (but that's also
hard to guess since we process sinking of the defs of the uses only
afterwards).  In this case we have a single use and a single def so
sinking shouldn't make things worse.  We could also weigh in the
spilling class of a reg here.

In our case we have the dominated block with a higher(!) count than
the dominating block which means the profile is corrupt.
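
To make the effect of the quoted check concrete, here is a minimal sketch of
the same-depth comparison using plain integers.  The Block struct and the
function accept_sink are hypothetical names; GCC's real profile_count type can
also be "unknown", which integers cannot model, and only the quoted condition
is reproduced here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a basic block: just the two properties the
// quoted condition inspects.
struct Block {
    int loop_depth;       // loop nesting level of the block
    std::int64_t count;   // execution count taken from the profile
};

// Returns true when the quoted condition would accept BEST_BB as the sink
// target.  THRESHOLD is a percentage, playing the role of
// --param sink-frequency-threshold.
bool accept_sink(const Block &best_bb, const Block &early_bb, int threshold)
{
    return best_bb.loop_depth == early_bb.loop_depth
           // Accept only if best_bb runs significantly less often.
           && !(best_bb.count * 100 >= early_bb.count * threshold);
}
```

With an illustrative threshold of 75, a candidate block at the same depth with
count 120 against an original block with count 100 is rejected (12000 >= 7500),
while a candidate with count 50 is accepted (5000 < 7500).  A dominated block
whose count is higher than its dominator's therefore cannot pass the check.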

With --param sink-frequency-threshold we sink the ctor and the feeding
division but still get

.L5:
        movq    (%rbx), %rax
        pxor    %xmm1, %xmm1
        leaq    0(%rbp,%rax), %rdx
        .p2align 4,,10
        .p2align 3
.L4:
        movaps  (%rsp), %xmm0
        addps   (%rax), %xmm0
        addq    $16, %rax
        movaps  %xmm0, -16(%rax)
        addps   %xmm0, %xmm1
        cmpq    %rax, %rdx
        jne     .L4
        movaps  %xmm1, %xmm0
        movhlps %xmm1, %xmm0
        addps   %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        shufps  $85, %xmm1, %xmm0
        addps   %xmm1, %xmm0
.LEHB1:
        call    _Z1gf
        addq    $8, %rbx
        cmpq    %rbx, %r12
        jne     .L5

because we (rightfully so) refuse to sink into the outer loop.  What we
fail to do is hoist the reload out of the inner loop (I suppose
clang does exactly that).

We don't have any pass after reload that would perform loop invariant motion,
I'm not sure how this situation is handled in general in RA - is a post-RA
pass optimizing the spill/reload placement "globally" usually done?
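
As a source-level analogy of the missed hoisting (a sketch only, not what the
register allocator does internally), the two hypothetical functions below
differ just in where the invariant value is loaded.  The volatile qualifier is
an assumption used to stand in for a stack slot that the compiler is forced to
re-read, like the movaps (%rsp), %xmm0 inside .L4.

```cpp
#include <cstddef>

// Invariant is re-read from its slot on every iteration, mirroring the
// reload that stays inside the inner loop.
float sum_reload_in_loop(const volatile float *slot, float *data, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        float d = *slot;   // load inside the loop
        data[i] += d;
        sum += data[i];
    }
    return sum;
}

// Invariant is read once before the loop, mirroring a reload hoisted to
// the loop preheader.
float sum_reload_hoisted(const volatile float *slot, float *data, std::size_t n)
{
    const float d = *slot; // single load before the loop
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        data[i] += d;
        sum += data[i];
    }
    return sum;
}
```

Both functions compute the same result as long as the slot does not change
during the loop; the second simply performs one load instead of n.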

[Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh

2023-06-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
            Summary|RA fails to allocate        |RA fails to allocate
                   |register when loop          |register when loop
                   |invariant lives across      |invariant lives across
                   |calls                       |calls and eh
     Ever confirmed|0                           |1
           Keywords|                            |ra
   Last reconfirmed|                            |2023-06-12

--- Comment #2 from Andrew Pinski  ---
Reduced testcase for both x86_64 and aarch64:
```
#define vec __attribute__((vector_size(4*sizeof(float))))
struct s1
{
 s1();
 ~s1();
};
void g();
void g(float);
void f(float a, float b, vec float **c, int n, int j)
{
s1 t2;
float t = a/b;
vec float d = {t, t, t, t};
for (int l = 0; l < j; l++)
{
vec float s = {};
for(int i =0;i