Re: [PATCH 4/6] ira: Try to avoid propagating conflicts

2022-01-10 Thread Richard Sandiford via Gcc-patches
Hi Vlad,

Vladimir Makarov  writes:
> On 2022-01-06 09:47, Richard Sandiford wrote:
>> Suppose that:
>>
>> - an inner loop L contains an allocno A
>> - L clobbers hard register R while A is live
>> - A's parent allocno is AP
>>
>> Previously, propagate_allocno_info would propagate conflict sets up the
>> loop tree, so that the conflict between A and R would become a conflict
>> between AP and R (and so on for ancestors of AP).
> My thoughts for propagating hard register conflicts was to avoid 
> changing allocations on the region border as much as possible.  The 
> solution you are proposing might result in allocating R to the allocno 
> and creating moves/loads/stores on the region border when it would be 
> possible to assign R to another allocno and another hard reg to the 
> allocno in consideration.  As it is all about heuristics it is hard to 
> say just speculating what probability of such situation and what 
> heuristic is better.  Only checking credible benchmarks is a criterium 
> to choose heuristics. […]

Yeah, I agree with all of the above.  Any change to these heuristics is
likely to make some things worse: a strict improvement over the status
quo is essentially impossible.

I guess in principle, the new heuristic is more suited to those high
register pressure situations in which some spilling is inevitable.
The cases that are most likely to lose out under the new heuristics
are those where allocation is possible without spilling and where
recording the conflicts gave the allocator the extra “push” it needed to
prioritise the parent allocno over others with fewer subloop conflicts.
And the risks of that happening are probably greater if the target is
providing lop-sided costs.

(Talking of which, the aarch64 port is still providing equal costs
for loads and stores, which is something that we ought to look at.
But again, it's a sensitive heuristic, so tweaking it will need care.)

Thanks a lot for the quick reviews, really appreciate it.

I've now pushed the series.  Let's see what the fallout is :-)

Richard


Re: [PATCH 4/6] ira: Try to avoid propagating conflicts

2022-01-10 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:47, Richard Sandiford wrote:

Suppose that:

- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP

Previously, propagate_allocno_info would propagate conflict sets up the
loop tree, so that the conflict between A and R would become a conflict
between AP and R (and so on for ancestors of AP).
My thoughts for propagating hard register conflicts was to avoid 
changing allocations on the region border as much as possible.  The 
solution you are proposing might result in allocating R to the allocno 
and creating moves/loads/stores on the region border when it would be 
possible to assign R to another allocno and another hard reg to the 
allocno in consideration.  As it is all about heuristics it is hard to 
say just speculating what probability of such situation and what 
heuristic is better.  Only checking credible benchmarks is a criterium 
to choose heuristics.  It seems yours work better.  Thank you putting 
deep thoughts in improving existing heuristics in this and the following 
patches, Richard.

However, when IRA treats loops as separate allocation regions, it can
decide on a loop-by-loop basis whether to allocate a register or spill
to memory.  Conflicts in inner loops therefore don't need to become
hard conflicts in parent loops.  Instead we can record that using the
“conflicting” registers for the parent allocnos has a higher cost.
In the example above, this higher cost is the sum of:

- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L

This value is also a cap on the hard register cost that A can contribute
to AP in general (not just for conflicts).  Whatever allocation we pick
for AP, there is always the option of spilling that register to memory
throughout L, so the cost to A of allocating a register to AP can't be
more than the cost of spilling A.

To take an extreme example: if allocating a register R2 to A is more
expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2]
could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100
times greater than ALLOCNO_MEMORY_COST (A).  But this scale factor
doesn't matter to AP.  All that matters is that R2 is more expensive
than memory for A, so that allocating R2 to AP should be costed as
spilling A to memory (again assuming that A and AP are in different
allocation regions).  Propagating a factor of 100 would distort the
register costs for AP.

move_spill_restore tries to undo the propagation done by
propagate_allocno_info, so we need some extra processing there.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
(ira_single_region_allocno_p): New function.
(ira_total_conflict_hard_regs): Likewise.
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
(ira_propagate_hard_reg_costs): New function.
(propagate_allocno_info): Use it.  Try to avoid propagating
hard register conflicts to parent allocnos if we can handle
the conflicts by spilling instead.  Limit the propagated
register costs to the cost of spilling throughout the child loop.
* ira-color.c (color_pass): Use ira_single_region_allocno_p to
test whether a child and parent allocno can share the same
register.
(move_spill_restore): Adjust for the new behavior of
propagate_allocno_info.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-2.c: New test.

Thank you for the patch.  It is ok for me.



[PATCH 4/6] ira: Try to avoid propagating conflicts

2022-01-06 Thread Richard Sandiford via Gcc-patches
Suppose that:

- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP

Previously, propagate_allocno_info would propagate conflict sets up the
loop tree, so that the conflict between A and R would become a conflict
between AP and R (and so on for ancestors of AP).

However, when IRA treats loops as separate allocation regions, it can
decide on a loop-by-loop basis whether to allocate a register or spill
to memory.  Conflicts in inner loops therefore don't need to become
hard conflicts in parent loops.  Instead we can record that using the
“conflicting” registers for the parent allocnos has a higher cost.
In the example above, this higher cost is the sum of:

- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L

This value is also a cap on the hard register cost that A can contribute
to AP in general (not just for conflicts).  Whatever allocation we pick
for AP, there is always the option of spilling that register to memory
throughout L, so the cost to A of allocating a register to AP can't be
more than the cost of spilling A.

To take an extreme example: if allocating a register R2 to A is more
expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2]
could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100
times greater than ALLOCNO_MEMORY_COST (A).  But this scale factor
doesn't matter to AP.  All that matters is that R2 is more expensive
than memory for A, so that allocating R2 to AP should be costed as
spilling A to memory (again assuming that A and AP are in different
allocation regions).  Propagating a factor of 100 would distort the
register costs for AP.

move_spill_restore tries to undo the propagation done by
propagate_allocno_info, so we need some extra processing there.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
(ira_single_region_allocno_p): New function.
(ira_total_conflict_hard_regs): Likewise.
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
(ira_propagate_hard_reg_costs): New function.
(propagate_allocno_info): Use it.  Try to avoid propagating
hard register conflicts to parent allocnos if we can handle
the conflicts by spilling instead.  Limit the propagated
register costs to the cost of spilling throughout the child loop.
* ira-color.c (color_pass): Use ira_single_region_allocno_p to
test whether a child and parent allocno can share the same
register.
(move_spill_restore): Adjust for the new behavior of
propagate_allocno_info.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-2.c: New test.
---
 gcc/ira-build.c   | 55 +--
 gcc/ira-color.c   | 53 --
 gcc/ira-int.h | 37 +
 .../gcc.target/aarch64/reg-alloc-2.c  | 47 
 4 files changed, 169 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-2.c

diff --git a/gcc/ira-build.c b/gcc/ira-build.c
index 0547edfde55..875b4d8ed7c 100644
--- a/gcc/ira-build.c
+++ b/gcc/ira-build.c
@@ -497,6 +497,7 @@ ira_create_allocno (int regno, bool cap_p,
   bitmap_set_bit (loop_tree_node->all_allocnos, ALLOCNO_NUM (a));
   ALLOCNO_NREFS (a) = 0;
   ALLOCNO_FREQ (a) = 0;
+  ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P (a) = false;
   ALLOCNO_HARD_REGNO (a) = -1;
   ALLOCNO_CALL_FREQ (a) = 0;
   ALLOCNO_CALLS_CROSSED_NUM (a) = 0;
@@ -1991,6 +1992,35 @@ propagate_modified_regnos (ira_loop_tree_node_t 
loop_tree_node)
   loop_tree_node->modified_regnos);
 }
 
+/* Propagate ALLOCNO_HARD_REG_COSTS from A to PARENT_A.  Use SPILL_COST
+   as the cost of spilling a register throughout A (which we have to do
+   for PARENT_A allocations that conflict with A).  */
+static void
+ira_propagate_hard_reg_costs (ira_allocno_t parent_a, ira_allocno_t a,
+ int spill_cost)
+{
+  HARD_REG_SET conflicts = ira_total_conflict_hard_regs (a);
+  conflicts &= ~ira_total_conflict_hard_regs (parent_a);
+
+  auto costs = ALLOCNO_HARD_REG_COSTS (a);
+  if (!hard_reg_set_empty_p (conflicts))
+ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P (a) = true;
+  else if (!costs)
+return;
+
+  auto aclass = ALLOCNO_CLASS (a);
+  ira_allocate_and_set_costs (_HARD_REG_COSTS (parent_a),
+ aclass, ALLOCNO_CLASS_COST (parent_a));
+  auto parent_costs = ALLOCNO_HARD_REG_COSTS (parent_a);
+  for (int i = 0; i < ira_class_hard_regs_num[aclass]; ++i)
+if (TEST_HARD_REG_BIT (conflicts, ira_class_hard_regs[aclass][i]))
+  parent_costs[i] += spill_cost;
+else if (costs)
+  /* The cost to A of