Re: RFC: IRA patch to reduce lifetimes

2012-05-26 Thread H.J. Lu
On Mon, May 21, 2012 at 9:33 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 7:35 AM, Bernd Schmidt ber...@codesourcery.com wrote:
 On 12/23/2011 05:31 PM, Vladimir Makarov wrote:
 On 12/21/2011 09:09 AM, Bernd Schmidt wrote:
 This patch was an experiment to see if we can get the same improvement
 with modifications to IRA, making it more tolerant to over-aggressive
 scheduling. The idea is that if an instruction sets a register A, and
 all its inputs are live and unmodified for the lifetime of A, then
 moving the instruction downwards towards its first use is going to be
 beneficial from a register pressure point of view.
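
The condition described above can be sketched on a toy model. This is only an illustration, not the code from the patch: struct toy_insn, NO_REG and can_sink_to_first_use are hypothetical names, the model is a single straight-line block with at most two inputs per instruction, and nothing is assumed live on exit from the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_REG (-1)

/* Hypothetical toy instruction: DEST = op (SRC[0], SRC[1]).  */
struct toy_insn
{
  int dest;
  int src[2];
};

/* Return true if INSN reads register REG.  */
static bool
insn_uses_reg (const struct toy_insn *insn, int reg)
{
  return reg != NO_REG && (insn->src[0] == reg || insn->src[1] == reg);
}

/* Return the index of the first insn after DEF that reads the register
   set by DEF, or -1 if it is redefined or never read.  */
static int
first_use_after (const struct toy_insn *insns, size_t n, size_t def)
{
  for (size_t i = def + 1; i < n; i++)
    {
      if (insn_uses_reg (&insns[i], insns[def].dest))
        return (int) i;
      if (insns[i].dest == insns[def].dest)
        return -1;
    }
  return -1;
}

/* Return true if REG is read at or after FROM before being redefined,
   i.e. it is still live at FROM in this toy block.  */
static bool
reg_read_from (const struct toy_insn *insns, size_t n, size_t from, int reg)
{
  for (size_t i = from; i < n; i++)
    {
      if (insn_uses_reg (&insns[i], reg))
        return true;
      if (insns[i].dest == reg)
        return false;
    }
  return false;
}

/* Return true if the insn at DEF could be moved down to just before the
   first use of its result: every input must be unmodified in between
   and still live at the first use, so the move extends no lifetime.  */
static bool
can_sink_to_first_use (const struct toy_insn *insns, size_t n, size_t def)
{
  assert (def < n);
  if (insns[def].dest == NO_REG)
    return false;
  int use = first_use_after (insns, n, def);
  if (use < 0)
    return false;
  for (int k = 0; k < 2; k++)
    {
      int in = insns[def].src[k];
      if (in == NO_REG)
        continue;
      if (in == insns[def].dest)
        return false;  /* The insn overwrites its own input.  */
      for (size_t i = def + 1; i < (size_t) use; i++)
        if (insns[i].dest == in)
          return false;  /* Input modified before the first use.  */
      if (!reg_read_from (insns, n, (size_t) use, in))
        return false;  /* Input dies early; sinking would extend it.  */
    }
  return true;
}
```

The sketch only checks up to the first use rather than over the whole lifetime of A, which is a simplification of the wording above.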

 That alone, however, turns out to be too aggressive, performance drops
 presumably because we undo too many scheduling decisions. So, the patch
 detects such situations, and splits the pseudo; a new pseudo is
 introduced in the original setting instruction, and a copy is added
 before the first use. If the new pseudo does not get a hard register, it
 is removed again and instead the setting instruction is moved to the
 point of the copy.
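
A rough sketch of the split-then-fall-back step, under stated assumptions: the instruction stream is modelled as an array of opaque insn ids, and struct candidate and resolve_candidate are hypothetical names, not functions from the patch. Rewriting the moved instruction to set the original pseudo is assumed and not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one split pseudo.  */
struct candidate
{
  size_t set_idx;   /* Index of the insn that sets the new pseudo.  */
  size_t copy_idx;  /* Index of the copy placed before the first use.  */
  int hard_regno;   /* Hard register given to the new pseudo, or -1.  */
};

/* INSNS holds N opaque insn ids in program order.  If the new pseudo
   got a hard register, keep the split as it is.  Otherwise drop the
   copy and move the setting insn to where the copy was.  Return the
   new number of insns.  */
static size_t
resolve_candidate (int *insns, size_t n, const struct candidate *c)
{
  assert (c->set_idx < c->copy_idx && c->copy_idx < n);

  if (c->hard_regno >= 0)
    return n;

  int set_insn = insns[c->set_idx];

  /* Close the gap left by the setting insn.  */
  memmove (&insns[c->set_idx], &insns[c->set_idx + 1],
           (c->copy_idx - c->set_idx - 1) * sizeof insns[0]);

  /* The slot just before the copy is now free; put the setter there.  */
  insns[c->copy_idx - 1] = set_insn;

  /* Delete the copy by shifting the tail up by one.  */
  memmove (&insns[c->copy_idx], &insns[c->copy_idx + 1],
           (n - c->copy_idx - 1) * sizeof insns[0]);

  return n - 1;
}
```

With insns { set, a, b, copy, use } and an unallocated new pseudo, this leaves { a, b, set, use }, so the setter sits directly before the first use. If the new pseudo did get a hard register, the array is left untouched and the earlier scheduling decision is preserved.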

 This gets up to 6.5% on 456.hmmer on the mips target I was working on;
 an embedded benchmark suite also seems to have a (small) geomean
 improvement. On x86_64, I've tested spec2k, where specint is unchanged
 and specfp has a tiny performance regression. All these tests were done
 with a gcc-4.6 based tree.

 Thoughts? Currently the patch feels somewhat bolted on to the side of
 IRA, maybe there's a nicer way to achieve this?

 I think that is an excellent idea.  I used analogous approach for
 splitting pseudo in IRA on loop bounds even if it gets hard register
 inside and outside loops.  The copies are removed if the live ranges
 were not spilled in reload.

 I have no problem with this patch.  It is just a small change in IRA.

 Sounds like you're happier with the patch than I am, so who am I to argue.

 Here's an updated version against current trunk, with some cc0 bugfixes
 that I've since discovered to be necessary. Bootstrapped and tested (but
 not benchmarked again) on i686-linux. Ok?


 Bernd

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53411


This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53495


-- 
H.J.


Re: RFC: IRA patch to reduce lifetimes

2012-05-21 Thread H.J. Lu
On Wed, Apr 11, 2012 at 7:35 AM, Bernd Schmidt ber...@codesourcery.com wrote:
 On 12/23/2011 05:31 PM, Vladimir Makarov wrote:
 On 12/21/2011 09:09 AM, Bernd Schmidt wrote:
 This patch was an experiment to see if we can get the same improvement
 with modifications to IRA, making it more tolerant to over-aggressive
 scheduling. The idea is that if an instruction sets a register A, and
 all its inputs are live and unmodified for the lifetime of A, then
 moving the instruction downwards towards its first use is going to be
 beneficial from a register pressure point of view.

 That alone, however, turns out to be too aggressive, performance drops
 presumably because we undo too many scheduling decisions. So, the patch
 detects such situations, and splits the pseudo; a new pseudo is
 introduced in the original setting instruction, and a copy is added
 before the first use. If the new pseudo does not get a hard register, it
 is removed again and instead the setting instruction is moved to the
 point of the copy.

 This gets up to 6.5% on 456.hmmer on the mips target I was working on;
 an embedded benchmark suite also seems to have a (small) geomean
 improvement. On x86_64, I've tested spec2k, where specint is unchanged
 and specfp has a tiny performance regression. All these tests were done
 with a gcc-4.6 based tree.

 Thoughts? Currently the patch feels somewhat bolted on to the side of
 IRA, maybe there's a nicer way to achieve this?

 I think that is an excellent idea.  I used analogous approach for
 splitting pseudo in IRA on loop bounds even if it gets hard register
 inside and outside loops.  The copies are removed if the live ranges
 were not spilled in reload.

 I have no problem with this patch.  It is just a small change in IRA.

 Sounds like you're happier with the patch than I am, so who am I to argue.

 Here's an updated version against current trunk, with some cc0 bugfixes
 that I've since discovered to be necessary. Bootstrapped and tested (but
 not benchmarked again) on i686-linux. Ok?


 Bernd

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53411

-- 
H.J.


Re: RFC: IRA patch to reduce lifetimes

2012-04-11 Thread Bernd Schmidt
On 12/23/2011 05:31 PM, Vladimir Makarov wrote:
 On 12/21/2011 09:09 AM, Bernd Schmidt wrote:
 This patch was an experiment to see if we can get the same improvement
 with modifications to IRA, making it more tolerant to over-aggressive
 scheduling. The idea is that if an instruction sets a register A, and
 all its inputs are live and unmodified for the lifetime of A, then
 moving the instruction downwards towards its first use is going to be
 beneficial from a register pressure point of view.

 That alone, however, turns out to be too aggressive, performance drops
 presumably because we undo too many scheduling decisions. So, the patch
 detects such situations, and splits the pseudo; a new pseudo is
 introduced in the original setting instruction, and a copy is added
 before the first use. If the new pseudo does not get a hard register, it
 is removed again and instead the setting instruction is moved to the
 point of the copy.

 This gets up to 6.5% on 456.hmmer on the mips target I was working on;
 an embedded benchmark suite also seems to have a (small) geomean
 improvement. On x86_64, I've tested spec2k, where specint is unchanged
 and specfp has a tiny performance regression. All these tests were done
 with a gcc-4.6 based tree.

 Thoughts? Currently the patch feels somewhat bolted on to the side of
 IRA, maybe there's a nicer way to achieve this?

 I think that is an excellent idea.  I used analogous approach for
 splitting pseudo in IRA on loop bounds even if it gets hard register
 inside and outside loops.  The copies are removed if the live ranges
 were not spilled in reload.
 
 I have no problem with this patch.  It is just a small change in IRA.

Sounds like you're happier with the patch than I am, so who am I to argue.

Here's an updated version against current trunk, with some cc0 bugfixes
that I've since discovered to be necessary. Bootstrapped and tested (but
not benchmarked again) on i686-linux. Ok?


Bernd
* dbgcnt.def (ira_move): New counter.
* ira-int.h (ira_create_new_reg): Declare function.
(first_moveable_pseudo, last_moveable_pseudo): Declare variables.
* ira-emit.c (ira_create_new_reg): Renamed from create_new_reg and
no longer static.  All callers changed.
* ira.c: Include dbgcnt.h.
(rtx_moveable_p, insn_dominated_by_p, find_moveable_pseudos,
move_unallocated_pseudos): New static functions.
(first_moveable_pseudo, last_moveable_pseudo): New global variables.
(pseudo_replaced_reg, pseudo_move_insn): New static variables.
(ira): Call find_moveable_pseudos and move_unallocated_pseudos.
* ira-costs.c (find_costs_and_classes): Assign a memory cost of zero
to the pseudos generated in find_moveable_pseudos.
* Makefile.in (ira.o): Add $(DBGCNT_H).

Index: gcc/dbgcnt.def
===================================================================
--- gcc/dbgcnt.def  (revision 186270)
+++ gcc/dbgcnt.def  (working copy)
@@ -184,3 +184,4 @@ DEBUG_COUNTER (sms_sched_loop)
 DEBUG_COUNTER (store_motion)
 DEBUG_COUNTER (split_for_sched2)
 DEBUG_COUNTER (tail_call)
+DEBUG_COUNTER (ira_move)
Index: gcc/ira-int.h
===================================================================
--- gcc/ira-int.h   (revision 186270)
+++ gcc/ira-int.h   (working copy)
@@ -1416,3 +1416,6 @@ ira_allocate_and_set_or_copy_costs (int
reg_costs[i] = val;
 }
 }
+
+extern rtx ira_create_new_reg (rtx);
+extern int first_moveable_pseudo, last_moveable_pseudo;
Index: gcc/ira-emit.c
===================================================================
--- gcc/ira-emit.c  (revision 186270)
+++ gcc/ira-emit.c  (working copy)
@@ -330,8 +330,8 @@ add_to_edge_list (edge e, move_t move, b
 
 /* Create and return new pseudo-register with the same attributes as
ORIGINAL_REG.  */
-static rtx
-create_new_reg (rtx original_reg)
+rtx
+ira_create_new_reg (rtx original_reg)
 {
   rtx new_reg;
 
@@ -625,7 +625,7 @@ change_loop (ira_loop_tree_node_t node)
fprintf (ira_dump_file, "  %i vs parent %i:",
 ALLOCNO_HARD_REGNO (allocno),
 ALLOCNO_HARD_REGNO (parent_allocno));
- set_allocno_reg (allocno, create_new_reg (original_reg));
+ set_allocno_reg (allocno, ira_create_new_reg (original_reg));
}
}
 }
@@ -646,7 +646,7 @@ change_loop (ira_loop_tree_node_t node)
   if (! used_p)
continue;
   bitmap_set_bit (renamed_regno_bitmap, regno);
-  set_allocno_reg (allocno, create_new_reg (allocno_emit_reg (allocno)));
+  set_allocno_reg (allocno, ira_create_new_reg (allocno_emit_reg (allocno)));
 }
 }
 
@@ -852,7 +852,7 @@ modify_move_list (move_t list)
ALLOCNO_ASSIGNED_P (new_allocno) = true;
ALLOCNO_HARD_REGNO (new_allocno) = -1;
ALLOCNO_EMIT_DATA (new_allocno)->reg
- = create_new_reg (allocno_emit_reg (set_move->to));

Re: RFC: IRA patch to reduce lifetimes

2012-04-11 Thread Vladimir Makarov

On 04/11/2012 10:35 AM, Bernd Schmidt wrote:

On 12/23/2011 05:31 PM, Vladimir Makarov wrote:

On 12/21/2011 09:09 AM, Bernd Schmidt wrote:

This patch was an experiment to see if we can get the same improvement
with modifications to IRA, making it more tolerant to over-aggressive
scheduling. The idea is that if an instruction sets a register A, and
all its inputs are live and unmodified for the lifetime of A, then
moving the instruction downwards towards its first use is going to be
beneficial from a register pressure point of view.

That alone, however, turns out to be too aggressive, performance drops
presumably because we undo too many scheduling decisions. So, the patch
detects such situations, and splits the pseudo; a new pseudo is
introduced in the original setting instruction, and a copy is added
before the first use. If the new pseudo does not get a hard register, it
is removed again and instead the setting instruction is moved to the
point of the copy.

This gets up to 6.5% on 456.hmmer on the mips target I was working on;
an embedded benchmark suite also seems to have a (small) geomean
improvement. On x86_64, I've tested spec2k, where specint is unchanged
and specfp has a tiny performance regression. All these tests were done
with a gcc-4.6 based tree.

Thoughts? Currently the patch feels somewhat bolted on to the side of
IRA, maybe there's a nicer way to achieve this?


I think that is an excellent idea.  I used analogous approach for
splitting pseudo in IRA on loop bounds even if it gets hard register
inside and outside loops.  The copies are removed if the live ranges
were not spilled in reload.

I have no problem with this patch.  It is just a small change in IRA.

Sounds like you're happier with the patch than I am, so who am I to argue.

Here's an updated version against current trunk, with some cc0 bugfixes
that I've since discovered to be necessary. Bootstrapped and tested (but
not benchmarked again) on i686-linux. Ok?



It is ok.  At least it will be useful for gcc4.8.

But I am not sure about the longevity of this code.  Since my last email
a lot has changed in the LRA project (which I hope will be ready for
gcc4.9).  I've implemented live range splitting which works analogously:
some pseudo ranges are split, and if a split range does not change the
assignment, the pseudo live range split is undone.  The difference is that
your approach uses a global view (global RA) and mine is done locally.
So it needs more investigation of how different the results are.  It seems
to me that they will complement each other.  Probably I'll investigate
this when/if LRA is merged.


In any case, thanks, Bernd.  It is ok to commit this patch.



Re: RFC: IRA patch to reduce lifetimes

2011-12-23 Thread Vladimir Makarov

On 12/21/2011 09:09 AM, Bernd Schmidt wrote:

For a customer I've looked into improving code for 456.hmmer on a mips64
target. The benchmark responds to -fsched-pressure, which reduces
lifetimes of a few registers.

This patch was an experiment to see if we can get the same improvement
with modifications to IRA, making it more tolerant to over-aggressive
scheduling. The idea is that if an instruction sets a register A, and
all its inputs are live and unmodified for the lifetime of A, then
moving the instruction downwards towards its first use is going to be
beneficial from a register pressure point of view.

That alone, however, turns out to be too aggressive, performance drops
presumably because we undo too many scheduling decisions. So, the patch
detects such situations, and splits the pseudo; a new pseudo is
introduced in the original setting instruction, and a copy is added
before the first use. If the new pseudo does not get a hard register, it
is removed again and instead the setting instruction is moved to the
point of the copy.

This gets up to 6.5% on 456.hmmer on the mips target I was working on;
an embedded benchmark suite also seems to have a (small) geomean
improvement. On x86_64, I've tested spec2k, where specint is unchanged
and specfp has a tiny performance regression. All these tests were done
with a gcc-4.6 based tree.

Thoughts? Currently the patch feels somewhat bolted on to the side of
IRA, maybe there's a nicer way to achieve this?

I think that is an excellent idea.  I used analogous approach for 
splitting pseudo in IRA on loop bounds even if it gets hard register 
inside and outside loops.  The copies are removed if the live ranges 
were not spilled in reload.


I have no problem with this patch.  It is just a small change in IRA.