Re: patch for PR53125

2012-05-16 Thread Paolo Bonzini
 Now with patch
 
 On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher stevenb@gmail.com 
 wrote:
 2012-05-10  Vladimir Makarovvmaka...@redhat.com

  PR rtl-optimization/53125
  * ira.c (ira): Call find_moveable_pseudos and
  move_unallocated_pseudos if only ira_conflicts_p is true.
 And the attached patch fixes the reginfo slowness.

 The reginfo pass was doing:

  for each bb:
for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
  for each reg that died between BB_END(bb) and insn: // i.e.
 register is live at insn
REG_LIVE_LENGTH(reg)++

 With very large basic blocks, and the kind of code for the test case
 of the PR results in many live registers for SPARC (for x86_64 there
 are far fewer live registers). For SPARC, there are O(1e5) insns and
 O(1e4) registers live for most of the basic block. That is effectively
 almost quadratic behavior in the number of insns per basic block.

 But the above loop is a bit silly. It is much easier and
 computationally cheaper to just remember at what insn reg died (last
 used), and add to REG_LIVE_LENGTH the distance from the insn that sets
 reg to the insn that used reg.

Yes, this makes a lot of sense.  Patch ok with one typo fixed.

 It turns out that (before or after the patch) partial or conditional
 sets never kill a register, so that REG_LIVE_LENGTH for registers set
 by partial or conditional stores is not very accurate.

But this is correct, isn't it?  The live range for such registers is
indeed extending before and after the store.

 @@ -310,22 +322,26 @@ regstat_bb_compute_ri (unsigned int bb_i
  
 if (bitmap_set_bit (live, uregno))
   {
 -   /* This register is now live.  */
 +   /* This register is now live.  Begin to process it locally.
  
 -   /* If we have seen this regno, then it has already been
 -  processed correctly with the per insn increment.  If
 -  we have not seen it we set the bit so that begins to
 -  get processed locally.  Note that we don't even get
 -  here if the variable was live at the end of the block
 -  since just a ref inside the block does not effect the
 -  calculations.  */
 +  Note that we don't even get here if the variable was live
 +  at the end of the block since just a ref inside the block
 +  does not effect the calculations.  */
 REG_LIVE_LENGTH (uregno) ++;
 +   local_live_last_luid[uregno] = luid;
 bitmap_set_bit (local_live, uregno);
 bitmap_set_bit (local_processed, uregno);
   }
   }
  }
  
 +  /* Add the liveness length to all registers that were used somewhere
 + in this bock, but not between that use and the head of this block.  */

Typo is here (bock-block).

Paolo


Re: patch for PR53125

2012-05-15 Thread Steven Bosscher
On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher stevenb@gmail.com wrote:
 Now with patch

 On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher stevenb@gmail.com 
 wrote:
 2012-05-10  Vladimir Makarovvmaka...@redhat.com

          PR rtl-optimization/53125
          * ira.c (ira): Call find_moveable_pseudos and
          move_unallocated_pseudos if only ira_conflicts_p is true.

 And the attached patch fixes the reginfo slowness.

 The reginfo pass was doing:

  for each bb:
    for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
      for each reg that died between BB_END(bb) and insn: // i.e.
 register is live at insn
        REG_LIVE_LENGTH(reg)++

 With very large basic blocks, and the kind of code for the test case
 of the PR results in many live registers for SPARC (for x86_64 there
 are far fewer live registers). For SPARC, there are O(1e5) insns and
 O(1e4) registers live for most of the basic block. That is effectively
 almost quadratic behavior in the number of insns per basic block.

 But the above loop is a bit silly. It is much easier and
 computationally cheaper to just remember at what insn reg died (last
 used), and add to REG_LIVE_LENGTH the distance from the insn that sets
 reg to the insn that used reg.

 It turns out that (before or after the patch) partial or conditional
 sets never kill a register, so that REG_LIVE_LENGTH for registers set
 by partial or conditional stores is not very accurate. I have a few
 ideas for how to improve that situation a bit, but I'm not sure if
 it's worth the trouble.

 I also think that the computation of REG_FREQ and
 REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
 REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
 REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
 REG_FREQ_MAX, e.g.:

                  REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
 +                 REG_FREQ (dregno) =
 +                   MIN (REG_FREQ (dregno), REG_FREQ_MAX);

 With this patch, the reginfo pass disappears from the time report for
 the test case attached to PR53125, compiled at -O0.

 Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
 for a few simple test cases that the computed REG_LIVE_LENGHTs are
 unchanged. OK for trunk?

 Ciao!
 Steven

Ping?


Re: patch for PR53125

2012-05-15 Thread Vladimir Makarov

On 05/15/2012 07:59 AM, Steven Bosscher wrote:

On Fri, May 11, 2012 at 8:42 PM, Steven Bosscherstevenb@gmail.com  wrote:

Now with patch

On Fri, May 11, 2012 at 8:42 PM, Steven Bosscherstevenb@gmail.com  wrote:

2012-05-10  Vladimir Makarovvmaka...@redhat.com

  PR rtl-optimization/53125
  * ira.c (ira): Call find_moveable_pseudos and
  move_unallocated_pseudos if only ira_conflicts_p is true.

And the attached patch fixes the reginfo slowness.

The reginfo pass was doing:

  for each bb:
for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
  for each reg that died between BB_END(bb) and insn: // i.e.
register is live at insn
REG_LIVE_LENGTH(reg)++

With very large basic blocks, and the kind of code for the test case
of the PR results in many live registers for SPARC (for x86_64 there
are far fewer live registers). For SPARC, there are O(1e5) insns and
O(1e4) registers live for most of the basic block. That is effectively
almost quadratic behavior in the number of insns per basic block.

But the above loop is a bit silly. It is much easier and
computationally cheaper to just remember at what insn reg died (last
used), and add to REG_LIVE_LENGTH the distance from the insn that sets
reg to the insn that used reg.

It turns out that (before or after the patch) partial or conditional
sets never kill a register, so that REG_LIVE_LENGTH for registers set
by partial or conditional stores is not very accurate. I have a few
ideas for how to improve that situation a bit, but I'm not sure if
it's worth the trouble.

I also think that the computation of REG_FREQ and
REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
REG_FREQ_MAX, e.g.:

  REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
+ REG_FREQ (dregno) =
+   MIN (REG_FREQ (dregno), REG_FREQ_MAX);

With this patch, the reginfo pass disappears from the time report for
the test case attached to PR53125, compiled at -O0.

Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
for a few simple test cases that the computed REG_LIVE_LENGHTs are
unchanged. OK for trunk?
Sorry, Steven.  I believe that regstat.c is a part of 
DF-infrastructure.  At least, it was introduced by this work.  Therefore 
I did not respond to your email and probably can not have a right to 
approve the patch.


May be Kenny or Paolo (as the most active reviewers of DF-infrastructure 
these days) can help you with the approval.







Re: patch for PR53125

2012-05-11 Thread Vladimir Makarov

On 05/11/2012 08:17 AM, Richard Earnshaw wrote:

On 10/05/12 20:58, Vladimir Makarov wrote:

The following patch is for PR53125.  The PR is described on
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53125.

The patch improves the compilation speed by 35% for the case.

The patch was successfully bootstrapped on x86-64.

Committed as rev. 187373.

2012-05-10  Vladimir Makarovvmaka...@redhat.com

  PR rtl-optimization/53125
  * ira.c (ira): Call find_moveable_pseudos and
  move_unallocated_pseudos if only ira_conflicts_p is true.




ENOPATCH?


Sorry for ommitting the patch.  I added it to the attachment.


Index: ira.c
===
--- ira.c   (revision 187372)
+++ ira.c   (working copy)
@@ -4125,7 +4125,12 @@ ira (FILE *f)
 }

   allocated_reg_info_size = max_reg_num ();
-  find_moveable_pseudos ();
+
+  /* It is not worth to do such improvement when we use a simple
+ allocation because of -O0 usage or because the function is too
+ big.  */
+  if (ira_conflicts_p)
+find_moveable_pseudos ();

   max_regno_before_ira = max_reg_num ();
   ira_setup_eliminable_regset ();
@@ -4234,7 +4239,10 @@ ira (FILE *f)
  max_regno * sizeof (struct ira_spilled_reg_stack_slot));
 }
   allocate_initial_values (reg_equivs);
-  move_unallocated_pseudos ();
+
+  /* See comment for find_moveable_pseudos call.  */
+  if (ira_conflicts_p)
+move_unallocated_pseudos ();
 }

 static void


Re: patch for PR53125

2012-05-11 Thread Steven Bosscher
 2012-05-10  Vladimir Makarovvmaka...@redhat.com

          PR rtl-optimization/53125
          * ira.c (ira): Call find_moveable_pseudos and
          move_unallocated_pseudos if only ira_conflicts_p is true.

And the attached patch fixes the reginfo slowness.

The reginfo pass was doing:

  for each bb:
for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
  for each reg that died between BB_END(bb) and insn: // i.e.
register is live at insn
REG_LIVE_LENGTH(reg)++

With very large basic blocks, and the kind of code for the test case
of the PR results in many live registers for SPARC (for x86_64 there
are far fewer live registers). For SPARC, there are O(1e5) insns and
O(1e4) registers live for most of the basic block. That is effectively
almost quadratic behavior in the number of insns per basic block.

But the above loop is a bit silly. It is much easier and
computationally cheaper to just remember at what insn reg died (last
used), and add to REG_LIVE_LENGTH the distance from the insn that sets
reg to the insn that used reg.

It turns out that (before or after the patch) partial or conditional
sets never kill a register, so that REG_LIVE_LENGTH for registers set
by partial or conditional stores is not very accurate. I have a few
ideas for how to improve that situation a bit, but I'm not sure if
it's worth the trouble.

I also think that the computation of REG_FREQ and
REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
REG_FREQ_MAX, e.g.:

  REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
+ REG_FREQ (dregno) =
+   MIN (REG_FREQ (dregno), REG_FREQ_MAX);

With this patch, the reginfo pass disappears from the time report for
the test case attached to PR53125, compiled at -O0.

Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
for a few simple test cases that the computed REG_LIVE_LENGHTs are
unchanged. OK for trunk?

Ciao!
Steven


Re: patch for PR53125

2012-05-11 Thread Steven Bosscher
Now with patch

On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher stevenb@gmail.com wrote:
 2012-05-10  Vladimir Makarovvmaka...@redhat.com

          PR rtl-optimization/53125
          * ira.c (ira): Call find_moveable_pseudos and
          move_unallocated_pseudos if only ira_conflicts_p is true.

 And the attached patch fixes the reginfo slowness.

 The reginfo pass was doing:

  for each bb:
    for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
      for each reg that died between BB_END(bb) and insn: // i.e.
 register is live at insn
        REG_LIVE_LENGTH(reg)++

 With very large basic blocks, and the kind of code for the test case
 of the PR results in many live registers for SPARC (for x86_64 there
 are far fewer live registers). For SPARC, there are O(1e5) insns and
 O(1e4) registers live for most of the basic block. That is effectively
 almost quadratic behavior in the number of insns per basic block.

 But the above loop is a bit silly. It is much easier and
 computationally cheaper to just remember at what insn reg died (last
 used), and add to REG_LIVE_LENGTH the distance from the insn that sets
 reg to the insn that used reg.

 It turns out that (before or after the patch) partial or conditional
 sets never kill a register, so that REG_LIVE_LENGTH for registers set
 by partial or conditional stores is not very accurate. I have a few
 ideas for how to improve that situation a bit, but I'm not sure if
 it's worth the trouble.

 I also think that the computation of REG_FREQ and
 REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
 REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
 REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
 REG_FREQ_MAX, e.g.:

                  REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
 +                 REG_FREQ (dregno) =
 +                   MIN (REG_FREQ (dregno), REG_FREQ_MAX);

 With this patch, the reginfo pass disappears from the time report for
 the test case attached to PR53125, compiled at -O0.

 Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
 for a few simple test cases that the computed REG_LIVE_LENGHTs are
 unchanged. OK for trunk?

 Ciao!
 Steven


regstat_smarter_local_live_v2.diff
Description: Binary data


patch for PR53125

2012-05-10 Thread Vladimir Makarov

The following patch is for PR53125.  The PR is described on
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53125.

The patch improves the compilation speed by 35% for the case.

The patch was successfully bootstrapped on x86-64.

Committed as rev. 187373.

2012-05-10  Vladimir Makarov vmaka...@redhat.com

PR rtl-optimization/53125
* ira.c (ira): Call find_moveable_pseudos and
move_unallocated_pseudos if only ira_conflicts_p is true.