Re: [PATCH] correct COUNT and PROB for unrolled loop

2020-02-10 Thread Jiufu Guo
Jan Hubicka  writes:

>> On Mon, 2020-02-03 at 10:04 -0600, Pat Haugen wrote:
>> > On 2/3/20 2:17 AM, Jiufu Guo wrote:
>> > > +/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 937042044" 1 
>> > > "loop2_unroll"} } */
>> > 
>> > Sorry I didn't catch this addition to the original testcase
>> > earlier, but I wonder how stable this test is going to be. If
>> > there are future changes to default count/probability, or changes
>> > in their representation, this may fail and need to be updated. The
>> > fact that the loop is still getting aligned is the main concern.
>> Unless you're really interested in those probabilities, I'd suggest not
>> testing for them.  If you really need to test for them, then I'd
>> suggest testing for them being "close" rather than a specific value for
>> REG_BR_PROB.
>
> Note that REG_BR_PROB now encodes the actual probability as well as the
> profile quality (i.e. it is m_val * 8 + m_quality).
> We may want to invent better way to dump them, but it is better to match
> for CFG edge probability rather than the REG_BR_PROB_NOTE.
Thanks Honza, Pat, Jeff.
String like "count 661119332" in dump file may also not perfect.
I would removing this line before commit.

Welcome for any other comments!

Thanks,
Jiufu Guo.

>
> honza
>> 
>> jeff
>> 


Re: [PATCH] correct COUNT and PROB for unrolled loop

2020-02-03 Thread Jan Hubicka
> On Mon, 2020-02-03 at 10:04 -0600, Pat Haugen wrote:
> > On 2/3/20 2:17 AM, Jiufu Guo wrote:
> > > +/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 937042044" 1 
> > > "loop2_unroll"} } */
> > 
> > Sorry I didn't catch this addition to the original testcase earlier, but I 
> > wonder how stable this test is going to be. If there are future changes to 
> > default count/probability, or changes in their representation, this may 
> > fail and need to be updated. The fact that the loop is still getting 
> > aligned is the main concern.
> Unless you're really interested in those probabilities, I'd suggest not
> testing for them.  If you really need to test for them, then I'd
> suggest testing for them being "close" rather than a specific value for
> REG_BR_PROB.

Note that REG_BR_PROB now encodes the actual probability as well as the
profile quality (i.e. it is m_val * 8 + m_quality).
We may want to invent better way to dump them, but it is better to match
for CFG edge probability rather than the REG_BR_PROB_NOTE.

honza
> 
> jeff
> 


Re: [PATCH] correct COUNT and PROB for unrolled loop

2020-02-03 Thread Jeff Law
On Mon, 2020-02-03 at 10:04 -0600, Pat Haugen wrote:
> On 2/3/20 2:17 AM, Jiufu Guo wrote:
> > +/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 937042044" 1 
> > "loop2_unroll"} } */
> 
> Sorry I didn't catch this addition to the original testcase earlier, but I 
> wonder how stable this test is going to be. If there are future changes to 
> default count/probability, or changes in their representation, this may fail 
> and need to be updated. The fact that the loop is still getting aligned is 
> the main concern.
Unless you're really interested in those probabilities, I'd suggest not
testing for them.  If you really need to test for them, then I'd
suggest testing for them being "close" rather than a specific value for
REG_BR_PROB.

jeff



Re: [PATCH] correct COUNT and PROB for unrolled loop

2020-02-03 Thread Pat Haugen
On 2/3/20 2:17 AM, Jiufu Guo wrote:
> +/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 937042044" 1 
> "loop2_unroll"} } */

Sorry I didn't catch this addition to the original testcase earlier, but I 
wonder how stable this test is going to be. If there are future changes to 
default count/probability, or changes in their representation, this may fail 
and need to be updated. The fact that the loop is still getting aligned is the 
main concern.

-Pat



[PATCH] correct COUNT and PROB for unrolled loop

2020-02-03 Thread Jiufu Guo
Hi,
PR68212 mentioned that the COUNT of unrolled loop was not correct, and
comments of this PR also mentioned that loop become 'cold'.  The patches
of the PR fixed part of the issue.  With reference the patch
(https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02368.html) and comment
(https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02380.html), below patch
is drafted to fix other part of this issue.

The following patch fixes the wrong COUNT/PROB of unrolled loop.  And the
patch handles the case where unrolling in unreliable count number can
cause a loop to no longer look hot and therefor not get aligned.  This
patch corrects the PROB of loop exit edge, and corrects RPOB/COUNT of
latch block, and the loop count after last peeling.  This patch scale by
profile_probability::likely () if unrolled count gets unrealistically small.

Bootstrap/regtest on powerpc64le with no new regressions.
And spec2017 result is fine: a couple INT benchmarks that showed around
1.7% improvement, everything else was +/- <= 1%.

Ok for trunk?

Jiufu Guo

2020-02-03  Jiufu Guo   
Pat Haugen  

PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Correct COUNT/PROB
for unrolled/peeled blocks.

testsuite/ChangeLog:
2020-02-03  Jiufu Guo   
Pat Haugen  
PR rtl-optimization/68212
* gcc.dg/pr68212.c: New test.


---
 gcc/cfgloopmanip.c | 53 --
 gcc/testsuite/gcc.dg/pr68212.c | 13 +++
 2 files changed, 64 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr68212.c

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 727e951..ded0046 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify-me.h"
 #include "tree-ssa-loop-manip.h"
 #include "dumpfile.h"
+#include "cfgrtl.h"
 
 static void copy_loops_to (class loop **, int,
   class loop *);
@@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
  /* If original loop is executed COUNT_IN times, the unrolled
 loop will account SCALE_MAIN_DEN times.  */
  scale_main = count_in.probability_in (scale_main_den);
+
+ /* If we are guessing at the number of iterations and count_in
+becomes unrealistically small, reset probability.  */
+ if (!(count_in.reliable_p () || loop->any_estimate))
+   {
+ profile_count new_count_in = count_in.apply_probability 
(scale_main);
+ profile_count preheader_count = loop_preheader_edge (loop)->count 
();
+ if (new_count_in.apply_scale (1, 10) < preheader_count)
+   scale_main = profile_probability::likely ();
+   }
+
  scale_act = scale_main * prob_pass_main;
}
   else
{
+ profile_count new_loop_count;
  profile_count preheader_count = e->count ();
- for (i = 0; i < ndupl; i++)
-   scale_main = scale_main * scale_step[i];
  scale_act = preheader_count.probability_in (count_in);
+ /* Compute final preheader count after peeling NDUPL copies.  */
+ for (i = 0; i < ndupl; i++)
+   preheader_count = preheader_count.apply_probability (scale_step[i]);
+ /* Subtract out exit(s) from peeled copies.  */
+ new_loop_count = count_in - (e->count () - preheader_count);
+ scale_main = new_loop_count.probability_in (count_in);
}
 }
 
@@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
  scale_bbs_frequencies (new_bbs, n, scale_act);
  scale_act = scale_act * scale_step[j];
}
+
+  /* Need to update PROB of exit edge and corresponding COUNT.  */
+  if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
+ && bbs_to_scale)
+   {
+ edge new_exit = new_spec_edges[SE_ORIG];
+ profile_count new_count_in = new_exit->src->count;
+ profile_count preheader_count = loop_preheader_edge (loop)->count ();
+ edge e;
+ edge_iterator ei;
+
+ FOR_EACH_EDGE (e, ei, new_exit->src->succs)
+   if (e != new_exit)
+ break;
+
+ gcc_assert (e && e != new_exit);
+
+ new_exit->probability = preheader_count.probability_in (new_count_in);
+ e->probability = new_exit->probability.invert ();
+
+ profile_count new_latch_count
+   = new_exit->src->count.apply_probability (e->probability);
+ profile_count old_latch_count = e->dest->count;
+
+ EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
+   scale_bbs_frequencies_profile_count (new_bbs + i, 1,
+new_latch_count,
+old_latch_count);
+
+ if (current_ir_type () != IR_GIMPLE)
+   update_br_prob_note (e->src);
+