Re: [PATCH] PR68212, Correct frequencies/counts when unrolling

2016-10-13 Thread Jeff Law

On 09/20/2016 03:27 PM, Pat Haugen wrote:

The following patch corrects the frequency/count values computed when generating 
the switch blocks and peeled loop copies before the loop. There were two main 
problem areas. First, for the peeled copies, duplicate_loop_to_header_edge was 
not using the preheader frequency as part of the scale factor when peeling a 
copy of the loop to the preheader edge. Second, the switch block generation 
had no code at all to compute correct freq/count values. Verified by 
comparing freq/count values in the unroller dump before/after.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

-Pat



2016-09-20  Pat Haugen  

PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
frequency when computing scale factor for peeled copies.
* loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
values for switch/peel blocks/edges.



OK.  Thanks for your patience.

Jeff


PING Re: [PATCH] PR68212, Correct frequencies/counts when unrolling

2016-10-04 Thread Pat Haugen
Ping for the following patch 
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01363.html

-Pat



[PATCH] PR68212, Correct frequencies/counts when unrolling

2016-09-20 Thread Pat Haugen
The following patch corrects the frequency/count values computed when generating 
the switch blocks and peeled loop copies before the loop. There were two main 
problem areas. First, for the peeled copies, duplicate_loop_to_header_edge was 
not using the preheader frequency as part of the scale factor when peeling a 
copy of the loop to the preheader edge. Second, the switch block generation 
had no code at all to compute correct freq/count values. Verified by 
comparing freq/count values in the unroller dump before/after.
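
To make the first fix concrete, here is a minimal standalone sketch of the scale computation for a peeled copy. It is not the GCC code itself. PROB_BASE and rdiv are local stand-ins assumed to behave like REG_BR_PROB_BASE and the rounding division behind GCOV_COMPUTE_SCALE, and peel_scale is a hypothetical helper name. freq_in is treated as the frequency the loop body is scaled against, which is also an assumption.

```c
#include <assert.h>

/* Local stand-in for GCC's REG_BR_PROB_BASE (assumed to be 10000).  */
#define PROB_BASE 10000

/* Rounded integer division; assumed to match the rounding used by
   GCOV_COMPUTE_SCALE.  */
static long
rdiv (long x, long y)
{
  return (x + y / 2) / y;
}

/* Hypothetical helper: scale applied to a copy peeled onto the
   preheader edge.  The preheader frequency is clamped to freq_in so
   the result never exceeds PROB_BASE, then expressed as a fraction of
   freq_in in PROB_BASE units.  A zero freq_in yields PROB_BASE.  */
static int
peel_scale (int preheader_freq, int freq_in)
{
  assert (preheader_freq >= 0 && freq_in >= 0);
  if (preheader_freq > freq_in)
    preheader_freq = freq_in;
  if (freq_in == 0)
    return PROB_BASE;
  return (int) rdiv ((long) preheader_freq * PROB_BASE, freq_in);
}
```

Under these assumptions, a preheader frequency of 100 against a freq_in of 1000 gives a scale of 1000, i.e. one tenth of PROB_BASE, so the peeled copy is scaled to the frequency at which it is actually entered.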

Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

-Pat



2016-09-20  Pat Haugen  

PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
frequency when computing scale factor for peeled copies.
* loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
values for switch/peel blocks/edges.
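
The switch block part of the change can be pictured with a small standalone model. This is only a sketch of the arithmetic in the loop-unroll.c hunks, not code from the patch: model_switch_freqs and its array argument are hypothetical, and only frequencies are modelled (counts follow the same pattern).

```c
#include <assert.h>

/* Hypothetical model of the switch block frequencies.  The frequency
   entering the preconditioning code is split into max_unroll + 1
   equal shares.  Switch blocks are built from innermost outward, and
   each one built later carries one more share than the previous one.
   switch_freq must have room for n_peel + 1 entries; entry 0 is the
   innermost switch block.  Returns the per-share increment.  */
static int
model_switch_freqs (int entry_freq, unsigned max_unroll, unsigned n_peel,
		    int *switch_freq)
{
  int iter_freq, new_freq;
  unsigned i;

  assert (switch_freq != 0);
  iter_freq = new_freq = entry_freq / (int) (max_unroll + 1);
  switch_freq[0] = new_freq;
  for (i = 0; i < n_peel; i++)
    {
      /* One share leaves this switch block for the peeled copy's
	 preheader; the rest falls through to the inner switch.  */
      new_freq += iter_freq;
      switch_freq[i + 1] = new_freq;
    }
  return iter_freq;
}
```

In this model the switch block built in iteration i has frequency (i + 2) * iter_freq and sends iter_freq of it to the preheader, which agrees with the branch probability p = REG_BR_PROB_BASE / (i + 2) already used in that loop.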


Index: gcc/cfgloopmanip.c
===
--- gcc/cfgloopmanip.c	(revision 240167)
+++ gcc/cfgloopmanip.c	(working copy)
@@ -1276,10 +1276,13 @@ duplicate_loop_to_header_edge (struct lo
 	}
   else
 	{
+	  int preheader_freq = EDGE_FREQUENCY (e);
 	  scale_main = REG_BR_PROB_BASE;
 	  for (i = 0; i < ndupl; i++)
 	scale_main = combine_probabilities (scale_main, scale_step[i]);
-	  scale_act = REG_BR_PROB_BASE - prob_pass_thru;
+	  if (preheader_freq > freq_in)
+	preheader_freq = freq_in;
+	  scale_act = GCOV_COMPUTE_SCALE (preheader_freq, freq_in);
 	}
   for (i = 0; i < ndupl; i++)
 	gcc_assert (scale_step[i] >= 0 && scale_step[i] <= REG_BR_PROB_BASE);
Index: gcc/loop-unroll.c
===
--- gcc/loop-unroll.c	(revision 240167)
+++ gcc/loop-unroll.c	(working copy)
@@ -858,7 +858,8 @@ unroll_loop_runtime_iterations (struct l
   rtx_insn *init_code, *branch_code;
   unsigned i, j, p;
   basic_block preheader, *body, swtch, ezc_swtch;
-  int may_exit_copy;
+  int may_exit_copy, iter_freq, new_freq;
+  gcov_type iter_count, new_count;
   unsigned n_peel;
   edge e;
   bool extra_zero_check, last_may_exit;
@@ -952,6 +953,15 @@ unroll_loop_runtime_iterations (struct l
   /* Record the place where switch will be built for preconditioning.  */
   swtch = split_edge (loop_preheader_edge (loop));
 
+  /* Compute frequency/count increments for each switch block and initialize
+ innermost switch block.  Switch blocks and peeled loop copies are built
+ from innermost outward.  */
+  iter_freq = new_freq = swtch->frequency / (max_unroll + 1);
+  iter_count = new_count = swtch->count / (max_unroll + 1);
+  swtch->frequency = new_freq;
+  swtch->count = new_count;
+  single_succ_edge (swtch)->count = new_count;
+
   for (i = 0; i < n_peel; i++)
 {
   /* Peel the copy.  */
@@ -969,6 +979,10 @@ unroll_loop_runtime_iterations (struct l
   p = REG_BR_PROB_BASE / (i + 2);
 
   preheader = split_edge (loop_preheader_edge (loop));
+  /* Add in frequency/count of edge from switch block.  */
+  preheader->frequency += iter_freq;
+  preheader->count += iter_count;
+  single_succ_edge (preheader)->count = preheader->count;
   branch_code = compare_and_jump_seq (copy_rtx (niter), GEN_INT (j), EQ,
 	  block_label (preheader), p,
 	  NULL);
@@ -980,9 +994,14 @@ unroll_loop_runtime_iterations (struct l
   swtch = split_edge_and_insert (single_pred_edge (swtch), branch_code);
   set_immediate_dominator (CDI_DOMINATORS, preheader, swtch);
   single_succ_edge (swtch)->probability = REG_BR_PROB_BASE - p;
+  single_succ_edge (swtch)->count = new_count;
+  new_freq += iter_freq;
+  new_count += iter_count;
+  swtch->frequency = new_freq;
+  swtch->count = new_count;
   e = make_edge (swtch, preheader,
 		 single_succ_edge (swtch)->flags & EDGE_IRREDUCIBLE_LOOP);
-  e->count = RDIV (preheader->count * REG_BR_PROB_BASE, p);
+  e->count = iter_count;
   e->probability = p;
 }
 
@@ -992,6 +1011,14 @@ unroll_loop_runtime_iterations (struct l
   p = REG_BR_PROB_BASE / (max_unroll + 1);
   swtch = ezc_swtch;
   preheader = split_edge (loop_preheader_edge (loop));
+  /* Recompute frequency/count adjustments since initial peel copy may
+	 have exited and reduced those values that were computed above.  */
+  iter_freq = swtch->frequency / (max_unroll + 1);
+  iter_count = swtch->count / (max_unroll + 1);
+  /* Add in frequency/count of edge from switch block.  */
+  preheader->frequency += iter_freq;
+  preheader->count += iter_count;
+  single_succ_edge (preheader)->count = preheader->count;
   branch_code = compare_and_jump_seq (copy_rtx (niter), const0_rtx, EQ,
 	  block_label (preheader), p,
 	  NULL);
@@ -1000,9 +1027,10 @@ unroll_loop_runtime_iterations (struct l
   swtch = split_edge_and_insert (single_succ_edge