Re: Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-07-04 Thread guojiufu via Gcc-patches

Hi Honza and All,

After more checks, I'm thinking these patches may still be useful.
For patch 1:

https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html

This patch recalculates the loop's BB-count and could correct
some BB-count mismatch for loops which has a single exit.
From the test result, we could say it reduce mismatched BB-counts
slightly.

For patch 2:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
I updated as below:
It reset the loop's probability when the loop count becomes 
unrealistically
small.  In theory, it seems this would be the right direction to do 
this.


Bootstrap/regtest on powerpc64le with no new regressions. I'm thinking 
if

this is acceptable for trunk?

BR,
Jiufu Guo

Subject: Reset edge probability and BB-count for peeled/unrolled loop

This patch fix handles the case where unrolling in an unreliable count
number can cause a loop to no longer look hot and therefore not get 
aligned.
This patch scale by profile_probability::likely () if unrolled count 
gets
unrealistically small.  And this patch fixes the COUNT/PROB of peeled 
loop.


gcc/ChangeLog:
2021-07-01  Jiufu Guo   
Pat Haugen  

PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Reset probablity
of unrolled/peeled loop.

testsuite/ChangeLog:
2021-07-01  Jiufu Guo   
Pat Haugen  
PR rtl-optimization/68212
* gcc.dg/pr68212.c: New test.


---
 gcc/cfgloopmanip.c | 20 ++--
 gcc/testsuite/gcc.dg/pr68212.c | 13 +
 2 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr68212.c

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 4a9ab74642c..29d858c878a 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1258,14 +1258,30 @@ duplicate_loop_to_header_edge (class loop *loop, 
edge e,

  /* If original loop is executed COUNT_IN times, the unrolled
 loop will account SCALE_MAIN_DEN times.  */
  scale_main = count_in.probability_in (scale_main_den);
+
+ /* If we are guessing at the number of iterations and count_in
+becomes unrealistically small, reset probability.  */
+ if (!(count_in.reliable_p () || loop->any_estimate))
+   {
+	  profile_count new_count_in = count_in.apply_probability 
(scale_main);
+	  profile_count preheader_count = loop_preheader_edge 
(loop)->count ();

+ if (new_count_in.apply_scale (1, 10) < preheader_count)
+   scale_main = profile_probability::likely ();
+   }
+
  scale_act = scale_main * prob_pass_main;
}
   else
{
+ profile_count new_loop_count;
  profile_count preheader_count = e->count ();
- for (i = 0; i < ndupl; i++)
-   scale_main = scale_main * scale_step[i];
  scale_act = preheader_count.probability_in (count_in);
+ /* Compute final preheader count after peeling NDUPL copies.  */
+ for (i = 0; i < ndupl; i++)
+	preheader_count = preheader_count.apply_probability 
(scale_step[i]);

+ /* Subtract out exit(s) from peeled copies.  */
+ new_loop_count = count_in - (e->count () - preheader_count);
+ scale_main = new_loop_count.probability_in (count_in);
}
 }

diff --git a/gcc/testsuite/gcc.dg/pr68212.c 
b/gcc/testsuite/gcc.dg/pr68212.c

new file mode 100644
index 000..e0cf71d5202
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68212.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param 
max-unroll-times=4 -fdump-rtl-alignments" } */

+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 
"alignments"} } */

+
--
2.17.1



On 2021-06-18 16:24, guojiufu via Gcc-patches wrote:

On 2021-06-15 12:57, guojiufu via Gcc-patches wrote:

On 2021-06-14 17:16, Jan Hubicka wrote:



On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> Gentle ping.
>
> Original message:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
I think you need a more aggressive ping  :-)

OK for the trunk.  Sorry for the long delay.  I kept hoping someone 
else

would step in and look at it.

Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both old and new code needs bit more work.  First
the exit loop frequency is set to

 prob = profile_probability::always ().apply_scale (1, new_est_niter 
+ 1);


which is only correct if the estimated number of iterations is 
accurate.
If we do not have profile feedback and trip count is not known 
precisely
in most cases it won't be.  We estimate loops to iterate about 3 
times
and then niter_for_unrolled_loop will apply the capping to 5 
iterations

that is completely arbitrary.

Forcing exit probability to preci

Re: Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-06-18 Thread guojiufu via Gcc-patches

On 2021-06-15 12:57, guojiufu via Gcc-patches wrote:

On 2021-06-14 17:16, Jan Hubicka wrote:



On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> Gentle ping.
>
> Original message:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
I think you need a more aggressive ping  :-)

OK for the trunk.  Sorry for the long delay.  I kept hoping someone 
else

would step in and look at it.

Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both old and new code needs bit more work.  First
the exit loop frequency is set to

 prob = profile_probability::always ().apply_scale (1, new_est_niter + 
1);


which is only correct if the estimated number of iterations is 
accurate.
If we do not have profile feedback and trip count is not known 
precisely

in most cases it won't be.  We estimate loops to iterate about 3 times
and then niter_for_unrolled_loop will apply the capping to 5 
iterations

that is completely arbitrary.

Forcing exit probability to precise may then disable futher loop
optimizations since after the change we will think we know the loop
iterates 5 times and thus it is not worthy for loop opt (which is 
quite
oposite with the fact that we are just unrolling it thinking it is 
hot).


Thanks, understand your concern, both new and old code are assuming the
the number of iterations is accurate.
Maybe we could add code to reset exit probability for the case
where "!count_in.reliable_p ()".



Old code does
 1) scale body down so only one iteration is done
 2) set exit edge probability to be 1/(new_est_iter+1)
precisely
 3) scale up accoring to the 1/new_nonexit_prob
which would be correct if the nonexit probability was updated to
1-exit_probability but that does not seem to happen.

New code does

Yes, this is intended: we know that the enter-count should be
equal to the exit-count of one loop, and then the
"loop-body-count * exit-probability = exit-count".
Also, the entry count of the loop would not be changed before and after
one optimization (or slightly change,e.g. peeling count).

Based on this, we could adjust the loop body count according to
exit-count (or say enter-count) and exit-probability, when the
exit-probability is easy to estimate.


 1) give up when there are multiple exits.
I wonder how common this is - we do outer loop vectorizaiton


Hi Honza, and guys:

I just had a statistic for bootstrap/test and spec2017 build and find
there are ~1700 times of single loops are hit this code; in spec2017 
build,

it hits 226 single-exit loops, and multi-exit loops are not hit.

Had a test with profile-report to see "mismatch count', with these 
patches
we may say the "mismatch count' is mitigated slightly, but not very 
aggressive:

150 mismatch counts are reduced.
But 119 mismatch counts are increased.

Any comments about this patch? Is it acceptable for the trunk? Thanks.


BR,
Jiufu Guo.




The computation in the new code is based on a single exit. This is
also a requirement of old code, and it would be true when run to here.


 2) adjust loop body count according to the exit
 3) updat profile of BB after the exit edge.





Why do you need:
+  if (current_ir_type () != IR_GIMPLE)
+update_br_prob_note (exit->src);

It is tree_transform_and_unroll_loop, so I think we should always have
IR_GIMPLE?


These two lines are added to "recompute_loop_frequencies" which can be 
used

in rtl, like the second patch of this:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
Oh, maybe these two lines code would be put to 
tree_transform_and_unroll_loop

instead of common code recompute_loop_frequencies.

Thanks a lot for the review in your busy time!

BR.
Jiufu Guo


Honza


jeff


Re: Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-06-16 Thread guojiufu via Gcc-patches

On 2021-06-15 12:57, guojiufu via Gcc-patches wrote:

On 2021-06-14 17:16, Jan Hubicka wrote:



On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> Gentle ping.
>
> Original message:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
I think you need a more aggressive ping  :-)

OK for the trunk.  Sorry for the long delay.  I kept hoping someone 
else

would step in and look at it.

Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both old and new code needs bit more work.  First
the exit loop frequency is set to

 prob = profile_probability::always ().apply_scale (1, new_est_niter + 
1);


which is only correct if the estimated number of iterations is 
accurate.
If we do not have profile feedback and trip count is not known 
precisely

in most cases it won't be.  We estimate loops to iterate about 3 times
and then niter_for_unrolled_loop will apply the capping to 5 
iterations

that is completely arbitrary.

Forcing exit probability to precise may then disable futher loop
optimizations since after the change we will think we know the loop
iterates 5 times and thus it is not worthy for loop opt (which is 
quite
oposite with the fact that we are just unrolling it thinking it is 
hot).


Thanks, understand your concern, both new and old code are assuming the
the number of iterations is accurate.
Maybe we could add code to reset exit probability for the case
where "!count_in.reliable_p ()".



Old code does
 1) scale body down so only one iteration is done
 2) set exit edge probability to be 1/(new_est_iter+1)
precisely
 3) scale up accoring to the 1/new_nonexit_prob
which would be correct if the nonexit probability was updated to
1-exit_probability but that does not seem to happen.

New code does

Yes, this is intended: we know that the enter-count should be
equal to the exit-count of one loop, and then the
"loop-body-count * exit-probability = exit-count".
Also, the entry count of the loop would not be changed before and after
one optimization (or slightly change,e.g. peeling count).

Based on this, we could adjust the loop body count according to
exit-count (or say enter-count) and exit-probability, when the
exit-probability is easy to estimate.


 1) give up when there are multiple exits.
I wonder how common this is - we do outer loop vectorizaiton


The computation in the new code is based on a single exit. This is
also a requirement of old code, and it would be true when run to here.


To support multiple exits, I'm thinking about the way to calculate the
count/probability for each basic_block and each exit edge.  While it 
seems
the count/prob may not scale up on the same ratio.  This is another 
reason

I give up these cases with multi-exits.

Any suggestions about supporting these cases?


BR,
Jiufu Guo




 2) adjust loop body count according to the exit
 3) updat profile of BB after the exit edge.





Why do you need:
+  if (current_ir_type () != IR_GIMPLE)
+update_br_prob_note (exit->src);

It is tree_transform_and_unroll_loop, so I think we should always have
IR_GIMPLE?


These two lines are added to "recompute_loop_frequencies" which can be 
used

in rtl, like the second patch of this:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
Oh, maybe these two lines code would be put to 
tree_transform_and_unroll_loop

instead of common code recompute_loop_frequencies.

Thanks a lot for the review in your busy time!

BR.
Jiufu Guo


Honza


jeff


Re: Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-06-14 Thread guojiufu via Gcc-patches

On 2021-06-14 17:16, Jan Hubicka wrote:



On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> Gentle ping.
>
> Original message:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
I think you need a more aggressive ping  :-)

OK for the trunk.  Sorry for the long delay.  I kept hoping someone 
else

would step in and look at it.

Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both old and new code needs bit more work.  First
the exit loop frequency is set to

 prob = profile_probability::always ().apply_scale (1, new_est_niter + 
1);


which is only correct if the estimated number of iterations is 
accurate.
If we do not have profile feedback and trip count is not known 
precisely

in most cases it won't be.  We estimate loops to iterate about 3 times
and then niter_for_unrolled_loop will apply the capping to 5 iterations
that is completely arbitrary.

Forcing exit probability to precise may then disable futher loop
optimizations since after the change we will think we know the loop
iterates 5 times and thus it is not worthy for loop opt (which is quite
oposite with the fact that we are just unrolling it thinking it is 
hot).


Thanks, understand your concern, both new and old code are assuming the
the number of iterations is accurate.
Maybe we could add code to reset exit probability for the case
where "!count_in.reliable_p ()".



Old code does
 1) scale body down so only one iteration is done
 2) set exit edge probability to be 1/(new_est_iter+1)
precisely
 3) scale up accoring to the 1/new_nonexit_prob
which would be correct if the nonexit probability was updated to
1-exit_probability but that does not seem to happen.

New code does

Yes, this is intended: we know that the enter-count should be
equal to the exit-count of one loop, and then the
"loop-body-count * exit-probability = exit-count".
Also, the entry count of the loop would not be changed before and after
one optimization (or slightly change,e.g. peeling count).

Based on this, we could adjust the loop body count according to
exit-count (or say enter-count) and exit-probability, when the
exit-probability is easy to estimate.


 1) give up when there are multiple exits.
I wonder how common this is - we do outer loop vectorizaiton


The computation in the new code is based on a single exit. This is
also a requirement of old code, and it would be true when run to here.


 2) adjust loop body count according to the exit
 3) updat profile of BB after the exit edge.





Why do you need:
+  if (current_ir_type () != IR_GIMPLE)
+update_br_prob_note (exit->src);

It is tree_transform_and_unroll_loop, so I think we should always have
IR_GIMPLE?


These two lines are added to "recompute_loop_frequencies" which can be 
used

in rtl, like the second patch of this:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
Oh, maybe these two lines code would be put to 
tree_transform_and_unroll_loop

instead of common code recompute_loop_frequencies.

Thanks a lot for the review in your busy time!

BR.
Jiufu Guo


Honza


jeff


Re: Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-06-14 Thread Jan Hubicka
> 
> 
> On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> > Gentle ping.
> > 
> > Original message:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
> I think you need a more aggressive ping  :-)
> 
> OK for the trunk.  Sorry for the long delay.  I kept hoping someone else
> would step in and look at it.
Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both old and new code needs bit more work.  First
the exit loop frequency is set to

 prob = profile_probability::always ().apply_scale (1, new_est_niter + 1);

which is only correct if the estimated number of iterations is accurate.
If we do not have profile feedback and trip count is not known precisely
in most cases it won't be.  We estimate loops to iterate about 3 times
and then niter_for_unrolled_loop will apply the capping to 5 iterations
that is completely arbitrary.

Forcing exit probability to precise may then disable futher loop
optimizations since after the change we will think we know the loop
iterates 5 times and thus it is not worthy for loop opt (which is quite
oposite with the fact that we are just unrolling it thinking it is hot).

Old code does
 1) scale body down so only one iteration is done
 2) set exit edge probability to be 1/(new_est_iter+1)
precisely
 3) scale up accoring to the 1/new_nonexit_prob
which would be correct if the nonexit probability was updated to
1-exit_probability but that does not seem to happen.

New code does
 1) give up when there are multiple exits.
I wonder how common this is - we do outer loop vectorizaiton
 2) adjust loop body count according to the exit
 3) updat profile of BB after the exit edge.

Why do you need:
+  if (current_ir_type () != IR_GIMPLE)
+update_br_prob_note (exit->src);

It is tree_transform_and_unroll_loop, so I think we should always have 
IR_GIMPLE?

Honza
> 
> jeff


Re: Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-06-13 Thread Jeff Law via Gcc-patches




On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:

Gentle ping.

Original message: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html

I think you need a more aggressive ping  :-)

OK for the trunk.  Sorry for the long delay.  I kept hoping someone else 
would step in and look at it.


jeff


Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-05-06 Thread guojiufu via Gcc-patches

Gentle ping.

Original message: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html



Thanks,
Jiufu Guo.