[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-11-04 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Martin Liška  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Martin Liška  ---
Fix on trunk.

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-11-04 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

--- Comment #9 from Martin Liška  ---
Author: marxin
Date: Mon Nov  5 07:32:48 2018
New Revision: 265784

URL: https://gcc.gnu.org/viewcvs?rev=265784=gcc=rev
Log:
Fix setting of hotness in non-LTO mode (PR gcov-profile/77698).

2018-11-05  Martin Liska  

PR gcov-profile/77698
* ipa-profile.c (ipa_profile): Adjust hotness threshold
only in LTO mode.
2018-11-05  Martin Liska  

PR gcov-profile/77698
* gcc.dg/tree-prof/pr77698.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/tree-prof/pr77698.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-profile.c
trunk/gcc/testsuite/ChangeLog

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-10 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

--- Comment #8 from Martin Liška  ---
(In reply to Pat Haugen from comment #7)
> I also see the loop now being aligned when I apply your patch.
> 
> srdi 10,10,2
> mtctr 10
> .p2align 4,,15
> .L6:
> ld 9,0(11)
> ld 8,0(4)

Great, patch has been tested and is sitting here:
https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00501.html

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-10 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Martin Liška  changed:

   What|Removed |Added

   Target Milestone|--- |9.0

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-09 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

--- Comment #7 from Pat Haugen  ---
I also see the loop now being aligned when I apply your patch.

srdi 10,10,2
mtctr 10
.p2align 4,,15
.L6:
ld 9,0(11)
ld 8,0(4)

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Martin Liška  changed:

   What|Removed |Added

 Status|REOPENED|ASSIGNED

--- Comment #6 from Martin Liška  ---
Confirmed, I've got patch candidate for it:

diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index c74f4a4a41d..7065af59ba9 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -533,11 +533,10 @@ ipa_profile (void)
   cumulated_size * 100.0 / overall_size);
}

-  if (threshold > get_hot_bb_threshold ()
- || in_lto_p)
+  if (in_lto_p)
{
  if (dump_file)
-   fprintf (dump_file, "Threshold updated.\n");
+   fprintf (dump_file, "Setting hotness threshold in LTO mode.\n");
   set_hot_bb_threshold (threshold);
}
 }

With the patch applied, I see following .S diff:

iff -u before.s after.s
--- before.s2018-10-09 13:40:03.464360367 +0200
+++ after.s 2018-10-09 13:38:46.246736676 +0200
@@ -56,6 +56,8 @@
movq%rax, (%rdi,%rcx,8)
cmpq%r9, %rdx
je  .L23
+   .p2align 4,,10
+   .p2align 3
 .L6:
movqj(%rip), %r10
movq(%rsi), %r8

Which is hopefully what you expect to happen?

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-05 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Pat Haugen  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #5 from Pat Haugen  ---
It's still not fixed in current trunk. After unrolling maybe_hot_bb_p() returns
false (via maybe_hot_count_p()), which prevents aligning the label in
final.c:compute_alignments(). Here's the tail section of debug session and
partial backtrace to show.

maybe_hot_count_p (fun=0x759f, count=...) at
/home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:185
185   return (count.to_gcov_type () >= get_hot_bb_threshold ());
(gdb) p count.to_gcov_type ()
$3 = 25
(gdb) p get_hot_bb_threshold ()
$4 = 100
(gdb) bt
#0  maybe_hot_count_p (fun=0x759f, count=...) at
/home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:185
#1  0x10d8fdb0 in maybe_hot_bb_p (fun=0x759f,
bb=0x759801a0)
at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:195
#2  0x10d9045c in optimize_bb_for_size_p (bb=0x759801a0)
at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:301
#3  0x108c7234 in compute_alignments () at
/home/pthaugen/src/gcc/trunk_work/gcc/gcc/final.c:674
#4  0x108c7d3c in (anonymous
namespace)::pass_compute_alignments::execute (this=0x12886200)
at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/final.c:823

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-03 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Martin Liška  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Martin Liška  ---
Should be fixed since r264462, not planning to backport that.

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2017-04-13 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

--- Comment #3 from Martin Liška  ---
(In reply to Martin Liška from comment #2)
> One another observation, working sets are wrong as they do not respect # of
> runs:
> 
> 1 run:
> threshold = 1, sum_all=190579
> 
> 10 tuns:
> threshold = 10, sum_all=1905790

Ahh, no it's correct as the counters are also growing with multiple runs.

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2017-04-13 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

--- Comment #2 from Martin Liška  ---
One another observation, working sets are wrong as they do not respect # of
runs:

1 run:
threshold = 1, sum_all=190579

10 tuns:
threshold = 10, sum_all=1905790

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2017-04-12 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-04-12
 CC||marxin at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
Confirmed. Actually I talked with Honza last week about usage of working sets
and problems it has. Your sample nicely illustrates one of them:

As you have a really dominant edge and the rest of program has sum really
really small, then hotness is equal to execution count of the maximal edge.
Which is obviously very wrong even in case where loop unrolling is asking for a
split edge with fraction of the frequency.

Just for your information, before the current state (r193747), we used to
compute the hotness threshold as follows:

profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)

where the param used to have value 1. That results in value 100. By the
way, I believe the value should be also divided by profile_info->runs (number
of runs) as sum_max is increasing with # of a binary is executed.

Second issue I see is quite huge performance overhead during instrumentation
run. For programs that are executed repeatedly, one can see in perf top:

11.23%  git [.] gcov_do_dump
 9.14%  git [.] __gcov_write_summary
 5.60%  libc-2.25.so[.] __memset_sse2_unaligned_erms
 3.82%  git [.] __gcov_read_summary

and of course it occupies space both in profile and an instrumented binary.

That said I'm planning to test the original mechanism and compare it to the
current one.
And we can add an option to switch in between these 2 methods.