[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters

2023-07-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |---

2022-06-28 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #11 from Jakub Jelinek  ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.

2021-04-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.3                        |10.4

--- Comment #10 from Richard Biener  ---
GCC 10.3 is being released, retargeting bugs to GCC 10.4.

2020-07-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.2                        |10.3

--- Comment #9 from Richard Biener  ---
GCC 10.2 is released, adjusting target milestone.

2020-05-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.0                        |10.2

--- Comment #8 from Jakub Jelinek  ---
GCC 10.1 has been released.

2019-02-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #7 from Richard Biener  ---
Btw, use of TLS has two costs:

 * size overhead of the counters (one could use char-sized TLS counters and
   update the global ones with locking on overflow)
 * tear-down/build-up cost at thread termination/creation

The advantage is of course that it's simple implementation-wise.
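
The char-sized TLS counter idea can be sketched roughly as follows. This is
only an illustration, not code from any patch: all names (N_COUNTERS,
edge_hit, flush_thread_counters, read_counter) are hypothetical, and a relaxed
atomic add stands in for the "locking on overflow" step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define N_COUNTERS 4  /* hypothetical number of edge counters */

/* Shared 64-bit counters; touched only when a thread-local byte fills up. */
static _Atomic int64_t global_counters[N_COUNTERS];

/* Per-thread 8-bit deltas: one byte per counter instead of eight. */
static _Thread_local uint8_t local_deltas[N_COUNTERS];

/* Count one execution of edge IDX on the calling thread. */
void edge_hit(unsigned idx)
{
    assert(idx < N_COUNTERS);
    if (++local_deltas[idx] == UINT8_MAX) {
        /* Byte is full: publish it (assumption: atomic add instead of a lock). */
        atomic_fetch_add_explicit(&global_counters[idx], UINT8_MAX,
                                  memory_order_relaxed);
        local_deltas[idx] = 0;
    }
}

/* Publish what is left in the calling thread's deltas.  It would have to
   run at thread termination, which is the tear-down cost listed above. */
void flush_thread_counters(void)
{
    for (unsigned i = 0; i < N_COUNTERS; i++) {
        atomic_fetch_add_explicit(&global_counters[i], local_deltas[i],
                                  memory_order_relaxed);
        local_deltas[i] = 0;
    }
}

/* Read the published value of counter IDX. */
int64_t read_counter(unsigned idx)
{
    assert(idx < N_COUNTERS);
    return atomic_load_explicit(&global_counters[idx], memory_order_relaxed);
}
```

With this layout a thread writes to the shared cache line once per 255 hits of
an edge rather than on every hit; counts still held in a thread's deltas are
invisible until that thread flushes.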

2019-02-14 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #6 from Jan Hubicka  ---
> Ah now, it's really doing sampling. I guess it can lead to quite some profile
> inconsistencies..
Yep, it is not the coolest solution. I would not worry too much about
precision loss unless you get some weird interference between the
sampling counter and actual program behaviour.  Adding conditionals
everywhere is not very good, and I am not sure how well the CPU will
predict such branches.
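
One possible form of such interference, as a toy simulation rather than
anything measured: if two edges strictly alternate and share a single sample
counter, an even sampling period always fires on the same edge. The function
and type names below are made up for the illustration.

```c
#include <assert.h>

/* Sampled counts for two edges, A and B, of a hypothetical loop body. */
typedef struct { long a, b; } sampled_counts;

/* Simulate ITERATIONS loop trips, each taking edge A and then edge B,
   with both edges sharing one sample counter of the given PERIOD. */
sampled_counts simulate_alternating_edges(unsigned period, unsigned iterations)
{
    assert(period > 0);
    sampled_counts counts = {0, 0};
    unsigned sample_counter = 0;

    for (unsigned i = 0; i < iterations; i++) {
        /* Edge A: counted only when the shared sample counter wraps. */
        if (++sample_counter >= period) {
            sample_counter = 0;
            counts.a++;
        }
        /* Edge B: same shared sample counter. */
        if (++sample_counter >= period) {
            sample_counter = 0;
            counts.b++;
        }
    }
    return counts;
}
```

Both edges really execute 1000 times in 1000 iterations. With a period of 4
the simulation records 0 hits for A and 500 for B, so A looks dead; with a
period of 5 it records 200 for each, which preserves the ratio.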

Honza

2019-02-14 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #5 from Martin Liška  ---
> which effectively updates edge counters just for a limited time. I would
> expect

Ah now, it's really doing sampling. I guess it can lead to quite some profile
inconsistencies..

2019-02-14 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #4 from Martin Liška  ---
I'm just looking at the google/gcc-4.9 branch:
https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.9/

and they have a sampling approach:

/* Transform:

   ORIGINAL CODE

   Into:

   __gcov_sample_counter++;
   if (__gcov_sample_counter >= __gcov_sampling_period)
     {
       __gcov_sample_counter = 0;
       ORIGINAL CODE
     }
*/

which effectively updates edge counters just for a limited time. I would expect
a size increase:

Removing basic block 9
Removing basic block 10
main (int argc)
{
  unsigned int PROF_sample.2;
  unsigned int PROF_sample.1;
  long int PROF_edge_counter_6;
  long int PROF_edge_counter_7;
  long int PROF_edge_counter_8;
  long int PROF_edge_counter_9;

  :
  __gcov_indirect_call_profiler_v2 (1005944783, main);
  __gcov_indirect_call_callee = 0B;
  if (argc_2(D) != 0)
goto ;
  else
goto ;

  :
  a = 123;
  PROF_sample.2_13 = __gcov_sample_counter;
  PROF_sample.2_14 = PROF_sample.2_13 + 1;
  __gcov_sample_counter = PROF_sample.2_14;
  PROF_sample.2_15 = __gcov_sampling_period;
  if (PROF_sample.2_14 >= PROF_sample.2_15)
goto ;
  else
goto ;

  :
  goto ;

  :
  __gcov_sample_counter = 0;
  PROF_edge_counter_6 = __gcov0.main[0];
  PROF_edge_counter_7 = PROF_edge_counter_6 + 1;
  __gcov0.main[0] = PROF_edge_counter_7;
  goto ;

  :
  a = 0;
  PROF_sample.1_10 = __gcov_sample_counter;
  PROF_sample.1_11 = PROF_sample.1_10 + 1;
  __gcov_sample_counter = PROF_sample.1_11;
  PROF_sample.1_12 = __gcov_sampling_period;
  if (PROF_sample.1_11 >= PROF_sample.1_12)
goto ;
  else
goto ;

  :
  __gcov_sample_counter = 0;
  PROF_edge_counter_8 = __gcov0.main[1];
  PROF_edge_counter_9 = PROF_edge_counter_8 + 1;
  __gcov0.main[1] = PROF_edge_counter_9;

  :
  return 0;
}
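
In plain C, the instrumentation shown in the dump behaves roughly like the
sketch below. The variable names mirror the dump without the reserved leading
underscores; the types, the period of 100, and the scaling helper
estimated_hits are assumptions for illustration and are not taken from the
branch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for __gcov_sample_counter and __gcov_sampling_period. */
static unsigned gcov_sample_counter;
static unsigned gcov_sampling_period = 100;  /* assumed value */

/* Stand-in for one edge counter such as __gcov0.main[0]. */
static int64_t edge_counter;

/* Instrumented edge: the real counter update runs once per period. */
void sampled_edge_hit(void)
{
    assert(gcov_sampling_period > 0);
    gcov_sample_counter++;
    if (gcov_sample_counter >= gcov_sampling_period) {
        gcov_sample_counter = 0;
        edge_counter++;  /* ORIGINAL CODE */
    }
}

/* Hypothetical helper: scale the sampled value back to an estimate. */
int64_t estimated_hits(void)
{
    return edge_counter * (int64_t)gcov_sampling_period;
}
```

Each instrumented edge pays for a load, an increment, a store, a compare and a
branch on the sample counter even when the edge counter itself is skipped,
which is where the extra code size comes from. After 1050 calls the estimate
is 1000, so up to one period of hits can go unrecorded.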

2019-02-14 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #3 from Martin Liška  ---
(In reply to Jan Hubicka from comment #2)
> Created attachment 45703 [details]
> patch for tls counters (incomplete - no runtime bits)

Isn't the patch only a refactoring that is eliminating tls_model from
tree_decl_with_vis and moving that into cgraph_node?

2019-02-13 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #2 from Jan Hubicka  ---
Created attachment 45703
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45703&action=edit
patch for tls counters (incomplete - no runtime bits)

Also, I think Google's code to reduce cacheline conflicts is
https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00959.html

2019-02-12 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-02-13
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška  ---
Can you please attach the WIP patch you have?