[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |---
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #11 from Jakub Jelinek ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.3                        |10.4

--- Comment #10 from Richard Biener ---
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.2                        |10.3

--- Comment #9 from Richard Biener ---
GCC 10.2 is released, adjusting target milestone.
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.0                        |10.2

--- Comment #8 from Jakub Jelinek ---
GCC 10.1 has been released.
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #7 from Richard Biener ---
Btw, use of TLS has

 * size of counters overhead (one could use char sized TLS counters
   and update the global ones with locking on overflow)
 * tear-down/build-up cost at thread termination/creation

the advantage is of course it's simple implementation-wise.
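
To make the char-sized TLS counter idea from comment #7 concrete, here is a rough sketch in plain C. It is not taken from any GCC patch: the names `count_edge`, `flush_counter`, `flush_thread_counters`, `global_counters` and `N_COUNTERS` are all hypothetical, and a pthread mutex stands in for whatever locking the runtime would really use. Each thread bumps a one-byte thread-local counter and only touches the shared 64-bit counter when the byte wraps, so the shared cache line is written once per 256 events instead of on every event. The explicit flush at thread exit is the tear-down cost mentioned in the second bullet.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_COUNTERS 4  /* hypothetical number of edge counters */

/* Shared 64-bit counters, only written under the lock. */
static int64_t global_counters[N_COUNTERS];
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread one-byte counters: 1/8 the size of the global ones. */
static _Thread_local uint8_t local_counters[N_COUNTERS];

/* Add `amount` events to the shared counter `idx` under the lock. */
static void flush_counter(unsigned idx, unsigned amount)
{
    pthread_mutex_lock(&global_lock);
    global_counters[idx] += amount;
    pthread_mutex_unlock(&global_lock);
}

/* Hot path: bump the thread-local byte; on wrap-around (255 -> 0)
   account for the 256 events it has absorbed. */
static void count_edge(unsigned idx)
{
    assert(idx < N_COUNTERS);
    if (++local_counters[idx] == 0)
        flush_counter(idx, 256);
}

/* Tear-down path: must run before the thread exits, otherwise the
   events still held in the local bytes are lost. */
static void flush_thread_counters(void)
{
    for (unsigned i = 0; i < N_COUNTERS; i++) {
        if (local_counters[i] != 0) {
            flush_counter(i, local_counters[i]);
            local_counters[i] = 0;
        }
    }
}
```

In this sketch a thread that calls `count_edge(0)` 1000 times takes the lock three times on the hot path and once more in `flush_thread_counters()`. A real runtime would also need to hook the flush into thread termination, which this sketch leaves to the caller.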
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #6 from Jan Hubicka ---
> Ah now, it's really doing sampling. I guess it can lead to quite some profile
> inconsistencies..

Yep, it is not coolest solution. I would not worry too much about precision
loss unless you get some weird interference between the sampling counter and
actual program behaviour. Adding conditionals everywhere is not very good and
I am not sure how well CPU will predict such branches.

Honza
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #5 from Martin Liška ---
>
> which effectively updates edge counters just for a limited time. I would
> expect

Ah now, it's really doing sampling. I guess it can lead to quite some profile
inconsistencies..
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #4 from Martin Liška ---
I'm just looking at the google/gcc-4.9 branch:
https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.9/

and they have a sampling approach:

/* Transform:

   ORIGINAL CODE

   Into:

   __gcov_sample_counter++;
   if (__gcov_sample_counter >= __gcov_sampling_period)
     {
       __gcov_sample_counter = 0;
       ORIGINAL CODE
     }

which effectively updates edge counters just for a limited time. I would expect
size increase:

Removing basic block 9
Removing basic block 10
main (int argc)
{
  unsigned int PROF_sample.2;
  unsigned int PROF_sample.1;
  long int PROF_edge_counter_6;
  long int PROF_edge_counter_7;
  long int PROF_edge_counter_8;
  long int PROF_edge_counter_9;

  :
  __gcov_indirect_call_profiler_v2 (1005944783, main);
  __gcov_indirect_call_callee = 0B;
  if (argc_2(D) != 0)
    goto ;
  else
    goto ;

  :
  a = 123;
  PROF_sample.2_13 = __gcov_sample_counter;
  PROF_sample.2_14 = PROF_sample.2_13 + 1;
  __gcov_sample_counter = PROF_sample.2_14;
  PROF_sample.2_15 = __gcov_sampling_period;
  if (PROF_sample.2_14 >= PROF_sample.2_15)
    goto ;
  else
    goto ;

  :
  goto ;

  :
  __gcov_sample_counter = 0;
  PROF_edge_counter_6 = __gcov0.main[0];
  PROF_edge_counter_7 = PROF_edge_counter_6 + 1;
  __gcov0.main[0] = PROF_edge_counter_7;
  goto ;

  :
  a = 0;
  PROF_sample.1_10 = __gcov_sample_counter;
  PROF_sample.1_11 = PROF_sample.1_10 + 1;
  __gcov_sample_counter = PROF_sample.1_11;
  PROF_sample.1_12 = __gcov_sampling_period;
  if (PROF_sample.1_11 >= PROF_sample.1_12)
    goto ;
  else
    goto ;

  :
  __gcov_sample_counter = 0;
  PROF_edge_counter_8 = __gcov0.main[1];
  PROF_edge_counter_9 = PROF_edge_counter_8 + 1;
  __gcov0.main[1] = PROF_edge_counter_9;

  :
  return 0;

}
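
For readers who prefer source over a GIMPLE dump, the sampling transform quoted in comment #4 can be approximated in plain C as below. This is only an illustrative sketch, not the google/gcc-4.9 implementation: `sample_counter`, `sampling_period` and `sampled_increment` are hypothetical stand-ins for `__gcov_sample_counter`, `__gcov_sampling_period` and the instrumentation the compiler emits inline, and the period of 100 is an arbitrary assumption.

```c
#include <assert.h>

/* Hypothetical stand-ins for __gcov_sample_counter and
   __gcov_sampling_period from the quoted comment. */
static unsigned int sample_counter = 0;
static unsigned int sampling_period = 100;  /* assumed value */

/* "ORIGINAL CODE" here is the edge counter increment. It only runs
   on every sampling_period-th execution of an instrumented edge. */
static void sampled_increment(long *edge_counter)
{
    assert(sampling_period > 0);
    sample_counter++;
    if (sample_counter >= sampling_period) {
        sample_counter = 0;
        (*edge_counter)++;
    }
}
```

With a period of 100, 1000 executions of one edge add 10 to its counter, so recorded counts are scaled down by roughly the period. Because a single sample counter is shared by all edges in this sketch, which edge receives the increment depends on how executions interleave, and every instrumented edge gains an extra compare and branch.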
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #3 from Martin Liška ---
(In reply to Jan Hubicka from comment #2)
> Created attachment 45703 [details]
> patch for tls counters (incomplete - no runtime bits)

Isn't the patch only a refactoring that is eliminating tls_model from
tree_decl_with_vis and moving that into cgraph_node?
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

--- Comment #2 from Jan Hubicka ---
Created attachment 45703
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45703&action=edit
patch for tls counters (incomplete - no runtime bits)

Also I think google's code to reduce cacheline conflicts is
https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00959.html
[Bug gcov-profile/89307] -fprofile-generate binary may be too slow in multithreaded environment due to cache-line conflicts on counters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-02-13
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška ---
Can you please attach WIP patch you have?