[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-02-01 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #4 from Jan Hubicka  ---
> 
> With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
> master) goes down from 66% to 54%.
> 
> So far I did not find a way to easily train with the reference run (when I add
> "train_with = refrate" to the config, I always get "ERROR: The workload
> specified by train_with MUST be a training workload!")

I do that with a crude hack: I simply overwrite the training data files
with the reference versions directly in SPEC.  I believe the problem here
must be that with PGO we somehow confuse the vectorizer.

I did not know there is a train_with option.  Perhaps hacking the SPEC
driver to not output the error is easy enough.
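
For anyone wanting to reproduce the data-swap hack described above, here is a
minimal sketch in Python.  It is not the script actually used; the helper name
and the explicit file mapping are assumptions made for illustration.  Which
files to pair up depends on the benchmark's data directories and on the file
names the training invocation expects.

```python
import shutil
from pathlib import Path
from typing import Mapping


def replace_training_inputs(mapping: Mapping[Path, Path]) -> None:
    """Overwrite each training input with its reference counterpart.

    `mapping` maps a training data file to the reference file whose
    contents should replace it.  Hypothetical helper, not part of SPEC.
    """
    for train_file, ref_file in mapping.items():
        backup = train_file.with_name(train_file.name + ".orig")
        # Keep the original training input once, so the hack can be undone.
        if train_file.exists() and not backup.exists():
            shutil.copy2(train_file, backup)
        # The training run now reads reference data under the training name.
        shutil.copy2(ref_file, train_file)
```

The backup copies make it possible to restore the stock training workload
afterwards by copying each `.orig` file back over its training input.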

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-31 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #3 from Martin Jambor  ---
(In reply to Richard Biener from comment #1)
> Did you try with -fprofile-partial-training (is that default on?  it probably
> should ...).  Can you please try training with the rate data instead of train
> to rule out a mismatch?

With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
master) goes down from 66% to 54%.  

So far I did not find a way to easily train with the reference run (when I add
"train_with = refrate" to the config, I always get "ERROR: The workload
specified by train_with MUST be a training workload!")
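
For reference, the flag combination being compared can be sketched as follows.
This is only an illustration in Python of how the two PGO build steps differ.
SPEC drives the real builds through its config file, and the helper name,
source list and output name here are hypothetical.  The GCC options themselves
(-fprofile-generate, -fprofile-use, -fprofile-partial-training) are real, and
-fprofile-partial-training is passed together with -fprofile-use.

```python
from typing import List, Sequence, Tuple

# Options named in the bug title, plus LTO as used in the LTO vs LTOPGO runs.
BASE_FLAGS = ["-Ofast", "-march=native", "-flto"]


def pgo_commands(
    sources: Sequence[str], output: str, partial_training: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (instrumented build, profile-use build) gcc command lines."""
    # Step 1: instrumented build; running it on the training input
    # writes the profile data files.
    instrument = ["gcc", *BASE_FLAGS, "-fprofile-generate", *sources, "-o", output]

    # Step 2: rebuild using the collected profile.
    optimize = ["gcc", *BASE_FLAGS, "-fprofile-use"]
    if partial_training:
        # Code not executed in the training run is then optimized as if
        # no profile feedback were available, instead of for size.
        optimize.append("-fprofile-partial-training")
    optimize += [*sources, "-o", output]
    return instrument, optimize
```

Between the two steps the instrumented binary has to be run on the training
workload; that run is what the train vs refrate question above is about.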

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-29 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #2 from Jan Hubicka  ---
> Did you try with -fprofile-partial-training (is that default on?  it probably
> should ...).  Can you please try training with the rate data instead of train
> to rule out a mismatch?

It is not on by default.  The problem with partial training is that it
mostly nullifies any code size benefits from profile-use, and that is a
relatively noticeable aspect of it in real-world situations (for GCC
itself or Firefox, say, the overall size of the binary matters).

I need to work on this more, but we now have two-state optimize_size
predicates, and with level 1 we can turn off those -Os optimizations that
make large tradeoffs of performance for size.

Honza

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener  ---
Did you try with -fprofile-partial-training (is that default on?  it probably
should ...).  Can you please try training with the rate data instead of train
to rule out a mismatch?