[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #4 from Jan Hubicka ---
> With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a
> newer master) goes down from 66% to 54%.
>
> So far I did not find a way to easily train with the reference run (when I add
> "train_with = refrate" to the config, I always get "ERROR: The workload
> specified by train_with MUST be a training workload!")

I do that with a crude hack: I simply overwrite the training data files in SPEC with their reference versions.

I believe the problem here must be that with PGO we somehow confuse the vectorizer.

I did not know there is a train_with option. Perhaps hacking the SPEC driver so that it does not emit the error is easy enough.
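The file-swapping hack described above could be scripted roughly as follows. This is only a minimal sketch of the idea: the comment does not say which directories are involved, so `train_dir`, `ref_dir` and `backup_dir` are hypothetical parameters, and the function name is made up for illustration. A real SPEC setup may also need matching run commands and expected outputs, which this sketch does not touch.

```python
import shutil
from pathlib import Path


def overlay_training_data(train_dir, ref_dir, backup_dir):
    """Back up train_dir, then replace its contents with a copy of ref_dir.

    All three paths are assumptions supplied by the caller; nothing here
    knows about the actual SPEC directory layout.
    """
    train_dir, ref_dir, backup_dir = Path(train_dir), Path(ref_dir), Path(backup_dir)
    if backup_dir.exists():
        # Refuse to clobber an earlier backup of the original training data.
        raise FileExistsError(f"backup already exists: {backup_dir}")
    shutil.copytree(train_dir, backup_dir)  # keep the original training inputs
    shutil.rmtree(train_dir)                # drop the training inputs in place
    shutil.copytree(ref_dir, train_dir)     # training run now sees reference data
    return backup_dir
```

Restoring the original state is the reverse operation: remove `train_dir` and copy `backup_dir` back over it.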
[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #3 from Martin Jambor ---
(In reply to Richard Biener from comment #1)
> Did you try with -fprofile-partial-training (is that default on? it
> probably should ...). Can you please try training with the rate data
> instead of train to rule out a mismatch?

With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer master) goes down from 66% to 54%.

So far I did not find a way to easily train with the reference run (when I add "train_with = refrate" to the config, I always get "ERROR: The workload specified by train_with MUST be a training workload!")
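For readers unfamiliar with where -fprofile-partial-training sits in a PGO cycle, here is a minimal sketch that only assembles the two gcc command lines (instrumented build, then profile-use build). The source file, output name and profile directory are hypothetical placeholders, and -flto is included only because the thread compares LTO builds; the actual SPEC config used in the bug is not shown in the thread.

```python
import shlex

# Flags from the bug title, plus -flto (assumption: the thread compares LTO builds).
BASE_FLAGS = ["-Ofast", "-march=native", "-flto"]


def pgo_commands(sources, output, profile_dir, partial_training=False):
    """Return (instrumented build, profile-use build) gcc command lines."""
    instrument = ["gcc", *BASE_FLAGS, f"-fprofile-generate={profile_dir}",
                  *sources, "-o", output]
    use = ["gcc", *BASE_FLAGS, f"-fprofile-use={profile_dir}"]
    if partial_training:
        # Only affects the profile-use build: code that was never executed
        # during training is not treated as cold and optimized for size.
        use.append("-fprofile-partial-training")
    use += [*sources, "-o", output]
    return shlex.join(instrument), shlex.join(use)


# Hypothetical file and directory names, for illustration only.
gen_cmd, use_cmd = pgo_commands(["main.c"], "bench", "pgo-data", partial_training=True)
print(gen_cmd)
print(use_cmd)
```

Between the two builds, the instrumented binary has to be run on the training workload so that the profile data lands in the profile directory.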
[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #2 from Jan Hubicka ---
> Did you try with -fprofile-partial-training (is that default on? it probably
> should ...). Can you please try training with the rate data instead of train

It is not on by default. The problem with partial training is that it mostly nullifies the code size benefits of profile-use, and those are a relatively noticeable aspect of it in real-world situations (for GCC itself or Firefox, for example, the overall size of the binary matters). I need to work on this more, but we now have two-state optimize_size predicates, and with level 1 we can turn off those -Os optimizations that make large tradeoffs of performance for size.

Honza
> to rule out a mismatch?
[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener ---
Did you try with -fprofile-partial-training (is that default on? it probably should ...). Can you please try training with the rate data instead of train to rule out a mismatch?