[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

H.J. Lu changed:

           What    |Removed    |Added
         ----------+-----------+-----------
             Status|NEW        |RESOLVED
         Resolution|---        |FIXED

--- Comment #8 from H.J. Lu ---
Fixed for GCC 9 and GCC 8.2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #7 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Thu May 31 15:37:22 2018
New Revision: 261028

URL: https://gcc.gnu.org/viewcvs?rev=261028=gcc=rev
Log:
x86: Re-enable partial_reg_dependency and movx for Haswell

r254152 disabled partial_reg_dependency and movx for Haswell and newer
Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
movx improves performance.  But partial_reg_stall may be better than
partial_reg_dependency in theory.  We will investigate performance impact
of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
the meantime, this patch restores both partial_reg_dependency and movx for
Haswell in GCC 8.

On Haswell, improvements for EEMBC benchmarks with

    -mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell

vs

    -Ofast -mtune=haswell

are

automotive:
  aifftr01 (default) - goodperf: Runtime improvement of 2.6% (time).
  aiifft01 (default) - goodperf: Runtime improvement of 2.2% (time).

networking:
  ip_pktcheckb1m (default) - goodperf: Runtime improvement of 3.8% (time).
  ip_pktcheckb2m (default) - goodperf: Runtime improvement of 5.2% (time).
  ip_pktcheckb4m (default) - goodperf: Runtime improvement of 4.4% (time).
  ip_pktcheckb512k (default) - goodperf: Runtime improvement of 4.2% (time).

telecom:
  fft00data_1 (default) - goodperf: Runtime improvement of 8.4% (time).
  fft00data_2 (default) - goodperf: Runtime improvement of 8.6% (time).
  fft00data_3 (default) - goodperf: Runtime improvement of 9.0% (time).

        PR target/85829
        * config/i386/x86-tune.def: Re-enable partial_reg_dependency
        and movx for Haswell.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/x86-tune.def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #5 from Sebastian Peryt ---
I have made measurements on HSW comparing

    -mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell

to

    -Ofast -mtune=haswell

and I see improvements on EEMBC benchmarks.

automotive:
  aifftr01 (default) - goodperf: Runtime improvement of 2.6% (time).
  aiifft01 (default) - goodperf: Runtime improvement of 2.2% (time).

networking:
  ip_pktcheckb1m (default) - goodperf: Runtime improvement of 3.8% (time).
  ip_pktcheckb2m (default) - goodperf: Runtime improvement of 5.2% (time).
  ip_pktcheckb4m (default) - goodperf: Runtime improvement of 4.4% (time).
  ip_pktcheckb512k (default) - goodperf: Runtime improvement of 4.2% (time).

telecom:
  fft00data_1 (default) - goodperf: Runtime improvement of 8.4% (time).
  fft00data_2 (default) - goodperf: Runtime improvement of 8.6% (time).
  fft00data_3 (default) - goodperf: Runtime improvement of 9.0% (time).

--- Comment #6 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Thu May 31 15:02:36 2018
New Revision: 261026

URL: https://gcc.gnu.org/viewcvs?rev=261026=gcc=rev
Log:
x86: Re-enable partial_reg_dependency and movx for Haswell

r254152 disabled partial_reg_dependency and movx for Haswell and newer
Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
movx improves performance.  But partial_reg_stall may be better than
partial_reg_dependency in theory.  We will investigate performance impact
of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
the meantime, this patch restores both partial_reg_dependency and movx for
Haswell in GCC 8.

On Haswell, improvements for EEMBC benchmarks with

    -mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell

vs

    -Ofast -mtune=haswell

are

automotive:
  aifftr01 (default) - goodperf: Runtime improvement of 2.6% (time).
  aiifft01 (default) - goodperf: Runtime improvement of 2.2% (time).

networking:
  ip_pktcheckb1m (default) - goodperf: Runtime improvement of 3.8% (time).
  ip_pktcheckb2m (default) - goodperf: Runtime improvement of 5.2% (time).
  ip_pktcheckb4m (default) - goodperf: Runtime improvement of 4.4% (time).
  ip_pktcheckb512k (default) - goodperf: Runtime improvement of 4.2% (time).

telecom:
  fft00data_1 (default) - goodperf: Runtime improvement of 8.4% (time).
  fft00data_2 (default) - goodperf: Runtime improvement of 8.6% (time).
  fft00data_3 (default) - goodperf: Runtime improvement of 9.0% (time).

        PR target/85829
        * config/i386/x86-tune.def: Re-enable partial_reg_dependency
        and movx for Haswell.

Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/config/i386/x86-tune.def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

Richard Biener changed:

           What    |Removed    |Added
         ----------+-----------+--------------------
           Keywords|           |missed-optimization
   Target Milestone|---        |8.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #4 from H.J. Lu ---
(In reply to Jan Hubicka from comment #3)
> So I would suggest to revisit PARTIAL_REG_DEPENDENCY wrt PARTIAL_REG_STALL
> for Haswell+

We should do that for GCC 9.  For GCC 8, we should restore what we had
before.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #3 from Jan Hubicka ---
> Haswell tuning was done many years ago.  We really shouldn't change it.
> For newer processors, we need to investigate PARTIAL_REG_DEPENDENCY vs
> PARTIAL_REG_STALL.

I have revisited the tuning options primarily to define a more reasonable
generic.  For that I have revisited some flags which seem to have been set
incorrectly.  We run regular benchmarks on Haswell at
https://gcc.opensuse.org/gcc-old/index.html (Czerny) and especially
specfp2000 has improved noticeably over the past release cycle.
https://gcc.opensuse.org/gcc-old/SPEC/CFP/sb-czerny-head-64/mean-fp_big.png

There are quite a few Haswell chips around, so I do not see why we should
stop trying to improve code generated there; plus it would be good to have
fewer combinations enabled for different generations.

So I would suggest to revisit PARTIAL_REG_DEPENDENCY wrt PARTIAL_REG_STALL
for Haswell+

Honza
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #2 from H.J. Lu ---
Haswell tuning was done many years ago.  We really shouldn't change it.
For newer processors, we need to investigate PARTIAL_REG_DEPENDENCY vs
PARTIAL_REG_STALL.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

Jan Hubicka changed:

           What    |Removed      |Added
         ----------+-------------+---------------------------
             Status|UNCONFIRMED  |NEW
   Last reconfirmed|             |2018-05-18
                 CC|             |hubicka at gcc dot gnu.org
     Ever confirmed|0            |1

--- Comment #1 from Jan Hubicka ---
It would be nice to have examples where they are needed.  The motivation
to disable them was optimization manuals claiming that Haswell+ handles
partial reg stalls better than old cores.  Also, PARTIAL_REG_DEPENDENCY
does not really fit the hardware design of cores which are a partial reg
stall architecture.  So in theory, if we enable something, it should be
the PARTIAL_REG_STALL flag.
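For readers who want a small test case of the kind comment #1 asks for, the following is a minimal sketch, not taken from the thread. It assumes that code which loads an 8-bit value and then uses it at full register width is where movx and partial_reg_dependency are most likely to change the generated instructions (for example a zero-extending load instead of a byte move). The function names lookup_byte and sum_indexed are hypothetical, and no claim is made here about the actual assembly or timing on any processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test case: the byte loaded from idx[i] is widened to a
   full-width index before the second load.  How that widening is done
   (zero-extending load vs. a write to an 8-bit subregister) is the sort
   of choice the tuning flags are assumed to influence.  */
uint8_t lookup_byte(const uint8_t *table, const uint8_t *idx, size_t i)
{
    assert(table != NULL && idx != NULL);
    return table[idx[i]];
}

/* Accumulates the looked-up bytes into a 32-bit sum, so every 8-bit
   result is widened again before the addition.  */
uint32_t sum_indexed(const uint8_t *table, const uint8_t *idx, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += lookup_byte(table, idx, i);
    return total;
}
```

To compare the two configurations measured in the thread, one could compile this file with -S twice, once with -Ofast -mtune=haswell and once with -mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell, and diff the resulting assembly.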