[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-31 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from H.J. Lu  ---
Fixed for GCC 9 and GCC 8.2.

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-31 Thread hjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #7 from hjl at gcc dot gnu.org  ---
Author: hjl
Date: Thu May 31 15:37:22 2018
New Revision: 261028

URL: https://gcc.gnu.org/viewcvs?rev=261028&root=gcc&view=rev
Log:
x86: Re-enable partial_reg_dependency and movx for Haswell

r254152 disabled partial_reg_dependency and movx for Haswell and newer
Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
movx improves performance.  But partial_reg_stall may be better than
partial_reg_dependency in theory.  We will investigate performance impact
of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
the meantime, this patch restores both partial_reg_dependency and movx for
Haswell in GCC 8.

On Haswell, improvements for EEMBC benchmarks with

-mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell

vs

-Ofast -mtune=haswell

are

automotive
==========
  aifftr01 (default) - goodperf: Runtime improvement of   2.6% (time).
  aiifft01 (default) - goodperf: Runtime improvement of   2.2% (time).

networking
==========
  ip_pktcheckb1m (default) - goodperf: Runtime improvement of   3.8% (time).
  ip_pktcheckb2m (default) - goodperf: Runtime improvement of   5.2% (time).
  ip_pktcheckb4m (default) - goodperf: Runtime improvement of   4.4% (time).
  ip_pktcheckb512k (default) - goodperf: Runtime improvement of   4.2% (time).

telecom
=======
  fft00data_1 (default) - goodperf: Runtime improvement of   8.4% (time).
  fft00data_2 (default) - goodperf: Runtime improvement of   8.6% (time).
  fft00data_3 (default) - goodperf: Runtime improvement of   9.0% (time).

PR target/85829
* config/i386/x86-tune.def: Re-enable partial_reg_dependency
and movx for Haswell.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/x86-tune.def
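
As a rough illustration of what a change to x86-tune.def amounts to: the file lists tuning flags as DEF_TUNE (id, "name", processor-mask) entries, so re-enabling a flag for a processor means putting that processor's bit back into the entry's mask. The sketch below mimics that shape in a self-contained form. The M_* masks, the TUNE_* identifiers, the mask values, and the helpers tune_by_name and tune_enabled are all hypothetical and are not copied from GCC.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical processor bits; GCC's real m_* masks cover many more CPUs. */
enum { M_CORE2 = 1 << 0, M_HASWELL = 1 << 1, M_SKYLAKE_AVX512 = 1 << 2 };

/* Illustrative table in the DEF_TUNE (id, name, mask) shape.  The masks
   are invented for this sketch; they are not the values in x86-tune.def. */
#define TUNE_TABLE \
  DEF_TUNE (TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency", \
            M_CORE2 | M_HASWELL | M_SKYLAKE_AVX512) \
  DEF_TUNE (TUNE_MOVX, "movx", M_CORE2 | M_HASWELL | M_SKYLAKE_AVX512) \
  DEF_TUNE (TUNE_PARTIAL_REG_STALL, "partial_reg_stall", 0)

/* First expansion: one enumerator per tuning flag. */
#define DEF_TUNE(id, name, mask) id,
enum tune_index { TUNE_TABLE TUNE_LAST };
#undef DEF_TUNE

/* Second expansion: the processor mask of each flag. */
#define DEF_TUNE(id, name, mask) (mask),
static const unsigned tune_masks[TUNE_LAST] = { TUNE_TABLE };
#undef DEF_TUNE

/* Third expansion: the flag names, as they would be spelled in an
   -mtune-ctrl= list. */
#define DEF_TUNE(id, name, mask) name,
static const char *const tune_names[TUNE_LAST] = { TUNE_TABLE };
#undef DEF_TUNE

/* Return the index of the flag called NAME, or -1 if there is none. */
int tune_by_name (const char *name)
{
  for (int i = 0; i < TUNE_LAST; i++)
    if (strcmp (tune_names[i], name) == 0)
      return i;
  return -1;
}

/* Return true if FLAG is a valid index and is enabled for processor CPU. */
bool tune_enabled (int flag, unsigned cpu)
{
  return flag >= 0 && flag < TUNE_LAST && (tune_masks[flag] & cpu) != 0;
}
```

In this model, dropping M_HASWELL from a mask is the kind of edit that disables a flag for Haswell, and adding it back is the kind of edit that restores it.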

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-31 Thread hjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #5 from Sebastian Peryt  ---
I have made measurements on HSW comparing
-mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell to -Ofast
-mtune=haswell and I see improvements on EEMBC benchmarks.

automotive
==========
  aifftr01 (default) - goodperf: Runtime improvement of   2.6% (time).
  aiifft01 (default) - goodperf: Runtime improvement of   2.2% (time).

networking
==========
  ip_pktcheckb1m (default) - goodperf: Runtime improvement of   3.8% (time).
  ip_pktcheckb2m (default) - goodperf: Runtime improvement of   5.2% (time).
  ip_pktcheckb4m (default) - goodperf: Runtime improvement of   4.4% (time).
  ip_pktcheckb512k (default) - goodperf: Runtime improvement of   4.2% (time).

telecom
=======
  fft00data_1 (default) - goodperf: Runtime improvement of   8.4% (time).
  fft00data_2 (default) - goodperf: Runtime improvement of   8.6% (time).
  fft00data_3 (default) - goodperf: Runtime improvement of   9.0% (time).
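
For anyone who wants to compare the two option sets without access to EEMBC, a small kernel like the one below may serve as a starting point. It is a hypothetical stand-in, not code from the benchmark suite: byte_sum and the file name kernel.c are invented here, and whether the two builds differ for this particular loop depends on the compiler version and on whether the loop gets vectorized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test kernel (not taken from EEMBC).  Assumed to be saved
   as kernel.c and compiled to assembly twice, with the option sets
   compared in this thread, on a GCC 8 release before 8.2:

     gcc -Ofast -march=haswell -mtune-ctrl=movx,partial_reg_dependency -S kernel.c
     gcc -Ofast -mtune=haswell -S kernel.c

   Diffing the two .s files shows whether byte loads are emitted
   differently.  */

/* Sum the bytes of BUF into a 32-bit accumulator.  Each iteration loads
   one byte and widens it to 32 bits before the addition.  */
uint32_t byte_sum (const uint8_t *buf, size_t len)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += buf[i];
  return sum;
}
```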

--- Comment #6 from hjl at gcc dot gnu.org  ---
Author: hjl
Date: Thu May 31 15:02:36 2018
New Revision: 261026

URL: https://gcc.gnu.org/viewcvs?rev=261026&root=gcc&view=rev
Log:
x86: Re-enable partial_reg_dependency and movx for Haswell

r254152 disabled partial_reg_dependency and movx for Haswell and newer
Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
movx improves performance.  But partial_reg_stall may be better than
partial_reg_dependency in theory.  We will investigate performance impact
of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
the meantime, this patch restores both partial_reg_dependency and movx for
Haswell in GCC 8.

On Haswell, improvements for EEMBC benchmarks with

-mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell

vs

-Ofast -mtune=haswell

are

automotive
==========
  aifftr01 (default) - goodperf: Runtime improvement of   2.6% (time).
  aiifft01 (default) - goodperf: Runtime improvement of   2.2% (time).

networking
==========
  ip_pktcheckb1m (default) - goodperf: Runtime improvement of   3.8% (time).
  ip_pktcheckb2m (default) - goodperf: Runtime improvement of   5.2% (time).
  ip_pktcheckb4m (default) - goodperf: Runtime improvement of   4.4% (time).
  ip_pktcheckb512k (default) - goodperf: Runtime improvement of   4.2% (time).

telecom
=======
  fft00data_1 (default) - goodperf: Runtime improvement of   8.4% (time).
  fft00data_2 (default) - goodperf: Runtime improvement of   8.6% (time).
  fft00data_3 (default) - goodperf: Runtime improvement of   9.0% (time).

PR target/85829
* config/i386/x86-tune.def: Re-enable partial_reg_dependency
and movx for Haswell.

Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/i386/x86-tune.def

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |8.2

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-20 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #4 from H.J. Lu  ---
(In reply to Jan Hubicka from comment #3)
> 
> So I would suggest to revisit PARTIAL_REG_DEPENDENCY wrt PARTIAL_REG_STALL
> for Haswell+

We should do that for GCC 9.  For GCC 8, we should restore what we had
before.

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-18 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #3 from Jan Hubicka  ---
> Haswell tuning was done many years ago.  We really shouldn't change it.
> For newer processors, we need to investigate PARTIAL_REG_DEPENDENCY vs
> PARTIAL_REG_STALL.
I have revisited the tuning options primarily to define a more reasonable
generic tuning.
For that I have revisited some flags which seem to have been set incorrectly.
We run regular benchmarks on Haswell at
https://gcc.opensuse.org/gcc-old/index.html
(Czerny), and specfp2000 in particular has improved noticeably over the past
release cycle:
https://gcc.opensuse.org/gcc-old/SPEC/CFP/sb-czerny-head-64/mean-fp_big.png

There are quite a few Haswell chips around, so I do not see why we should stop
trying to improve the code generated for them.  It would also be good to have
fewer combinations enabled for different generations.

So I would suggest to revisit PARTIAL_REG_DEPENDENCY wrt PARTIAL_REG_STALL
for Haswell+

Honza

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-18 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

--- Comment #2 from H.J. Lu  ---
Haswell tuning was done many years ago.  We really shouldn't change it.
For newer processors, we need to investigate PARTIAL_REG_DEPENDENCY vs
PARTIAL_REG_STALL.

[Bug target/85829] [8/9 Regression] PARTIAL_REG_DEPENDENCY and MOVX were disabled for Haswell and newer processors

2018-05-18 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-05-18
                 CC|                            |hubicka at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jan Hubicka  ---
It would be nice to have examples where they are needed.  The motivation for
disabling them was that the optimization manuals claim Haswell+ handles partial
register stalls better than older cores.
Also, PARTIAL_REG_DEPENDENCY does not really fit the hardware design of cores
that are partial register stall architectures.  So in theory, if we enable
anything, it should be the PARTIAL_REG_STALL flag.
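
To make the distinction concrete, here is a minimal C model of the two kinds of register write involved. It is only a sketch of the data flow, not a description of any particular microarchitecture; the function names are invented, and the instructions named in the comments are examples of the forms being modelled.

```c
#include <stdint.h>

/* Models a byte-sized register write such as `movb %cl, %al`: only bits
   0..7 change, so the new register value is computed from the old one.
   A core that merges on write sees this as a dependency on OLD_REG; a
   core that renames the low byte separately instead pays a cost later,
   when the full register is read.  */
uint32_t write_low_byte (uint32_t old_reg, uint8_t value)
{
  return (old_reg & ~UINT32_C (0xff)) | value;
}

/* Models a zero-extending write such as `movzbl %cl, %eax`: the whole
   register is overwritten, so the old value plays no part in the result.  */
uint32_t write_zero_extended (uint8_t value)
{
  return value;
}
```

The first function takes the old register value as an input and the second does not; that extra input is the dependency a tuning flag can avoid by preferring the zero-extending form.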