[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-02-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #39 from Jan Hubicka  ---
> Finally, the total between after the last and before the first patch.  
> Overall,
> some tests gain some performance and others lose some.  The total number of
> instructions has grown somewhat (especially tonto, calculix, dealII and wrf),
> but there's no obvious connection between an increased number of instructions
> and loss of performance.
> 
> Is this what can be expected of the patches?

I would say so - the prediction controls a lot of different heuristics,
and the call predictor is quite weak (close to random), so it is expected
to have somewhat random effects.

I also can't see much correlation in the tests, so I guess it is just
random noise.  Thanks for the tests!

Honza

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-02-01 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #38 from Dominik Vogt  ---
Finally, the total between after the last and before the first patch.  Overall,
some tests gain some performance and others lose some.  The total number of
instructions has grown somewhat (especially tonto, calculix, dealII and wrf),
but there's no obvious connection between an increased number of instructions
and loss of performance.

Is this what can be expected of the patches?

All compiled with -O3 -funroll-loops -march=zEC12.

r244260 vs. r243994
---
                  run-old.result  run-new.result
f410.bwaves        1.28s    1.27s  (  -0.78%,   0.79% )
f416.gamess        7.10s    6.82s  (  -3.94%,   4.11% )
f433.milc          5.53s    5.53s  (   0.00%,   0.00% )
f434.zeusmp        2.19s    2.18s  (  -0.46%,   0.46% )
f435.gromacs       1.34s    1.33s  (  -0.75%,   0.75% )
f436.cactusADM    24.72s   24.80s  (   0.32%,  -0.32% )
f437.leslie3d      2.76s    2.75s  (  -0.36%,   0.36% )
f444.namd         12.13s   12.13s  (   0.00%,   0.00% )
f447.dealII        2.03s    2.02s  (  -0.49%,   0.50% )
f450.soplex        3.90s    3.92s  (   0.51%,  -0.51% )
f453.povray        2.88s    2.86s  (  -0.69%,   0.70% )
f454.calculix     17.32s   17.36s  (   0.23%,  -0.23% )
f459.GemsFDTD      7.22s    7.13s  (  -1.25%,   1.26% )
f465.tonto         0.93s    0.93s  (   0.00%,   0.00% )
f470.lbm           2.65s    2.66s  (   0.38%,  -0.38% )
f481.wrf           3.84s    3.84s  (   0.00%,   0.00% )
f482.sphinx3      10.49s   10.56s  (   0.67%,  -0.66% )
i400.perlbench     7.58s    7.25s  (  -4.35%,   4.55% )
i401.bzip2         3.98s    3.96s  (  -0.50%,   0.51% )
i403.gcc           1.00s    1.01s  (   1.00%,  -0.99% )
i429.mcf           1.49s    1.49s  (   0.00%,   0.00% )
i445.gobmk         3.55s    3.53s  (  -0.56%,   0.57% )
i456.hmmer         1.56s    1.55s  (  -0.64%,   0.65% )
i458.sjeng         3.81s    3.79s  (  -0.52%,   0.53% )
i462.libquantum   17.12s   17.11s  (  -0.06%,   0.06% )
i464.h264ref       3.14s    3.17s  (   0.96%,  -0.95% )
i471.omnetpp      11.39s   11.52s  (   1.14%,  -1.13% )
i473.astar         7.22s    7.26s  (   0.55%,  -0.55% )
i483.xalancbmk     7.62s    7.69s  (   0.92%,  -0.91% )
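As a side note (my reconstruction, not part of the original comment), the two percentage columns in these run-time tables appear to be the delta relative to the old run and relative to the new run respectively:

```python
# Reconstruction of the two percentage columns in the tables above:
# the first value is (new - old) / old, the second is (old - new) / new,
# both in percent.  A negative first column means the new run is faster.
def deltas(old_s, new_s):
    """Return the pair of percentages shown in parentheses."""
    return (round((new_s - old_s) / old_s * 100, 2),
            round((old_s - new_s) / new_s * 100, 2))
```

For example, deltas(7.10, 6.82) reproduces the f416.gamess row: (-3.94, 4.11).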

--

f470.lbm 2984 insns identical
i429.mcf 4165 insns -4 smaller
i462.libquantum 11735 insns +0 changed
i473.astar 12460 insns +32 BIGGER!, 2 funcs bigger (max +79 insns)
f410.bwaves 9820 insns +7 BIGGER!, 1 funcs bigger (max +7 insns)
i401.bzip2 22439 insns -63 smaller
f437.leslie3d 28725 insns +9 BIGGER!, 5 funcs bigger (max +19 insns)
i458.sjeng 38864 insns -26 smaller, 2 funcs bigger (max +24 insns)
f433.milc 35091 insns -70 smaller, 1 funcs bigger (max +5 insns)
f482.sphinx3 51879 insns +4 BIGGER!, 5 funcs bigger (max +15 insns)
i456.hmmer 85157 insns -33 smaller, 4 funcs bigger (max +91 insns)
f444.namd 76220 insns -3 smaller
f434.zeusmp 73937 insns +43 BIGGER!, 3 funcs bigger (max +27 insns)
f459.GemsFDTD 111465 insns +84 BIGGER!, 5 funcs bigger (max +57 insns)
f436.cactusADM 201648 insns +125 BIGGER!, 37 funcs bigger (max +68 insns)
f435.gromacs 250725 insns -53 smaller, 11 funcs bigger (max +25 insns)
i471.omnetpp 135992 insns -435 smaller, 15 funcs bigger (max +80 insns)
i445.gobmk 249112 insns -1167 smaller, 16 funcs bigger (max +82 insns)
f450.soplex 131531 insns -558 smaller, 22 funcs bigger (max +18 insns)
f453.povray 247399 insns -48 smaller, 3 funcs bigger (max +92 insns)
i400.perlbench 305683 insns -216 smaller, 51 funcs bigger (max +554 insns)
f454.calculix 478026 insns +485 BIGGER!, 22 funcs bigger (max +157 insns)
i464.h264ref 316483 insns -76 smaller, 8 funcs bigger (max +76 insns)
i403.gcc 800574 insns -782 smaller, 100 funcs bigger (max +1674 insns)
f465.tonto 1138432 insns +2511 BIGGER!, 235 funcs bigger (max +455 insns)
f447.dealII 764322 insns +597 BIGGER!, 171 funcs bigger (max +295 insns)
f481.wrf 1081604 insns +2769 BIGGER!, 141 funcs bigger (max +2329 insns)
i483.xalancbmk 919758 insns -483 smaller, 277 funcs bigger (max +1002 insns)
f416.gamess 2553939 insns -1589 smaller, 127 funcs bigger (max +46 insns)

statistics:
---
29  tests (total)
11  test executables have grown (more insns)
16  test executables have shrunk (fewer insns)
10140169  insns total (old)
+1060     insns difference
+104      insns per 1,000,000
-360      weighted insns per 1,000,000 *
1264      functions have grown (total) **
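For what it's worth, the "insns per 1,000,000" line in these statistics blocks can be reconstructed from the two lines above it (a sketch, assuming simple truncating scaling; the "weighted" variant uses the per-test scaling described in a later footnote):

```python
# Sketch: "insns per 1,000,000" is the instruction-count difference scaled
# to the old total, truncated toward zero.  E.g. +1060 insns on a
# 10140169-insn baseline gives +104 insns per million.
def insns_per_million(diff, old_total):
    return int(diff / old_total * 1_000_000)
```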

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-02-01 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #37 from Dominik Vogt  ---
r244260 vs. r244256 (comment 25)
---
                  run-old.result  run-new.result
f410.bwaves        1.27s    1.27s  (   0.00%,   0.00% )
f416.gamess        6.80s    6.82s  (   0.29%,  -0.29% )
f433.milc          5.56s    5.53s  (  -0.54%,   0.54% )
f434.zeusmp        2.18s    2.18s  (   0.00%,   0.00% )
f435.gromacs       1.36s    1.33s  (  -2.21%,   2.26% )
f436.cactusADM    24.66s   24.75s  (   0.36%,  -0.36% )
f437.leslie3d      2.76s    2.75s  (  -0.36%,   0.36% )
f444.namd         12.13s   12.13s  (   0.00%,   0.00% )
f447.dealII        2.05s    2.01s  (  -1.95%,   1.99% )
f450.soplex        3.97s    3.92s  (  -1.26%,   1.28% )
f453.povray        2.91s    2.86s  (  -1.72%,   1.75% )
f454.calculix     17.28s   17.36s  (   0.46%,  -0.46% )
f459.GemsFDTD      7.28s    7.14s  (  -1.92%,   1.96% )
f465.tonto         0.94s    0.94s  (   0.00%,   0.00% )
f470.lbm           2.66s    2.65s  (  -0.38%,   0.38% )
f481.wrf           3.84s    3.84s  (   0.00%,   0.00% )
f482.sphinx3      10.59s   10.61s  (   0.19%,  -0.19% )
i400.perlbench     7.49s    7.30s  (  -2.54%,   2.60% )
i401.bzip2         3.97s    3.96s  (  -0.25%,   0.25% )
i403.gcc           1.01s    1.01s  (   0.00%,   0.00% )
i429.mcf           1.49s    1.49s  (   0.00%,   0.00% )
i445.gobmk         3.61s    3.53s  (  -2.22%,   2.27% )
i456.hmmer         1.56s    1.57s  (   0.64%,  -0.64% )
i458.sjeng         3.77s    3.79s  (   0.53%,  -0.53% )
i462.libquantum   17.13s   17.08s  (  -0.29%,   0.29% )
i464.h264ref       3.30s    3.17s  (  -3.94%,   4.10% )
i471.omnetpp      11.38s   11.52s  (   1.23%,  -1.22% )
i473.astar         7.58s    7.26s  (  -4.22%,   4.41% )
i483.xalancbmk     7.53s    7.73s  (   2.66%,  -2.59% )

--

f470.lbm 2984 insns +0 changed
i429.mcf 4506 insns -346 smaller, 1 funcs bigger (max +2 insns)
i462.libquantum 11728 insns +7 BIGGER!, 5 funcs bigger (max +4 insns)
i473.astar 12309 insns +182 BIGGER!, 8 funcs bigger (max +109 insns)
i401.bzip2 22375 insns +11 BIGGER!, 20 funcs bigger (max +25 insns)
f437.leslie3d 28715 insns +21 BIGGER!, 2 funcs bigger (max +18 insns)
i458.sjeng 38693 insns +145 BIGGER!, 15 funcs bigger (max +69 insns)
f433.milc 34740 insns +265 BIGGER!, 49 funcs bigger (max +72 insns)
f482.sphinx3 52048 insns -148 smaller, 37 funcs bigger (max +195 insns)
i456.hmmer 84420 insns +676 BIGGER!, 61 funcs bigger (max +518 insns)
f444.namd 76218 insns +0 changed, 1 funcs bigger (max +11 insns)
f434.zeusmp 73993 insns -7 smaller, 1 funcs bigger (max +1 insns)
f459.GemsFDTD 111458 insns +85 BIGGER!, 9 funcs bigger (max +89 insns)
f436.cactusADM 201167 insns +608 BIGGER!, 86 funcs bigger (max +264 insns)
f435.gromacs 249275 insns +1416 BIGGER!, 104 funcs bigger (max +978 insns)
i471.omnetpp 137902 insns -2351 smaller, 64 funcs bigger (max +410 insns)
i445.gobmk 247898 insns +57 BIGGER!, 182 funcs bigger (max +782 insns)
f450.soplex 127631 insns +3348 BIGGER!, 56 funcs bigger (max +2104 insns)
f453.povray 245450 insns +1900 BIGGER!, 197 funcs bigger (max +2029 insns)
i400.perlbench 304835 insns +632 BIGGER!, 365 funcs bigger (max +930 insns)
f454.calculix 40 insns +714 BIGGER!, 182 funcs bigger (max +562 insns)
i464.h264ref 316389 insns -34 smaller, 61 funcs bigger (max +1116 insns)
i403.gcc 797389 insns +2408 BIGGER!, 503 funcs bigger (max +1371 insns)
f465.tonto 1141420 insns -449 smaller, 329 funcs bigger (max +874 insns)
f447.dealII 764299 insns +556 BIGGER!, 291 funcs bigger (max +1826 insns)
f481.wrf 1084747 insns -388 smaller, 196 funcs bigger (max +1552 insns)
i483.xalancbmk 919878 insns -411 smaller, 507 funcs bigger (max +1508 insns)
f416.gamess 2561829 insns -9468 smaller, 714 funcs bigger (max +1562 insns)

statistics:
---
29  tests (total)
17  test executables have grown (more insns)
9   test executables have shrunk (fewer insns)
10141892  insns total (old)
-571      insns difference
-56       insns per 1,000,000
-524      weighted insns per 1,000,000 *
4046      functions have grown (total) **
+2104     insns in most grown function

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-02-01 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #36 from Dominik Vogt  ---
r244207 vs. r244206 (comment 24)
---
                  run-old.result  run-new.result
f410.bwaves        1.27s    1.27s  (   0.00%,   0.00% )
f416.gamess        6.87s    7.21s  (   4.95%,  -4.72% )
f433.milc          5.57s    5.57s  (   0.00%,   0.00% )
f434.zeusmp        2.18s    2.18s  (   0.00%,   0.00% )
f435.gromacs       1.34s    1.36s  (   1.49%,  -1.47% )
f436.cactusADM    24.63s   24.56s  (  -0.28%,   0.29% )
f437.leslie3d      2.76s    2.76s  (   0.00%,   0.00% )
f444.namd         12.13s   12.13s  (   0.00%,   0.00% )
f447.dealII        2.03s    2.02s  (  -0.49%,   0.50% )
f450.soplex        3.98s    3.98s  (   0.00%,   0.00% )
f453.povray        2.89s    2.90s  (   0.35%,  -0.34% )
f454.calculix     17.28s   17.30s  (   0.12%,  -0.12% )
f459.GemsFDTD      7.29s    7.29s  (   0.00%,   0.00% )
f465.tonto         0.94s    0.94s  (   0.00%,   0.00% )
f470.lbm           2.65s    2.64s  (  -0.38%,   0.38% )
f481.wrf           3.84s    3.84s  (   0.00%,   0.00% )
f482.sphinx3      10.61s   10.58s  (  -0.28%,   0.28% )
i400.perlbench     7.32s    7.46s  (   1.91%,  -1.88% )
i401.bzip2         3.97s    3.97s  (   0.00%,   0.00% )
i403.gcc           1.00s    1.01s  (   1.00%,  -0.99% )
i429.mcf           1.49s    1.49s  (   0.00%,   0.00% )
i445.gobmk         3.59s    3.61s  (   0.56%,  -0.55% )
i456.hmmer         1.57s    1.56s  (  -0.64%,   0.64% )
i458.sjeng         3.76s    3.77s  (   0.27%,  -0.27% )
i462.libquantum   17.11s   17.08s  (  -0.18%,   0.18% )
i464.h264ref       3.09s    3.29s  (   6.47%,  -6.08% )
i471.omnetpp      11.20s   11.16s  (  -0.36%,   0.36% )
i473.astar         7.58s    7.56s  (  -0.26%,   0.26% )
i483.xalancbmk     7.43s    7.49s  (   0.81%,  -0.80% )

--

i401.bzip2 22375 insns +0 changed
i458.sjeng 38701 insns -8 smaller
f482.sphinx3 52038 insns +7 BIGGER!, 1 funcs bigger (max +7 insns)
i456.hmmer 84421 insns +0 changed
f436.cactusADM 201172 insns -6 smaller, 11 funcs bigger (max +5 insns)
f435.gromacs 249282 insns -3 smaller, 1 funcs bigger (max +2 insns)
i471.omnetpp 137988 insns -86 smaller, 3 funcs bigger (max +2 insns)
i445.gobmk 247886 insns +11 BIGGER!, 6 funcs bigger (max +17 insns)
f450.soplex 127628 insns +3 BIGGER!, 2 funcs bigger (max +2 insns)
f453.povray 245457 insns -2 smaller, 3 funcs bigger (max +3 insns)
i400.perlbench 304597 insns +249 BIGGER!, 17 funcs bigger (max +419 insns)
f454.calculix 40 insns identical
i464.h264ref 316393 insns -3 smaller, 4 funcs bigger (max +7 insns)
i403.gcc 796623 insns +798 BIGGER!, 31 funcs bigger (max +977 insns)
f465.tonto 1141420 insns +0 changed
f447.dealII 764301 insns +1 BIGGER!, 11 funcs bigger (max +139 insns)
f481.wrf 1084840 insns -11 smaller
i483.xalancbmk 919877 insns +3 BIGGER!, 1 funcs bigger (max +12 insns)
f416.gamess 2562020 insns +2 BIGGER!, 1 funcs bigger (max +2 insns)

statistics:
---
29  tests (total)
8   test executables have grown (more insns)
7   test executables have shrunk (fewer insns)
10141266  insns total (old)
+955      insns difference
+94       insns per 1,000,000
+38       weighted insns per 1,000,000 *
92        functions have grown (total) **
+977      insns in most grown function

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-02-01 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #35 from Dominik Vogt  ---
r244167 vs. r244166 (comment 21)
---
                  run-old.result  run-new.result
f410.bwaves        1.27s    1.27s  (   0.00%,   0.00% )
f416.gamess        6.87s    6.87s  (   0.00%,   0.00% )
f433.milc          5.57s    5.57s  (   0.00%,   0.00% )
f434.zeusmp        2.18s    2.19s  (   0.46%,  -0.46% )
f435.gromacs       1.34s    1.34s  (   0.00%,   0.00% )
f436.cactusADM    24.71s   24.69s  (  -0.08%,   0.08% )
f437.leslie3d      2.76s    2.76s  (   0.00%,   0.00% )
f444.namd         12.13s   12.13s  (   0.00%,   0.00% )
f447.dealII        2.04s    2.03s  (  -0.49%,   0.49% )
f450.soplex        3.91s    3.98s  (   1.79%,  -1.76% )
f453.povray        2.90s    2.89s  (  -0.34%,   0.35% )
f454.calculix     17.29s   17.29s  (   0.00%,   0.00% )
f459.GemsFDTD      7.27s    7.30s  (   0.41%,  -0.41% )
f465.tonto         0.94s    0.94s  (   0.00%,   0.00% )
f470.lbm           2.65s    2.66s  (   0.38%,  -0.38% )
f481.wrf           3.84s    3.84s  (   0.00%,   0.00% )
f482.sphinx3      10.62s   10.62s  (   0.00%,   0.00% )
i400.perlbench     7.27s    7.34s  (   0.96%,  -0.95% )
i401.bzip2         3.97s    3.97s  (   0.00%,   0.00% )
i403.gcc           1.01s    1.01s  (   0.00%,   0.00% )
i429.mcf           1.49s    1.49s  (   0.00%,   0.00% )
i445.gobmk         3.59s    3.59s  (   0.00%,   0.00% )
i456.hmmer         1.57s    1.59s  (   1.27%,  -1.26% )
i458.sjeng         3.77s    3.76s  (  -0.27%,   0.27% )
i462.libquantum   17.09s   17.14s  (   0.29%,  -0.29% )
i464.h264ref       3.09s    3.09s  (   0.00%,   0.00% )
i471.omnetpp      11.16s   11.25s  (   0.81%,  -0.80% )
i473.astar         7.56s    7.58s  (   0.26%,  -0.26% )
i483.xalancbmk     7.80s    7.37s  (  -5.51%,   5.83% )

--

i471.omnetpp 138049 insns -75 smaller, 26 funcs bigger (max +202 insns)
f450.soplex 127589 insns +43 BIGGER!, 13 funcs bigger (max +108 insns)
f453.povray 245456 insns +10 BIGGER!, 5 funcs bigger (max +5 insns)
f447.dealII 764156 insns +150 BIGGER!, 35 funcs bigger (max +1800 insns)
i483.xalancbmk 921932 insns -2045 smaller, 391 funcs bigger (max +1513 insns)

command line:
-
  # ak-scripts/compare.sh --suffix -244167-244166 -r 10 -o
/home/vogt/src/gcc/install-244166 -n /home/vogt/src/gcc/install-244167 -c -O3
-march=zEC12 -funroll-loops

output files:
-
executable diff:
/home/vogt/src/minispec-2006/diff-31012017.result-244167-244166
functions grown:
/home/vogt/src/minispec-2006/funcs-grown-31012017.result-244167-244166
build times:
/home/vogt/src/minispec-2006/buildtime-31012017.result-244167-244166

statistics:
---
29  tests (total)
3   test executables have grown (more insns)
2   test executables have shrunk (fewer insns)
10143197  insns total (old)
-1917     insns difference
-188      insns per 1,000,000
-77       weighted insns per 1,000,000 *
470       functions have grown (total) **
+1800     insns in most grown function

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-02-01 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #34 from Dominik Vogt  ---
Some Spec2006 results on s390x (zEC12) for the files:

r243995 vs. r243994 (comment 14)
---
                  run-old.result  run-new.result
f410.bwaves        1.27s    1.28s  (   0.79%,  -0.78% )
f416.gamess        7.09s    6.61s  (  -6.77%,   7.26% )
f433.milc          5.53s    5.54s  (   0.18%,  -0.18% )
f434.zeusmp        2.19s    2.19s  (   0.00%,   0.00% )
f435.gromacs       1.34s    1.34s  (   0.00%,   0.00% )
f436.cactusADM    24.63s   24.67s  (   0.16%,  -0.16% )
f437.leslie3d      2.76s    2.75s  (  -0.36%,   0.36% )
f444.namd         12.13s   12.13s  (   0.00%,   0.00% )
f447.dealII        2.03s    2.02s  (  -0.49%,   0.50% )
f450.soplex        3.92s    3.96s  (   1.02%,  -1.01% )
f453.povray        2.89s    2.87s  (  -0.69%,   0.70% )
f454.calculix     17.32s   17.23s  (  -0.52%,   0.52% )
f459.GemsFDTD      7.24s    7.19s  (  -0.69%,   0.70% )
f465.tonto         0.94s    0.94s  (   0.00%,   0.00% )
f470.lbm           2.65s    2.66s  (   0.38%,  -0.38% )
f481.wrf           3.84s    3.85s  (   0.26%,  -0.26% )
f482.sphinx3      10.50s   10.54s  (   0.38%,  -0.38% )
i400.perlbench     7.58s    7.37s  (  -2.77%,   2.85% )
i401.bzip2         3.98s    3.95s  (  -0.75%,   0.76% )
i403.gcc           1.01s    1.00s  (  -0.99%,   1.00% )
i429.mcf           1.49s    1.49s  (   0.00%,   0.00% )
i445.gobmk         3.56s    3.62s  (   1.69%,  -1.66% )
i456.hmmer         1.59s    1.57s  (  -1.26%,   1.27% )
i458.sjeng         3.81s    3.84s  (   0.79%,  -0.78% )
i462.libquantum   17.13s   17.46s  (   1.93%,  -1.89% )
i464.h264ref       3.14s    3.31s  (   5.41%,  -5.14% )
i471.omnetpp      11.50s   11.51s  (   0.09%,  -0.09% )
i473.astar         7.22s    7.54s  (   4.43%,  -4.24% )
i483.xalancbmk     7.51s    8.15s  (   8.52%,  -7.85% )

--

f470.lbm 2984 insns +0 changed
i429.mcf 4165 insns +346 BIGGER!, 3 funcs bigger (max +339 insns)
i462.libquantum 11735 insns -7 smaller, 4 funcs bigger (max +3 insns)
i473.astar 12460 insns -181 smaller, 12 funcs bigger (max +9 insns)
i401.bzip2 22439 insns -13 smaller, 6 funcs bigger (max +25 insns)
f437.leslie3d 28725 insns -21 smaller
i458.sjeng 38864 insns -144 smaller, 10 funcs bigger (max +26 insns)
f433.milc 35091 insns -262 smaller, 7 funcs bigger (max +6 insns)
f482.sphinx3 51879 insns +139 BIGGER!, 16 funcs bigger (max +326 insns)
i456.hmmer 85157 insns -677 smaller, 26 funcs bigger (max +463 insns)
f444.namd 76220 insns +0 changed, 1 funcs bigger (max +11 insns)
f434.zeusmp 73937 insns +6 BIGGER!, 1 funcs bigger (max +7 insns)
f459.GemsFDTD 111465 insns -151 smaller, 3 funcs bigger (max +21 insns)
f436.cactusADM 201648 insns -638 smaller, 69 funcs bigger (max +103 insns)
f435.gromacs 250725 insns -1425 smaller, 64 funcs bigger (max +316 insns)
i471.omnetpp 135992 insns +2095 BIGGER!, 70 funcs bigger (max +2101 insns)
i445.gobmk 249112 insns -726 smaller, 188 funcs bigger (max +1282 insns)
f450.soplex 131531 insns -3584 smaller, 33 funcs bigger (max +546 insns)
f453.povray 247399 insns -1827 smaller, 118 funcs bigger (max +1481 insns)
i400.perlbench 305683 insns -886 smaller, 207 funcs bigger (max +480 insns)
f454.calculix 478026 insns -837 smaller, 97 funcs bigger (max +1036 insns)
i464.h264ref 316483 insns +35 BIGGER!, 44 funcs bigger (max +2229 insns)
i403.gcc 800574 insns -4048 smaller, 514 funcs bigger (max +1614 insns)
f465.tonto 1138432 insns -970 smaller, 139 funcs bigger (max +653 insns)
f447.dealII 764322 insns -873 smaller, 248 funcs bigger (max +2980 insns)
f481.wrf 1081604 insns +32 BIGGER!, 126 funcs bigger (max +404 insns)
i483.xalancbmk 919758 insns +1914 BIGGER!, 731 funcs bigger (max +1835 insns)
f416.gamess 2553939 insns +9369 BIGGER!, 328 funcs bigger (max +9157 insns)

statistics:
---
29  tests (total)
8   test executables have grown (more insns)
18  test executables have shrunk (fewer insns)
10140169  insns total (old)
-3334     insns difference
-328      insns per 1,000,000
+435      weighted insns per 1,000,000 *
3065      functions have grown (total) **
+9157     insns in most grown function
 * Each test case is scaled to 100 insns.  The displayed number is the
   average of all tests.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-19 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #33 from wilco at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #32)
> Apparently fixed.  The Coremark issue is PR77445.

Yes, my SPEC2006 results look good, no real change. Coremark is now up by 20%
or more, thanks for that :-)

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-19 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

Jan Hubicka  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #32 from Jan Hubicka  ---
Apparently fixed.  The Coremark issue is PR77445.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #31 from Wilco  ---
(In reply to Jan Hubicka from comment #30)
> > 
> > When I looked at gap at the time, the main change was the reordering of a 
> > few
> > if statements in several hot functions. Incorrect block frequencies also 
> > change
> > register allocation in a bad way, but I didn't notice anything obvious in 
> > gap.
> > And many optimizations are being disabled on blocks with an incorrect 
> > frequency
> > - this happens all over the place and is the issue causing the huge Coremark
> > regression.
> 
> This is the issue with jump threading code no longer sanely updating profile,
> right?  I will try to find time to look into it this week.

I don't know the exact details, but James proved that the blocks are incorrectly
assumed to be cold, so part of the optimization doesn't trigger as expected.  I'm
not sure whether that is because the frequencies got too low, were set
incorrectly, or were not set at all.

> > I could do some experiments but I believe the key underlying problem is that
> > GCC treats the block frequencies as accurate when they are really very vague
> > estimates (often incorrect) and so should only be used to break ties.
> > 
> > In fact I would claim that even modelling if-statements as a balanced 50/50 
> > is
> > incorrect. It suggests that a block that is guarded by multiple 
> > if-statements
> > handling exceptional cases is much less important than the very same block 
> > that
> > isn't, even if they are both always executed. Without profile data providing
> > actual frequencies we should not optimize the outer block for speed and the
> > inner block for size.
> 
> There are --param options to control this.  They were originally tuned based
> on Spec2000 and x86_64 scores (in the GCC 3.x timeframe).  If you can get
> reasonable data that they are not working very well anymore (or for ARM), we
> could try to tune them better.
> 
> I have WIP patches to make the propagation a bit more fine grained and
> propagate, e.g., info on whether a BB is reachable only by a path known to be
> cold (such as one that has an EH edge on it).  This may make the logic a bit
> more reliable.

I'll have a look, but I think the key is to think in terms of block importance
(from cold to hot).  Apart from highly skewed cases (e.g. exception edges or
loops), most blocks should be equally important to optimize.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #30 from Jan Hubicka  ---
> 
> When I looked at gap at the time, the main change was the reordering of a few
> if statements in several hot functions. Incorrect block frequencies also 
> change
> register allocation in a bad way, but I didn't notice anything obvious in gap.
> And many optimizations are being disabled on blocks with an incorrect 
> frequency
> - this happens all over the place and is the issue causing the huge Coremark
> regression.

This is the issue with jump threading code no longer sanely updating profile,
right?  I will try to find time to look into it this week.
> 
> I could do some experiments but I believe the key underlying problem is that
> GCC treats the block frequencies as accurate when they are really very vague
> estimates (often incorrect) and so should only be used to break ties.
> 
> In fact I would claim that even modelling if-statements as a balanced 50/50 is
> incorrect. It suggests that a block that is guarded by multiple if-statements
> handling exceptional cases is much less important than the very same block 
> that
> isn't, even if they are both always executed. Without profile data providing
> actual frequencies we should not optimize the outer block for speed and the
> inner block for size.

There are --param options to control this.  They were originally tuned based on
Spec2000 and x86_64 scores (in the GCC 3.x timeframe).  If you can get
reasonable data that they are not working very well anymore (or for ARM), we
could try to tune them better.

I have WIP patches to make the propagation a bit more fine grained and
propagate, e.g., info on whether a BB is reachable only by a path known to be
cold (such as one that has an EH edge on it).  This may make the logic a bit
more reliable.

Honza

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #29 from Wilco  ---
(In reply to Jan Hubicka from comment #28)
> > On SPEC2000 the latest changes look good, compared to the old predictor gap
> > improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight.
> 
> It is rather surprising you are seeing such large changes for one branch
> predictor change.  Is most of it really coming just from the bb-reorder
> changes?  On x86 the effect is mostly within noise, and on Itanium Gap
> improves by 2-3%.  It may be interesting to experiment with reordering and
> prediction more on this target.

When I looked at gap at the time, the main change was the reordering of a few
if statements in several hot functions. Incorrect block frequencies also change
register allocation in a bad way, but I didn't notice anything obvious in gap.
And many optimizations are being disabled on blocks with an incorrect frequency
- this happens all over the place and is the issue causing the huge Coremark
regression.

I could do some experiments but I believe the key underlying problem is that
GCC treats the block frequencies as accurate when they are really very vague
estimates (often incorrect) and so should only be used to break ties.

In fact I would claim that even modelling if-statements as a balanced 50/50 is
incorrect. It suggests that a block that is guarded by multiple if-statements
handling exceptional cases is much less important than the very same block that
isn't, even if they are both always executed. Without profile data providing
actual frequencies we should not optimize the outer block for speed and the
inner block for size.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #28 from Jan Hubicka  ---
> On SPEC2000 the latest changes look good, compared to the old predictor gap
> improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight.

It is rather surprising you are seeing such large changes for one branch
predictor change.  Is most of it really coming just from the bb-reorder
changes?  On x86 the effect is mostly within noise, and on Itanium Gap
improves by 2-3%.  It may be interesting to experiment with reordering and
prediction more on this target.

Honza

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #27 from wilco at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #26)
> Hello, did the Gap scores improve on arm too?  Both the Itanium and PPC
> testers seem to show improved gap scores, so I hope arm and the other ppc
> tester do too.

On SPEC2000 the latest changes look good, compared to the old predictor gap
improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-14 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #26 from Jan Hubicka  ---
Hello, did the Gap scores improve on arm too?  Both the Itanium and PPC testers
seem to show improved gap scores, so I hope arm and the other ppc tester do too.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-10 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #25 from Jan Hubicka  ---
Author: hubicka
Date: Tue Jan 10 09:14:54 2017
New Revision: 244260

URL: https://gcc.gnu.org/viewcvs?rev=244260&root=gcc&view=rev
Log:
PR middle-end/77484
* predict.def (PRED_CALL): Set to 67.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/predict.def
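For context, predictor definitions in gcc/predict.def are DEF_PREDICTOR macro invocations, and this commit bumps the hitrate of the call predictor to 67.  A sketch of what the changed entry plausibly looks like (from memory, not a verbatim copy; the FLAGS argument in the real file may differ):

```c
/* Sketch of the predict.def entry touched by r244260.
   The macro shape is DEF_PREDICTOR (ENUM, NAME, HITRATE, FLAGS).  */
DEF_PREDICTOR (PRED_CALL, "call", HITRATE (67), 0)
```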

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-08 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #24 from Jan Hubicka  ---
Author: hubicka
Date: Sun Jan  8 09:53:06 2017
New Revision: 244207

URL: https://gcc.gnu.org/viewcvs?rev=244207&root=gcc&view=rev
Log:
PR middle-end/77484
* predict.def (PRED_INDIR_CALL): Set to 86.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/predict.def

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-07 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #23 from Markus Trippelsdorf  ---
Unfortunately vmakarov's SPEC tester is currently stalled for most archs.
However, it still works for POWER7, and there r244167 shows no effect.

https://vmakarov.fedorapeople.org/spec/spec2000.ibm-p730-05-lp5/gcc/home.html

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-07 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #22 from Dominik Vogt  ---
> Is changing one a day enough for periodic testers to catch up?

I'll try to keep up with testing.

> New Revision: 244167

Which numbers do you need: r244167 vs. r244166, vs. r243994, or both?  (If I'm
supposed to run the statistics script, I'd need a pointer to where to find it
and how to run it.)

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-06 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #21 from Jan Hubicka  ---
Author: hubicka
Date: Fri Jan  6 16:10:09 2017
New Revision: 244167

URL: https://gcc.gnu.org/viewcvs?rev=244167&root=gcc&view=rev
Log:
PR middle-end/77484
* predict.def (PRED_POLYMORPHIC_CALL): Set to 58
* predict.c (tree_estimate_probability_bb): Reverse direction of
polymorphic call predictor.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/predict.c
trunk/gcc/predict.def

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-06 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #20 from Jan Hubicka  ---
Hi,
it turns out that Martin added another column to his statistics script, which I
had misinterpreted.
https://gcc.opensuse.org/SPEC/CINT/sb-terbium-head-64/recent.html also shows an
interesting reaction to the change.  I will update the probabilities to the
correct values one by one, and let us see how the benchmarks react.  Is
changing one a day enough for periodic testers to catch up?

The hitrates on spec2k6 combined with spec v6 are as follows.  They mean that
indirect call should have 14%, call 67%, and polymorphic call should have the
opposite outcome and 59%.  The statistic samples are quite small and dominated
by one use, so we may diverge from those values if it seems to make sense.

Honza


COMBINED

HEURISTICS                                 BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)
loop guard with recursion                        13   0.0%       92.31%   85.06% /  85.06%     6672440267    6.67G   0.4%
Fortran loop preheader                           42   0.0%       97.62%   97.78% /  99.07%        1448718    1.45M   0.0%
loop iv compare                                  71   0.1%       78.87%   49.26% /  63.89%      163168352  163.17M   0.0%
loop exit with recursion                         85   0.1%       75.29%   84.89% /  86.96%     9873741435    9.87G   0.6%
extra loop exit                                  98   0.1%       71.43%   31.39% /  76.96%      312263024  312.26M   0.0%
recursive call                                  121   0.1%       64.46%   37.55% /  82.78%      531996473  532.00M   0.0%
Fortran repeated allocation/deallocation        392   0.4%      100.00%  100.00% / 100.00%            630   630.00   0.0%
guess loop iv compare                           402   0.4%       90.05%   95.82% /  96.04%     5683151771    5.68G   0.3%
indirect call                                   425   0.4%       52.00%   14.06% /  91.41%     3815332963    3.82G   0.2%
Fortran zero-sized array                        549   0.6%       99.64%  100.00% / 100.00%       20794317   20.79M   0.0%
const return                                    651   0.7%       94.93%   84.04% /  93.18%     1067774653    1.07G   0.1%
null return                                     716   0.7%       92.18%   91.83% /  93.39%     3421321956    3.42G   0.2%
negative return                                 734   0.8%       97.14%   64.62% /  65.01%     4744886315    4.74G   0.3%
continue                                        773   0.8%       66.11%   79.71% /  87.43%    29089649102   29.09G   1.6%
polymorphic call                                803   0.8%       43.59%   59.05% /  86.63%     3828555030    3.83G   0.2%
Fortran fail alloc                              944   1.0%      100.00%  100.00% / 100.00%         167691  167.69K   0.0%
Fortran overflow                               1237   1.3%      100.00%  100.00% / 100.00%       55197159   55.20M   0.0%
loop guard                                     1861   1.9%       48.90%   69.86% /  84.62%    17979028127   17.98G   1.0%
noreturn call                                  3769   3.9%       99.95%  100.00% / 100.00%     8175053425    8.18G   0.5%
loop exit                                      5017   5.2%       83.50%   90.01% /  91.68%   143815212145  143.82G   8.1%
opcode values positive (on trees)              5763   6.0%       66.23%   60.41% /  86.25%    43349211449   43.35G   2.4%
loop iterations                                6276   6.6%       99.94%   78.54% /  78.54%   662671304506  662.67G  37.3%
early return (on trees)                        9512   9.9%       61.02%   58.05% /  85.82%    56222551226   56.22G   3.2%
pointer (on trees)                            10969  11.5%       63.29%   75.18% /  89.33%    23318221976   23.32G   1.3%
opcode values nonequal (on trees)             11633  12.2%       63.23%   74.53% /  85.23%   118246028039  118.25G   6.7%
guessed loop iterations                       13631  14.2%       96.54%   92.61% /  93.12%   417886815060  417.89G  23.5%
call                                          20747  21.7%       54.47%   67.24% /  92.56%    50096613362   50.10G   2.8%

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-05 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #19 from wilco at gcc dot gnu.org ---
> The commit in comment 14 has introduced size and runtime regressions in the
> SPEC2006 testsuite on s390x:

I get reproducible regressions on AArch64 as well with the latest patch
(changes >0.5%):

400.perlbench   -1.26%
403.gcc -3.16%
445.gobmk   -2.70%
458.sjeng   1.65%
464.h264ref -0.78%
453.povray  -2.65%

It seems this is worse than the earlier versions of the patch which all used
NOT_TAKEN.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-03 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #18 from Dominik Vogt  ---
(The perlbench result looks like a bad measurement result; we sometimes have
this on devel machine for unknown reasons, possibly when someone compiles or
tests on a different partition.)

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-03 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #17 from Dominik Vogt  ---
Can you make sense of these results?  The size of gamess has not changed, but
the runtime has, and it still looks noticeably worse.  The astar performance
looks similar to yesterday's result without the change from comment 16.

--
Diffing i473.astar 9458 old.s 4936 smaller (-294 lines) (11 funcs bigger)
Diffing i458.sjeng 34678 old.s 23820 smaller (-197 lines) (7 funcs bigger)
Diffing i445.gobmk 216664 old.s 152439 smaller (-1352 lines) (178 funcs bigger)
Diffing i400.perlbench 248535 old.s 232015 smaller (-1201 lines) (209 funcs
bigger)
Diffing i483.xalancbmk 743937 old.s 374620 BIGGER! (+404 lines) (630 funcs
bigger)
Diffing f416.gamess 1913604 old.s 1175120 BIGGER! (+7805 lines) (327 funcs
bigger)
--   run-old.resultrun-new.result
f416.gamess 6.55s6.70s (   2.29%,  -2.24% )
i400.perlbench  7.69s7.20s (  -6.37%,   6.81% )
i445.gobmk  3.65s3.55s (  -2.74%,   2.82% )
i458.sjeng  3.83s3.75s (  -2.09%,   2.13% )
i473.astar  7.34s7.61s (   3.68%,  -3.55% )
i483.xalancbmk  7.62s7.55s (  -0.92%,   0.93% )
--

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-02 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #16 from Jan Hubicka  ---
>run-old.resultrun-new.result
> f416.gamess 6.55s6.70s (   2.29%,  -2.24% )
> i400.perlbench  7.17s7.37s (   2.79%,  -2.71% )
> i445.gobmk  3.64s3.55s (  -2.47%,   2.54% )
> i458.sjeng  3.83s3.75s (  -2.09%,   2.13% )
> i473.astar  7.33s7.62s (   3.96%,  -3.81% )

I can imagine perlbench to have indirect call in the internal loop, but the
other
benchmarks may be just a noise for reducing the hitrate down.

> i483.xalancbmk  7.47s8.06s (   7.90%,  -7.32% )

This however is probably a bug.  Does it help to change the direction of
predictor
for polymorphic calls back to likely not taken?
Index: predict.c
===================================================================
--- predict.c   (revision 244002)
+++ predict.c   (working copy)
@@ -2789,7 +2789,7 @@ tree_estimate_probability_bb (basic_bloc
  if (gimple_call_fndecl (stmt))
predict_edge_def (e, PRED_CALL, NOT_TAKEN);
  else if (virtual_method_call_p (gimple_call_fn (stmt)))
-   predict_edge_def (e, PRED_POLYMORPHIC_CALL, TAKEN);
+   predict_edge_def (e, PRED_POLYMORPHIC_CALL, NOT_TAKEN);
  else
predict_edge_def (e, PRED_INDIR_CALL, TAKEN);
  break;


Honza

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-02 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

Dominik Vogt  changed:

   What|Removed |Added

 CC||vogt at linux dot vnet.ibm.com

--- Comment #15 from Dominik Vogt  ---
The commit in comment 14 has introduced size and runtime regressions in the
SPEC2006 testsuite on s390x:

Runtime (only changes > 2%):

   run-old.resultrun-new.result
f416.gamess 6.55s6.70s (   2.29%,  -2.24% )
i400.perlbench  7.17s7.37s (   2.79%,  -2.71% )
i445.gobmk  3.64s3.55s (  -2.47%,   2.54% )
i458.sjeng  3.83s3.75s (  -2.09%,   2.13% )
i473.astar  7.33s7.62s (   3.96%,  -3.81% )
i483.xalancbmk  7.47s8.06s (   7.90%,  -7.32% )

Executable size ("+/- lines" means number of instructions):

f470.lbm: 2718 old.s 27 changed (+0 lines)
i429.mcf: 2801 old.s 2091 BIGGER! (+346 lines) (2 funcs bigger)
i462.libquantum: 8200 old.s 2723 smaller (-15 lines)
i473.astar: 9458 old.s 4936 smaller (-294 lines) (11 funcs bigger)
f410.bwaves: 9035 old.s 0 identical (+/- 0 lines)
i401.bzip2: 18190 old.s 8032 smaller (-11 lines) (6 funcs bigger)
f437.leslie3d: 19536 old.s 1939 smaller (-14 lines)
i458.sjeng: 34678 old.s 23820 smaller (-197 lines) (7 funcs bigger)
f433.milc: 29745 old.s 14898 smaller (-276 lines) (5 funcs bigger)
f482.sphinx3: 37726 old.s 23881 BIGGER! (+115 lines) (15 funcs bigger)
i456.hmmer: 64427 old.s 33803 smaller (-698 lines) (28 funcs bigger)
f444.namd: 55512 old.s 1785 smaller (-2 lines) (1 funcs bigger)
f434.zeusmp: 63606 old.s 2764 BIGGER! (+4 lines) (1 funcs bigger)
f459.GemsFDTD: 76971 old.s 32948 BIGGER! (+30 lines) (3 funcs bigger)
f436.cactusADM: 148768 old.s 60861 smaller (-547 lines) (68 funcs bigger)
f435.gromacs: 198339 old.s 86425 smaller (-1483 lines) (62 funcs bigger)
i471.omnetpp: 118737 old.s 37232 BIGGER! (+1879 lines) (59 funcs bigger)
i445.gobmk: 216664 old.s 152439 smaller (-1352 lines) (178 funcs bigger)
f450.soplex: 94178 old.s 55926 smaller (-2624 lines) (39 funcs bigger)
f453.povray: 221353 old.s 144618 smaller (-1680 lines) (118 funcs bigger)
i400.perlbench: 248535 old.s 232015 smaller (-1201 lines) (209 funcs bigger)
f454.calculix: 372030 old.s 222377 smaller (-788 lines) (96 funcs bigger)
i464.h264ref: 302278 old.s 152578 BIGGER! (+51 lines) (45 funcs bigger)
i403.gcc: 715454 old.s 614572 smaller (-2639 lines) (504 funcs bigger)
f465.tonto: 760124 old.s 174792 smaller (-987 lines) (140 funcs bigger)
f447.dealII: 553779 old.s 247834 smaller (-1134 lines) (246 funcs bigger)
f481.wrf: 811803 old.s 238154 smaller (-76 lines) (120 funcs bigger)
i483.xalancbmk: 743937 old.s 441474 BIGGER! (+2434 lines) (733 funcs bigger)
f416.gamess: 1913604 old.s 1175120 BIGGER! (+7805 lines) (327 funcs bigger)

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-01 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #14 from Jan Hubicka  ---
Author: hubicka
Date: Sun Jan  1 15:40:29 2017
New Revision: 243995

URL: https://gcc.gnu.org/viewcvs?rev=243995&root=gcc&view=rev
Log:

PR middle-end/77484
* predict.def (PRED_CALL): Update hitrate.
(PRED_INDIR_CALL, PRED_POLYMORPHIC_CALL): New predictors.
* predict.c (tree_estimate_probability_bb): Split CALL predictor
into direct/indirect/polymorphic variants.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/predict.c
trunk/gcc/predict.def
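For readers unfamiliar with predict.def, its entries use GCC's DEF_PREDICTOR (enum, name, hitrate, flags) macro shape. The fragment below is an illustrative sketch of what this commit's split looks like; the hitrate values are taken from the analyze_brprob measurements discussed in this thread, not copied verbatim from r243995.

```c
/* Illustrative only -- numbers are from the measurements quoted in this
   thread, and the committed entries may differ.  */
DEF_PREDICTOR (PRED_CALL, "call", HITRATE (67), 0)
DEF_PREDICTOR (PRED_INDIR_CALL, "indirect call", HITRATE (86), 0)
DEF_PREDICTOR (PRED_POLYMORPHIC_CALL, "polymorphic call", HITRATE (59), 0)
```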

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|6.3 |6.4

--- Comment #13 from Jakub Jelinek  ---
GCC 6.3 is being released, adjusting target milestone.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-15 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #12 from wilco at gcc dot gnu.org ---
(In reply to wilco from comment #10)
> (In reply to Jan Hubicka from comment #9)
> > Created attachment 40217 [details]
> > predict
> > 
> > Hi,
> > here is patch adding the polymorphic case, too.
> > 
> > Honza
> 
> Looks good - gap still improves by 12%, SPECINT2k by 0.5%, SPECFP2k flat. So
> that fixes this issue.

I also ran SPEC2006 which didn't show any differences.

(In reply to Martin Liška from comment #11)
> I'm planning to run SPEC benchmarks late this week to find a proper value
> for the new predictor.

Any news on that? I ran SPEC2006 as well with the suggested values, and this
didn't show any differences.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-06 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #11 from Martin Liška  ---
I'm planning to run SPEC benchmarks late this week to find a proper value for
the new predictor.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-06 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #10 from wilco at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #9)
> Created attachment 40217 [details]
> predict
> 
> Hi,
> here is patch adding the polymorphic case, too.
> 
> Honza

Looks good - gap still improves by 12%, SPECINT2k by 0.5%, SPECFP2k flat. So
that fixes this issue.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-01 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #9 from Jan Hubicka  ---
Hi,
here is patch adding the polymorphic case, too.

Honza

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-01 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #8 from Jan Hubicka  ---
> Yes that's it, a single run shows 12% speedup with this patch!

Looks promising.  We probably should try to differentiate from polymorphic
calls, as virtual methods are also used in different patterns.  Let me cook
the patch.

Honza

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #7 from wilco at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #6)
> Created attachment 40216 [details]
> predict
> 
> Aha, indirect calls should probably be treated separately, as their use cases
> are quite special.  What about this patch?  (Martin, it would be great if you
> could run analyze_brprob for it.)

Yes that's it, a single run shows 12% speedup with this patch!

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-01 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

--- Comment #6 from Jan Hubicka  ---
Aha, indirect calls should probably be treated separately, as their use cases
are quite special.  What about this patch?  (Martin, it would be great if you
could run analyze_brprob for it.)

Honza

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-01 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

wilco at gcc dot gnu.org changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #5 from wilco at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #4)
> Wilco,
> do you have a specific function where this happens?
> 
> Martin,
> do you know what is hitrate of call predictor here?  I am not sure how much
> we can do about this (it is all heuristics after all).

Top functions in the profile with this issue are EvElmList, Sum, EvAssList,
Diff and Prod.  It's the macro EVAL: it does a test and then an indirect call.
If it is used multiple times in a row, all the extra taken branches start to
take their toll.

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-12-01 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||hubicka at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org

--- Comment #4 from Jan Hubicka  ---
Wilco,
do you have a specific function where this happens?

Martin,
do you know what is hitrate of call predictor here?  I am not sure how much we
can do about this (it is all heuristics after all).

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-09-15 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

Ramana Radhakrishnan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-09-15
 CC||ramana at gcc dot gnu.org
 Ever confirmed|0   |1
  Known to fail||6.0, 7.0

--- Comment #3 from Ramana Radhakrishnan  ---
Confirmed then

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-09-11 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Target Milestone|--- |6.3
Summary|Static branch predictor |[6/7 Regression] Static
   |causes ~6-8% regression of  |branch predictor causes
   |SPEC2000 GAP|~6-8% regression of
   ||SPEC2000 GAP