Linux 4.10.1 (includes the SMT scheduler fix)
GCC 5.4, so no Ryzen-specific optimizations

pachi-git-13115394
Intel Haswell
t=8   13325 g/s   per thread 1665 g/s   @3.6GHz
t=4    9352 g/s   per thread 2338 g/s   @3.6GHz
t=1    2542 g/s                         @3.8GHz

AMD Ryzen
t=16  26589 g/s   per thread 1661 g/s   @3.7GHz
t=8   15464 g/s   per thread 1933 g/s   @3.7GHz
t=4    8141 g/s   per thread 2035 g/s   @3.7GHz
t=1    2221 g/s                         @3.7GHz
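The per-thread figures and the SMT gains can be recomputed from the
totals above. A minimal Python sketch, assuming the Haswell part has
4 cores / 8 threads and the Ryzen part 8 cores / 16 threads (inferred
from the highest thread counts tested, not stated in the post):

```python
# Pachi throughput (games/s) from the runs above, keyed by thread count.
# ASSUMPTION: core counts are inferred from the highest thread count tested
# (4C/8T Haswell, 8C/16T Ryzen); the post does not state them.
HASWELL = {8: 13325, 4: 9352, 1: 2542}
RYZEN = {16: 26589, 8: 15464, 4: 8141, 1: 2221}


def per_thread(rates):
    """Total games/s divided by thread count, truncated like the quoted figures."""
    return {threads: total // threads for threads, total in rates.items()}


def smt_gain(rates, cores):
    """Relative speedup from 2 threads per core versus 1 thread per core."""
    return rates[2 * cores] / rates[cores] - 1.0


for name, rates, cores in (("Haswell", HASWELL, 4), ("Ryzen", RYZEN, 8)):
    print(name, "per thread:", per_thread(rates))
    print(name, f"SMT gain: {smt_gain(rates, cores):+.0%}")
```

Note that the Haswell single-thread run was reported at 3.8GHz rather
than 3.6GHz, so its per-thread figure is not exactly clock-for-clock
with the multi-threaded rows.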

Leela 0.9.4
Intel Haswell @3.8GHz (OpenBLAS, Haswell kernel)
1 thread
benchmark = 5685 g/s           (mostly INT)
netbench predictions = 33 p/s  (DCNN AVX2 FPU)
netbench evaluations = 238 p/s (DCNN AVX2 FPU)

AMD Ryzen @3.7GHz (OpenBLAS, Haswell kernel)
1 thread
benchmark = 5099 g/s           (mostly INT)
netbench predictions = 27 p/s  (DCNN AVX2 FPU)
netbench evaluations = 239 p/s (DCNN AVX2 FPU)
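Normalizing the single-thread results by the reported clock speed gives
a rough per-clock (IPC) comparison. A minimal Python sketch, assuming
throughput scales linearly with clock and that each run actually ran at
the clock listed:

```python
# Single-thread results: (games/s, reported clock in GHz).
# ASSUMPTION: throughput scales linearly with clock, so games/s per GHz
# is a reasonable per-clock (IPC) proxy.
SINGLE_THREAD = {
    "pachi": {"Haswell": (2542, 3.8), "Ryzen": (2221, 3.7)},
    "Leela benchmark": {"Haswell": (5685, 3.8), "Ryzen": (5099, 3.7)},
}


def per_clock(rate, ghz):
    """Games/s per GHz."""
    return rate / ghz


def per_clock_rates(results):
    """Return (Haswell, Ryzen) games/s per GHz for one benchmark."""
    return per_clock(*results["Haswell"]), per_clock(*results["Ryzen"])


for bench, results in SINGLE_THREAD.items():
    haswell, ryzen = per_clock_rates(results)
    print(f"{bench}: Haswell leads by {haswell / ryzen - 1:.1%}, "
          f"Ryzen trails by {1 - ryzen / haswell:.1%}")
```

Measured as Haswell's lead, this comes out at roughly 8.6% (Leela) and
11.4% (pachi), matching the 8.5% to 11.5% range in the observations;
measured as Ryzen's shortfall relative to Haswell, the same data gives
roughly 7.9% and 10.3%.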

Observations:
- SMT scaling on Ryzen appears to be extremely good: on pachi, going from
8 to 16 threads gains 72%, vs. 42% for Intel going from 4 to 8 threads.
- Single-core IPC (single-thread pachi and Leela benchmark, normalized for
clock speed) is 8.5% to 11.5% behind Haswell.
- Ryzen's AVX2 performance is too good. Ryzen has 2 x 128-bit FPUs vs.
2 x 256-bit FPUs for Haswell, and the majority of the time in Leela 0.9
is spent in SGEMM, which is an ideal case for AVX2 code. I would have
predicted Ryzen to be about half as fast on the AVX2 workloads, but its
results are extremely competitive (27 vs. 33 p/s for predictions) or
even slightly better (239 vs. 238 p/s for evaluations). I have no real
explanation for this; my best guess is that Ryzen wins thanks to a cache
subsystem that better fits this workload.
- By default OpenBLAS selects the Barcelona kernel for Ryzen (ugh!).
Overriding it with the Haswell kernel gives much better results, because
that kernel uses AVX2.
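The "about half as fast" expectation follows from peak single-precision
throughput. A back-of-the-envelope Python sketch, assuming each FPU
listed above retires one fused multiply-add (2 FLOPs) per 32-bit lane
per cycle and that SGEMM runs close to that peak (both are assumptions,
not measurements):

```python
# ASSUMPTION: each FPU issues one fused multiply-add (2 FLOPs) per 32-bit
# lane per cycle, and SGEMM is compute-bound near that peak.
FLOAT_BITS = 32  # single precision, as in SGEMM


def peak_gflops(units, vector_bits, ghz):
    """Theoretical single-precision peak in GFLOPS."""
    lanes = vector_bits // FLOAT_BITS
    return units * lanes * 2 * ghz


haswell_peak = peak_gflops(units=2, vector_bits=256, ghz=3.8)
ryzen_peak = peak_gflops(units=2, vector_bits=128, ghz=3.7)
expected = ryzen_peak / haswell_peak

# Measured netbench rates (p/s), Ryzen relative to Haswell.
measured = {"predictions": 27 / 33, "evaluations": 239 / 238}

print(f"expected from peak FLOPS: Ryzen at {expected:.0%} of Haswell")
for kind, ratio in measured.items():
    print(f"measured {kind}: Ryzen at {ratio:.0%} of Haswell")
```

Against an expected ratio of roughly 49%, the measured 82%
(predictions) and 100% (evaluations) show why the result looks too
good: a purely compute-bound model does not explain it.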

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
