[PyOpenCL] Re: Thanks! Maybe IBM's POWER9 CPUs... (Was: Can you recommend a GPU ...)

Jerome Kieffer Wed, 02 Oct 2019 00:13:19 -0700

On Tue, 1 Oct 2019 21:31:33 -0700
"Kingsley G. Morse Jr." <kings...@loaner.com> wrote:



> To be fair, though, I don't recall seeing any
> computer benchmarks from Intel as dramatic
> as AMD's Radeon VII.

We are talking about the GPU driver for their (upcoming) discrete
accelerator. The figures available so far are for the integrated GPU
(IGP), which reaches at best 1/4 of what can be expected from a
discrete card.

> The Radeon VII is evidently rated at a theoretical
> max of 3 trillion 64 bit floating point
> operations.
> 
> Per second!
> 
> The other info I have that may be of interest is
> for IBM's POWER9 CPUs.
> 
> I think they may sometimes be fast enough to
> replace GPUs.
> 
> My understanding is an 8 core POWER9 CPU has been
> benchmarked at half the speed of a GeForce 980
> running CUDA.

We have a few of them ... There is no official OpenCL driver for IBM
POWER9, but POCL works there, although it is limited to the pthread
backend. Apparently it does not use the vector instructions. I also
doubt it does any topology analysis of the POWER9's various cache
levels, which are organised differently from the Xeon architecture.
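
One way to see what a given OpenCL driver reports about itself is to
list the devices from PyOpenCL. This is only a minimal sketch: the
function name list_opencl_devices is made up for illustration, and the
native vector width is what the driver advertises, which does not prove
whether kernels actually get vectorised.

```python
def list_opencl_devices():
    """Return a list of dicts describing each visible OpenCL device.

    Hypothetical helper: returns an empty list when PyOpenCL is not
    installed or when no OpenCL platform is registered.
    """
    try:
        import pyopencl as cl
    except ImportError:
        return []  # PyOpenCL is not available on this machine

    devices = []
    try:
        for platform in cl.get_platforms():
            for dev in platform.get_devices():
                devices.append({
                    "platform": platform.name,
                    "device": dev.name,
                    "compute_units": dev.max_compute_units,
                    # Width (in floats) the driver reports as native
                    "native_float_width": dev.native_vector_width_float,
                })
    except cl.Error:
        # Typically raised when no platform/ICD is found
        pass
    return devices


if __name__ == "__main__":
    for entry in list_opencl_devices():
        print(entry)
```

On a POWER9 machine with POCL installed, the pthread device should show
up in this list along with whatever vector width it reports.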

> The author seemed to think gcc supports the
> POWER9's vectorization pretty well.

I confirm that gcc 8 automatically translates SSE2 code into VSX (the
successor of AltiVec). This worked out of the box for the cases I
tested. Rewriting the code with the native vector instruction set
gained a further 1-4x over the SSE2 code, which was itself 2-4x faster
than the plain compiled C code. I can provide a few figures if needed.
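
To make the combined effect explicit, the two ranges can be multiplied.
This is a rough sketch that assumes the speed-ups simply compose; the
extremes were not necessarily observed on the same test case, so the
result is only a bound. The names below are made up for illustration.

```python
def combined_speedup(first, second):
    """Multiply two (low, high) speed-up ranges into one overall range."""
    return (first[0] * second[0], first[1] * second[1])


# Ranges quoted above (assumed to compose multiplicatively)
sse2_over_plain_c = (2, 4)   # translated SSE2 code vs. plain C
vsx_over_sse2 = (1, 4)       # hand-written VSX vs. translated SSE2

low, high = combined_speedup(sse2_over_plain_c, vsx_over_sse2)
print(f"hand-written VSX vs. plain C: {low}x to {high}x")
# prints "hand-written VSX vs. plain C: 2x to 16x"
```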

> Up to 44 POWER9 cores are available on
> motherboards from Raptor Computing Systems.

IBM's servers are competitive in price with equivalent Xeon Gold servers.

> Wouldn't it be nice if POWER9 CPUs were fast
> enough to run regular python, without rewriting
> applications to conform to (py)opencl, or a GPU?

I found that Python is ~2x slower on POWER9 than on Xeon, probably
related to the complex cache structure. The latency for launching an
OpenCL kernel or calling a C function through ctypes is also twice as
high (20µs on Xeon, 40µs on P9, 400µs on Cortex-A53).
https://www.phoronix.com/scan.php?page=article&item=power9-talos-2&num=4
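
For anyone who wants to compare call overhead on their own machine,
here is a minimal sketch using only the standard library. It is not the
benchmark behind the figures above: the function name
ctypes_call_latency is made up, and Py_IsInitialized is used as a
stand-in for "a C function" only because ctypes.pythonapi is always
available. Absolute values will differ depending on what is measured;
the ratio between machines is the interesting part.

```python
import ctypes
import timeit


def ctypes_call_latency(repeats=5, calls=100_000):
    """Return the best per-call time, in microseconds, of a trivial ctypes call."""
    # Py_IsInitialized takes no arguments and returns an int, so the
    # measurement is dominated by the ctypes call machinery itself.
    func = ctypes.pythonapi.Py_IsInitialized
    func.restype = ctypes.c_int
    func.argtypes = []
    # Take the minimum over several runs to reduce noise from the OS.
    best = min(timeit.repeat(func, repeat=repeats, number=calls))
    return best / calls * 1e6


if __name__ == "__main__":
    print(f"ctypes call overhead: {ctypes_call_latency():.3f} µs per call")
```

Running the same script on a Xeon and on a POWER9 box gives a quick
like-for-like comparison of the Python-to-C call path.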

Cheers,

-- 
Jérôme Kieffer
_______________________________________________
PyOpenCL mailing list -- pyopencl@tiker.net
To unsubscribe send an email to pyopencl-le...@tiker.net
