On Tue, 1 Oct 2019 21:31:33 -0700 "Kingsley G. Morse Jr." <kings...@loaner.com> wrote:
> To be fair, though, I don't recall seeing any
> computer benchmarks from Intel as dramatic
> as AMD's Radeon VII.

We are speaking of the GPU driver for their (upcoming) discrete accelerator.
The figures available correspond to IGPs, which deliver at best 1/4 of what
can be expected from a discrete card.

> The Radeon VII is evidently rated at a theoretical
> max of 3 trillion 64 bit floating point
> operations.
>
> Per second!
>
> The other info I have that may be of interest is
> for IBM's POWER9 CPUs.
>
> I think they may sometimes be fast enough to
> replace GPUs.
>
> My understanding is an 8 core POWER9 CPU has been
> bench marked at half the speed of a GeForce 980
> running cuda.

We have a few of them ...
There is no official OpenCL driver for IBM POWER9, but POCL works there and
is limited to its pthread device. Apparently it makes no use of vector
instructions. I also doubt it does any topology analysis of the various
cache levels of the POWER9, which differ from the Xeon architecture.

> The author seemed to think gcc supports the
> POWER9's vectorization pretty well.

I confirm that gcc8 automatically translates SSE2 code into VSX (the
successor of AltiVec). This worked out of the box for the cases I tested.
When rewriting the code with the native vector instruction set, I gained a
factor of 1-4x over the SSE2 code, which was itself 2-4x faster than the
plain compiled C code. I can provide a few figures if needed.

> Up to 44 POWER9 cores are available on mother
> boards from Raptor Computing Systems.

IBM's servers are competitive in price with equivalent Xeon Gold machines.

> Wouldn't it be nice if POWER9 CPUs were fast
> enough to run regular python, without rewriting
> applications to conform to (py)opencl, or a GPU?

I found that Python is ~2x slower on POWER9 than on Xeon, probably because
of the complex cache structure. The latency for launching OpenCL kernels or
calling a C function through ctypes is also twice as large (20µs on Xeon,
40µs on P9, 400µs on Cortex-A53).
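
For anyone wanting to compare machines, the ctypes call latency mentioned above could be estimated with something like the following. This is only a minimal sketch, not the benchmark behind the figures quoted here: the helper names `load_libc` and `ctypes_call_latency_us` are made up for illustration, it assumes a POSIX system where the C library exposes `abs`, and the result includes the overhead of the Python loop itself.

```python
import ctypes
import ctypes.util
import statistics
import time


def load_libc():
    # find_library("c") may return None; on POSIX, CDLL(None) then falls
    # back to the symbols already loaded in the running process.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def ctypes_call_latency_us(calls=10000, repeats=5):
    """Return the median time per ctypes call, in microseconds."""
    libc = load_libc()
    c_abs = libc.abs
    c_abs.argtypes = [ctypes.c_int]
    c_abs.restype = ctypes.c_int

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(calls):
            c_abs(-1)  # trivial C function: cost is dominated by the call itself
        elapsed = time.perf_counter() - start
        samples.append(elapsed / calls * 1e6)
    # The median is less sensitive to a noisy repeat than the mean.
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"ctypes call overhead: {ctypes_call_latency_us():.2f} us per call")
```

Running the same script on each machine (Xeon, POWER9, ARM) gives numbers that are comparable with each other, even if they differ from the ones above because of the measurement method.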
https://www.phoronix.com/scan.php?page=article&item=power9-talos-2&num=4

Cheers,
--
Jérôme Kieffer
_______________________________________________
PyOpenCL mailing list -- pyopencl@tiker.net
To unsubscribe send an email to pyopencl-le...@tiker.net