On 5/18/2015 15:12, Dale Smith wrote:

> I'm not a big fan of GPU computing, for many of the reasons Dirk mentions below, and for something else I discovered while taking a Coursera class last winter.
>
> CUDA requires significant effort to keep your skills up unless you use it semi-regularly or more often. The learning curve is steep, and I can't climb it at this point in my working life. An occasional user may want to skip CUDA and investigate OpenACC or something related. Do what works best for you. I'll investigate rCUDA, PyCUDA, OpenACC, etc., and leave the lower-level stuff to others.

I also think that focusing on the high-level approach is often the right choice, at least initially.

Using either CUDA or OpenCL directly adds a lot of repetitive (and redundant) boilerplate code -- and, unless you actively make use of the fine-tuning this gives you access to, often with no performance benefit over the higher-level solutions. This really shouldn't need (re)stating, but I still occasionally encounter folks expecting "lower level" -- read: longer -- code to be somehow automagically faster. At the same time, having to deal with the lower-level details (e.g., manual resource management) makes the whole experience more error-prone -- and, again, unless you're explicitly fine-tuning it yourself, it will not make your code any faster.
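
For contrast, here's a rough sketch of my own (compile with nvcc; error checking omitted for brevity) of what even a trivial element-wise addition looks like in raw CUDA -- explicit allocations, host/device copies, launch configuration, and cleanup, every one of which you'd also have to error-check in real code:

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;                              // manual device-side resource management
    cudaMalloc((void**)&da, bytes);
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // launch configuration by hand
    vadd<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);

    std::printf("c[0] = %f\n", hc[0]);
    cudaFree(da); cudaFree(db); cudaFree(dc);         // easy to leak on early-return / error paths
    return 0;
}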

Personally, I've had a good experience with:

- C++ AMP (hardware-vendor independent; note: the last time I used it, it was more polished on MSFT platforms, although an open-source Linux implementation is available);
- Thrust (CUDA / NVIDIA hardware): http://thrust.github.io/

SYCL looks like an OpenCL equivalent of Thrust (I have yet to try it out), and its parallel STL implementation looks quite promising: https://github.com/KhronosGroup/SyclParallelSTL
The OpenCL-based Boost.Compute has also recently been accepted into Boost: https://github.com/boostorg/compute
(The flip side being that NVIDIA hasn't historically kept the OpenCL drivers for its cards very much up to date... perhaps this will change with the improvements necessary for CUDA 7, as well as the requirements for implementing the Vulkan API.)

In other words, instead of starting directly with CUDA, I'd suggest starting with Thrust -- analogously, instead of jumping straight to raw OpenCL, I'd probably start with SYCL Parallel STL (or Boost.Compute?).
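
To make the "shorter, not slower" point concrete, here's a minimal Thrust sketch of my own (assuming a working CUDA toolkit; compile with nvcc) -- it allocates, fills, and reduces a vector on the device, with no explicit kernels, cudaMalloc/cudaMemcpy calls, or launch configuration in sight:

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <iostream>

int main() {
    thrust::device_vector<double> d(1 << 20);      // lives in GPU memory
    thrust::sequence(d.begin(), d.end());          // filled with 0, 1, 2, ... on the device
    double s = thrust::reduce(d.begin(), d.end(),
                              0.0, thrust::plus<double>());
    std::cout << "sum = " << s << std::endl;       // only the scalar result comes back
    return 0;
}

Under the hood Thrust still does everything the raw version above does -- you just don't get to forget (or leak) any of it yourself.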

There are plenty of high-level GPGPU solutions available for C++; here are some good overviews:
http://www.soa-world.de/echelon/2014/04/c-accelerator-libraries.html (multiple reviews: http://www.soa-world.de/echelon/)
http://arxiv.org/abs/1212.6326

What I haven't seen is any study of integrating these with R (I've only used standalone C++ code for GPGPU) -- that could be interesting.
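
If someone does want to experiment, I'd guess the Rcpp-facing side could stay quite thin -- something along the lines of the following purely hypothetical, untested sketch (the names are mine, and the build setup, i.e. compiling the .cu file with nvcc and linking it into the package's shared library via src/Makevars, is the part I'm hand-waving over):

// gpu_sum.cu -- compiled with nvcc; exposes a plain C++ entry point
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

double gpu_sum(const double* x, int n) {
    thrust::device_vector<double> d(x, x + n);     // host -> device copy
    return thrust::reduce(d.begin(), d.end(), 0.0, thrust::plus<double>());
}

// wrapper.cpp -- compiled by R's usual toolchain; no CUDA headers needed here
#include <Rcpp.h>

double gpu_sum(const double* x, int n);            // defined in gpu_sum.cu above

// [[Rcpp::export]]
double gpuSum(Rcpp::NumericVector x) {
    return gpu_sum(x.begin(), x.size());
}

The idea being that the CUDA-specific code stays behind a plain C++ declaration, so the Rcpp wrapper never has to see nvcc at all.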

> I'd like to reiterate that by far the most difficult thing about working with GPU technology is efficiently moving data on and off the card. Do you have a rigorously established use case for using GPU technology?

In my experience, the "best" use case (in terms of being the lowest-hanging fruit) would be an embarrassingly parallel problem; for examples, see:
http://en.wikipedia.org/wiki/Embarrassingly_parallel
Naturally, the larger the workload, the higher the chance of the speed-up exceeding the data transfer costs.
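
Monte Carlo simulation from that list makes for a nice illustration (again a toy Thrust sketch of mine, not a claim about optimality): every trial is independent, the inputs are generated on the device, and only a single scalar ever comes back across the bus, so the transfer cost is essentially nil:

#include <thrust/transform_reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <iostream>

// one independent trial per index: draw a point in the unit square,
// return 1.0 if it lands inside the quarter circle
struct in_quarter_circle {
    __host__ __device__
    double operator()(unsigned int i) const {
        thrust::default_random_engine rng;
        rng.discard(2 * i);                        // give each trial its own subsequence
        thrust::uniform_real_distribution<double> u(0.0, 1.0);
        const double x = u(rng), y = u(rng);
        return (x * x + y * y <= 1.0) ? 1.0 : 0.0;
    }
};

int main() {
    const unsigned int n = 1 << 24;
    const double hits = thrust::transform_reduce(
        thrust::counting_iterator<unsigned int>(0),
        thrust::counting_iterator<unsigned int>(n),
        in_quarter_circle(), 0.0, thrust::plus<double>());
    std::cout << "pi ~= " << 4.0 * hits / n << std::endl;
    return 0;
}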

Best,

Matt

