On 16 May 2015 at 11:46, Yue Li wrote:
> I wonder if anyone has worked on incorporating the CULA tools library
> functionality into Rcpp. How much speed gain on top of Rcpp should we
> expect on a basic operation like matrix multiplication?
>
> In particular, I'm currently using RcppArmadillo to seamlessly perform
> matrix multiplication, but the speed gain over my R implementation is
> 5-10 times, if not less.
>
> I'm wondering if there is an equivalent easy-to-use library for doing
> matrix multiplication with the GPU enabled. A complete simple example
> would be greatly appreciated.
A few years ago I did some work on the 'gcbd' package to time and benchmark
precisely these types of things. Because the results depend on the hardware
used for the GPU, the hardware used for the CPU, the compiler, the
BLAS/LAPACK library, the OS, etc pp, I worked out a framework to benchmark
these things and compare them. So have a look at that package and its
vignette: it at least times several BLAS libraries against the GPU card I
had (and still have).

In general, I think its conclusion stands: you "waste" so much time copying
data over to the GPU that any computational gain is dwarfed until you get
to truly enormous (and unusual) matrix sizes. So GPUs are still good for
workloads with limited (maybe one-time) transfer followed by a lot of
iterations: some finance applications with Monte Carlo pricing come to
mind, anything MCMC, and of course the whole 'deep learning' complex.

And with that: no, as far as I know nobody has tightly integrated Rcpp and
GPU computing, as it is simply not that clear a match.

That's my $0.02. More comments welcome, particularly with benchmarks.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel