Thank you, Karl. I am glad that I can at least understand why I am seeing this difference. I absolutely think the CUDA 'port' should be added to ViennaCL. Some may prefer to call the cuBLAS routines directly, but I am in favor of trying to find a balance between speed and 'ease-of-use'. From my point of view, having both optimized OpenCL and CUDA kernels would be a great selling point for ViennaCL.

Regards,
Charles

On Mon, Aug 3, 2015 at 7:37 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:

> Hi Charles,
>
>> I was benchmarking 4096x4096 matrices (again, with my R bindings). By
>> 'slower' I mean that I am observing OpenCL at this size beating the
>> OpenBLAS CPU implementation by over 2X but the CUDA implementation is
>> nearly 5X slower than the CPU. This seemed odd to me that the CUDA
>> would be so much slower than the OpenCL, hence my initial thought to
>> invite others to review my code if I am making some sort of silly
>> mistake. Otherwise I was intending to begin trying to pursue direct
>> cublas methods but I would very much prefer to use ViennaCL.
>
> okay, in this case what Philippe was just the full answer. Our OpenCL
> kernels are highly GPU-specific and generate a 'good' kernel at runtime. We
> haven't 'ported' (i.e. a one-to-one translation from OpenCL to CUDA) these
> kernels to the CUDA backend yet, so only a fallback kernel is used for the
> CUDA backend. It should be possible to carry these over with not too much
> effort, but in such case it makes more sense to just call the cuBLAS
> routines instead. Adding this for ViennaCL 1.7.1 is certainly possible if
> that is what you would be happy with.
>
> Best regards,
> Karli
>
>> On Sat, Aug 1, 2015 at 3:56 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
>>
>> Hi Charles,
>>
>> can you please quantify what you mean by 'slower'? How does 'slower'
>> change as you increase the problem size? I would not be surprised if
>> you see no performance gains below matrices of size 500-by-500. With
>> the extra back-and-forth through PCI-Express you may even need
>> matrices of at least 1000-by-1000.
>>
>> Best regards,
>> Karli
>>
>> On 07/31/2015 09:04 PM, Charles Determan wrote:
>>
>> Greetings,
>>
>> Brief background, I am developing a series of R packages to bring
>> ViennaCL to the R community. I have had success with the development of
>> my gpuR package (https://github.com/cdeterman/gpuR) which relies on the
>> OpenCL backend of ViennaCL (which is housed in the package RViennaCL).
>> I am hoping to submit to CRAN in the coming weeks now that the latest
>> stable ViennaCL version has just been released.
>>
>> Naturally, I wanted a companion package for a CUDA backend. This is now
>> the gpuRcuda package (https://github.com/cdeterman/gpuRcuda). This has
>> appeared to work successfully as most of the code is the same. However,
>> my initial benchmarks are showing very dismal performance with the CUDA
>> backend.
>>
>> I was wondering if someone from this list would be willing to have a
>> look at my code to see why the CUDA code would be so much worse. I had
>> thought, given working a NVIDIA card (GeForce GTX 970), CUDA would
>> provide improved speed but the benchmarks are showing performance at
>> least 5-fold slower than the CPU based R multiplication. Even the
>> 'float' type matrix multiplication is slower than R (which only has
>> double type support!).
>>
>> The sgemm CUDA file is
>> (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_sgemm.cu)
>> and the associated C++ file is
>> (https://github.com/cdeterman/gpuRcuda/blob/master/src/vcl_cudaMatrix_gemm.cpp).
>>
>> Other note, I have tried making the two packages completely independent
>> and the performance is still very poor with CUDA.
>>
>> I really appreciate any help others could provide troubleshooting this.
>> I have truly run out of ideas as to why the code has such poor
>> performance.
>>
>> Regards,
>> Charles
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> ViennaCL-devel mailing list
>> ViennaCL-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/viennacl-devel
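
As a rough illustration of the problem-size point in the quoted thread (small matrices may see no gain because of the trip through PCI-Express), here is a minimal sketch of the arithmetic involved. It assumes a square n-by-n single-precision product, counts 2n^3 floating-point operations, and assumes both inputs are copied to the device and the result is copied back. The function names are hypothetical and are not part of ViennaCL, gpuR, or gpuRcuda, and the numbers say nothing about the actual kernels discussed here.

```python
def gemm_flops(n: int) -> int:
    """Floating-point operations for an n-by-n times n-by-n product.

    Each of the n*n output entries needs n multiplies and n adds.
    """
    return 2 * n ** 3


def gemm_transfer_bytes(n: int, bytes_per_element: int = 4) -> int:
    """Bytes moved over the bus, assuming two inputs go to the device
    and one result comes back (4 bytes per element for 'float')."""
    return 3 * n * n * bytes_per_element


def flops_per_byte(n: int, bytes_per_element: int = 4) -> float:
    """Arithmetic done per byte transferred; grows linearly with n."""
    return gemm_flops(n) / gemm_transfer_bytes(n, bytes_per_element)


if __name__ == "__main__":
    # Sizes mentioned in the thread: 500, 1000 and 4096.
    for n in (500, 1000, 4096):
        print(n, gemm_flops(n), gemm_transfer_bytes(n), flops_per_byte(n))
```

Because the work grows as n^3 while the transferred data grows as n^2, the ratio improves with size. That is consistent with the suggestion that the transfer overhead only pays off for larger matrices. At 4096-by-4096 the transfer is a comparatively small part of the cost, so the slowdown reported for the CUDA backend is more plausibly explained by the fallback kernel described above.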