Hi.

> A short update: I've implemented linkage to CBlas and CuBlas with
> dynamic selection.
> If activated through VIENNACL_WITH_CUBLAS, one can go back and forth
> between cublas and the original backend by doing:
>
>   A.blas().gemm(NULL);
>   A.blas().gemm(viennacl::backend::blas::cublas_functions<value_type>::gemm);
>
> (and similarly for cblas.)

Nice, thanks! I think we can shorten the second call to something like
  A.blas().gemm(viennacl::backend::cublas);
for convenience.

> There is some trickery going on with transpositions and layout, but it
> works for every transpose/layout combination. One can also link A's blas
> to his own gemm function, provided a tiny wrapper (essentially to ensure
> signature compatibility)

Cool!

> A very good news is that this allows viennacl to work very well on very
> recent NVidia Hardware, until our autotuning engine is fully operational.
> On my laptop, cublasSgemm is about 5 times faster than the current CUDA
> implementation, and 20% faster than the OpenCL kernel found by the
> autotuner (120GFLOPs vs 25GFLOPs vs 95GFLOPs). Also, linking with
> OpenBlas leads to HUGE performance boost on the CPU (0.02GFLOP/s vs
> 70GFLOP/s)...!

For our native CUDA implementation it's probably only a matter of
porting the results from the OpenCL tuner over. Unfortunately I don't
see a good way of doing this with CUDA without a significant penalty on
compilation times, because there is no concept of runtime kernel
selection in CUDA so far. The performance difference for GEMM of our CPU
backend is not surprising, this was never subject to optimization ;-)

> A little question remains. For now, the behavior is really weird when
> one defines both VIENNACL_WITH_CBLAS and VIENNACL_WITH_CUBLAS. How to
> handle this? I am not very familiar with the multiple backends and I
> don't know to which extent they can be combined. Therefore, I see
> multiple options, but can't tell which one is better.
>
> 1 -> trigger a preprocessor error when both commands are defined together
> 2 -> slightly modify the API : A.cuda_blas(), A.host_blas(), A.cl_blas()
>
> I think that option 2 is better, considering that there is already
> cuda_handle(), opencl_handle(), cpu_handle() or something similar, if
> I'm correct. Any advice?

The reason why cuda_handle(), opencl_handle() and cpu_handle() exist
under different names is that they return different types (i.e. the
memory buffer). For the BLAS backends I don't want to have different
member names, because this gets annoying for users. For example, if a
user wants to cycle through the backends for e.g. benchmark purposes,
she would have to write
  if (my_constant == CUDA)
    A.cuda_blas()...
  else if (my_constant == HOST)
    A.host_blas()...
  else
    A.cl_blas()...
making the code longer than necessary. I suggest querying some central
registry where the backends are registered and then cycling through them:
  SomeListType blas_list = viennacl::blas_implementations_available();
  for ( it = blas_list.begin(); ... )
  {
    A.blas(*it);
    do_something(A);
  }
I don't know whether .blas() is the best name for this, because in the
future we might also have more non-BLAS operations such as sorting or
FFT - maybe we use .operations() to better reflect the operations table?

---

It seems to me that this is going in a very fruitful direction. Any
objections to pushing and extending this for the 1.6.0 release? 1.5.0 is
essentially done, I'm currently writing the last bits of documentation
and resolving some minor warnings on Visual Studio...

Best regards,
Karli

_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
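
For illustration, the registry idea from the reply above could look
roughly like the sketch below. This is only a sketch under assumptions,
not ViennaCL code: the types blas_backend and matrix, the functions
naive_gemm, reordered_gemm and benchmark_all, and the square-matrix gemm
signature are all hypothetical. Only the names
blas_implementations_available() and .blas() are taken from the
proposal, and their real signatures may well differ. A NULL gemm entry
falls back to the built-in routine, loosely mirroring
A.blas().gemm(NULL) from the first message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout: none of this is ViennaCL's actual API.

// Signature every backend's gemm must match: C = A * B for row-major
// n x n matrices (square only, to keep the sketch short).
typedef void (*gemm_fn)(std::size_t n, const float* A, const float* B, float* C);

// One entry of the operations table.
struct blas_backend {
  std::string name;
  gemm_fn     gemm;
};

// Plain triple loop, standing in for the built-in host backend.
void naive_gemm(std::size_t n, const float* A, const float* B, float* C) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < n; ++k) acc += A[i * n + k] * B[k * n + j];
      C[i * n + j] = acc;
    }
}

// Same product with another loop order, standing in for an external library.
void reordered_gemm(std::size_t n, const float* A, const float* B, float* C) {
  for (std::size_t i = 0; i < n * n; ++i) C[i] = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t j = 0; j < n; ++j)
        C[i * n + j] += A[i * n + k] * B[k * n + j];
}

// Central registry: the one place where available backends are listed.
std::vector<blas_backend> blas_implementations_available() {
  std::vector<blas_backend> list;
  blas_backend host     = {"host", naive_gemm};
  blas_backend external = {"external", reordered_gemm};
  list.push_back(host);
  list.push_back(external);
  return list;
}

class matrix {
public:
  explicit matrix(std::size_t n, float value = 0.0f)
    : n_(n), data_(n * n, value), gemm_(naive_gemm) {}

  // Select the backend for subsequent products; NULL restores the default.
  void blas(const blas_backend& b) { gemm_ = b.gemm ? b.gemm : naive_gemm; }

  // Matrix product dispatched through the currently selected gemm.
  matrix prod(const matrix& rhs) const {
    assert(n_ == rhs.n_);
    matrix out(n_);
    gemm_(n_, data_.data(), rhs.data_.data(), out.data_.data());
    return out;
  }

  float operator()(std::size_t i, std::size_t j) const { return data_[i * n_ + j]; }

private:
  std::size_t        n_;
  std::vector<float> data_;
  gemm_fn            gemm_;
};

// Cycle through every registered backend with one member name and no
// if/else chain; returns entry (0,0) of the product for each backend.
std::vector<float> benchmark_all(std::size_t n) {
  std::vector<float> results;
  std::vector<blas_backend> blas_list = blas_implementations_available();
  for (std::size_t i = 0; i < blas_list.size(); ++i) {
    matrix A(n, 1.0f), B(n, 2.0f);
    A.blas(blas_list[i]);
    results.push_back(A.prod(B)(0, 0));
  }
  return results;
}
```

The point of the design is that user code never names a backend: adding
a new one only touches the registry function, and a benchmark loop such
as benchmark_all() picks it up unchanged.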