Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-19 Thread Karl Rupp
Hey, There is some trickery going on with transpositions and layout, but it works for every transpose/layout combination. One can also link A's blas to one's own gemm function, provided a tiny wrapper (essentially to ensure signature

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-18 Thread Philippe Tillet
Hey, 2013/12/18 Karl Rupp r...@iue.tuwien.ac.at Hi. A short update: I've implemented linkage to CBlas and CuBlas with dynamic selection. If activated through VIENNACL_WITH_CUBLAS, one can go back and forth between cublas and the original backend by doing: A.blas().gemm(NULL);

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-17 Thread Philippe Tillet
Hi, A short update: I've implemented linkage to CBlas and CuBlas with dynamic selection. If activated through VIENNACL_WITH_CUBLAS, one can go back and forth between cublas and the original backend by doing: A.blas().gemm(NULL);

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Philippe Tillet
Hey, 2013/12/15 Karl Rupp r...@iue.tuwien.ac.at Hi again, While we're at it, let's discuss the dynamic dispatching mechanism we'd ideally want. I see two options: (1) A global function pointer table. So, one could for example set: viennacl::internal_blas::sgemv_ptr =

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Karl Rupp
Hey, I agree. However, it seems to me that setting the implementation for each matrix would end up being tedious... one table per memory backend seems to make sense conceptually to me, since the performance (and the portability) of each blas implementation is determined by the underlying

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Philippe Tillet
Hi, 2013/12/15 Karl Rupp r...@iue.tuwien.ac.at Hey, I agree. However, it seems to me that setting the implementation for each matrix would end up being tedious... one table per memory backend seems to make sense conceptually to me, since the performance (and the portability) of each

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Karl Rupp
Hi, Yeah, it certainly is a bit tedious. Feel free to only do this for matrix-matrix multiplications for now, a full operation table is presumably too much of a refactoring for ViennaCL 1.x.y, but much better suited for 2.0.0. Yes. It's actually a pretty complicated

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-15 Thread Philippe Tillet
Hi, 2013/12/15 Karl Rupp r...@iue.tuwien.ac.at Hi, Yeah, it certainly is a bit tedious. Feel free to only do this for matrix-matrix multiplications for now, a full operation table is presumably too much of a refactoring for ViennaCL 1.x.y, but much better suited for

Re: [ViennaCL-devel] Linking ViennaCL (CUDA backend) to cuBLAS ...?

2013-12-14 Thread Philippe Tillet
Hello, I've just realized that most BLAS implementations don't provide any way to do strided matrix accesses in the non-leading dimension...! Is this correct? I was hoping that we could have avoided such special cases, but it seems like a couple of tests will need to be made. Philippe