Hey,
There is some trickery going on with transpositions and layout, but it
works for every transpose/layout combination. One can also link A's blas
to his own gemm function, provided a tiny wrapper (essentially to ensure
signature
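For readers wondering what the transposition/layout trickery can look like: a common approach (not necessarily the one used in this patch) is to serve row-major operands with a column-major gemm by swapping the operands, since a row-major buffer read as column-major is the transpose, and (A*B)^T = B^T * A^T. The sketch below assumes a hypothetical column-major `gemm_col_major` standing in for the user's own gemm; both function names and signatures are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a user-supplied column-major GEMM (hypothetical signature):
// C = alpha * op(A) * op(B) + beta * C, where op(X) is X or X^T.
// op(A) is M x K, op(B) is K x N, C is M x N.
// Element (r, c) of a stored matrix X lives at X[r + c * ldx].
void gemm_col_major(bool transA, bool transB,
                    std::size_t M, std::size_t N, std::size_t K,
                    float alpha, const float* A, std::size_t lda,
                    const float* B, std::size_t ldb,
                    float beta, float* C, std::size_t ldc)
{
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < M; ++i)
    {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k)
      {
        float a = transA ? A[k + i * lda] : A[i + k * lda];
        float b = transB ? B[j + k * ldb] : B[k + j * ldb];
        acc += a * b;
      }
      C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
    }
}

// Tiny wrapper (hypothetical): same operation for row-major storage.
// The row-major buffers are handed to the column-major routine with the
// operands swapped and M and N exchanged; the transpose flags travel with
// their operands, so every transpose combination is covered.
void gemm_row_major(bool transA, bool transB,
                    std::size_t M, std::size_t N, std::size_t K,
                    float alpha, const float* A, std::size_t lda,
                    const float* B, std::size_t ldb,
                    float beta, float* C, std::size_t ldc)
{
  assert(A != NULL && B != NULL && C != NULL);
  gemm_col_major(transB, transA, N, M, K,
                 alpha, B, ldb, A, lda, beta, C, ldc);
}
```

With this, only one storage order has to be implemented by the linked gemm; the wrapper's job is reduced to matching the expected signature.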
Hey,
2013/12/18 Karl Rupp r...@iue.tuwien.ac.at
Hi.
A short update: I've implemented linkage to CBlas and CuBlas with
dynamic selection.
If activated through VIENNACL_WITH_CUBLAS, one can go back and forth
between cublas and the original backend by doing:
A.blas().gemm(NULL);
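To make the per-matrix selection concrete, here is a minimal sketch of how a setter like `A.blas().gemm(...)` could be backed by a function pointer. It assumes that passing NULL restores the built-in kernel. Everything in it (`gemm_fn`, `blas_functions`, `matrix`, `prod`, `builtin_gemm`, `external_gemm`) is hypothetical and is not ViennaCL's actual code; `external_gemm` merely stands in for a cuBLAS- or CBLAS-backed routine.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical GEMM signature: C = A * B, dense row-major, no strides.
typedef void (*gemm_fn)(std::size_t M, std::size_t N, std::size_t K,
                        const float* A, const float* B, float* C);

// Stand-in for the library's original backend.
void builtin_gemm(std::size_t M, std::size_t N, std::size_t K,
                  const float* A, const float* B, float* C)
{
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
    {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k)
        acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
}

// Stand-in for an external BLAS; counts calls so the dispatch is observable.
int external_calls = 0;
void external_gemm(std::size_t M, std::size_t N, std::size_t K,
                   const float* A, const float* B, float* C)
{
  ++external_calls;
  builtin_gemm(M, N, K, A, B, C);
}

// Hypothetical per-matrix table; only gemm is covered here.
class blas_functions
{
public:
  blas_functions() : gemm_(NULL) {}
  void gemm(gemm_fn f) { gemm_ = f; }  // NULL selects the built-in kernel
  gemm_fn gemm() const { return gemm_ ? gemm_ : &builtin_gemm; }
private:
  gemm_fn gemm_;
};

// Minimal non-owning matrix that carries its own table.
struct matrix
{
  matrix(float* d, std::size_t r, std::size_t c) : data(d), rows(r), cols(c) {}
  blas_functions& blas() { return blas_; }
  const blas_functions& blas() const { return blas_; }
  float* data;
  std::size_t rows, cols;
private:
  blas_functions blas_;
};

// The product dispatches through the table attached to A.
void prod(const matrix& A, const matrix& B, matrix& C)
{
  assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
  A.blas().gemm()(A.rows, B.cols, A.cols, A.data, B.data, C.data);
}
```

Usage would then read `A.blas().gemm(&external_gemm);` to switch over and `A.blas().gemm(NULL);` to switch back, with the choice made at run time rather than at compile time.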
Hey,
2013/12/15 Karl Rupp r...@iue.tuwien.ac.at
Hi again,
While we're at it, let's discuss the dynamic dispatching mechanism we'd
ideally want. I see two options:
(1) A global function pointer table. So, one could for example set:
viennacl::internal_blas::sgemv_ptr =
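As an illustration of option (1), a global function pointer table could look roughly like the sketch below. The namespace `example_blas`, the `sgemv_fn` signature and the helper names are assumptions made for this example only; the preview above is cut off before the right-hand side of the assignment, so nothing here should be read as the proposed ViennaCL interface.

```cpp
#include <cassert>
#include <cstddef>

namespace example_blas  // hypothetical namespace, for illustration only
{
  // Hypothetical signature: y = A * x, with A dense row-major of size M x N.
  typedef void (*sgemv_fn)(std::size_t M, std::size_t N,
                           const float* A, const float* x, float* y);

  // Built-in fallback implementation.
  void reference_sgemv(std::size_t M, std::size_t N,
                       const float* A, const float* x, float* y)
  {
    for (std::size_t i = 0; i < M; ++i)
    {
      float acc = 0.0f;
      for (std::size_t j = 0; j < N; ++j)
        acc += A[i * N + j] * x[j];
      y[i] = acc;
    }
  }

  // Stand-in for a vendor routine; counts calls so the switch is observable.
  int vendor_calls = 0;
  void vendor_sgemv(std::size_t M, std::size_t N,
                    const float* A, const float* x, float* y)
  {
    ++vendor_calls;
    reference_sgemv(M, N, A, x, y);
  }

  // The global table entry: one pointer per operation, shared by all matrices.
  sgemv_fn sgemv_ptr = &reference_sgemv;

  // Library-side entry point: always dispatches through the table.
  void matvec(std::size_t M, std::size_t N,
              const float* A, const float* x, float* y)
  {
    assert(sgemv_ptr != NULL);
    sgemv_ptr(M, N, A, x, y);
  }
}
```

A user would switch implementations with a single assignment such as `example_blas::sgemv_ptr = &example_blas::vendor_sgemv;`. The trade-off of a global table is that every matrix is affected at once, which is convenient but coarser than a per-object setting.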
Hey,
I agree. However, it seems to me that setting the implementation for
each matrix would end up being tedious... one table per memory backend
seems to make sense conceptually to me, since the performance (and the
portability) of each blas implementation is determined by the underlying
Hi,
Yeah, it certainly is a bit tedious. Feel free to only do this for
matrix-matrix multiplications for now, a full operation table is
presumably too much of a refactoring for ViennaCL 1.x.y, but much
better suited for 2.0.0.
Yes. It's actually a pretty complicated
Hello,
I've just realized that most BLAS implementations don't provide any way to do
strided matrix accesses in the non-leading dimension...! Is this correct?
I was hoping that we could have avoided such special cases, but it seems
like a couple of tests will need to be made.
Philippe
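For context, the standard gemm interface takes only a leading dimension (lda, ldb, ldc) per matrix and no element stride, so a stride along the contiguous dimension cannot be expressed, while a stride along the other dimension can be folded into the leading dimension. One of the tests mentioned above might look like the sketch below. The `strided_view` layout, the `blas_args` struct and `to_blas_args` are hypothetical names for this example, and row-major storage is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a strided row-major sub-matrix: element (i, j)
// lives at data[(start1 + i * stride1) * ld + (start2 + j * stride2)].
struct strided_view
{
  std::size_t start1, stride1;  // row offset and row stride
  std::size_t start2, stride2;  // column offset and column stride
  std::size_t ld;               // row pitch of the parent matrix, in elements
};

// What a BLAS call can express: a base offset plus a leading dimension.
struct blas_args
{
  std::size_t offset;
  std::size_t ld;
};

// Returns true and fills 'out' if the view can be passed to BLAS directly.
// A row stride is absorbed into the leading dimension; a column stride other
// than 1 has no BLAS equivalent, so the caller must fall back to its own
// kernel (or copy into a dense temporary).
bool to_blas_args(const strided_view& v, blas_args& out)
{
  if (v.stride2 != 1)
    return false;
  out.offset = v.start1 * v.ld + v.start2;
  out.ld = v.stride1 * v.ld;
  return true;
}
```

The dispatcher would run this check on each operand and only take the external BLAS path when all of them pass.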