Hey everybody,
To get good GEMM performance, it is pretty much accepted that at some
point we have to pad matrices so that each row/column is aligned (and not
only the global piece of memory).
It seems to me that this potentially breaks fast_copy(ScalarType* p1,
ScalarType* p2, viennacl::matrix<>). Or at least, it requires us to define
how we want it to behave.
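
To illustrate the problem, here is a minimal host-only sketch. It does not
use ViennaCL; round_up, padded_index and naive_flat_copy are hypothetical
helpers, and a row-major layout padded to a multiple of `alignment` is an
assumption made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: round n up to the next multiple of `alignment`.
std::size_t round_up(std::size_t n, std::size_t alignment)
{
  assert(alignment > 0);
  return ((n + alignment - 1) / alignment) * alignment;
}

// Position of logical element (i, j) in a row-major buffer whose rows
// are padded to internal_size2 elements.
std::size_t padded_index(std::size_t i, std::size_t j, std::size_t internal_size2)
{
  return i * internal_size2 + j;
}

// What a flat copy does: it writes the dense size1 x size2 source
// contiguously at the start of the padded buffer and ignores the row pitch.
std::vector<float> naive_flat_copy(const std::vector<float>& dense,
                                   std::size_t size1, std::size_t size2,
                                   std::size_t alignment)
{
  assert(dense.size() == size1 * size2);
  std::size_t internal_size1 = round_up(size1, alignment);
  std::size_t internal_size2 = round_up(size2, alignment);
  std::vector<float> padded(internal_size1 * internal_size2, 0.0f);
  std::copy(dense.begin(), dense.end(), padded.begin());
  return padded;
}
```

For a 2x3 matrix {1,...,6} padded to 4x4, element (1,0) should be 4, but
the slot at padded_index(1, 0, 4) ends up holding 5: every row after the
first is shifted.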
What do you think is the most reasonable?
-> Throwing an exception if matrix.internal_size1() != matrix.size1() ||
matrix.internal_size2() != matrix.size2().
-> Somehow handling it internally. This costs both memory and execution
time, since the only way I can think of involves creating a temporary
handle and invoking a kernel. I'm not that worried about the execution
time of this copy kernel, which will run in GDDR5 or GDDR6 anyway, but the
memory cost seems fairly big (linear in the size of the handle to copy).
We can probably solve that with a fairly easy blocking mechanism, but this
will in turn increase the execution time overhead.
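
To make the two options concrete, here is a rough host-only sketch. It is
not ViennaCL code: PaddedMatrix, fast_copy_strict and fast_copy_blocked are
hypothetical names, a std::vector stands in for the device buffer, the
`temp` vector stands in for the temporary handle, and the inner loops
stand in for the copy kernel. Row-major storage is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a padded, row-major device matrix.
struct PaddedMatrix
{
  std::size_t size1, size2;                    // logical dimensions
  std::size_t internal_size1, internal_size2;  // padded dimensions
  std::vector<float> data;                     // internal_size1 * internal_size2

  PaddedMatrix(std::size_t s1, std::size_t s2, std::size_t is1, std::size_t is2)
    : size1(s1), size2(s2), internal_size1(is1), internal_size2(is2),
      data(is1 * is2, 0.0f)
  {
    assert(is1 >= s1 && is2 >= s2);
  }
};

// Option 1: refuse to copy into a padded matrix.
void fast_copy_strict(const float* begin, const float* end, PaddedMatrix& m)
{
  if (m.internal_size1 != m.size1 || m.internal_size2 != m.size2)
    throw std::invalid_argument("fast_copy: matrix is padded");
  if (static_cast<std::size_t>(end - begin) != m.size1 * m.size2)
    throw std::invalid_argument("fast_copy: size mismatch");
  std::copy(begin, end, m.data.begin());
}

// Option 2 with blocking: stage at most block_rows dense rows in a
// temporary, then scatter them into the padded layout.
// Returns the number of elements held by the temporary.
std::size_t fast_copy_blocked(const float* begin, const float* end,
                              PaddedMatrix& m, std::size_t block_rows)
{
  if (block_rows == 0)
    throw std::invalid_argument("fast_copy: block_rows must be positive");
  if (static_cast<std::size_t>(end - begin) != m.size1 * m.size2)
    throw std::invalid_argument("fast_copy: size mismatch");

  std::vector<float> temp(std::min(block_rows, m.size1) * m.size2);
  for (std::size_t row0 = 0; row0 < m.size1; row0 += block_rows)
  {
    std::size_t rows = std::min(block_rows, m.size1 - row0);
    // Contiguous transfer of `rows` dense rows into the temporary.
    std::copy(begin + row0 * m.size2, begin + (row0 + rows) * m.size2,
              temp.begin());
    // "Kernel": place each element at its padded position.
    for (std::size_t i = 0; i < rows; ++i)
      for (std::size_t j = 0; j < m.size2; ++j)
        m.data[(row0 + i) * m.internal_size2 + j] = temp[i * m.size2 + j];
  }
  return temp.size();
}
```

With block_rows >= size1 the temporary is as large as the whole dense
matrix (the linear memory cost); a smaller block_rows bounds the temporary
at block_rows * size2 elements, at the price of more transfers and more
kernel launches.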
I prefer option 2, because the following scenario is common:
-> Have data on CPU
-> fast_copy from CPU to GPU
-> Execute operations on GPU (here we want the fastest kernels, and they
require padding)
-> fast_copy from GPU to CPU
On the other hand, on some problems the user will run out of memory just
because of the memory overhead induced by the copy operation...
What do you think?
Philippe
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel