Hi Phil,

> In order to get good GEMM performance, it's pretty much accepted that we
> have to pad matrices at some point, in such a way that each row/column
> is aligned (and not only the global piece of memory).

This is already the case: all rows and columns are padded with zeros 
so that the fast GEMM kernels can be used :-) If I'm not mistaken, 
I still need to update the GEMM selection slightly so that there are 
no ambiguities with matrix proxies, but in essence it's there and works.
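For illustration, the zero-padded layout can be sketched on the host as follows. This is a minimal sketch, not ViennaCL's implementation: the padding granularity of 128, the row-major layout and the names padded_size() and pack_padded() are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed padding granularity for this sketch; the value ViennaCL uses may differ.
constexpr std::size_t kPadding = 128;

// Round n up to the next multiple of pad (hypothetical helper).
std::size_t padded_size(std::size_t n, std::size_t pad = kPadding) {
  assert(pad > 0);
  return ((n + pad - 1) / pad) * pad;
}

// Copy a dense row-major rows x cols matrix into a zero-initialized buffer of
// padded_size(rows) x padded_size(cols). The padding entries stay zero, so a
// GEMM kernel that always processes full tiles only accumulates zeros there.
std::vector<float> pack_padded(const std::vector<float>& src,
                               std::size_t rows, std::size_t cols,
                               std::size_t pad = kPadding) {
  assert(src.size() == rows * cols);
  const std::size_t irows = padded_size(rows, pad);
  const std::size_t icols = padded_size(cols, pad);
  std::vector<float> dst(irows * icols, 0.0f);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      dst[i * icols + j] = src[i * cols + j];  // row stride is icols, not cols
  return dst;
}
```

Note that the row stride of the padded buffer is the padded column count, which is exactly why a raw block copy of unpadded host data no longer lines up with it.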


> It seems to me that this potentially breaks fast_copy(ScalarType* p1,
> ScalarType* p2, viennacl::matrix<>). Or at least, it requires us to define
> how we want it to behave.
> What do you think is the most reasonable?
> -> throwing an exception if matrix.internal_size1()!=matrix.size1() ||
> matrix.internal_size2()!=matrix.size2().

fast_copy() is explicitly described as an optimization to be used with 
care. Thus, it should either throw an exception (preferred) or fail 
an assertion if the provided values do not match the internal buffer size.
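A minimal sketch of the preferred behaviour, assuming a hypothetical padded_matrix type that only models the size1()/size2()/internal_size1()/internal_size2() accessors mentioned in this thread. checked_fast_copy() is a hypothetical name, not the ViennaCL API: it performs one raw copy and throws instead of silently falling back to a slower path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a padded matrix; storage is a host vector here.
struct padded_matrix {
  std::size_t rows, cols, irows, icols;
  std::vector<float> buffer;  // irows * icols entries, padding included
  padded_matrix(std::size_t r, std::size_t c, std::size_t ir, std::size_t ic)
      : rows(r), cols(c), irows(ir), icols(ic), buffer(ir * ic, 0.0f) {
    assert(ir >= r && ic >= c);
  }
  std::size_t size1() const { return rows; }
  std::size_t size2() const { return cols; }
  std::size_t internal_size1() const { return irows; }
  std::size_t internal_size2() const { return icols; }
};

// The host range must already match the internal (padded) buffer 1:1.
// Otherwise an exception is thrown and the caller should use copy() instead.
void checked_fast_copy(const float* begin, const float* end, padded_matrix& m) {
  if (end < begin)
    throw std::invalid_argument("fast_copy: invalid range");
  const std::size_t n = static_cast<std::size_t>(end - begin);
  if (n != m.internal_size1() * m.internal_size2())
    throw std::invalid_argument("fast_copy: range does not match internal buffer size");
  std::copy(begin, end, m.buffer.begin());  // single raw block copy
}
```

Checking against the internal buffer size (rather than only size1()/size2()) still lets a caller who already holds correctly padded data use the fast path.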

> -> Somehow handling it internally. There is both a cost in memory
> footprint and execution time, since the only way (I can think of)
> involves creating a temporary handle and invoking a kernel. I'm not that
> worried about the execution time cost of this copy kernel which will
> anyway happen on GDDR5 or GDDR6, but it seems like the memory cost is
> fairly big (linear in the size of the handle to copy). We can probably
> solve the issue with a fairly easy blocking mechanism, but then this
> will increase the execution time overhead.
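The blocked variant of the internal handling described in the quote could look roughly like this host-side sketch. It is hypothetical: the staging vector stands in for the temporary device handle, and the inner loop stands in for the scatter kernel. The temporary shrinks from rows*cols to block_rows*cols elements, at the price of one transfer plus one kernel launch per block, which is the time/memory trade-off mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy unpadded row-major data (rows x cols) into a padded destination whose
// row stride is icols, staging at most block_rows rows at a time.
// Returns the number of staging rounds (each would be one upload + one launch).
std::size_t blocked_padded_copy(const std::vector<float>& src,
                                std::size_t rows, std::size_t cols,
                                std::vector<float>& dst, std::size_t icols,
                                std::size_t block_rows) {
  assert(block_rows > 0 && icols >= cols);
  assert(src.size() == rows * cols && dst.size() >= rows * icols);
  std::vector<float> staging(block_rows * cols);  // bounded temporary
  std::size_t rounds = 0;
  for (std::size_t r0 = 0; r0 < rows; r0 += block_rows) {
    const std::size_t nr = std::min(block_rows, rows - r0);
    // "Upload": one contiguous chunk of nr unpadded rows.
    std::copy(src.data() + r0 * cols, src.data() + (r0 + nr) * cols, staging.data());
    // "Scatter kernel": place each staged row at its padded offset.
    for (std::size_t i = 0; i < nr; ++i)
      std::copy(staging.data() + i * cols, staging.data() + (i + 1) * cols,
                dst.data() + (r0 + i) * icols);
    ++rounds;
  }
  return rounds;
}
```

A larger block_rows means fewer rounds but a larger temporary; block_rows == rows degenerates to the unblocked approach with its linear memory cost.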

copy() is available for stability, fast_copy() for performance. I don't 
think we should provide a fast_copy() which silently throws away 
performance just for the sake of stability. I expect that a user wants 
to get some sort of error if the 'performance version' does not work.

Best regards,
Karli


_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
