Hi Phil,

> In order to get good GEMM performance, it's pretty much admitted that we
> have at some point to pad matrices, in such a way that each row/column
> should be aligned (and not only the global piece of memory).

This is now already the case: all rows and columns are padded with zeros
so that the fast GEMM kernels can be used :-)
If I'm not mistaken, I still need to update the GEMM selection slightly
so that there are no ambiguities with matrix proxies, but in essence
it's there and works.

> It seems to me that this potentially breaks fast_copy(ScalarType* p1,
> ScalarType * p2, viennacl::matrix<>). Or at least, it requires us to
> define how we want it to behave.
> What do you think is the most reasonable?
> -> throwing an exception if matrix.internal_size1()!=matrix.size1() ||
> matrix.internal_size2()!=matrix.size2().

fast_copy() is explicitly said to be an optimization and to be used with
care. Thus, it should either throw an exception (preferred) or run into
an assertion if the provided values do not match the internal buffer
size.

> -> Somewhat handling it internally. There is both a cost in memory
> footprint and execution time, since the only way (I can think of)
> involves creating a temporary handle and invoking a kernel. I'm not that
> worried about the execution time cost of this copy kernel which will
> anyway happen on GDDR5 or GDDR6, but it seems like the memory cost is
> fairly big (linear in the size of the handle to copy). We can probably
> solve the issue with a fairly easy blocking mechanism, but then this
> will increase the execution time overhead.

copy() is available for stability, fast_copy() for performance. I don't
think we should provide a fast_copy() which silently throws away
performance just for the sake of stability. I expect that a user wants
to get some sort of error if the 'performance version' does not work.

Best regards,
Karli
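
For illustration only, the "throw if the range does not match the
internal buffer size" behaviour discussed above could look roughly like
the sketch below. It is self-contained and does not use ViennaCL:
padded_matrix, round_up and checked_fast_copy are hypothetical names
made up for this example, and the host-side std::vector merely stands in
for the device buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a padded dense matrix; NOT the ViennaCL class.
struct padded_matrix {
  std::size_t size1, size2;                    // logical rows / columns
  std::size_t internal_size1, internal_size2;  // padded rows / columns
  std::vector<float> buffer;                   // internal_size1 * internal_size2 entries, zero-filled

  padded_matrix(std::size_t rows, std::size_t cols, std::size_t alignment)
    : size1(rows), size2(cols),
      internal_size1(round_up(rows, alignment)),
      internal_size2(round_up(cols, alignment)),
      buffer(internal_size1 * internal_size2, 0.0f) {}

  // Round n up to the next multiple of a (a > 0 is assumed).
  static std::size_t round_up(std::size_t n, std::size_t a) {
    return ((n + a - 1) / a) * a;
  }
};

// Hypothetical checked raw copy: instead of silently repacking the data,
// it refuses any range whose length differs from the padded buffer size.
void checked_fast_copy(const float* first, const float* last, padded_matrix& m) {
  if (static_cast<std::size_t>(last - first) != m.buffer.size())
    throw std::invalid_argument(
        "fast_copy: range does not match internal (padded) buffer size");
  std::copy(first, last, m.buffer.begin());
}
```

With an alignment of 4, a 3x5 matrix gets a 4x8 internal buffer, so
passing a tightly packed range of 15 values throws, while a range of 32
values that already includes the padding is copied directly.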