Hey,

I got back to work on the generator today and realized how ugly the
dispatching mechanism is. It exists to take advantage of the equivalence
RowMajor + Trans <=> ColMajor + NoTrans

Actually, I've been wondering: why don't we do this across the whole
codebase? We could focus solely on providing a simple BLAS interface
(all column-major) and do the additional trickery at some point
beforehand. I see a couple of advantages to this:
=> We would only have to maintain 4 GEMM and 2 GEMV kernels, instead
of 32 GEMM and 4 GEMV kernels.
=> It would greatly improve the consistency between the default
implementations, the BLAS backends and the kernel generator (because all
of these implementations could focus on providing just a simple
column-major BLAS interface).
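
To make the idea concrete, here is a rough sketch of what such a front-end
dispatch could look like. All names (Layout, Op, Operand, to_col_major,
gemm_col_major, gemm) are hypothetical and are not existing ViennaCL code;
the naive triple loop only stands in for a real column-major kernel. It
relies on two facts: a row-major buffer is the column-major storage of the
transposed matrix, and a row-major result C is the column-major storage of
C^T = op(B)^T * op(A)^T.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is a sketch, not ViennaCL's API.
enum class Layout { ColMajor, RowMajor };
enum class Op { NoTrans, Trans };

// One GEMM input: raw buffer, leading dimension, layout, transposition flag.
struct Operand {
  const double* data;
  std::size_t ld;
  Layout layout;
  Op op;
};

inline Op flip(Op op) { return op == Op::Trans ? Op::NoTrans : Op::Trans; }

// RowMajor + Trans <=> ColMajor + NoTrans: reinterpret a row-major buffer
// as column-major storage of the transpose and toggle the flag.
inline Operand to_col_major(Operand x) {
  if (x.layout == Layout::RowMajor) {
    x.layout = Layout::ColMajor;
    x.op = flip(x.op);
  }
  return x;
}

// Stand-in for the only kernel family left to maintain:
// C (m x n, column-major, leading dimension ldc) = op(A) * op(B),
// where op(A) is m x k and op(B) is k x n.
inline void gemm_col_major(std::size_t m, std::size_t n, std::size_t k,
                           const Operand& A, const Operand& B,
                           double* C, std::size_t ldc) {
  assert(A.layout == Layout::ColMajor && B.layout == Layout::ColMajor);
  // Element (i, j) of op(X) for a column-major X.
  auto at = [](const Operand& x, std::size_t i, std::size_t j) {
    return x.op == Op::NoTrans ? x.data[i + j * x.ld] : x.data[j + i * x.ld];
  };
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      double acc = 0.0;
      for (std::size_t l = 0; l < k; ++l) acc += at(A, i, l) * at(B, l, j);
      C[i + j * ldc] = acc;
    }
  }
}

// Front-end dispatcher: accepts any layout combination and forwards
// everything to the column-major kernel.
inline void gemm(std::size_t m, std::size_t n, std::size_t k,
                 Operand A, Operand B,
                 double* C, std::size_t ldc, Layout c_layout) {
  A = to_col_major(A);
  B = to_col_major(B);
  if (c_layout == Layout::ColMajor) {
    gemm_col_major(m, n, k, A, B, C, ldc);
  } else {
    // Row-major C is column-major C^T, and C^T = op(B)^T * op(A)^T,
    // so swap the operands and toggle both flags.
    A.op = flip(A.op);
    B.op = flip(B.op);
    gemm_col_major(n, m, k, B, A, C, ldc);
  }
}
```

In this sketch only the NoTrans/Trans combinations of A and B reach the
kernel, so the layout handling lives in one place in front of the default
implementations, the BLAS backends and the generator.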

Am I missing something? If not, at which point should such a dispatching
mechanism take place?

Best regards,
Philippe
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
