Hi Philippe,

> Since our generator is skeleton-based anyway, what about having a look
> at the best-performing kernels in RaijinCL and then extending the
> current generator accordingly, such that these kernels are covered as
> well? I consider this to be *far* less painful than trying to merge in
> RaijinCL. As you certainly know, it's not that easy to 'just interface
> with a kernel generator', particularly if this is supposed to happen at
> runtime and in a reliable way. Even just within ViennaCL it took us
> (at least) three iterations to come up with a model that is useful in
> practice...
>
> Yes, probably. Plus, we don't need all of RaijinCL's functionality
> (images, for example). I have gotten in touch with Rahul (the author of
> RaijinCL). I just want to make sure that RaijinCL gets the credit it
> deserves (3 TFLOP/s on an HD7970 is a lot!), and maybe we can join our
> expertise to get even better performance :)
Sure, they should get full credit for their work. On the other hand, I'm not sure whether we can use images for peak GEMM performance, since relying on them would come back to hurt us when we try to do reasonable things inside algorithms. So maybe we already get the 'best' performance possible under the constraint of not using images?

Best regards,
Karli

_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel