Hi,
> It seems to me that most of the differences between CUDA and OpenCL come
> from the respective APIs, but that the kernel code is very similar in
> the two cases.
> Do you guys think it's possible to easily translate the generated kernel
> from OpenCL to CUDA, by just doing one-to-one replacements of the
> keywords? (__local => __shared__, __global => __device__, ...), or is
> there any particular difficulty I've missed?

This is pretty much what I've been doing while adding the CUDA backend ;-)

One difference: in some reductions one may skip certain __syncthreads() statements, because threads within a warp are guaranteed to run in lockstep. On the other hand, this does not seem to make much of a difference for the memory-bandwidth-bound kernels anyway.

Are the GEMM kernels on Fermi and Kepler similar enough that we can use a single GEMM kernel for both architectures? I would prefer not to clutter the repository with periodic updates of such semi-automatically generated CUDA kernels.

Best regards,
Karli
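For reference, here is a minimal sketch of what such a one-to-one replacement pass can look like. The keyword table is illustrative and deliberately incomplete (only dimension 0 of the work-item builtins, and `__global` simply dropped, since CUDA device pointers carry no qualifier); the name `translate_kernel` is an assumption for this sketch and not part of ViennaCL:

```python
# Sketch of a one-to-one OpenCL -> CUDA source translation.
# The mapping table is illustrative, not ViennaCL's actual converter.
import re

OPENCL_TO_CUDA = {
    "__kernel": "__global__",
    "__global ": "",  # plain device pointers need no qualifier in CUDA
    "__local": "__shared__",
    "__constant": "__constant__",
    "barrier(CLK_LOCAL_MEM_FENCE)": "__syncthreads()",
    "get_global_id(0)": "(blockIdx.x * blockDim.x + threadIdx.x)",
    "get_local_id(0)": "threadIdx.x",
    "get_group_id(0)": "blockIdx.x",
    "get_local_size(0)": "blockDim.x",
}

# Single-pass regex substitution: replacement text is never rescanned,
# so "__kernel" -> "__global__" cannot collide with the "__global " rule.
# Longest-first ordering handles keys that are prefixes of one another.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(OPENCL_TO_CUDA, key=len, reverse=True))
)

def translate_kernel(src: str) -> str:
    """Replace OpenCL keywords/builtins with CUDA equivalents."""
    return _PATTERN.sub(lambda m: OPENCL_TO_CUDA[m.group(0)], src)

opencl_src = (
    "__kernel void add(__global float *x) { "
    "__local float tmp[64]; "
    "tmp[get_local_id(0)] = x[get_global_id(0)]; "
    "barrier(CLK_LOCAL_MEM_FENCE); }"
)
print(translate_kernel(opencl_src))
```

The single-pass substitution matters: doing the replacements sequentially with `str.replace` would let a later rule mangle the output of an earlier one (e.g. the `__global` rule eating part of an already-emitted `__global__`).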