Hi,

 > It seems to me that most of the differences between CUDA and OpenCL come
> from the respective APIs, but that the kernel code is very similar in
> the two cases.
> Do you guys think it's possible to easily translate the generated kernel
> from OpenCL to CUDA by just doing one-to-one replacements of the
> keywords (__local => __shared__, __global => __device__, ...), or is
> there any particular difficulty I've missed?

this is pretty much what I've been doing while adding the CUDA backend 
;-) In some reductions one can skip certain __syncthreads() statements, 
because threads within a warp are guaranteed to execute in lock-step. On 
the other hand, this does not seem to make much of a difference for the 
memory-bandwidth-bound kernels anyway.
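For illustration, a minimal sketch of such a one-to-one textual translation 
(the mapping below is an illustrative subset, not the exact set the backend 
uses; replacement order matters, since __kernel itself maps to __global__):

```python
import re

# Illustrative subset of OpenCL -> CUDA replacements. Order matters:
# "__global" must be handled before "__kernel" produces "__global__",
# otherwise the second pass would mangle the freshly inserted qualifier.
OPENCL_TO_CUDA = [
    (r"__global\s+", ""),            # global pointers need no qualifier in CUDA
    (r"__kernel",    "__global__"),
    (r"__local",     "__shared__"),
    (r"barrier\(CLK_LOCAL_MEM_FENCE\)", "__syncthreads()"),
    (r"get_local_id\(0\)",  "threadIdx.x"),
    (r"get_group_id\(0\)",  "blockIdx.x"),
    (r"get_global_id\(0\)", "blockIdx.x * blockDim.x + threadIdx.x"),
]

def translate(opencl_source: str) -> str:
    """Naive textual OpenCL-to-CUDA translation via keyword substitution."""
    cuda_source = opencl_source
    for pattern, replacement in OPENCL_TO_CUDA:
        cuda_source = re.sub(pattern, replacement, cuda_source)
    return cuda_source

kernel = "__kernel void scale(__global float *x) { x[get_global_id(0)] *= 2.0f; }"
print(translate(kernel))
# -> __global__ void scale(float *x) { x[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f; }
```

Of course, a purely textual pass like this only works because the generated 
kernels stick to a restricted subset of OpenCL C; hand-written kernels would 
need more care (multi-dimensional work-group queries, address-space casts, etc.).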

Are the GEMM kernels on Fermi and Kepler reasonably similar so that we 
can use one GEMM kernel for both of them? I would prefer not to clutter 
the repository too much with periodic updates of such semi-automatically 
generated CUDA kernels.

Best regards,
Karli


_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
