On 03/23/2013 02:22 AM, Peter Colberg wrote:
> The issue is local and constant memory passed by parameter. The CUDA
> driver rejects these cases, despite supporting them in OpenCL.

Did you try a kernel with constant-size automatic local variables
declared inside the kernel, i.e., not passed as pointer parameters?
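
For reference, the CUDA analogue of the two cases (a hypothetical test
kernel, not taken from Peter's code) would be a statically sized
__shared__ array inside the kernel versus the dynamically sized
extern __shared__ region whose size is only given at launch:

  // Constant-size automatic local: the size is known at compile time,
  // so the shared allocation is baked directly into the PTX.
  __global__ void static_local(float *out)
  {
      __shared__ float tmp[64];          // assumes blockDim.x <= 64
      tmp[threadIdx.x] = out[threadIdx.x];
      __syncthreads();
      out[threadIdx.x] = tmp[blockDim.x - 1 - threadIdx.x];
  }

  // Pointer-parameter style: the size is only known at launch time and
  // is passed as the third <<<grid, block, bytes>>> launch parameter.
  __global__ void dynamic_local(float *out)
  {
      extern __shared__ float tmp[];
      tmp[threadIdx.x] = out[threadIdx.x];
      __syncthreads();
      out[threadIdx.x] = tmp[blockDim.x - 1 - threadIdx.x];
  }

If the driver accepts the PTX for static_local but rejects
dynamic_local, that would narrow the problem down to the
pointer-parameter locals.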

Perhaps NVIDIA's OpenCL compiler converts the pointer args to automatic
ones (or to a single shared pointer arg, e.g., as the last argument in
the list) as soon as the sizes are known (at the latest at launch time).
This would be the opposite of what pocl does by default.

It seems one can allocate only one shared memory region dynamically in
a CUDA kernel (see the sketch after the links below):

http://stackoverflow.com/questions/9187899/cuda-shared-memory-array-variable
https://devtalk.nvidia.com/default/topic/400873/shared-memory-dynamic-allocation/
https://devtalk.nvidia.com/default/topic/379736/how-to-dynamically-allocate-shared-memory-in-_global__-or-__device__-functions/
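
A small test illustrating the point made in those threads (my own
sketch, not taken from the links): declaring two extern __shared__
arrays compiles, but both names end up referring to the same single
dynamic region:

  #include <cstdio>

  __global__ void one_dynamic_region(int *out)
  {
      extern __shared__ float a[];
      extern __shared__ int   b[];        // aliases 'a', not a new buffer
      if (threadIdx.x == 0)
          out[0] = ((void *)a == (void *)b);
  }

  int main()
  {
      int *out, host = 0;
      cudaMalloc(&out, sizeof(int));
      // The third launch parameter sizes the one and only dynamic region.
      one_dynamic_region<<<1, 32, 256>>>(out);
      cudaMemcpy(&host, out, sizeof(int), cudaMemcpyDeviceToHost);
      printf("aliased: %d\n", host);      // prints 1
      cudaFree(out);
      return 0;
  }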

Thus, maybe we need to collect all the local allocations into a single
region behind one pointer, allocate it once, and in the kernel reassign
the variables to point to parts of this region. This should not be a
difficult addition to the LLVM pass we already use for processing the
automatic locals.
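
Roughly, the transformed kernel could then look something like the
following hand-written sketch (with made-up variable names, sizes and
offsets; not what the pass emits today):

  __global__ void kernel_with_pooled_locals(float *out)
  {
      // Single dynamic region; its size (here 64 floats + 64 ints,
      // plus any alignment padding) is passed at launch time.
      extern __shared__ char local_pool[];

      // Each original __local allocation becomes a pointer to its
      // slice of the pool, at an offset computed by the pass.
      float *scratch_a = (float *)(local_pool);
      int   *scratch_b = (int   *)(local_pool + 64 * sizeof(float));

      int tid = threadIdx.x;               // assumes blockDim.x <= 64
      scratch_a[tid] = out[tid];
      scratch_b[tid] = tid;
      __syncthreads();
      out[tid] = scratch_a[tid] + (float)scratch_b[tid];
  }

The host side would then only need to sum up the per-kernel local
sizes and pass the total as the dynamic shared memory size at launch.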

> This gives the impression that the real issue with this pocl backend
> will not be the LLVM NVPTX backend, but the CUDA driver, which is
> outside the realm of LLVM. In the above case, the CUDA driver rejects
> valid PTX code without giving any reason.

Asking in the NVIDIA forums would not hurt, especially if the problem
reproduces with a plain CUDA kernel as well.

-- 
--Pekka

