On 03/23/2013 02:22 AM, Peter Colberg wrote:
> The issue is local and constant memory passed by parameter. The CUDA
> driver rejects these cases, despite supporting them in OpenCL.
Did you try a kernel with constant-size automatic local variables, i.e., not passing them as pointer parameters? Perhaps NVIDIA's OpenCL compiler converts the pointer args to automatic ones (or to a single shared pointer arg, e.g., as the last argument in the list) as soon as the sizes are known (at the latest at launch time). That would be the opposite of what pocl does by default.

It seems one can allocate only one shared memory region dynamically in a CUDA kernel:

http://stackoverflow.com/questions/9187899/cuda-shared-memory-array-variable
https://devtalk.nvidia.com/default/topic/400873/shared-memory-dynamic-allocation/
https://devtalk.nvidia.com/default/topic/379736/how-to-dynamically-allocate-shared-memory-in-_global__-or-__device__-functions/

Thus, maybe we need to collect all the local allocations into a single pointer, allocate that region once, and in the kernel reassign the variables to point to parts of it (see the sketch at the end of this mail). That should not be a difficult addition to the LLVM pass we already use for processing the automatic locals.

> This gives the impression that the real issue with this pocl backend
> will not be the LLVM NVPTX backend, but the CUDA driver, which is
> outside the realm of LLVM. In the above case, the CUDA driver rejects
> valid PTX code without giving any reason.

Asking in the NVIDIA forums should not hurt, especially if this repeats with a CUDA kernel too.

--
--Pekka
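
P.S. To make the single-region idea concrete, here is a minimal CUDA
sketch (not pocl code; the kernel, buffer names, and sizes are made up
for illustration). All the dynamically sized "local" buffers are carved
out of one extern __shared__ region whose total size is given at kernel
launch, while a constant-size automatic local is simply a static
__shared__ array that needs no dynamic allocation at all.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(float *out, int n_a, int n_b)
{
    // Only one dynamically sized shared memory region is allowed per
    // kernel; its size is set by the third launch parameter.
    extern __shared__ char local_mem[];

    // Carve the region into the individual "local" buffers; the offsets
    // must respect the alignment of each element type.
    float *buf_a = reinterpret_cast<float *>(local_mem);
    int   *buf_b = reinterpret_cast<int *>(local_mem + n_a * sizeof(float));

    // A constant-size automatic local needs no dynamic allocation.
    __shared__ float buf_c[64];

    int i = threadIdx.x;
    buf_a[i] = i;
    buf_b[i] = i;
    buf_c[i % 64] = i;
    __syncthreads();

    out[i] = buf_a[i] + buf_b[i] + buf_c[i % 64];
}

int main()
{
    const int n = 64;
    float *out;
    cudaMalloc((void **) &out, n * sizeof(float));

    // Total dynamic shared size = sum of all carved-out buffers.
    size_t shared_bytes = n * sizeof(float) + n * sizeof(int);
    kernel<<<1, n, shared_bytes>>>(out, n, n);

    float host_out[n];
    cudaMemcpy(host_out, out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %g\n", host_out[0]);

    cudaFree(out);
    return 0;
}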
