Re: [PyCUDA] Dynamic parallelism (sm_35) with PyCUDA

Andreas Kloeckner Wed, 13 Feb 2013 21:37:52 -0800

Hi Ahmed,

Ahmed Fasih <[email protected]> writes:
> Hi folks, I'm writing in the hope that someone has a K20 (Kepler,
> compute capability 3.5) device and has gotten it to do dynamic
> parallelism, wherein a kernel can launch grids on its own without
> returning to the CPU. A "hello world" example is given at [1], page 23.
> With the suggested nvcc command, I was able to compile and run the
> hello world perfectly, but when I tried to load the kernels into
> PyCUDA, I got LogicErrors, asserting
>
> "cuModuleLoadDataEx failed: invalid image -"
>
> This happens both before and after I take out the runtime API calls
> like cudaDeviceSynchronize() in the kernel code, and it persists after
> I pass the "-rdc=true" and "-lcudadevrt" options into SourceModule.
> Could PyCUDA be choking on the runtime API calls in a kernel? If
> anybody's gotten this to work, please share!
>
> On a related note, I've added support in PyCUDA for the sm_35 (30?)
> feature of choosing four- versus eight-byte banks for shared memory.
> I can send out a rough patch.
>
> [1] http://docs.nvidia.com/cuda/pdf/CUDA_Dynamic_Parallelism_Programming_Guide.pdf
>
> For completeness, here's the original CUDA code from there:
>
> #include <stdio.h>
>
> __global__ void childKernel()
> {
>     printf("Hello ");
> }
>
> __global__ void parentKernel()
> {
>     // launch child
>     childKernel<<<1,1>>>();
>     if (cudaSuccess != cudaGetLastError()) {
>         return;
>     }
>     // wait for child to complete
>     if (cudaSuccess != cudaDeviceSynchronize()) {
>         return;
>     }
>     printf("World!\n");
> }
>
> int main(int argc, char *argv[])
> {
>     // launch parent
>     parentKernel<<<1,1>>>();
>     if (cudaSuccess != cudaGetLastError()) {
>         return 1;
>     }
>     // wait for parent to complete
>     if (cudaSuccess != cudaDeviceSynchronize()) {
>         return 2;
>     }
>     return 0;
> }
>
>
> And compiled via:
> $ nvcc -arch=sm_35 -rdc=true hello_world.cu -o hello -lcudadevrt
>
> (copied & pasted from NVIDIA doc [1], all rights reserved by NVIDIA etc.)


Can you send a runnable example that exhibits that failure? (And yes,
please send that patch!) I've got access to a K20, but it currently
has the weird problem of working fine under OpenCL and refusing to be
detected under CUDA...?
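
Something along the following lines would be enough as a reproducer. This
is only a sketch, assuming the two kernels from the quoted code are handed
to SourceModule with the "-rdc=true" and "-lcudadevrt" options and an
sm_35 target, as described above. The names reproduce, KERNEL_SOURCE,
NVCC_OPTIONS and TARGET_ARCH are made up for illustration, and the script
has not been run on a K20, so no particular output is claimed.

```python
# Hypothetical reproducer sketch. KERNEL_SOURCE is the device code quoted
# above (without main); every other name here is illustrative.

KERNEL_SOURCE = r"""
#include <stdio.h>

__global__ void childKernel()
{
    printf("Hello ");
}

__global__ void parentKernel()
{
    // launch child
    childKernel<<<1,1>>>();
    if (cudaSuccess != cudaGetLastError()) {
        return;
    }
    // wait for child to complete
    if (cudaSuccess != cudaDeviceSynchronize()) {
        return;
    }
    printf("World!\n");
}
"""

# Options and target taken from the nvcc command line quoted above.
NVCC_OPTIONS = ["-rdc=true", "-lcudadevrt"]
TARGET_ARCH = "sm_35"


def reproduce():
    """Try to build and launch parentKernel; return True if it launched."""
    # Imported lazily so the module can be inspected on a machine without CUDA.
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    try:
        mod = SourceModule(KERNEL_SOURCE, arch=TARGET_ARCH, options=NVCC_OPTIONS)
    except Exception as exc:
        # Report whatever comes back, whether from nvcc or from module loading.
        print("%s: %s" % (type(exc).__name__, exc))
        return False

    parent = mod.get_function("parentKernel")
    parent(block=(1, 1, 1), grid=(1, 1))
    drv.Context.synchronize()  # wait for the parent grid to finish
    return True
```

Calling reproduce() on the machine with the K20 and pasting the printed
exception (plus the PyCUDA and CUDA versions) would make the failure easy
to look at.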

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
