[PyCUDA] Dynamic parallelism (sm_35) with PyCUDA

Ahmed Fasih Tue, 12 Feb 2013 05:35:43 -0800

Hi folks, I'm writing in the hope that someone has a K20 (Kepler,
compute capability 3.5) device and has gotten dynamic parallelism to
work on it, wherein a kernel can launch grids on its own without
returning to the CPU. A "hello world" example is given in [1], page 23.
With the suggested nvcc command, I was able to compile and run the
hello world perfectly, but when I try to load the kernels into PyCUDA,
I get a LogicError:

"cuModuleLoadDataEx failed: invalid image -"

This happens both with and without runtime API calls like
cudaDeviceSynchronize() in the kernel code, and it persists after I
pass the "-rdc=true" and "-lcudadevrt" options to SourceModule. Could
PyCUDA be choking on the runtime API calls in a kernel? If anybody has
gotten this to work, please share!
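
A minimal sketch of the load attempt described above, assuming the
kernels are handed to pycuda.compiler.SourceModule as a string. The
names KERNEL_SOURCE, NVCC_OPTIONS and load_and_run are made up for
illustration, and passing arch="sm_35" is an assumption rather than
something stated above:

```python
# Kernel source from the NVIDIA hello-world example, kept as a raw string
# so the "\n" inside printf reaches nvcc unchanged.
KERNEL_SOURCE = r"""
#include <stdio.h>
__global__ void childKernel()
{
    printf("Hello ");
}
__global__ void parentKernel()
{
    // launch child
    childKernel<<<1,1>>>();
    if (cudaSuccess != cudaGetLastError()) {
        return;
    }
    // wait for child to complete
    if (cudaSuccess != cudaDeviceSynchronize()) {
        return;
    }
    printf("World!\n");
}
"""

# The two flags mentioned above, forwarded to nvcc by SourceModule.
NVCC_OPTIONS = ["-rdc=true", "-lcudadevrt"]


def load_and_run():
    """Compile the kernels with SourceModule and launch the parent once."""
    # Imported lazily so this file can be read on a machine without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    from pycuda.compiler import SourceModule

    # Assumption: arch="sm_35" targets the K20. This is the step where the
    # "cuModuleLoadDataEx failed: invalid image" LogicError is reported.
    module = SourceModule(KERNEL_SOURCE, arch="sm_35", options=NVCC_OPTIONS)
    parent = module.get_function("parentKernel")
    parent(block=(1, 1, 1), grid=(1, 1))
```

Calling load_and_run() on a machine with PyCUDA and an sm_35 device
should be enough to reproduce the attempt.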

On a related note, I've added support to PyCUDA for the sm_35 (sm_30?)
feature of choosing between four-byte and eight-byte banks for shared
memory. I can send out a rough patch.

[1] http://docs.nvidia.com/cuda/pdf/CUDA_Dynamic_Parallelism_Programming_Guide.pdf

For completeness, here's the original CUDA code from [1]:

#include <stdio.h>

__global__ void childKernel()
{
    printf("Hello ");
}

__global__ void parentKernel()
{
    // launch child
    childKernel<<<1,1>>>();
    if (cudaSuccess != cudaGetLastError()) {
        return;
    }
    // wait for child to complete
    if (cudaSuccess != cudaDeviceSynchronize()) {
        return;
    }
    printf("World!\n");
}

int main(int argc, char *argv[])
{
    // launch parent
    parentKernel<<<1,1>>>();
    if (cudaSuccess != cudaGetLastError()) {
        return 1;
    }
    // wait for parent to complete
    if (cudaSuccess != cudaDeviceSynchronize()) {
        return 2;
    }
    return 0;
}


And compiled via:
$ nvcc -arch=sm_35 -rdc=true hello_world.cu -o hello -lcudadevrt

(copied & pasted from NVIDIA doc [1], all rights reserved by NVIDIA etc.)

