Hi folks, I'm writing in the hope that someone has a K20 (Kepler, compute capability 3.5) and has gotten dynamic parallelism working on it, i.e., a kernel launching grids on its own without returning to the CPU. A "hello world" example is given in [1], page 23. With the suggested nvcc command I was able to compile and run the hello world perfectly, but when I try to load the kernels into PyCUDA, I get a LogicError:

"cuModuleLoadDataEx failed: invalid image -"

This happens both before and after I remove the runtime API calls (like cudaDeviceSynchronize()) from the kernel code, and it persists when I pass the "-rdc=true" and "-lcudadevrt" options to SourceModule. Could PyCUDA be choking on runtime API calls inside a kernel? If anybody has gotten this to work, please share!

On a related note, I've added PyCUDA support for the sm_35 (30?) feature of choosing four- versus eight-byte banks for shared memory; I can send out a rough patch.

[1] http://docs.nvidia.com/cuda/pdf/CUDA_Dynamic_Parallelism_Programming_Guide.pdf

For completeness, here's the original CUDA code from there:

    #include <stdio.h>

    __global__ void childKernel()
    {
        printf("Hello ");
    }

    __global__ void parentKernel()
    {
        // launch child
        childKernel<<<1,1>>>();
        if (cudaSuccess != cudaGetLastError()) {
            return;
        }

        // wait for child to complete
        if (cudaSuccess != cudaDeviceSynchronize()) {
            return;
        }

        printf("World!\n");
    }

    int main(int argc, char *argv[])
    {
        // launch parent
        parentKernel<<<1,1>>>();
        if (cudaSuccess != cudaGetLastError()) {
            return 1;
        }

        // wait for parent to complete
        if (cudaSuccess != cudaDeviceSynchronize()) {
            return 2;
        }

        return 0;
    }

And compiled via:

    $ nvcc -arch=sm_35 -rdc=true hello_world.cu -o hello -lcudadevrt

(Copied and pasted from NVIDIA doc [1]; all rights reserved by NVIDIA, etc.)

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
