Hi all,

Ahmed Fasih <[email protected]> writes:
> Hi folks, I write in the hope that someone has gotten a K20 Kepler 3.5
> compute capability device and has gotten it to do dynamic parallelism,
> wherein a kernel can kick off grids on its own without returning to
> the CPU. A "hello world" example is given at [1], page 23. With the
> suggested nvcc command, I was able to compile and run the hello world
> perfectly, but when I tried to load the kernels into PyCUDA, I get
> LogicErrors, asserting
>
> "cuModuleLoadDataEx failed: invalid image -"
>
> This happens both with and without the runtime API calls like
> cudaDeviceSynchronize() in the kernel code, and persists after I pass
> the "-rdc=true" and "-lcudadevrt" options into SourceModule. Could
> PyCUDA be hating the runtime API calls in a kernel? If anybody's
> gotten this to work, please share!
>
> On a related note, I've added support for the sm_35 (30?) feature of
> choosing four- versus eight-byte banks for shared memory from PyCUDA;
> I can send out a rough patch.
>
> [1] http://docs.nvidia.com/cuda/pdf/CUDA_Dynamic_Parallelism_Programming_Guide.pdf
>
> For completeness, here's the original CUDA code from there:
>
> #include <stdio.h>
> __global__ void childKernel()
> {
>     printf("Hello ");
> }
> __global__ void parentKernel()
> {
>     // launch child
>     childKernel<<<1,1>>>();
>     if (cudaSuccess != cudaGetLastError()) {
>         return;
>     }
>     // wait for child to complete
>     if (cudaSuccess != cudaDeviceSynchronize()) {
>         return;
>     }
>     printf("World!\n");
> }
> int main(int argc, char *argv[])
> {
>     // launch parent
>     parentKernel<<<1,1>>>();
>     if (cudaSuccess != cudaGetLastError()) {
>         return 1;
>     }
>     // wait for parent to complete
>     if (cudaSuccess != cudaDeviceSynchronize()) {
>         return 2;
>     }
>     return 0;
> }
>
>
> And compiled via:
> $ nvcc -arch=sm_35 -rdc=true hello_world.cu -o hello -lcudadevrt
>
> (copied & pasted from NVIDIA doc [1], all rights reserved by NVIDIA etc.)

Sorry about the thread necromancy. I think I've got a lead on how to
make this happen, here:

https://github.com/inducer/pycuda/issues/45#issuecomment-48953922

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
