Hello-
I'm porting NVIDIA's example separable convolution kernel to pycuda
and have run into a snag. The kernel uses recursive C++ function
templates to unroll its loops. When I compile, I get:

    error: this declaration may not have extern "C" linkage

I assume this comes from the 'extern "C" {...}' block that pycuda
wraps around the kernel source before sending it to the compiler.
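
To illustrate what I think is going on, here is a minimal sketch. It
assumes pycuda does something equivalent to a plain string wrap. The
kernel below is a made-up stand-in for the NVIDIA code, and the names
tap_sum and row_sum are hypothetical.

```python
# Hypothetical stand-in for the NVIDIA kernel: a recursive function
# template (which needs C++ linkage) next to a __global__ kernel.
kernel_source = """
template<int i> __device__ float tap_sum(const float *data)
{
    return data[i] + tap_sum<i - 1>(data);
}

template<> __device__ float tap_sum<-1>(const float *data)
{
    return 0.0f;
}

__global__ void row_sum(float *out, const float *in)
{
    out[threadIdx.x] = tap_sum<3>(in + threadIdx.x);
}
"""

# Assumption: pycuda wraps the whole source roughly like this before
# handing it to nvcc, so the templates end up inside extern "C".
wrapped_source = 'extern "C" {\n%s\n}\n' % kernel_source
print(wrapped_source)
```

Since templates cannot have C linkage, the compiler would reject the
template declarations once they sit inside that block.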
My questions:
1) Is there some trivial workaround for this that will leave the
template stuff un-molested?
2) If I just rip out the templates and do the loop unrolling in
Python, are there any Pythonic examples of this kind of unrolling out
there? (I could roll my own, but I'm a novice so it might be ugly.)
I'm already using string.Template() to substitute some values that
were #defined in the kernel but are also needed in the calling code.
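
For question 2, this is roughly what I have in mind: a minimal sketch
that generates the unrolled taps as text and drops them into the
kernel with string.Template. The kernel, the names convolve_row and
d_Kernel, and the radius value are all placeholders rather than the
NVIDIA code, and there is no boundary handling.

```python
from string import Template

KERNEL_RADIUS = 3  # hypothetical value; this was a #define in the C source

# Kernel skeleton with placeholders filled in from Python.
kernel_template = Template("""
__constant__ float d_Kernel[$kernel_width];

__global__ void convolve_row(float *out, const float *in)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;
$unrolled_taps
    out[x] = sum;  // sketch only: no bounds checks on x + offset
}
""")

# One line of C per filter tap, replacing the recursive template.
taps = "\n".join(
    "    sum += in[x + (%d)] * d_Kernel[%d];" % (offset, KERNEL_RADIUS - offset)
    for offset in range(-KERNEL_RADIUS, KERNEL_RADIUS + 1)
)

source = kernel_template.substitute(
    kernel_width=2 * KERNEL_RADIUS + 1,
    unrolled_taps=taps,
)
print(source)
```

The resulting string contains no templates, so it should be safe to
pass to pycuda.compiler.SourceModule even with the extern "C"
wrapping in place.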
Thanks!
Drew
_______________________________________________
PyCUDA mailing list
http://tiker.net/mailman/listinfo/pycuda_tiker.net