Andrew,
The following patch should make it work. Scalar arguments to PyCUDA
kernel functions need to be numpy.int32(), whereas the grid dimensions
should be plain int()s.
% diff -u convolution_original.py convolution_new.py
--- convolution_original.py 2009-06-13 19:15:30.000000000 -0400
+++ convolution_new.py 2009-06-13 19:17:54.000000000 -0400
@@ -334,8 +334,8 @@
 DATA_W = int(DATA_W)
 # DATA_H = numpy.int32(DATA_H)
 # DATA_W = numpy.int32(DATA_W)
-convolutionRowGPU(intermediateImage_gpu, sourceImage_gpu, DATA_W, DATA_H, grid=blockGridRows, block=threadBlockRows)
-convolutionColumnGPU(destImage_gpu, intermediateImage_gpu, DATA_W, DATA_H, COLUMN_TILE_W * threadBlockColumns[1], DATA_W * threadBlockColumns[1], grid=blockGridColumns, block=threadBlockColumns)
+convolutionRowGPU(intermediateImage_gpu, sourceImage_gpu, numpy.int32(DATA_W), numpy.int32(DATA_H), grid=[int(e) for e in blockGridRows], block=threadBlockRows)
+convolutionColumnGPU(destImage_gpu, intermediateImage_gpu, numpy.int32(DATA_W), numpy.int32(DATA_H), numpy.int32(COLUMN_TILE_W * threadBlockColumns[1]), numpy.int32(DATA_W * threadBlockColumns[1]), grid=[int(e) for e in blockGridColumns], block=threadBlockColumns)
 # Pull the data back from the GPU.
 cuda.memcpy_dtoh(destImage, destImage_gpu)
 return destImage
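The reason for the distinction: PyCUDA packs each scalar kernel
argument according to its numpy dtype, so it needs a fixed-width type
like numpy.int32 to know how many bytes to copy, while the grid is
consumed on the host side of the launch call, which wants plain Python
ints. A minimal sketch of the conversion (numpy only, no GPU needed;
the names and values are just placeholders echoing the convolution
example, not taken from it):

```python
import numpy

# Scalar kernel arguments: a fixed-width numpy type tells PyCUDA
# exactly how many bytes each argument occupies in the param buffer.
DATA_W, DATA_H = 512, 512  # hypothetical image dimensions
args = (numpy.int32(DATA_W), numpy.int32(DATA_H))
assert all(a.nbytes == 4 for a in args)  # each packs as exactly 4 bytes

# Grid dimensions: the host-side launch expects plain Python ints,
# so convert any numpy integers back before passing grid=...
blockGridRows = (numpy.int32(32), numpy.int32(16))  # hypothetical grid
grid = [int(e) for e in blockGridRows]
assert all(type(e) is int for e in grid)
```

A plain Python int has no fixed byte width, which is why passing it as
a kernel argument trips the "invalid type on parameter" check.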
Cheers,
On Sat, Jun 13, 2009 at 6:46 PM, Andrew Wagner <[email protected]> wrote:
> OK, I'm completely stuck on my port of the convolution example.
>
> It seems no matter how I call the kernel it complains that my call
> doesn't match the interface. For most of the calling variations I
> tried, the types it claimed I passed in were not the types I actually
> passed in, which certainly isn't helping debugging.
>
> To be precise, on line 337 of the attached convolution.py, I'm getting:
>
> The debugged program raised the exception TypeError
> "invalid type on parameter #2 (0-based)"
> File:
> /usr/lib/python2.5/site-packages/pycuda-0.92-py2.5-linux-x86_64.egg/pycuda/driver.py,
> Line: 78
>
> The code shouldn't have any dependencies outside of pycuda and numpy,
> so it should be easy to run.
>
> Also, pycuda.Driver.Module.get_global seems to return a length 2
> tuple, while pycuda.Driver.memcpy_htod expects the reference to be an
> integer. I got past this error by pulling out the first entry of the
> tuple, which seems like the address, but I'm not sure if this is
> correct. This is for transferring the convolution kernel (the filter
> parameters, not the cuda kernel) into constant memory.
>
> I'm using pycuda 0.92 with CUDA 2.1 on Debian 5.0.1 with a kernel that
> a friend had to help me recompile to be compatible with NVIDIA's
> drivers.
>
> Any ideas?
>
> Thanks!
> Drew
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://tiker.net/mailman/listinfo/pycuda_tiker.net
>
>
--
Nicolas Pinto
Ph.D. Candidate, Brain & Computer Sciences
Massachusetts Institute of Technology, USA
http://web.mit.edu/pinto