Re: [PyCUDA] Porting nvidia's separable convolution example to pycuda: C++ templates, loop unrolling

Nicolas Pinto Sat, 13 Jun 2009 20:18:10 -0700

Andrew,

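memcpy_htod expects a uint (the device address), not a tuple. get_global
returns a (device pointer, size in bytes) tuple, so taking the first entry
as you did is correct: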


--- convolution_original.py     2009-06-13 23:12:49.000000000 -0400
+++ convolution_new.py  2009-06-13 23:16:37.000000000 -0400
@@ -324,8 +324,8 @@
     sourceImage_gpu = cuda.mem_alloc_like(sourceImage)
     intermediateImage_gpu = cuda.mem_alloc_like(sourceImage)
     cuda.memcpy_htod(sourceImage_gpu, sourceImage)
-    cuda.memcpy_htod(d_Kernel_rows,  filterx) # The kernel goes into constant memory via a symbol defined in the kernel
-    cuda.memcpy_htod(d_Kernel_columns,  filtery)
+    cuda.memcpy_htod(d_Kernel_rows[0],  filterx) # The kernel goes into constant memory via a symbol defined in the kernel
+    cuda.memcpy_htod(d_Kernel_columns[0],  filtery)
     # Call the kernels for convolution in each direction.
     blockGridRows = (iDivUp(DATA_W, ROW_TILE_W), DATA_H)
     blockGridColumns = (iDivUp(DATA_W, COLUMN_TILE_W), iDivUp(DATA_H, COLUMN_TILE_H))
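
The second entry of the tuple appears to be the size of the symbol in bytes
(68 = 17 float32 values in your debugger output), so it may be worth checking
it against the host array before copying. A minimal sketch, assuming the
(address, size) layout above; unpack_global is a hypothetical helper name, not
part of PyCUDA, and it runs without a GPU because it only prepares the
arguments:

```python
import numpy as np


def unpack_global(symbol, host_array):
    """Hypothetical helper (not a PyCUDA function).

    Takes the (device_ptr, size_in_bytes) tuple returned by
    module.get_global() and the host array to upload, checks that the
    sizes agree, and returns (device_ptr, contiguous_host_array) ready
    to be passed to cuda.memcpy_htod().
    """
    device_ptr, size_in_bytes = symbol
    host_array = np.ascontiguousarray(host_array)
    if host_array.nbytes != size_in_bytes:
        raise ValueError(
            "host array is %d bytes, device symbol is %d bytes"
            % (host_array.nbytes, size_in_bytes)
        )
    return device_ptr, host_array


# Values taken from the debugger output quoted in this thread.
d_Kernel_rows = (16778496, 68)
filterx = np.ones(17, dtype=np.float32)  # stand-in for the real filter taps

dest, src = unpack_global(d_Kernel_rows, filterx)
# With a CUDA context active, the upload would then be:
#     cuda.memcpy_htod(dest, src)
```

A size mismatch here (for example a float64 filter, which would be 136 bytes)
would otherwise write past the end of the constant array without any error.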

Best,

On Sat, Jun 13, 2009 at 10:16 PM, Andrew Wagner <[email protected]> wrote:

> On Sat, Jun 13, 2009 at 6:20 PM, Nicolas Pinto<[email protected]> wrote:
> > Andrew,
> >
> > The following patch should make it work. PyCuda kernel functions take
> > numpy.int32() whereas the grid should be int().
>
> Thanks a lot, Nicolas!  That got the kernel at least running.  I'm
> still getting garbage output, and I think it may be because my filter
> kernel (filterx) is not making it into constant memory (under the
> identifier d_Kernel_rows).
>
> >> Also, pycuda.Driver.Module.get_global seems to return a length 2
> >> tuple, while pycuda.Driver.memcpy_htod expects the reference to be an
> >> integer.  I got past this error by pulling out the first entry of the
> >> tuple, which seems like the address, but I'm not sure if this is
> >> correct.  This is for transferring the convolution kernel (the filter
> >> parameters, not the cuda kernel) into constant memory.
>
> The declaration of the constant array is in the kernel source at line
> 29 of convolution.py:
>
> __device__ __constant__ float d_Kernel_rows[KERNEL_W];
>
> I get the address for the symbol d_Kernel_rows at line 231:
>
> d_Kernel_rows = module.get_global('d_Kernel_rows')
>
> I try to upload data to the array on line 327:
>
> cuda.memcpy_htod(d_Kernel_rows,  filterx) # The kernel goes into constant memory via a symbol defined in the kernel
>
> I get the following error:
>
> The debugged program raised the exception ArgumentError
> "Python argument types in pycuda._driver.memcpy_htod(tuple,
> numpy.ndarray) did not match C++ signature: memcpy_htod(unsigned int
> dest, boost::python::api::object src, boost::python::api::object
> stream=None)"
>
> Here are some of the relevant variables from the debugger...
>
> >>> d_Kernel_rows
> (16778496, 68)
> >>> type(d_Kernel_rows[0])
> <type 'int'>
> >>> type(d_Kernel_rows[1])
> <type 'int'>
> >>> filterx
> array([ 0.01396019,  0.02230832,  0.03348875,  0.04722672,  0.06256524,
>        0.07786369,  0.09103188,  0.09997895,  0.10315263,  0.09997895,
>        0.09103188,  0.07786369,  0.06256524,  0.04722672,  0.03348875,
>        0.02230832,  0.01396019], dtype=float32)
> >>> filterx.shape
> (17,)
> >>> KERNEL_W
> 17
>
> Again, I have attached a stand-alone version of the code.
>
> Thanks!
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://tiker.net/mailman/listinfo/pycuda_tiker.net
>
>


-- 
Nicolas Pinto
Ph.D. Candidate, Brain & Computer Sciences
Massachusetts Institute of Technology, USA
http://web.mit.edu/pinto

