Andrew,
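memcpy_htod expects an unsigned integer device address, not a tuple. Module.get_global returns a (device_ptr, size_in_bytes) pair, so pass its first element: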
--- convolution_original.py 2009-06-13 23:12:49.000000000 -0400
+++ convolution_new.py 2009-06-13 23:16:37.000000000 -0400
@@ -324,8 +324,8 @@
sourceImage_gpu = cuda.mem_alloc_like(sourceImage)
intermediateImage_gpu = cuda.mem_alloc_like(sourceImage)
cuda.memcpy_htod(sourceImage_gpu, sourceImage)
- cuda.memcpy_htod(d_Kernel_rows, filterx) # The kernel goes into constant memory via a symbol defined in the kernel
- cuda.memcpy_htod(d_Kernel_columns, filtery)
+ cuda.memcpy_htod(d_Kernel_rows[0], filterx) # The kernel goes into constant memory via a symbol defined in the kernel
+ cuda.memcpy_htod(d_Kernel_columns[0], filtery)
# Call the kernels for convolution in each direction.
blockGridRows = (iDivUp(DATA_W, ROW_TILE_W), DATA_H)
blockGridColumns = (iDivUp(DATA_W, COLUMN_TILE_W), iDivUp(DATA_H, COLUMN_TILE_H))
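
In case it is useful, here is a minimal sketch of how the tuple could be unpacked and sanity-checked before the upload. It is plain Python and needs no GPU. The helper name split_global is hypothetical (it is not part of PyCUDA), the tuple values are copied from the debugger output quoted below, and an array.array of 17 floats stands in for the NumPy filterx array.

```python
import array


def split_global(symbol, host_nbytes):
    """Return the device address from an (address, size_in_bytes) pair.

    Hypothetical helper: raises ValueError when the host buffer size
    does not match the size reported for the device symbol.
    """
    address, size_in_bytes = symbol
    if host_nbytes != size_in_bytes:
        raise ValueError(
            "host buffer is %d bytes but the device symbol holds %d"
            % (host_nbytes, size_in_bytes))
    return address


# Values copied from the debugger session in the quoted message.
d_Kernel_rows = (16778496, 68)
KERNEL_W = 17

# Stand-in for filterx: 17 single-precision floats (assumed 4 bytes each).
filterx = array.array('f', [0.0] * KERNEL_W)
host_nbytes = len(filterx) * filterx.itemsize

dest = split_global(d_Kernel_rows, host_nbytes)

# With PyCUDA and the real NumPy float32 array, the upload would then be:
#     cuda.memcpy_htod(dest, filterx)
print(dest, host_nbytes)
```

The size check is only a guard against uploading a filter whose length differs from KERNEL_W. Here 17 floats of 4 bytes each give 68 bytes, which matches the second tuple entry.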
Best,
On Sat, Jun 13, 2009 at 10:16 PM, Andrew Wagner <[email protected]> wrote:
> On Sat, Jun 13, 2009 at 6:20 PM, Nicolas Pinto<[email protected]> wrote:
> > Andrew,
> >
> > The following patch should make it work. PyCuda kernel functions take
> > numpy.int32() whereas the grid should be int().
>
> Thanks a lot, Nicolas! That got the kernel at least running. I'm
> still getting garbage output, and I think it may be because my filter
> kernel (filterx) is not making it into constant memory (under the
> identifier d_Kernel_rows).
>
> >> Also, pycuda.Driver.Module.get_global seems to return a length 2
> >> tuple, while pycuda.Driver.memcpy_htod expects the reference to be an
> >> integer. I got past this error by pulling out the first entry of the
> >> tuple, which seems like the address, but I'm not sure if this is
> >> correct. This is for transferring the convolution kernel (the filter
> >> parameters, not the cuda kernel) into constant memory.
>
> The declaration of the constant array is in the kernel source at line
> 29 of convolution.py:
>
> __device__ __constant__ float d_Kernel_rows[KERNEL_W];
>
> I get the address for the symbol d_Kernel_rows at line 231:
>
> d_Kernel_rows = module.get_global('d_Kernel_rows')
>
> I try to upload data to the array on line 327:
>
> cuda.memcpy_htod(d_Kernel_rows, filterx) # The kernel goes into constant memory via a symbol defined in the kernel
>
> I get the following error:
>
> The debugged program raised the exception ArgumentError
> "Python argument types in pycuda._driver.memcpy_htod(tuple,
> numpy.ndarray) did not match C++ signature: memcpy_htod(unsigned int
> dest, boost::python::api::object src, boost::python::api::object
> stream=None)"
>
> Here are some of the relevant variables from the debugger...
>
> >>> d_Kernel_rows
> (16778496, 68)
> >>> type(d_Kernel_rows[0])
> <type 'int'>
> >>> type(d_Kernel_rows[1])
> <type 'int'>
> >>> filterx
> array([ 0.01396019, 0.02230832, 0.03348875, 0.04722672, 0.06256524,
> 0.07786369, 0.09103188, 0.09997895, 0.10315263, 0.09997895,
> 0.09103188, 0.07786369, 0.06256524, 0.04722672, 0.03348875,
> 0.02230832, 0.01396019], dtype=float32)
> >>> filterx.shape
> (17,)
> >>> KERNEL_W
> 17
>
> Again, I have attached a stand-alone version of the code.
>
> Thanks!
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://tiker.net/mailman/listinfo/pycuda_tiker.net
>
>
--
Nicolas Pinto
Ph.D. Candidate, Brain & Computer Sciences
Massachusetts Institute of Technology, USA
http://web.mit.edu/pinto