The lines:
#copy data to page locked mem
host1 = rand1
host2 = rand2

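won't copy the data to page-locked memory. They just rebind the names
host1 and host2, which now refer to the ordinary (non-page-locked)
arrays rand1 and rand2, and the page-locked buffers you just allocated
get GC'd. Asynchronous copies need page-locked host memory, which is
most likely why the stream version fails with "invalid value". Replace
them with: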
host1[:] = rand1[:]
host2[:] = rand2[:]
and it should work.
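
To make the difference concrete, here is a small sketch that runs without
a GPU. It uses the standard-library array type as a stand-in for the
buffers; the names source and pinned are made up for illustration, and
pinned only plays the role of what cuda.pagelocked_empty_like() would
return. The same distinction between rebinding a name and copying into
an existing buffer applies to numpy arrays.

```python
from array import array

# Stand-in for rand1: an ordinary buffer holding the source data.
source = array("f", [1.0, 2.0, 3.0, 4.0])

# Hypothetical stand-in for the page-locked buffer; in PyCUDA this
# would come from cuda.pagelocked_empty_like(rand1).
pinned = array("f", [0.0] * len(source))

# Rebinding: the name ends up referring to the source object.
# The originally allocated buffer is left untouched.
rebound = pinned
rebound = source
rebinding_kept_buffer = rebound is pinned

# In-place copy: slice assignment writes into the existing buffer,
# so the name still refers to the originally allocated object.
target = pinned
target[:] = source
copy_kept_buffer = target is pinned
contents_match = list(pinned) == list(source)

print(rebinding_kept_buffer, copy_kept_buffer, contents_match)
# prints: False True True
```

After plain assignment the name no longer refers to the allocated
buffer; after slice assignment it still does, and the buffer now holds
the data.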

-- 
Dan Lepage



On Wed, May 6, 2009 at 5:20 PM, James Gurtowski <[email protected]> wrote:
> Hi,
>   I'm trying to implement streams in pycuda. I created this simple program
> to test, but I'm having trouble. When I try to copy data asynchronously I
> get this error.
>
> Traceback (most recent call last):
>   File "py_stream_test.py", line 40, in <module>
>     cuda.memcpy_htod(dev1,host1,stream1)
> pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid value
>
> code:
>
>
> import pycuda.driver as cuda
> import pycuda.autoinit
> import numpy
>
> size = 2000
>
> #random numpy arrays
> rand1 = numpy.random.rand(size).astype(numpy.float32)
> rand2 = numpy.random.rand(size).astype(numpy.float32)
>
> #init page locked memory
> host1 = cuda.pagelocked_empty_like(rand1)
> host2 = cuda.pagelocked_empty_like(rand2)
>
> #copy data to page locked mem
> host1 = rand1
> host2 = rand2
> print host1
> print host2
>
> #allocate space on device
> dev1 = cuda.mem_alloc(rand1.size*rand1.dtype.itemsize)
> dev2 = cuda.mem_alloc(rand2.size*rand2.dtype.itemsize)
>
>
> mod = cuda.SourceModule("""
>   __global__ void cuda_double(float *A){
>     int idx = blockIdx.x*blockDim.x + threadIdx.x;
>     A[idx] = A[idx]*2;
>   }
> """)
>
> cuda_double = mod.get_function("cuda_double")
>
> #create two streams
> stream1=cuda.Stream()
> stream2=cuda.Stream()
>
> #copy the data
> cuda.memcpy_htod(dev1,host1,stream1)
> cuda.memcpy_htod(dev2,host2,stream2)
>
> #run kernel
> num_blocks = (size/512)+1
> cuda_double(dev1,block=(512,1,1),grid=(num_blocks,1),stream=stream1)
> cuda_double(dev2,block=(512,1,1),grid=(num_blocks,1),stream=stream2)
>
> #copy back
> cuda.memcpy_dtoh(host1,dev1,stream1)
> cuda.memcpy_dtoh(host2,dev2,stream2)
>
> print host1
> print host2
>
>
> If I remove the stream argument from the memcpy it runs fine.
> Thanks,
> James
>
>
> _______________________________________________
> PyCuda mailing list
> [email protected]
> http://tiker.net/mailman/listinfo/pycuda_tiker.net
>
>
