Hi Andreas:

Thanks for the response.  I think you are exactly right that I'm
mixing up passing a pointer to the array and passing the actual data.

I was able to get what I wanted based on this section of the
documentation:

http://documen.tician.de/pycuda/tutorial.html#structures

but I still think I'm a bit confused.

I definitely want to pass in pointers to the arrays that are part of
the struct (because I'll be changing their values in the kernel).  In
the structure example you call cuda.mem_alloc() for the memsize of the
structure, which is essentially the size of the int (plus padding for
64-bit pointers) plus the size of the pointer to the array.  Then you
allocate memory for the array itself using cuda.to_device() and assign
the address of that allocation to the pointer member in the structure.
That all makes sense, though it is certainly complicated.
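To spell that pattern out (mostly for the archives), here is a little
sketch.  The pycuda calls are in comments because I haven't run this
exact snippet, and `struct_ptr`/`A_dev_ptr` are just made-up names:

```python
import numpy as np

# Host-side array that will live *behind* the struct's pointer member:
A = np.zeros(10, dtype=np.float32)

# The struct itself is tiny: one int plus one pointer.  Padding the int
# out to pointer width (as the tutorial does) gives a safe size:
struct_size = 2 * np.intp(0).nbytes  # 16 bytes on a 64-bit build, 8 on 32-bit

# With a device present, the flow would be roughly:
#   struct_ptr = cuda.mem_alloc(struct_size)   # room for int + pointer only
#   A_dev_ptr  = cuda.to_device(A)             # separate alloc + copy of the data
#   # then write the device address of A into the struct's pointer slot:
#   cuda.memcpy_htod(int(struct_ptr) + np.intp(0).nbytes,
#                    np.intp(int(A_dev_ptr)).tostring())
```

The key point being that mem_alloc for the struct covers only the int
and the pointer, never the 40 bytes of float data.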

I think my mistake came from following the same procedure for a
structure containing an array as I would for a plain array.  Let's say
I just had an array C that I could allocate and pass to a function
like this:

import numpy as np
C = np.zeros(10,dtype=np.float32)
Cdev = cuda.mem_alloc(C.nbytes)  # note use of nbytes, not memsize of a pointer
cuda.memcpy_htod(Cdev,C)

myfunc(Cdev)  # where myfunc looks like: __global__ void myfunc(float *dat){...}

I could have skipped all of the above by using the cuda.In, cuda.Out,
or cuda.InOut argument handlers.
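For anyone finding this in the archives, that shortcut would look
something like the following.  It's an untested sketch; the kernel and
the launch line are in comments since they need a device to run:

```python
import numpy as np

# With cuda.InOut, pycuda does the mem_alloc, the host-to-device copy
# before the launch, and the device-to-host copy after the launch, all
# in one wrapper:
#
#   mod = SourceModule("""
#   __global__ void myfunc(float *dat)
#   { dat[threadIdx.x] += 1.0f; }
#   """)
#   myfunc = mod.get_function("myfunc")
#
C = np.zeros(10, dtype=np.float32)
#   myfunc(cuda.InOut(C), block=(10, 1, 1))
#
# ...after which C should hold the incremented values, i.e.:
expected = C + 1
```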

In my struct example I wanted to do the same thing for a pointer to my
struct (allocate for all the data in the struct, including the arrays,
and then copy the data to the device).  My code figured out the size
of the data in the struct, turned it all into a concatenated string
representation, and performed a memcpy to transfer it over.  Is the
reason this did not work that I'm on a 64-bit system and needed to
allocate the memory differently?  Or, once I add in the second layer
of the struct, do I have to do the mem_alloc for the pointers and the
data separately, as in the documented struct example?
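Working through the byte counts is what finally made the mismatch
click for me.  This is just my own arithmetic (assuming a 64-bit
build), not something from the docs:

```python
import numpy as np

n = 10
float_size = np.float32(0).nbytes   # 4
ptr_size = np.intp(0).nbytes        # 8 on a 64-bit system

# What my pack() shipped: the int followed by both arrays *inline*:
inline_bytes = np.uint32(n).nbytes + 2 * n * float_size   # 4 + 40 + 40 = 84

# What "struct results { unsigned int n; float *A; float *B; }" actually
# occupies on the device (64-bit): a 4-byte n, padding up to pointer
# alignment, then two 8-byte pointers:
pad = (ptr_size - 4 % ptr_size) % ptr_size
struct_bytes = 4 + pad + 2 * ptr_size                     # 24 on 64-bit

# The kernel reads offsets 8 and 16 as *addresses*; my packing put raw
# float data there, so res->A dereferenced garbage, hence the launch
# failures.  The fix is the tutorial pattern: cuda.to_device() each
# array and store the returned device pointers in those slots.
```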

Still, I think I've convinced myself that it's not actually easier to
wrap up my arrays and other information in a struct to keep things
looking clean, because doing so adds a whole bunch of code on top of
just passing a bunch of arguments to my kernel function.  Perhaps if I
actually _needed_ a struct I would use it.

Sorry for the rambling response.  Perhaps there is a little signal in
the noise that someone will find helpful sometime...

Thanks,
Per


On Sun, May 10, 2009 at 4:51 PM, Andreas Klöckner
<[email protected]> wrote:
> You're a bit confused about whether you're passing in a *pointer* to an array
> or the actual array data. The C struct says pointer to, your packing code says
> inlined array.
>
> I'd suggest checking out numpy record arrays.
>
> Andreas
>
> On Sonntag 10 Mai 2009, Per B. Sederberg wrote:
>> Hi Folks:
>>
>> I'm working on simulating a simple neural network model on the GPU.
>> In my case, I should see benefits from performing many simulations of
>> the simple model at once across threads instead of parallelizing
>> individual simulations because the neural network is so small.
>>
>> I'd like to pass a struct with arrays containing parameters and
>> initialization information for the neural network and also a place to
>> put results.  This is only to keep the code clean (otherwise I'll be
>> passing in handfuls of parameters to the kernel.)  I have had full
>> success passing in separate parameters, but have failed to pass the
>> struct, getting launch-failed errors at various stages of the process
>> (sometimes when allocating memory and sometimes when trying to read
>> it off the device).
>>
>> I've included a simplified example below.  I realize the class to
>> handle talking to the C struct is a bit crazy, but if it worked I
>> could clean it up into a more general class.
>>
>> Is there any clue as to what is wrong or is there a better way to
>> accomplish what I'm trying to do?  I'm pretty new to pycuda and cuda,
>> so I won't be offended at all if you give me drastically different
>> suggestions of what to do or if you point out a ridiculous error that
>> I'm making ;)
>>
>> Thanks,
>> Per
>>
>> PS-> I'm using a git clone of pycuda from about a week ago and version
>> 2.1 of CUDA libs on a GTX285.
>>
>> struct_test.py (also attached, but in case no attachments are allowed):
>> ------------------
>>
>> import pycuda.driver as cuda
>> import pycuda.autoinit
>> from pycuda.compiler import SourceModule
>>
>> import numpy as np
>>
>> mod = SourceModule(
>>     """
>>
>> struct results
>> {
>>   unsigned int n;
>>   float *A;
>>   float *B;
>> };
>>
>> __global__ void struct_test(results *res)
>> {
>>   unsigned int i;
>>   for (i=0; i<res->n; i++)
>>   {
>>     res->A[i] = res->B[i] + 1;
>>   }
>> }
>>
>>     """)
>>
>>
>> cu_struct = mod.get_function("struct_test")
>>
>> class Results(object):
>>     def __init__(self, n=10):
>>         self._cptr = None
>>         self.n = np.uint32(n)
>>         self.A = np.zeros(self.n,dtype=np.float32)
>>         self.B = np.ones(self.n,dtype=np.float32)
>>     def send_to_gpu(self):
>>         if self._cptr is None:
>>             self._cptr = cuda.mem_alloc(self.nbytes())
>>         cuda.memcpy_htod(self._cptr, self.pack())
>>     def get_from_gpu(self):
>>         if not self._cptr is None:
>>             tempstr = np.array([' ']*self.nbytes())
>>             cuda.memcpy_dtoh(tempstr, self._cptr)
>>             ind = np.array([0, self.n.nbytes])
>>             self.n = np.fromstring(tempstr[ind[0]:ind[1]],
>>                                    dtype=self.n.dtype).reshape(self.n.shape)
>>             ind[0] += self.n.nbytes
>>             ind[1] += self.A.nbytes
>>             self.A = np.fromstring(tempstr[ind[0]:ind[1]],
>>                                    dtype=self.A.dtype).reshape(self.A.shape)
>>             ind[0] += self.A.nbytes
>>             ind[1] += self.B.nbytes
>>             self.B = np.fromstring(tempstr[ind[0]:ind[1]],
>>                                    dtype=self.B.dtype).reshape(self.B.shape)
>>     def pack(self):
>>         return self.n.tostring() + self.A.tostring() + self.B.tostring()
>>     def nbytes(self):
>>         return self.n.nbytes + self.A.nbytes + self.B.nbytes
>>
>> res = Results(10)
>> res.send_to_gpu()
>> cu_struct(res._cptr, block=(1,1,1))
>> res.get_from_gpu()
>>
>> print res.A
>> print res.B
>> print res.n
>
>
>
> _______________________________________________
> PyCuda mailing list
> [email protected]
> http://tiker.net/mailman/listinfo/pycuda_tiker.net
>
>
