Thanks Siddharth, that paper is quite interesting.  That was a
direction I thought about going, but then I realized that instead of
having networks with large numbers of units that would benefit from
parallelization, I am currently simulating much smaller networks (of
around 100 units) that run faster on the CPU than the cost of
transferring them out to the GPU.

Still, I need to run these smaller networks many, many times, so I
hope to move the entire simulation into the kernel and run each
simulation in a separate thread.  The main thing I'm worried about is
running out of shared memory, because I have to allocate local memory
for the weight matrices in each thread.
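To put a number on that worry, here is a back-of-envelope check (a
sketch with assumed sizes: dense float32 weights for the ~100-unit
network mentioned above, and the 16 KB of per-block shared memory that
compute-capability-1.3 cards like the GTX285 provide):

```python
# Rough budget: does one simulation's weight matrix fit in shared memory?
# Sizes here are assumptions for illustration -- adjust to the real model.
n_units = 100                          # network size from the discussion
weights_bytes = n_units * n_units * 4  # dense float32 weight matrix
shared_mem_per_block = 16 * 1024       # GTX285 (compute 1.3): 16 KB/block

fits = weights_bytes <= shared_mem_per_block
# 100*100*4 = 40000 bytes (~39 KiB), so even a single weight matrix
# exceeds the 16 KB of shared memory; the matrices would have to live
# in per-thread local memory (which is off-chip) instead.
```

So under these assumed sizes the weight matrices could not go in shared
memory at all, only smaller per-unit state would.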

Best,
Per


On Sun, May 10, 2009 at 5:54 PM, Siddharth Priya
<[email protected]> wrote:
> Hi Per,
>
> Though this may be a bit off topic with regard to the implementation,
> this paper actually implemented a neural network on a GPU (using
> CUBLAS, I believe): http://www.iiit.net/techreports/2008_109.pdf
>
> Thanks,
> Siddharth
>
> On Mon, May 11, 2009 at 2:21 AM, Andreas Klöckner <[email protected]>
> wrote:
>>
>> You're a bit confused about whether you're passing in a *pointer* to
>> an array or the actual array data.  The C struct says pointer-to;
>> your packing code says inlined array.
>>
>> I'd suggest checking out numpy record arrays.
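In case it helps, here is roughly what that looks like on the host
side, using only numpy (a sketch: the pointer fields hold dummy
integer addresses here, where real code would store the address
returned by cuda.mem_alloc; it assumes 4-byte device pointers, as on a
32-bit CUDA build -- on a 64-bit build the pointer fields are 8 bytes
and the compiler pads the struct, so always compare itemsize against
sizeof(results) on the device side):

```python
import numpy as np

# One record mirrors the C struct:
#   struct results { unsigned int n; float *A; float *B; };
# Assumes 32-bit device pointers; on a 64-bit build the pointer fields
# would be uint64 and 4 bytes of padding appear after n.
results_t = np.dtype([("n", np.uint32),
                      ("A", np.uint32),   # device pointer to float[n]
                      ("B", np.uint32)])  # device pointer to float[n]

res = np.zeros(1, dtype=results_t)
res["n"] = 10
res["A"] = 0xDEADBEEF   # dummy addresses for illustration only;
res["B"] = 0xCAFEBABE   # real code stores int(cuda.mem_alloc(...)) here

# res.itemsize is 12, matching the packed 32-bit struct layout;
# the raw bytes of res are what you would memcpy_htod to the device.
```

The point is that the struct on the device holds *pointers* to two
separately allocated arrays, not the array contents themselves.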
>>
>> Andreas
>>
>> On Sunday, 10 May 2009, Per B. Sederberg wrote:
>> > Hi Folks:
>> >
>> > I'm working on simulating a simple neural network model on the GPU.
>> > In my case, I should see benefits from performing many simulations of
>> > the simple model at once across threads instead of parallelizing
>> > individual simulations because the neural network is so small.
>> >
>> > I'd like to pass a struct with arrays containing parameters and
>> > initialization information for the neural network and also a place to
>> > put results.  This is mainly to keep the code clean (otherwise I'd
>> > be passing handfuls of parameters to the kernel).  I have had full
>> > success passing in separate parameters, but have failed to pass the
>> > struct, getting "launch failed" errors at various stages of the
>> > process (sometimes when allocating memory and sometimes when trying
>> > to read it back off the device).
>> >
>> > I've included a simplified example below.  I realize the class to
>> > handle talking to the C struct is a bit crazy, but if it worked I
>> > could clean it up into a more general class.
>> >
>> > Is there any clue as to what is wrong or is there a better way to
>> > accomplish what I'm trying to do?  I'm pretty new to pycuda and cuda,
>> > so I won't be offended at all if you give me drastically different
>> > suggestions of what to do or if you point out a ridiculous error that
>> > I'm making ;)
>> >
>> > Thanks,
>> > Per
>> >
>> > PS-> I'm using a git clone of pycuda from about a week ago and version
>> > 2.1 of CUDA libs on a GTX285.
>> >
>> > struct_test.py (also attached, but in case no attachments are allowed):
>> > ------------------
>> >
>> > import pycuda.driver as cuda
>> > import pycuda.autoinit
>> > from pycuda.compiler import SourceModule
>> >
>> > import numpy as np
>> >
>> > mod = SourceModule(
>> >     """
>> >
>> > struct results
>> > {
>> >   unsigned int n;
>> >   float *A;
>> >   float *B;
>> > };
>> >
>> > __global__ void struct_test(results *res)
>> > {
>> >   unsigned int i;
>> >   for (i=0; i<res->n; i++)
>> >   {
>> >     res->A[i] = res->B[i] + 1;
>> >   }
>> > }
>> >
>> >     """)
>> >
>> >
>> > cu_struct = mod.get_function("struct_test")
>> >
>> > class Results(object):
>> >     def __init__(self, n=10):
>> >         self._cptr = None
>> >         self.n = np.uint32(n)
>> >         self.A = np.zeros(self.n,dtype=np.float32)
>> >         self.B = np.ones(self.n,dtype=np.float32)
>> >     def send_to_gpu(self):
>> >         if self._cptr is None:
>> >             self._cptr = cuda.mem_alloc(self.nbytes())
>> >         cuda.memcpy_htod(self._cptr, self.pack())
>> >     def get_from_gpu(self):
>> >         if self._cptr is not None:
>> >             tempstr = np.array([' ']*self.nbytes())
>> >             cuda.memcpy_dtoh(tempstr, self._cptr)
>> >             ind = np.array([0, self.n.nbytes])
>> >             self.n = np.fromstring(tempstr[ind[0]:ind[1]],
>> >                 dtype=self.n.dtype).reshape(self.n.shape)
>> >             ind[0] += self.n.nbytes
>> >             ind[1] += self.A.nbytes
>> >             self.A = np.fromstring(tempstr[ind[0]:ind[1]],
>> >                 dtype=self.A.dtype).reshape(self.A.shape)
>> >             ind[0] += self.A.nbytes
>> >             ind[1] += self.B.nbytes
>> >             self.B = np.fromstring(tempstr[ind[0]:ind[1]],
>> >                 dtype=self.B.dtype).reshape(self.B.shape)
>> >     def pack(self):
>> >         return self.n.tostring() + self.A.tostring() + self.B.tostring()
>> >     def nbytes(self):
>> >         return self.n.nbytes + self.A.nbytes + self.B.nbytes
>> >
>> > res = Results(10)
>> > res.send_to_gpu()
>> > cu_struct(res._cptr, block=(1,1,1))
>> > res.get_from_gpu()
>> >
>> > print res.A
>> > print res.B
>> > print res.n
>>
>>
>>
>> _______________________________________________
>> PyCuda mailing list
>> [email protected]
>> http://tiker.net/mailman/listinfo/pycuda_tiker.net
>>
>
>
