On Wed, 21 Jul 2010 10:29:09 -0400, Frédéric Bastien <[email protected]> wrote:
> CudaNdarray are object that
> we created that is similar to GPUArray but with support for stride.

First of all, I don't understand why you're developing a
stride-supporting GPUArray separately from PyCUDA. (Does it depend on
all of Theano?) I'm sure lots of people would want that
functionality--including myself! Would you be willing to contribute
this code to PyCUDA in some form?

> I changed CudaNdarray to have the member gpudata to return the pointer
> on the device.
>
> You also use the _grid and _block member of GPUArray when we call a
> pycuda function created with ElementwiseKernel. I could replicated the
> creation of those members in CudaNdarray. I don't like that idea as if
> you change the way you compute those value, I also need to do so or we
> won't have the same speed.

I don't get this--you have a number of options here. You can inherit
from ElementwiseKernel and override the invocation logic. You can make
_grid and _block computed properties of your array that actually call
PyCUDA. Or you could simply replicate _grid and _block by calling
PyCUDA's splay--without having to reimplement splay.

> I just thought of another way that would work more easily(see new
> patch attached). During the pycuda fct call if the input don't have a
> _grid and _block member, pycuda call its splay fct and use those
> value.

This inserts extra code into a heavily-traveled, somewhat
performance-relevant path in PyCUDA. Me no like.

Andreas
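
For illustration, the "computed properties" option mentioned above might
look roughly like the sketch below. None of this is actual PyCUDA or
Theano code: StridedArraySketch and fake_splay are hypothetical names,
and fake_splay is only a placeholder so that the example runs without a
GPU. In real use one would presumably pass pycuda.gpuarray.splay instead,
on the assumption that it accepts the element count and returns a
(grid, block) pair.

```python
class StridedArraySketch:
    """Hypothetical stand-in for CudaNdarray (not Theano's real class)."""

    def __init__(self, size, gpudata, splay):
        self.size = size          # number of elements
        self.gpudata = gpudata    # device pointer (unused in this sketch)
        self._splay = splay       # callable: n -> (grid, block)
        self._launch = None       # cached (grid, block) pair

    def _launch_config(self):
        # Delegate to the supplied splay so the array never
        # reimplements the grid/block computation itself.
        if self._launch is None:
            self._launch = self._splay(self.size)
        return self._launch

    @property
    def _grid(self):
        return self._launch_config()[0]

    @property
    def _block(self):
        return self._launch_config()[1]


def fake_splay(n):
    """Placeholder only; NOT PyCUDA's algorithm. Returns (grid, block)."""
    threads = 128
    blocks = max(1, (n + threads - 1) // threads)
    return (blocks, 1), (threads, 1, 1)


a = StridedArraySketch(1000, gpudata=None, splay=fake_splay)
print(a._grid, a._block)
```

Because the properties only forward to whatever splay is handed in, a
change to how PyCUDA computes those values would be picked up without
touching the array class.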
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
