Luke Pfister <[email protected]> writes:
> Is there a suggested way to do the equivalent of np.sum along a
> particular axis for a high-dimensional GPUarray?
>
> I saw that this was discussed in 2009, before GPUarrays carried
> stride information.
Hand-writing a kernel is probably still your best option. Just map the
non-reduction axes to the grid/thread-block axes, and write a for loop
to do the summation.

HTH,
Andreas
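To make the suggested mapping concrete, here is a minimal NumPy sketch of the indexing such a hand-written kernel would perform. It is an illustration, not actual GPU code: the outer `np.ndindex` loop stands in for the grid/thread-block axes (in a real CUDA kernel each output element would be owned by one thread), and the inner loop is the per-thread summation over the reduction axis. The function name `sum_axis_kernel_emulation` is invented for this example.

```python
import numpy as np

def sum_axis_kernel_emulation(a, axis):
    """Emulate a hand-written reduction kernel on the CPU.

    Each iteration of the outer loop plays the role of one GPU
    thread: it owns a single output element (one index tuple over
    the non-reduction axes) and accumulates over the reduction
    axis in a plain for loop.
    """
    out_shape = a.shape[:axis] + a.shape[axis + 1:]
    out = np.empty(out_shape, dtype=a.dtype)
    # In the real kernel, these outer indices would come from
    # blockIdx/threadIdx rather than a Python loop.
    for idx in np.ndindex(out_shape):
        acc = a.dtype.type(0)
        for k in range(a.shape[axis]):
            # Re-insert the reduction index to address the input;
            # on the GPU this is a strided offset into the array.
            acc += a[idx[:axis] + (k,) + idx[axis:]]
        out[idx] = acc
    return out
```

The result matches `np.sum(a, axis=axis)`; translating this to CUDA C is mostly a matter of replacing the outer loop with the thread indexing and computing the strided offsets by hand from the GPUArray's shape and strides.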
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
