Hey Peter,

Have you looked at pycuda/reduction.py <http://git.tiker.net/pycuda.git/blob/HEAD:/pycuda/reduction.py>?
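
Something along these lines (untested, just a sketch) should give you the
squared-difference map and the sum in a single pass with ReductionKernel,
without writing your own scan:

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.reduction import ReductionKernel

# sum of (x - y)^2 over the whole array, computed on the GPU
sqdiff_sum = ReductionKernel(numpy.float32, neutral="0",
        reduce_expr="a+b",
        map_expr="(x[i]-y[i])*(x[i]-y[i])",
        arguments="float *x, float *y")

# 'first' and 'second' are your host-side image arrays from rmse()
x_gpu = gpuarray.to_gpu(first.astype(numpy.float32).ravel())
y_gpu = gpuarray.to_gpu(second.astype(numpy.float32).ravel())
mse = sqdiff_sum(x_gpu, y_gpu).get() / x_gpu.size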

N

On Fri, Aug 21, 2009 at 9:31 PM, Peter Berrington <[email protected]> wrote:

> This may be a stupid question, but has anyone got any experience in
> implementing a parallel prefix sum with pycuda? I am trying to replace a
> portion of numpy code I had which was used to calculate the root mean square
> difference between two images. Theres a few steps in this procedure: a new
> array is created from the pairwise difference between the two source arrays.
> Each element of this new array is squared, and finally the elements of the
> new array are summed and divided by the array length in order to calculate
> the average square error between the two source images. My numpy code looks
> like this (hopefully the indentation is intact):
>
> import math
> import numpy
>
> def rmse(first, second):
>     """ Return a similarity score based on the root mean square error
>     between two image arrays (1.0 means the images are identical). """
>     assert numpy.size(first) == numpy.size(second)
>     difference = numpy.diff(numpy.dstack(
>         (first.astype(numpy.float64),
>          second.astype(numpy.float64))))
>     return 1 - math.sqrt(numpy.mean(difference**2)) / 255.0
>
>
> It's not hard to see how pycuda could be used to accomplish the first few
> steps of calculating the squared pairwise difference between two source
> arrays: I was thinking that I would upcast both image arrays to floats,
> since I would be unable to accurately square or subtract them from each
> other as 8-bit unsigned integers. Off the top of my head, the kernel might
> look something like this:
>
> mod = SourceModule("""
> __global__ void diff_and_square(float *dest, float *a, float *b)
> {
>   const int i = threadIdx.x;
>   dest[i] = a[i]*a[i] + b[i]*b[i] - (2 * a[i] * b[i]);
> }
> """)
>
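> and I guess the host side would then look roughly like this (untested):
>
> import pycuda.autoinit
> import pycuda.driver as drv
>
> diff_and_square = mod.get_function("diff_and_square")
>
> a = first.astype(numpy.float32).ravel()
> b = second.astype(numpy.float32).ravel()
> dest = numpy.empty_like(a)
> block_size = 256
> diff_and_square(drv.Out(dest), drv.In(a), drv.In(b), numpy.int32(a.size),
>                 block=(block_size, 1, 1),
>                 grid=((a.size + block_size - 1) // block_size, 1))
>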
> but I'm stuck on the last step: summation.
>
> I've read through this page on implementing parallel prefix sum with CUDA:
> http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
> It's a pretty good resource for understanding how to implement a prefix
> scan while avoiding bank conflicts and remaining work-efficient.
> Unfortunately, the source code offered for download on that page is no
> longer hosted; it seems it's been replaced with similar functionality in
> the CUDA Data Parallel Primitives (CUDPP) library. I was trying to skim
> through that code to see whether it might be possible to plug it in, but it
> seems quite complicated, using a plan manager class to control the
> behaviour of the prefix scan in a very generic way.
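>
> Since all I really need here is the total (not a full scan), I suppose a
> plain tree-style sum reduction in shared memory might be enough. Roughly
> (untested, assuming a block of 256 threads; each block writes one partial
> sum, which would then have to be reduced again or just summed on the CPU):
>
> reduce_mod = SourceModule("""
> __global__ void block_sum(const float *in, float *out, int n)
> {
>     __shared__ float sdata[256];   // one entry per thread (block of 256)
>     const int tid = threadIdx.x;
>     const int i = blockIdx.x * blockDim.x + threadIdx.x;
>     sdata[tid] = (i < n) ? in[i] : 0.0f;
>     __syncthreads();
>     // fold the block's values pairwise until sdata[0] holds the block sum
>     for (int s = blockDim.x / 2; s > 0; s >>= 1)
>     {
>         if (tid < s)
>             sdata[tid] += sdata[tid + s];
>         __syncthreads();
>     }
>     if (tid == 0)
>         out[blockIdx.x] = sdata[0];  // one partial sum per block
> }
> """)
>
> but I don't know if that is really the idiomatic pycuda way to do it.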
>
>  I was hoping someone might be able to offer some advice on the most
> painless way to accomplish this with pycuda. Thanks in advance!
>


-- 
Nicolas Pinto
Ph.D. Candidate, Brain & Computer Sciences
Massachusetts Institute of Technology, USA
http://web.mit.edu/pinto
_______________________________________________
PyCUDA mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
