Hello!

I'm a bit of a pycuda newb, and today I dove into the source for the first time.

Essentially, I was trying to implement a numpy-style argmax function
using a custom reduction kernel:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html
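For reference, here's the plain numpy behaviour I'm trying to replicate on the GPU (just an illustration, not part of my patch):

```python
import numpy

x = numpy.array([3.0, 7.0, 1.0, 5.0, 2.0], dtype=numpy.float32)

# numpy.argmax returns the index of the maximum element.
print(numpy.argmax(x))  # -> 1
```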

I thought this would be a pretty simple procedure, and the code I tried was:

                maxloc_reduction_k = ReductionKernel(
                        numpy.int32,
                        arguments="float *x",
                        neutral="0",
                        map_expr="i",
                        reduce_expr="(x[(int)a] > x[(int)b]) ? (int)a : (int)b")

What I discovered was that pycuda really didn't appreciate me using
the specified arguments (in this case, "float *x") inside reduce_expr:
as far as I can tell, the generated second-stage kernel never receives
them, so "x" is undefined there.
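To make the intent concrete, here's the same index-based reduction written in plain Python (purely for illustration -- the GPU version would do the pairwise combines in parallel):

```python
from functools import reduce
import numpy

x = numpy.array([3.0, 7.0, 1.0, 5.0, 2.0], dtype=numpy.float32)

# map_expr "i": each element is mapped to its own index.
indices = range(len(x))

# reduce_expr: combine two indices by keeping the one whose
# corresponding value in x is larger -- this is why reduce_expr
# needs access to x itself.
argmax = reduce(lambda a, b: a if x[a] > x[b] else b, indices)
print(argmax)  # -> 1
```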

Looking at reduction.py (in the latest release code, 0.94.2), I
couldn't see any reason why this shouldn't be possible. I added an
extra parameter to the reduction kernel's constructor ("hack") that
passes my original arguments through to both _stage1 and _stage2.
That let me do this:

                maxloc_reduction_k = ReductionKernel(
                        numpy.int32,
                        hack=True,
                        arguments="float *x",
                        neutral="0",
                        map_expr="i",
                        reduce_expr="(x[(int)a] > x[(int)b]) ? (int)a : (int)b")

This way, my hacked changes only affect reduction kernels that
explicitly opt in; everything else behaves exactly as before.

I'm not totally clear on how to create a patch, so hopefully I didn't
screw it up. I also don't know whether attachments are allowed on this
mailing list, so I used pastebin. Here are my changes:
http://pastebin.com/R5i5JveM

Just in case I messed up creating a patch, here's my full modified
version of reduction.py: http://pastebin.com/WExgBTQ9

Is there a better way to be doing this?

I considered using the current dev build of pycuda to get access to
float2, storing my array value in .x and my index in .y, but the
reduction code hasn't been updated to handle float2s properly (you
can't just assign a float2 to another volatile float2 -- you have to
copy the .x and .y fields individually).
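For what it's worth, the packing idea I had in mind looks something like this in plain Python -- each element carries its own value and index, so the reduce step never needs to touch the original array:

```python
from functools import reduce
import numpy

x = numpy.array([3.0, 7.0, 1.0, 5.0, 2.0], dtype=numpy.float32)

# map step: pack each element as a (value, index) pair -- what a
# float2 with the value in .x and the index in .y would hold on the GPU.
pairs = [(v, i) for i, v in enumerate(x)]

# reduce step: keep the pair with the larger value, then read off
# its index at the end.
best = reduce(lambda a, b: a if a[0] > b[0] else b, pairs)
print(best[1])  # -> 1
```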

If this really is missing functionality, I'll try to do the work to
implement it in a non-hackish way.

Thanks,

Ryan Marcus

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
