Forgot to send to the list.
________________________________________
From: Ian Cullinan
Sent: Tuesday, 3 February 2009 11:23 AM
To: Dan Goodman
Subject: RE: [PyCuda] global memory?

I dunno about the allocation. Unless you can do the filter in place and then 
reshape the array to the right size, I think you're going to have to do it 
twice: the first time to find out how much space you need, then again to 
store the result.

But for the actual filtering, since you're expecting very few results you 
should be able to get good performance by just partitioning the input among 
lots of threads, storing an index in global memory and using atomicInc whenever 
you get a match.
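
A minimal sketch of that idea in PyCUDA might look like the following. It
assumes x is a float32 GPUArray and that a CUDA context already exists (for
example via `import pycuda.autoinit`). The names `threshold_indices`,
`gpu_where_greater` and `capacity` are made up for illustration, and it uses
one thread per element with a fixed-size output buffer instead of a counting
pass. The atomicInc limit of 0xffffffff makes it act as a plain increment.

```python
KERNEL_SOURCE = r"""
__global__ void threshold_indices(const float *x, float x0, int n,
                                  unsigned int *count, int *out,
                                  unsigned int capacity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && x[i] > x0) {
        // Reserve the next output slot; atomicInc returns the old counter.
        unsigned int slot = atomicInc(count, 0xffffffffu);
        if (slot < capacity)
            out[slot] = i;
    }
}
"""


def gpu_where_greater(x_gpu, x0, capacity=1024):
    """Return the sorted indices i with x_gpu[i] > x0 as a NumPy array."""
    # Imported here so the kernel source can be read without a GPU present.
    import numpy as np
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    n = x_gpu.size
    if n == 0:
        return np.empty(0, dtype=np.int32)

    # Real code would compile once and reuse the function between calls.
    kernel = SourceModule(KERNEL_SOURCE).get_function("threshold_indices")

    count = gpuarray.zeros(1, dtype=np.uint32)      # match counter
    out = gpuarray.empty(capacity, dtype=np.int32)  # matched indices

    threads = 256
    blocks = (n + threads - 1) // threads
    kernel(x_gpu.gpudata, np.float32(x0), np.int32(n),
           count.gpudata, out.gpudata, np.uint32(capacity),
           block=(threads, 1, 1), grid=(blocks, 1))

    found = int(count.get()[0])
    if found > capacity:
        raise RuntimeError("more matches than capacity; rerun with a larger buffer")
    # Threads finish in no particular order, so sort to match numpy's where().
    return np.sort(out.get()[:found])
```

Only the counter and the handful of matching indices are copied back to the
host, so x itself never leaves the GPU.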

Cheers,

Ian
________________________________________
From: [email protected] [[email protected]] On Behalf Of Dan 
Goodman [[email protected]]
Sent: Tuesday, 3 February 2009 4:41 AM
To: [email protected]
Subject: [PyCuda] global memory?

Hi all,

I hope this isn't a stupid question for this list; I've only just
started with CUDA programming.

What I want to do is implement the numpy operation J=where(x>x0) for a
gpu array x and a fixed constant x0. I want J to be computed on the GPU
(so that x doesn't have to be copied from the GPU to the CPU) but then
to be copied to the CPU. How would I go about doing this?
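
For reference, this is the CPU behaviour being asked for, shown on a small
made-up array (the values are purely illustrative):

```python
import numpy as np

x = np.array([0.1, 0.9, 0.3, 0.7, 0.2])
x0 = 0.5

# where() on a condition returns one index array per dimension,
# so a 1-D input gives a 1-tuple.
(J,) = np.where(x > x0)
print(J)  # [1 3]
```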

I was thinking about using the global memory space of the GPU basically,
and just using a single thread on the GPU to do the thresholding
operation. This isn't a very efficient way to use the GPU but I don't
see how I can do it in a parallel way. The thresholding operation is
performed many, many times with the array x updated (by the GPU) in
between, but each individual thresholding operation is only expected to
return an array J with a handful of values. For example, x might be an
array of 30,000 elements, and J might be say 5-20 elements.

So my question is basically: how can I allocate space in global memory
using PyCuda, and then copy from this space? I couldn't work out how to
do this from the docs (or even if it's possible).
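
A minimal sketch of the allocate-and-copy part, assuming the PyCUDA driver
interface (`mem_alloc`, `memcpy_htod`, `memcpy_dtoh`) and a CUDA-capable
device. The function name `roundtrip_through_global_memory` is hypothetical
and only illustrates the calls:

```python
def roundtrip_through_global_memory(host_array):
    """Copy a NumPy array into GPU global memory and read it back."""
    # Imported here so the sketch can be loaded on a machine without CUDA.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as cuda

    host_array = np.ascontiguousarray(host_array)

    # Allocate a block of global memory large enough for the array.
    device_buffer = cuda.mem_alloc(host_array.nbytes)

    # Host -> device copy.
    cuda.memcpy_htod(device_buffer, host_array)

    # Device -> host copy into a fresh array of the same shape and dtype.
    result = np.empty_like(host_array)
    cuda.memcpy_dtoh(result, device_buffer)
    return result
```

A kernel can take `device_buffer` directly as a pointer argument, so the
same buffer can hold results that are later copied back with `memcpy_dtoh`.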

Of course if anyone has another idea for a parallel way to do my
thresholding operation that would also be great! :-)

Thanks in advance for any help,
Dan

_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
