On Sun, 25 Dec 2011 06:28:02 -0800, Lewis Anderson <[email protected]> wrote:

> Hello,
>
> I'm working on a neural network-based project that relies heavily on
> array operations. I have moved from NumPy to PyOpenCL to speed up
> these operations, and gotten great results (a 3.7x speedup on my
> laptop). I'm looking forward to even better results when I move to a
> better graphics card. However, to get optimal performance I want to
> handle asynchronous behavior properly, but I am not sure how to do
> that when using the built-in Array functions (+, -, /, abs(), fill(),
> etc.).
>
> I have defined several custom kernels and used them successfully,
> along with some more primitive operations. These custom kernels all
> return event objects, which I can then use with the wait_for argument
> to synchronize execution. However, it seems that the only way to do
> this with the built-in functions is by using queue.finish(), since
> they do not return event objects. Is there a more sophisticated way
> to do so?
That depends on a few things. First of all, I would not recommend using PyOpenCL arrays with out-of-order queues (though I also know of no GPU CL implementation that actually supports such queues). If you have an in-order queue, you can simply call pyopencl.enqueue_marker() to get an event that will trigger once all previously enqueued array operations finish. If you're using 2011.2, enqueue_marker also takes a wait_for argument (on older versions you'd use the now-deprecated enqueue_wait_for_events).

Does that fit the bill?

Andreas
_______________________________________________ PyOpenCL mailing list [email protected] http://lists.tiker.net/listinfo/pyopencl
