Hi all,

Now that we have a flexible scan, a lot of things become quite easy:

http://documen.tician.de/pyopencl/array.html#sorting

:)

Performance isn't a dream yet, but I've also done exactly zero
tuning. It manages 34 MKeys/s on Fermi and 42 MKeys/s on Tahiti. For
comparison, numpy does about 10 MKeys/s on a CPU with a decent memory
system. The CL code on the CPU achieves about 10 MKeys/s on 4+ cores,
with the AMD implementation being 50% faster than Intel. (All this is on
32-bit integers.) If you've got some time to help tune this... :P
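
For anyone who wants a CPU baseline of their own to compare against, here is a
rough sketch of how a MKeys/s figure for numpy could be measured. The function
name, array size, and repeat count are my own assumptions for illustration,
not the setup behind the numbers above.

```python
import time

import numpy as np


def numpy_sort_mkeys_per_s(n=1_000_000, repeats=3, seed=0):
    """Return the best observed numpy sort throughput in millions of keys/s.

    Hypothetical helper: n, repeats and seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    # 32-bit integer keys, matching the key type mentioned in the post.
    keys = rng.integers(0, 2**31 - 1, size=n, dtype=np.int32)

    best = float("inf")
    for _ in range(repeats):
        data = keys.copy()  # sort a fresh unsorted copy each time
        start = time.perf_counter()
        data.sort()
        best = min(best, time.perf_counter() - start)

    return n / best / 1e6


if __name__ == "__main__":
    print("numpy sort: %.1f MKeys/s" % numpy_sort_mkeys_per_s())
```

Taking the best of a few runs reduces noise from other load on the machine;
the result will still vary a lot with array size and memory system.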

But the really good news here is that a) this was pretty easy to put
together on top of the existing scan primitive, and b) it actually
yields code that works on quite a few CL implementations.
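
To illustrate why a scan makes sorting easy, here is a minimal CPU-side sketch
of a one-bit-at-a-time radix sort built on an exclusive prefix sum, written in
plain numpy. This is not the PyOpenCL implementation; the function names are
made up for illustration, and it assumes unsigned 32-bit keys so the sign bit
needs no special handling.

```python
import numpy as np


def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] = sum(flags[:i])."""
    out = np.zeros(len(flags), dtype=np.int64)
    np.cumsum(flags[:-1], out=out[1:])
    return out


def radix_sort_u32(keys, key_bits=32):
    """Stable LSD radix sort, one bit per pass (illustrative sketch only)."""
    keys = np.asarray(keys, dtype=np.uint32).copy()

    for bit in range(key_bits):
        # 1 where the current bit is set, 0 otherwise.
        is_one = ((keys >> np.uint32(bit)) & np.uint32(1)).astype(np.int64)
        is_zero = 1 - is_one

        # The scan gives each key its rank among the keys with the same bit.
        zero_pos = exclusive_scan(is_zero)
        num_zeros = int(is_zero.sum())
        one_pos = num_zeros + exclusive_scan(is_one)

        # Keys with a 0 bit go first, keys with a 1 bit follow; the order
        # within each group is preserved, which keeps the sort stable.
        dest = np.where(is_one == 1, one_pos, zero_pos)
        out = np.empty_like(keys)
        out[dest] = keys
        keys = out

    return keys


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 2**32, size=1000, dtype=np.uint32)
    assert np.array_equal(radix_sort_u32(data), np.sort(data))
```

Each pass is just a scan followed by a scatter, both of which map well to a
parallel device, so a flexible scan primitive does most of the work.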

Hope you're finding this as exciting as I am. :)

Andreas


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
