Device emulation mode isn't that fast anyway; it spawns a bunch of pthreads, which is quite slow.
For relevant speed comparisons, you might look at MCUDA. I'm not sure whether any source is available, though. Cython with the NumPy buffer interface is a pretty fast way to implement efficient host-side C algorithms.

Regards,
Nicholas
