Hi all, I am wondering whether anyone has written a class that automatically selects a suitable thread block dimension given a function, nrow and ncol. I know that OccupancyRecord lets me determine the occupancy for a given number of threads, but it does not appear to solve the inverse problem: finding the block size that gives the best occupancy.
While I know there is more to performance than occupancy alone, the two do often correlate. Regards, Freddie.
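
To make the inverse problem concrete, here is a minimal sketch of one way it could be attacked by brute force: enumerate candidate block shapes, score each by occupancy, and break ties by how many threads fall outside the nrow x ncol domain. The names `candidate_blocks`, `pick_block` and `occupancy_of` are hypothetical and not part of PyCUDA, and the defaults of 512 threads per block and a warp size of 32 are assumptions that depend on the device. The occupancy lookup is passed in as a callable so the search itself does not need a GPU.

```python
def candidate_blocks(max_threads=512, warp_size=32):
    """Yield power-of-two (bx, by) shapes whose thread count is a warp multiple.

    max_threads and warp_size are assumed defaults; query the device for
    the real limits.
    """
    bx = 1
    while bx <= max_threads:
        by = 1
        while bx * by <= max_threads:
            if (bx * by) % warp_size == 0:
                yield bx, by
            by *= 2
        bx *= 2


def pick_block(occupancy_of, nrow, ncol, max_threads=512, warp_size=32):
    """Return the (bx, by) shape with the best score for an nrow x ncol domain.

    occupancy_of is a hypothetical callable mapping threads-per-block to an
    occupancy value. Ties are broken by least padding, then by the widest bx.
    """
    def score(shape):
        bx, by = shape
        # Threads launched once the grid is rounded up to whole blocks.
        covered = (-(-ncol // bx) * bx) * (-(-nrow // by) * by)
        waste = covered - nrow * ncol
        return (occupancy_of(bx * by), -waste, bx)

    return max(candidate_blocks(max_threads, warp_size), key=score)
```

With PyCUDA, `occupancy_of` could presumably wrap OccupancyRecord, built from the device data plus the function's register and shared memory usage, and return its occupancy for each thread count. That wiring is an assumption and is left out of the sketch. Preferring a wide bx is only a heuristic for coalesced access; as noted above, occupancy is not the whole story.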
_______________________________________________ PyCUDA mailing list [email protected] http://lists.tiker.net/listinfo/pycuda
