I am trying to learn from the PyCUDA example at http://wiki.tiker.net/PyCuda/Examples/MatrixmulSimple, and it works so far, but only for small matrices. When I increase the matrix size, the CPU and GPU results diverge by as much as 5.9e+01.
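An absolute difference of 5.9e+01 is hard to judge without knowing how large the entries are, so it can help to also look at the relative error. Rounding differences between float32 GPU arithmetic and the CPU result usually show up as small relative errors, while elements the kernel never computed (or computed wrongly) tend to show large ones. The sketch below is a minimal stdlib-only version; `max_errors` is a hypothetical helper name, and it assumes both results are passed as lists of rows (e.g. via `.tolist()` on the numpy arrays). `numpy.allclose` is the usual shortcut if you only need a pass/fail answer.

```python
def max_errors(cpu, gpu):
    """Return (max absolute difference, max relative difference) between
    two equally shaped matrices given as lists of rows."""
    max_abs = 0.0
    max_rel = 0.0
    for cpu_row, gpu_row in zip(cpu, gpu):
        for c, g in zip(cpu_row, gpu_row):
            diff = abs(c - g)
            max_abs = max(max_abs, diff)
            # Scale by the larger magnitude; skip pairs where both are 0
            # to avoid dividing by zero.
            scale = max(abs(c), abs(g))
            if scale > 0.0:
                max_rel = max(max_rel, diff / scale)
    return max_abs, max_rel


# A small rounding-style difference vs. an element that was never written.
print(max_errors([[2500.0, 1.0]], [[2500.1, 1.0]]))
print(max_errors([[59.0, 2.0]], [[0.0, 2.0]]))
```

In the second call the relative error is 1.0, which points to a missing or wrong result rather than float32 rounding.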
I suspect this is due to the block and grid parameters I need to pass to matrixmul(). Is that correct? How can I pick good values for them? Or is there something else I should be considering? My matrix size is 10000x3.
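For context on the block/grid question: a single CUDA thread block is capped at 1024 threads on compute capability 2.x and later (512 on 1.x devices). If the simple example launches one block of MATRIX_SIZE x MATRIX_SIZE threads, as I believe it does, it can only work while MATRIX_SIZE stays around 32 or below. Larger or non-square matrices need a grid of smaller blocks, and the kernel then has to compute its row and column from `blockIdx`, `blockDim` and `threadIdx` and skip out-of-range threads. The sketch below only computes a launch configuration; `launch_config` and the 16x16 default tile are assumptions (a common starting point, not a guaranteed optimum), and the kernel changes themselves are not shown.

```python
# Assumed limit: 1024 threads per block (compute capability 2.x+; 512 on 1.x).
# The real value can be queried with
#   pycuda.driver.Device(0).get_attribute(
#       pycuda.driver.device_attribute.MAX_THREADS_PER_BLOCK)
MAX_THREADS_PER_BLOCK = 1024


def launch_config(rows, cols, tile=16):
    """Return (block, grid) covering a rows x cols output matrix with
    tile x tile thread blocks. x indexes columns, y indexes rows."""
    if tile * tile > MAX_THREADS_PER_BLOCK:
        raise ValueError("tile * tile exceeds the threads-per-block limit")
    block = (tile, tile, 1)
    # Ceiling division so partial tiles at the edges are still covered.
    # The kernel must then check row < rows && col < cols, where
    #   row = blockIdx.y * blockDim.y + threadIdx.y
    #   col = blockIdx.x * blockDim.x + threadIdx.x
    grid = ((cols + tile - 1) // tile, (rows + tile - 1) // tile)
    return block, grid


print(launch_config(10000, 3))
```

If the output matrix were 10000x3, this gives `block=(16, 16, 1)` and `grid=(1, 625)`, which would be passed as `matrixmul(a_gpu, b_gpu, c_gpu, block=block, grid=grid)`. With only 3 columns most threads in each 16x16 block sit idle, so a narrower block shape may be worth timing; the best choice is usually found by measuring on the target GPU.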
