Hi Lev,

I started using NVBLAS, which is an implementation of BLAS for GPUs. So far it's OK (I think), but I don't see all of my GPUs being utilized when I call numpy.dot(A, B). I will play around with it more to get a better idea. I want to avoid writing my own matrix multiplication method.
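For the CPU-versus-GPU check discussed in the quoted message below, here is a minimal sketch of the tolerance-based comparison: compute a double-precision reference on the CPU and compare a single-precision product against it with allclose(). The helper name check_matmul, the dot_fn hook, the matrix size, and the tolerances are all my own assumptions for illustration, not anything prescribed by NVBLAS or PyCUDA; the tolerances in particular would need tuning for real matrix sizes.

```python
import numpy as np

def check_matmul(dot_fn, n=128, rtol=1e-3, atol=1e-3, seed=0):
    """Compare dot_fn(a, b) on float32 inputs against a float64 reference.

    dot_fn is a hypothetical hook: any callable that multiplies two
    matrices (e.g. numpy.dot, or a GPU-backed routine).
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)).astype(np.float32)
    b = rng.standard_normal((n, n)).astype(np.float32)

    # Reference result computed in double precision on the CPU.
    reference = np.dot(a.astype(np.float64), b.astype(np.float64))

    # Result under test, computed from the single-precision inputs.
    result = dot_fn(a, b)

    # Elementwise test: |result - reference| <= atol + rtol * |reference|
    return np.allclose(result, reference, rtol=rtol, atol=atol)

if __name__ == "__main__":
    print(check_matmul(np.dot))
```

Passing a different callable as dot_fn lets the same check run against whichever multiplication routine is being tested, without changing the reference computation.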

On Mon, Nov 9, 2015 at 12:26 AM, Lev Givon wrote:
> Received from Keith Brown on Sun, Nov 08, 2015 at 11:46:47PM EST:
>> Thanks Lev.
>> My matrix size is going to be large, somewhere near n=100000.
>
> (I assume n = total number of elements in the matrix; a matrix of size
> 10**5 x 10**5 32-bit floating point values would require more memory than
> currently available GPUs can provide.)
>
>> So, how can I test between CPU and GPU matrix math? I thought my
>> technique was good enough but apparently not.
>
> If you are trying to ensure that the CPU and GPU are doing as similar
> floating point computations as possible, you may want to look into whether
> the intrinsic single precision functions that CUDA provides to enable
> control of rounding during addition and multiplication (e.g., __fadd_rd,
> __fadd_rn, etc.) may be useful, as well as compiler options that affect
> processing of denormals (e.g., --ftz). For the purposes of checking
> algorithmic correctness against an existing (CPU-based) implementation, you
> may want to use double precision (even if you plan to use single precision
> for your actual computations). In general, though, it is prudent to test
> results (via allclose()) with some defined tolerance in light of the
> effects of floating point operations.
> --
> Lev Givon
> Bionet Group | Neurokernel Project
> http://lebedov.github.io/
> http://neurokernel.github.io/
>
_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda
