Received from Keith Brown on Sun, Nov 08, 2015 at 11:46:47PM EST:
> Thanks Lev.
> My matrix size is going to be large, somewhere near n=100000.

(I assume n refers to the number of rows/columns of the matrix; a
10**5 x 10**5 matrix of 32-bit floating point values would require
about 40 GB, which is more memory than currently available GPUs
provide.)
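For a rough sense of scale, here is a quick back-of-envelope check
(assuming n is the dimension of a square single precision matrix):

    import numpy as np

    n = 10**5
    nbytes = n * n * np.dtype(np.float32).itemsize
    print(nbytes / 1e9)    # -> 40.0 (GB)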
> So, how can I test between CPU and GPU matrix math? I thought my
> technique was good enough but apparently not.

If you are trying to ensure that the CPU and GPU perform floating
point computations that are as similar as possible, you may want to
look into the single precision intrinsics that CUDA provides for
controlling rounding during addition and multiplication (e.g.,
__fadd_rd, __fadd_rn, etc.), as well as compiler options that affect
the handling of denormals (e.g., nvcc's --ftz=true).

For the purposes of checking algorithmic correctness against an
existing (CPU-based) implementation, you may want to use double
precision (even if you plan to use single precision for your actual
computations).

In general, though, it is prudent to compare results (e.g., via
numpy.allclose()) with some defined tolerance in light of the effects
of floating point operations.
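For instance, something along the following lines (an untested
sketch; the axpy kernel, launch configuration, and tolerances are
only illustrative) combines the rounding intrinsics with a
tolerance-based comparison against numpy:

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void axpy(float *y, const float *x, float a, int n)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n)
            /* __fmul_rn/__fadd_rn force IEEE round-to-nearest and
               keep the compiler from contracting the two operations
               into an FMA: */
            y[i] = __fadd_rn(__fmul_rn(a, x[i]), y[i]);
    }
    """)
    axpy = mod.get_function("axpy")

    n = 1024
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    a = np.float32(2.0)

    x_gpu = gpuarray.to_gpu(x)
    y_gpu = gpuarray.to_gpu(y)
    axpy(y_gpu, x_gpu, a, np.int32(n),
         block=(256, 1, 1), grid=((n + 255)//256, 1))

    # Compare against the CPU result with an explicit tolerance
    # rather than expecting bitwise equality:
    assert np.allclose(y_gpu.get(), a*x + y, rtol=1e-5, atol=1e-7)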
--
Lev Givon
Bionet Group | Neurokernel Project
http://lebedov.github.io/
http://neurokernel.github.io/