Received from Keith Brown on Sun, Nov 08, 2015 at 11:46:47PM EST:
> Thanks Lev.
> My matrix size is going to be large, somewhere near n=100000.

(I assume n refers to the number of rows/columns of the matrix; a
10**5 x 10**5 matrix of 32-bit floating point values would require
about 40 GB, which is more memory than currently available GPUs
provide.)
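For a rough sense of scale, here is a quick back-of-envelope check
(assuming n is the dimension of a square single precision matrix):

    import numpy as np

    n = 10**5
    nbytes = n * n * np.dtype(np.float32).itemsize
    print(nbytes / 1e9)    # -> 40.0 (GB)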
> So, how can I test between CPU and GPU matrix math? I thought my
> technique was good enough but apparently not.

If you are trying to ensure that the CPU and GPU perform floating
point computations that are as similar as possible, you may want to
look into the single precision intrinsics that CUDA provides for
controlling rounding during addition and multiplication (e.g.,
__fadd_rd, __fadd_rn, etc.), as well as compiler options that affect
the handling of denormals (e.g., nvcc's --ftz=true).

For the purposes of checking algorithmic correctness against an
existing (CPU-based) implementation, you may want to use double
precision (even if you plan to use single precision for your actual
computations).

In general, though, it is prudent to compare results (e.g., via
numpy.allclose()) with some defined tolerance in light of the effects
of floating point operations.
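For instance, something along the following lines (an untested
sketch; the axpy kernel, launch configuration, and tolerances are
only illustrative) combines the rounding intrinsics with a
tolerance-based comparison against numpy:

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void axpy(float *y, const float *x, float a, int n)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n)
            /* __fmul_rn/__fadd_rn force IEEE round-to-nearest and
               keep the compiler from contracting the two operations
               into an FMA: */
            y[i] = __fadd_rn(__fmul_rn(a, x[i]), y[i]);
    }
    """)
    axpy = mod.get_function("axpy")

    n = 1024
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    a = np.float32(2.0)

    x_gpu = gpuarray.to_gpu(x)
    y_gpu = gpuarray.to_gpu(y)
    axpy(y_gpu, x_gpu, a, np.int32(n),
         block=(256, 1, 1), grid=((n + 255)//256, 1))

    # Compare against the CPU result with an explicit tolerance
    # rather than expecting bitwise equality:
    assert np.allclose(y_gpu.get(), a*x + y, rtol=1e-5, atol=1e-7)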
--
Lev Givon
Bionet Group | Neurokernel Project
http://lebedov.github.io/
http://neurokernel.github.io/