I modified add_dot() to use cublasxt.cublasXtSgemm. I don't think I need to modify dot() because it calls add_dot() at the end; it doesn't call cublasxt.cublasXtSgemm directly unless my matrix is 1-D (which it isn't). Correct?
BTW, smaller matrices work fine; it's just the larger matrices that fail.

On Mon, Nov 23, 2015 at 11:35 AM, Lev Givon <[email protected]> wrote:
> Received from Keith Brown on Mon, Nov 23, 2015 at 11:10:45AM EST:
>> I have two small matrices (160080, 3) of type float32 and I am
>> calculating their dot product. While doing this, I keep getting
>> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory.
>>
>> I have 2 cards, each with 3GB of memory. Each matrix takes about 1875
>> kilobytes. I am not sure why this is occurring.
>>
>> x = np.ones((160080, 3L)).astype(np.float32)
>> a_gpu = gpuarray.to_gpu(x)
>> b_gpu = gpuarray.to_gpu(x)
>> c_gpu = linalg.dot(a_gpu, b_gpu, 'N', 'T', handle=handle)
>>
>> My handle is a cublasxt handle (not regular cublas, since cublasXt apparently
>> does better memory handling).
>>
>> Any idea what is going on?
>
> Did you also modify skcuda.linalg.dot() to explicitly call the cublasXt*gemm
> functions rather than the stock cublas*gemm functions? The cublasXt*gemm
> functions expect host memory pointers as their arguments, not GPU memory
> pointers.
> --
> Lev Givon
> Bionet Group | Neurokernel Project
> http://lebedov.github.io/
> http://neurokernel.github.io/
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
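As a side note, one possible source of the out-of-memory error worth ruling out, independent of which gemm variant is called: while each (160080, 3) float32 input is indeed only about 1875 KB, the 'N', 'T' product a.dot(b.T) has shape (160080, 160080), which is far larger than either input. A quick back-of-the-envelope check (plain Python, no GPU needed):

```python
# Memory estimate for linalg.dot(a_gpu, b_gpu, 'N', 'T')
# with a and b both of shape (160080, 3), dtype float32 (4 bytes/element).

rows, cols = 160080, 3
itemsize = 4  # bytes per float32 element

# Each input matrix: matches the ~1875 KB figure from the thread.
input_bytes = rows * cols * itemsize
print("input:  %.1f KB" % (input_bytes / 1024.0))

# The 'N', 'T' product a.dot(b.T) has shape (160080, 160080).
output_bytes = rows * rows * itemsize
print("output: %.1f GB" % (output_bytes / 1024.0 ** 3))
```

If the output really needs to be the full (160080, 160080) matrix, it will not fit on a 3 GB card (or in most hosts' RAM) regardless of whether cublas or cublasXt does the allocation.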
