Does anyone have any thoughts? Is this feasible?
On Mon, Nov 23, 2015 at 3:14 PM, Keith Brown <[email protected]> wrote:
> Thanks all for the replies.
>
> My goal is simple. At least, I thought it was simple :-)
>
> I have a function where I calculate the dot product:
>
> def F(a, b):
>     return np.dot(a.T, b)
>
> I need to do this 8k times. The max size of 'a' and 'b' is (3 million, 1).
>
> For smaller sizes of a and b, linalg.dot works great. But I want a more
> efficient way using the GPU.
>
> Perhaps the GPU isn't the way to go since the memory is too large?
>
> On Mon, Nov 23, 2015 at 2:26 PM, Stanley Seibert <[email protected]> wrote:
>> From the cuBLAS-XT description:
>>
>> (https://developer.nvidia.com/cublas)
>>
>> "By using a streaming design, cuBLAS-XT efficiently manages transfers across
>> the PCI-Express bus automatically, which allows input and output data to be
>> stored on the host’s system memory. This provides out-of-core operation –
>> the size of operand data is only limited by system memory size, not by GPU
>> on-board memory size.”
>>
>> So I don’t think cuBLAS-XT can help unless you have more than 95 GB of
>> system RAM. If that is not the case, I think you have to step back and
>> think about what you need to do with this array ultimately, and where you
>> want to stage the data if you need to compute all 95 GB of it at once.
>>
>>> On Nov 23, 2015, at 12:58 PM, Keith Brown <[email protected]> wrote:
>>>
>>> Correct. My result matrix will be too large.
>>>
>>> <sigh>
>>>
>>> I would think cublasXt would take care of this for me. I thought it
>>> would do some sort of divide and conquer.
>>>
>>> Is there a way to attack this sort of problem?
>>>
>>> On Mon, Nov 23, 2015 at 11:38 AM, Jonas Bardino <[email protected]> wrote:
>>>> Ehmm, I'm not sure I understand exactly what you do, but to me it sounds
>>>> like you're trying to calculate the dot product of a 160080 x 3 matrix and
>>>> a similar one transposed, i.e. a 3 x 160080 matrix. That would give you a
>>>> 160080 x 160080 matrix result - which surely won't fit in your 3GB of GPU
>>>> memory.
>>>>
>>>> Cheers, Jonas
>>>>
>>>> On 2015-11-23 17:10, Keith Brown wrote:
>>>>> I have 2 small matrices (160080, 3) of type float32 and I am
>>>>> calculating their dot product. While doing this, I keep getting
>>>>> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory.
>>>>>
>>>>> I have 2 cards, each with 3GB of memory. Each matrix takes about 1875
>>>>> kilobytes. I am not sure why this is occurring.
>>>>>
>>>>> x = np.ones((160080, 3)).astype(np.float32)
>>>>> a_gpu = gpuarray.to_gpu(x)
>>>>> b_gpu = gpuarray.to_gpu(x)
>>>>> c_gpu = linalg.dot(a_gpu, b_gpu, 'N', 'T', handle=handle)
>>>>>
>>>>> My handle is a cublasXt handle (not regular cublas, since blasXt
>>>>> apparently does better memory handling).
>>>>>
>>>>> Any idea what is going on?
>>>>>
>>>>> _______________________________________________
>>>>> PyCUDA mailing list
>>>>> [email protected]
>>>>> http://lists.tiker.net/listinfo/pycuda
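On the divide-and-conquer question above: the 160080 x 160080 float32 result is about 95 GB (160080² × 4 bytes), so no single allocation can hold it on a 3 GB card. One standard approach is to produce the product tile by tile, since each (block, block) tile of A · Aᵀ only needs two thin row slices of A. Below is a minimal CPU-side sketch of that idea (plain NumPy, not the poster's skcuda/cublasXt setup); in the real workload each tile is what you would hand to the GPU, and you would stream tiles to disk (e.g. an `np.memmap`) or reduce them on the fly rather than assemble the full matrix as done here for demonstration.

```python
import numpy as np

def blocked_gram(a, block=4096, out=None):
    """Compute a @ a.T one (block, block) tile at a time.

    a : (n, k) array. The full (n, n) product may not fit in GPU (or
    even host) memory, so each tile is computed independently from two
    thin slices of `a`. `out` could be an np.memmap in practice; here
    we use a dense array only so the demo can verify the result.
    """
    n = a.shape[0]
    if out is None:
        out = np.empty((n, n), dtype=a.dtype)
    for i in range(0, n, block):
        for j in range(0, n, block):
            # This small GEMM is the piece that fits on the device.
            out[i:i + block, j:j + block] = a[i:i + block] @ a[j:j + block].T
    return out

# Small demonstration; the real case would use n = 160080 and a large
# block size, writing tiles out instead of keeping them all.
a = np.random.rand(100, 3).astype(np.float32)
c = blocked_gram(a, block=32)
assert np.allclose(c, a @ a.T)
```

Note that blocking only sidesteps the *allocation* problem; if the downstream computation can consume each tile immediately (a reduction, a nearest-neighbour search, etc.), the 95 GB never needs to exist anywhere at once.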

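Separately, for the stated goal of 8k dot products of (3 million, 1) vectors: each `np.dot(a.T, b)` there is a single scalar, so the batch can be expressed as one vectorized reduction instead of 8000 kernel launches or BLAS calls. A sketch, assuming the 8k vector pairs can be stacked column-wise into two (n, 8000) arrays (the posts don't say how the pairs are produced, so the layout here is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in sizes; the real case would be n = 3_000_000, m = 8000,
# which at float32 is ~96 GB per stacked array, so in practice you
# would process column chunks of A and B the same way.
n, m = 1000, 50
A = rng.standard_normal((n, m)).astype(np.float32)  # columns are the a_i
B = rng.standard_normal((n, m)).astype(np.float32)  # columns are the b_i

# dots[i] = a_i . b_i, all m products in one pass:
dots = np.einsum('ij,ij->j', A, B)

# Equivalent explicit loop, for comparison:
loop = np.array([A[:, i] @ B[:, i] for i in range(m)])
assert np.allclose(dots, loop, rtol=1e-3, atol=1e-2)
```

Since the result per pair is a scalar, the total output is tiny (8000 floats); the cost is purely streaming the input vectors, which is memory-bandwidth-bound. That favors keeping it on the CPU (or chunked host-to-device transfers) over anything that tries to stage all 8k pairs on a 3 GB card.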