Hi,

Is the MatMult function performed on the GPU? When I ran a program that
just executes this function with the options -vec_type cuda and -mat_type
seqaijcuda, I did not see any VecCUDACopyTo entry in the summary log.


Dnia 2010-12-11, sob o godzinie 11:50 -0600, Barry Smith pisze:
> To answer this you need to understand that PETSc copies vectors and matrices 
> to the GPU memory "on demand" (that is exactly when they are first needed on 
> the GPU, and not before) and once it has copied to the GPU it keeps track of 
> it and will NOT copy it down again if it is already there.
> 
>    Hence in your run below, yes, it includes the time to copy down. 
> 
>    But note that ONE multiply on the GPU is absurd; it does not make sense to 
> copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER 
> do "standalone" benchmarking where a single kernel is called by itself once; 
> the time results are useless. Always run a FULL application with 
> -log_summary; for example, in this case a full KSPSolve() that requires a 
> bunch of iterations. Then you can look at the performance of each kernel. The 
> reason to do it this way is that the numbers can be very different, and what 
> matters is what runs in APPLICATIONS, so that is what should be measured.
> 
>    If, say, you run KSP for 20 iterations, then the time to copy the matrix 
> down to the GPU is amortized over those 20 iterations and thus may be 
> acceptable. You should see the flop rate for MatMult() go up in this case.
> 
>    You may have noticed we have a log entry for VecCopyToGPU(); we will be 
> adding one for matrices as well, so you will be able to see how long the 
> copy takes. But note that the copy time is still counted in the MatMult() 
> time if the first copy of the matrix to the GPU is triggered by the MatMult. 
> You can subtract the copy time from the mult time to get the per-multiply 
> time; this would correspond to the multiply time in the limit of a single 
> copy down and many, many multiplies on the GPU.
> 
>    Barry
> 
> 
> 
> 
> On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
> 
> > Hello again,
> > 
> > I compiled one of the examples. I used a sparse matrix called 02-raefsky3.
> > I used -vec_type cuda and -mat_type seqaijcuda. 
> > 
> > When I see summary of the operations performed by program there is
> > 
> > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2100  0  0  0   2100  0  0  0   147
> > 
> > Does the MatMult time include the memory transfer for loading the
> > matrix into GPU memory, or just the computation time?
> > 
> > Thanks in advance. 
> > Kuba.
> > 
> 
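For what it's worth, Barry's amortization argument can be sketched numerically. The copy and multiply times below are invented purely for illustration (they are not measured PETSc numbers); the point is only the shape of the arithmetic:

```python
# Sketch: amortizing a one-time host-to-GPU matrix copy over many
# MatMult calls. All timing numbers here are assumed, not measured.

def effective_mult_time(copy_time, per_mult_time, n_mults):
    """Average wall time per multiply when the first call also pays
    the one-time cost of copying the matrix down to the GPU."""
    return (copy_time + n_mults * per_mult_time) / n_mults

copy_time = 1.8e-2      # seconds for the one-time copy down (assumed)
per_mult_time = 2.0e-3  # seconds per multiply on the GPU (assumed)

# A single multiply is dominated by the copy...
one_shot = effective_mult_time(copy_time, per_mult_time, 1)

# ...but over 20 KSP iterations the copy cost is spread out,
# so the apparent per-multiply time (and flop rate) improves a lot.
amortized = effective_mult_time(copy_time, per_mult_time, 20)

print(one_shot, amortized)
```

With these made-up numbers, a single multiply appears roughly 7x slower than the amortized figure, which is why a one-multiply benchmark says little about application performance.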

