To answer this you need to understand that PETSc copies vectors and matrices
to GPU memory "on demand" (that is, exactly when they are first needed on the
GPU, and not before), and once an object has been copied to the GPU, PETSc
keeps track of that and will NOT copy it down again if it is already there.
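
   To make this concrete, here is a minimal sketch (my own illustration, not
one of the PETSc examples; exact calls and option names may differ a bit
between PETSc versions, and error checking is omitted): the matrix is
assembled in host memory as usual, and with -mat_type seqaijcuda -vec_type
cuda it is the FIRST MatMult() that triggers the copy to the GPU.

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat      A;
    Vec      x, y;
    PetscInt i, n = 1000;

    PetscInitialize(&argc, &argv, NULL, NULL);

    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);                   /* picks up -mat_type seqaijcuda */
    MatSetUp(A);
    for (i = 0; i < n; i++) MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);  /* A lives only in host memory here */

    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, n);
    VecSetFromOptions(x);                   /* picks up -vec_type cuda */
    VecDuplicate(x, &y);
    VecSet(x, 1.0);

    MatMult(A, x, y);  /* first call: includes the one-time copy of A (and x) to the GPU */
    MatMult(A, x, y);  /* second call: data already on the GPU, no copy */

    MatDestroy(&A); VecDestroy(&x); VecDestroy(&y);
    PetscFinalize();
    return 0;
  }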

   Hence in your run below, yes, the MatMult() time includes the time for the
copy down to the GPU.

   But note that timing ONE multiply on the GPU is absurd; it does not make
sense to copy a matrix down to the GPU and then do ONE multiply with it. Thus
I NEVER do "standalone" benchmarking where a single kernel is called by itself
once; the timing results are useless. Always run a FULL application with
-log_summary; for example, in this case a full KSPSolve() that requires a
bunch of iterations. Then you can look at the performance of each kernel. The
reason to do it this way is that the numbers can be very different, and what
matters is how the kernels run in APPLICATIONS, so that is what should be
measured.
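
   A full-application run of the kind I mean might look like the sketch below
(again my own illustration, not one of the shipped examples; calling sequences
differ a bit between PETSc versions). Build it, then run something like

    ./app -vec_type cuda -mat_type seqaijcuda -ksp_max_it 20 -log_summary

and look at the MatMult line in the summary.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat      A;
    Vec      x, b;
    KSP      ksp;
    PetscInt i, n = 100000;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* 1-D Laplacian assembled on the host, as any application would do */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    for (i = 0; i < n; i++) {
      if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
      if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
      MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    VecCreate(PETSC_COMM_WORLD, &b);
    VecSetSizes(b, PETSC_DECIDE, n);
    VecSetFromOptions(b);
    VecDuplicate(b, &x);
    VecSet(b, 1.0);

    /* the solve does many MatMult()s, so the one-time copy of A down to the
       GPU is amortized over all the iterations */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);      /* -ksp_type, -ksp_max_it, -pc_type, ... */
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
    PetscFinalize();
    return 0;
  }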

   If, say, you run KSP with 20 iterations, then the time to copy the matrix
down to the GPU is amortized over those 20 iterations and thus may be
acceptable. You should see the flop rate for MatMult() go up in this case.

   You may have noticed we have a log entry for VecCopyToGPU(); we will be
adding one for matrices as well, so you will be able to see how long the copy
takes. But note that the copy time is still counted in the MatMult() time if
the first copy of the matrix to the GPU is triggered by the MatMult(). You can
subtract the copy time from the mult time to get the per-multiply time; this
would correspond to the multiply time in the limit of a single copy down and
many, many multiplies on the GPU.
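
   With made-up numbers (nothing below is from your log, it is just the
arithmetic), that subtraction looks like:

  #include <stdio.h>

  int main(void)
  {
    /* hypothetical -log_summary numbers, for illustration only */
    double matmult_total = 2.5e-2; /* total time spent in MatMult over the run (s)    */
    double copy_to_gpu   = 1.2e-2; /* time of the one-time matrix copy to the GPU (s) */
    int    nmults        = 20;     /* e.g. 20 KSP iterations -> 20 multiplies         */

    /* per-multiply time with the one-time copy subtracted out; this is what a
       multiply would cost in the limit of one copy down and many multiplies */
    double per_mult = (matmult_total - copy_to_gpu) / nmults;
    printf("GPU-only time per MatMult: %g s\n", per_mult);
    return 0;
  }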

   Barry




On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:

> Hello again,
> 
> I compiled one of the examples. I used a sparse matrix called 02-raefsky3.
> I used -vec_type cuda and -mat_type seqaijcuda. 
> 
> When I look at the summary of the operations performed by the program, there is
> 
> MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2100 0  0  0   2100  0  0  0   147
> 
> Does the time for performing MatMult include the memory transfer for loading
> the matrix into GPU memory, or just the exact computation time?
> 
> Thanks in advance. 
> Kuba.
> 
