Thank you very much for your answer. It helps me a lot.
On Sat, 2010-12-11 at 11:50 -0600, Barry Smith wrote:
> To answer this you need to understand that PETSc copies vectors and matrices
> to GPU memory "on demand" (that is, exactly when they are first needed on
> the GPU, and not before). Once it has copied an object to the GPU it keeps
> track of it and will NOT copy it down again if it is already there.
>
> Hence in your run below, yes it includes the copy time down.
>
> But note that ONE multiply on the GPU is absurd; it does not make sense to
> copy a matrix down to the GPU and then do ONE multiply with it. Thus I NEVER
> do "standalone" benchmarking where a single kernel is called by itself once;
> the timing results are useless. Always run a FULL application with
> -log_summary; for example, in this case, a full KSPSolve() that requires a
> bunch of iterations. Then you can look at the performance of each kernel. The
> reason to do it this way is that the numbers can be very different, and what
> matters is performance in APPLICATIONS, so that is what should be measured.
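In practice that means running the whole solver with the GPU options and the logging flag together. A sketch of such an invocation follows; the executable name `ex2` and the `-ksp_type`/`-pc_type` choices are illustrative assumptions, while the GPU and logging flags are the ones discussed in this thread:

```shell
# Run a FULL KSP solve with GPU vectors/matrices and per-kernel timing.
# "ex2" stands in for your own PETSc program; the CG/Jacobi choices are
# just placeholders.
./ex2 -vec_type cuda -mat_type seqaijcuda \
      -ksp_type cg -pc_type jacobi \
      -log_summary
```

With many KSP iterations, the one-time matrix copy is buried inside the first MatMult, and the per-kernel lines in the -log_summary output reflect amortized performance.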
>
> If, say, you run KSP with 20 iterations, then the time to copy the matrix
> down to the GPU is amortized over those 20 iterations and thus may be
> acceptable. You should see the flop rate for MatMult() go up in this case.
>
> You may have noticed we have a log entry for VecCopyToGPU(); we will be
> adding one for matrices as well, so you will be able to see how long the
> copy takes. But note that the copy time is still counted in the MatMult()
> time if the first copy of the matrix to the GPU is triggered by the MatMult.
> You can subtract the copy time from the mult time to get the per-multiply
> time; this corresponds to the multiply time in the limit of a single copy
> down and many, many multiplies on the GPU.
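The amortization argument can be made concrete with a little arithmetic; the numbers below are invented for illustration, not measured:

```python
# Hypothetical timings for illustration only (not from any real run).
copy_time = 0.018    # one-time cost of copying the matrix to the GPU (s)
mult_time = 0.0004   # time of one MatMult once the data is resident (s)
iterations = 20      # KSP iterations, each performing one MatMult

# What the log reports when the first MatMult triggers the copy:
total = copy_time + iterations * mult_time
per_mult_observed = total / iterations

# Subtracting the copy time recovers the pure per-multiply time, i.e. the
# limit of a single copy down and many, many multiplies on the GPU:
per_mult_pure = (total - copy_time) / iterations

print(per_mult_observed)  # inflated by the amortized copy cost
print(per_mult_pure)      # close to mult_time
```

The gap between the two numbers shrinks as the iteration count grows, which is exactly why a single standalone multiply is a misleading benchmark.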
>
> Barry
>
>
>
>
> On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
>
> > Hello again,
> >
> > I compiled one of the examples. I used a sparse matrix called 02-raefsky3
> > and ran with -vec_type cuda and -mat_type seqaijcuda.
> >
> > When I see summary of the operations performed by program there is
> >
> > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100 0 0 0 2100 0 0 0 147
> >
> > Does the time reported for MatMult include the memory transfer for loading
> > the matrix into GPU memory, or just the computation time?
> >
> > Thanks in advance.
> > Kuba.
> >
>