Yes, it should run on the GPU. Check an example, like ex19.

   Matt
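For concreteness, here is a minimal sketch (not code from this thread) of the kind of driver Barry describes below: load a matrix from a PETSc binary file and run a complete KSPSolve() so the one-time copy of the matrix to the GPU is amortized over many MatMult() calls and -log_summary shows a meaningful rate. The program name, the -f file option, and the input file name are placeholders; error checking is omitted for brevity, and the exact call signatures shown follow current PETSc and may differ from the 2010-era release discussed here. The run-time options are the ones from the thread.

    /* Illustrative sketch, assuming a PETSc binary matrix file.
       Example run, using the options discussed in this thread:
           ./gpu_solve -f mymatrix.petsc -mat_type seqaijcuda -vec_type cuda -log_summary
    */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      Vec         x, b;
      KSP         ksp;
      PetscViewer viewer;
      char        file[PETSC_MAX_PATH_LEN];
      PetscBool   flg;

      PetscInitialize(&argc, &argv, NULL, NULL);
      PetscOptionsGetString(NULL, NULL, "-f", file, sizeof(file), &flg);

      /* Create the matrix, let -mat_type seqaijcuda take effect, then load it */
      PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &viewer);
      MatCreate(PETSC_COMM_WORLD, &A);
      MatSetFromOptions(A);
      MatLoad(A, viewer);
      PetscViewerDestroy(&viewer);

      /* Work vectors of a type compatible with A, and a simple right-hand side */
      MatCreateVecs(A, &x, &b);
      VecSet(b, 1.0);

      /* A full solve: the matrix is copied to the GPU when it is first needed
         and reused on every iteration, so the MatMult line in -log_summary
         reflects an amortized rate rather than a copy-dominated single call */
      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A);
      KSPSetFromOptions(ksp);
      KSPSolve(ksp, b, x);

      KSPDestroy(&ksp);
      VecDestroy(&x);
      VecDestroy(&b);
      MatDestroy(&A);
      PetscFinalize();
      return 0;
    }

With a few dozen Krylov iterations, the single copy down is spread over all the MatMult() calls, which is exactly the amortization described below.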
On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola <jakub.pola at gmail.com> wrote:

> Hi,
>
> Is the MatMult function performed on the GPU? When I prepared a program
> that just executes this function with the parameters -vec_type cuda and
> -mat_type seqaijcuda, I did not see any VecCUDACopyTo entry in the
> summary log.
>
>
> On Sat, 2010-12-11 at 11:50 -0600, Barry Smith wrote:
> >    To answer this you need to understand that PETSc copies vectors and
> > matrices to GPU memory "on demand" (that is, exactly when they are first
> > needed on the GPU, and not before), and once it has copied an object to
> > the GPU it keeps track of it and will NOT copy it down again if it is
> > already there.
> >
> >    Hence in your run below, yes, the time includes the copy down.
> >
> >    But note that ONE multiply on the GPU is absurd; it does not make
> > sense to copy a matrix down to the GPU and then do ONE multiply with it.
> > Thus I NEVER do "standalone" benchmarking where a single kernel is
> > called by itself once; the time results are useless. Always run a FULL
> > application with -log_summary; for example, in this case a full
> > KSPSolve() that requires a bunch of iterations. Then you can look at the
> > performance of each kernel. The reason to do it this way is that the
> > numbers can be very different, and what matters is how things run in
> > APPLICATIONS, so that is what should be measured.
> >
> >    If, say, you run KSP with 20 iterations, then the time to copy the
> > matrix down to the GPU is amortized over those 20 iterations and thus
> > may be OK. You should see the flop rate for the MatMult() go up in this
> > case.
> >
> >    You may have noticed we have a log entry for VecCopyToGPU(); we will
> > be adding one for matrices as well, so you will be able to see how long
> > the copy takes. But note that the copy time is still counted in the
> > MatMult() time if the first copy of the matrix to the GPU is triggered
> > by the MatMult. You can subtract the copy time from the mult time to get
> > the per-multiply time; this would correspond to the multiply time in the
> > limit of a single copy down and many, many multiplies on the GPU.
> >
> >    Barry
> >
> >
> > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
> >
> > > Hello again,
> > >
> > > I compiled one of the examples. I used a sparse matrix called
> > > 02-raefsky3, with -vec_type cuda and -mat_type seqaijcuda.
> > >
> > > In the summary of the operations performed by the program there is
> > >
> > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2100
> > > 0 0 0 2100 0 0 0 147
> > >
> > > Does the MatMult time include the memory transfer for loading the
> > > matrix into GPU memory, or just the computation time?
> > >
> > > Thanks in advance.
> > > Kuba.
>

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
