Runs OK for me.

   Barry

On Dec 13, 2010, at 2:20 AM, Jakub Pola wrote:

> Could you please check the file attached to this email? It contains the
> source code and the log summaries from the execution of the mat mult.
>
> When I run ex131 with the parameters -vec_type cuda and -mat_type
> seqaijcuda,
>
> mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda
> -vec_type cuda -log_summary
>
> it fails because of CUDA Error 4; see MatMultKO.log.
>
> When I run the same program without the -vec_type cuda parameter, only
> with -mat_type seqaijcuda, it runs OK:
>
> mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda
> -log_summary
>
> See MatMltOK.log.
>
> When I run without -mat_type seqaijcuda, only with -vec_type cuda, it
> fails again because of
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>   what():  invalid argument
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>   what():  invalid argument
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 3755 on node desktop exited
> on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
> Could you please give me some comments on that?
>
> On Mon, 2010-12-13 at 07:37 +0000, Matthew Knepley wrote:
>> Yes, it should run on the GPU. Check an example, like ex19.
>>
>>    Matt
>>
>> On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola <jakub.pola at gmail.com> wrote:
>>
>> Hi,
>>
>> Is the MatMult function performed on the GPU? When I prepared a program
>> which just executes this function with the parameters -vec_type cuda and
>> -mat_type seqaijcuda, I did not see any VecCUDACopyTo entry in the
>> summary log.
>>
>> On Sat, 2010-12-11 at 11:50 -0600, Barry Smith wrote:
>>
>>>    To answer this you need to understand that PETSc copies vectors and
>>> matrices to GPU memory "on demand" (that is, exactly when they are first
>>> needed on the GPU, and not before), and once it has copied something to
>>> the GPU it keeps track of it and will NOT copy it down again if it is
>>> already there.
>>>
>>>    Hence in your run below, yes, it includes the time of the copy down.
>>>
>>>    But note that ONE multiply on the GPU is absurd; it does not make
>>> sense to copy a matrix down to the GPU and then do ONE multiply with it.
>>> Thus I NEVER do "standalone" benchmarking where a single kernel is
>>> called by itself once; the time results are useless. Always run a FULL
>>> application with -log_summary, for example in this case a full
>>> KSPSolve() that requires a bunch of iterations. Then you can look at the
>>> performance of each kernel. The reason to do it this way is that the
>>> numbers can be very different, and what matters is runs in APPLICATIONS,
>>> so that is what should be measured.
>>>
>>>    If, say, you run KSP with 20 iterations, then the time to copy the
>>> matrix down to the GPU is amortized over those 20 iterations and thus
>>> may be OK. You should see the flop rate for the MatMult() go up in this
>>> case.
>>>
>>>    You may have noticed we have a log entry for VecCopyToGPU(); we will
>>> be adding one for matrices as well, so you will be able to see how long
>>> the copy takes. Note that the copy time is still counted in the
>>> MatMult() time if the first copy of the matrix to the GPU is triggered
>>> by the MatMult(). You can subtract the copy time from the mult time to
>>> get the per-multiply time; this would correspond to the multiply time in
>>> the limit of a single copy down and many, many multiplies on the GPU.
>>>
>>>    Barry
>>>
>>> On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
>>>
>>>> Hello again,
>>>>
>>>> I compiled one of the examples. I used a sparse matrix called
>>>> 02-raefsky3, and I used -vec_type cuda and -mat_type seqaijcuda.
>>>>
>>>> In the summary of the operations performed by the program there is
>>>>
>>>> MatMult  1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2 100  0  0  0   2 100  0  0  0   147
>>>>
>>>> Does the time for performing MatMult include the memory transfer for
>>>> loading the matrix into GPU memory, or just the exact computation time?
>>>>
>>>> Thanks in advance.
>>>> Kuba.
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>
> <tests.zip>
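
For reference, the last column of the MatMult line quoted above is the achieved flop rate in MFlop/s, and it is consistent with the time and flop count reported in the same line: 2.98e+06 flops / 2.0237e-02 s ≈ 1.47e+08 flop/s ≈ 147 MFlop/s. As Barry explains above, those 20 ms for a single multiply include the one-time copy of the matrix to the GPU, which is why the rate looks so low.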

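Barry's suggestion above, timing a full solve rather than a single multiply so the one-time host-to-GPU copy is amortized over many MatMult() calls, can be turned into a small driver. Below is a minimal sketch of such a driver; it is NOT the ex131 source from the attachment, it is written against the current PETSc API (error-checking macros, MatLoad(), and the GPU type names have changed since the 2010-era release discussed in the thread), and the -f option simply follows the usage shown in the commands above. With a recent PETSc it would be run with something like: mpiexec -n 1 ./driver -f matbinary.ex -mat_type aijcusparse -log_view (the modern counterparts of seqaijcuda and -log_summary).

/* Sketch of a driver along the lines Barry describes: load a binary matrix,
   then run a full KSPSolve() so the one-time copy of the matrix to the GPU
   is amortized over many MatMult() calls.  Uses the current PETSc API. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         x, b;
  KSP         ksp;
  PetscViewer viewer;
  char        file[PETSC_MAX_PATH_LEN];
  PetscBool   flg;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscOptionsGetString(NULL, NULL, "-f", file, sizeof(file), &flg));
  PetscCheck(flg, PETSC_COMM_WORLD, PETSC_ERR_USER, "Specify a binary matrix file with -f");

  /* Load the matrix; a GPU matrix type given with -mat_type is picked up
     from the options database by MatSetFromOptions(). */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &viewer));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatLoad(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  /* Vectors with a layout (and vector type) compatible with A. */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* A full solve: the matrix is copied to the GPU on the first MatMult()
     and reused for every subsequent iteration, so the copy cost is
     amortized and the logged per-multiply time reflects mostly compute. */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}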