Please run with the following options:

./ex19 -da_vec_type seqcuda -da_mat_type seqaijcuda -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cuda_synchronize

On Aug 31, 2010, at 11:45 AM, Keita Teranishi wrote:

> Hi PETSc Developer team,
>
> I have just measured the performance of ex19 program running on Fermi GPU.
> I hope it will help you to develop GPU-enabled PETSc further.
>
> Thanks,
>
> Keita
>
> ./ex19 -pc_type jacobi -dmmg_nlevels 5 -da_vec_type cuda -da_mat_type aijcuda -log_summary -cuda_synchronize
>
>
> --- Event Stage 0: Main Stage
>
> PetscBarrier        2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
>
> --- Event Stage 1: SetUp
>
> VecSet              8 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> VecCUDACopyFrom     8 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> MatMultTranspose    4 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0  58   0   0   0     0
> MatAssemblyBegin    9 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> MatAssemblyEnd      9 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   14   0   0   0   0     0
> MatFDColorCreate    5 1.0 1.2000e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   43   0   0   0   0     0
>
> --- Event Stage 2: Solve
>
> VecDot              2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> VecMDot           980 1.0 5.5599e-01 1.0 2.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00  10  14   0   0   0   39  28   0   0   0   530
> VecNorm          1025 1.0 1.2399e-01 1.0 1.95e+07 1.0 0.0e+00 0.0e+00 0.0e+00   2   1   0   0   0    9   2   0   0   0   158
> VecScale         1013 1.0 9.9998e-02 1.0 9.73e+06 1.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    7   1   0   0   0    97
> VecCopy           208 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> VecSet             45 1.0 7.9989e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    1   0   0   0   0     0
> VecAXPY           233 1.0 3.9999e-03 1.0 1.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0   419
> VecWAXPY           33 1.0 3.9990e-03 1.0 3.17e+05 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0    79
> VecMAXPY         1013 1.0 2.9199e-01 1.0 3.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00   5  15   0   0   0   21  30   0   0   0  1074
> VecPointwiseMult  988 1.0 9.5995e-02 1.0 9.42e+06 1.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    7   1   0   0   0    98
> VecScatterBegin    13 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> VecReduceArith      2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> VecReduceComm       1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> VecCUDACopyTo      24 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> VecCUDACopyFrom    21 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> MatMult          1013 1.0 1.3600e-01 1.0 3.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00   2  18   0   0   0   10  37   0   0   0  2815
> MatMultTranspose    8 1.0 3.9999e-03 1.0 1.15e+05 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0    29
> MatAssemblyBegin   10 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> MatAssemblyEnd     10 1.0 8.0001e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    1   0   0   0   0     0
> MatZeroEntries     10 1.0 4.0002e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> MatFDColorApply    10 1.0 8.7998e-02 1.0 1.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00   2   1   0   0   0    6   1   0   0   0   143
> MatFDColorFunc    210 1.0 1.2000e-02 1.0 1.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00   0   1   0   0   0    1   1   0   0   0   958
> SNESSolve           1 1.0 1.4160e+00 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  25  50   0   0   0  100 100   0   0   0   737
> SNESLineSearch      2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> SNESFunctionEval    3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> SNESJacobianEval    2 1.0 9.1998e-02 1.0 1.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00   2   1   0   0   0    6   1   0   0   0   138
> KSPGMRESOrthog    980 1.0 8.3199e-01 1.0 5.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  15  28   0   0   0   59  56   0   0   0   708
> KSPSetup            2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> KSPSolve            2 1.0 1.3240e+00 1.0 1.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00  23  49   0   0   0   93  99   0   0   0   778
> PCSetUp             2 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     0
> PCApply           980 1.0 9.5995e-02 1.0 9.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    7   1   0   0   0    98
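
For anyone reading the quoted log, here is a minimal sketch of how its last column relates to the others. It assumes the rows follow the usual -log_summary layout (event, count, ratio, max time in seconds, ratio, max flops, ratio, ..., Mflop/s) and that the run used a single process, which the 1.0 ratios suggest. The dictionary `rows` and the helper `mflops` are hypothetical names used only for illustration; the numbers are copied from the Solve stage of the quoted log.

```python
# Sketch: recompute the Mflop/s column from the time and flop columns
# of a few Solve-stage rows copied from the quoted log.
# Assumption: single process, so max time and max flops are the totals.
rows = {
    # event: (max time in seconds, max flops, reported Mflop/s)
    "VecMDot": (5.5599e-01, 2.95e+08, 530),
    "VecMAXPY": (2.9199e-01, 3.14e+08, 1074),
    "MatMult": (1.3600e-01, 3.83e+08, 2815),
    "KSPSolve": (1.3240e+00, 1.03e+09, 778),
}


def mflops(time_s, flops):
    """Flop rate in Mflop/s for an event that took time_s seconds."""
    return flops / time_s / 1e6


for name, (time_s, flops, reported) in rows.items():
    computed = mflops(time_s, flops)
    print(f"{name:10s} computed {computed:8.1f}  reported {reported}")
```

The recomputed rates agree with the reported column to within the rounding of the printed flop counts, so the reported rate for each event is its flops divided by its time.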
