  Indeed, PCSetUp is taking most of the time (79%). In the version of PETSc you are running, a great deal of the setup work is done on the CPU. You can see there is a lot of data movement between the CPU and GPU (in both directions) during the setup; these are the last five columns of the PCSetUp line in your log:

      64 1.91e+03 54 1.21e+03 90
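
To make the log numbers in this thread easier to read, here is a minimal Python sketch that splits a -log_view event line into named columns. The column names in FIELDS are my own labels, chosen to follow the legend that -log_view prints on GPU builds (sizes in Mbytes); please check them against the legend in your own log. The two input lines are the PCSetUp and MatSOR lines quoted in this thread, and they come from two different runs.

```python
# Labels for the numeric columns of a -log_view event line on a GPU build.
# These names are assumptions based on the legend printed in the log itself.
FIELDS = [
    "count", "count_ratio", "time_s", "time_ratio", "flop", "flop_ratio",
    "mess", "avg_len", "reduct",
    "global_pct_T", "global_pct_F", "global_pct_M", "global_pct_L", "global_pct_R",
    "stage_pct_T", "stage_pct_F", "stage_pct_M", "stage_pct_L", "stage_pct_R",
    "total_mflops", "gpu_mflops",
    "cpu_to_gpu_count", "cpu_to_gpu_mb",
    "gpu_to_cpu_count", "gpu_to_cpu_mb",
    "gpu_pct_F",
]


def parse_event(line):
    """Split one event line into (event name, {column label: value})."""
    name, *values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return name, dict(zip(FIELDS, map(float, values)))


# Lines copied from the messages in this thread.
PCSETUP = ("PCSetUp 2 1.0 1.5594e+01 1.0 3.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 "
           "79 78 0 0 0 79 78 0 0 0 196 8433 64 1.91e+03 54 1.21e+03 90")
MATSOR = ("MatSOR 369 1.0 9.1214e+00 1.0 7.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 "
          "29 27 0 0 0 29 27 0 0 0 803 0 0 0.00e+00 565 1.35e+03 0")

_, pcsetup = parse_event(PCSETUP)
_, matsor = parse_event(MATSOR)

# Time in the event divided by its share of the run gives the implied total.
implied_total_s = pcsetup["time_s"] / (pcsetup["global_pct_T"] / 100.0)

print("PCSetUp CPU->GPU copies:", pcsetup["cpu_to_gpu_count"],
      "MB:", pcsetup["cpu_to_gpu_mb"])
print("PCSetUp GPU->CPU copies:", pcsetup["gpu_to_cpu_count"],
      "MB:", pcsetup["gpu_to_cpu_mb"])
print("MatSOR share of flops on GPU (%):", matsor["gpu_pct_F"])
print("implied total run time (s):", round(implied_total_s, 1))
```

Read this way, the PCSetUp line shows copies in both directions, while the MatSOR line from the earlier run shows only GPU-to-CPU copies and no flops on the GPU.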

  Clearly, we need help in porting all the parts of the GAMG setup that still occur on the CPU to the GPU.

  Barry

> On Mar 22, 2022, at 12:07 PM, Qi Yang <qiy...@oakland.edu> wrote:
>
> Dear Barry,
>
> Your advice was helpful: the total time dropped from 30 s to 20 s (now all
> matrix operations run on the GPU). I have also tried other settings for the
> AMG preconditioner, such as -pc_gamg_threshold 0.05
> -pc_gamg_threshold_scale 0.5, but they do not seem to help much.
>
> The key point seems to be the PCSetUp process. According to the log, it takes
> the most time, and the new Nsight Systems analysis shows a big gap before the
> KSP solver starts, which looks like the PCSetUp process. Am I right?
>
> <3.png>
>
> PCSetUp 2 1.0 1.5594e+01 1.0 3.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 79 78 0 0 0 79 78 0 0 0 196 8433 64 1.91e+03 54 1.21e+03 90
>
> Regards,
> Qi
>
> On Tue, Mar 22, 2022 at 10:44 PM Barry Smith <bsm...@petsc.dev> wrote:
>
> It is using
>
> MatSOR 369 1.0 9.1214e+00 1.0 7.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 27 0 0 0 29 27 0 0 0 803 0 0 0.00e+00 565 1.35e+03 0
>
> which runs on the CPU, not the GPU, hence the large amount of time in memory
> copies and the poor performance. We are switching the default to
> Chebyshev/Jacobi, which runs completely on the GPU (it may already be
> switched in the main branch).
>
> You can run with -mg_levels_pc_type jacobi. You should then see almost the
> entire solver running on the GPU.
>
> You may need to tune the number of smoothing steps or other parameters of
> GAMG to get a faster solution time.
>
> Barry
>
>> On Mar 22, 2022, at 10:30 AM, Qi Yang <qiy...@oakland.edu> wrote:
>>
>> To whom it may concern,
>>
>> I have tried PETSc ex50 (Poisson) with CUDA, the KSP CG solver, and the GAMG
>> preconditioner; however, it ran for about 30 s. I also tried NVIDIA AMGX
>> with the same solver and the same grid (3000*3000), and it took only 2 s.
>> I used the Nsight Systems software to analyze those two cases and found that
>> PETSc spent much of its time in memory operations (63% of the total time,
>> whereas AMGX spent only 19%). Attached are screenshots of them.
>>
>> The PETSc command is:
>>
>> mpiexec -n 1 ./ex50 -da_grid_x 3000 -da_grid_y 3000 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -vec_type cuda -mat_type aijcusparse -ksp_monitor -ksp_view -log-view
>>
>> The log file is also attached.
>>
>> Regards,
>> Qi
>>
>> <1.png>
>> <2.png>
>> <log.PETSc_cg_amg_ex50_gpu_cuda>
>
> <log.PETSc_cg_amg_jacobi_ex50_gpu_cuda>
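
For reference, the smoother suggestion made earlier in this thread (run with -mg_levels_pc_type jacobi, aiming for Chebyshev/Jacobi on the levels) can be combined with the original command roughly as sketched below. This is only an illustration in Python that assembles and prints the command line without running it. Setting -mg_levels_ksp_type chebyshev explicitly is my assumption about how to request the Chebyshev part, and -log_view is written with an underscore as in the PETSc documentation, whereas the quoted command has -log-view.

```python
import shlex

# Options copied from the command quoted in this thread, except -log_view,
# which is spelled here the way the PETSc documentation writes it.
base = [
    "mpiexec", "-n", "1", "./ex50",
    "-da_grid_x", "3000", "-da_grid_y", "3000",
    "-ksp_type", "cg",
    "-pc_type", "gamg", "-pc_gamg_type", "agg", "-pc_gamg_agg_nsmooths", "1",
    "-vec_type", "cuda", "-mat_type", "aijcusparse",
    "-ksp_monitor", "-ksp_view", "-log_view",
]

# Smoother choice suggested in the thread: Jacobi on the multigrid levels.
# The explicit Chebyshev KSP setting is an assumption, not quoted from it.
smoother = ["-mg_levels_ksp_type", "chebyshev", "-mg_levels_pc_type", "jacobi"]

argv = base + smoother
command = shlex.join(argv)  # requires Python 3.8 or newer
print(command)
```

Keeping the options in a list makes it easy to vary one setting at a time, for example the GAMG threshold, and compare the resulting logs.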