On Sat, Jan 22, 2022 at 12:29 PM Jed Brown <j...@jedbrown.org> wrote:
> Mark Adams <mfad...@lbl.gov> writes:
>
> >> > VecPointwiseMult  402 1.0 2.9605e-01 3.6 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 22515 70608 0 0.00e+00 0 0.00e+00 100
> >> > VecScatterBegin   400 1.0 1.6791e-01 6.0 0.00e+00 0.0 3.7e+05 1.6e+04 0.0e+00 0 0 62 54 0 2 0100100 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> > VecScatterEnd     400 1.0 1.0057e+00 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> > PCApply           402 1.0 2.9638e-01 3.6 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 22490 70608 0 0.00e+00 0 0.00e+00 100
> >>
> >> Most of the MatMult time is attributed to VecScatterEnd here. Can you
> >> share a run of the same total problem size on 8 ranks (one rank per GPU)?
> >
> > attached. I ran out of memory with the same size problem so this is the
> > 262K / GPU version.
>
> How was this launched? Is it possible all 8 ranks were using the same GPU?
> (Perf is that bad.)

srun -n8 -N1 *--ntasks-per-gpu=1* --gpu-bind=closest ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 6 -dm_view -dm_mat_type aijkokkos -dm_vec_type kokkos -pc_type jacobi -log_view -ksp_view -use_gpu_aware_mpi true

+ a large .petscrc file

> >> From the other log file (10x bigger problem)
> >
> > ????
>
> You had attached two files and the difference seemed to be that the second
> was 10x more dofs/rank.

I am refining a cube so it goes by 8x.
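The 8x can be checked against the two logs quoted in this thread by dividing the VecPointwiseMult flop count by its call count. This is a minimal Python sketch, not part of either run. It assumes VecPointwiseMult logs one flop per local vector entry, so flops per call approximates the largest local vector length. The variable and function names are only illustrative.

```python
# (call count, max flops per rank) for VecPointwiseMult,
# copied from the two -log_view excerpts quoted in this thread.
first_log = (402, 1.05e8)
second_log = (402, 8.43e8)


def flops_per_call(count, flops):
    """Flops per call; assumed here to approximate entries per rank."""
    return flops / count


small = flops_per_call(*first_log)
large = flops_per_call(*second_log)
growth = large / small

# One uniform refinement of a 3D mesh splits each cell into 2**3 = 8.
per_refinement = 2 ** 3

print(f"entries per rank, first log:  {small:.3g}")
print(f"entries per rank, second log: {large:.3g}")
print(f"growth: {growth:.2f} (one 3D refinement gives {per_refinement})")
```

Under that assumption the first log sits near 2.6e5 entries per rank and the second near 2.1e6, a ratio of roughly 8 rather than 10.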

jac_out_001_kokkos_Crusher_6_1_notpl.txt

The numbers in the file name are, in order: number of nodes, number of refinements, number of processes per GPU.

> > --- Event Stage 2: KSP Solve only
> >
> > MatMult           400 1.0 8.8003e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 2 55 61 54 0 70 91100100 0 95058 132242 0 0.00e+00 0 0.00e+00 100
> > MatView           2 1.0 1.1643e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> > KSPSolve          2 1.0 1.2540e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 3 60 61 54 60 100100100100100 73592 116796 0 0.00e+00 0 0.00e+00 100
> > SFPack            400 1.0 1.8276e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> > SFUnpack          400 1.0 6.2653e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> > VecTDot           802 1.0 1.3551e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 10 3 0 0 67 19627 52599 0 0.00e+00 0 0.00e+00 100
> > VecNorm           402 1.0 9.0151e-01 2.2 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 5 1 0 0 33 14788 125477 0 0.00e+00 0 0.00e+00 100
> > VecCopy           4 1.0 7.3905e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> > VecSet            4 1.0 3.1814e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> > VecAXPY           800 1.0 8.2617e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 7 3 0 0 0 32112 61644 0 0.00e+00 0 0.00e+00 100
> > VecAYPX           398 1.0 8.1525e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 5 1 0 0 0 16190 20689 0 0.00e+00 0 0.00e+00 100
> > VecPointwiseMult  402 1.0 3.5694e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 18675 38633 0 0.00e+00 0 0.00e+00 100
> > VecScatterBegin   400 1.0 1.3391e+00 2.6 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 7 0100100 0 0 0 0 0.00e+00 0 0.00e+00 0
> > VecScatterEnd     400 1.0 1.3240e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> > PCApply           402 1.0 3.5712e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 18665 38633 0 0.00e+00 0 0.00e+00 100
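
To put a number on how much of MatMult is communication in the "KSP Solve only" stage, the scatter times can be compared with the MatMult time. A rough Python sketch follows. It assumes the 400 VecScatterBegin/VecScatterEnd calls are the ones nested inside the 400 MatMult calls, and it treats the logged times (maxima over ranks) as if they could be added, so the result is only an estimate. Only the first four columns are parsed, because the percentage columns run together in places (e.g. 91100100). The helper name is hypothetical.

```python
# Rows copied from the "KSP Solve only" stage, truncated to the first
# four columns: event name, call count, call ratio, max time (seconds).
ROWS = """\
MatMult 400 1.0 8.8003e+00
KSPSolve 2 1.0 1.2540e+01
VecScatterBegin 400 1.0 1.3391e+00
VecScatterEnd 400 1.0 1.3240e+00
"""


def parse_times(text):
    """Map event name -> max time in seconds from truncated log rows."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        times[fields[0]] = float(fields[3])
    return times


times = parse_times(ROWS)
scatter = times["VecScatterBegin"] + times["VecScatterEnd"]
scatter_share = scatter / times["MatMult"]
matmult_share = times["MatMult"] / times["KSPSolve"]

print(f"scatter / MatMult  = {scatter_share:.2f}")
print(f"MatMult / KSPSolve = {matmult_share:.2f}")
```

With these rows the scatter events come to roughly 30% of the MatMult time, and MatMult to roughly 70% of KSPSolve, which agrees with the 70 in the stage %T column for MatMult.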