> Note that Mark's logs have been switching back and forth between
> -use_gpu_aware_mpi and changing number of ranks -- we won't have that
> information if we do manual timing hacks. This is going to be a routine
> thing we'll need on the mailing list and we need the provenance to go with
> it.
>
GPU-aware MPI crashes sometimes, so to be safe I had it off while debugging. It works fine here, so it has been on in the last tests. Here is a comparison of the same run with -use_gpu_aware_mpi true and then false.
Script started on 2022-01-25 13:44:31-05:00 [TERM="xterm-256color" TTY="/dev/pts/0" COLUMNS="296" LINES="100"]
13:44 adams/aijkokkos-gpu-logging *= crusher:/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ bash -x run_crusher_jac.sbatch
+ '[' -z '' ']'
+ case "$-" in
+ __lmod_vx=x
+ '[' -n x ']'
+ set +x
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for this output (/usr/share/lmod/lmod/init/bash)
Shell debugging restarted
+ unset __lmod_vx
+ NG=8
+ NC=1
+ date
Tue 25 Jan 2022 01:44:38 PM EST
+ EXTRA='-dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true'
+ HYPRE_EXTRA='-pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi -pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_coarsen_type PMIS -pc_hypre_boomeramg_no_CF'
+ HYPRE_EXTRA='-pc_hypre_boomeramg_no_CF true -pc_hypre_boomeramg_strong_threshold 0.75 -pc_hypre_boomeramg_agg_nl 1 -pc_hypre_boomeramg_coarsen_type HMIS -pc_hypre_boomeramg_interp_type ext+i '
+ for REFINE in 5
+ for NPIDX in 1
+ let 'N1 = 1 * 1'
++ bc -l
+ PG=2.00000000000000000000
++ printf %.0f 2.00000000000000000000
+ PG=2
+ let 'NCC = 8 / 1'
+ let 'N4 = 2 * 1'
+ let 'NODES = 1 * 1 * 1'
+ let 'N = 1 * 1 * 8'
+ echo n= 8 ' NODES=' 1 ' NC=' 1 ' PG=' 2
n= 8 NODES= 1 NC= 1 PG= 2
++ printf %03d 1
+ foo=001
+ srun -n8 -N1 --ntasks-per-gpu=1 --gpu-bind=closest -c 8 ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 5 -dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true -dm_mat_type aijkokkos -dm_vec_type kokkos -pc_type jacobi
+ tee jac_out_001_kokkos_Crusher_5_1_noview.txt
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937 35937 35937 35937 35937 35937 35937 35937
  Number of 1-cells per rank: 104544 104544 104544 104544 104544 104544 104544 104544
  Number of 2-cells per rank: 101376 101376 101376 101376 101376 101376 101376 101376
  Number of 3-cells per rank: 32768 32768 32768 32768 32768 32768 32768 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (12474))
  Face Sets: 3 strata with value/size (1 (3969), 3 (3969), 6 (3969))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Solve time: 0.34211
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_viewx
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 15 unused database options. They are:
Option left: name:-log_viewx (no value)
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
+ srun -n8 -N1 --ntasks-per-gpu=1 --gpu-bind=closest -c 8 ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 5 -dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true -use_gpu_aware_mpi false -dm_mat_type aijkokkos -dm_vec_type kokkos -pc_type jacobi
+ tee jac_out_001_kokkos_Crusher_5_1.txt
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937 35937 35937 35937 35937 35937 35937 35937
  Number of 1-cells per rank: 104544 104544 104544 104544 104544 104544 104544 104544
  Number of 2-cells per rank: 101376 101376 101376 101376 101376 101376 101376 101376
  Number of 3-cells per rank: 32768 32768 32768 32768 32768 32768 32768 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (12474))
  Face Sets: 3 strata with value/size (1 (3969), 3 (3969), 6 (3969))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Solve time: 0.370544
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_viewx
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi false
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 15 unused database options. They are:
Option left: name:-log_viewx (no value)
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
+ date
Tue 25 Jan 2022 01:46:55 PM EST
13:46 adams/aijkokkos-gpu-logging *= crusher:/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ exit
exit

Script done on 2022-01-25 13:47:22-05:00 [COMMAND_EXIT_CODE="0"]
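
Since the comparison depends on which -use_gpu_aware_mpi value and rank count each run actually used, it may help to pull those out of the saved output together with the reported solve time. The following is only a rough sketch in Python, assuming the output format shown in the transcript above (a "Solve time:" line, "N MPI processes" in the views, and the "#PETSc Option Table entries:" block). The function name summarize_run and the abbreviated excerpts are hypothetical and are not part of PETSc or of the batch script.

```python
import re

# The option table PETSc prints at the end of the run, as seen in the log above.
OPTION_TABLE = re.compile(
    r"#PETSc Option Table entries:(.*?)#End of PETSc Option Table entries",
    re.DOTALL,
)


def summarize_run(text):
    """Return rank count, -use_gpu_aware_mpi value, and solve time for one run.

    Hypothetical helper: any field that is not found is returned as None.
    """
    table = OPTION_TABLE.search(text)
    # Search only inside the option table, not the srun command line, because
    # the command line may repeat an option (e.g. "true" then "false").
    options = table.group(1) if table else ""
    gpu_aware = re.search(r"-use_gpu_aware_mpi\s+(\S+)", options)
    ranks = re.search(r"(\d+) MPI processes", text)
    solve = re.search(r"Solve time:\s*([0-9.eE+-]+)", text)
    return {
        "ranks": int(ranks.group(1)) if ranks else None,
        "use_gpu_aware_mpi": gpu_aware.group(1) if gpu_aware else None,
        "solve_time": float(solve.group(1)) if solve else None,
    }


if __name__ == "__main__":
    # Abbreviated excerpts of the two runs above, for illustration only.
    excerpts = [
        "DM Object: box 8 MPI processes\nSolve time: 0.34211\n"
        "#PETSc Option Table entries:\n-pc_type jacobi\n"
        "-use_gpu_aware_mpi true\n#End of PETSc Option Table entries\n",
        "DM Object: box 8 MPI processes\nSolve time: 0.370544\n"
        "#PETSc Option Table entries:\n-pc_type jacobi\n"
        "-use_gpu_aware_mpi false\n#End of PETSc Option Table entries\n",
    ]
    for excerpt in excerpts:
        print(summarize_run(excerpt))
```

Reading the value from the option table rather than from the srun line is deliberate: the second run passes -use_gpu_aware_mpi twice, and the table records only the value that ended up in the options database.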