Here are two runs, without and with -log_view, respectively. My new timer is the "Solve time: ..." line. There is about a 10% difference.
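For reference, the "about 10%" figure comes straight from the two "Solve time" lines reported by the runs (0.341614 s without -log_view, 0.373754 s with it). A quick sanity check of the arithmetic:

```python
# Quick check of the "-log_view adds ~10%" claim, using the two "Solve time"
# numbers reported by the runs below (values copied from the logs).
t_plain = 0.341614   # Solve time (s), run without -log_view
t_logged = 0.373754  # Solve time (s), run with -log_view
overhead = (t_logged - t_plain) / t_plain
print(f"-log_view overhead: {overhead:.1%}")  # -> -log_view overhead: 9.4%
```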
On Tue, Jan 25, 2022 at 12:53 PM Mark Adams <mfad...@lbl.gov> wrote:

> BTW, a -device_view would be great.
>
> On Tue, Jan 25, 2022 at 12:30 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> On Tue, Jan 25, 2022 at 11:56 AM Jed Brown <j...@jedbrown.org> wrote:
>>
>>> Barry Smith <bsm...@petsc.dev> writes:
>>>
>>> > Thanks Mark, far more interesting. I've improved the formatting to make it easier to read (and fixed width font for email reading)
>>> >
>>> > * Can you do same run with say 10 iterations of Jacobi PC?
>>> >
>>> > * PCApply performance (looks like GAMG) is terrible! Problems too small?
>>>
>>> This is -pc_type jacobi.
>>>
>>> > * VecScatter time is completely dominated by SFPack! Junchao what's up with that? Lots of little kernels in the PCApply? PCJACOBI run will help clarify where that is coming from.
>>>
>>> It's all in MatMult.
>>>
>>> I'd like to see a run that doesn't wait for the GPU.
>>
>> Not sure what you mean. Can I do that?
Script started on 2022-01-25 13:33:45-05:00 [TERM="xterm-256color" TTY="/dev/pts/0" COLUMNS="296" LINES="100"]
13:33 adams/aijkokkos-gpu-logging *= crusher:/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ bash -x run_crusher_jac.sbatch
+ '[' -z '' ']'
+ case "$-" in
+ __lmod_vx=x
+ '[' -n x ']'
+ set +x
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for this output (/usr/share/lmod/lmod/init/bash)
Shell debugging restarted
+ unset __lmod_vx
+ NG=8
+ NC=1
+ date
Tue 25 Jan 2022 01:33:53 PM EST
+ EXTRA='-dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true'
+ HYPRE_EXTRA='-pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi -pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_coarsen_type PMIS -pc_hypre_boomeramg_no_CF'
+ HYPRE_EXTRA='-pc_hypre_boomeramg_no_CF true -pc_hypre_boomeramg_strong_threshold 0.75 -pc_hypre_boomeramg_agg_nl 1 -pc_hypre_boomeramg_coarsen_type HMIS -pc_hypre_boomeramg_interp_type ext+i '
+ for REFINE in 5
+ for NPIDX in 1
+ let 'N1 = 1 * 1'
++ bc -l
+ PG=2.00000000000000000000
++ printf %.0f 2.00000000000000000000
+ PG=2
+ let 'NCC = 8 / 1'
+ let 'N4 = 2 * 1'
+ let 'NODES = 1 * 1 * 1'
+ let 'N = 1 * 1 * 8'
+ echo n= 8 ' NODES=' 1 ' NC=' 1 ' PG=' 2
n= 8  NODES= 1  NC= 1  PG= 2
++ printf %03d 1
+ foo=001
+ srun -n8 -N1 --ntasks-per-gpu=1 --gpu-bind=closest -c 8 ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 5 -dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true -dm_mat_type aijkokkos -dm_vec_type kokkos -pc_type jacobi
+ tee jac_out_001_kokkos_Crusher_5_1_noview.txt
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937 35937 35937 35937 35937 35937 35937 35937
  Number of 1-cells per rank: 104544 104544 104544 104544 104544 104544 104544 104544
  Number of 2-cells per rank: 101376 101376 101376 101376 101376 101376 101376 101376
  Number of 3-cells
per rank: 32768 32768 32768 32768 32768 32768 32768 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (12474))
  Face Sets: 3 strata with value/size (1 (3969), 3 (3969), 6 (3969))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Solve time: 0.341614
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_viewx
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 15 unused database options.
They are:
Option left: name:-log_viewx (no value)
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
+ srun -n8 -N1 --ntasks-per-gpu=1 --gpu-bind=closest -c 8 ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 5 -dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true -log_view -dm_mat_type aijkokkos -dm_vec_type kokkos -pc_type jacobi
+ tee jac_out_001_kokkos_Crusher_5_1.txt
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937 35937 35937 35937 35937 35937 35937 35937
  Number of 1-cells per rank: 104544 104544 104544 104544 104544 104544 104544 104544
  Number of 2-cells per rank: 101376 101376 101376 101376 101376 101376 101376 101376
  Number of 3-cells per rank: 32768 32768 32768 32768 32768 32768 32768 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (12474))
  Face Sets: 3 strata with value/size (1 (3969), 3 (3969), 6 (3969))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200,
initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Solve time: 0.373754
****************************************
***********************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.
Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher002 with 8 processors, by adams Tue Jan 25 13:36:10 2022
Using Petsc Development GIT revision: v3.16.3-696-g46640c56cb  GIT Date: 2022-01-25 09:20:51 -0500

                         Max       Max/Min     Avg       Total
Time (sec):           6.792e+01     1.000   6.792e+01
Objects:              1.920e+03     1.028   1.877e+03
Flop:                 2.402e+10     1.054   2.340e+10  1.872e+11
Flop/sec:             3.537e+08     1.054   3.445e+08  2.756e+09
MPI Messages:         4.778e+03     1.063   4.552e+03  3.642e+04
MPI Message Lengths:  1.120e+08     1.030   2.416e+04  8.799e+08
MPI Reductions:       1.988e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.7566e+01  99.5%  7.4725e+10  39.9%  1.402e+04  38.5%  2.884e+04       45.9%  7.630e+02  38.4%
 1:         PCSetUp: 1.5145e-02   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 3.4260e-01   0.5%  1.1247e+11  60.1%  2.240e+04  61.5%  2.123e+04       54.1%  1.206e+03  60.7%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total    GPU     - CpuToGpu -   - GpuToCpu -  GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s  Mflop/s  Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           5 1.0 1.5201e-01  1.0 0.00e+00 0.0 7.8e+02 9.9e+02 1.8e+01  0  0  2  0  1   0  0  6  0  2     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided         40 1.0 3.2703e-01 10.9 0.00e+00 0.0 7.1e+02 4.0e+00 4.0e+01  0  0  2  0  2   0  0  5  0  5     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         6 1.0 3.0504e-01 13.9 0.00e+00 0.0 1.5e+02 4.8e+05 6.0e+00  0  0  0  8  0   0  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            12109 1.0
1.2460e-01 1.1 6.56e+09 1.1 1.1e+04 2.1e+04 2.0e+00 0 27 32 27 0 0 68 82 59 0 408579 664816 1 7.43e-02 0 0.00e+00 100 MatAssemblyBegin 43 1.0 3.1584e-01 3.8 0.00e+00 0.0 1.5e+02 4.8e+05 6.0e+00 0 0 0 8 0 0 0 1 18 1 0 0 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 43 1.0 1.7832e-01 3.9 1.16e+06 0.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 0 0 0 1 25 0 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 3 1.0 4.2839e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 MatView 1 1.0 9.1165e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSetUp 1 1.0 1.2154e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 1 1.0 2.7111e-01 1.3 7.24e+09 1.1 1.1e+04 2.1e+04 6.0e+02 0 30 31 27 30 0 75 81 59 79 207437 462502 1 7.43e-02 0 0.00e+00 100 SNESSolve 1 1.0 2.9523e+01 1.0 8.41e+09 1.1 1.1e+04 2.4e+04 6.1e+02 43 35 31 31 31 44 88 82 68 80 2224 462047 3 2.15e+00 2 4.10e+00 86 SNESSetUp 1 1.0 5.9730e+00 1.0 0.00e+00 0.0 3.6e+02 2.3e+05 1.8e+01 9 0 1 10 1 9 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 2 1.0 2.0096e+00 1.1 7.96e+08 1.0 1.1e+02 1.5e+04 3.0e+00 3 3 0 0 0 3 9 1 0 0 3170 20058 3 4.12e+00 2 4.10e+00 0 SNESJacobianEval 2 1.0 5.9313e+01 1.0 1.52e+09 1.0 1.1e+02 6.5e+05 2.0e+00 87 6 0 8 0 88 16 1 18 0 204 0 0 0.00e+00 2 4.10e+00 0 DMCreateInterp 1 1.0 8.9532e-04 1.1 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01 0 0 0 0 1 0 0 1 0 2 741 0 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 5.9724e+00 1.0 0.00e+00 0.0 3.6e+02 2.3e+05 1.8e+01 9 0 1 10 1 9 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0 Mesh Partition 1 1.0 7.0985e-04 1.1 0.00e+00 0.0 3.5e+01 1.1e+02 8.0e+00 0 0 0 0 0 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 Mesh Migration 1 1.0 3.3181e-03 1.0 0.00e+00 0.0 2.0e+02 8.2e+01 2.9e+01 0 0 1 0 1 0 0 1 0 4 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexPartSelf 1 1.0 1.0845e-0414.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexPartLblInv 1 1.0 
2.1960e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexPartLblSF 1 1.0 1.1283e-04 1.5 0.00e+00 0.0 1.4e+01 5.6e+01 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexPartStrtSF 1 1.0 1.3113e-04 1.1 0.00e+00 0.0 7.0e+00 2.2e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexPointSF 1 1.0 2.1087e-04 1.1 0.00e+00 0.0 1.4e+01 2.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexInterp 19 1.0 5.6904e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexDistribute 1 1.0 4.2444e-03 1.0 0.00e+00 0.0 2.5e+02 9.7e+01 3.7e+01 0 0 1 0 2 0 0 2 0 5 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexDistCones 1 1.0 1.2005e-04 1.0 0.00e+00 0.0 4.2e+01 1.4e+02 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexDistLabels 1 1.0 3.1844e-04 1.0 0.00e+00 0.0 1.0e+02 6.6e+01 2.4e+01 0 0 0 0 1 0 0 1 0 3 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexDistField 1 1.0 2.7282e-03 1.0 0.00e+00 0.0 4.9e+01 5.9e+01 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexStratify 33 1.0 5.5198e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 0 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexSymmetrize 33 1.0 1.2717e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexPrealloc 1 1.0 5.9666e+00 1.0 0.00e+00 0.0 3.6e+02 2.3e+05 1.6e+01 9 0 1 10 1 9 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0 DMPlexResidualFE 2 1.0 1.5728e+00 1.0 7.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 8 0 0 0 4003 0 0 0.00e+00 0 0.00e+00 0 DMPlexJacobianFE 2 1.0 5.9201e+01 1.0 1.51e+09 1.0 7.6e+01 9.7e+05 2.0e+00 87 6 0 8 0 87 16 1 18 0 203 0 0 0.00e+00 0 0.00e+00 0 DMPlexInterpFE 1 1.0 8.6844e-04 1.1 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01 0 0 0 0 1 0 0 1 0 2 764 0 0 0.00e+00 0 0.00e+00 0 SFSetGraph 43 1.0 7.5083e-04 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SFSetUp 34 1.0 4.7128e-02 1.2 0.00e+00 0.0 1.3e+03 2.4e+04 
3.4e+01 0 0 3 3 2 0 0 9 7 4 0 0 0 0.00e+00 0 0.00e+00 0 SFBcastBegin 65 1.0 1.7854e-0146.9 0.00e+00 0.0 9.8e+02 1.4e+04 0.0e+00 0 0 3 2 0 0 0 7 3 0 0 0 1 2.44e-02 4 8.19e+00 0 SFBcastEnd 65 1.0 2.7145e-0137.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SFReduceBegin 16 1.0 1.3564e-0187.4 5.24e+05 1.0 2.9e+02 1.0e+05 0.0e+00 0 0 1 3 0 0 0 2 7 0 30 0 2 4.10e+00 0 0.00e+00 100 SFReduceEnd 16 1.0 1.9102e-0125.0 2.50e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1 0 0 0.00e+00 0 0.00e+00 100 SFFetchOpBegin 2 1.0 5.4600e-04124.1 0.00e+00 0.0 3.8e+01 2.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 0 0 0.00e+00 0 0.00e+00 0 SFFetchOpEnd 2 1.0 2.6090e-03 1.6 0.00e+00 0.0 3.8e+01 2.5e+05 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 0 0 0.00e+00 0 0.00e+00 0 SFCreateEmbed 8 1.0 9.0613e-02142.5 0.00e+00 0.0 1.4e+02 8.5e+02 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SFDistSection 9 1.0 3.4540e-03 2.0 0.00e+00 0.0 3.1e+02 6.5e+03 1.1e+01 0 0 1 0 1 0 0 2 0 1 0 0 0 0.00e+00 0 0.00e+00 0 SFSectionSF 16 1.0 1.8106e-02 1.7 0.00e+00 0.0 4.8e+02 2.0e+04 1.6e+01 0 0 1 1 1 0 0 3 2 2 0 0 0 0.00e+00 0 0.00e+00 0 SFRemoteOff 7 1.0 9.2144e-0240.8 0.00e+00 0.0 4.2e+02 1.6e+03 4.0e+00 0 0 1 0 0 0 0 3 0 1 0 0 0 0.00e+00 0 0.00e+00 0 SFPack 290 1.0 1.5857e-0163.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 9.87e-02 0 0.00e+00 0 SFUnpack 292 1.0 1.3520e-0165.8 5.49e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 31 0 0 0.00e+00 0 0.00e+00 100 VecTDot 401 1.0 3.9918e-02 1.6 2.10e+08 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 0 2 0 0 53 41154 98745 0 0.00e+00 0 0.00e+00 100 VecNorm 201 1.0 7.9221e-02 5.3 1.05e+08 1.0 0.0e+00 0.0e+00 2.0e+02 0 0 0 0 10 0 1 0 0 26 10394 78440 0 0.00e+00 0 0.00e+00 100 VecCopy 2 1.0 1.0405e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecSet 54 1.0 1.3159e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecAXPY 400 1.0 
1.1370e-02 1.1 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 2 0 0 0 144122 202169 0 0.00e+00 0 0.00e+00 100 VecAYPX 199 1.0 5.3881e-03 1.1 1.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 151307 226976 0 0.00e+00 0 0.00e+00 100 VecPointwiseMult 201 1.0 5.8045e-03 1.1 5.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 70933 102845 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 201 1.0 2.7933e-02 4.7 0.00e+00 0.0 1.1e+04 2.1e+04 2.0e+00 0 0 32 27 0 0 0 82 59 0 0 0 1 7.43e-02 0 0.00e+00 0 VecScatterEnd 201 1.0 1.8493e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 DualSpaceSetUp 2 1.0 2.4687e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6 0 0 0.00e+00 0 0.00e+00 0 FESetUp 2 1.0 1.0635e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 PCSetUp 1 1.0 4.3190e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 PCApply 201 1.0 2.7189e-02 1.0 5.27e+07 1.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 1 0 0 0 15143 43140 0 0.00e+00 0 0.00e+00 100 --- Event Stage 1: PCSetUp PCSetUp 1 1.0 1.6281e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 100 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 --- Event Stage 2: KSP Solve only MatMult 400 1.0 1.9253e-01 1.1 1.31e+10 1.1 2.2e+04 2.1e+04 0.0e+00 0 54 62 54 0 54 91100100 0 528807 717110 0 0.00e+00 0 0.00e+00 100 MatView 2 1.0 8.5814e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 2 1.0 3.7359e-01 1.2 1.45e+10 1.1 2.2e+04 2.1e+04 1.2e+03 1 60 62 54 61 100100100100100 301067 520834 0 0.00e+00 0 0.00e+00 100 SFPack 400 1.0 1.3133e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 SFUnpack 400 1.0 3.9090e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecTDot 802 1.0 6.5250e-02 1.5 4.20e+08 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 15 3 0 0 67 50354 108871 
0 0.00e+00 0 0.00e+00 100 VecNorm 402 1.0 9.4344e-02 3.4 2.11e+08 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 19 1 0 0 33 17456 82582 0 0.00e+00 0 0.00e+00 100 VecCopy 4 1.0 1.7995e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecSet 4 1.0 1.7595e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecAXPY 800 1.0 2.0554e-02 1.1 4.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 6 3 0 0 0 159451 231664 0 0.00e+00 0 0.00e+00 100 VecAYPX 398 1.0 1.0453e-02 1.1 2.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 3 1 0 0 0 155981 224425 0 0.00e+00 0 0.00e+00 100 VecPointwiseMult 402 1.0 1.1216e-02 1.1 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 73420 107169 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 400 1.0 1.7302e-02 1.6 0.00e+00 0.0 2.2e+04 2.1e+04 0.0e+00 0 0 62 54 0 4 0100100 0 0 0 0 0.00e+00 0 0.00e+00 0 VecScatterEnd 400 1.0 1.3178e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 PCApply 402 1.0 1.1307e-02 1.1 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 72825 107169 0 0.00e+00 0 0.00e+00 100 --------------------------------------------------------------------------------------------------------------------------------------------------------------- Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Container 32 32 18432 0. SNES 1 1 1540 0. DMSNES 1 1 688 0. Krylov Solver 1 1 1664 0. DMKSP interface 1 1 656 0. Matrix 75 75 195551600 0. Distributed Mesh 70 70 7826872 0. DM Label 172 172 108704 0. Quadrature 148 148 87616 0. Mesh Transform 5 5 3780 0. Index Set 633 633 1440932 0. IS L to G Mapping 2 2 1100416 0. Section 249 249 177288 0. Star Forest Graph 173 173 188592 0. Discrete System 116 116 111364 0. Weak Form 117 117 72072 0. GraphPartitioner 33 33 22704 0. Vector 54 54 19589336 0. Linear Space 5 5 3416 0. 
Dual Space 26 26 24336 0. FE Space 2 2 1576 0. Viewer 2 1 840 0. Preconditioner 1 1 872 0. Field over DM 1 1 704 0. --- Event Stage 1: PCSetUp --- Event Stage 2: KSP Solve only ======================================================================================================================== Average time to get PetscTime(): 3.5e-08 Average time for MPI_Barrier(): 2.679e-06 Average time for zero size MPI_Send(): 1.07156e-05 #PETSc Option Table entries: -benchmark_it 2 -dm_distribute -dm_mat_type aijkokkos -dm_plex_box_faces 2,2,2 -dm_plex_box_lower 0,0,0 -dm_plex_box_upper 1,1,1 -dm_plex_dim 3 -dm_plex_simplex 0 -dm_refine 5 -dm_vec_type kokkos -dm_view -ksp_converged_reason -ksp_max_it 200 -ksp_norm_type unpreconditioned -ksp_rtol 1.e-12 -ksp_type cg -ksp_view -log_view -log_viewx -mg_levels_esteig_ksp_max_it 10 -mg_levels_esteig_ksp_type cg -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -options_left -pc_gamg_coarse_eq_limit 100 -pc_gamg_coarse_grid_layout_type compact -pc_gamg_esteig_ksp_max_it 10 -pc_gamg_esteig_ksp_type cg -pc_gamg_process_eq_limit 400 -pc_gamg_repartition false -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 0 -pc_gamg_threshold -0.01 -pc_type jacobi -petscpartitioner_simple_node_grid 1,1,1 -petscpartitioner_simple_process_grid 2,2,2 -petscpartitioner_type simple -potential_petscspace_degree 2 -snes_max_it 1 -snes_rtol 1.e-8 -snes_type ksponly -use_gpu_aware_mpi true #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip 
--with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --download-p4est=1 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4 PETSC_ARCH=arch-olcf-crusher ----------------------------------------- Libraries compiled on 2022-01-25 14:29:13 on login2 Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4 Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc Using PETSc arch: arch-olcf-crusher ----------------------------------------- Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O Using Fortran compiler: ftn -fPIC -g ----------------------------------------- Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/include -I/opt/rocm-4.5.0/include ----------------------------------------- Using C linker: cc Using Fortran linker: ftn Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -L/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 
-Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lz -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa ----------------------------------------- #PETSc Option Table entries: -benchmark_it 2 -dm_distribute -dm_mat_type aijkokkos -dm_plex_box_faces 2,2,2 -dm_plex_box_lower 0,0,0 -dm_plex_box_upper 1,1,1 -dm_plex_dim 3 -dm_plex_simplex 0 -dm_refine 5 -dm_vec_type kokkos -dm_view -ksp_converged_reason -ksp_max_it 200 -ksp_norm_type unpreconditioned -ksp_rtol 1.e-12 -ksp_type cg -ksp_view -log_view -log_viewx -mg_levels_esteig_ksp_max_it 10 -mg_levels_esteig_ksp_type cg -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi 
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 15 unused database options. They are:
Option left: name:-log_viewx (no value)
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
+ date
Tue 25 Jan 2022 01:36:10 PM EST
13:36 adams/aijkokkos-gpu-logging *= crusher:/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ exit
exit
Script done on 2022-01-25 13:36:17-05:00 [COMMAND_EXIT_CODE="0"]