I am running snes/ex13 on Perlmutter and doing a scaling study in a script. The first case runs fine:
+ srun -G 32 -n 256 --cpu-bind=cores --ntasks-per-core=1 \
    /global/homes/m/madams/mps-wrapper.sh ../ex13 \
    -dm_plex_box_faces 4,8,8 -petscpartitioner_simple_node_grid 2,2,2 \
    -dm_refine 3 -dm_mat_type aijcusparse -dm_vec_type cuda \
    -dm_view -ksp_max_it 15 -log_view

but the next case, with one more level of refinement (-dm_refine 4),

+ srun -G 32 -n 256 --cpu-bind=cores --ntasks-per-core=1 \
    /global/homes/m/madams/mps-wrapper.sh ../ex13 \
    -dm_plex_box_faces 4,8,8 -petscpartitioner_simple_node_grid 2,2,2 \
    -dm_refine 4 -dm_mat_type aijcusparse -dm_vec_type cuda \
    -dm_view -ksp_max_it 15 -log_

hangs in BuildTwoSided. With -log_trace I see the following (grepping on rank 177). The full argument list is appended below. Any ideas? Thanks.

[177] 21.4441 Event begin: MatGetBrAoCol
[177] 21.4441 Event begin: SFSetUp
[177] 21.4441 Event begin: BuildTwoSided
[177] 21.5443 Event end: BuildTwoSided
[177] 21.5443 Event end: SFSetUp
[177] 21.5443 Event begin: MatAssemblyBegin
[177] 21.5444 Event end: MatAssemblyBegin
[177] 21.5444 Event begin: MatAssemblyEnd
[177] 21.5444 Event end: MatAssemblyEnd
[177] 21.5444 Event end: MatGetBrAoCol
[177] 21.5444 Event begin: MatGetLocalMat
[177] 21.5569 Event begin: MatCUSPARSCopyTo
[177] 21.557  Event end: MatCUSPARSCopyTo
[177] 21.557  Event begin: MatCUSPARSCopyTo
[177] 21.5571 Event end: MatCUSPARSCopyTo
[177] 21.5571 Event end: MatGetLocalMat
[177] 21.5571 Event begin: MatCUSPARSCopyTo
[177] 21.5571 Event end: MatCUSPARSCopyTo
[177] 21.5698 Event begin: MatCUSPARSCopyTo
[177] 21.5698 Event end: MatCUSPARSCopyTo
[177] 21.5827 Event begin: MatConvert
[177] 21.5954 Event end: MatConvert
[177] 21.5954 Event begin: MatCUSPARSCopyTo
[177] 21.5954 Event end: MatCUSPARSCopyTo
[177] 21.5954 Event begin: MatCUSPARSCopyTo
[177] 21.5955 Event end: MatCUSPARSCopyTo
[177] 21.6208 Event begin: SFSetGraph
[177] 21.6208 Event end: SFSetGraph
[177] 21.6208 Event begin: SFSetUp
[177] 21.6208 Event begin: BuildTwoSided

#PETSc Option Table entries:
-benchmark_it 10
-dm_distribute
-dm_mat_type aijcusparse
-dm_plex_box_faces 4,8,8
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 2,4,4
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 3
-dm_vec_type cuda
-dm_view
-ksp_max_it 15
-ksp_monitor_short
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-log_view
-matptap_via scalable
-mg_levels_esteig_ksp_max_it 5
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 2
-mg_levels_ksp_type richardson
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type spread
-pc_gamg_esteig_ksp_max_it 5
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 100
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold 0.01
-pc_gamg_threshold_scale .5
-pc_type gamg
-petscpartitioner_simple_node_grid 2,2,2
-petscpartitioner_simple_process_grid 2,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi 0
#End of PETSc Option Table entries

Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --CFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc --FFLAGS=" -g " --COPTFLAGS=" -O" --CXXOPTFLAGS=" -O" --FOPTFLAGS=" -O" --with-debugging=0 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
-----------------------------------------
Libraries compiled on 2021-10-16 18:33:45 on login02
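For reference, the scaling-study loop could be sketched roughly as follows. This is a hypothetical reconstruction, not the actual script: the srun invocation and flags are copied from the two runs above, and the launch()/RUN=1 dry-run machinery is my own addition so the commands can be inspected before submitting.

```shell
#!/bin/sh
# Hypothetical sketch of the scaling-study driver; the srun line and its
# flags are taken from the two runs quoted in this message.
# By default the commands are only printed; set RUN=1 to actually launch.
launch() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

for refine in 3 4; do
  launch srun -G 32 -n 256 --cpu-bind=cores --ntasks-per-core=1 \
    /global/homes/m/madams/mps-wrapper.sh ../ex13 \
    -dm_plex_box_faces 4,8,8 \
    -petscpartitioner_simple_node_grid 2,2,2 \
    -dm_refine "$refine" \
    -dm_mat_type aijcusparse -dm_vec_type cuda \
    -dm_view -ksp_max_it 15 -log_view
done
```

With RUN unset this just echoes the two srun command lines, which makes it easy to diff the -dm_refine 3 and -dm_refine 4 cases before running either.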