Barry,

Option left: name:-mg_coarse_mat_solver_type value: cusparse
I tried this too: Option left: name:-mg_coarse_sub_mat_solver_type value: cusparse

Here is the view. cuda did not get into the factor type:

PC Object: 24 MPI processes
  type: gamg
    type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using externally compute Galerkin coarse grid matrices
      GAMG specific options
        Threshold for dropping small values in graph on each level = 0.05 0.025 0.0125
        Threshold scaling factor for each level not specified = 0.5
        AGG specific options
          Symmetric graph false
          Number of levels to square graph 10
          Number smoothing steps 1
        Complexity: grid = 1.14213
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 24 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 24 MPI processes
      type: bjacobi
        number of blocks = 24
        Local solve is same for all blocks, in the following KSP and PC objects:
      KSP Object: (mg_coarse_sub_) 1 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5., needed 1.
            Factored matrix follows:
              Mat Object: 1 MPI processes
                *type: seqaij*
                rows=6, cols=6
                package used to perform factorization: petsc
                total: nonzeros=36, allocated nonzeros=36
                total number of mallocs used during MatSetValues calls =0
                  using I-node routines: found 2 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: 1 MPI processes
          *type: seqaijcusparse*
          rows=6, cols=6
          total: nonzeros=36, allocated nonzeros=36
          total number of mallocs used during MatSetValues calls =0
            using I-node routines: found 2 nodes, limit used is 5
      linear system matrix = precond matrix:
      Mat Object: 24 MPI processes
        *type: mpiaijcusparse*
        rows=6, cols=6, bs=6
        total: nonzeros=36, allocated nonzeros=36
        total number of mallocs used during MatSetValues calls =0
          using scalable MatPtAP() implementation
          using I-node (on process 0) routines: found 2 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
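For reference, forcing this programmatically instead of through options would presumably look something like the sketch below (untested; it assumes the outer solver is a KSP named ksp, that this runs after KSPSetUp() so the bjacobi sub-KSPs exist, and that there is one local block):

  /* Sketch only: walk from the outer GAMG PC down to the coarse-level
     block-Jacobi sub-PC (prefix mg_coarse_sub_) and request the cusparse
     LU factorization.  Ordering relative to PCSetUp() is the delicate part
     and is not verified here. */
  KSP            coarse, *subksp;
  PC             pc, coarsepc, subpc;
  PetscInt       nlocal, first;
  PetscErrorCode ierr;

  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);                 /* gamg */
  ierr = PCMGGetCoarseSolve(pc, &coarse);CHKERRQ(ierr);    /* (mg_coarse_) KSP */
  ierr = KSPGetPC(coarse, &coarsepc);CHKERRQ(ierr);        /* bjacobi */
  ierr = PCBJacobiGetSubKSP(coarsepc, &nlocal, &first, &subksp);CHKERRQ(ierr);
  ierr = KSPGetPC(subksp[0], &subpc);CHKERRQ(ierr);        /* (mg_coarse_sub_) lu */
  ierr = PCFactorSetMatSolverType(subpc, MATSOLVERCUSPARSE);CHKERRQ(ierr);

On the options route, the factor package is normally selected through the factor PC's own prefix, i.e. something like -mg_coarse_sub_pc_factor_mat_solver_type cusparse, though that should be verified against the PETSc version in use.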
On Sun, Jul 21, 2019 at 3:58 PM Mark Adams <mfad...@lbl.gov> wrote:

> Barry, I do NOT see communication. This is what made me think it was not running on the GPU. I added print statements and found that MatSolverTypeRegister_CUSPARSE IS called but (what it registers) MatGetFactor_seqaijcusparse_cusparse does NOT get called.
>
> I have a job waiting on the queue. I'll send ksp_view when it runs. I will try -mg_coarse_mat_solver_type cusparse. That is probably the problem. Maybe I should set the coarse grid solver in a more robust way in GAMG, like use the matrix somehow? I currently use PCSetType(pc, PCLU).
>
> I can't get an interactive shell now to run DDT, but I can try stepping through from MatGetFactor to see what it's doing.
>
> Thanks,
> Mark
>
> On Sun, Jul 21, 2019 at 11:14 AM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
>> > On Jul 21, 2019, at 8:55 AM, Mark Adams via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
>> >
>> > I am running ex56 with -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse and I see no GPU communication in MatSolve (the serial LU coarse grid solver).
>>
>>    Do you mean to say, you DO see communication?
>>
>>    What does -ksp_view show you? It should show the factor type in the information about the coarse grid solve.
>>
>>    You might need something like -mg_coarse_mat_solver_type cusparse (because it may default to the PETSc one; it may be possible to have it default to cusparse if it exists and the matrix is of type MATSEQAIJCUSPARSE).
>>
>>    The determination of the MatGetFactor() is a bit involved, including pasting together strings and string compares, and it could be finding a CPU factorization.
>>
>>    I could run on one MPI_Rank() in the debugger and put a break point in MatGetFactor() and track along to see what it picks and why. You could do this debugging without GAMG first, just -pc_type lu.
>>
>> > GAMG does set the coarse grid solver to LU manually like this: ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
>>
>>    For parallel runs this won't work using the GPU code and only sequential direct solvers, so it must be using BJACOBI in that case?
>>
>>    Barry
>>
>> > I am thinking the dispatch of the CUDA version of this got dropped somehow.
>> >
>> > I see that this is getting called:
>> >
>> > PETSC_EXTERN PetscErrorCode MatSolverTypeRegister_CUSPARSE(void)
>> > {
>> >   PetscErrorCode ierr;
>> >
>> >   PetscFunctionBegin;
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_LU,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_CHOLESKY,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_ILU,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   ierr = MatSolverTypeRegister(MATSOLVERCUSPARSE,MATSEQAIJCUSPARSE,MAT_FACTOR_ICC,MatGetFactor_seqaijcusparse_cusparse);CHKERRQ(ierr);
>> >   PetscFunctionReturn(0);
>> > }
>> >
>> > but MatGetFactor_seqaijcusparse_cusparse is not getting called.
>> >
>> > GAMG does set the coarse grid solver to LU manually like this: ierr = PCSetType(pc2, PCLU);CHKERRQ(ierr);
>> >
>> > Any ideas?
>> >
>> > Thanks,
>> > Mark
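A quick way to isolate the dispatch question above, independent of GAMG, would be something like the following (a sketch, assuming an assembled matrix A of type MATSEQAIJCUSPARSE on a single rank; variable names are illustrative):

  /* Sketch: request the cusparse LU factorization directly.  If the
     registration done in MatSolverTypeRegister_CUSPARSE is being found,
     this should end up in MatGetFactor_seqaijcusparse_cusparse; letting
     the solver type default to "petsc" would instead return the CPU
     factorization seen in the -ksp_view output above. */
  Mat            F;
  PetscErrorCode ierr;

  ierr = MatGetFactor(A, MATSOLVERCUSPARSE, MAT_FACTOR_LU, &F);CHKERRQ(ierr);

A breakpoint in MatGetFactor(), as suggested above, on that call (or on the one issued by the mg_coarse_sub_ PC) would show which solver-type string is actually being looked up.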