On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfad...@lbl.gov> wrote:

> On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <c...@pppl.gov> wrote:
>
> > Hi Mark,
> >
> > The options I use are like
> >
> > -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres
> > -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse
> > -sub_ksp_type preonly -sub_pc_type lu
> > -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300
>
> Note: if you use -log_view, the last column (the rows are methods like
> MatFactorNumeric) gives the percentage of the work done on the GPU.
>
> Junchao: *This* implies that we have a cuSparse LU factorization. Is
> that correct? (I don't think we do.)

No, we don't have a cuSparse LU factorization. If you check
MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find that it calls
MatLUFactorSymbolic_SeqAIJ() instead. So I don't understand Chang's idea.
Do you want to make bigger blocks?
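[For reference, a minimal driver like the one below (not from the thread; the
1-D Laplacian and the binary name are only placeholders) can be used to try
the options above and inspect the last (GPU) column of -log_view for events
such as MatLUFactorNum and MatSolve. All solver choices are taken from the
command line via MatSetFromOptions()/KSPSetFromOptions(); this is just a
sketch, ex72.c mentioned later in the thread is the real example.]

/* Illustrative only: run with, e.g.,
 *   mpiexec -n 16 ./gputest -pc_type bjacobi -pc_bjacobi_blocks 16 \
 *     -ksp_type fgmres -mat_type aijcusparse -sub_ksp_type preonly \
 *     -sub_pc_type lu -sub_pc_factor_mat_solver_type cusparse -log_view
 * and look at the GPU column for MatLUFactorNum vs. MatSolve.            */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscInt       i, n = 10000, Istart, Iend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);CHKERRQ(ierr);

  /* Placeholder operator: a 1-D Laplacian; -mat_type picks aij/aijcusparse */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     { ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < n - 1) { ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* KSP/PC (fgmres + bjacobi with LU sub-solves, cusparse solver type)
     are configured entirely from the command-line options                */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}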
> > I think this one does both the factorization and the solve on the GPU.
> >
> > You can check the runex72_aijcusparse.sh file in the PETSc install
> > directory and try it yourself (this is only the LU factorization,
> > without an iterative solve).
> >
> > Chang
> >
> > On 10/12/21 1:17 PM, Mark Adams wrote:
> >
> > > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:
> > >
> > > > Hi Junchao,
> > > >
> > > > No, I only need it to be transferred within a node. I use the
> > > > block-Jacobi method and GMRES to solve the sparse matrix, so each
> > > > direct solver will take care of a sub-block of the whole matrix.
> > > > In this way, I can use one GPU to solve one sub-block, which is
> > > > stored within one node.
> > > >
> > > > It was stated in the documentation that the cusparse solver is
> > > > slow. However, in my test using ex72.c, the cusparse solver is
> > > > faster than mumps or superlu_dist on CPUs.
> > >
> > > Are we talking about the factorization, the solve, or both?
> > >
> > > We do not have an interface to cuSparse's LU factorization (I just
> > > learned that it exists a few weeks ago).
> > >
> > > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
> > > aijcusparse'? This would be the CPU factorization, which is the
> > > dominant cost.
> > >
> > > > Chang
> > > >
> > > > On 10/12/21 10:24 AM, Junchao Zhang wrote:
> > > >
> > > > > Hi, Chang,
> > > > >
> > > > > For the mumps solver, we usually transfer matrix and vector data
> > > > > within a compute node. For the idea you propose, it looks like
> > > > > we need to gather data within MPI_COMM_WORLD, right?
> > > > >
> > > > > Mark, I remember you said the cusparse solve is slow and you
> > > > > would rather do it on the CPU. Is that right?
> > > > >
> > > > > --Junchao Zhang
> > > > >
> > > > > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> > > > > <petsc-users@mcs.anl.gov> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Currently, it is possible to use the mumps solver in PETSc with
> > > > > > the -mat_mumps_use_omp_threads option, so that multiple MPI
> > > > > > processes transfer the matrix and RHS data to the master rank,
> > > > > > and then the master rank calls mumps with OpenMP to solve the
> > > > > > matrix.
> > > > > >
> > > > > > I wonder if someone can develop a similar option for the
> > > > > > cusparse solver. Right now, this solver does not work with
> > > > > > mpiaijcusparse. I think a possible workaround is to transfer
> > > > > > all the matrix data to one MPI process, and then upload the
> > > > > > data to the GPU to solve. In this way, one can use the cusparse
> > > > > > solver for an MPI program.
> > > > > >
> > > > > > Chang
> >
> > --
> > Chang Liu
> > Staff Research Physicist
> > +1 609 243 3438
> > c...@pppl.gov
> > Princeton Plasma Physics Laboratory
> > 100 Stellarator Rd, Princeton NJ 08540, USA
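[For what it is worth, below is a rough sketch of the manual workaround
described in the last quoted message: gather the whole MPIAIJ matrix and the
right-hand side onto rank 0, convert the gathered matrix to SEQAIJCUSPARSE,
and solve there with a sequential KSP, e.g. with -serial_ksp_type preonly
-serial_pc_type lu -serial_pc_factor_mat_solver_type cusparse. This is not an
existing PETSc option; the function name GatherAndSolveOnRank0 and the
"serial_" options prefix are made up for illustration, and repeated solves,
reuse of the gathered data, and error handling around a time loop are left
out.]

/* Sketch of the proposed gather-to-one-rank workaround (assumes a
 * CUDA-enabled PETSc build; names and the "serial_" prefix are invented). */
#include <petscksp.h>

PetscErrorCode GatherAndSolveOnRank0(Mat A, Vec b, Vec x)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       N;
  IS             all;
  Mat            *Aseq;
  Vec            bseq = NULL, xseq = NULL;
  VecScatter     tozero;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)A), &rank);CHKERRQ(ierr);
  ierr = MatGetSize(A, &N, NULL);CHKERRQ(ierr);

  /* Rank 0 requests every row/column; the other ranks request nothing */
  ierr = ISCreateStride(PETSC_COMM_SELF, rank ? 0 : N, 0, 1, &all);CHKERRQ(ierr);
  ierr = MatCreateSubMatrices(A, 1, &all, &all, MAT_INITIAL_MATRIX, &Aseq);CHKERRQ(ierr);

  /* Gather the right-hand side to rank 0 as well */
  ierr = VecScatterCreateToZero(b, &tozero, &bseq);CHKERRQ(ierr);
  ierr = VecScatterBegin(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

  if (!rank) {
    KSP ksp;
    /* Move the gathered matrix to the GPU format and solve sequentially */
    ierr = MatConvert(Aseq[0], MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX, &Aseq[0]);CHKERRQ(ierr);
    ierr = VecDuplicate(bseq, &xseq);CHKERRQ(ierr);
    ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, Aseq[0], Aseq[0]);CHKERRQ(ierr);
    ierr = KSPSetOptionsPrefix(ksp, "serial_");CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, bseq, xseq);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  }

  /* Scatter the solution back to the original parallel layout; the other
     ranks pass their zero-length bseq just to satisfy the scatter call    */
  ierr = VecScatterBegin(tozero, xseq ? xseq : bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(tozero, xseq ? xseq : bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);

  ierr = VecScatterDestroy(&tozero);CHKERRQ(ierr);
  ierr = VecDestroy(&bseq);CHKERRQ(ierr);
  ierr = VecDestroy(&xseq);CHKERRQ(ierr);
  ierr = MatDestroySubMatrices(1, &Aseq);CHKERRQ(ierr);
  ierr = ISDestroy(&all);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}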