On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <c...@pppl.gov> wrote:

> Thank you Junchao for explaining this. I guess in my case the code is
> just calling a seq solver like superlu to do factorization on GPUs.
>
> My idea is that I want to have a traditional MPI code to utilize GPUs
> with cusparse. Right now cusparse does not support mpiaij matrix,
Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse matrix
with > 1 processes. (-mat_type mpiaijcusparse might also work with > 1 proc.)

However, I see in grepping the repo that all the mumps and superlu tests use
the aij or sell matrix type. MUMPS and SuperLU provide their own solves, I
assume .... but you might want to do other matrix operations on the GPU. Is
that the issue? Did you try -mat_type aijcusparse with MUMPS and/or SuperLU
and have a problem? (There is no test with it, so it probably does not work.)

Thanks,
Mark

> so I
> want the code to have a mpiaij matrix when adding all the matrix terms,
> and then transform the matrix to seqaij when doing the factorization and
> solve. This involves sending the data to the master process, and I think
> the petsc mumps solver has something similar already.
>
> Chang
>
> On 10/13/21 10:18 AM, Junchao Zhang wrote:
> >
> > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfad...@lbl.gov> wrote:
> >
> > On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <c...@pppl.gov> wrote:
> >
> > Hi Mark,
> >
> > The option I use is like
> >
> > -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
> > aijcusparse *-sub_pc_factor_mat_solver_type cusparse* *-sub_ksp_type
> > preonly* *-sub_pc_type lu* -ksp_max_it 2000 -ksp_rtol 1.e-300
> > -ksp_atol 1.e-300
> >
> > Note, if you use -log_view the last column (rows are the method, like
> > MatFactorNumeric) has the percent of work on the GPU.
> >
> > Junchao: *This* implies that we have a cuSparse LU factorization. Is
> > that correct? (I don't think we do)
> >
> > No, we don't have cuSparse LU factorization. If you check
> > MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
> > MatLUFactorSymbolic_SeqAIJ() instead.
> > So I don't understand Chang's idea. Do you want to make bigger blocks?
> >
> > I think this one does both factorization and solve on the GPU.
> >
> > You can check the runex72_aijcusparse.sh file in the petsc install
> > directory and try it yourself (this is only lu factorization without
> > an iterative solve).
> >
> > Chang
> >
> > On 10/12/21 1:17 PM, Mark Adams wrote:
> > > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:
> > >
> > > Hi Junchao,
> > >
> > > No, I only need it to be transferred within a node. I use the
> > > block-Jacobi method and GMRES to solve the sparse matrix, so each
> > > direct solver will take care of a sub-block of the whole matrix. In
> > > this way, I can use one GPU to solve one sub-block, which is stored
> > > within one node.
> > >
> > > It was stated in the documentation that the cusparse solver is slow.
> > > However, in my test using ex72.c, the cusparse solver is faster than
> > > mumps or superlu_dist on CPUs.
> > >
> > > Are we talking about the factorization, the solve, or both?
> > >
> > > We do not have an interface to cuSparse's LU factorization (I just
> > > learned that it exists a few weeks ago).
> > > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
> > > aijcusparse'? This would be the CPU factorization, which is the
> > > dominant cost.
> > >
> > > Chang
> > >
> > > On 10/12/21 10:24 AM, Junchao Zhang wrote:
> > > > Hi, Chang,
> > > > For the mumps solver, we usually transfer matrix and vector data
> > > > within a compute node.
> > > > For the idea you propose, it looks like we need to gather data
> > > > within MPI_COMM_WORLD, right?
> > > >
> > > > Mark, I remember you said the cusparse solve is slow and you would
> > > > rather do it on the CPU. Is that right?
> > > >
> > > > --Junchao Zhang
> > > >
> > > > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> > > > <petsc-users@mcs.anl.gov> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Currently, it is possible to use the mumps solver in PETSc with the
> > > > -mat_mumps_use_omp_threads option, so that multiple MPI processes
> > > > will transfer the matrix and rhs data to the master rank, and then
> > > > the master rank will call mumps with OpenMP to solve the matrix.
> > > >
> > > > I wonder if someone can develop a similar option for the cusparse
> > > > solver. Right now, this solver does not work with mpiaijcusparse. I
> > > > think a possible workaround is to transfer all the matrix data to
> > > > one MPI process, and then upload the data to the GPU to solve. In
> > > > this way, one can use the cusparse solver for an MPI program.
> > > >
> > > > Chang
> > > > --
> > > > Chang Liu
> > > > Staff Research Physicist
> > > > +1 609 243 3438
> > > > c...@pppl.gov
> > > > Princeton Plasma Physics Laboratory
> > > > 100 Stellarator Rd, Princeton NJ 08540, USA
> > >
> > > --
> > > Chang Liu
> > > Staff Research Physicist
> > > +1 609 243 3438
> > > c...@pppl.gov
> > > Princeton Plasma Physics Laboratory
> > > 100 Stellarator Rd, Princeton NJ 08540, USA
> >
> > --
> > Chang Liu
> > Staff Research Physicist
> > +1 609 243 3438
> > c...@pppl.gov
> > Princeton Plasma Physics Laboratory
> > 100 Stellarator Rd, Princeton NJ 08540, USA
>
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> c...@pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA
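
P.S. To make the "gather everything to a sequential solver" idea concrete,
here is a rough, untested sketch of what it could look like with existing
PETSc calls, in the spirit of what -mat_mumps_use_omp_threads does for MUMPS.
It uses MatCreateRedundantMatrix() to give each rank its own sequential copy
of the parallel matrix; the function name, the one-copy-per-rank choice, and
the sequential rhs/solution vectors are illustrative assumptions on my part,
not an existing PETSc feature. Keep in mind Junchao's point above: the LU
factorization itself still runs on the CPU, and only the triangular solves go
through cusparse. This also assumes PETSc was configured with CUDA.

#include <petscksp.h>

/* Sketch: gather a parallel MPIAIJ matrix into sequential copies and solve
 * each copy with a one-process LU that uses the cusparse solver type. */
PetscErrorCode SolveWithSequentialCusparse(Mat A, Vec bseq, Vec xseq)
{
  Mat            Aseq;
  KSP            ksp;
  PC             pc;
  PetscMPIInt    size;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_size(PetscObjectComm((PetscObject)A),&size);CHKERRQ(ierr);
  /* One subcommunicator per rank: every rank ends up with a full sequential
   * copy of A. A single subcommunicator would instead gather onto one group
   * of ranks, which is closer to what Chang describes. */
  ierr = MatCreateRedundantMatrix(A,size,MPI_COMM_NULL,MAT_INITIAL_MATRIX,&Aseq);CHKERRQ(ierr);
  /* Convert the copy so the triangular solves can run through cusparse. */
  ierr = MatConvert(Aseq,MATAIJCUSPARSE,MAT_INPLACE_MATRIX,&Aseq);CHKERRQ(ierr);

  ierr = KSPCreate(PetscObjectComm((PetscObject)Aseq),&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,Aseq,Aseq);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  /* The factorization happens on the CPU; the solve phase uses cusparse. */
  ierr = PCFactorSetMatSolverType(pc,MATSOLVERCUSPARSE);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  /* bseq/xseq are assumed to be sequential copies of the global rhs and
   * solution, e.g. obtained with VecScatterCreateToAll(). */
  ierr = KSPSolve(ksp,bseq,xseq);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&Aseq);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Gathering per compute node instead of per rank (as the MUMPS OpenMP path
does) would mean passing one subcommunicator per node rather than one per
rank, and the rhs would have to be gathered the same way as the matrix.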