On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <c...@pppl.gov> wrote:

> Thank you Junchao for explaining this. I guess in my case the code is
> just calling a seq solver like superlu to do factorization on GPUs.
>
> My idea is that I want to have a traditional MPI code to utilize GPUs
> with cusparse. Right now cusparse does not support mpiaij matrix,
Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse matrix
with > 1 processes. (-mat_type mpiaijcusparse might also work with > 1 proc.)

However, I see in grepping the repo that all the mumps and superlu tests use
the aij or sell matrix type. MUMPS and SuperLU provide their own solves, I
assume .... but you might want to do other matrix operations on the GPU. Is
that the issue? Did you try -mat_type aijcusparse with MUMPS and/or SuperLU
and have a problem? (There is no test with it, so it probably does not work.)

Thanks,
Mark

> so I
> want the code to have a mpiaij matrix when adding all the matrix terms,
> and then transform the matrix to seqaij when doing the factorization and
> solve. This involves sending the data to the master process, and I think
> the petsc mumps solver has something similar already.
>
> Chang
>
> On 10/13/21 10:18 AM, Junchao Zhang wrote:
> >
> > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfad...@lbl.gov> wrote:
> >
> > On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <c...@pppl.gov> wrote:
> >
> > Hi Mark,
> >
> > The option I use is like
> >
> > -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
> > aijcusparse *-sub_pc_factor_mat_solver_type cusparse* *-sub_ksp_type
> > preonly* *-sub_pc_type lu* -ksp_max_it 2000 -ksp_rtol 1.e-300
> > -ksp_atol 1.e-300
> >
> > Note, if you use -log_view the last column (rows are the method, like
> > MatFactorNumeric) has the percent of work on the GPU.
> >
> > Junchao: *This* implies that we have a cuSparse LU factorization. Is
> > that correct? (I don't think we do)
> >
> > No, we don't have cuSparse LU factorization. If you check
> > MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
> > MatLUFactorSymbolic_SeqAIJ() instead.
> > So I don't understand Chang's idea. Do you want to make bigger blocks?
> >
> > I think this one does both factorization and solve on the GPU.
> >
> > You can check the runex72_aijcusparse.sh file in the petsc install
> > directory and try it yourself (this is only lu factorization without
> > an iterative solve).
> >
> > Chang
> >
> > On 10/12/21 1:17 PM, Mark Adams wrote:
> > > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:
> > >
> > > Hi Junchao,
> > >
> > > No, I only need it to be transferred within a node. I use the
> > > block-Jacobi method and GMRES to solve the sparse matrix, so each
> > > direct solver will take care of a sub-block of the whole matrix. In
> > > this way, I can use one GPU to solve one sub-block, which is stored
> > > within one node.
> > >
> > > It was stated in the documentation that the cusparse solver is slow.
> > > However, in my test using ex72.c, the cusparse solver is faster than
> > > mumps or superlu_dist on CPUs.
> > >
> > > Are we talking about the factorization, the solve, or both?
> > >
> > > We do not have an interface to cuSparse's LU factorization (I just
> > > learned that it exists a few weeks ago).
> > > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
> > > aijcusparse'? This would be the CPU factorization, which is the
> > > dominant cost.
> > >
> > > Chang
> > >
> > > On 10/12/21 10:24 AM, Junchao Zhang wrote:
> > > > Hi, Chang,
> > > > For the mumps solver, we usually transfer matrix and vector data
> > > > within a compute node.
> > > > For the idea you propose, it looks like we need to gather data
> > > > within MPI_COMM_WORLD, right?
> > > >
> > > > Mark, I remember you said the cusparse solve is slow and you would
> > > > rather do it on the CPU. Is that right?
> > > >
> > > > --Junchao Zhang
> > > >
> > > > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> > > > <petsc-users@mcs.anl.gov> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Currently, it is possible to use the mumps solver in PETSc with the
> > > > -mat_mumps_use_omp_threads option, so that multiple MPI processes
> > > > will transfer the matrix and rhs data to the master rank, and then
> > > > the master rank will call mumps with OpenMP to solve the matrix.
> > > >
> > > > I wonder if someone can develop a similar option for the cusparse
> > > > solver. Right now, this solver does not work with mpiaijcusparse. I
> > > > think a possible workaround is to transfer all the matrix data to
> > > > one MPI process, and then upload the data to the GPU to solve. In
> > > > this way, one can use the cusparse solver for an MPI program.
> > > >
> > > > Chang
> > > > --
> > > > Chang Liu
> > > > Staff Research Physicist
> > > > +1 609 243 3438
> > > > c...@pppl.gov
> > > > Princeton Plasma Physics Laboratory
> > > > 100 Stellarator Rd, Princeton NJ 08540, USA
> > >
> > > --
> > > Chang Liu
> > > Staff Research Physicist
> > > +1 609 243 3438
> > > c...@pppl.gov
> > > Princeton Plasma Physics Laboratory
> > > 100 Stellarator Rd, Princeton NJ 08540, USA
> >
> > --
> > Chang Liu
> > Staff Research Physicist
> > +1 609 243 3438
> > c...@pppl.gov
> > Princeton Plasma Physics Laboratory
> > 100 Stellarator Rd, Princeton NJ 08540, USA
>
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> c...@pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA
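
P.S. To make the "gather everything to a sequential solver" idea concrete,
here is a rough, untested sketch of what it could look like with existing
PETSc calls, in the spirit of what -mat_mumps_use_omp_threads does for MUMPS.
It uses MatCreateRedundantMatrix() to give each rank its own sequential copy
of the parallel matrix; the function name, the one-copy-per-rank choice, and
the sequential rhs/solution vectors are illustrative assumptions on my part,
not an existing PETSc feature. Keep in mind Junchao's point above: the LU
factorization itself still runs on the CPU, and only the triangular solves go
through cusparse. This also assumes PETSc was configured with CUDA.

#include <petscksp.h>

/* Sketch: gather a parallel MPIAIJ matrix into sequential copies and solve
 * each copy with a one-process LU that uses the cusparse solver type. */
PetscErrorCode SolveWithSequentialCusparse(Mat A, Vec bseq, Vec xseq)
{
  Mat            Aseq;
  KSP            ksp;
  PC             pc;
  PetscMPIInt    size;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_size(PetscObjectComm((PetscObject)A),&size);CHKERRQ(ierr);
  /* One subcommunicator per rank: every rank ends up with a full sequential
   * copy of A. A single subcommunicator would instead gather onto one group
   * of ranks, which is closer to what Chang describes. */
  ierr = MatCreateRedundantMatrix(A,size,MPI_COMM_NULL,MAT_INITIAL_MATRIX,&Aseq);CHKERRQ(ierr);
  /* Convert the copy so the triangular solves can run through cusparse. */
  ierr = MatConvert(Aseq,MATAIJCUSPARSE,MAT_INPLACE_MATRIX,&Aseq);CHKERRQ(ierr);

  ierr = KSPCreate(PetscObjectComm((PetscObject)Aseq),&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,Aseq,Aseq);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  /* The factorization happens on the CPU; the solve phase uses cusparse. */
  ierr = PCFactorSetMatSolverType(pc,MATSOLVERCUSPARSE);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  /* bseq/xseq are assumed to be sequential copies of the global rhs and
   * solution, e.g. obtained with VecScatterCreateToAll(). */
  ierr = KSPSolve(ksp,bseq,xseq);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&Aseq);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Gathering per compute node instead of per rank (as the MUMPS OpenMP path
does) would mean passing one subcommunicator per node rather than one per
rank, and the rhs would have to be gathered the same way as the matrix.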