Thank you, Junchao, for explaining this. I guess in my case the code is just calling a sequential solver like SuperLU to do the factorization on GPUs.

My idea is that I want a traditional MPI code to utilize GPUs with cusparse. Right now cusparse does not support the mpiaij matrix type, so I want the code to assemble an mpiaij matrix when adding all the matrix terms, and then convert the matrix to seqaij for the factorization and solve. This involves sending the data to the master process, and I think the PETSc mumps solver already has something similar.
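
Roughly, I am imagining something along the lines of the sketch below. This is only an illustration of the idea using existing PETSc calls, not a worked-out implementation: it gives every rank its own full sequential copy (similar in spirit to what -pc_type redundant does) instead of sending the data only to the master process, and the helper name is made up for the example.

/* Hedged sketch (made-up helper, not an existing PETSc option): give each
   rank a sequential copy of a parallel MPIAIJ matrix and right-hand side,
   and convert that copy to SEQAIJCUSPARSE so a one-GPU factorization and
   solve can be tried on it.  Error handling is abbreviated. */
#include <petscksp.h>

PetscErrorCode GatherForSeqGPUSolve(Mat A, Vec b, Mat *Aseq, Vec *bseq)
{
  PetscErrorCode ierr;
  PetscMPIInt    size;
  VecScatter     scat;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_size(PetscObjectComm((PetscObject)A),&size);CHKERRQ(ierr);
  /* one sequential copy of A per rank: 'size' subcommunicators of one rank each */
  ierr = MatCreateRedundantMatrix(A,size,MPI_COMM_NULL,MAT_INITIAL_MATRIX,Aseq);CHKERRQ(ierr);
  /* switch the sequential copy to the cuSPARSE format so the solve can run on the GPU */
  ierr = MatConvert(*Aseq,MATSEQAIJCUSPARSE,MAT_INPLACE_MATRIX,Aseq);CHKERRQ(ierr);
  /* gather the right-hand side onto every rank the same way */
  ierr = VecScatterCreateToAll(b,&scat,bseq);CHKERRQ(ierr);
  ierr = VecScatterBegin(scat,b,*bseq,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scat,b,*bseq,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The real version would also need to scatter the solution back and would only do this on one rank (or one rank per node), but hopefully this shows the kind of gather I have in mind.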

Chang

On 10/13/21 10:18 AM, Junchao Zhang wrote:



On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfad...@lbl.gov> wrote:



    On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <c...@pppl.gov> wrote:

        Hi Mark,

        The options I use are like this:

        -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
        aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type
        preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300
        -ksp_atol 1.e-300


    Note: if you use -log_view, the last column (the rows are methods such
    as MatFactorNumeric) shows the percentage of work done on the GPU.

    Junchao: *This* implies that we have a cuSparse LU factorization. Is
    that correct? (I don't think we do)

No, we don't have a cuSparse LU factorization.  If you check MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls MatLUFactorSymbolic_SeqAIJ() instead.
So I don't understand Chang's idea. Do you want to make bigger blocks?
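
(For anyone who wants to check what a given run actually used: a rough sketch like the one below, called after the first KSPSolve() with the block-Jacobi options above, prints the solver package the factored sub-block is registered with. The -log_view GPU column Mark mentioned is still the way to see where the flops actually ran. The helper name is made up, and error handling is abbreviated.)

#include <petscksp.h>

/* Hedged sketch: after the first KSPSolve() with -pc_type bjacobi
   -sub_pc_type lu, report which solver package the factored matrix of the
   first local sub-block is registered with. */
static PetscErrorCode ReportSubBlockFactorType(KSP ksp)
{
  PetscErrorCode ierr;
  PC             pc, subpc;
  KSP            *subksp;
  Mat            F;
  MatSolverType  stype;
  PetscInt       nlocal;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCBJacobiGetSubKSP(pc,&nlocal,NULL,&subksp);CHKERRQ(ierr);  /* valid after setup */
  ierr = KSPGetPC(subksp[0],&subpc);CHKERRQ(ierr);
  ierr = PCFactorGetMatrix(subpc,&F);CHKERRQ(ierr);                  /* the factored matrix */
  ierr = MatFactorGetSolverType(F,&stype);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF,"sub-block factor solver: %s\n",stype);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}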


        I think this one does both the factorization and the solve on the GPU.

        You can check the runex72_aijcusparse.sh file in the petsc install
        directory and try it yourself (this is only the LU factorization,
        without the iterative solve).

        Chang

        On 10/12/21 1:17 PM, Mark Adams wrote:
         >
         >
         > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:
         >
         >     Hi Junchao,
         >
         >     No, I only need it to be transferred within a node. I use the
         >     block-Jacobi method and GMRES to solve the sparse matrix, so
         >     each direct solver takes care of a sub-block of the whole
         >     matrix. In this way, I can use one GPU to solve one sub-block,
         >     which is stored within one node.
         >
         >     It was stated in the documentation that the cusparse solver is
         >     slow. However, in my test using ex72.c, the cusparse solver is
         >     faster than mumps or superlu_dist on CPUs.
         >
         >
         > Are we talking about the factorization, the solve, or both?
         >
         > We do not have an interface to cuSparse's LU factorization (I just
         > learned that it exists a few weeks ago).
         > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
         > aijcusparse'? This would be the CPU factorization, which is the
         > dominant cost.
         >
         >
         >     Chang
         >
         >     On 10/12/21 10:24 AM, Junchao Zhang wrote:
         >      > Hi, Chang,
         >      >     For the mumps solver, we usually transfer the matrix
         >      >     and vector data within a compute node.  For the idea
         >      >     you propose, it looks like we need to gather data
         >      >     within MPI_COMM_WORLD, right?
         >      >
         >      >     Mark, I remember you said the cusparse solve is slow
         >      >     and you would rather do it on the CPU. Is that right?
         >      >
         >      > --Junchao Zhang
         >      >
         >      >
         >      > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
         >      > <petsc-users@mcs.anl.gov> wrote:
         >      >
         >      >     Hi,
         >      >
         >      >     Currently, it is possible to use the mumps solver in
         >      >     PETSc with the -mat_mumps_use_omp_threads option, so
         >      >     that multiple MPI processes transfer the matrix and rhs
         >      >     data to the master rank, and then the master rank calls
         >      >     mumps with OpenMP to solve the matrix.
         >      >
         >      >     I wonder if someone could develop a similar option for
         >      >     the cusparse solver. Right now, this solver does not
         >      >     work with mpiaijcusparse. I think a possible workaround
         >      >     is to transfer all the matrix data to one MPI process,
         >      >     and then upload the data to the GPU to solve. In this
         >      >     way, one can use the cusparse solver for an MPI program.
         >      >
         >      >     Chang
         >      >     --
         >      >     Chang Liu
         >      >     Staff Research Physicist
         >      >     +1 609 243 3438
         >      >     c...@pppl.gov
         >      >     Princeton Plasma Physics Laboratory
         >      >     100 Stellarator Rd, Princeton NJ 08540, USA
         >      >
         >
         >     --
         >     Chang Liu
         >     Staff Research Physicist
         >     +1 609 243 3438
         >     c...@pppl.gov
         >     Princeton Plasma Physics Laboratory
         >     100 Stellarator Rd, Princeton NJ 08540, USA
         >

        --
        Chang Liu
        Staff Research Physicist
        +1 609 243 3438
        c...@pppl.gov
        Princeton Plasma Physics Laboratory
        100 Stellarator Rd, Princeton NJ 08540, USA


--
Chang Liu
Staff Research Physicist
+1 609 243 3438
c...@pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA
