Thank you, Junchao, for explaining this. I guess in my case the code is just calling a sequential solver like SuperLU to do the factorization on GPUs.

My idea is that I want a traditional MPI code to utilize GPUs with cusparse. Right now cusparse does not support the mpiaij matrix type, so I want the code to assemble an mpiaij matrix when adding all the matrix terms, and then convert the matrix to seqaij for the factorization and solve. This involves sending the data to the master process, and I think the PETSc mumps solver already has something similar.
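
Roughly, I am imagining something along the lines of the sketch below. This is only an illustration of the idea using existing PETSc calls, not a worked-out implementation: it gives every rank its own full sequential copy (similar in spirit to what -pc_type redundant does) instead of sending the data only to the master process, and the helper name is made up for the example.

/* Hedged sketch (made-up helper, not an existing PETSc option): give each
   rank a sequential copy of a parallel MPIAIJ matrix and right-hand side,
   and convert that copy to SEQAIJCUSPARSE so a one-GPU factorization and
   solve can be tried on it.  Error handling is abbreviated. */
#include <petscksp.h>

PetscErrorCode GatherForSeqGPUSolve(Mat A, Vec b, Mat *Aseq, Vec *bseq)
{
  PetscErrorCode ierr;
  PetscMPIInt    size;
  VecScatter     scat;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_size(PetscObjectComm((PetscObject)A),&size);CHKERRQ(ierr);
  /* one sequential copy of A per rank: 'size' subcommunicators of one rank each */
  ierr = MatCreateRedundantMatrix(A,size,MPI_COMM_NULL,MAT_INITIAL_MATRIX,Aseq);CHKERRQ(ierr);
  /* switch the sequential copy to the cuSPARSE format so the solve can run on the GPU */
  ierr = MatConvert(*Aseq,MATSEQAIJCUSPARSE,MAT_INPLACE_MATRIX,Aseq);CHKERRQ(ierr);
  /* gather the right-hand side onto every rank the same way */
  ierr = VecScatterCreateToAll(b,&scat,bseq);CHKERRQ(ierr);
  ierr = VecScatterBegin(scat,b,*bseq,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(scat,b,*bseq,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The real version would also need to scatter the solution back and would only do this on one rank (or one rank per node), but hopefully this shows the kind of gather I have in mind.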

Chang

On 10/13/21 10:18 AM, Junchao Zhang wrote:



On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfad...@lbl.gov> wrote:



    On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <c...@pppl.gov> wrote:

        Hi Mark,

        The options I use are like this:

        -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
        aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type
        preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300
        -ksp_atol 1.e-300


    Note: if you use -log_view, the last column (the rows are methods such
    as MatFactorNumeric) shows the percentage of work done on the GPU.

    Junchao: *This* implies that we have a cuSparse LU factorization. Is
    that correct? (I don't think we do)

No, we don't have a cuSparse LU factorization.  If you check MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls MatLUFactorSymbolic_SeqAIJ() instead.
So I don't understand Chang's idea. Do you want to make bigger blocks?
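
(For anyone who wants to check what a given run actually used: a rough sketch like the one below, called after the first KSPSolve() with the block-Jacobi options above, prints the solver package the factored sub-block is registered with. The -log_view GPU column Mark mentioned is still the way to see where the flops actually ran. The helper name is made up, and error handling is abbreviated.)

#include <petscksp.h>

/* Hedged sketch: after the first KSPSolve() with -pc_type bjacobi
   -sub_pc_type lu, report which solver package the factored matrix of the
   first local sub-block is registered with. */
static PetscErrorCode ReportSubBlockFactorType(KSP ksp)
{
  PetscErrorCode ierr;
  PC             pc, subpc;
  KSP            *subksp;
  Mat            F;
  MatSolverType  stype;
  PetscInt       nlocal;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCBJacobiGetSubKSP(pc,&nlocal,NULL,&subksp);CHKERRQ(ierr);  /* valid after setup */
  ierr = KSPGetPC(subksp[0],&subpc);CHKERRQ(ierr);
  ierr = PCFactorGetMatrix(subpc,&F);CHKERRQ(ierr);                  /* the factored matrix */
  ierr = MatFactorGetSolverType(F,&stype);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF,"sub-block factor solver: %s\n",stype);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}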


        I think this one does both the factorization and the solve on the GPU.

        You can check the runex72_aijcusparse.sh file in the petsc install
        directory and try it yourself (this is only the LU factorization,
        without the iterative solve).

        Chang

        On 10/12/21 1:17 PM, Mark Adams wrote:
         >
         >
         > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:
         >
         >     Hi Junchao,
         >
         >     No, I only need it to be transferred within a node. I use the
         >     block-Jacobi method and GMRES to solve the sparse matrix, so
         >     each direct solver takes care of a sub-block of the whole
         >     matrix. In this way, I can use one GPU to solve one sub-block,
         >     which is stored within one node.
         >
         >     It was stated in the documentation that the cusparse solver is
         >     slow. However, in my test using ex72.c, the cusparse solver is
         >     faster than mumps or superlu_dist on CPUs.
         >
         >
         > Are we talking about the factorization, the solve, or both?
         >
         > We do not have an interface to cuSparse's LU factorization (I just
         > learned that it exists a few weeks ago).
         > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
         > aijcusparse'? This would be the CPU factorization, which is the
         > dominant cost.
         >
         >
         >     Chang
         >
         >     On 10/12/21 10:24 AM, Junchao Zhang wrote:
         >      > Hi, Chang,
         >      >     For the mumps solver, we usually transfer the matrix
         >      >     and vector data within a compute node.  For the idea
         >      >     you propose, it looks like we need to gather data
         >      >     within MPI_COMM_WORLD, right?
         >      >
         >      >     Mark, I remember you said the cusparse solve is slow
         >      >     and you would rather do it on the CPU. Is that right?
         >      >
         >      > --Junchao Zhang
         >      >
         >      >
         >      > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
         >      > <petsc-users@mcs.anl.gov> wrote:
         >      >
         >      >     Hi,
         >      >
         >      >     Currently, it is possible to use the mumps solver in
         >      >     PETSc with the -mat_mumps_use_omp_threads option, so
         >      >     that multiple MPI processes transfer the matrix and rhs
         >      >     data to the master rank, and then the master rank calls
         >      >     mumps with OpenMP to solve the matrix.
         >      >
         >      >     I wonder if someone could develop a similar option for
         >      >     the cusparse solver. Right now, this solver does not
         >      >     work with mpiaijcusparse. I think a possible workaround
         >      >     is to transfer all the matrix data to one MPI process,
         >      >     and then upload the data to the GPU to solve. In this
         >      >     way, one can use the cusparse solver for an MPI program.
         >      >
         >      >     Chang
         >      >     --
         >      >     Chang Liu
         >      >     Staff Research Physicist
         >      >     +1 609 243 3438
         >      >     c...@pppl.gov
         >      >     Princeton Plasma Physics Laboratory
         >      >     100 Stellarator Rd, Princeton NJ 08540, USA
         >      >
         >
         >     --
         >     Chang Liu
         >     Staff Research Physicist
         >     +1 609 243 3438
         >     c...@pppl.gov
         >     Princeton Plasma Physics Laboratory
         >     100 Stellarator Rd, Princeton NJ 08540, USA
         >

        --
        Chang Liu
        Staff Research Physicist
        +1 609 243 3438
        c...@pppl.gov
        Princeton Plasma Physics Laboratory
        100 Stellarator Rd, Princeton NJ 08540, USA


--
Chang Liu
Staff Research Physicist
+1 609 243 3438
c...@pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA
