Hi Mark,

The options I use are something like:

-pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300

I think this one does both the factorization and the solve on the GPU.
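If it helps, here is a minimal driver sketch (untested) showing where those options get picked up; the matrix assembly is just a placeholder (an identity matrix) so the sketch has something to solve:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscInt       i, Istart, Iend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* -mat_type aijcusparse is applied by MatSetFromOptions() */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);

  /* placeholder matrix: the identity, so the example is solvable */
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValue(A, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* -ksp_type, -pc_type, and the -sub_* options are applied here */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}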

You can check the runex72_aijcusparse.sh file in the PETSc install directory and try it yourself (that test does only the LU factorization, without an iterative solve).
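For reference, a run along the lines of that test might look like this (the -f0 matrix-file option is my recollection of the ex10-style drivers, so please check the script for the exact arguments):

  mpiexec -n 1 ./ex72 -f0 <binary matrix file> -ksp_type preonly \
    -pc_type lu -mat_type seqaijcusparse -pc_factor_mat_solver_type cusparse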

Chang

On 10/12/21 1:17 PM, Mark Adams wrote:


On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:

    Hi Junchao,

    No, I only need the data to be transferred within a node. I use the
    block-Jacobi method with GMRES to solve the sparse matrix, so each
    direct solver takes care of one sub-block of the whole matrix. In this
    way, I can use one GPU to solve one sub-block, which is stored within
    one node.
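    For example (just a sketch; the executable name and rank count are
    placeholders), launching 16 ranks with 16 blocks gives each block its
    own sequential sub-solve, which can then run on one GPU:

      mpiexec -n 16 ./mysolver -pc_type bjacobi -pc_bjacobi_blocks 16 \
        -ksp_type fgmres -mat_type aijcusparse -sub_ksp_type preonly \
        -sub_pc_type lu -sub_pc_factor_mat_solver_type cusparse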

    The documentation states that the cusparse solver is slow. However, in
    my test using ex72.c, the cusparse solver is faster than mumps or
    superlu_dist on CPUs.


Are we talking about the factorization, the solve, or both?

We do not have an interface to cuSPARSE's LU factorization (I just learned that it exists a few weeks ago). Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type aijcusparse'? That would do the factorization on the CPU, and the factorization is the dominant cost.


    Chang

    On 10/12/21 10:24 AM, Junchao Zhang wrote:
     > Hi, Chang,
     >     For the mumps solver, we usually transfer the matrix and vector
     > data within a compute node. For the idea you propose, it looks like
     > we need to gather data within MPI_COMM_WORLD, right?
     >
     >     Mark, I remember you said the cusparse solve is slow and you
     > would rather do it on the CPU. Is that right?
     >
     > --Junchao Zhang
     >
     >
     > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
     > <petsc-users@mcs.anl.gov> wrote:
     >
     >     Hi,
     >
     >     Currently, it is possible to use the mumps solver in PETSc with
     >     the -mat_mumps_use_omp_threads option, so that multiple MPI
     >     processes transfer the matrix and rhs data to the master rank,
     >     and then the master rank calls mumps with OpenMP to solve the
     >     matrix.
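     >     For example, something along these lines (just a sketch; the
     >     rank and thread counts are placeholders):
     >
     >       mpiexec -n 16 ./app -ksp_type preonly -pc_type lu \
     >         -pc_factor_mat_solver_type mumps -mat_mumps_use_omp_threads 4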
     >
     >     I wonder if someone can develop a similar option for the cusparse
     >     solver. Right now, this solver does not work with mpiaijcusparse.
     >     I think a possible workaround is to transfer all the matrix data
     >     to one MPI process, and then upload the data to the GPU to solve.
     >     In this way, one can use the cusparse solver in an MPI program.
     >
     >     Chang
     >     --
     >     Chang Liu
     >     Staff Research Physicist
     >     +1 609 243 3438
     >     c...@pppl.gov
     >     Princeton Plasma Physics Laboratory
     >     100 Stellarator Rd, Princeton NJ 08540, USA
     >

    --
    Chang Liu
    Staff Research Physicist
    +1 609 243 3438
    c...@pppl.gov
    Princeton Plasma Physics Laboratory
    100 Stellarator Rd, Princeton NJ 08540, USA


--
Chang Liu
Staff Research Physicist
+1 609 243 3438
c...@pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA
