Hi Junchao,

No, I only need it to be transferred within a node. I use a block-Jacobi preconditioner with GMRES to solve the sparse matrix, so each direct solver takes care of a sub-block of the whole matrix. In this way, I can use one GPU to solve one sub-block, which is stored within one node.
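
For reference, the kind of setup I have in mind is roughly the following set of runtime options (just a sketch; the application name is a placeholder and the exact options may need adjusting):

    mpiexec -n 8 ./my_app -ksp_type gmres -pc_type bjacobi \
      -sub_pc_type lu -sub_pc_factor_mat_solver_type cusparse \
      -mat_type aijcusparse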

The documentation states that the cusparse solver is slow. However, in my test using ex72.c, the cusparse solver is faster than mumps or superlu_dist on CPUs.

Chang

On 10/12/21 10:24 AM, Junchao Zhang wrote:
Hi, Chang,
   For the mumps solver, we usually transfer matrix and vector data within a compute node. For the idea you propose, it looks like we need to gather data within MPI_COMM_WORLD, right?

   Mark, I remember you said the cusparse solve is slow and you would rather do it on the CPU. Is that right?

--Junchao Zhang


On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:

    Hi,

    Currently, it is possible to use the mumps solver in PETSc with the
    -mat_mumps_use_omp_threads option, so that multiple MPI processes
    transfer the matrix and RHS data to the master rank, and then the
    master rank calls mumps with OpenMP to solve the matrix.
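
    For example, a run might use options along these lines (a rough sketch;
    the thread count of 4 is arbitrary):

        -pc_type lu -pc_factor_mat_solver_type mumps \
        -mat_mumps_use_omp_threads 4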

    I wonder if someone could develop a similar option for the cusparse
    solver. Right now, this solver does not work with mpiaijcusparse. I
    think a possible workaround is to transfer all the matrix data to one
    MPI process and then upload the data to the GPU to solve. In this way,
    one could use the cusparse solver in an MPI program.
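
    As far as I understand, the cusparse factorization currently only works
    for sequential matrices, i.e. on a single rank with something like
    (a sketch):

        -mat_type seqaijcusparse -pc_type lu \
        -pc_factor_mat_solver_type cusparse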

    Chang
    --
    Chang Liu
    Staff Research Physicist
    +1 609 243 3438
    c...@pppl.gov
    Princeton Plasma Physics Laboratory
    100 Stellarator Rd, Princeton NJ 08540, USA


--
Chang Liu
Staff Research Physicist
+1 609 243 3438
c...@pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA
