PETSc uses nonblocking point-to-point communication by default since that tends
to perform better and is less prone to MPI implementation bugs, but you can pass
`-sf_type window` to try MPI-3 one-sided (RMA) communication, or select other
strategies depending on the sort of problem you're working with.
#define PETSCSFBASIC      "basic"
#define PETSCSFNEIGHBOR   "neighbor"
#define PETSCSFALLGATHERV "allgatherv"
#define PETSCSFALLGATHER  "allgather"
#define PETSCSFGATHERV    "gatherv"
#define PETSCSFGATHER     "gather"
#define PETSCSFALLTOALL   "alltoall"
#define PETSCSFWINDOW     "window"
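For reference, the communication strategy can be chosen at run time with the
`-sf_type` option (the option is real; the program name `./ex1` below is just a
placeholder for your own PETSc executable):

```shell
# Default: nonblocking point-to-point ("basic")
mpiexec -n 4 ./ex1

# Try MPI-3 RMA (one-sided) communication instead
mpiexec -n 4 ./ex1 -sf_type window
```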
PETSc does try to use GPU-aware MPI, though implementation bugs are present on
many machines and it often requires careful environment configuration.
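As an illustration of that configuration, here is a hedged sketch; the
environment variable shown is specific to Cray MPICH and will differ on other
machines, and `./ex1` is again a placeholder program name:

```shell
# On Cray systems, GPU-aware MPI typically must be enabled explicitly
export MPICH_GPU_SUPPORT_ENABLED=1

# If GPU-aware MPI misbehaves, PETSc can be told to fall back to
# staging through host memory
mpiexec -n 2 ./ex1 -use_gpu_aware_mpi 0
```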
"Maeder Alexander" writes:
> I am a new user of PETSc
>
> and want to know more about the underlying implementation for matrix-vector
> multiplication (Ax=y).
>
> PETSc utilizes a 1D distribution and communicates only parts of the vector x
> utilized depending on the sparsity pattern of A.
>
> Is the communication of x done with MPI-3 RMA and utilizes cuda-aware mpi for
> RMA?
>
>
> Best regards,
>
>
> Alexander Maeder