Re: [petsc-users] GAMG and Hypre preconditioner

2023-06-27 Thread Zisheng Ye via petsc-users
Hi Jed

Thanks for your reply. I have sent the log files to petsc-ma...@mcs.anl.gov.

Zisheng

From: Jed Brown 
Sent: Tuesday, June 27, 2023 1:02 PM
To: Zisheng Ye ; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] GAMG and Hypre preconditioner

Zisheng Ye via petsc-users writes:

> Dear PETSc Team
>
> We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG
> and Hypre preconditioners. We have encountered several issues on which we
> would like to ask for your suggestions.
>
> First, we have a couple of questions when working with a single MPI rank:
>
>   1.  We have tested two backends, CUDA and Kokkos. One commonly encountered
> error is related to SpGEMM in CUDA when the matrix is large, as shown below:
>
> cudaMalloc((void **), bufferSize2) error( cudaErrorMemoryAllocation): 
> out of memory
>
> For the CUDA backend, one can use "-matmatmult_backend_cpu
> -matptap_backend_cpu" to avoid these problems. However, there seem to be no
> equivalent options for the Kokkos backend. Is there a good practice for
> avoiding this error with both backends, and can it be avoided with the
> Kokkos backend?

Junchao will know more about Kokkos Kernels (KK) tuning, but the faster GPU
matrix-matrix algorithms use extra memory. We should be able to make the host
option available with Kokkos.
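
For reference, a minimal command-line sketch of that workaround (the
executable name and solver options here are illustrative, not your exact
setup):

    mpiexec -n 1 ./my_app -ksp_type cg -pc_type gamg \
        -mat_type aijcusparse -vec_type cuda \
        -matmatmult_backend_cpu -matptap_backend_cpu

This keeps the solve on the GPU while the matrix-matrix and PtAP products in
the GAMG setup fall back to the host.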

>   2.  We have tested the combination of Hypre and the Kokkos backend. The
> two do not appear to be compatible: we observed that KSPSolve takes many
> more iterations to exit, and the residual norm in the post-check is much
> larger than the one obtained with the CUDA backend. This happens for
> matrices with block size larger than 1. Is there an explanation for this
> behavior?
>
> Second, we have a couple more questions when working with multiple MPI ranks:
>
>   1.  We are currently using Open MPI because we could not get Intel MPI to
> work as a GPU-aware MPI. Is this a known issue with Intel MPI?

As far as I know, Intel's MPI is GPU-aware only for SYCL/Intel GPUs. In
general, GPU-aware MPI has been incredibly flaky on all HPC systems despite
having been introduced ten years ago.
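
If GPU-aware MPI keeps misbehaving, one workaround sketch (assuming a PETSc
build with GPU support; the executable name and launcher are placeholders) is
to tell PETSc to stage MPI communication through host buffers instead:

    mpiexec -n 4 ./my_app -use_gpu_aware_mpi 0 -ksp_type cg -pc_type gamg

That trades some communication performance for independence from the MPI
library's device-buffer support.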

>   2.  With Open MPI we currently see a slowdown when increasing the MPI rank
> count, as shown in the figure below. Is this normal?

Could you share -log_view output from a couple of representative runs? You
could send those here or to petsc-ma...@mcs.anl.gov. We need to see what kind
of work is not scaling in order to identify what may be causing it.
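
For example (the executable name is a placeholder), each run can write its log
to a separate file:

    mpiexec -n 2 ./my_app -ksp_type cg -pc_type gamg -log_view :log_np2.txt
    mpiexec -n 8 ./my_app -ksp_type cg -pc_type gamg -log_view :log_np8.txt

The per-event timings in those files show which operations stop scaling as the
rank count grows.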


[petsc-users] GAMG and Hypre preconditioner

2023-06-27 Thread Zisheng Ye via petsc-users
Dear PETSc Team

We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG and
Hypre preconditioners. We have encountered several issues on which we would
like to ask for your suggestions.

First, we have a couple of questions when working with a single MPI rank:

  1.  We have tested two backends, CUDA and Kokkos. One commonly encountered
error is related to SpGEMM in CUDA when the matrix is large, as shown below:

cudaMalloc((void **), bufferSize2) error( cudaErrorMemoryAllocation): 
out of memory

For the CUDA backend, one can use "-matmatmult_backend_cpu
-matptap_backend_cpu" to avoid these problems. However, there seem to be no
equivalent options for the Kokkos backend. Is there a good practice for
avoiding this error with both backends, and can it be avoided with the Kokkos
backend?

  2.  We have tested the combination of Hypre and the Kokkos backend. The two
do not appear to be compatible: we observed that KSPSolve takes many more
iterations to exit, and the residual norm in the post-check is much larger
than the one obtained with the CUDA backend. This happens for matrices with
block size larger than 1. Is there an explanation for this behavior? The kind
of configuration we are testing is sketched below.
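
As a rough illustration only (the executable name and option values below are
placeholders, not necessarily our exact setup), the Hypre-plus-Kokkos case
looks something like:

    mpiexec -n 1 ./my_app -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg \
        -mat_type aijkokkos -vec_type kokkos

while the CUDA comparison run uses -mat_type aijcusparse -vec_type cuda with
the same solver options.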

Second, we have a couple more questions when working with multiple MPI ranks:

  1.  We are currently using Open MPI because we could not get Intel MPI to
work as a GPU-aware MPI. Is this a known issue with Intel MPI?
  2.  With Open MPI we currently see a slowdown when increasing the MPI rank
count, as shown in the figure below. Is this normal?

[Figure: slowdown observed as the MPI rank count increases; inline image not
preserved in this archive.]

Zisheng