Hi Jed,

Thanks for your reply. I have sent the log files to [email protected].
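For anyone following along, the runs behind those logs were roughly of the following form; the executable name, rank count, and backend choices here are placeholders rather than our exact command lines:

    mpirun -n 4 ./my_app -ksp_type cg -pc_type gamg \
        -mat_type aijcusparse -vec_type cuda \
        -log_view :gamg_gpu_4ranks.log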
Zisheng

________________________________
From: Jed Brown <[email protected]>
Sent: Tuesday, June 27, 2023 1:02 PM
To: Zisheng Ye <[email protected]>; [email protected] <[email protected]>
Subject: Re: [petsc-users] GAMG and Hypre preconditioner

[External Sender]

Zisheng Ye via petsc-users <[email protected]> writes:

> Dear PETSc Team
>
> We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG
> and Hypre preconditioners. We have encountered several issues that we would
> like to ask for your suggestions on.
>
> First, we have a couple of questions when working with a single MPI rank:
>
> 1. We have tested two backends, CUDA and Kokkos. One commonly encountered
> error is related to SpGEMM in CUDA when the matrix is large, as shown below:
>
> cudaMalloc((void **)&buffer2, bufferSize2) error( cudaErrorMemoryAllocation): out of memory
>
> For the CUDA backend, one can use "-matmatmult_backend_cpu -matptap_backend_cpu"
> to avoid these problems. However, there seem to be no equivalent options for the
> Kokkos backend. Is there a good practice for avoiding this error with both
> backends, and can it be avoided with the Kokkos backend?

Junchao will know more about KK tuning, but the faster GPU matrix-matrix
algorithms use extra memory. We should be able to make the host option
available with Kokkos.

> 2. We have tested the combination of Hypre and the Kokkos backend. It looks
> like this combination is not compatible, as we observed that KSPSolve takes a
> greater number of iterations to exit, and the residual norm in the
> post-checking is much larger than the one obtained with the CUDA backend.
> This happens for matrices with block size larger than 1. Is there any
> explanation for this error?
>
> Second, we have a couple more questions when working with multiple MPI ranks:
>
> 1. We are currently using OpenMPI, as we couldn't get Intel MPI to work as a
> GPU-aware MPI. Is this a known issue with Intel MPI?

As far as I know, Intel's MPI is only for SYCL/Intel GPUs. In general,
GPU-aware MPI has been incredibly flaky on all HPC systems despite being
introduced ten years ago.

> 2. With OpenMPI we currently see a slowdown when increasing the MPI rank
> count, as shown in the figure below. Is this normal?

Could you share -log_view output from a couple of representative runs? You
could send those here or to [email protected]. We need to see what kind of
work is not scaling in order to identify what may be causing it.
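[For readers of the archive: a minimal, self-contained sketch of the kind of KSPSolve + GAMG setup discussed above, with the host-fallback options quoted in the thread set programmatically. The toy 1-D Laplacian, its size, and the option choices are illustrative assumptions, not the actual application; -pc_type hypre or the Kokkos matrix/vector types can be substituted on the command line.]

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat      A;
      Vec      x, b;
      KSP      ksp;
      PC       pc;
      PetscInt i, rstart, rend, n = 100;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      /* Host fallback for the SpGEMM products in the GAMG setup; these are the
         CUDA-backend options quoted earlier in the thread. */
      PetscCall(PetscOptionsSetValue(NULL, "-matmatmult_backend_cpu", NULL));
      PetscCall(PetscOptionsSetValue(NULL, "-matptap_backend_cpu", NULL));

      /* Toy 1-D Laplacian standing in for the application matrix; pass
         -mat_type aijcusparse -vec_type cuda (or aijkokkos/kokkos) to run on the GPU. */
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
      PetscCall(MatSetFromOptions(A));
      PetscCall(MatSetUp(A));
      PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
      for (i = rstart; i < rend; i++) {
        if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
        if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
        PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
      }
      PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

      PetscCall(MatCreateVecs(A, &x, &b));
      PetscCall(VecSet(b, 1.0));

      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPGetPC(ksp, &pc));
      PetscCall(PCSetType(pc, PCGAMG));   /* or PCHYPRE to reproduce the Hypre runs */
      PetscCall(KSPSetFromOptions(ksp));  /* picks up -ksp_monitor, -pc_gamg_* options, etc. */
      PetscCall(KSPSolve(ksp, b, x));

      PetscCall(KSPDestroy(&ksp));
      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&b));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }

Compiled against PETSc, this can be run with the same command-line options shown earlier in the thread (-log_view is handled by PetscInitialize/PetscFinalize).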
