> On Aug 24, 2023, at 2:00 PM, Vanella, Marcos (Fed) wrote:
>
> Thank you Barry, I will dial back the MPI_F08 use in our source code and try
> compiling it. I haven't found much information regarding using MPI and
> MPI_F08 in different modules other than the following link from several [...]
PETSc uses the non-MPI_F08 Fortran modules, so I am guessing that when you also
use the MPI_F08 modules the compiler sees two sets of interfaces for the same
functions, hence the error. I am not sure it is portable to use PETSc with the
F08 Fortran modules in the same program or routine.
Thank you Matt and Junchao. I've been testing further with nvhpc on Summit; you
might have an idea of what is going on here.
These are my modules:

Currently Loaded Modules:
  1) lsf-tools/2.0   3) darshan-runtime/3.4.0-lite   5) DefApps
  7) spectrum-mpi/10.4.0.3-20210112   9) [...]
Marcos,
Yes, refer to the example script Matt mentioned for Summit. Feel free to
turn options on and off in the file. In my experience, gcc is easier to use.
Also, I found
https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus,
which might be similar to your machine (4 GPUs per node).
On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
> Hi Junchao, neither slurm's scontrol show job_id -dd nor looking at
> CUDA_VISIBLE_DEVICES provides information about which MPI process is
> associated with which GPU in the node on our system. I can see this with
> nvidia-smi, but if you have any other suggestion using slurm I would like
> to [...]
That is a good question. Looking at
https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you
can share the output of your job so we can search CUDA_VISIBLE_DEVICES and
see how GPUs were allocated.
--Junchao Zhang
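A minimal sketch of such a diagnostic, assuming mpicc plus the CUDA runtime
are available (illustrative only, not from the thread): each MPI rank prints
its host name, PID, and the CUDA device it is currently bound to, so the
rank-to-GPU mapping can be read from the job output and matched against
nvidia-smi.

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
      int  rank, dev = 0, ndev = 0, len;
      char host[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(host, &len);
      cudaGetDeviceCount(&ndev);
      cudaGetDevice(&dev); /* device this rank is bound to (0 until cudaSetDevice is called) */
      printf("host %s rank %d pid %d -> device %d of %d\n",
             host, rank, (int)getpid(), dev, ndev);
      MPI_Finalize();
      return 0;
    }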
On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
> Ok, thanks Junchao. So is GPU 0 actually allocating memory for the meshes
> of all 8 MPI processes, but only working on 2 of them? It says in the
> script it has allocated 2.4GB.
> Best,
> Marcos
From: Junchao Zhang
Sent: Monday, August 21, 2023 3:29 PM
To: Vanella, Marcos (Fed)
Hi, Marcos,
If you look at the PIDs of the nvidia-smi output, you will only find 8
unique PIDs, which is expected since you allocated 8 MPI ranks per node.
The duplicate PIDs are usually for threads spawned by the MPI runtime
(for example, progress threads in the MPI implementation). So your job [...]
Hi Junchao, something I'm noticing when running with CUDA-enabled linear
solvers (CG+HYPRE, CG+GAMG) is that for multi-CPU/multi-GPU calculations,
GPU 0 in the node is taking what seems to be all the sub-matrices
corresponding to all the MPI processes in the node. This is the result of the [...]
I don't see a problem in the matrix assembly.
If you point me to your repo and show me how to build it, I can try to
reproduce.
--Junchao Zhang
On Mon, Aug 14, 2023 at 2:53 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
> Hi Junchao, I've tried for my case using the -ksp_type gmres [...]
Yeah, it looks like ex60 was run correctly.
Double check your code again and if you still run into errors, we can try
to reproduce on our end.
Thanks.
--Junchao Zhang
On Mon, Aug 14, 2023 at 1:05 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
> Hi Junchao, I compiled and ran ex60 [...]
Before digging into the details, could you try to run
src/ksp/ksp/tests/ex60.c to make sure the environment is OK? The comment at
the end shows how to run it:

    test:
      requires: cuda
      suffix: 1_cuda
      nsize: 4
      args: -ksp_view -mat_type aijcusparse
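For reference, that harness stanza corresponds, once ex60 is built in
src/ksp/ksp/tests, to roughly:

    mpiexec -n 4 ./ex60 -ksp_view -mat_type aijcusparse

(nsize is the number of MPI ranks, and args are appended to the command line).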
Marcos,
We do not have good petsc/gpu documentation, but see
https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search for "requires:
cuda" in the petsc tests and you will find examples using GPUs.
For the Fortran compile errors, attach your configure.log and Satish
(Cc'ed) or others should know.
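For example, from the top of a PETSc source tree (assuming the usual layout):

    grep -rl "requires: cuda" src/

lists the test sources that declare a CUDA requirement.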
Hi, Marcos,
I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack.
We recently refactored the COO code and got rid of that function, so could
you try petsc/main?
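For context, the current COO assembly path looks roughly like the sketch
below: a tiny made-up 2x2 matrix assembled with
MatSetPreallocationCOO/MatSetValuesCOO. The sizes and values are invented
for illustration, MATAIJCUSPARSE assumes a --with-cuda build, and the
example is meant for a single rank:

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      PetscInt    coo_i[] = {0, 0, 1};        /* row index of each nonzero */
      PetscInt    coo_j[] = {0, 1, 1};        /* column index of each nonzero */
      PetscScalar coo_v[] = {2.0, -1.0, 2.0}; /* values, same order as (i,j) */

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 2, 2));
      PetscCall(MatSetType(A, MATAIJCUSPARSE)); /* requires a CUDA build */
      PetscCall(MatSetPreallocationCOO(A, 3, coo_i, coo_j));
      PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));
      PetscCall(MatViewFromOptions(A, NULL, "-mat_view"));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }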
We map MPI processes to GPUs in a round-robin fashion: we query the number
of visible CUDA devices (g) and assign device rank % g to the MPI process
with that rank.
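A minimal sketch of that round-robin scheme (an illustration of the idea,
not PETSc's actual device-selection code; it assumes the assignment is made
by node-local rank, obtained here with MPI_Comm_split_type):

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      int      g = 0, local_rank, dev;
      MPI_Comm node;

      MPI_Init(&argc, &argv);
      /* communicator of the ranks sharing this node */
      MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                          MPI_INFO_NULL, &node);
      MPI_Comm_rank(node, &local_rank);
      cudaGetDeviceCount(&g);       /* number of visible CUDA devices */
      if (g > 0) {
        dev = local_rank % g;       /* round-robin assignment */
        cudaSetDevice(dev);
        printf("local rank %d -> device %d of %d\n", local_rank, dev, g);
      }
      MPI_Comm_free(&node);
      MPI_Finalize();
      return 0;
    }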
Hi Junchao, thank you for replying. I compiled petsc in debug mode and this
is what I get for the case:

    terminate called after throwing an instance of 'thrust::system::system_error'
      what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an
      illegal memory access was encountered
Hi, Marcos,
Could you build petsc in debug mode and then copy and paste the whole
error stack message?
Thanks
--Junchao Zhang
On Thu, Aug 10, 2023 at 5:51 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
> Hi, I'm trying to run a parallel matrix-vector build and linear solution
> with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix
> build and solution are successful on CPUs only. I'm using CUDA 11.5,
> CUDA-enabled OpenMPI, and gcc 9.3. When I run the job with the GPU
> enabled I get the [...]