Hi Victor, Junchao,

Thank you for providing the script; it is very useful! There are still issues with hypre not binding correctly, and I still see the error occasionally (though much less often). I added some additional environment variables to the script that seem to make the behavior more consistent:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=$MV2_COMM_WORLD_LOCAL_RANK   ## as Victor suggested
export HYPRE_MEMORY_DEVICE=$MV2_COMM_WORLD_LOCAL_RANK

The last environment variable is from hypre's documentation on GPUs. In 30 runs for a small problem size, 4 fail with a hypre-related error. Do you have any other thoughts or suggestions?

Best,
Anna

________________________________
From: Victor Eijkhout <eijkh...@tacc.utexas.edu>
Sent: Thursday, February 1, 2024 11:26 AM
To: Junchao Zhang <junchao.zh...@gmail.com>; Yesypenko, Anna <a...@oden.utexas.edu>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] errors with hypre with MPI and multiple GPUs on a node

Only for mvapich2-gdr:

#!/bin/bash
# Usage: mpirun -n <num_proc> MV2_USE_AFFINITY=0 MV2_ENABLE_AFFINITY=0 ./launch ./bin

export CUDA_VISIBLE_DEVICES=$MV2_COMM_WORLD_LOCAL_RANK

case $MV2_COMM_WORLD_LOCAL_RANK in
    [0]) cpus=0-3 ;;
    [1]) cpus=64-67 ;;
    [2]) cpus=72-75 ;;
esac

numactl --physcpubind=$cpus $@
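
For reference, a combined wrapper that folds the extra exports into the launch script quoted above might look like the sketch below. This is only an illustration, not a tested script: the CPU ranges and the three-ranks-per-node assumption are taken from Victor's example, and HYPRE_MEMORY_DEVICE is set per the note above about hypre's GPU documentation; all of these may need adjusting for the actual node topology.

#!/bin/bash
# Sketch of a combined per-rank launcher (untested).
# Usage: mpirun -n <num_proc> MV2_USE_AFFINITY=0 MV2_ENABLE_AFFINITY=0 ./launch ./bin

# Bind each MPI rank to its own GPU, using PCI bus order for stable numbering.
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=$MV2_COMM_WORLD_LOCAL_RANK
# Variable mentioned above as coming from hypre's GPU documentation.
export HYPRE_MEMORY_DEVICE=$MV2_COMM_WORLD_LOCAL_RANK

# CPU ranges copied from Victor's script; adjust to the node's NUMA layout.
case $MV2_COMM_WORLD_LOCAL_RANK in
    0) cpus=0-3 ;;
    1) cpus=64-67 ;;
    2) cpus=72-75 ;;
    *) echo "unexpected local rank $MV2_COMM_WORLD_LOCAL_RANK" >&2; exit 1 ;;
esac

# Pin the rank to its CPU range and run the actual application.
numactl --physcpubind=$cpus "$@"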