Hi all,

I am trying to run PyFR on a GPU cluster with Slurm. The program runs fine serially, and my local installation runs well, but parallel runs on the cluster fail with the following error:

    RuntimeError: Mesh has 4 partitions but running with 1 MPI ranks

I assume this is because MPI is not initializing properly. I first tried my own build of MVAPICH2, then, at the suggestion of the cluster admins, the cluster's OpenMPI module (intel-mpi/gcc/2018.1/64). I rebuilt mpi4py when switching from MVAPICH2 to OpenMPI.

My Slurm script is as follows:

    #!/bin/bash
    #SBATCH -J t106_RR
    #SBATCH -t 00:05:00
    #SBATCH -N 1
    #SBATCH --ntasks=4
    #SBATCH --ntasks-per-node=4
    #SBATCH --ntasks-per-socket=2
    #SBATCH --gres=gpu:4

    ulimit -s unlimited
    source /home/tdzanic/.bashrc

    srun pyfr run -b cuda /home/tdzanic/TestCases/Turbine/mesh_t106a.pyfrm /home/tdzanic/TestCases/Turbine/t106_3D_baseline.ini -p

I have also tried launching with mpiexec instead of srun and get the same error.

Two questions:

1. I am using Anaconda. Could that be part of the problem?
2. In moving from MVAPICH2 to OpenMPI, are there any packages besides mpi4py that I need to rebuild?

Thanks,
Tarik
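
P.S. In case it helps with diagnosis, here is a minimal sketch of a check I can run under srun to see whether each task is actually given its own rank. It assumes mpi4py is importable in the same environment as PyFR. The helper names launch_info and mpi_info are my own, and the environment variables are the ones I understand Slurm, Open MPI, PMI-based launchers and MVAPICH2 normally export, so treat the list as an assumption rather than a complete reference.

```python
import os

# Per-task variables commonly exported by Slurm, Open MPI, PMI-based
# launchers (MPICH, Intel MPI) and MVAPICH2. Assumed list, not exhaustive.
RANK_VARS = ("SLURM_PROCID", "OMPI_COMM_WORLD_RANK", "PMI_RANK",
             "MV2_COMM_WORLD_RANK")
SIZE_VARS = ("SLURM_NTASKS", "OMPI_COMM_WORLD_SIZE", "PMI_SIZE",
             "MV2_COMM_WORLD_SIZE")


def launch_info(env):
    """Return the launcher-provided rank/size variables present in env."""
    return {k: env[k] for k in RANK_VARS + SIZE_VARS if k in env}


def mpi_info():
    """Return (rank, size) as seen by mpi4py, or None if it is unavailable."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    # Each task prints what the launcher told it and what MPI reports.
    print("launcher:", launch_info(os.environ), "mpi4py:", mpi_info())
```

Saved under a hypothetical name such as check_mpi.py and launched with "srun python check_mpi.py" from the same job script, I would expect four tasks to report ranks 0 to 3 and a size of 4. If Slurm reports SLURM_NTASKS=4 while mpi4py reports a size of 1 in every task, that would suggest the MPI library mpi4py was built against is not picking up the launcher.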
