Fixed by setting I_MPI_HYDRA_BOOTSTRAP=ssh; see Hydra Environment Variables (intel.com) <https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/environment-variable-reference/hydra-environment-variables.html>
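A minimal sketch of how the fix might look in a batch script, assuming a single-node job that reuses the mpirun command from the original message. The #SBATCH values are illustrative assumptions, not taken from the original job. Inside an allocation, Hydra wraps hydra_bstrap_proxy in srun (the "--launcher slurm" line in the sbatch log); exporting I_MPI_HYDRA_BOOTSTRAP=ssh should make it use the ssh launcher instead, as in the manual run.

```shell
#!/bin/bash
# Hypothetical resource request: one node, 64 ranks (matches -np 64 -ppn 64).
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64

# Make Intel MPI's Hydra start its proxies with the ssh bootstrap
# instead of launching them through srun.
export I_MPI_HYDRA_BOOTSTRAP=ssh

# Same launch line as the manual run in the original message.
numactl -C 0-63,128-191 -m 0 mpirun -verbose \
    -genv I_MPI_DEBUG=4 \
    -genv KMP_AFFINITY=verbose,granularity=fine,compact \
    -np 64 -ppn 64 ./mpiprogram -in in.program -log program \
    -pk intel 0 omp 2 -sf intel -screen none -v d 1
```

Outside a job, Hydra already chose ssh by itself (see the "--launcher ssh" line in the manual run), which is presumably why the bindings were correct there. For multi-node jobs the ssh bootstrap likely needs passwordless SSH between the allocated nodes, and ranks started this way are not launched as a Slurm job step.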
On Tue, Aug 16, 2022 at 11:09 AM Joe Teumer <[email protected]> wrote:
> Hello!
>
> Is there a way to turn off slurm MPI hooks?
> A job submitted via sbatch executes Intel MPI and the thread affinity
> settings are incorrect.
> However, running MPI manually over SSH works and all bindings are correct.
>
> We are looking to run our MPI jobs via slurm sbatch and have the same
> behavior as running the job manually over SSH.
>
> slurmd -V
> slurm 22.05.3
>
> RUNNING OMP_NUM_THREADS=, cmd=numactl -C 0-63,128-191 -m 0 mpirun -verbose
> -genv I_MPI_DEBUG=4 -genv KMP_AFFINITY=verbose,granularity=fine,compact -np
> 64 -ppn 64 ./mpiprogram -in in.program -log program -pk intel 0 omp 2 -sf
> intel -screen none -v d 1
>
> which mpirun
> /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin/mpirun
>
> slurm sbatch:
>
> [mpiexec@node] *Launch arguments: /usr/local/bin/srun -N 1 -n 1
> --ntasks-per-node 1 --nodelist node --input none
> /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_bstrap_proxy*
> --upstream-host
> node --upstream-port 45427 --pgid 0 --launcher slurm --launcher-number 1
> --base-path /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin/
> --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug
> /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_pmi_proxy
> --usize -1 --auto-cleanup 1 --abort-signal 9
>
> SSH manual run:
>
> [mpiexec@node] Launch arguments:
> */opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_bstrap_proxy*
> --upstream-host
> node --upstream-port 35747 --pgid 0 --launcher ssh --launcher-number 0
> --base-path /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin/
> --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug
> --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7
> /opt/intel/psxe_runtime_2019.6.324/linux/mpi/intel64/bin//hydra_pmi_proxy
> --usize -1 --auto-cleanup 1 --abort-signal 9
