Hi,
I have a small setup with one head node and two compute nodes connected
via IB-QDR, running CentOS 8.2 and Mellanox OFED 4.9 LTS. I installed
Open MPI 3.0.6, 3.1.6, 4.0.3 and 4.0.4 with identical configuration
(configure, compile, nothing set in openmpi-mca-params.conf), and the
output from ompi_info and orte-info looks identical.
There is a small benchmark that basically just does MPI_Send() and
MPI_Recv(). I can invoke it directly like this (with both 4.0.3 and 4.0.4):
/opt/openmpi/4.0.3/gcc/bin/mpirun -np 16 -hostfile HOSTFILE_2x8 -nolocal \
    ./OWnetbench.openmpi-4.0.3
When I run this job from Slurm, it works with 4.0.3, but there is an
error with 4.0.4. Any hints on what to check?
### running ./OWnetbench/OWnetbench.openmpi-4.0.4 with /opt/openmpi/4.0.4/gcc/bin/mpirun ###
[node002.cluster:04960] MCW rank 0 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB]
[node002.cluster:04963] PMIX ERROR: OUT-OF-RESOURCE in file client/pmix_client.c at line 231
[node002.cluster:04963] OPAL ERROR: Error in file pmix3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[node002.cluster:04963] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able
to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:
Process name: [[15424,1],0]
Exit code: 1
--------------------------------------------------------------------------
Any hint why 4.0.4 behaves differently from the other versions?
--
DELTA Computer Products GmbH
Röntgenstr. 4
D-21465 Reinbek bei Hamburg
T: +49 40 300672-30
F: +49 40 300672-11
E: michael.fuck...@delta.de
Internet: https://www.delta.de
Handelsregister Lübeck HRB 3678-RE, Ust.-IdNr.: DE135110550
Geschäftsführer: Hans-Peter Hellmann