Hi,

I have a small setup with one head node and two compute nodes connected via QDR InfiniBand, running CentOS 8.2 and Mellanox OFED 4.9 LTS. I installed Open MPI 3.0.6, 3.1.6, 4.0.3 and 4.0.4, all built the same way (configure and compile, with nothing set in openmpi-mca-params.conf). The output of ompi_info and orte-info looks identical.

There is a small benchmark that basically just does MPI_Send() and MPI_Recv(). I can run it directly with mpirun, with both 4.0.3 and 4.0.4, like this:

/opt/openmpi/4.0.3/gcc/bin/mpirun -np 16 -hostfile HOSTFILE_2x8 -nolocal ./OWnetbench.openmpi-4.0.3

When I run this job from Slurm, it works with 4.0.3 but fails with an error with 4.0.4. Any hints on what to check?


### running ./OWnetbench/OWnetbench.openmpi-4.0.4 with /opt/openmpi/4.0.4/gcc/bin/mpirun ###
[node002.cluster:04960] MCW rank 0 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB]
[node002.cluster:04963] PMIX ERROR: OUT-OF-RESOURCE in file client/pmix_client.c at line 231
[node002.cluster:04963] OPAL ERROR: Error in file pmix3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[node002.cluster:04963] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[15424,1],0]
  Exit code:    1
--------------------------------------------------------------------------

Any idea why 4.0.4 behaves differently from the other versions?

--
DELTA Computer Products GmbH
Röntgenstr. 4
D-21465 Reinbek bei Hamburg
T: +49 40 300672-30
F: +49 40 300672-11
E: michael.fuck...@delta.de

Internet: https://www.delta.de
Handelsregister Lübeck HRB 3678-RE, Ust.-IdNr.: DE135110550
Geschäftsführer: Hans-Peter Hellmann