I should give more background. In the Slurm error log for this job,
an error about a failed memcpy operation was listed first, so that is
what caused the job to fail. I suspect the errors below are the result
of the other MPI ranks being killed at not-quite-the-same time, which is
to be expected. I just want to make sure that this was the case, and that
the error below isn't a sign of another issue with the job.
Prentice
On 11/11/20 5:47 PM, Ralph Castain via users wrote:
Looks like it is coming from the Slurm PMIx plugin, not OMPI.
Artem - any ideas?
Ralph
On Nov 11, 2020, at 10:03 AM, Prentice Bisbal via users
<users@lists.open-mpi.org> wrote:
One of my users recently reported a failed job that was using OpenMPI 4.0.4
compiled with PGI 20.4. Two different errors were reported. One appeared only
once, and I think it had nothing to do with OpenMPI or PMIx; the other, below,
was repeated multiple times in the Slurm error output for the job:
pmixp_client_v2.c:210 [_errhandler] mpi/pmix: ERROR: Error handler invoked: status = -25: No such file or directory (2)
Has anyone else seen this before? Any idea what would cause this error? I did a
Google search but couldn't find any discussion of this error anywhere.
--
Prentice
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov