Hello, I have already applied the patch and recompiled, and everything
works correctly.
Now we wait for 19.05.1.
Thank you.
--
Andrés Marín Díaz
Servicio de Infraestructura e Innovación
Universidad Politécnica de Madrid
Centro de Supercomputación y Visualización de Madrid (CeSViM)
See Tim Wickberg's comment and patch from this morning
(https://bugs.schedmd.com/show_bug.cgi?id=7191#c7); especially:
> Some variant of this patch - albeit with a warning message added in
> to note that --cpu-bind is the correct spelling - will be in 19.05.1
> when released, and supported through
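For anyone adjusting job scripts in the meantime, the rename amounts to
the following (illustrative invocations; the binding value is just an
example):

  srun --cpu_bind=cores ./mpi_app   # old spelling, rejected by 19.05.0
  srun --cpu-bind=cores ./mpi_app   # new spelling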
Good morning, and thank you very much to everyone for helping us find
the problem.
I second Levi's proposal to revert the change.
In the meantime, is there any way to temporarily patch the Slurm 19.05
code while the proposal is being considered, so that we do not have to
patch and recompile the different versions of OpenMPI?
Thank you.
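One possible stopgap, sketched here untested, is to shim srun rather
than patch either code base. This assumes a site-local directory
(e.g. /usr/local/bin) that precedes Slurm's bin directory in users'
PATH, that mpirun locates srun via PATH, and that the real srun lives
at the illustrative path below; Open MPI only uses the --cpu_bind=...
form, so only that form is rewritten:

  #!/bin/sh
  # Hypothetical shim installed as e.g. /usr/local/bin/srun.
  # Rewrite the option spelling removed in Slurm 19.05, keep every
  # other argument as-is, then exec the real srun.
  for a in "$@"; do
    shift
    case "$a" in
      --cpu_bind=*) set -- "$@" "--cpu-bind=${a#--cpu_bind=}" ;;
      *)            set -- "$@" "$a" ;;
    esac
  done
  exec /usr/bin/srun "$@"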
On 6/6/19 12:01 PM, Kilian Cavalotti wrote:
> Levi did already.
Aha, race condition between searching Bugzilla and writing the email. ;-)
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
On Thu, Jun 6, 2019 at 11:16 AM Christopher Samuel wrote:
> Sounds like a good reason to file a bug.
Levi did already. Everybody can vote at
https://bugs.schedmd.com/show_bug.cgi?id=7191 :)
Cheers,
--
Kilian
On 6/6/19 10:21 AM, Levi Morrison wrote:
> This means all OpenMPI programs that end up calling `srun` on Slurm
> 19.05 will fail.
Sounds like a good reason to file a bug. We're not on 19.05 yet, so
we're not affected (yet), but this may cause us some pain when we get to
that point (though at least
Slurm 19.05 removed support for `--cpu_bind`, which is what all released
versions of OpenMPI are using when they call into srun. This issue was
fixed 24 days ago in [OpenMPI's git repo][1].
This means all OpenMPI programs that end up calling `srun` on Slurm
19.05 will fail.
This enormous amo
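A quick way to check whether a given Open MPI installation still emits
the old spelling is to look at the strings of its Slurm launcher
component (the path is illustrative and depends on the install prefix):

  strings /opt/openmpi/lib/openmpi/mca_plm_slurm.so | grep -- --cpu_bind

If that prints something like --cpu_bind=none, the build predates the fix.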
Hello,
We have tried compiling it in two ways. Initially we compiled it with
PMIx as follows:
rpmbuild -ta slurm-19.05.0.tar.bz2 --define '_with_pmix --with-pmix=/opt/pmix/3.1.2/'
But we have also tried compiling it without PMIx:
rpmbuild -ta slurm-19.05.0.tar.bz2
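Once an RPM set is installed, a quick sanity check (a standard srun
option; the names printed vary with version and build options) is to
list the MPI plugin types Slurm knows about:

  srun --mpi=list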
How did you compile SLURM? Did you add the contribs/pmi and/or contribs/pmi2
plugins to the install? Or did you use PMIx?
Sean
--
Sean Crosby
Senior DevOps/HPC Engineer and HPC Team Lead | Research Platform Services
Research Computing | CoEPP | School of Physics
University of Melbourne
On Thu,
Hello,
Yes, we have recompiled OpenMPI against SLURM 19.05, but the problem
remains.
We have also tried recompiling OpenMPI without SLURM integration. In
that case executions fail with srun, but with mpirun it continues to
work on SLURM 18.08 and fails on 19.05 in the same way.
Hi Andrés,
Did you recompile OpenMPI after updating to SLURM 19.05?
Sean
--
Sean Crosby
Senior DevOps/HPC Engineer and HPC Team Lead | Research Platform Services
Research Computing | CoEPP | School of Physics
University of Melbourne
On Thu, 6 Jun 2019 at 20:11, Andrés Marín Díaz wrote:
Thank you very much for the help; here is some updated information:
- If we use Intel MPI (IMPI) mpirun, it works correctly.
- If we use mpirun without using the scheduler, it works correctly.
- If we use srun with software compiled with OpenMPI, it works correctly.
- If we use SLURM 18.08.6, it works correctly.
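In other words, on 19.05 the failure seems specific to mpirun's launch
path. A hypothetical side-by-side, with ./hello_mpi standing in for any
OpenMPI-compiled binary:

  srun -n 2 ./hello_mpi      # works on 19.05
  mpirun -np 2 ./hello_mpi   # fails on 19.05 with the ORTE error
                             # quoted later in the thread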
On Wednesday, 5 June 2019 10:04:11 AM PDT Andrés Marín Díaz wrote:
> Can it be a bug in the new version?
If it's working with srun but not with mpirun it sounds like there's some
incompatibility between how mpirun is calling srun to launch orted and what
Slurm is doing now.
You'd need to find
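One way to see exactly what mpirun passes to srun is to raise the
launcher's verbosity; plm_base_verbose is Open MPI's standard MCA
parameter for this, and the grep is just illustrative:

  mpirun --mca plm_base_verbose 10 -np 2 hostname 2>&1 | grep -i srun

On an affected build this should show the srun command line, including
the --cpu_bind option that 19.05 rejects.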
Hello, since we updated to the new Slurm version (19.05), every time
a job step is launched with mpirun it fails with the following error message:
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors