Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-10 Thread Andrés Marín Díaz
Hello, I have already applied the patch and recompiled, and everything works correctly. Now we just have to wait for 19.05.1. Thank you. -- Andrés Marín Díaz Servicio de Infraestructura e Innovación Universidad Politécnica de Madrid Centro de Supercomputación y Visualización de Madrid (CeSViM

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-07 Thread Levi Morrison
See Tim Wickberg's comment and patch from this morning (https://bugs.schedmd.com/show_bug.cgi?id=7191#c7); especially: > Some variant of this patch - albeit with a warning message added in to note that --cpu-bind is the correct spelling - will be in 19.05.1 when released, and supported through

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-07 Thread Andrés Marín Díaz
Good morning, thank you all very much for helping us find the problem. I second Levi's proposal to revert the change. Is there any way to temporarily patch the Slurm 19.05 code while the proposal is being analyzed, so that we do not have to patch and recompile the different versions of OpenMPI? Th

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Christopher Samuel
On 6/6/19 12:01 PM, Kilian Cavalotti wrote: Levi did already. Aha, race condition between searching bugzilla and writing the email. ;-) -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Kilian Cavalotti
On Thu, Jun 6, 2019 at 11:16 AM Christopher Samuel wrote: > Sounds like a good reason to file a bug. Levi did already. Everybody can vote at https://bugs.schedmd.com/show_bug.cgi?id=7191 :) Cheers, -- Kilian

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Christopher Samuel
On 6/6/19 10:21 AM, Levi Morrison wrote: This means all OpenMPI programs that end up calling `srun` on Slurm 19.05 will fail. Sounds like a good reason to file a bug. We're not on 19.05 yet so we're not affected (yet) but this may cause us some pain when we get to that point (though at leas

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Levi Morrison
Slurm 19.05 removed support for `--cpu_bind`, which is what all released versions of OpenMPI are using when they call into srun. This issue was fixed 24 days ago in [OpenMPI's git repo][1]. This means all OpenMPI programs that end up calling `srun` on Slurm 19.05 will fail. This enormous amo

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Levi Morrison
Slurm 19.05 removed support for `--cpu_bind`, which is what /all/ released versions of OpenMPI are using when they call into srun. This issue was fixed 24 days ago in [OpenMPI's git repo][1]. This means /all/ OpenMPI programs that end up calling `srun` on Slurm 19.05 will fail. This enormous
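The spelling change described above can be illustrated with a small shell sketch. It is not the patch from the bug report and not part of Slurm or OpenMPI; `normalize_cpu_bind` is a hypothetical helper name, and the sample arguments are illustrative only. It rewrites the underscore spelling `--cpu_bind` to the hyphenated `--cpu-bind` and leaves every other argument untouched:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not shipped with Slurm or OpenMPI):
# rewrite the old underscore spelling of the option to the hyphenated one.
# Prints one rewritten argument per line, so arguments that themselves
# contain newlines are not handled by this sketch.
normalize_cpu_bind() {
    for arg in "$@"; do
        case "$arg" in
            --cpu_bind)   printf '%s\n' "--cpu-bind" ;;
            --cpu_bind=*) printf '%s\n' "--cpu-bind=${arg#--cpu_bind=}" ;;
            *)            printf '%s\n' "$arg" ;;
        esac
    done
}

# Illustrative call; the argument list is made up for the example.
normalize_cpu_bind --cpu_bind=none --nodes=2 orted
```

A site could in principle place logic like this in a wrapper that runs ahead of the real `srun`, but that is a local workaround with its own risks; the thread itself points to patching Slurm or waiting for 19.05.1.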

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Andrés Marín Díaz
Hello, We have tried compiling it in two ways. Initially we compiled it with PMIx as follows: rpmbuild -ta slurm-19.05.0.tar.bz2 --define '_with_pmix --with-pmix=/opt/pmix/3.1.2/' But we have also tried compiling it without PMIx: rpmbuild -ta slurm-19.05.0.tar.bz2

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Sean Crosby
How did you compile SLURM? Did you add the contribs/pmi and/or contribs/pmi2 plugins to the install? Or did you use PMIx? Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physics University of Melbourne On Thu,

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Andrés Marín Díaz
Hello, Yes, we have recompiled OpenMPI with integration with SLURM 19.05 but the problem remains. We have also tried to recompile OpenMPI without integration with SLURM. In this case executions fail with srun, but with mpirun it continues to work in SLURM 18.08 and fails in 19.05 in the same

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Sean Crosby
Hi Andrés, Did you recompile OpenMPI after updating to SLURM 19.05? Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physics University of Melbourne On Thu, 6 Jun 2019 at 20:11, Andrés Marín Díaz mailto:ama...@

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Andrés Marín Díaz
Thank you very much for the help. Here is some updated information: - If we use Intel MPI (IMPI) mpirun it works correctly. - If we use mpirun without using the scheduler it works correctly. - If we use srun with software compiled with OpenMPI it works correctly. - If we use SLURM 18.08.6 it works corre

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-05 Thread Chris Samuel
On Wednesday, 5 June 2019 10:04:11 AM PDT Andrés Marín Díaz wrote: > Can it be a bug in the new version? If it's working with srun but not with mpirun it sounds like there's some incompatibility between how mpirun is calling srun to launch orted and what Slurm is doing now. You'd need to find

[slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-05 Thread Andrés Marín Díaz
Hello, since we updated to the new Slurm version (19.05), every time a job step is launched with mpirun it ends with the following error message:     An ORTE daemon has unexpectedly failed after launch and before     communicating back to mpirun. This could be caused by a number     of factor