Re: [OMPI users] OpenMPI2 + slurm
On Fri, 23 Nov 2018 09:17:00 +0100 Lothar Brendel wrote:

[...]

> looking into orte/mca/ras/slurm/ras_slurm_module.c, I find that while
> orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its
> local variable cpus_per_task, it doesn't use it anywhere. Rather, the number
> of slots is determined from SLURM_TASKS_PER_NODE.
>
> Is this intended behaviour?
>
> What's wrong here? I know that I can use --oversubscribe, but that seems
> rather a workaround.

I was wrong. OpenMPI has *always* ignored the value of SLURM_CPUS_PER_TASK and still does so in 4.0.0. Only the default behaviour wrt oversubscribing has changed: before version 2 you had to deny it explicitly via "--nooversubscribe"; since version 2 you have to allow it explicitly via "--oversubscribe", cf. also

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850229

Hence, I've been oversubscribing all these years; oh my.

Ciao
        Lothar

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
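For reference, SLURM_TASKS_PER_NODE uses a compressed syntax such as "2(x3),1" (three nodes running 2 tasks each, plus one node running 1 task). A minimal POSIX-shell sketch of expanding that syntax into per-node counts, i.e. the quantity the slot count is actually derived from, might look like this (hypothetical helper for illustration only; Open MPI does this in C inside orte_ras_slurm_allocate()):

```shell
# Expand Slurm's compressed SLURM_TASKS_PER_NODE syntax into one number
# per node, e.g. "2(x3),1" -> "2 2 2 1". Illustration only; not Open
# MPI's actual implementation.
expand_tasks_per_node() {
    out=""
    # fields are comma-separated and never contain spaces
    for field in $(printf '%s' "$1" | tr ',' ' '); do
        case $field in
            *"(x"*)                      # "N(xR)": N tasks on each of R nodes
                n=${field%%"("*}         # the task count N
                r=${field##*"(x"}        # the repeat count R, still with ")"
                r=${r%")"}
                i=0
                while [ "$i" -lt "$r" ]; do
                    out="$out $n"
                    i=$((i + 1))
                done
                ;;
            *)                           # plain "N": a single node with N tasks
                out="$out $field"
                ;;
        esac
    done
    printf '%s\n' "${out# }"             # trim the leading space
}
```

With the environment from the original report (SLURM_TASKS_PER_NODE=1) this yields a single node with one slot, which is exactly the slots=1 that mpirun inferred.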
Re: [OMPI users] OpenMPI2 + slurm (Ralph H Castain)
> Couple of comments. Your original cmd line:
>
>>> srun -n 2 mpirun MPI-hellow
>
> tells srun to launch two copies of mpirun, each of which is to run as many
> processes as there are slots assigned to the allocation. srun will get an
> allocation of two slots, and so you'll get two concurrent MPI jobs, each
> consisting of two procs.

Exactly. Hence, launching a job via this "minimal command line" with some number x after the -n will always occupy x^2 processors. Just out of curiosity: in which situation is this reasonable? IMHO the number of concurrent MPI jobs on the one hand and the number of procs allocated by each of them on the other are quite independent parameters.

> Your other cmd line:
>
>>> srun -c 2 mpirun -np 2 MPI-hellow
>
> told srun to get two slots but only run one copy (the default value of the -n
> option) of mpirun, and you told mpirun to launch two procs. So you got one
> job consisting of two procs.

Exactly. Sadly, it bails out with Slurm 16.05.9 + OpenMPI 2.0.2 (while it used to work with Slurm 14.03.9 + OpenMPI 1.6.5). But most probably I'm barking up the wrong tree here: I've just checked that the handling of SLURM_CPUS_PER_TASK hasn't changed in OpenMPI's ras_slurm_module.c.

> What you probably want to do is what Gilles advised. However, Slurm 16.05
> only supports PMIx v1,

Actually, --mpi=list doesn't show any PMIx at all. Well, I'll probably move to newer versions of Slurm AND OpenMPI.

Thanks a lot to both of you
        Lothar
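The x^2 effect described above can be sketched with a toy model (pure shell, no real srun or mpirun involved; the function name and structure are invented for illustration): srun -n X starts X copies of the launcher, and each mpirun copy then fills all X slots of the allocation on its own.

```shell
# Toy model of "srun -n X mpirun app" -- illustration only, no real Slurm.
# srun launches X copies of mpirun; each copy sees X slots in the
# allocation and spawns X ranks, so the total process count is X * X.
simulate_srun_mpirun() {
    ntasks=$1
    total=0
    copy=0
    while [ "$copy" -lt "$ntasks" ]; do   # one iteration per mpirun copy
        total=$((total + ntasks))         # each copy spawns ntasks ranks
        copy=$((copy + 1))
    done
    echo "$total"
}
```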
Re: [OMPI users] OpenMPI2 + slurm
Couple of comments. Your original cmd line:

>> srun -n 2 mpirun MPI-hellow

tells srun to launch two copies of mpirun, each of which is to run as many processes as there are slots assigned to the allocation. srun will get an allocation of two slots, and so you'll get two concurrent MPI jobs, each consisting of two procs.

Your other cmd line:

>> srun -c 2 mpirun -np 2 MPI-hellow

told srun to get two slots but only run one copy (the default value of the -n option) of mpirun, and you told mpirun to launch two procs. So you got one job consisting of two procs.

What you probably want to do is what Gilles advised. However, Slurm 16.05 only supports PMIx v1, so you'd want to download and build PMIx v1.2.5, and then build Slurm against it. OMPI v2.0.2 may have a slightly older copy of PMIx in it (I honestly don't remember) - to be safe, it would be best to configure OMPI to use the 1.2.5 you installed for Slurm. You'll also be required to build OMPI against an external copy of libevent and hwloc to ensure OMPI is linked against the same versions used by PMIx. Or you can just build OMPI against the Slurm PMI library - up to you.

Ralph

> On Nov 23, 2018, at 2:31 AM, Gilles Gouaillardet wrote:
>
> Lothar,
>
> it seems you did not configure Open MPI with --with-pmi=
>
> If SLURM was built with PMIx support, then another option is to use that.
> First, srun --mpi=list will show you the list of available MPI
> modules, and then you could
>
>    srun --mpi=pmix_v2 ... MPI_Hellow
>
> If you believe that should be the default, then you should contact
> your sysadmin, who can make that the default for you.
>
> If you want to use PMIx, then I recommend you configure Open MPI with
> the same external PMIx that was used to build SLURM (e.g. configure
> --with-pmix=). Though PMIx has cross version support, using the same
> PMIx will avoid you running incompatible PMIx versions.
>
> Cheers,
>
> Gilles
>
> On Fri, Nov 23, 2018 at 5:20 PM Lothar Brendel wrote:
>>
>> Hi guys,
>>
>> I've always been somewhat at a loss regarding slurm's idea about tasks vs.
>> jobs. That didn't cause any problems, though, until passing to OpenMPI2
>> (2.0.2 that is, with slurm 16.05.9).
>>
>> Running http://mpitutorial.com/tutorials/mpi-hello-world as an example with just
>>
>>    srun -n 2 MPI-hellow
>>
>> yields
>>
>> Hello world from processor node31, rank 0 out of 1 processors
>> Hello world from processor node31, rank 0 out of 1 processors
>>
>> i.e. the two tasks don't see each other MPI-wise. Well, srun doesn't include an mpirun.
>>
>> But running
>>
>>    srun -n 2 mpirun MPI-hellow
>>
>> produces
>>
>> Hello world from processor node31, rank 1 out of 2 processors
>> Hello world from processor node31, rank 0 out of 2 processors
>> Hello world from processor node31, rank 1 out of 2 processors
>> Hello world from processor node31, rank 0 out of 2 processors
>>
>> i.e. I get *two* independent MPI tasks with 2 processors each. (The same
>> applies if stating explicitly "mpirun -np 2".)
>> I never could make sense of this squaring; I rather used to run my jobs like
>>
>>    srun -c 2 mpirun -np 2 MPI-hellow
>>
>> which provided the desired job with *one* task using 2 processors. Passing
>> from OpenMPI 1.6.5 to 2.0.2 (Debian Jessie -> Stretch), though, I'm now
>> getting the error "There are not enough slots available in the system to
>> satisfy the 2 slots that were requested by the application: MPI-hellow".
>>
>> The environment on the node contains
>>
>> SLURM_CPUS_ON_NODE=2
>> SLURM_CPUS_PER_TASK=2
>> SLURM_JOB_CPUS_PER_NODE=2
>> SLURM_NTASKS=1
>> SLURM_TASKS_PER_NODE=1
>>
>> which looks fine to me, but mpirun infers slots=1 from that (confirmed by
>> ras_base_verbose 5). Indeed, looking into
>> orte/mca/ras/slurm/ras_slurm_module.c, I find that while
>> orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its
>> local variable cpus_per_task, it doesn't use it anywhere. Rather, the number
>> of slots is determined from SLURM_TASKS_PER_NODE.
>>
>> Is this intended behaviour?
>>
>> What's wrong here? I know that I can use --oversubscribe, but that seems
>> rather a workaround.
>>
>> Thanks in advance,
>>    Lothar
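For illustration, the slot computation Lothar seems to expect, slots per node derived from tasks times cpus-per-task rather than from the task count alone, could be sketched like this (hypothetical helper; as stated in the thread, Open MPI's actual orte_ras_slurm_allocate() derives slots from SLURM_TASKS_PER_NODE only and never uses cpus_per_task):

```shell
# Hypothetical slot computation that would honour SLURM_CPUS_PER_TASK.
# Illustration only -- this is NOT what Open MPI does; it uses the
# tasks-per-node value alone.
slots_for_node() {
    tasks_per_node=$1
    cpus_per_task=${2:-1}   # default 1 when -c was not given to srun
    echo $((tasks_per_node * cpus_per_task))
}

# With the environment from the original report
# (SLURM_TASKS_PER_NODE=1, SLURM_CPUS_PER_TASK=2):
# Open MPI infers 1 slot; counting cpus-per-task would give 2, which is
# why "mpirun -np 2" only works with --oversubscribe.
```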
Re: [OMPI users] OpenMPI2 + slurm
Lothar,

it seems you did not configure Open MPI with --with-pmi=

If SLURM was built with PMIx support, then another option is to use that. First, srun --mpi=list will show you the list of available MPI modules, and then you could

   srun --mpi=pmix_v2 ... MPI_Hellow

If you believe that should be the default, then you should contact your sysadmin, who can make that the default for you.

If you want to use PMIx, then I recommend you configure Open MPI with the same external PMIx that was used to build SLURM (e.g. configure --with-pmix=). Though PMIx has cross version support, using the same PMIx will avoid you running incompatible PMIx versions.

Cheers,

Gilles

On Fri, Nov 23, 2018 at 5:20 PM Lothar Brendel wrote:
>
> Hi guys,
>
> I've always been somewhat at a loss regarding slurm's idea about tasks vs.
> jobs. That didn't cause any problems, though, until passing to OpenMPI2
> (2.0.2 that is, with slurm 16.05.9).
>
> Running http://mpitutorial.com/tutorials/mpi-hello-world as an example with just
>
>    srun -n 2 MPI-hellow
>
> yields
>
> Hello world from processor node31, rank 0 out of 1 processors
> Hello world from processor node31, rank 0 out of 1 processors
>
> i.e. the two tasks don't see each other MPI-wise. Well, srun doesn't include an mpirun.
>
> But running
>
>    srun -n 2 mpirun MPI-hellow
>
> produces
>
> Hello world from processor node31, rank 1 out of 2 processors
> Hello world from processor node31, rank 0 out of 2 processors
> Hello world from processor node31, rank 1 out of 2 processors
> Hello world from processor node31, rank 0 out of 2 processors
>
> i.e. I get *two* independent MPI tasks with 2 processors each. (The same
> applies if stating explicitly "mpirun -np 2".)
> I never could make sense of this squaring; I rather used to run my jobs like
>
>    srun -c 2 mpirun -np 2 MPI-hellow
>
> which provided the desired job with *one* task using 2 processors. Passing
> from OpenMPI 1.6.5 to 2.0.2 (Debian Jessie -> Stretch), though, I'm now
> getting the error "There are not enough slots available in the system to
> satisfy the 2 slots that were requested by the application: MPI-hellow".
>
> The environment on the node contains
>
> SLURM_CPUS_ON_NODE=2
> SLURM_CPUS_PER_TASK=2
> SLURM_JOB_CPUS_PER_NODE=2
> SLURM_NTASKS=1
> SLURM_TASKS_PER_NODE=1
>
> which looks fine to me, but mpirun infers slots=1 from that (confirmed by
> ras_base_verbose 5). Indeed, looking into
> orte/mca/ras/slurm/ras_slurm_module.c, I find that while
> orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its
> local variable cpus_per_task, it doesn't use it anywhere. Rather, the number
> of slots is determined from SLURM_TASKS_PER_NODE.
>
> Is this intended behaviour?
>
> What's wrong here? I know that I can use --oversubscribe, but that seems
> rather a workaround.
>
> Thanks in advance,
>    Lothar