Re: [OMPI users] OpenMPI2 + slurm

2018-11-30 Thread Lothar Brendel
On Fri, 23 Nov 2018 09:17:00 +0100
Lothar Brendel  wrote:

[...]

> looking into orte/mca/ras/slurm/ras_slurm_module.c, I find that while 
> orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its 
> local variable cpus_per_task, it doesn't use it anywhere. Rather, the number 
> of slots is determined from SLURM_TASKS_PER_NODE.
> 
> Is this intended behaviour?
> 
> What's wrong here? I know that I can use --oversubscribe, but that seems 
> rather a workaround.

I was wrong: OpenMPI has *always* ignored the value of SLURM_CPUS_PER_TASK and
still does so in 4.0.0. Only the default behaviour w.r.t. oversubscribing has
changed: before version 2 you had to deny it explicitly via "--nooversubscribe";
since version 2 you have to allow it explicitly via "--oversubscribe". Cf. also
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850229

Hence, I've been oversubscribing all these years; oh my.
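
For the record, a minimal illustration of the changed default (hypothetical
node offering only 2 slots, MPI-hellow standing for any MPI binary):

   # OpenMPI >= 2.x: oversubscription must be requested explicitly
   mpirun -np 4 MPI-hellow                    # fails: "not enough slots"
   mpirun --oversubscribe -np 4 MPI-hellow    # runs 4 procs on 2 slots

   # OpenMPI 1.x: oversubscription was allowed by default
   mpirun -np 4 MPI-hellow                    # runs, silently oversubscribed
   mpirun --nooversubscribe -np 4 MPI-hellow  # fails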

Ciao
Lothar


Re: [OMPI users] OpenMPI2 + slurm (Ralph H Castain)

2018-11-25 Thread Lothar Brendel
> Couple of comments. Your original cmd line:
> 
> >>   srun -n 2 mpirun MPI-hellow
> 
> tells srun to launch two copies of mpirun, each of which is to run as many 
> processes as there are slots assigned to the allocation. srun will get an 
> allocation of two slots, and so you'll get two concurrent MPI jobs, each 
> consisting of two procs.

Exactly. Hence, launching a job via this "minimal command line" with a number x
after the -n will always occupy x^2 processors.
Just out of curiosity: in which situation is this reasonable? IMHO the number of
concurrent MPI jobs on the one hand and the number of procs launched by each of
them on the other are quite independent parameters.
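
Just to spell out what I mean (x = 2, node with at least 4 cores; the salloc
line is only my guess at the intended usage):

   srun -n 2 mpirun MPI-hellow     # 2 copies of mpirun, each seeing 2 slots -> 4 procs
   salloc -n 2 mpirun MPI-hellow   # 1 mpirun using the 2-slot allocation    -> 2 procs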


> Your other cmd line:
> 
> >>srun -c 2 mpirun -np 2 MPI-hellow
> 
> told srun to get two slots but only run one copy (the default value of the -n 
> option) of mpirun, and you told mpirun to launch two procs. So you got one 
> job consisting of two procs.

Exactly. Sadly, it bails out with Slurm 16.05.9 + OpenMPI 2.0.2 (while it used
to work with Slurm 14.03.9 + OpenMPI 1.6.5). But most probably I'm barking up
the wrong tree here: I've just checked that the handling of SLURM_CPUS_PER_TASK
hasn't changed in OpenMPI's ras_slurm_module.c.
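
(For reference, this is how I checked the slot count, with the verbosity switch
from my first mail; the exact output wording may differ, but the inferred number
is what matters:)

   srun -c 2 mpirun --mca ras_base_verbose 5 -np 2 MPI-hellow
   # -> the slurm RAS component reports slots=1 (from SLURM_TASKS_PER_NODE=1),
   #    hence "not enough slots" for -np 2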


> What you probably want to do is what Gilles advised. However, Slurm 16.05 
> only supports PMIx v1,

Actually, --mpi=list doesn't show any PMIx at all.

Well, I'll probably pass to newer versions of Slurm AND OpenMPI.

Thanks a lot to both of you
Lothar


Re: [OMPI users] OpenMPI2 + slurm

2018-11-23 Thread Ralph H Castain
Couple of comments. Your original cmd line:

>>   srun -n 2 mpirun MPI-hellow

tells srun to launch two copies of mpirun, each of which is to run as many 
processes as there are slots assigned to the allocation. srun will get an 
allocation of two slots, and so you’ll get two concurrent MPI jobs, each 
consisting of two procs.

Your other cmd line:

>>srun -c 2 mpirun -np 2 MPI-hellow

told srun to get two slots but only run one copy (the default value of the -n 
option) of mpirun, and you told mpirun to launch two procs. So you got one job 
consisting of two procs.

What you probably want to do is what Gilles advised. However, Slurm 16.05 only 
supports PMIx v1, so you’d want to download and build PMIx v1.2.5, and then 
build Slurm against it. OMPI v2.0.2 may have a slightly older copy of PMIx in 
it (I honestly don’t remember) - to be safe, it would be best to configure OMPI 
to use the 1.2.5 you installed for Slurm. You’ll also be required to build OMPI 
against an external copy of libevent and hwloc to ensure OMPI is linked against 
the same versions used by PMIx.

Or you can just build OMPI against the Slurm PMI library - up to you.
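
Roughly (install prefixes are placeholders; check each package's README for the
exact configure options of your versions):

   # PMIx 1.2.5, then Slurm built against it
   ./configure --prefix=/opt/pmix-1.2.5 && make install           # in pmix-1.2.5
   ./configure --with-pmix=/opt/pmix-1.2.5 ... && make install    # in slurm-16.05.x

   # OMPI against the same PMIx plus external libevent/hwloc
   ./configure --with-pmix=/opt/pmix-1.2.5 \
               --with-libevent=/usr --with-hwloc=/usr ...         # in openmpi-2.0.2

   # or the simpler route: OMPI against Slurm's PMI library
   ./configure --with-slurm --with-pmi=/usr ...                   # in openmpi-2.0.2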

Ralph


> On Nov 23, 2018, at 2:31 AM, Gilles Gouaillardet 
>  wrote:
> 
> Lothar,
> 
> it seems you did not configure Open MPI with --with-pmi=
> 
> If SLURM was built with PMIx support, then another option is to use that.
> First, srun --mpi=list will show you the list of available MPI
> modules, and then you could
> srun --mpi=pmix_v2 ... MPI_Hellow
> If you believe that should be the default, then you should contact
> your sysadmin, who can make that the default for you.
> 
> If you want to use PMIx, then I recommend you configure Open MPI with
> the same external PMIx that was used to
> build SLURM (e.g. configure --with-pmix=). Though PMIx
> has cross-version support, using the same PMIx avoids running into
> incompatible PMIx versions.
> 
> 
> Cheers,
> 
> Gilles
> On Fri, Nov 23, 2018 at 5:20 PM Lothar Brendel
>  wrote:
>> 
>> Hi guys,
>> 
>> I've always been somewhat at a loss regarding slurm's idea about tasks vs. 
>> jobs. That didn't cause any problems, though, until moving to OpenMPI2 
>> (2.0.2 that is, with slurm 16.05.9).
>> 
>> Running http://mpitutorial.com/tutorials/mpi-hello-world as an example with 
>> just
>> 
>>srun -n 2 MPI-hellow
>> 
>> yields
>> 
>> Hello world from processor node31, rank 0 out of 1 processors
>> Hello world from processor node31, rank 0 out of 1 processors
>> 
>> i.e. the two tasks don't see each other MPI-wise. Well, srun doesn't include 
>> an mpirun.
>> 
>> But running
>> 
>>srun -n 2 mpirun MPI-hellow
>> 
>> produces
>> 
>> Hello world from processor node31, rank 1 out of 2 processors
>> Hello world from processor node31, rank 0 out of 2 processors
>> Hello world from processor node31, rank 1 out of 2 processors
>> Hello world from processor node31, rank 0 out of 2 processors
>> 
>> i.e. I get *two* independent MPI tasks with 2 processors each. (The same 
>> applies if explicitly stating "mpirun -np 2".)
>> I never could make sense of this squaring, so I rather used to run my jobs like
>> 
>>srun -c 2 mpirun -np 2 MPI-hellow
>> 
>> which provided the desired job with *one* task using 2 processors. Moving 
>> from OpenMPI 1.6.5 to 2.0.2 (Debian Jessie -> Stretch), though, I now get 
>> the error
>> "There are not enough slots available in the system to satisfy the 2 slots
>> that were requested by the application:
>>  MPI-hellow".
>> 
>> The environment on the node contains
>> 
>> SLURM_CPUS_ON_NODE=2
>> SLURM_CPUS_PER_TASK=2
>> SLURM_JOB_CPUS_PER_NODE=2
>> SLURM_NTASKS=1
>> SLURM_TASKS_PER_NODE=1
>> 
>> which looks fine to me, but mpirun infers slots=1 from that (confirmed by 
>> ras_base_verbose 5). Indeed, looking into 
>> orte/mca/ras/slurm/ras_slurm_module.c, I find that while 
>> orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its 
>> local variable cpus_per_task, it doesn't use it anywhere. Rather, the number 
>> of slots is determined from SLURM_TASKS_PER_NODE.
>> 
>> Is this intended behaviour?
>> 
>> What's wrong here? I know that I can use --oversubscribe, but that seems 
>> rather a workaround.
>> 
>> Thanks in advance,
>>Lothar


Re: [OMPI users] OpenMPI2 + slurm

2018-11-23 Thread Gilles Gouaillardet
Lothar,

it seems you did not configure Open MPI with --with-pmi=

If SLURM was built with PMIx support, then another option is to use that.
First, srun --mpi=list will show you the list of available MPI
modules, and then you could
srun --mpi=pmix_v2 ... MPI_Hellow
If you believe that should be the default, then you should contact
your sysadmin, who can make that the default for you.

If you want to use PMIx, then I recommend you configure Open MPI with
the same external PMIx that was used to
build SLURM (e.g. configure --with-pmix=). Though PMIx
has cross-version support, using the same PMIx avoids running into
incompatible PMIx versions.
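
Something along these lines (the PMIx install prefix is just an example):

   srun --mpi=list                      # check which PMI flavours Slurm offers
   srun --mpi=pmix_v2 -n 2 MPI_Hellow   # if pmix is listed, launch directly with srun

   # otherwise, rebuild Open MPI against the PMIx that was used for SLURM
   ./configure --with-pmix=/opt/pmix ...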


Cheers,

Gilles
On Fri, Nov 23, 2018 at 5:20 PM Lothar Brendel
 wrote:
>
> Hi guys,
>
> I've always been somewhat at a loss regarding slurm's idea about tasks vs. 
> jobs. That didn't cause any problems, though, until moving to OpenMPI2 
> (2.0.2 that is, with slurm 16.05.9).
>
> Running http://mpitutorial.com/tutorials/mpi-hello-world as an example with 
> just
>
> srun -n 2 MPI-hellow
>
> yields
>
> Hello world from processor node31, rank 0 out of 1 processors
> Hello world from processor node31, rank 0 out of 1 processors
>
> i.e. the two tasks don't see each other MPI-wise. Well, srun doesn't include 
> an mpirun.
>
> But running
>
> srun -n 2 mpirun MPI-hellow
>
> produces
>
> Hello world from processor node31, rank 1 out of 2 processors
> Hello world from processor node31, rank 0 out of 2 processors
> Hello world from processor node31, rank 1 out of 2 processors
> Hello world from processor node31, rank 0 out of 2 processors
>
> i.e. I get *two* independent MPI tasks with 2 processors each. (The same 
> applies if explicitly stating "mpirun -np 2".)
> I never could make sense of this squaring, so I rather used to run my jobs like
>
> srun -c 2 mpirun -np 2 MPI-hellow
>
> which provided the desired job with *one* task using 2 processors. Moving 
> from OpenMPI 1.6.5 to 2.0.2 (Debian Jessie -> Stretch), though, I now get 
> the error
> "There are not enough slots available in the system to satisfy the 2 slots
> that were requested by the application:
>   MPI-hellow".
>
> The environment on the node contains
>
> SLURM_CPUS_ON_NODE=2
> SLURM_CPUS_PER_TASK=2
> SLURM_JOB_CPUS_PER_NODE=2
> SLURM_NTASKS=1
> SLURM_TASKS_PER_NODE=1
>
> which looks fine to me, but mpirun infers slots=1 from that (confirmed by 
> ras_base_verbose 5). Indeed, looking into 
> orte/mca/ras/slurm/ras_slurm_module.c, I find that while 
> orte_ras_slurm_allocate() reads the value of SLURM_CPUS_PER_TASK into its 
> local variable cpus_per_task, it doesn't use it anywhere. Rather, the number 
> of slots is determined from SLURM_TASKS_PER_NODE.
>
> Is this intended behaviour?
>
> What's wrong here? I know that I can use --oversubscribe, but that seems 
> rather a workaround.
>
> Thanks in advance,
> Lothar