For the time being, when joining the allocation, you can

srun --ntasks-per-node 24 --jobid=...
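
For example, reusing the job id lookup from your earlier message, something like

srun --pty --ntasks-per-node 24 --jobid=$(squeue -u $USER -o %A | tail -n 1) bash

should give the joined shell all 24 slots per node.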

This use case looks a bit convoluted to me, so I am not even sure we should
consider this a bug in Open MPI.

Ralph, any thoughts?

Cheers,

Gilles

Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>Thanks, now I can reproduce the issue.
>
>
>Cheers,
>
>
>Gilles
>
>
>On 9/8/2017 5:20 PM, Maksym Planeta wrote:
>> I start an interactive allocation, and I just noticed that the problem 
>> happens when I join this allocation from another shell.
>>
>> Here is how I join:
>>
>> srun --pty --x11 --jobid=$(squeue -u $USER -o %A | tail -n 1) bash
>>
>> And here is how I create the allocation:
>>
>> srun --pty --nodes 8 --ntasks-per-node 24 --mem 50G --time=3:00:00 
>> --partition=haswell --x11 bash
>>
>>
>> On 09/08/2017 09:58 AM, Gilles Gouaillardet wrote:
>>> Maksym,
>>>
>>>
>>> Can you please post your sbatch script?
>>>
>>> FWIW, I am unable to reproduce the issue with the latest v2.x from GitHub.
>>>
>>>
>>> By any chance, would you be able to test the latest Open MPI 2.1.2rc3?
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>> On 9/8/2017 4:19 PM, Maksym Planeta wrote:
>>>> Indeed, mpirun shows slots=1 per node, but I create the allocation with
>>>> --ntasks-per-node 24, so I do have all the cores of the node allocated.
>>>>
>>>> When I use srun, I can get all the cores.
>>>>
>>>> On 09/07/2017 02:12 PM, r...@open-mpi.org wrote:
>>>>> My best guess is that SLURM has only allocated 2 slots, and we
>>>>> respect the RM regardless of what you say in the hostfile. You can
>>>>> check this by adding --display-allocation to your command line. You
>>>>> probably need to tell SLURM to allocate more cpus/node.
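>>>>>
>>>>> For example, something like
>>>>>
>>>>> mpirun --display-allocation -machinefile hostfile.16 -np 2 hostname
>>>>>
>>>>> should print the allocation Open MPI actually sees before the job map.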
>>>>>
>>>>>
>>>>>> On Sep 7, 2017, at 3:33 AM, Maksym Planeta
>>>>>> <mplan...@os.inf.tu-dresden.de> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to tell Open MPI how many processes per node I want to
>>>>>> use, but mpirun seems to ignore the configuration I provide.
>>>>>>
>>>>>> I create the following hostfile:
>>>>>>
>>>>>> $ cat hostfile.16
>>>>>> taurusi6344 slots=16
>>>>>> taurusi6348 slots=16
>>>>>>
>>>>>> And then start the app as follows:
>>>>>>
>>>>>> $ mpirun --display-map   -machinefile hostfile.16 -np 2 hostname
>>>>>> Data for JOB [42099,1] offset 0
>>>>>>
>>>>>> ========================   JOB MAP   ========================
>>>>>>
>>>>>> Data for node: taurusi6344     Num slots: 1    Max slots: 0    Num
>>>>>> procs: 1
>>>>>>           Process OMPI jobid: [42099,1] App: 0 Process rank: 0 Bound:
>>>>>> socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core
>>>>>> 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket
>>>>>> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]],
>>>>>> socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core
>>>>>> 10[hwt 0]], socket 0[core 11[hwt
>>>>>> 0]]:[B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
>>>>>>
>>>>>> Data for node: taurusi6348     Num slots: 1    Max slots: 0    Num
>>>>>> procs: 1
>>>>>>           Process OMPI jobid: [42099,1] App: 0 Process rank: 1 Bound:
>>>>>> socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core
>>>>>> 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket
>>>>>> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]],
>>>>>> socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core
>>>>>> 10[hwt 0]], socket 0[core 11[hwt
>>>>>> 0]]:[B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
>>>>>>
>>>>>> =============================================================
>>>>>> taurusi6344
>>>>>> taurusi6348
>>>>>>
>>>>>> If I put anything more than 2 in "-np", I get the following error
>>>>>> message:
>>>>>>
>>>>>> $ mpirun --display-map   -machinefile hostfile.16 -np 4 hostname
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> There are not enough slots available in the system to satisfy the 4
>>>>>> slots
>>>>>> that were requested by the application:
>>>>>>     hostname
>>>>>>
>>>>>> Either request fewer slots for your application, or make more slots
>>>>>> available
>>>>>> for use.
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> The Open MPI version is "mpirun (Open MPI) 2.1.0".
>>>>>>
>>>>>> Also, SLURM is installed, version "slurm
>>>>>> 16.05.7-Bull.1.1-20170512-1252".
>>>>>>
>>>>>> Could you help me make Open MPI respect the slots parameter?
>>>>>> -- 
>>>>>> Regards,
>>>>>> Maksym Planeta
>>>>>>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
