For the time being, you can `srun --ntasks-per-node 24 --jobid=...` when joining the allocation.
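One way to fill in the `--jobid` value is the `squeue` lookup Maksym uses below (`squeue -u $USER -o %A | tail -n 1`, i.e. take the last jobid printed). A minimal sketch of that lookup with mocked `squeue` output, so it runs without a live SLURM installation; the job ids are placeholders:

```shell
# Stand-in for: squeue -u $USER -o %A
# (prints the JOBID header, then one line per job; ids are made up)
mock_squeue() {
  printf 'JOBID\n1001\n1002\n'
}

# tail -n 1 picks the last line, i.e. the most recently listed jobid
jobid=$(mock_squeue | tail -n 1)

# The join command as it would be issued (printed here, not executed)
echo "srun --ntasks-per-node 24 --jobid=$jobid --pty bash"
```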
This use case looks a bit convoluted to me, so I am not even sure we should consider this a bug in Open MPI.

Ralph, any thoughts?

Cheers,

Gilles

Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Thanks, now I can reproduce the issue.
>
> Cheers,
>
> Gilles
>
> On 9/8/2017 5:20 PM, Maksym Planeta wrote:
>> I start an interactive allocation, and I just noticed that the problem
>> happens when I join this allocation from another shell.
>>
>> Here is how I join:
>>
>> srun --pty --x11 --jobid=$(squeue -u $USER -o %A | tail -n 1) bash
>>
>> And here is how I create the allocation:
>>
>> srun --pty --nodes 8 --ntasks-per-node 24 --mem 50G --time=3:00:00
>> --partition=haswell --x11 bash
>>
>> On 09/08/2017 09:58 AM, Gilles Gouaillardet wrote:
>>> Maksym,
>>>
>>> Can you please post your sbatch script?
>>>
>>> FWIW, I am unable to reproduce the issue with the latest v2.x from GitHub.
>>>
>>> By any chance, would you be able to test the latest Open MPI 2.1.2rc3?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 9/8/2017 4:19 PM, Maksym Planeta wrote:
>>>> Indeed mpirun shows slots=1 per node, but I create the allocation with
>>>> --ntasks-per-node 24, so I do have all cores of the node allocated.
>>>>
>>>> When I use srun I can get all the cores.
>>>>
>>>> On 09/07/2017 02:12 PM, r...@open-mpi.org wrote:
>>>>> My best guess is that SLURM has only allocated 2 slots, and we
>>>>> respect the RM regardless of what you say in the hostfile. You can
>>>>> check this by adding --display-allocation to your cmd line. You
>>>>> probably need to tell SLURM to allocate more cpus/node.
>>>>>
>>>>>> On Sep 7, 2017, at 3:33 AM, Maksym Planeta
>>>>>> <mplan...@os.inf.tu-dresden.de> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to tell Open MPI how many processes per node I want to
>>>>>> use, but mpirun seems to ignore the configuration I provide.
>>>>>>
>>>>>> I create the following hostfile:
>>>>>>
>>>>>> $ cat hostfile.16
>>>>>> taurusi6344 slots=16
>>>>>> taurusi6348 slots=16
>>>>>>
>>>>>> And then start the app as follows:
>>>>>>
>>>>>> $ mpirun --display-map -machinefile hostfile.16 -np 2 hostname
>>>>>> Data for JOB [42099,1] offset 0
>>>>>>
>>>>>> ======================== JOB MAP ========================
>>>>>>
>>>>>> Data for node: taurusi6344  Num slots: 1  Max slots: 0  Num procs: 1
>>>>>> Process OMPI jobid: [42099,1] App: 0 Process rank: 0 Bound:
>>>>>> socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core
>>>>>> 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket
>>>>>> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]],
>>>>>> socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core
>>>>>> 10[hwt 0]], socket 0[core 11[hwt 0]]:
>>>>>> [B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
>>>>>>
>>>>>> Data for node: taurusi6348  Num slots: 1  Max slots: 0  Num procs: 1
>>>>>> Process OMPI jobid: [42099,1] App: 0 Process rank: 1 Bound:
>>>>>> socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core
>>>>>> 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket
>>>>>> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]],
>>>>>> socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core
>>>>>> 10[hwt 0]], socket 0[core 11[hwt 0]]:
>>>>>> [B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
>>>>>>
>>>>>> =============================================================
>>>>>> taurusi6344
>>>>>> taurusi6348
>>>>>>
>>>>>> If I put anything more than 2 in "-np", I get the following error
>>>>>> message:
>>>>>>
>>>>>> $ mpirun --display-map -machinefile hostfile.16 -np 4 hostname
>>>>>> --------------------------------------------------------------------------
>>>>>> There are not enough slots available in the system to satisfy the 4
>>>>>> slots that were requested by the application:
>>>>>>   hostname
>>>>>>
>>>>>> Either request fewer slots for your application, or make more slots
>>>>>> available for use.
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> The Open MPI version is "mpirun (Open MPI) 2.1.0".
>>>>>>
>>>>>> Also, SLURM is installed, version "slurm 16.05.7-Bull.1.1-20170512-1252".
>>>>>>
>>>>>> Could you help me make Open MPI respect the slots parameter?
>>>>>> --
>>>>>> Regards,
>>>>>> Maksym Planeta
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users@lists.open-mpi.org
>>>>>> https://lists.open-mpi.org/mailman/listinfo/users
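For reference, a machinefile like Maksym's `hostfile.16` can be generated from a node list rather than written by hand. A minimal sketch, reusing the node names and slot count from the thread (the loop itself is just an illustration):

```shell
# Generate a machinefile with an explicit slots count per node,
# in the format mpirun's -machinefile option expects.
slots=16
for node in taurusi6344 taurusi6348; do
  echo "$node slots=$slots"
done > hostfile.16

# Show the result, matching the "$ cat hostfile.16" output in the thread
cat hostfile.16
```

Note that, as Ralph points out in the thread, mpirun honors the resource manager's allocation over the hostfile when running under SLURM, so the slots counts here only take effect outside a managed allocation (or when the allocation itself grants that many CPUs per node).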