On Oct 9, 2015, at 6:14 AM, Marcin Krotkiewski <marcin.krotkiew...@gmail.com>
wrote:
Ralph,
Here is the result of running
mpirun --map-by slot:pe=4 -display-allocation ./affinity
====================== ALLOCATED NODES ======================
c12-29: slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
rank 0 @ compute-12-29.local 1, 2, 3, 4, 17, 18, 19, 20,
I also attach output with --mca rmaps_base_verbose 10. It reports 4 slots
everywhere, so it is really odd that it does not work.
Thanks!
Marcin
[login-0-1.local:30710] mca: base: components_register: registering rmaps
components
[login-0-1.local:30710] mca: base: components_register: found loaded component
round_robin
[login-0-1.local:30710] mca: base: components_register: component round_robin
register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded component
rank_file
[login-0-1.local:30710] mca: base: components_register: component rank_file
register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded component
seq
[login-0-1.local:30710] mca: base: components_register: component seq register
function successful
[login-0-1.local:30710] mca: base: components_register: found loaded component
resilient
[login-0-1.local:30710] mca: base: components_register: component resilient
register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded component
staged
[login-0-1.local:30710] mca: base: components_register: component staged has no
register or open function
[login-0-1.local:30710] mca: base: components_register: found loaded component
mindist
[login-0-1.local:30710] mca: base: components_register: component mindist
register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded component
ppr
[login-0-1.local:30710] mca: base: components_register: component ppr register
function successful
[login-0-1.local:30710] [[61064,0],0] rmaps:base set policy with slot:pe=4
[login-0-1.local:30710] [[61064,0],0] rmaps:base policy slot modifiers pe=4
provided
[login-0-1.local:30710] [[61064,0],0] rmaps:base check modifiers with pe=4
[login-0-1.local:30710] [[61064,0],0] rmaps:base setting pe/rank to 4
[login-0-1.local:30710] mca: base: components_open: opening rmaps components
[login-0-1.local:30710] mca: base: components_open: found loaded component
round_robin
[login-0-1.local:30710] mca: base: components_open: component round_robin open
function successful
[login-0-1.local:30710] mca: base: components_open: found loaded component
rank_file
[login-0-1.local:30710] mca: base: components_open: component rank_file open
function successful
[login-0-1.local:30710] mca: base: components_open: found loaded component seq
[login-0-1.local:30710] mca: base: components_open: component seq open function
successful
[login-0-1.local:30710] mca: base: components_open: found loaded component
resilient
[login-0-1.local:30710] mca: base: components_open: component resilient open
function successful
[login-0-1.local:30710] mca: base: components_open: found loaded component
staged
[login-0-1.local:30710] mca: base: components_open: component staged open
function successful
[login-0-1.local:30710] mca: base: components_open: found loaded component
mindist
[login-0-1.local:30710] mca: base: components_open: component mindist open
function successful
[login-0-1.local:30710] mca: base: components_open: found loaded component ppr
[login-0-1.local:30710] mca: base: components_open: component ppr open function
successful
[login-0-1.local:30710] mca:rmaps:select: checking available component
round_robin
[login-0-1.local:30710] mca:rmaps:select: Querying component [round_robin]
[login-0-1.local:30710] mca:rmaps:select: checking available component rank_file
[login-0-1.local:30710] mca:rmaps:select: Querying component [rank_file]
[login-0-1.local:30710] mca:rmaps:select: checking available component seq
[login-0-1.local:30710] mca:rmaps:select: Querying component [seq]
[login-0-1.local:30710] mca:rmaps:select: checking available component resilient
[login-0-1.local:30710] mca:rmaps:select: Querying component [resilient]
[login-0-1.local:30710] mca:rmaps:select: checking available component staged
[login-0-1.local:30710] mca:rmaps:select: Querying component [staged]
[login-0-1.local:30710] mca:rmaps:select: checking available component mindist
[login-0-1.local:30710] mca:rmaps:select: Querying component [mindist]
[login-0-1.local:30710] mca:rmaps:select: checking available component ppr
[login-0-1.local:30710] mca:rmaps:select: Querying component [ppr]
[login-0-1.local:30710] [[61064,0],0]: Final mapper priorities
[login-0-1.local:30710] Mapper: ppr Priority: 90
[login-0-1.local:30710] Mapper: seq Priority: 60
[login-0-1.local:30710] Mapper: resilient Priority: 40
[login-0-1.local:30710] Mapper: mindist Priority: 20
[login-0-1.local:30710] Mapper: round_robin Priority: 10
[login-0-1.local:30710] Mapper: staged Priority: 5
[login-0-1.local:30710] Mapper: rank_file Priority: 0
====================== ALLOCATED NODES ======================
c12-29: slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
[login-0-1.local:30710] mca:rmaps: mapping job [61064,1]
[login-0-1.local:30710] mca:rmaps: creating new map for job [61064,1]
[login-0-1.local:30710] AVAILABLE NODES FOR MAPPING:
[login-0-1.local:30710] node: c12-29 daemon: 1
[login-0-1.local:30710] mca:rmaps: nprocs 4
[login-0-1.local:30710] mca:rmaps mapping given - using default
[login-0-1.local:30710] mca:rmaps:ppr: job [61064,1] not using ppr mapper
[login-0-1.local:30710] mca:rmaps:seq: job [61064,1] not using seq mapper
[login-0-1.local:30710] mca:rmaps:resilient: cannot perform initial map of job
[61064,1] - no fault groups
[login-0-1.local:30710] mca:rmaps:mindist: job [61064,1] not using mindist
mapper
[login-0-1.local:30710] mca:rmaps:rr: mapping job [61064,1]
[login-0-1.local:30710] AVAILABLE NODES FOR MAPPING:
[login-0-1.local:30710] node: c12-29 daemon: 1
[login-0-1.local:30710] mca:rmaps:rr: mapping by slot for job [61064,1] slots 4
num_procs 1
[login-0-1.local:30710] mca:rmaps:rr:slot working node c12-29
[login-0-1.local:30710] mca:rmaps:rr:slot assigning 1 procs to node c12-29
[login-0-1.local:30710] mca:rmaps:base: computing vpids by slot for job
[61064,1]
[login-0-1.local:30710] mca:rmaps:base: assigning rank 0 to node c12-29
[login-0-1.local:30710] mca:rmaps: compute bindings for job [61064,1] with
policy CORE:IF-SUPPORTED[5008]
[login-0-1.local:30710] [[61064,0],0] reset_usage: node c12-29 has 1 procs on it
[login-0-1.local:30710] [[61064,0],0] reset_usage: ignoring proc [[61064,1],0]
[login-0-1.local:30710] [[61064,0],0] bind_depth: 6 map_depth 0
[login-0-1.local:30710] mca:rmaps: bind downward for job [61064,1] with
bindings CORE:IF-SUPPORTED
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] PROC [[61064,1],0] BITMAP 0-3,16-19
[login-0-1.local:30710] [[61064,0],0] BOUND PROC [[61064,1],0][c12-29] TO
socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt
0-1]], socket 0[core 3[hwt 0-1]]:
[BB/BB/BB/BB/../../../..][../../../../../../../..]
rank 0 @ compute-12-29.local 1, 2, 3, 4, 17, 18, 19, 20,
[login-0-1.local:30710] mca: base: close: component round_robin closed
[login-0-1.local:30710] mca: base: close: unloading component round_robin
[login-0-1.local:30710] mca: base: close: component rank_file closed
[login-0-1.local:30710] mca: base: close: unloading component rank_file
[login-0-1.local:30710] mca: base: close: component seq closed
[login-0-1.local:30710] mca: base: close: unloading component seq
[login-0-1.local:30710] mca: base: close: component resilient closed
[login-0-1.local:30710] mca: base: close: unloading component resilient
[login-0-1.local:30710] mca: base: close: component staged closed
[login-0-1.local:30710] mca: base: close: unloading component staged
[login-0-1.local:30710] mca: base: close: component mindist closed
[login-0-1.local:30710] mca: base: close: unloading component mindist
[login-0-1.local:30710] mca: base: close: component ppr closed
[login-0-1.local:30710] mca: base: close: unloading component ppr
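One reading of the log above, offered as an assumption and not as a confirmed description of Open MPI's mapper: the allocation reports slots=4, the pe=4 modifier asks for 4 processing elements per rank, and the round_robin mapper then reports "slots 4 num_procs 1". The minimal Python sketch below only reproduces that arithmetic. The function names and the integer-division rule are hypothetical, chosen to match the numbers in the log.

```python
import re


def parse_slots(allocation_line: str) -> int:
    """Extract the slots=N value from a -display-allocation node line."""
    match = re.search(r"\bslots=(\d+)", allocation_line)
    if match is None:
        raise ValueError(f"no slots= field in {allocation_line!r}")
    return int(match.group(1))


def procs_that_fit(slots: int, pe: int) -> int:
    # Assumption (not taken from the Open MPI source): every rank
    # consumes `pe` slots, so only slots // pe ranks can be placed.
    return slots // pe


line = "c12-29: slots=4 max_slots=0 slots_inuse=0 state=UP"
slots = parse_slots(line)
print(procs_that_fit(slots, pe=4))   # 1, matching "num_procs 1" in the log
print(procs_that_fit(16, pe=4))      # 4, if the node were counted as 16 slots
```

If this reading is right, the slot count taken from the allocation (4 tasks) and the pe modifier (4 cpus per task) are both being applied to the same number, which would explain why only one rank is mapped.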
On 10/09/2015 02:07 AM, Ralph Castain wrote:
Hi Marcin
Looking again at this: could you get a similar reservation and rerun
mpirun with “-display-allocation” added to the command line? I’d like to see if
we are correctly parsing the number of slots assigned in the allocation.
Ralph
On Oct 6, 2015, at 11:52 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com>
wrote:
Thank you both for your suggestions. I still cannot make this work though, and I
think - as Ralph predicted - most problems are likely related to
non-homogeneous mapping of cpus to jobs. But there are problems even before that
part.
If I reserve one entire compute node with SLURM:
salloc --ntasks=16 --tasks-per-node=16
I can run my code as you suggested with _any_ N (including odd numbers!).
OpenMPI will figure out the maximum number of tasks that fit and launch them.
This also works for multiple complete nodes, but it is the only case I have
managed to get working.
If I specify cpus per task, also allocating one full node
salloc --ntasks=4 --cpus-per-task=4 --tasks-per-node=4
things go astray:
mpirun --map-by slot:pe=4 ./affinity
rank 0 @ compute-1-6.local 0, 1, 2, 3, 16, 17, 18, 19,
Yes, only one MPI process was started. Running what Gilles previously suggested:
$ srun grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list: 0-31
Cpus_allowed_list: 0-31
Cpus_allowed_list: 0-31
Cpus_allowed_list: 0-31
So the allocation seems fine. The SLURM environment is also correct, as far as
I can tell:
SLURM_CPUS_PER_TASK=4
SLURM_JOB_CPUS_PER_NODE=16
SLURM_JOB_NODELIST=c1-6
SLURM_JOB_NUM_NODES=1
SLURM_NNODES=1
SLURM_NODELIST=c1-6
SLURM_NPROCS=4
SLURM_NTASKS=4
SLURM_NTASKS_PER_NODE=4
SLURM_TASKS_PER_NODE=4
I do not understand why OpenMPI will not start more than 1 process. If
I try to force it (-n 4), I of course get an error:
mpirun --map-by slot:pe=4 -n 4 ./affinity
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
./affinity
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
For clarity, I will not describe other cases / non-contiguous cpu sets /
heterogeneous nodes. Clearly something is wrong already with the simple ones..
Does anyone have any ideas? Should I record some logs to see what's going on?
Thanks a lot!
Marcin
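The consistency check described in this message can be sketched in Python. The helper below expands a Cpus_allowed_list value such as 0-31 into individual CPU ids, and a second helper divides the per-node CPU count by the CPUs per task to get the number of tasks one would expect on the node. Both function names are hypothetical, and the sketch assumes SLURM_JOB_CPUS_PER_NODE is a plain integer, as it is in the listing above.

```python
from typing import List


def expand_cpu_list(spec: str) -> List[int]:
    """Expand a kernel-style CPU list such as '0-3,16-19' into CPU ids."""
    cpus: List[int] = []
    for part in spec.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def expected_tasks(job_cpus_per_node: int, cpus_per_task: int) -> int:
    # Assumes a plain integer SLURM_JOB_CPUS_PER_NODE value.
    return job_cpus_per_node // cpus_per_task


print(len(expand_cpu_list("0-31")))   # 32 logical CPUs visible to each task
print(expected_tasks(16, 4))          # 4, agreeing with SLURM_NTASKS_PER_NODE=4
```

With the values shown above, SLURM's own view is self-consistent (16 CPUs, 4 per task, 4 tasks), so the discrepancy appears on the mpirun side.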
On 10/06/2015 01:04 AM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph, it's been a long time.
The option "map-by core" does not work when pe=N > 1 is specified.
So, you should use "map-by slot:pe=N" as far as I remember.
Regards,
Tetsuya Mishima
On 2015/10/06 5:40:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP
tasks using SLURM":
Hmmm…okay, try -map-by socket:pe=4
We’ll still hit the asymmetric topology issue, but otherwise this should
work
On Oct 5, 2015, at 1:25 PM, marcin.krotkiewski
<marcin.krotkiew...@gmail.com> wrote:
Ralph,
Thank you for a fast response! Sounds very good, but unfortunately I get an
error:
$ mpirun --map-by core:pe=4 ./affinity
--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that cannot support that
directive.
Please specify a mapping level that has more than one cpu, or
else let us define a default mapping that will allow multiple
cpus-per-proc.
--------------------------------------------------------------------------
I have allocated my slurm job as
salloc --ntasks=2 --cpus-per-task=4
I have checked in 1.10.0 and 1.10.1rc1.
On 10/05/2015 09:58 PM, Ralph Castain wrote:
You would presently do:
mpirun --map-by core:pe=4
to get what you are seeking. If we don’t already set that qualifier
when we see “cpus_per_task”, then we probably should do so as there isn’t
any reason to make you set it twice (well, other than
trying to track which envar slurm is using now).
On Oct 5, 2015, at 12:38 PM, marcin.krotkiewski
<marcin.krotkiew...@gmail.com> wrote:
Yet another question about cpu binding under SLURM environment..
Short version: will OpenMPI support SLURM_CPUS_PER_TASK for the
purpose of cpu binding?
Full version: When you allocate a job like, e.g., this
salloc --ntasks=2 --cpus-per-task=4
SLURM will allocate 8 cores in total, 4 for each 'assumed' MPI task.
This is useful for hybrid jobs, where each MPI process spawns some internal
worker threads (e.g., OpenMP). The intention is
that there are 2 MPI procs started, each of them 'bound' to 4 cores.
SLURM will also set an environment variable
SLURM_CPUS_PER_TASK=4
which should (probably?) be taken into account by the method that
launches the MPI processes to figure out the cpuset. In case of OpenMPI +
mpirun I think something should happen in
orte/mca/ras/slurm/ras_slurm_module.c, where the variable _is_ actually
parsed. Unfortunately, it is never really used...
As a result, the cpuset of every task started on a given compute node
includes all CPU cores of all MPI tasks on that node, just as provided by
SLURM (in the above example - 8). In general, there is
no simple way for the user code in the MPI procs to 'split' the cores
between themselves. I imagine the original intention to support this in
OpenMPI was something like
mpirun --bind-to subtask_cpuset
with an artificial bind target that would cause OpenMPI to divide the
allocated cores between the MPI tasks. Is this right? If so, it seems that
at this point this is not implemented. Are there
plans to do this? If not, does anyone know another way to achieve that?
Thanks a lot!
Marcin
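A manual workaround along the lines asked about here could look like the following Python sketch. It assumes each process can learn its node-local rank (Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK to the processes it launches) and that every process on the node starts with the same SLURM-provided cpuset. The names split_cpuset and bind_current_process and the contiguous-chunk policy are hypothetical. The sketch ignores sockets and hardware threads, so it is not a substitute for a real bind target inside mpirun.

```python
import os


def split_cpuset(cpus, local_rank, cpus_per_task):
    """Return the slice of `cpus` that belongs to one node-local rank."""
    ordered = sorted(cpus)
    start = local_rank * cpus_per_task
    chunk = ordered[start:start + cpus_per_task]
    if len(chunk) < cpus_per_task:
        raise ValueError("not enough CPUs in the cpuset for this local rank")
    return set(chunk)


def bind_current_process():
    """Restrict the calling process to its share of the node's cpuset."""
    # Assumptions: launched by Open MPI's mpirun inside a SLURM allocation,
    # on Linux (os.sched_getaffinity / os.sched_setaffinity are Linux-only).
    local_rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    cpus_per_task = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    allowed = os.sched_getaffinity(0)
    mine = split_cpuset(allowed, local_rank, cpus_per_task)
    os.sched_setaffinity(0, mine)
    return mine


# With the 8-core example above: local rank 0 gets the first 4 CPUs,
# local rank 1 gets the remaining 4.
print(sorted(split_cpuset(range(8), 0, 4)))   # [0, 1, 2, 3]
print(sorted(split_cpuset(range(8), 1, 4)))   # [4, 5, 6, 7]
```

Calling bind_current_process() early in each rank, before worker threads are created, would let the threads inherit the narrowed cpuset. Whether contiguous chunks correspond to physical cores depends on how the node numbers its CPUs.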
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/10/27803.php