Ralph,
Here is the result of running:
mpirun --map-by slot:pe=4 -display-allocation ./affinity
====================== ALLOCATED NODES ======================
c12-29: slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
rank 0 @ compute-12-29.local 1, 2, 3, 4, 17, 18, 19, 20,
I also attach the output with --mca rmaps_base_verbose 10. It reports 4 slots
everywhere, so it is really strange that it does not work.
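For readers who do not have the ./affinity helper: its source is not part of this thread, so the following is only a minimal sketch, in Python, of a program that prints a line in the same style. The function name format_affinity is hypothetical; OMPI_COMM_WORLD_RANK is the rank variable exported by Open MPI's mpirun, and os.sched_getaffinity is Linux-only.

```python
import os
import socket


def format_affinity(rank, host, cpus):
    """Render one line in the style of the ./affinity output quoted above."""
    cpu_list = "".join(f"{cpu}, " for cpu in sorted(cpus))
    return f"rank {rank} @ {host} {cpu_list}".rstrip()


if __name__ == "__main__":
    # OMPI_COMM_WORLD_RANK is set by Open MPI's mpirun; fall back to 0 otherwise.
    rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of CPUs this process is allowed to run on.
        cpus = os.sched_getaffinity(0)
    else:
        # Assumption for other platforms: report every CPU the OS knows about.
        cpus = set(range(os.cpu_count() or 1))
    print(format_affinity(rank, socket.gethostname(), cpus))
```

Launched once per rank under mpirun, each process would report its own cpuset, which is what the single "rank 0" line above shows for the one process that was started.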
Thanks!
Marcin
[login-0-1.local:30710] mca: base: components_register: registering
rmaps components
[login-0-1.local:30710] mca: base: components_register: found loaded
component round_robin
[login-0-1.local:30710] mca: base: components_register: component
round_robin register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded
component rank_file
[login-0-1.local:30710] mca: base: components_register: component
rank_file register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded
component seq
[login-0-1.local:30710] mca: base: components_register: component seq
register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded
component resilient
[login-0-1.local:30710] mca: base: components_register: component
resilient register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded
component staged
[login-0-1.local:30710] mca: base: components_register: component staged
has no register or open function
[login-0-1.local:30710] mca: base: components_register: found loaded
component mindist
[login-0-1.local:30710] mca: base: components_register: component
mindist register function successful
[login-0-1.local:30710] mca: base: components_register: found loaded
component ppr
[login-0-1.local:30710] mca: base: components_register: component ppr
register function successful
[login-0-1.local:30710] [[61064,0],0] rmaps:base set policy with slot:pe=4
[login-0-1.local:30710] [[61064,0],0] rmaps:base policy slot modifiers
pe=4 provided
[login-0-1.local:30710] [[61064,0],0] rmaps:base check modifiers with pe=4
[login-0-1.local:30710] [[61064,0],0] rmaps:base setting pe/rank to 4
[login-0-1.local:30710] mca: base: components_open: opening rmaps components
[login-0-1.local:30710] mca: base: components_open: found loaded
component round_robin
[login-0-1.local:30710] mca: base: components_open: component
round_robin open function successful
[login-0-1.local:30710] mca: base: components_open: found loaded
component rank_file
[login-0-1.local:30710] mca: base: components_open: component rank_file
open function successful
[login-0-1.local:30710] mca: base: components_open: found loaded
component seq
[login-0-1.local:30710] mca: base: components_open: component seq open
function successful
[login-0-1.local:30710] mca: base: components_open: found loaded
component resilient
[login-0-1.local:30710] mca: base: components_open: component resilient
open function successful
[login-0-1.local:30710] mca: base: components_open: found loaded
component staged
[login-0-1.local:30710] mca: base: components_open: component staged
open function successful
[login-0-1.local:30710] mca: base: components_open: found loaded
component mindist
[login-0-1.local:30710] mca: base: components_open: component mindist
open function successful
[login-0-1.local:30710] mca: base: components_open: found loaded
component ppr
[login-0-1.local:30710] mca: base: components_open: component ppr open
function successful
[login-0-1.local:30710] mca:rmaps:select: checking available component
round_robin
[login-0-1.local:30710] mca:rmaps:select: Querying component [round_robin]
[login-0-1.local:30710] mca:rmaps:select: checking available component
rank_file
[login-0-1.local:30710] mca:rmaps:select: Querying component [rank_file]
[login-0-1.local:30710] mca:rmaps:select: checking available component seq
[login-0-1.local:30710] mca:rmaps:select: Querying component [seq]
[login-0-1.local:30710] mca:rmaps:select: checking available component
resilient
[login-0-1.local:30710] mca:rmaps:select: Querying component [resilient]
[login-0-1.local:30710] mca:rmaps:select: checking available component
staged
[login-0-1.local:30710] mca:rmaps:select: Querying component [staged]
[login-0-1.local:30710] mca:rmaps:select: checking available component
mindist
[login-0-1.local:30710] mca:rmaps:select: Querying component [mindist]
[login-0-1.local:30710] mca:rmaps:select: checking available component ppr
[login-0-1.local:30710] mca:rmaps:select: Querying component [ppr]
[login-0-1.local:30710] [[61064,0],0]: Final mapper priorities
[login-0-1.local:30710] Mapper: ppr Priority: 90
[login-0-1.local:30710] Mapper: seq Priority: 60
[login-0-1.local:30710] Mapper: resilient Priority: 40
[login-0-1.local:30710] Mapper: mindist Priority: 20
[login-0-1.local:30710] Mapper: round_robin Priority: 10
[login-0-1.local:30710] Mapper: staged Priority: 5
[login-0-1.local:30710] Mapper: rank_file Priority: 0
====================== ALLOCATED NODES ======================
c12-29: slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
[login-0-1.local:30710] mca:rmaps: mapping job [61064,1]
[login-0-1.local:30710] mca:rmaps: creating new map for job [61064,1]
[login-0-1.local:30710] AVAILABLE NODES FOR MAPPING:
[login-0-1.local:30710] node: c12-29 daemon: 1
[login-0-1.local:30710] mca:rmaps: nprocs 4
[login-0-1.local:30710] mca:rmaps mapping given - using default
[login-0-1.local:30710] mca:rmaps:ppr: job [61064,1] not using ppr mapper
[login-0-1.local:30710] mca:rmaps:seq: job [61064,1] not using seq mapper
[login-0-1.local:30710] mca:rmaps:resilient: cannot perform initial map
of job [61064,1] - no fault groups
[login-0-1.local:30710] mca:rmaps:mindist: job [61064,1] not using
mindist mapper
[login-0-1.local:30710] mca:rmaps:rr: mapping job [61064,1]
[login-0-1.local:30710] AVAILABLE NODES FOR MAPPING:
[login-0-1.local:30710] node: c12-29 daemon: 1
[login-0-1.local:30710] mca:rmaps:rr: mapping by slot for job [61064,1]
slots 4 num_procs 1
[login-0-1.local:30710] mca:rmaps:rr:slot working node c12-29
[login-0-1.local:30710] mca:rmaps:rr:slot assigning 1 procs to node c12-29
[login-0-1.local:30710] mca:rmaps:base: computing vpids by slot for job
[61064,1]
[login-0-1.local:30710] mca:rmaps:base: assigning rank 0 to node c12-29
[login-0-1.local:30710] mca:rmaps: compute bindings for job [61064,1]
with policy CORE:IF-SUPPORTED[5008]
[login-0-1.local:30710] [[61064,0],0] reset_usage: node c12-29 has 1
procs on it
[login-0-1.local:30710] [[61064,0],0] reset_usage: ignoring proc
[[61064,1],0]
[login-0-1.local:30710] [[61064,0],0] bind_depth: 6 map_depth 0
[login-0-1.local:30710] mca:rmaps: bind downward for job [61064,1] with
bindings CORE:IF-SUPPORTED
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] GOT 1 CPUS
[login-0-1.local:30710] [[61064,0],0] PROC [[61064,1],0] BITMAP 0-3,16-19
[login-0-1.local:30710] [[61064,0],0] BOUND PROC [[61064,1],0][c12-29]
TO socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core
2[hwt 0-1]], socket 0[core 3[hwt 0-1]]:
[BB/BB/BB/BB/../../../..][../../../../../../../..]
rank 0 @ compute-12-29.local 1, 2, 3, 4, 17, 18, 19, 20,
[login-0-1.local:30710] mca: base: close: component round_robin closed
[login-0-1.local:30710] mca: base: close: unloading component round_robin
[login-0-1.local:30710] mca: base: close: component rank_file closed
[login-0-1.local:30710] mca: base: close: unloading component rank_file
[login-0-1.local:30710] mca: base: close: component seq closed
[login-0-1.local:30710] mca: base: close: unloading component seq
[login-0-1.local:30710] mca: base: close: component resilient closed
[login-0-1.local:30710] mca: base: close: unloading component resilient
[login-0-1.local:30710] mca: base: close: component staged closed
[login-0-1.local:30710] mca: base: close: unloading component staged
[login-0-1.local:30710] mca: base: close: component mindist closed
[login-0-1.local:30710] mca: base: close: unloading component mindist
[login-0-1.local:30710] mca: base: close: component ppr closed
[login-0-1.local:30710] mca: base: close: unloading component ppr
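To read the binding report in the log above: the BITMAP line lists the processing units given to rank 0, and the BOUND PROC line shows 4 cores with 2 hardware threads each. A small sketch (parse_cpu_list is a hypothetical helper, not an Open MPI function) that expands such a list confirms that the two agree on 8 processing units:

```python
def parse_cpu_list(text):
    """Expand a Linux-style CPU list such as "0-3,16-19" into sorted ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "16-19" is inclusive on both ends.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


bitmap = parse_cpu_list("0-3,16-19")  # value copied from the BITMAP log line
print(bitmap)
print(len(bitmap))  # 4 cores x 2 hardware threads
```

So pe=4 was honoured for the one rank that was mapped; what the log does not explain is why only one rank was mapped.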
On 10/09/2015 02:07 AM, Ralph Castain wrote:
Hi Marcin
Looking again at this: could you get a similar reservation again and rerun
mpirun with “-display-allocation” added to the command line? I’d like to see if
we are correctly parsing the number of slots assigned in the allocation.
Ralph
On Oct 6, 2015, at 11:52 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com>
wrote:
Thank you both for your suggestion. I still cannot make this work though, and I
think - as Ralph predicted - most problems are likely related to
non-homogeneous mapping of cpus to jobs. But there are problems even before that
part.
If I reserve one entire compute node with SLURM:
salloc --ntasks=16 --tasks-per-node=16
I can run my code as you suggested with _any_ N (including odd numbers!).
OpenMPI will figure out the maximum number of tasks that fits and launch them.
This also works for many complete nodes, but this is the only case when I
managed to get it to work.
If I specify cpus per task, also allocating one full node:
salloc --ntasks=4 --cpus-per-task=4 --tasks-per-node=4
things go astray:
mpirun --map-by slot:pe=4 ./affinity
rank 0 @ compute-1-6.local 0, 1, 2, 3, 16, 17, 18, 19,
Yes, only one MPI process was started. Running what Gilles previously suggested:
$ srun grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list: 0-31
Cpus_allowed_list: 0-31
Cpus_allowed_list: 0-31
Cpus_allowed_list: 0-31
So the allocation seems fine. The SLURM environment is also correct, as far as
I can tell:
SLURM_CPUS_PER_TASK=4
SLURM_JOB_CPUS_PER_NODE=16
SLURM_JOB_NODELIST=c1-6
SLURM_JOB_NUM_NODES=1
SLURM_NNODES=1
SLURM_NODELIST=c1-6
SLURM_NPROCS=4
SLURM_NTASKS=4
SLURM_NTASKS_PER_NODE=4
SLURM_TASKS_PER_NODE=4
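One possible reading of these numbers, offered only as an assumption that nobody in this thread has confirmed: if each rank consumes pe slots, the number of ranks that fit depends on whether slots are counted in tasks or in CPUs. The sketch below just does that arithmetic; ranks_that_fit is a hypothetical name and this is not Open MPI's actual code.

```python
def ranks_that_fit(slots, pe):
    """Number of ranks that fit when each rank consumes `pe` slots."""
    return slots // pe


# Values copied from the SLURM environment listed above.
slurm = {
    "SLURM_NTASKS": 4,
    "SLURM_CPUS_PER_TASK": 4,
    "SLURM_JOB_CPUS_PER_NODE": 16,
}
pe = slurm["SLURM_CPUS_PER_TASK"]

# Hypothesis A: slots are counted in tasks (4 on this node).
by_tasks = ranks_that_fit(slurm["SLURM_NTASKS"], pe)
# Hypothesis B: slots are counted in CPUs (16 on this node).
by_cpus = ranks_that_fit(slurm["SLURM_JOB_CPUS_PER_NODE"], pe)

print(by_tasks, by_cpus)
```

Hypothesis A gives 1 rank and hypothesis B gives 4. The verbose log from the later run on c12-29 reports "slots 4 num_procs 1", which is at least consistent with hypothesis A, assuming that run used the same salloc parameters.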
I do not understand why openmpi does not want to start more than 1 process. If
I try to force it (-n 4) I of course get an error:
mpirun --map-by slot:pe=4 -n 4 ./affinity
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
./affinity
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
For clarity, I will not describe other cases / non-contiguous cpu sets /
heterogeneous nodes. Clearly something is already wrong with the simple ones.
Does anyone have any ideas? Should I record some logs to see what's going on?
Thanks a lot!
Marcin
On 10/06/2015 01:04 AM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph, it's been a long time.
The option "map-by core" does not work when pe=N > 1 is specified.
So, you should use "map-by slot:pe=N" as far as I remember.
Regards,
Tetsuya Mishima
On 2015/10/06 5:40:33, "users" wrote in "Re: [OMPI users] Hybrid OpenMPI+OpenMP
tasks using SLURM":
Hmmm…okay, try -map-by socket:pe=4
We’ll still hit the asymmetric topology issue, but otherwise this should
work
On Oct 5, 2015, at 1:25 PM, marcin.krotkiewski
<marcin.krotkiew...@gmail.com> wrote:
Ralph,
Thank you for a fast response! Sounds very good, unfortunately I get an
error:
$ mpirun --map-by core:pe=4 ./affinity
--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that cannot support that
directive.
Please specify a mapping level that has more than one cpu, or
else let us define a default mapping that will allow multiple
cpus-per-proc.
--------------------------------------------------------------------------
I have allocated my slurm job as
salloc --ntasks=2 --cpus-per-task=4
I have checked in 1.10.0 and 1.10.1rc1.
On 10/05/2015 09:58 PM, Ralph Castain wrote:
You would presently do:
mpirun --map-by core:pe=4
to get what you are seeking. If we don’t already set that qualifier
when we see “cpus_per_task”, then we probably should do so as there isn’t
any reason to make you set it twice (well, other than
trying to track which envar slurm is using now).
On Oct 5, 2015, at 12:38 PM, marcin.krotkiewski
<marcin.krotkiew...@gmail.com> wrote:
Yet another question about cpu binding under the SLURM environment.
Short version: will OpenMPI support SLURM_CPUS_PER_TASK for the
purpose of cpu binding?
Full version: When you allocate a job like, e.g., this
salloc --ntasks=2 --cpus-per-task=4
SLURM will allocate 8 cores in total, 4 for each 'assumed' MPI task.
This is useful for hybrid jobs, where each MPI process spawns some internal
worker threads (e.g., OpenMP). The intention is
that there are 2 MPI procs started, each of them 'bound' to 4 cores.
SLURM will also set an environment variable
SLURM_CPUS_PER_TASK=4
which should (probably?) be taken into account by the method that
launches the MPI processes to figure out the cpuset. In case of OpenMPI +
mpirun I think something should happen in
orte/mca/ras/slurm/ras_slurm_module.c, where the variable _is_ actually
parsed. Unfortunately, it is never really used...
As a result, cpuset of all tasks started on a given compute node
includes all CPU cores of all MPI tasks on that node, just as provided by
SLURM (in the above example - 8). In general, there is
no simple way for the user code in the MPI procs to 'split' the cores
between themselves. I imagine the original intention to support this in
OpenMPI was something like
mpirun --bind-to subtask_cpuset
with an artificial bind target that would cause OpenMPI to divide the
allocated cores between the mpi tasks. Is this right? If so, it seems that
at this point this is not implemented. Are there
plans to do this? If not, does anyone know another way to achieve that?
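To make the intended split concrete, here is a minimal sketch of the division such a bind target would have to perform. It is an illustration only: cpus_for_local_rank is a hypothetical helper, and the naive contiguous slicing ignores sockets and hyperthread siblings, which a real implementation would need to respect.

```python
def cpus_for_local_rank(allowed, local_rank, cpus_per_task):
    """Return the slice of `allowed` CPUs that a given local rank would take."""
    ordered = sorted(allowed)
    start = local_rank * cpus_per_task
    chunk = ordered[start:start + cpus_per_task]
    if len(chunk) < cpus_per_task:
        raise ValueError("not enough CPUs for this local rank")
    return chunk


# Example from this message: 2 tasks with 4 CPUs each on 8 allocated cores.
allowed = range(8)
for local_rank in range(2):
    print(local_rank, cpus_for_local_rank(allowed, local_rank, 4))
```

In a real process, the local rank could be read from OMPI_COMM_WORLD_LOCAL_RANK and the width from SLURM_CPUS_PER_TASK, and on Linux the resulting list could be passed to os.sched_setaffinity(0, chunk). Doing this by hand in every application is exactly the kind of work one would hope the launcher could take over.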
Thanks a lot!
Marcin
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/10/27803.php