Hi All,

I just saw the same behaviour with our old Torque (2.5, which uses cpusets)
and OpenMPI 1.10.0: when --bind-to core is set, the launch occasionally
(but not always) fails with:

Open MPI tried to bind a new process, but something went wrong.  The
process was killed without launching the target application.  Your job
will now abort.

  Local host:        nXXX
  Application name:  /global/software/espresso-5.2.1-intel14-ompi110/bin/pw.x
  Error message:     hwloc_set_cpubind returned "Error" for bitmap "0"
  Location:          ../../../../../openmpi-1.10.0/orte/mca/odls/default/odls_default_module.c:551



-- 
Grigory Shamov

Westgrid/ComputeCanada Site Lead
University of Manitoba
E2-588 EITC Building,
(204) 474-9625






On 15-10-02 10:25 AM, "users on behalf of Marcin Krotkiewski"
<users-boun...@open-mpi.org on behalf of marcin.krotkiew...@gmail.com>
wrote:

>Hi,
>
>I cannot get OpenMPI to bind to cores correctly when running within
>SLURM-allocated CPU resources spread over a range of compute nodes in an
>otherwise homogeneous cluster. I have found this thread
>
>http://www.open-mpi.org/community/lists/users/2014/06/24682.php
>
>and did try to use what Ralph suggested there (--hetero-nodes), but it
>does not work (v. 1.10.0). When running with --report-bindings I get
>messages like
>
>[compute-9-11.local:27571] MCW rank 10 is not bound (or bound to all
>available processors)
>
>for all ranks outside of my first physical compute node. Moreover,
>everything works as expected if I ask SLURM to assign entire compute
>nodes. So it does look like Ralph's diagnosis in that thread is correct;
>the --hetero-nodes switch just does not work for me.
>
>I have written a short program that uses sched_getaffinity to print the
>effective bindings: all MPI ranks except those on the first node are
>bound to all CPU cores allocated by SLURM.
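
For anyone who wants to repeat this check: Marcin's program was not posted,
so the following is only a rough sketch of the same idea, written in Python
rather than C. It assumes Linux (os.sched_getaffinity is not available
elsewhere) and assumes the process was started by Open MPI's mpirun, which
exports OMPI_COMM_WORLD_RANK; without it the rank prints as "?". The helper
names format_cpus and describe_binding are made up for this example.

```python
import os
import socket


def format_cpus(cpus):
    """Collapse a set of CPU ids into a compact range string such as 0-3,8."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu          # open the first range
        elif cpu == prev + 1:
            prev = cpu                  # extend the current range
        else:
            ranges.append((start, prev))
            start = prev = cpu          # gap found: open a new range
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def describe_binding():
    """Report host, rank, pid and the CPUs this process may run on."""
    # Assumption: Open MPI's launcher sets this variable for each rank.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    # pid 0 means "the calling process"; Linux only.
    cpus = os.sched_getaffinity(0)
    return (f"{socket.gethostname()} rank {rank} "
            f"pid {os.getpid()} cpus {format_cpus(cpus)}")


if __name__ == "__main__":
    print(describe_binding())
```

Launched with the same mpirun options as the real job, each rank prints one
line. A rank bound to a single core should list one CPU id, while a rank that
lists every core in the allocation is effectively unbound.
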
>
>Do I need to do anything besides --hetero-nodes, or is this a problem
>that needs further investigation?
>
>Thanks a lot!
>
>Marcin
>
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post:
>http://www.open-mpi.org/community/lists/users/2015/10/27770.php
