Hi,

On 15.11.2010 at 13:13, Chris Jewell wrote:

> Okay so I tried what you suggested.  You essentially get the requested number 
> of bound cores on each execution node, so if I use
> 
> $ qsub -pe openmpi 8 -binding linear:2 <myscript.com>
> 
> then I get 2 bound cores per node, irrespective of the number of slots (and 
> hence parallel processes) allocated by GE.  This holds irrespective of which 
> setting I use for the allocation_rule.

but it should work fine with "allocation_rule 2" then: the two bound cores 
per node match the two slots GE allocates per node.
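
For reference, a minimal sketch of such a PE definition (untested; the name 
"openmpi" and the slot count are placeholders, check "qconf -sp" for your 
actual setup):

pe_name            openmpi
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    2
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

An 8-slot job would then be spread as 2 slots on each of 4 nodes.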


> My aim with this was to deal with badly behaved multithreaded algorithms

Yep, this sometimes causes the overloading of a machine. When I know that I 
want to compile a parallel Open MPI application, I use non-threaded versions of 
ATLAS, MKL, or other libraries.
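
If rebuilding against non-threaded libraries is not an option, capping the 
thread count per process in the job script is a rough alternative (OpenMP 
builds honour OMP_NUM_THREADS, and MKL additionally checks MKL_NUM_THREADS):

# in the job script, before starting mpirun
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1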


> which end up spreading across more cores on an execution node than the number 
> of GE-allocated slots (thereby interfering with other GE scheduled tasks 
> running on the same exec node).  By binding a process to one or more cores, 
> one can "box in" processes and prevent them from spawning erroneous 
> sub-processes and threads.  Unfortunately, the above solution applies the 
> same core binding on every execution node.
> 
> From exploring the software (both OpenMPI and GE) further, I have two 
> comments:
> 
> 1) The core binding feature in GE appears to apply the requested core-binding 
> topology to every execution node involved in a parallel job, rather than 
> assuming that the topology requested is *per parallel process*.  So, if I 
> request 'qsub -pe mpi 8 -binding linear:1 <myscript.com>' with the intention 
> of getting each of the 8 parallel processes to be bound to 1 core, I actually 
> get all processes associated with the job_id on one exec node bound to 1 
> core.  Oops!
> 
> 2) OpenMPI has its own core-binding feature (-mca mpi_paffinity_alone 1) 
> which works well to bind each parallel process to one processor.  
> Unfortunately, the binding framework (hwloc) is different from the one GE 
> uses (PLPA), resulting in binding overlaps between GE-bound tasks (e.g. serial 
> and smp jobs) and OpenMPI-bound processes (i.e. my mpi jobs).  Again, oops ;-)

> If, indeed, it is currently not possible to implement this type of 
> core-binding in tightly integrated OpenMPI/GE, then a solution might lie in a 
> custom script run in the parallel environment's 'start_proc_args'.  This 
> script would have to find out which slots are allocated where on the cluster, 
> and write an OpenMPI rankfile.

Exactly this should work.

If you use the binding instance "pe" (i.e. "qsub -binding pe ..."), GE records 
the selected cores in the fourth column of $PE_HOSTFILE instead of binding the 
processes itself. If you reformat this information into a "rankfile", you 
should get the desired allocation. Maybe you can share the script with this 
list once you get it working.
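
An untested sketch of such a script (it assumes $PE_HOSTFILE is readable at 
that point, and it ignores the core list GE writes into the fourth column, 
simply numbering the cores per node from 0, which is only safe if nothing 
else is bound on those nodes):

#!/bin/sh
# Convert $PE_HOSTFILE into an Open MPI rankfile:
# one rank per GE slot, pinned to consecutive cores on each node.
RANKFILE=$TMPDIR/rankfile
rank=0
while read host slots queue cores; do
    core=0
    while [ "$core" -lt "$slots" ]; do
        echo "rank $rank=$host slot=$core" >> "$RANKFILE"
        rank=$((rank + 1))
        core=$((core + 1))
    done
done < "$PE_HOSTFILE"

The job script would then start the application with "mpirun -rf 
$TMPDIR/rankfile ..." instead of relying on mpi_paffinity_alone.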

-- Reuti
