Hi,

On 15.11.2010 at 13:13, Chris Jewell wrote:
> Okay so I tried what you suggested. You essentially get the requested number
> of bound cores on each execution node, so if I use
>
> $ qsub -pe openmpi 8 -binding linear:2 <myscript.com>
>
> then I get 2 bound cores per node, irrespective of the number of slots (and
> hence parallel processes) allocated by GE. This is irrespective of which
> setting I use for the allocation_rule.

But it should then work fine with an "allocation_rule 2" (a sketch of such a
PE is at the end of this mail).

> My aim with this was to deal with badly behaved multithreaded algorithms

Yep, this sometimes overloads a machine. When I know that I want to compile a
parallel Open MPI application, I use non-threaded versions of ATLAS, MKL, or
other libraries.

> which end up spreading across more cores on an execution node than the number
> of GE-allocated slots (thereby interfering with other GE-scheduled tasks
> running on the same exec node). By binding a process to one or more cores,
> one can "box in" processes and prevent them from spawning erroneous
> sub-processes and threads. Unfortunately, the above solution sets the core
> binding for every execution node to be the same.
>
> From exploring the software (both OpenMPI and GE) further, I have two
> comments:
>
> 1) The core-binding feature in GE appears to apply the requested core-binding
> topology to every execution node involved in a parallel job, rather than
> assuming that the topology requested is *per parallel process*. So, if I
> request 'qsub -pe mpi 8 -binding linear:1 <myscript.com>' with the intention
> of getting each of the 8 parallel processes bound to 1 core, I actually get
> all processes associated with the job_id on one exec node bound to 1 core.
> Oops!
>
> 2) OpenMPI has its own core-binding feature (-mca mpi_paffinity_alone 1),
> which works well to bind each parallel process to one processor.
> Unfortunately, the binding framework (hwloc) is different to the one GE uses
> (PLPA), resulting in binding overlaps between GE-bound tasks (e.g. serial
> and smp jobs) and OpenMPI-bound processes (i.e. my mpi jobs). Again, oops ;-)
>
> If, indeed, it is not currently possible to implement this type of
> core-binding in a tightly integrated OpenMPI/GE, then a solution might lie in
> a custom script run in the parallel environment's 'start_proc_args'. This
> script would have to find out which slots are allocated where on the cluster,
> and write an OpenMPI rankfile.

Exactly this should work. If you use the binding_instance "pe" and reformat
the information in the $PE_HOSTFILE into a "rankfile", you should get the
desired allocation (a sketch of such a script is also below).

Maybe you can share the script with this list once you get it working.

-- Reuti
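
P.S. For reference, a minimal sketch of how such a PE could look. The PE name
"openmpi" and the script path are only placeholders; "allocation_rule 2"
grants exactly 2 slots per host, to match a "-binding pe linear:2" request:

    $ qconf -sp openmpi
    pe_name            openmpi
    slots              999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /usr/sge/pe/make_rankfile.sh
    stop_proc_args     /bin/true
    allocation_rule    2
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE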
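And a sketch of what the start_proc_args script could do. It assumes that with
the binding_instance "pe" the granted cores show up in the fourth column of
the $PE_HOSTFILE as colon-separated <socket>,<core> pairs (e.g. "0,0:0,1");
please verify the exact format on your installation:

    #!/bin/sh
    # make_rankfile.sh (hypothetical name): convert $PE_HOSTFILE into an
    # Open MPI rankfile, one rank per granted core.
    RANKFILE=$TMPDIR/rankfile
    rank=0
    while read host slots queue cores; do
        # "cores" is assumed to look like "0,0:0,1" (socket,core pairs)
        for pair in $(echo $cores | tr ':' ' '); do
            socket=${pair%,*}
            core=${pair#*,}
            echo "rank $rank=$host slot=$socket:$core" >> $RANKFILE
            rank=$((rank + 1))
        done
    done < $PE_HOSTFILE

The job script can then start the application with "mpirun -rf
$TMPDIR/rankfile ..." instead of relying on mpi_paffinity_alone, so that the
cores Open MPI binds to are exactly the ones GE selected.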