Hi,

> > If, indeed, it is not possible currently to implement this type of
> > core-binding in tightly integrated OpenMPI/GE, then a solution might lie in
> > a custom script run in the parallel environment's 'start proc args'. This
> > script would have to find out which slots are allocated where on the
> > cluster, and write an OpenMPI rankfile.
>
> Exactly this should work.
>
> If you use "binding_instance" "pe" and reformat the information in the
> $PE_HOSTFILE to a "rankfile", it should work to get the desired allocation.
> Maybe you can share the script with this list once you got it working.

As far as I can see, that's not going to work. This is because, exactly like
"binding_instance" "set", for -binding pe linear:n you get n cores bound per
node. This is easily verifiable by using a long job and examining the
pe_hostfile. For example, I submit a job with:

    $ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

    exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
    exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
    exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
    exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
    exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
    exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
    exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1

Notice that, because I have specified -binding pe linear:1, each execution
node binds processes for the job_id to one core. If I have -binding pe
linear:2, I get:

    exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
    exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
    exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
    exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
    exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
    exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
    exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

So the pe_hostfile still doesn't give an accurate representation of the
binding allocation for use by OpenMPI.

Question: is there a system file or command that I could use to check which
processors are "occupied"?

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry CV4 7AL
UK
Tel: +44 (0)24 7615 0778
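
For reference, the pe_hostfile-to-rankfile reformatting discussed in this thread might be sketched roughly as below. This is only an illustration under stated assumptions, not a tested solution: it assumes the fourth pe_hostfile column holds colon-separated "socket,core" pairs (as in the listings above), that the OpenMPI rankfile uses lines of the form "rank N=host slot=socket:core", and that cores are reused cyclically when a host has more slots than bound cores. The function names are hypothetical.

```python
import os


def parse_binding(field):
    """Parse a binding field such as '0,1:0,2' into [(0, 1), (0, 2)].

    Assumption: each colon-separated item is a 'socket,core' pair.
    Returns an empty list if the field does not have that shape.
    """
    pairs = []
    for item in field.split(":"):
        parts = item.split(",")
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            return []
        pairs.append((int(parts[0]), int(parts[1])))
    return pairs


def pe_hostfile_to_rankfile(text):
    """Turn pe_hostfile text into rankfile text, one rank per slot."""
    out = []
    rank = 0
    for row in text.splitlines():
        fields = row.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        host, slots = fields[0], int(fields[1])
        binding = parse_binding(fields[3])
        if not binding:
            raise ValueError(f"no usable binding for host {host}: {fields[3]!r}")
        for i in range(slots):
            # Assumption: reuse cores cyclically if slots exceed bound cores.
            socket, core = binding[i % len(binding)]
            out.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return "\n".join(out)


if __name__ == "__main__" and "PE_HOSTFILE" in os.environ:
    # Inside a Grid Engine job, PE_HOSTFILE points at the file shown above.
    with open(os.environ["PE_HOSTFILE"]) as fh:
        print(pe_hostfile_to_rankfile(fh.read()))
```

Applied to the linear:1 listing above, this sketch would assign both ranks on exec6 to the same core, which illustrates why the pe_hostfile alone does not yield the desired allocation.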