On 15.11.2010 at 15:29, Chris Jewell wrote:

> Hi,
>
>>> If, indeed, it is not possible currently to implement this type of
>>> core-binding in tightly integrated OpenMPI/GE, then a solution might lie in
>>> a custom script run in the parallel environment's 'start_proc_args'. This
>>> script would have to find out which slots are allocated where on the
>>> cluster, and write an OpenMPI rankfile.
>>
>> Exactly this should work.
>>
>> If you use "binding_instance" "pe" and reformat the information in the
>> $PE_HOSTFILE to a "rankfile", it should work to get the desired allocation.
>> Maybe you can share the script with this list once you've got it working.
>
> As far as I can see, that's not going to work. This is because, exactly like
> "binding_instance" "set", with -binding pe linear:n you get n cores bound per
> node. This is easily verifiable by running a long job and examining the
> pe_hostfile. For example, I submit a job with:
>
> $ qsub -pe mpi 8 -binding pe linear:1 myScript.com
>
> and my pe_hostfile looks like:
>
> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1
>
> Notice that, because I have specified -binding pe linear:1, each
> execution node binds processes for the job_id to one core. If I use
> -binding pe linear:2, I get:
>
> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2

So the cores 1 and 2 on socket 0 aren't free? -- Reuti

> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2
>
> So the pe_hostfile still doesn't give an accurate representation of the
> binding allocation for use by OpenMPI. Question: is there a system file or
> command that I could use to check which processors are "occupied"?
>
> Chris
>
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
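[Editor's note: the PE_HOSTFILE-to-rankfile conversion Reuti suggests above can be sketched as below. This is a hypothetical illustration, not a script from the thread; it assumes the four-column pe_hostfile format shown in Chris's examples (hostname, slot count, queue, binding), where the binding column is a colon-separated list of "socket,core" pairs. As Chris points out, that column may not accurately reflect per-slot allocation, so the mechanical conversion alone does not solve his problem.]

```python
#!/usr/bin/env python
# Sketch: convert an SGE $PE_HOSTFILE into an Open MPI rankfile.
# Assumes the four-column format from the thread:
#   hostname  nslots  queue  binding
# where "binding" is a colon-separated list of "socket,core" pairs,
# e.g. "0,1:0,2" = socket 0 core 1 and socket 0 core 2.
import sys

def pe_hostfile_to_rankfile(lines):
    """Yield Open MPI rankfile lines: 'rank N=host slot=socket:core'."""
    rank = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or binding-less lines
        host, binding = fields[0], fields[3]
        for pair in binding.split(':'):
            socket, core = pair.split(',')
            # One MPI rank per bound core on this host.
            yield "rank %d=%s slot=%s:%s" % (rank, host, socket, core)
            rank += 1

if __name__ == '__main__':
    # Typical use inside start_proc_args: pass $PE_HOSTFILE as argv[1].
    with open(sys.argv[1]) as f:
        for rankfile_line in pe_hostfile_to_rankfile(f):
            print(rankfile_line)
```

The resulting file could then be fed to mpirun via its rankfile option; the open question in the thread is how to obtain correct per-node core assignments to put in it in the first place.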