On 15.11.2010 at 15:29, Chris Jewell wrote:

> Hi,
> 
>>> If, indeed, it is currently not possible to implement this type of 
>>> core binding in a tightly integrated OpenMPI/GE setup, then a solution 
>>> might lie in a custom script run from the parallel environment's 
>>> start_proc_args. This script would have to find out which slots are 
>>> allocated where on the cluster, and write an OpenMPI rankfile. 
>> 
>> Exactly this should work. 
>> 
>> If you use the "binding_instance" "pe" and reformat the information in 
>> the $PE_HOSTFILE into a rankfile, you should get the desired allocation. 
>> Maybe you can share the script with this list once you get it working. 
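Something along these lines might do as the start_proc_args helper -- a
minimal sketch only, assuming the usual four PE_HOSTFILE columns (host,
slot count, queue, binding) and that the binding column carries one
"socket,core" pair per allocated slot; the rankfile location
$TMPDIR/rankfile is just an example:

#!/bin/sh
# Sketch: translate $PE_HOSTFILE into an Open MPI rankfile.
# Assumes one "socket,core" pair per slot in the binding column,
# with pairs joined by ":" (e.g. "0,1:0,2").
RANKFILE="$TMPDIR/rankfile"
rank=0
while read host nslots queue binding; do
    for pair in $(echo "$binding" | tr ':' ' '); do
        socket=${pair%,*}    # text before the comma
        core=${pair#*,}      # text after the comma
        echo "rank $rank=$host slot=$socket:$core" >> "$RANKFILE"
        rank=$((rank + 1))
    done
done < "$PE_HOSTFILE"

mpirun could then pick the file up with "mpirun -rf $TMPDIR/rankfile ...".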
> 
> 
> As far as I can see, that's not going to work.  Exactly as with 
> "binding_instance" "set", -binding pe linear:n binds n cores on every 
> node, regardless of how many slots that node received.  This is easy to 
> verify by submitting a long-running job and examining the pe_hostfile.  
> For example, I submit a job with:
> 
> $ qsub -pe mpi 8 -binding pe linear:1 myScript.com
> 
> and my pe_hostfile looks like:
> 
> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1
> 
> Notice that, because I specified -binding pe linear:1, each execution 
> node binds the job's processes to one core, regardless of its slot 
> count.  With -binding pe linear:2, I get:
> 
> exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2

So the cores 1 and 2 on socket 0 aren't free?

-- Reuti


> exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
> exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
> exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
> exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
> exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
> exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2
> 
> So the pe_hostfile still doesn't give an accurate representation of the 
> binding allocation for use by OpenMPI.  Question: is there a system file or 
> command that I could use to check which processors are "occupied"?
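There is no single system file that lists "occupied" cores, but each
process exposes its own binding as Cpus_allowed_list in
/proc/<pid>/status (taskset -pc <pid> from util-linux shows the same
information).  A rough sketch -- the full-machine list "0-7" is only an
example for an 8-core node and would need adjusting:

#!/bin/sh
# Sketch: list processes whose CPU affinity is narrower than the
# whole machine, i.e. cores already claimed by bound jobs.
FULL=0-7    # what an unpinned process reports on this node (assumption)
for d in /proc/[0-9]*; do
    mask=$(awk '/^Cpus_allowed_list/ {print $2}' "$d/status" 2>/dev/null)
    [ -n "$mask" ] && [ "$mask" != "$FULL" ] && echo "${d#/proc/}: $mask"
done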
> 
> Chris
> 
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778
> 

