Sorry, I am still trying to grok your email and figure out what problem you are trying to solve. Is the issue that two jobs with processes on the same node should each be able to bind their processes to different resources, like core 1 for the first job and cores 2 and 3 for the second job?

--td

On 11/15/2010 09:29 AM, Chris Jewell wrote:
Hi,

If, indeed, it is not currently possible to implement this type of core binding
in tightly integrated OpenMPI/GE, then a solution might lie in a custom script
run from the parallel environment's start_proc_args. This script would have to
find out which slots are allocated where on the cluster, and write an OpenMPI
rankfile.
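
For reference, an OpenMPI rankfile maps each MPI rank to a host and a
socket:core pair, so the script would need to emit lines of roughly this form
(the hostnames and placement here are only illustrative):

rank 0=exec1.cluster.stats.local slot=0:1
rank 1=exec2.cluster.stats.local slot=0:2
rank 2=exec2.cluster.stats.local slot=0:3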
Exactly, this should work.

If you use "binding_instance" "pe" and reformat the information in the $PE_HOSTFILE to a 
"rankfile", it should work to get the desired allocation. Maybe you can share the script with this 
list once you got it working.
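
Something along these lines might do as a first cut (an untested sketch for
start_proc_args; it assumes the fourth pe_hostfile column carries one
colon-separated socket,core pair per allocated slot, and $TMPDIR/rankfile is
just a hypothetical output location):

#!/bin/sh
# Untested sketch: convert $PE_HOSTFILE into an OpenMPI rankfile.
# Assumes column 4 holds colon-separated socket,core pairs, one per slot.
RANKFILE=$TMPDIR/rankfile
rank=0
while read host nslots queue binding; do
    # split e.g. "0,1:0,2" into individual socket,core pairs
    for pair in $(echo "$binding" | tr ':' ' '); do
        socket=${pair%,*}
        core=${pair#*,}
        echo "rank $rank=$host slot=$socket:$core" >> "$RANKFILE"
        rank=$((rank + 1))
    done
done < "$PE_HOSTFILE"

mpirun could then be pointed at the result with its -rf option.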

As far as I can see, that's not going to work. This is because, exactly as with
binding_instance "set", -binding pe linear:n gets you n cores bound per node,
not per slot. This is easily verified by submitting a long-running job and
examining the pe_hostfile. For example, I submit a job with:

$ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1

Notice that, because I specified -binding pe linear:1, each execution node
binds the job's processes to one core: the fourth column holds the allocated
socket,core pair (here socket 0, core 1), regardless of how many slots the
node got. If I use -binding pe linear:2, I get:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

So the pe_hostfile still doesn't give an accurate representation of the binding
allocation for use by OpenMPI: every node reports the same cores (here 0,1 and
0,2) no matter how many slots it was allocated, so there is nothing with which
to map slots to distinct cores. Question: is there a system file or command
that I could use to check which processors are "occupied"?
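
On Linux, the only rough probe I can think of (an untested sketch, not a GE
facility) is to read each running process's affinity mask out of /proc:

#!/bin/sh
# Untested sketch: list the CPU affinity of every running process.
# Cpus_allowed_list in /proc/<pid>/status shows the cores a process may
# run on; a narrowed list suggests those cores are taken by a bound job.
for status in /proc/[0-9]*/status; do
    pid=${status%/status}
    pid=${pid#/proc/}
    line=$(grep '^Cpus_allowed_list' "$status" 2>/dev/null) || continue
    echo "$pid $line"
done

(taskset -pc <pid> would report the same information for a single process.)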

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778



--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com


