Sorry, I am still trying to grok from all your emails what problem you
are trying to solve. So is the issue that two jobs with processes on the
same node need to bind their processes to different resources? For
example, core 1 for the first job and cores 2 and 3 for the second job?
--td
On 11/15/2010 09:29 AM, Chris Jewell wrote:
Hi,
If it is indeed not currently possible to implement this type of core binding
in tightly integrated OpenMPI/GE, then a solution might lie in a custom script
run as the parallel environment's start_proc_args. This script would have to
find out which slots are allocated where on the cluster and write an OpenMPI
rankfile.
Exactly this should work.
If you use "binding_instance" "pe" and reformat the information in $PE_HOSTFILE into a
"rankfile", you should get the desired allocation. Maybe you can share the script with this
list once you have it working.
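A minimal sketch of the conversion suggested above, written as a POSIX shell function wrapping awk. It assumes the pe_hostfile layout shown later in this thread (host, slot count, queue, then a fourth column of socket,core pairs joined by colons, e.g. 0,1:0,2) and the Open MPI rankfile syntax `rank N=host slot=socket:core`. The function name pe2rankfile is hypothetical, and treating an empty field or "UNDEFINED" as "no binding" is an assumption about how GE marks unbound hosts. When a host lists fewer bound cores than slots (the linear:n behaviour described in this thread), the sketch reuses cores round-robin and prints a warning rather than guessing.

```shell
# Hypothetical helper (name and behaviour are illustrative, not part of GE or
# Open MPI): turn a Grid Engine pe_hostfile into an Open MPI rankfile.
# Assumed pe_hostfile columns: host slots queue binding, where binding is
# "socket,core" pairs joined by ":" (e.g. "0,1:0,2").
# Usage: pe2rankfile [pe_hostfile]   (defaults to $PE_HOSTFILE)
pe2rankfile() {
    awk '
        NF == 0 { next }                        # skip blank lines
        {
            host = $1; slots = $2 + 0; binding = $4
            # Assumption: no binding shows up as an empty field or "UNDEFINED"
            if (binding == "" || binding == "UNDEFINED") {
                printf "error: no core binding listed for %s\n", host > "/dev/stderr"
                exit 1
            }
            n = split(binding, pairs, ":")      # "0,1:0,2" -> "0,1" and "0,2"
            if (n < slots)
                printf "warning: %s has %d slots but only %d bound cores; cores will be shared\n", host, slots, n > "/dev/stderr"
            # One rank per slot, cycling over the cores bound on this host
            for (s = 0; s < slots; s++) {
                split(pairs[(s % n) + 1], sc, ",")
                printf "rank %d=%s slot=%s:%s\n", rank++, host, sc[1], sc[2]
            }
        }
    ' "${1:-$PE_HOSTFILE}"
}
```

Run from start_proc_args, something like `pe2rankfile "$PE_HOSTFILE" > "$TMPDIR/rankfile"` could produce a file to hand to `mpirun -rf "$TMPDIR/rankfile" ...`; whether the resulting per-slot placement matches what GE actually reserved depends on the pe_hostfile listing one core per slot.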
As far as I can see, that's not going to work. This is because, exactly as with
"binding_instance" "set", -binding pe linear:n gives you n cores bound per
node, regardless of how many slots the job holds on that node. This is easy to verify by
submitting a long-running job and examining its pe_hostfile. For example, I
submit a job with:
$ qsub -pe mpi 8 -binding pe linear:1 myScript.com
and my pe_hostfile looks like:
exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1
Notice that, because I specified -binding pe linear:1, each execution node,
including exec6 with its 2 slots, binds the job's processes to a single core
(0,1, i.e. socket 0, core 1). With -binding pe linear:2, I get:
exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2
So the pe_hostfile still doesn't give an accurate per-slot representation of the
binding allocation for OpenMPI to use. Question: is there a system file or command I could
use to check which processors are "occupied"?
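On Linux, one place to look is the per-process affinity list exposed as Cpus_allowed_list in /proc/<pid>/status (`taskset -cp <pid>` from util-linux reports the same list for a single process). The function below, with the hypothetical name show_affinity, prints pid, command name and allowed CPUs for every process whose name matches an optional awk regular expression. It is a Linux-only sketch: it shows which cores each process is allowed to run on, which is what binding restricts, not whether those cores are actually busy.

```shell
# Hypothetical helper: list the CPU affinity of processes on this Linux node.
# Usage: show_affinity [name-regex]   (default: every process)
show_affinity() {
    re=${1:-.}
    for dir in /proc/[0-9]*; do
        # The process may exit while we loop; awk errors are discarded and
        # nothing is printed for it.
        awk -v pid="${dir#/proc/}" -v re="$re" '
            /^Name:/              { name = $2 }
            /^Cpus_allowed_list:/ { cpus = $2 }
            END { if (cpus != "" && name ~ re) printf "%s\t%s\t%s\n", pid, name, cpus }
        ' "$dir/status" 2>/dev/null
    done
}
```

For example, `show_affinity myprog` (myprog standing in for the MPI executable's name) run on an execution node would show which cores each rank of concurrently running jobs is bound to.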
Chris
--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com