> 
>> Perhaps if someone could run this test again with --report-bindings 
>> --leave-session-attached and provide -all- output we could verify that 
>> analysis and clear up the confusion?
>> 
> Yeah, however I bet you we still won't see output.

Actually, it seems we do get more output!  Here are the results of

'qsub -pe mpi 8 -binding linear:2 myScript.com'

where myScript.com runs

'mpirun -mca ras_gridengine_verbose 100 -report-bindings --leave-session-attached -bycore -bind-to-core ./unterm'

[exec1:06504] System has detected external process binding to cores 0028
[exec1:06504] ras:gridengine: JOB_ID: 59467
[exec1:06504] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec1/active_jobs/59467.1/pe_hostfile
[exec1:06504] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec1:06504] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06504] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06504] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06504] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06504] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06504] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06504] [[59608,0],0] odls:default:fork binding child [[59608,1],0] to cpus 0008
[exec1:06504] [[59608,0],0] odls:default:fork binding child [[59608,1],1] to cpus 0020
[exec3:20248] [[59608,0],1] odls:default:fork binding child [[59608,1],2] to cpus 0008
[exec4:26792] [[59608,0],4] odls:default:fork binding child [[59608,1],5] to cpus 0001
[exec2:32462] [[59608,0],2] odls:default:fork binding child [[59608,1],3] to cpus 0001
[exec7:09833] [[59608,0],3] odls:default:fork binding child [[59608,1],4] to cpus 0002
[exec5:10834] [[59608,0],5] odls:default:fork binding child [[59608,1],6] to cpus 0001
[exec6:04230] [[59608,0],6] odls:default:fork binding child [[59608,1],7] to cpus 0001

Aha!  If I instead use 'qsub -pe mpi 8 -binding linear:1 myScript.com' with the
same mpirun command, I get the following:

[exec1:06552] System has detected external process binding to cores 0020
[exec1:06552] ras:gridengine: JOB_ID: 59468
[exec1:06552] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec1/active_jobs/59468.1/pe_hostfile
[exec1:06552] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec1:06552] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06552] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06552] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06552] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06552] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec1:06552] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows slots=1
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error:

Error name: Unknown error: 1
Node: exec1

when attempting to start process rank 0.
--------------------------------------------------------------------------
[exec1:06552] [[59432,0],0] odls:default:fork binding child [[59432,1],0] to cpus 0020
--------------------------------------------------------------------------
Not enough processors were found on the local host to meet the requested
binding action:

  Local host:        exec1
  Action requested:  bind-to-core
  Application name:  ./unterm

Please revise the request and try again.
--------------------------------------------------------------------------
[exec4:26816] [[59432,0],4] odls:default:fork binding child [[59432,1],5] to cpus 0001
[exec3:20345] [[59432,0],1] odls:default:fork binding child [[59432,1],2] to cpus 0020
[exec2:32486] [[59432,0],2] odls:default:fork binding child [[59432,1],3] to cpus 0001
[exec7:09921] [[59432,0],3] odls:default:fork binding child [[59432,1],4] to cpus 0002
[exec6:04257] [[59432,0],6] odls:default:fork binding child [[59432,1],7] to cpus 0001
[exec5:10861] [[59432,0],5] odls:default:fork binding child [[59432,1],6] to cpus 0001
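
In case it helps with reading the "cores" and "cpus" values: I am assuming they
are hexadecimal bitmasks in which bit N stands for logical core N. That is my
reading of the output, not something the logs state. Under that assumption, a
small Python sketch (the function name is just illustrative) decodes them:

```python
def cores_in_mask(mask_hex):
    """Return the core indices set in a hexadecimal CPU bitmask string.

    Assumption: bit N of the mask corresponds to logical core N.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Mask values copied from the two runs above.
    for label, mask_hex in [
        ("linear:2 external binding on exec1", "0028"),
        ("linear:1 external binding on exec1", "0020"),
        ("child [[59608,1],0]", "0008"),
        ("child [[59608,1],1]", "0020"),
    ]:
        print(f"{label}: mask {mask_hex} -> cores {cores_in_mask(mask_hex)}")
```

If that reading is right, linear:2 gave exec1 cores 3 and 5 for its two slots,
and the two children landed on exactly those cores. With linear:1, exec1 only
has core 5 for two ranks, which would fit the "Not enough processors" message.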



Hope that helps clear up the confusion!  Please say it does, my head hurts...

Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778





