Ok,

I found one clue. "qstat" and "qstat -f" are reporting different numbers of
cores (slots) in use on compute-2-6.

Plain "qstat" reports 25 + 32 + 32 = 89 slots, while "qstat -f" reports
25 + 15 + 10 = 50 slots, which matches the 50/64 shown for the queue instance:

qstat -f   ( for compute-2-6 )
[email protected]          BIP   0/50/64        3.79 lx-amd64
  45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04    25
  40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18    15
  40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25    10


$ qstat | grep compute-2-6
  45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04 [email protected]             25
  40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18 [email protected]             32
  40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25 [email protected]             32
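
As a quick cross-check, the per-job slot counts from the two listings can be compared with a short script. This is only a sketch: the rows are pasted in by hand from the output above with the queue column left out, and slots_per_job is a made-up helper name, not part of SGE.

```python
# Job rows copied from the "qstat -f" listing for compute-2-6.
QSTAT_F = """\
  45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04    25
  40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18    15
  40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25    10
"""

# The same jobs as listed by plain "qstat" (queue column omitted here).
QSTAT = """\
  45647 0.54310 QRLOGIN    user1        r     11/07/2012 15:55:04    25
  40044 0.55421 SNPtable   user2        r     11/06/2012 11:13:18    32
  40279 0.55421 SNPtable   user2        r     11/06/2012 14:50:25    32
"""


def slots_per_job(text):
    """Map job ID -> slot count, assuming slots is the last column of a job row."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        # A job row starts with a numeric job ID and ends with a numeric slot count.
        if len(fields) >= 2 and fields[0].isdigit() and fields[-1].isdigit():
            slots[fields[0]] = slots.get(fields[0], 0) + int(fields[-1])
    return slots


full = slots_per_job(QSTAT_F)
plain = slots_per_job(QSTAT)

# Jobs whose slot count differs between the two listings: (qstat, qstat -f).
mismatches = {
    job: (plain[job], full.get(job))
    for job in plain
    if plain[job] != full.get(job)
}

print("qstat -f total:", sum(full.values()))
print("qstat total:   ", sum(plain.values()))
for job, (p, f) in sorted(mismatches.items()):
    print(f"job {job}: qstat shows {p} slots, qstat -f shows {f}")
```

With the rows above, only the two SNPtable jobs (40044 and 40279) disagree; the QRLOGIN job shows 25 slots in both listings.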


So it looks like SGE is confused. How can I fix this?


On 11/7/2012 9:25 PM, Joseph Farran wrote:
Hi.

I am using SGE 8.1.2 with several queues, and recently several of my 64-slot 
queues have stopped scheduling all 64 cores.

If I submit 64 1-core jobs, only 57 or so are scheduled per node instead of 
64. If I submit 4 16-core PE jobs, only 3 of them are scheduled on a node 
instead of 4 (16 x 4 = 64).

This was working fine before, so I think SGE has lost track of something. I tried restarting SGE, but the symptoms are the same. My queues do show "slots=64", and the compute nodes do not have any special settings.

Is there a way to tell SGE to re-count cores per node, or to reset SGE without 
disrupting running jobs?

Joseph



_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
