Hello,

   We (fairly) recently upgraded our cluster to Rocks 6.1.1,
and we now seem to be having problems with resource quota sets
(RQS).  On our old cluster, we had the following rule set defined:

{
   name         host-slots
   description  restrict slots to core count
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}

The reason for this was to prevent oversubscription of the
processors on the clients.  Now, with this quota enabled,
submitted jobs don't start, and 'qstat -j job-number' shows
messages like these under "scheduling info":

cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
(-l slots=1) cannot run in queue "compute-0-39.local" because it offers only hc:slots=0.000000
cannot run because it exceeds limit "////compute-0-78/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-78/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-55/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-55/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-74/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-74/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-2-7/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-2-1/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-2-2/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-22/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-22/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-1-2/" in rule "host-slots/1"
cannot run in PE "mpich" because it only offers 0 slots

But as soon as I run 'qconf -mrqs' and change TRUE to FALSE, the job runs.

Has the process for preventing oversubscription changed?  Any ideas?
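
In case it helps anyone looking at similar output, here is a rough sketch
of how the "exceeds limit" messages above could be tallied per host.  It
is only an illustration: the function name tally_blocked_hosts is made up
here, the sample text is a few lines copied from the output above, and
the parsing assumes that the last non-empty field of the limit string is
the host name.

```python
import re
from collections import Counter

# Assumption: in a limit string such as "////compute-0-7/", the last
# non-empty slash-separated field is the host the rule was applied to.
LIMIT_RE = re.compile(r'exceeds limit "/*([^"/]+)/" in rule "([^"]+)"')

# A few lines copied from the "scheduling info" output quoted in this post.
SAMPLE = '''\
cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-2-7/" in rule "host-slots/1"
cannot run in PE "mpich" because it only offers 0 slots
'''


def tally_blocked_hosts(text):
    """Count 'exceeds limit' messages per (rule, host) pair."""
    counts = Counter()
    for line in text.splitlines():
        match = LIMIT_RE.search(line)
        if match:
            host, rule = match.group(1), match.group(2)
            counts[(rule, host)] += 1
    return counts


if __name__ == "__main__":
    # Print one line per (rule, host) pair, sorted for stable output.
    for (rule, host), n in sorted(tally_blocked_hosts(SAMPLE).items()):
        print(f"{rule}\t{host}\t{n}")
```

Lines that do not mention a limit (such as the PE message) are skipped,
so the result only reflects hosts rejected by a quota rule.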

JY

