Dear all,

I'm facing a new problem on my cluster with SGE. I haven't seen this
before, or maybe I never noticed it.
I have some nodes with two queues: one (named "all.q") for jobs that run
no more than 24 h, and another (named "lenta.q") for jobs that need
more than 24 h.
I defined a resource quota, as I read some time ago on this mailing list,
as follows:

{
   name         slots_equals_cores
   description  Prevent core over-subscription across queues
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
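
To make the intent of this rule concrete: as I understand it, on each host the slots in use, summed over all queue instances, plus the slots a new job requests, must not exceed the value the host reports for num_proc. Below is a minimal Python sketch of that arithmetic. The function name and the numbers are purely illustrative assumptions, and this is a reading of the rule, not SGE's actual scheduler code.

```python
def fits_host_quota(used_per_queue, requested, num_proc):
    """Return True if a job fits under a per-host slots=$num_proc limit.

    used_per_queue: dict mapping queue name -> slots in use on this host.
    requested: slots the pending job asks for.
    num_proc: the value the host reports for num_proc (an assumed input).
    """
    used_total = sum(used_per_queue.values())
    return used_total + requested <= num_proc


# Illustrative numbers only, not measurements from the cluster.
usage = {"all.q": 16, "lenta.q": 0}
print(fits_host_quota(usage, requested=2, num_proc=64))
print(fits_host_quota(usage, requested=2, num_proc=16))
```

The outcome depends entirely on the num_proc value the host reports, which is why it is passed in as a parameter here rather than assumed.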


For now, I have a node with 64 cores: 40 cores for the normal queue
and 24 for the long queue.


all.q@compute-2-0.local        BP    0/16/40        15.93    lx-amd64

lenta.q@compute-2-0.local      BP    0/0/24         15.93    lx-amd64
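
For reference, the two lines above can be summed per host with a few lines of Python. This is only a sketch: it assumes the third column is reserved/used/total slots, and the helper name `parse_queue_line` is hypothetical.

```python
def parse_queue_line(line):
    """Split one queue-instance status line into its fields.

    Assumes the column layout shown above:
    queue@host, queue type, reserved/used/total slots, load average, arch.
    """
    fields = line.split()
    queue, host = fields[0].split("@", 1)
    reserved, used, total = (int(n) for n in fields[2].split("/"))
    return {"queue": queue, "host": host,
            "reserved": reserved, "used": used, "total": total}


lines = [
    "all.q@compute-2-0.local        BP    0/16/40        15.93    lx-amd64",
    "lenta.q@compute-2-0.local      BP    0/0/24         15.93    lx-amd64",
]
rows = [parse_queue_line(line) for line in lines]
used_on_host = sum(r["used"] for r in rows)    # slots in use across both queues
configured = sum(r["total"] for r in rows)     # slots configured across both queues
print(used_on_host, configured)  # 16 64
```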

Some 2-core jobs are not scheduled on this node in the long queue,
although there is no shortage of memory or cores. qstat reports
this:

cannot run because it exceeds limit "////compute-2-0/" in rule "slots_equals_cores/1"
cannot run because it exceeds limit "////compute-2-0/" in rule "slots_equals_cores/1"
cannot run because it exceeds limit "////compute-0-4/" in rule "slots_equals_cores/1"
cannot run in PE "thread" because it only offers 0 slots
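
When there are many such messages, it can help to pull out which hosts and which rule are involved. The following Python sketch does that with a regular expression. It assumes the messages have the form shown above, tolerates wrapped lines, and treats the last non-empty "/"-separated field of the quoted limit as the host, which is an assumption on my part. The function name `blocked_hosts` is hypothetical.

```python
import re

# Matches: exceeds limit "<limit>" in rule "<rule>"
PATTERN = re.compile(
    r'exceeds limit "(?P<limit>[^"]*)" in rule "(?P<rule>[^"]+)"'
)


def blocked_hosts(text):
    """Return (host, rule) pairs from 'exceeds limit' scheduler messages."""
    # Re-join wrapped lines so each message can be matched as one string.
    flat = " ".join(text.split())
    pairs = []
    for match in PATTERN.finditer(flat):
        # Assumption: the host is the last non-empty '/'-separated field.
        parts = [p for p in match.group("limit").split("/") if p]
        host = parts[-1] if parts else ""
        pairs.append((host, match.group("rule")))
    return pairs


sample = '''
cannot run because it exceeds limit
"////compute-2-0/" in rule "slots_equals_cores/1"
cannot run because it exceeds limit
"////compute-0-4/" in rule "slots_equals_cores/1"
cannot run in PE "thread" because it only
offers 0 slots
'''
print(blocked_hosts(sample))
```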

I really don't understand why the job is not running on this node; in
my opinion, the node is free for it.

Can someone help me with this?

Regards.

-- 
-- Jérôme
Le baiser est la plus sûre façon de se taire en disant tout.
        (Guy de Maupassant)
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
