On 3/5/2019 12:34 PM, David Trimboli wrote:

On 3/5/2019 12:18 PM, Reuti wrote:
On 05.03.2019 at 18:06, David Trimboli <trimb...@cshl.edu> wrote:

I'm looking at SGE limits, and I'm not sure when something applies to all users 
or each user individually. I want to find out how to limit each user to a 
certain number of slots across the entire cluster (just one queue).

I feel like this isn't it:

     Name           limit-user-slots
     description    Limit each user to 10 slots
     enabled        true
     limit          users * queues {all.q} to slots=10

limit users {*} queues all.q to slots=10

In principle {all.q} wouldn't hurt as it means "for each entry in the list", 
and the only entry is all.q. But to lower the impact I would leave this out.

Ohhhhhhh! I didn't realize that {} meant to apply to each entry in the list. That gives me everything I need. Thanks to you and Bernd.
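
Spelled out as a sketch, with made-up rule names and assuming the same all.q queue as above, the two forms differ like this:

    name           shared-slots
    description    All users together share 10 slots in all.q
    enabled        true
    limit          users * queues all.q to slots=10

    name           per-user-slots
    description    Each user gets up to 10 slots in all.q
    enabled        true
    limit          users {*} queues all.q to slots=10

Without the braces the list is treated as one group sharing the limit; with them the limit is counted separately for each member of the list.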

Now a follow-up question. I implemented this rule to ensure that no single user takes more than 90% of our available slots:

    name    limit90percent
    description    NONE
    enabled    TRUE
    limit    users {*} to slots=536

(Our cluster has a total of 596 slots; 90% of that is 536.4, rounded down to 536.) This worked fine until someone submitted a parallel environment job with the -pe option. On 16 of our 24 nodes, it still worked. But a job hard-queued to one of the upper nodes, 17–24, would never run, and the scheduling info showed this:

   cannot run because it exceeds limit "trimboli/////" in rule
   cannot run in PE "threads" because it only offers 0 slots

(My username is trimboli.) Now, it's quite possible that the upper nodes are set up differently than the lower nodes. The upper eight nodes were installed later than the others and have been treated differently in the past. I'd like to find what setting in the upper nodes is making this limit say that there are 0 slots when a PE job is run. Where can I look to find the culprit?
