On Fri, Jul 14, 2017 at 08:36:06AM +0000, Simon Andrews wrote:
>    Can anyone shed any light on an error I'm getting repeated thousands of
>    times in my grid engine messages log.  This happens when I have a job
>    which is submitted and which is stopped from running by an RQS rule I have
>    set up.  The error I get is:
>
>    07/14/2017 09:27:08|schedu|rocks1|C|not a single host excluded in rqs_excluded_hosts()
>
>    The RQS ruleset I have which triggers this looks like:
> 
Not so much a fix as a possible workaround:
send your logs to syslog (rather than having qmaster log directly into files) and rely
on syslog replacing repeated messages with 'last message repeated <n> times'.
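If routing through syslog isn't convenient, the same collapsing can be approximated after the fact. The sketch below is my own rough approximation of what syslog does, not anything shipped with SGE. It assumes each line of the qmaster messages file has the form `date time|daemon|host|level|text` (as in the error above) and treats two consecutive lines as repeats when everything after the timestamp field matches.

```python
from typing import Iterable, Iterator


def message_key(line: str) -> str:
    """Drop the leading timestamp field so identical messages compare equal.

    Assumes the SGE messages format "MM/DD/YYYY HH:MM:SS|daemon|host|level|text".
    """
    _, sep, rest = line.partition("|")
    return rest if sep else line


def collapse_repeats(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines, replacing each run of consecutive repeats with a
    syslog-style 'last message repeated N times' marker."""
    prev_key = None
    repeats = 0
    for raw in lines:
        line = raw.rstrip("\n")
        key = message_key(line)
        if key == prev_key:
            repeats += 1  # same message as the previous line: just count it
            continue
        if repeats:
            yield f"last message repeated {repeats} times"
        yield line
        prev_key = key
        repeats = 0
    if repeats:  # flush a run that reaches the end of the input
        yield f"last message repeated {repeats} times"
```

For example, `for line in collapse_repeats(open(path)): print(line)`, where `path` is wherever your install keeps the qmaster messages file.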

You could also try tweaking the log_level parameter.

I don't use RQS myself, but my best guess is that you have two sorts of hosts:
regular hosts with a batch queue, and the hosts in @interactive with an interactive
queue. Because the hosts {@interactive} clause doesn't further restrict where the limit
applies (jobs are already limited by being batch or interactive), Grid Engine
complains that you appear to have a no-op in your limit. I think this complaint by SGE
is spurious.
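To make that guess concrete, here is a toy model in Python. It is NOT Grid Engine's rqs_excluded_hosts(), and the host names and queue types are hypothetical. It only shows how a hosts filter can end up excluding nothing: for an interactive job the candidate hosts are already just the @interactive ones, so the filter removes no host.

```python
from typing import Dict, Set

# Hypothetical cluster: host -> queue types available on that host.
HOST_QUEUE_TYPES: Dict[str, Set[str]] = {
    "compute-1-0": {"BATCH"},
    "compute-1-1": {"BATCH"},
    "login-0": {"INTERACTIVE"},
}
# Hypothetical members of the @interactive host group.
INTERACTIVE_GROUP: Set[str] = {"login-0"}


def eligible_hosts(job_type: str, host_queue_types: Dict[str, Set[str]]) -> Set[str]:
    """Hosts offering a queue that accepts this job type."""
    return {host for host, types in host_queue_types.items() if job_type in types}


def excluded_by_filter(candidates: Set[str], host_filter: Set[str]) -> Set[str]:
    """Candidate hosts that a 'hosts {...}' filter would rule out."""
    return candidates - host_filter


for job_type in ("INTERACTIVE", "BATCH"):
    excluded = excluded_by_filter(eligible_hosts(job_type, HOST_QUEUE_TYPES), INTERACTIVE_GROUP)
    # An empty set means the hosts filter changed nothing for this job type.
    print(job_type, "excluded:", sorted(excluded) or "none")
```

With these made-up values it prints `INTERACTIVE excluded: none`, which is the "not a single host excluded" situation in miniature.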

Possibly:
Give the interactive queue a different name from the regular batch queue. Make sure the batch
queue can't run on the interactive hosts and vice versa. Then apply the limit to the queue
rather than the host.
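With separate queues, the first rule could filter on `queues {interactive.q}` instead of `hosts {@interactive}` (interactive.q is a hypothetical name). Before relying on that, it's worth checking the "vice versa" part, i.e. that no host appears in both queues' hostlists. A minimal sketch of that check, assuming you've copied each queue's hostlist into a set by hand (the queue and host names below are made up):

```python
from typing import Dict, Set

# Hypothetical queue names and hostlists; substitute your own.
QUEUE_HOSTS: Dict[str, Set[str]] = {
    "batch.q": {"compute-1-0", "compute-1-1", "compute-1-2"},
    "interactive.q": {"login-0", "login-1"},
}


def overlapping_hosts(queue_hosts: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Hosts that appear in the hostlists of both queues."""
    return queue_hosts[a] & queue_hosts[b]


overlap = overlapping_hosts(QUEUE_HOSTS, "batch.q", "interactive.q")
if overlap:
    print("hosts shared by both queues:", sorted(overlap))
else:
    print("batch.q and interactive.q are disjoint")
```

If any host is shared, a queue-based limit and a host-based one would no longer describe the same set of slots.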

>
>    {
>       name         per_user_slot_limit
>       description  "limit the number of slots per user"
>       enabled      TRUE
>       limit        users {*} hosts {@interactive} to slots=8
>       limit        users {andrewss} to slots=2
>       limit        users {@bioinf} to slots=616
>       limit        users {*} to slots=411
>    }
>
>    The rule seems to work, and jobs are held, and then started as expected. 
>    A job which fails to schedule gets a state like this:
>
>    scheduling info:            cannot run in queue instance "all.q@compute-1-6.local" because it is not of type batch
>                                cannot run in queue instance "all.q@compute-1-5.local" because it is not of type batch
>                                cannot run in queue instance "all.q@compute-1-7.local" because it is not of type batch
>                                cannot run in queue instance "all.q@compute-1-0.local" because it is not of type batch
>                                cannot run in queue instance "all.q@compute-1-3.local" because it is not of type batch
>                                cannot run because it exceeds limit "andrewss/////" in rule "per_user_slot_limit/3"
>                                cannot run in queue instance "all.q@compute-1-4.local" because it is not of type batch
>                                cannot run in queue instance "all.q@compute-1-1.local" because it is not of type batch
>                                cannot run in queue instance "all.q@compute-1-2.local" because it is not of type batch
>
>    So it's seeing the rule and is applying it correctly, but the spurious
>    errors are causing my messages file to inflate quickly when there are a
>    lot of queued jobs.
>
>    Can anyone suggest how to debug or fix this?  I can't find anything
>    relevant from googling around for the specific error outside of the
>    library API it comes from.
>
>    This is using SGE-6.2u5p2-1.x86_64.
>
>    Thanks for any help you can offer!
>
>    Simon.
>
>    The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT
>    Registered Charity No. 1053902.
> 
>    The information transmitted in this email is directed only to the
>    addressee. If you received this in error, please contact the sender and
>    delete this email from your system. The contents of this e-mail are the
>    views of the sender and do not necessarily represent the views of the
>    Babraham Institute. Full conditions at: www.babraham.ac.uk

> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

