Thanks, Ian!

I take it that "alarm" usually means something job-related (asking for
more resources than available, for example) as opposed to something gone
wrong in the queuing system per se.

Anyway, I'll try "-explain" - thanks!!

Marty



On 9/23/11 4:22 PM, Ian Kaufman wrote:
> On Fri, Sep 23, 2011 at 1:55 PM, Marty Dippel <[email protected]> wrote:
>> SGE Newbie question-
>>
>> When I run "qstat -f", a few of the nodes return an "a" state, which I
>> believe means the node is in alarm.
>>
>>
>> queuename                      qtype used/tot. load_avg arch          states
>> ----------------------------------------------------------------------------
>> [email protected]        BIP   2/2       4.03     lx26-amd64    a
>>  35329 0.50894 finer3a    abaezgua     r     09/23/2011 11:08:04     2
>>
>> ----------------------------------------------------------------------------
>>
>>
>> 1. What's the best way for me to discover the cause of the alarm state?
> 
> qstat -explain a JOBID
> 
>>
>> 2. Once a node is in alarm, will it reset by itself when the condition
>> is corrected or will it require human intervention to clear this state?
> 
> Depends on whether the node can clear out the job without human
> intervention. Usually, it's best to intervene.
> 
> Ian
> 
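
For anyone who wants to spot alarmed queue instances without reading the
whole listing by eye, the "qstat -f" output quoted above can be filtered
with a few lines of Python. This is only a rough sketch, assuming the
column layout shown in this thread (queuename, qtype, slots, load_avg,
arch, states); other Grid Engine versions may print different columns.
The function name queues_in_alarm and the host names in SAMPLE are
hypothetical, made up for illustration.

```python
import re

# Matches a queue-instance line of `qstat -f` output, for example:
#   all.q@node01.example.com   BIP   2/2   4.03   lx26-amd64   a
# The slots field is accepted as used/tot or resv/used/tot, and the
# trailing states column is optional because healthy queues leave it empty.
QUEUE_LINE = re.compile(
    r"^(?P<queue>\S+@\S+)\s+(?P<qtype>\S+)\s+(?P<slots>\d+(?:/\d+)+)"
    r"\s+(?P<load>\S+)\s+(?P<arch>\S+)(?:\s+(?P<states>\S+))?\s*$"
)

def queues_in_alarm(qstat_output):
    """Return (queue, load_avg, states) for queue instances whose states include 'a'."""
    hits = []
    for line in qstat_output.splitlines():
        m = QUEUE_LINE.match(line)
        # Header, separator, and job lines do not match the pattern and are skipped.
        if m and "a" in (m.group("states") or ""):
            hits.append((m.group("queue"), m.group("load"), m.group("states")))
    return hits

# Hypothetical sample modelled on the listing in this thread; host names are invented.
SAMPLE = """\
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@node01.example.com       BIP   2/2       4.03     lx26-amd64    a
  35329 0.50894 finer3a    abaezgua     r     09/23/2011 11:08:04     2
----------------------------------------------------------------------------
all.q@node02.example.com       BIP   0/2       0.10     lx26-amd64
"""

if __name__ == "__main__":
    for queue, load, states in queues_in_alarm(SAMPLE):
        print(f"{queue}: load_avg={load} states={states}")
```

The check is case-sensitive, so it picks up the lowercase "a" state and
combined states such as "au", but not an uppercase "A" on its own.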
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
