Thanks, Ian! I take it that "alarm" usually means something job-related (asking for more resources than available, for example) as opposed to something gone wrong in the queuing system per se.
Anyway, I'll try "-explain" - thanks!!

Marty

On 9/23/11 4:22 PM, Ian Kaufman wrote:
> On Fri, Sep 23, 2011 at 1:55 PM, Marty Dippel <[email protected]> wrote:
>> SGE Newbie question-
>>
>> When I "qstat -f" a few of the nodes return an "a" state, which I
>> believe means the node is in alarm.
>>
>>
>> queuename qtype used/tot. load_avg arch states
>> ----------------------------------------------------------------------------
>> [email protected] BIP 2/2 4.03 lx26-amd64 a
>> 35329 0.50894 finer3a abaezgua r 09/23/2011 11:08:04 2
>>
>> ----------------------------------------------------------------------------
>>
>>
>> 1. What's the best way for me to discover the cause of the alarm state?
>
> qstat -explain a JOBID
>
>>
>> 2. Once a node is in alarm, will it reset by itself when the condition
>> is corrected or will it require human intervention to clear this state?
>
> Depends on if the node can clear out the job or not without human
> intervention. Usually, it's best to intervene.
>
> Ian
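To spot every alarmed node at once, the queue-instance lines of "qstat -f" can be filtered for a lowercase "a" in the states column. Below is a minimal sketch, assuming the six-column layout in the output above (the states field only appears when a queue instance has a state set). The helper name alarmed_queues is hypothetical, not an SGE command. It reads "qstat -f" text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print queue instances whose states column contains a
# lowercase "a" (load threshold alarm). Uppercase "A" (suspend threshold
# alarm) is deliberately not matched, since awk regexes are case-sensitive.
# Assumed column layout: queuename qtype used/tot. load_avg arch states
alarmed_queues() {
    awk '
        # Queue-instance lines start with queue@host. Header, separator and
        # job lines (which start with a job ID) have no "@" in field 1.
        # Lines with only 5 fields have no state set and are skipped.
        $1 ~ /@/ && NF >= 6 && $6 ~ /a/ { print $1, $6 }
    '
}

# Usage against a live cell:
#   qstat -f | alarmed_queues
```

This only tells you which queue instances are in alarm. "qstat -explain a" is still the way to ask SGE for the reason.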

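For context on what "a" usually signals: the qstat(1) man page describes it as at least one of the queue's load_thresholds currently being exceeded. The stock threshold is np_load_avg=1.75, where np_load_avg is the load average divided by the host's processor count. "qconf -sq <queue>" shows the value your queue actually uses. Because the state tracks current load, a load alarm of this kind typically clears by itself once the load drops back under the threshold.

The sketch below redoes that comparison by hand for the node above. It assumes the default threshold and a 2-processor host. The 2/2 slot count hints at 2 processors but does not prove it. exceeds_np_threshold is a hypothetical helper, not part of SGE.

```shell
#!/bin/sh
# Hypothetical helper: report whether a host's normalized load exceeds a
# threshold. Assumes np_load_avg = load_avg / num_proc and falls back to the
# stock load_thresholds value (np_load_avg=1.75) when none is given.
# Usage: exceeds_np_threshold LOAD_AVG NUM_PROC [THRESHOLD]
# Exit status 0 means "over threshold", so it can drive an if-test.
exceeds_np_threshold() {
    awk -v load="$1" -v nproc="$2" -v thr="${3:-1.75}" 'BEGIN {
        np = load / nproc
        printf "np_load_avg=%.3f threshold=%.2f\n", np, thr
        if (np > thr + 0) exit 0
        exit 1
    }'
}

# Node from the qstat -f output: load_avg 4.03, assumed 2 processors
if exceeds_np_threshold 4.03 2; then
    echo "over threshold: consistent with the \"a\" state"
fi
```

If the real processor count or threshold differs, pass them explicitly, e.g. "exceeds_np_threshold 4.03 4 1.75".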