Jonathan Pierce <[email protected]> writes:

> On May 3, 2011, at 5:01 PM, William Deegan wrote:
>
>> Greetings,
>> 
>> What's the best way to take a node offline for maintenance?
>> 
>> I've seen this note:
>> http://serverfault.com/questions/65691/how-can-i-tell-sge-to-stop-assigning-work-to-a-compute-node

[Strange place to ask.]  That answer is specific to a single queue;
-q *@<node> is more general.

>> Which suggests qmod -d/qmod -e the queue.
>
> "qmod -d" would be the way to go. Just wait for the existing jobs to finish, 
> and qmod -e once the node's ready for jobs again. From the man page:
>
>        -d     Disables  the  queue(s), i.e. no further jobs are dispatched to
>               disabled queues while jobs already executing  in  these  queues
>               are allowed to finish.
>        -e     Enables the queue(s).

Disabling the queues, as opposed to putting them in an ACL-restricted
group, means admins can't run test jobs on them, or jobs that send mail
once the node is free.  Re-enabling everything can also cause trouble if
a specific queue was deliberately disabled on particular nodes (like a
parallel queue when the MPI interface is broken).
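
For reference, the wildcard form discussed above might be wrapped
roughly like this.  It is only a sketch: the function names and the
DRY_RUN switch are made up for illustration, and it assumes qmod and
qstat are on the PATH and that you have operator rights on the host.

```shell
#!/bin/sh
# Sketch: take every queue instance on one host out of service, then
# put it back.  DRY_RUN and the function names are illustrative only;
# they are not part of Grid Engine.

run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_node() {
    # '*@host' matches all queue instances on that host.
    run qmod -d "*@$1"
}

enable_node() {
    # Caution: this also re-enables queue instances that were
    # disabled on purpose before the maintenance started.
    run qmod -e "*@$1"
}

running_jobs() {
    # List jobs from all users that are still running on the host.
    run qstat -u '*' -s r -q "*@$1"
}
```

Typical use would be disable_node node01, then running_jobs node01
until nothing is listed, then enable_node node01 after the work is
done.  Setting DRY_RUN=1 first shows the commands without running them.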

>> Also I saw a note from Reuti on the old mailing list about using an
>> advanced reservation to do so.

Manually I normally use sge-{dis,en}able-submits from
http://www.nw-grid.ac.uk/LivScripts (he says, quickly checking the
current version), but for automated use by Nagios I do use something
similar to {dis,en}able-nodes from the same page to disable all the
queues.  Those scripts depend on conventions for the node naming and
database so that they can conveniently take just node numbers, but they
could be distributed as examples unless people have better ones.  (There
are several collections of utility scripts around.)

>> This is probably worthy of an FAQ entry.

Yes; I think it's covered in the archives.  Maintaining an FAQ would be
a useful contribution in exchange for help with consulting work.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users