You can run "scontrol update nodename=<node> state=down reason=<reason>"
or execute the underlying RPC. This can only be done by user
root or SlurmUser, so you might need some sort of set-uid
wrapper for this.
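
For illustration, here is a minimal sketch of how an application might issue that scontrol command from Python. The function names and the example node name are hypothetical, and it assumes scontrol is on the PATH and that the process is already running as root or SlurmUser (for example through the wrapper mentioned above).

```python
import subprocess


def build_scontrol_down_command(node, reason):
    """Return the argv list that asks Slurm to mark `node` DOWN.

    Hypothetical helper: it only builds the command line and runs nothing.
    """
    if not node or any(ch.isspace() for ch in node):
        raise ValueError("node name must be non-empty and contain no whitespace")
    if not reason:
        # scontrol expects a reason when a node is set DOWN.
        raise ValueError("a reason is required when setting a node DOWN")
    # Each list element is passed as one argument, so a reason containing
    # spaces needs no extra shell quoting.
    return ["scontrol", "update", f"NodeName={node}", "State=DOWN", f"Reason={reason}"]


def mark_node_down(node, reason):
    """Run scontrol; raises CalledProcessError if scontrol exits non-zero."""
    subprocess.run(build_scontrol_down_command(node, reason), check=True)
```

The application would call something like mark_node_down("node01", "wine install broken") when it detects the failure. Keeping the command construction separate from the subprocess call makes it possible to check the arguments without touching a live cluster.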



Quoting Paul Thirumalai <[email protected]>:

I have a distributed app that runs on wine on distributed nodes. At times it
looks like the wine install goes haywire and, for no apparent reason, the given
app starts failing on a node.
When my main application sees this error, I would like to put the node down
(slurm should not schedule any more jobs on that node). Is there a way to do
this programmatically?



