You can run "scontrol update nodename=<node> state=down reason=<text>" or issue the underlying RPC directly. Only user root or SlurmUser is allowed to do this, so if your application runs as an ordinary user you might make some sort of set-uid wrapper for it.
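As a rough sketch only, the scontrol call could be issued from the application like this. The function names build_down_command and mark_node_down are made up for illustration, and the code assumes scontrol is on the PATH and that the process already runs as root or SlurmUser (or goes through a wrapper that does).

```python
import subprocess


def build_down_command(node, reason):
    """Return the scontrol argument list that marks `node` as down.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if not node or not reason:
        raise ValueError("node and reason must be non-empty")
    # Each key=value pair is its own argument, so a reason that contains
    # spaces needs no extra shell quoting.
    return [
        "scontrol",
        "update",
        f"nodename={node}",
        "state=down",
        f"reason={reason}",
    ]


def mark_node_down(node, reason):
    """Run scontrol; raises subprocess.CalledProcessError if it fails.

    Assumes the caller has root or SlurmUser privileges.
    """
    subprocess.run(build_down_command(node, reason), check=True)
```

The application would then call something like mark_node_down("node01", "wine install broken") when it sees the error. The node stays down until someone sets it back with "scontrol update nodename=<node> state=resume".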
Quoting Paul Thirumalai <[email protected]>:
I have a distributed app that runs under Wine on distributed nodes. At times it looks like the Wine install goes haywire and, for no apparent reason, the given app starts failing on a node. When my main application sees this error, I would like to put the node down (Slurm should not schedule any more jobs on that node). Is there a way to do this programmatically?
