I have a distributed app that runs under Wine on distributed nodes. At times the Wine install seems to go haywire and, for no apparent reason, the app starts failing on a given node. When my main application sees this error, I would like to put the node down (Slurm should not schedule any more jobs on that node). Is there a way to do this programmatically?
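One possible approach, sketched here only as an illustration: Slurm's `scontrol update` command can change a node's state, so the application could shell out to it when it detects the failure. The helper names `build_down_command` and `mark_node_down`, the node name `node01`, and the reason text are hypothetical. The sketch also assumes the calling user has the Slurm administrator privileges (root or SlurmUser) that node updates normally require.

```python
import subprocess


def build_down_command(node, reason):
    """Build the scontrol argument list that marks a node DOWN.

    Both helper names are hypothetical; only the scontrol syntax
    (NodeName=, State=, Reason=) comes from Slurm itself.
    """
    if not node or not reason:
        # scontrol expects a Reason when a node is set DOWN or DRAIN.
        raise ValueError("both node and reason are required")
    return [
        "scontrol",
        "update",
        f"NodeName={node}",
        "State=DOWN",
        f"Reason={reason}",
    ]


def mark_node_down(node, reason, run=subprocess.run):
    """Run the command; `run` is injectable so the call can be tested
    on a machine without Slurm installed."""
    cmd = build_down_command(node, reason)
    # check=True raises CalledProcessError if scontrol exits non-zero,
    # for example when the caller lacks permission to update nodes.
    return run(cmd, check=True)


if __name__ == "__main__":
    # Assumed node name and reason, for illustration only.
    print(" ".join(build_down_command("node01", "wine install broken")))
```

If running jobs on the node should be allowed to finish first, `State=DRAIN` may be a better fit than `State=DOWN`; `State=RESUME` returns the node to service once Wine has been repaired.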
- [slurm-dev] Putting a node down on error Paul Thirumalai
