Hi Daniel,

On Wed, Feb 3, 2016 at 3:33 AM, Daniel Letai <[email protected]> wrote:
> The question is - does slurm also use the dev files to track the
> availability of the cards?
>
> I do not wish to drain any nodes with failing cards - just let slurm know
> about this dynamically so jobs requesting gpus are properly scheduled, while
> other jobs can use the "bad" nodes.

I don't have an answer to your question, but running "scontrol -dd show node <nodename> | grep -i gres" reports a GresDrain property:

    Gres=gpu:8 GresDrain=N/A GresUsed=gpu:4

I have no idea how to set it, though. But if there is a way to drain specific GRES, that could be a way to do what you want.

Cheers,
-- 
Kilian
