is it possible to add or remove just a single node from a partition
without having to re-establish the whole list of nodes?

for example

if i have nodes[001-100] and i want to remove only node 049, is there
some incantation that will allow me to do that without having to say
nodes[001-048,050-100]?
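
to make the ask concrete, here is a rough sketch of the bookkeeping i
would otherwise have to script myself. it is plain python, not any
scheduler api: expand(), compress() and remove_node() are hypothetical
helper names, and it assumes one prefix with one zero-padded bracket
expression, like the example above.

```python
import re


def expand(hostlist):
    """Expand a simple 'prefix[a-b,c,...]' expression into a list of names."""
    m = re.fullmatch(r"(.*?)\[([\d,\-]+)\]", hostlist)
    if m is None:
        return [hostlist]  # a single plain hostname, nothing to expand
    prefix, body = m.groups()
    names = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo          # a lone number is a range of one
        width = len(lo)        # keep the zero padding, e.g. 049
        for n in range(int(lo), int(hi) + 1):
            names.append(f"{prefix}{n:0{width}d}")
    return names


def compress(names):
    """Fold names that share one prefix back into 'prefix[a-b,...]' form."""
    if not names:
        return ""
    split = [re.fullmatch(r"(.*?)(\d+)", n).groups() for n in names]
    prefix = split[0][0]
    width = len(split[0][1])
    nums = sorted(int(digits) for _, digits in split)
    ranges = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n != prev + 1:      # gap found, close the current run
            ranges.append((start, prev))
            start = n
        prev = n
    ranges.append((start, prev))
    parts = [
        f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
        for a, b in ranges
    ]
    return f"{prefix}[{','.join(parts)}]"


def remove_node(hostlist, node):
    """Return the hostlist expression with a single node taken out."""
    return compress([n for n in expand(hostlist) if n != node])


print(remove_node("nodes[001-100]", "nodes049"))
```

that works, but it means regenerating the whole list on every change,
which is exactly what i'm hoping to avoid.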

the motivation is that we have a mixed pool of nodes, some with gpus
and some without.  as our cluster ages, the gpus are getting flaky.
often a gpu flakes out or dies, but the rest of the node is
perfectly fine.

i'd like to dynamically move a node out of the gpu partition and into
a non-gpu partition using a node-health script.

yes, gres would probably handle this better than split partitions, but
we haven't rolled out gres allocations on the gpus yet.
