Hi all,
I've found that slurmctld does not take nodes in power save mode into
account when running the health check, so it tries to contact nodes that
are not responding. As a result, they get marked with:
    node_ptr->not_responding = true
and then they are no longer available for resource allocation.
The problem is solved by replacing, in the 'void run_health_check(void)'
function in file 'ping_nodes.c', the conditional:
    if (IS_NODE_NO_RESPOND(node_ptr) || IS_NODE_FUTURE(node_ptr))
        continue;
with the conditional:
    if (IS_NODE_NO_RESPOND(node_ptr) || IS_NODE_FUTURE(node_ptr) ||
        IS_NODE_POWER_SAVE(node_ptr))
        continue;
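
To illustrate the intent of the change outside of the Slurm source tree,
here is a minimal standalone sketch. Everything prefixed with demo_ or
DEMO_ is a hypothetical stand-in that I made up for this example; the real
flag values, the node record layout and the IS_NODE_* macros come from the
Slurm headers and are not reproduced here. It only shows that a node with
the power save flag set is skipped in the same way as a non-responding or
future node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Slurm's node state flags (assumed values,
 * not the ones defined in the Slurm headers). */
#define DEMO_FLAG_NO_RESPOND 0x1u
#define DEMO_FLAG_FUTURE     0x2u
#define DEMO_FLAG_POWER_SAVE 0x4u

/* Hypothetical, simplified node record. */
struct demo_node {
    const char *name;
    uint32_t flags;
};

/* Returns true if the health check should skip this node.
 * Mirrors the patched conditional: power-saved nodes are skipped too. */
static bool demo_skip_health_check(const struct demo_node *node)
{
    return (node->flags & DEMO_FLAG_NO_RESPOND) ||
           (node->flags & DEMO_FLAG_FUTURE) ||
           (node->flags & DEMO_FLAG_POWER_SAVE);
}

/* Counts the nodes that the health check would actually contact. */
static size_t demo_count_checked(const struct demo_node *nodes, size_t n)
{
    size_t checked = 0;
    for (size_t i = 0; i < n; i++) {
        if (demo_skip_health_check(&nodes[i]))
            continue;
        checked++;
    }
    return checked;
}
```

With the extra test in place, a node that is only in power save mode is
never contacted by the health check, so it should not end up flagged as
not responding for that reason.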
This bug is present in both versions 2.2.7 and 2.3.0, as far as I could
test.
Cheers
--
Ramiro Alba
Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46