Peter,

The small attached patch does what you ask. It applies only to a configuration with FastSchedule=0, select/linear, and no gang scheduling. Both the select/cons_res plugin and the gang scheduling module rely on bitmaps that track every CPU in the system, so that information cannot change: the CPU count for every node must be defined in slurm.conf when slurmctld starts and must stay fixed afterwards.
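
For reference, the two settings named above would look roughly like this in slurm.conf. This is only a fragment, not a complete configuration; leave gang scheduling disabled however your installation configures it.

```
# Use the hardware values each node reports, not those listed in slurm.conf
FastSchedule=0
# Whole-node allocation; select/cons_res is not compatible with this patch
SelectType=select/linear
```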

Quoting Peter Kruse <[email protected]>:

Hello,

I've just successfully configured power save mode in one of our clusters,
when I noticed that the information about suspended nodes is lost
when I restart slurmctld.  For example "CoresPerSocket" is set to 1
when in fact it is 4 for some nodes.  This makes it impossible to
start a job requesting the 4 Cores.
I know that I can manually enter the number of cores, Real Memory
and the like in slurm.conf, but I hoped that there is a way to automate
this process.  I also thought that this information is stored in
the database?  Is there any way to automatically make the
information about the nodes persistent?

Thanks,

  Peter



diff --git a/src/slurmctld/node_mgr.c b/src/slurmctld/node_mgr.c
index edc8dec..c968cd7 100644
--- a/src/slurmctld/node_mgr.c
+++ b/src/slurmctld/node_mgr.c
@@ -430,6 +430,14 @@ extern int load_all_node_state ( bool state_only )
 						hostset_insert(hs, node_name);
 					else
 						hs = hostset_create(node_name);
+					/* Recover hardware state for powered
+					 * down nodes */
+					node_ptr->cpus          = cpus;
+					node_ptr->sockets       = sockets;
+					node_ptr->cores         = cores;
+					node_ptr->threads       = threads;
+					node_ptr->real_memory   = real_memory;
+					node_ptr->tmp_disk      = tmp_disk;
 				}
 				if (node_state & NODE_STATE_POWER_UP) {
 					if (power_save_mode) {
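
The effect of the hunk above can be sketched in isolation as follows. This is a simplified illustration, not code from the Slurm tree: `struct node_rec` and `recover_hw_state()` are hypothetical stand-ins, and the field widths are assumptions. Only the field names follow the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for slurmctld's node record.
 * Field names follow the patch; the types are assumptions. */
struct node_rec {
	uint16_t cpus;
	uint16_t sockets;
	uint16_t cores;
	uint16_t threads;
	uint32_t real_memory;
	uint32_t tmp_disk;
};

/* Copy the hardware values read from the saved state into the live
 * node record, but only for a powered-down node. Such a node cannot
 * report its own configuration, so the saved values are the best
 * information available. Other nodes are left untouched. */
static void recover_hw_state(struct node_rec *node,
			     const struct node_rec *saved,
			     bool powered_down)
{
	if (!powered_down)
		return;

	node->cpus        = saved->cpus;
	node->sockets     = saved->sockets;
	node->cores       = saved->cores;
	node->threads     = saved->threads;
	node->real_memory = saved->real_memory;
	node->tmp_disk    = saved->tmp_disk;
}
```

With a saved record holding 4 cores and a live record still at the default of 1, calling `recover_hw_state()` with `powered_down` set to true brings the live record back to 4 cores, which is what lets a job requesting 4 cores start again after slurmctld restarts.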
