I am using Bright Cluster Manager version 6.0, which uses SLURM v2.3.4.
I'm seeing an odd issue: I have many jobs queued up, but SLURM has decided to power down most of my nodes and mark them as "~idle".

The short version is that I have multiple partitions and multiple different types of servers in my cluster, and I have SLURM's power control enabled to power off my servers when they're not in use. However, I have seen SLURM mark nodes as "~idle" (i.e., idle and powered off) while there are lots of jobs in the queue. For example, this morning I see about 1000 jobs queued up in "defq" (my default partition), but only 4 nodes (out of 32) are powered up and running jobs from that queue. The remaining 28 are marked as "~idle".

Here are some of the details:

-----
[root@savbu-usnic-a ~]# srun --version
slurm 2.3.4
[root@savbu-usnic-a ~]# squeue | head
  JOBID PARTITION     NAME     USER  ST  TIME  NODES NODELIST(REASON)
  82646      defq Run imb-  mpiteam  PD  0:00      2 (Priority)
  82647      defq Run netp  mpiteam  PD  0:00      2 (Priority)
  82649      defq Run triv  mpiteam  PD  0:00      2 (Priority)
  82650      defq Run inte  mpiteam  PD  0:00      2 (Priority)
  82651      defq Run ibm   mpiteam  PD  0:00      2 (Priority)
  82652      defq Run ones  mpiteam  PD  0:00      2 (Priority)
  82653      defq Run mpic  mpiteam  PD  0:00      2 (Priority)
  82654      defq Run mpi-  mpiteam  PD  0:00      2 (Priority)
  82655      defq Run java  mpiteam  PD  0:00      2 (Priority)
[root@savbu-usnic-a ~]# squeue | grep Resour
  82642      defq Run mpic  mpiteam  PD  0:00      2 (Resources)
[root@savbu-usnic-a ~]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*        up   infinite     28  idle~ node[001-011,014-015,018-032]
defq*        up   infinite      4  alloc node[012-013,016-017]
eurompi      up   infinite      0    n/a
infiniban    up   infinite     38  idle~ dell[001-016,022-043]
[root@savbu-usnic-a ~]#
-----

Do you still answer questions about SLURM v2.3.4?  (Upgrading is not really an option, since Bright controls my entire SLURM setup.)

Thanks.

--
Jeff Squyres
[email protected]

For corporate legal information go to:
http://lists.schedmd.com/cgi-bin/dada/mail.cgi/r/slurmdev/314919641674/
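For reference, SLURM's power saving behavior in this version is driven by the suspend/resume settings in slurm.conf. The values and script paths below are illustrative, not my exact configuration, but this is roughly the shape of the relevant block:

```
# slurm.conf power-save sketch (illustrative values and paths)
SuspendTime=300                          # power a node down after 300s idle
SuspendProgram=/path/to/node-poweroff    # site-specific script (hypothetical path)
ResumeProgram=/path/to/node-poweron      # site-specific script (hypothetical path)
SuspendRate=10                           # max nodes suspended per minute
ResumeRate=10                            # max nodes resumed per minute
ResumeTimeout=600                        # seconds to wait for a node to boot
SuspendExcNodes=node[012-013]            # optional: nodes never powered down
```

Resuming more nodes for the queued jobs is supposed to be triggered automatically by the scheduler; my problem is that it isn't doing so for the pending "defq" jobs above.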
