[slurm-dev] Understanding cons_res with CR_Core: It either cannot allocate resource or jobs end up in CG status ... Why?

Somesh Roy Sun, 05 Mar 2017 20:32:56 -0800

I am new to SLURM and trying to configure slurm for a new cluster.


I have 4 nodes, each has 14 cores. I wanted to share nodes in a way that
every core can run independently (i.e., node01 can have 14 independent
serial jobs going on at the same time), but no core should run more than
one job. Going through the documentation I figured I need to set

----

   SelectType              = select/cons_res
   SelectTypeParameters    = CR_CORE

----

So I set these in slurm.conf and restarted slurmctld. But now when I submit a
job, I either get an error that the requested node configuration is not
available, or the job ends up stuck in the CG (completing) state.
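
To rule out the simplest cause, that the edited file does not contain the
settings in the expected form, a check along these lines could be used. This
is only a sketch: the function name show_select_settings and the default path
/etc/slurm/slurm.conf are assumptions for illustration, and the real location
depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper: print the SelectType* lines from a slurm.conf file.
# The default path is an assumption; pass the real path as the first argument.
show_select_settings() {
    conf="${1:-/etc/slurm/slurm.conf}"
    # Match the keys case-insensitively, skip commented-out lines,
    # then strip indentation and the padding around '='.
    grep -i -E '^[[:space:]]*SelectType(Parameters)?[[:space:]]*=' "$conf" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*=[[:space:]]*/=/'
}
```

Calling show_select_settings with the path of the edited file should print
one line per setting. The values the running controller actually uses can be
compared against this with "scontrol show config".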


Example 1:

----

   [sr@clstr mpitests]$ cat newHello.slrm
   #!/bin/sh
   #SBATCH --time=00:01:00
   #SBATCH -N 1
   #SBATCH --ntasks=4
   #SBATCH --ntasks-per-node=4

   module add shared openmpi/gcc/64 slurm
   module load somesh/scripts/1.0

   mpirun helloMPIf90
---


Leads to:


---

   [sr@clstr mpitests]$ sbatch -v newHello.slrm
   sbatch: defined options for program `sbatch'
   sbatch: ----------------- ---------------------
   sbatch: user              : `sr'
   sbatch: uid               : 1003
   sbatch: gid               : 1003
   sbatch: cwd               : /home/sr/clusterTests/mpitests
   sbatch: ntasks            : 4 (set)
   sbatch: nodes             : 1-1
   sbatch: jobid             : 4294967294 (default)
   sbatch: partition         : default
   sbatch: profile           : `NotSet'
   sbatch: job name          : `newHello.slrm'
   sbatch: reservation       : `(null)'
   sbatch: wckey             : `(null)'
   sbatch: distribution      : unknown
   sbatch: verbose           : 1
   sbatch: immediate         : false
   sbatch: overcommit        : false
   sbatch: time_limit        : 1
   sbatch: nice              : -2
   sbatch: account           : (null)
   sbatch: comment           : (null)
   sbatch: dependency        : (null)
   sbatch: qos               : (null)
   sbatch: constraints       :
   sbatch: geometry          : (null)
   sbatch: reboot            : yes
   sbatch: rotate            : no
   sbatch: network           : (null)
   sbatch: array             : N/A
   sbatch: cpu_freq_min      : 4294967294
   sbatch: cpu_freq_max      : 4294967294
   sbatch: cpu_freq_gov      : 4294967294
   sbatch: mail_type         : NONE
   sbatch: mail_user         : (null)
   sbatch: sockets-per-node  : -2
   sbatch: cores-per-socket  : -2
   sbatch: threads-per-core  : -2
   sbatch: ntasks-per-node   : 4
   sbatch: ntasks-per-socket : -2
   sbatch: ntasks-per-core   : -2
   sbatch: mem_bind          : default
   sbatch: plane_size        : 4294967294
   sbatch: propagate         : NONE
   sbatch: switches          : -1
   sbatch: wait-for-switches : -1
   sbatch: core-spec         : NA
   sbatch: burst_buffer      : `(null)'
   sbatch: remote command    : `/home/sr/clusterTests/mpitests/newHello.slrm'
   sbatch: power             :
   sbatch: wait              : yes
   sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 4
   sbatch: Cray node selection plugin loaded
   sbatch: Linear node selection plugin loaded with argument 4
   sbatch: Serial Job Resource Selection plugin loaded with argument 4
   sbatch: error: Batch job submission failed: Requested node configuration is not available

---


Example 2:


---

   [sr@clstr mpitests]$ cat newHello.slrm
   #!/bin/sh
   #SBATCH --time=00:01:00
   #SBATCH -N 1
   #SBATCH --ntasks=1
   #SBATCH --ntasks-per-node=1

   module add shared openmpi/gcc/64 slurm
   module load somesh/scripts/1.0

   helloMPIf90
---


Leads to:


---

   [sr@clstr mpitests]$ sbatch -v newHello.slrm
   sbatch: defined options for program `sbatch'
   sbatch: ----------------- ---------------------
   sbatch: user              : `sr'
   sbatch: uid               : 1003
   sbatch: gid               : 1003
   sbatch: cwd               : /home/sr/clusterTests/mpitests
   sbatch: ntasks            : 1 (set)
   sbatch: nodes             : 1-1
   sbatch: jobid             : 4294967294 (default)
   sbatch: partition         : default
   sbatch: profile           : `NotSet'
   sbatch: job name          : `newHello.slrm'
   sbatch: reservation       : `(null)'
   sbatch: wckey             : `(null)'
   sbatch: distribution      : unknown
   sbatch: verbose           : 1
   sbatch: immediate         : false
   sbatch: overcommit        : false
   sbatch: time_limit        : 1
   sbatch: nice              : -2
   sbatch: account           : (null)
   sbatch: comment           : (null)
   sbatch: dependency        : (null)
   sbatch: qos               : (null)
   sbatch: constraints       :
   sbatch: geometry          : (null)
   sbatch: reboot            : yes
   sbatch: rotate            : no
   sbatch: network           : (null)
   sbatch: array             : N/A
   sbatch: cpu_freq_min      : 4294967294
   sbatch: cpu_freq_max      : 4294967294
   sbatch: cpu_freq_gov      : 4294967294
   sbatch: mail_type         : NONE
   sbatch: mail_user         : (null)
   sbatch: sockets-per-node  : -2
   sbatch: cores-per-socket  : -2
   sbatch: threads-per-core  : -2
   sbatch: ntasks-per-node   : 1
   sbatch: ntasks-per-socket : -2
   sbatch: ntasks-per-core   : -2
   sbatch: mem_bind          : default
   sbatch: plane_size        : 4294967294
   sbatch: propagate         : NONE
   sbatch: switches          : -1
   sbatch: wait-for-switches : -1
   sbatch: core-spec         : NA
   sbatch: burst_buffer      : `(null)'
   sbatch: remote command    : `/home/sr/clusterTests/mpitests/newHello.slrm'
   sbatch: power             :
   sbatch: wait              : yes
   sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 4
   sbatch: Cray node selection plugin loaded
   sbatch: Linear node selection plugin loaded with argument 4
   sbatch: Serial Job Resource Selection plugin loaded with argument 4
   Submitted batch job 108

   [sr@clstr mpitests]$ squeue
                JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                  108      defq newHello     sr CG       0:01      1 node001

   [sr@clstr mpitests]$ scontrol show job=108
   JobId=108 JobName=newHello.slrm
      UserId=sr(1003) GroupId=sr(1003) MCS_label=N/A
      Priority=4294901756 Nice=0 Account=(null) QOS=normal
      JobState=COMPLETING Reason=NonZeroExitCode Dependency=(null)
      Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=1:0
      RunTime=00:00:01 TimeLimit=00:01:00 TimeMin=N/A
      SubmitTime=2017-03-03T18:25:51 EligibleTime=2017-03-03T18:25:51
      StartTime=2017-03-03T18:26:01 EndTime=2017-03-03T18:26:02 Deadline=N/A
      PreemptTime=None SuspendTime=None SecsPreSuspend=0
      Partition=defq AllocNode:Sid=clstr:20260
      ReqNodeList=(null) ExcNodeList=(null)
      NodeList=node001
      BatchHost=node001
      NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
      TRES=cpu=1,node=1
      Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
      MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
      Features=(null) Gres=(null) Reservation=(null)
      OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
      Command=/home/sr/clusterTests/mpitests/newHello.slrm
      WorkDir=/home/sr/clusterTests/mpitests
      StdErr=/home/sr/clusterTests/mpitests/slurm-108.out
      StdIn=/dev/null
      StdOut=/home/sr/clusterTests/mpitests/slurm-108.out
      Power=

---

In the second example, the job stays in the CG state until I reset the node.
If I change slurm.conf back to SelectType=select/linear, everything behaves
normally.

I am at a loss as to where I am making a mistake. Is it the slurm
configuration, my job submission script, or something else entirely?

Also, what do the following settings mean, and why do they show up as
negative?

   sbatch: sockets-per-node  : -2
   sbatch: cores-per-socket  : -2
   sbatch: threads-per-core  : -2
   sbatch: ntasks-per-socket : -2
   sbatch: ntasks-per-core   : -2
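
A possibly related observation: other unset fields in the same listing
(jobid, cpu_freq_min, plane_size) print as 4294967294, and that value has the
same 32-bit pattern as -2 when read as a signed integer. The sketch below only
demonstrates that arithmetic. That both are one "not set" sentinel printed
with different formats is an assumption, not something the output confirms,
and the function name to_signed32 is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reinterpret an unsigned 32-bit value as a signed
# two's-complement integer. Needs a shell with 64-bit arithmetic.
to_signed32() {
    u=$1
    if [ "$u" -ge 2147483648 ]; then
        # Values with the top bit set wrap around to negative numbers.
        echo $(( $u - 4294967296 ))
    else
        echo "$u"
    fi
}

to_signed32 4294967294   # prints -2
```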


If anyone can point me in the right direction, that would be very helpful.

Thanks in advance,
Somesh
