I see the same problem that Andy describes. We are testing 2.6.5 on our
test cluster.
As root:
# scontrol create reservation=bhms users=bhm nodes=c0-0 start=now duration=unlimited
Reservation created: bhms
# scontrol create reservation=staffs accounts=staff nodes=c2-0 start=now duration=unlimited
Reservation created: staffs
# scontrol show reservation
ReservationName=bhms StartTime=2014-01-17T10:49:23 EndTime=2015-01-17T10:49:23 Duration=365-00:00:00
   Nodes=c0-0 NodeCnt=1 CoreCnt=32 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   Users=bhm Accounts=(null) Licenses=(null) State=ACTIVE

ReservationName=staffs StartTime=2014-01-17T10:50:53 EndTime=2015-01-17T10:50:53 Duration=365-00:00:00
   Nodes=c2-0 NodeCnt=1 CoreCnt=4 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   Users=(null) Accounts=staff Licenses=(null) State=ACTIVE
As bhm:
$ /opt/slurm/bin/sbatch --reservation=bhms mal.sm
Submitted batch job 20
$ /opt/slurm/bin/sbatch --reservation=staffs mal.sm
Submitted batch job 21
$ bjob
JOBID  NAME    USER  ACCOUNT  PARTITI  QOS    ST  PRIORI  TIME  TIME_LEFT  CPUS  NOD  MIN_MEM  MIN_TMP  NODELIST(REASON)
21     mal.sm  bhm   staff    normal   staff  R   21000   2:55      57:05     1    1      500        0  c2-0
20     mal.sm  bhm   staff    normal   staff  PD  21001   0:00    1:00:00     1    1      500        0  (Reservation)
So the job requesting the staff account reservation starts, while the
job requesting the bhm user reservation does not.
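(For reference, the reason and the reservation a pending job asked for can also
be inspected directly; this is a generic check, nothing site-specific:

$ /opt/slurm/bin/scontrol show job 20 | grep -Ei 'JobState|Reservation'

which should report the same Reason=Reservation as the listing above.)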
slurmctld.log says:
[2014-01-17T10:56:54.021] backfill test for job 20
[2014-01-17T10:56:54.021] debug2: backfill: found new user 10231. Total #users now 1
[2014-01-17T10:56:54.021] backfill: completed testing 1 jobs, usec=21741
[2014-01-17T10:56:54.490] debug2: Testing job time limits and checkpoints
[2014-01-17T10:56:54.490] debug: sched: Running job scheduler
[2014-01-17T10:56:54.490] debug3: acct_policy_job_runnable: job 20: MPC: job_memory set to 500
[2014-01-17T10:56:54.490] No nodes satisfy job 20 requirements in partition normal
[2014-01-17T10:56:54.490] debug3: sched: JobId=20. State=PENDING. Reason=Reservation. Priority=21001. Partition=normal.
[2014-01-17T10:57:03.241] debug3: Processing RPC: REQUEST_JOB_INFO from uid=0
[2014-01-17T10:57:21.163] debug2: Processing RPC: REQUEST_PARTITION_INFO uid=10231
[2014-01-17T10:57:21.163] debug2: _slurm_rpc_dump_partitions, size=482 usec=217
[2014-01-17T10:57:21.165] debug3: Processing RPC: REQUEST_NODE_INFO from uid=10231
[2014-01-17T10:57:21.171] debug2: Processing RPC: REQUEST_RESERVATION_INFO from uid=10231
[2014-01-17T10:57:24.493] debug2: Testing job time limits and checkpoints
[2014-01-17T10:57:24.494] debug2: Performing purge of old job records
[2014-01-17T10:57:40.643] debug3: Processing RPC: REQUEST_JOB_INFO from uid=10231
[2014-01-17T10:57:54.497] debug2: Testing job time limits and checkpoints
[2014-01-17T10:57:54.497] debug: sched: Running job scheduler
[2014-01-17T10:57:54.497] debug3: acct_policy_job_runnable: job 20: MPC: job_memory set to 500
[2014-01-17T10:57:54.497] No nodes satisfy job 20 requirements in partition normal
[2014-01-17T10:57:54.498] debug3: sched: JobId=20. State=PENDING. Reason=Reservation. Priority=21001. Partition=normal.
Increasing the priority of the job with
# scontrol update jobid=20 nice=-1000
does not help; the job still does not start.
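As a further sanity check (not a fix), one could verify that the reservation
still lists the user and that the uid matches the 10231 seen in the backfill
log, e.g.:

# scontrol show reservation bhms
$ id -u bhm

but given the output above I would expect both to look correct.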
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo