What is the requested wall time on that job?  If there isn't a DefaultTime set
for the debug partition, Slurm may assume the job will run for the maximum
(infinite), in which case its projected end time overlaps the reservation's
start time and the job cannot be scheduled.
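A quick sketch of how to check and work around this, assuming the partition is named "debug" as in the sinfo output below (the specific time values are just examples):

```shell
# Check whether the partition has a DefaultTime.
# DefaultTime=NONE means jobs submitted without --time inherit
# MaxTime, which here is "infinite".
scontrol show partition debug

# Workaround 1: request a wall time short enough for the job to
# finish before the reservation starts.
srun --time=00:10:00 hostname

# Workaround 2 (as admin): give the partition a bounded DefaultTime
# so untimed jobs no longer collide with future reservations.
scontrol update PartitionName=debug DefaultTime=01:00:00
```

Either approach lets the backfill logic see that the job ends before the reservation begins.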

-----Original Message-----
From: Tim Donahue <[email protected]>
Reply-To: slurm-dev <[email protected]>
Date: Thursday, June 22, 2017 at 4:59 PM
To: slurm-dev <[email protected]>
Subject: [slurm-dev] Node not available due to future reservation?


I have a very simple system, one controller, one server node. The node 
is up.

> ubuntu@controller:~$ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> debug*       up   infinite      1   idle server1
I create a reservation containing the server node but having a start 
time many days in advance:

> ubuntu@controller:~$ scontrol show reservations -o
> ReservationName=foo3 StartTime=2017-07-03T00:00:00 
> EndTime=2017-07-03T01:00:00 Duration=01:00:00 Nodes=server1 NodeCnt=1 
> CoreCnt=1 Features=(null) PartitionName=debug Flags= TRES=cpu=1 
> Users=ubuntu Accounts=(null) Licenses=(null) State=INACTIVE 
> BurstBuffer=(null) Watts=n/a
> ubuntu@controller:~$

I then try to run a (very simple) job, but the job is queued:

> ubuntu@controller:~$ srun hostname
> srun: Required node not available (down, drained or reserved)
> srun: job 630 queued and waiting for resources
squeue suggests the job is queued because the server node is not available:

>
> ubuntu@controller:~$ squeue
>              JOBID PARTITION     NAME     USER ST TIME  NODES 
> NODELIST(REASON)
>                629     debug hostname   ubuntu PD 0:00      1 
> (ReqNodeNotAvail, May be reserved for other job)

Is this the expected behavior and, if so, why?

Thanks

Tim Donahue

MIT / BU / MassOpenCloud



