Hello,

I can reproduce the same problem under one specific condition. I'll
skip most of the output you showed above, since my setup is similar to
yours. My reservation starts today at 7 pm and covers only node24, so I
request that node explicitly:
$ srun -w node24 hostname
srun: Required node not available (down, drained or reserved)
srun: job 195 queued and waiting for resources

So, I have the same issue as you. However, if I do:
$ srun --time=2:00 -w node24 hostname
node24.cluster

It runs as expected.
The problem you are seeing is probably that the job has no time limit
and sits in a partition whose maximum time is long (or infinite). Slurm
has no way of knowing a priori whether the command would finish before
the reservation starts, so it will not start it on the reserved nodes
and queues it instead. If you make sure the jobs you want to run are
guaranteed to finish before the reservation begins, they should run.
You can do this either by setting an appropriate time limit or by
submitting them to a partition with a short time limit (e.g. a debug
partition); see the sketch below.
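
The same idea applies to batch jobs. Below is a minimal sketch of a
submission script; the script name, job name and the "debug" partition
are only placeholders, so adjust them to whatever exists on your
cluster:

#!/bin/bash
#SBATCH --job-name=before_res
#SBATCH --partition=debug
#SBATCH --time=00:02:00
#SBATCH -w node24
srun hostname

$ sbatch before_res.sh

You can also see why jobs without an explicit --time get blocked by
looking at "scontrol show partition": if DefaultTime is not set, jobs
inherit the partition's MaxTime, and with MaxTime=UNLIMITED Slurm can
never prove they will finish before the reservation starts.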

Hope this helps,
Daniel

On 17 February 2016 at 10:29, Taras Shapovalov
<[email protected]> wrote:
> Hi guys,
>
> Could you please explain the behavior of advanced reservations? I thought I
> understood how they work in Slurm, but it seems I am missing something. If I
> reserve some nodes for a particular period of time in the future, the nodes
> apparently cannot be used even outside that period, i.e. while the
> reservation is still inactive. For example:
>
> [root@taras-trunk-slurm-tmp ~]# scontrol show res
> ReservationName=root_1 StartTime=2016-02-18T16:00:00
> EndTime=2016-02-18T18:00:00 Duration=02:00:00
>    Nodes=node[001-002] NodeCnt=2 CoreCnt=2 Features=(null)
> PartitionName=(null) Flags=MAINT,IGNORE_JOBS,SPEC_NODES
>    TRES=cpu=2
>    Users=root Accounts=(null) Licenses=(null) State=INACTIVE
> BurstBuffer=(null) Watts=n/a
>
> [root@taras-trunk-slurm-tmp ~]# scontrol show node node001,node002
> NodeName=node001 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.03 Features=(null)
>    Gres=(null)
>    NodeAddr=node001 NodeHostName=node001 Version=15.08
>    OS=Linux RealMemory=993 AllocMem=0 FreeMem=550 Sockets=1 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=14988 Weight=1 Owner=N/A
>    BootTime=2016-02-16T19:56:33 SlurmdStartTime=2016-02-16T20:05:00
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node002 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.04 Features=(null)
>    Gres=(null)
>    NodeAddr=node002 NodeHostName=node002 Version=15.08
>    OS=Linux RealMemory=993 AllocMem=0 FreeMem=551 Sockets=1 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=14988 Weight=1 Owner=N/A
>    BootTime=2016-02-16T19:56:34 SlurmdStartTime=2016-02-16T20:04:59
>    CapWatts=n/a
>    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> [root@taras-trunk-slurm-tmp ~]# date
> Wed Feb 17 09:53:42 CET 2016
> [root@taras-trunk-slurm-tmp ~]# srun hostname
> srun: Required node not available (down, drained or reserved)
> srun: job 10 queued and waiting for resources
>
> <CTRL-C>
>
> [root@taras-trunk-slurm-tmp ~]# scontrol delete reservationname=root_1
> [root@taras-trunk-slurm-tmp ~]# srun hostname
> node001
> [root@taras-trunk-slurm-tmp ~]#
>
>
> Is this really the expected behavior? The documentation does not say so, at
> least not explicitly.
>
> From the slurmctld log:
>
> [2016-02-17T09:53:43.028] debug2: found 2 usable nodes from config
> containing node[001,002]
> [2016-02-17T09:53:43.028] debug2: Advanced reservation removed 2 nodes from
> consideration for job 10
>
> Best regards,
>
> Taras
>
