(We are running version 2.2.0 of SLURM.)
Yesterday, one of my reservations ended at six p.m.
It was named "raalvmar", after the user it was
reserved for:
ReservationName=raalvmar StartTime=2011-03-28T08:00:00
EndTime=2011-03-30T18:00:00 Duration=2-10:00:00
Nodes=h1 NodeCnt=1 Features=(null) PartitionName=halvan Flags=IGNORE_JOBS
Users=lka,raalvmar Accounts=(null) Licenses=(null)
24 minutes before the reservation ended, the user was allowed to start
a job. Here is the output of "scontrol show job":
JobId=155 Name=job_20110330_173627_eBoJT3
UserId=raalvmar(40037) GroupId=uppmax(40001)
Priority=70058 Account=b2010051 QOS=normal WCKey=*
JobState=TIMEOUT Reason=TimeLimit Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 ExitCode=0:1
RunTime=00:23:24 TimeLimit=10:00:00 TimeMin=N/A
SubmitTime=2011-03-30T17:36:28 EligibleTime=2011-03-30T17:36:28
StartTime=2011-03-30T17:36:48 EndTime=2011-03-30T18:00:12
SuspendTime=None SecsPreSuspend=0
Partition=halvan AllocNode:Sid=kalkyl4:22785
ReqNodeList=(null) ExcNodeList=(null)
NodeList=h1
NumNodes=1 NumCPUs=8 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/bubo/home/h13/raalvmar/glob/jobs/job_20110330_173627_eBoJT3
WorkDir=/bubo/glob/g13/raalvmar/jobs
As you can see, the time limit was 10 hours, but the job was terminated with
status TIMEOUT long before that. The user and I were both surprised.
In my view, it would have been better if the job had not started, because
SLURM should have known that the job was unlikely to be able to run
for ten hours. Even better would be if the job were rejected at submit time.
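
To make the arithmetic concrete, here is a minimal Python sketch of the check I would have expected, using only the timestamps shown above. The function names (parse_limit, fits_before) are hypothetical and purely illustrative; this is not SLURM code and does not describe how the scheduler actually decides.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S"  # timestamp format used in the scontrol output


def parse_limit(text):
    """Parse a duration written as 'D-HH:MM:SS' or 'HH:MM:SS'."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def fits_before(start, time_limit, reservation_end):
    """True if a job starting at 'start' can run its full limit before the end."""
    job_end = datetime.strptime(start, FMT) + parse_limit(time_limit)
    return job_end <= datetime.strptime(reservation_end, FMT)


job_start = "2011-03-30T17:36:48"        # StartTime of job 155
reservation_end = "2011-03-30T18:00:00"  # EndTime of reservation raalvmar

# The 10-hour limit does not fit in the time left in the reservation.
print(fits_before(job_start, "10:00:00", reservation_end))  # False

# Time actually left when the job started.
remaining = datetime.strptime(reservation_end, FMT) - datetime.strptime(job_start, FMT)
print(remaining)  # 0:23:12
```

With these numbers the job had only about 23 minutes left in the reservation, which is consistent with the RunTime of 00:23:24 reported above.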
We would like to reserve nodes for users or accounts, but the reservation
system seems to have strange limits:
1/ At submit time you must specify exactly one reservation name; otherwise
your job cannot use a node that is reserved.
2/ If you specify a reservation name, your job will not start on the node
until the reservation starts, even if the node is free before the
reservation starts. As I now understand it, the job is also not allowed to
continue running on the node when the reservation ends.
As I see it, reservations would be more useful if it
were allowed
a/ ... for a job to specify that it does not mind running freely within
any reservation that allows its user and/or account to run there.
b/ ... for a job to run within the reservation and also before and after
the reservation.
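
As an illustration of wish a/, the selection could look roughly like the Python sketch below. It parses the reservation record shown earlier and lists the reservations whose Users or Accounts field admits a given job. The helper names (parse_record, usable_reservations) are hypothetical, and this describes the behaviour being asked for, not anything SLURM is known to do.

```python
def parse_record(text):
    """Turn 'Key=Value Key=Value ...' text into a dict (values contain no spaces)."""
    return dict(field.split("=", 1) for field in text.split())


def split_list(value):
    """Split a comma-separated field; '(null)' means the field is empty."""
    return [] if value == "(null)" else value.split(",")


def usable_reservations(records, user, account):
    """Names of reservations that allow this user or this account."""
    usable = []
    for rec in records:
        allowed_users = split_list(rec.get("Users", "(null)"))
        allowed_accounts = split_list(rec.get("Accounts", "(null)"))
        if user in allowed_users or account in allowed_accounts:
            usable.append(rec["ReservationName"])
    return usable


# The reservation record from the "scontrol show reservation" output above.
record = parse_record(
    "ReservationName=raalvmar StartTime=2011-03-28T08:00:00 "
    "EndTime=2011-03-30T18:00:00 Duration=2-10:00:00 "
    "Nodes=h1 NodeCnt=1 Features=(null) PartitionName=halvan Flags=IGNORE_JOBS "
    "Users=lka,raalvmar Accounts=(null) Licenses=(null)"
)

# Job 155 was submitted by user raalvmar under account b2010051.
print(usable_reservations([record], "raalvmar", "b2010051"))  # ['raalvmar']
```

A job that opted in to this behaviour would then not need to name a reservation at submit time.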
Have I misunderstood the reservation rules, so that reservations actually
are more useful than I understand? Or are there other mechanisms
within SLURM that do what I am asking for?
Otherwise, are there any plans to make reservations more
versatile?
Best regards,
-- Lennart Karlsson
UPPMAX, Uppsala, Sweden
http://www.uppmax.uu.se