Hi all,

I have used scontrol to create a reservation on all cluster nodes for a
scheduled system maintenance (cluster shutdown) starting 2013-08-05T18:00:00.
My problem is that in one partition, jobs are not eligible to run
(ReqNodeNotAvail) even though the nodes are idle and the job time
(7-00:00:00) still fits before the maintenance window. I only see the
problem on the partition marked "Shared=YES:4" (parallel); the other
partition, with "Shared=FORCE:1" (serial), works fine. Another interesting
fact is that jobs are still accepted as long as they would complete by
2013-07-29T00:00:00.

Is this a bug or a configuration problem?

Markus


bagheera2 slurm # scontrol show reserv
ReservationName=root_3 StartTime=2013-08-05T18:00:00 EndTime=2013-08-08T18:00:00 Duration=3-00:00:00
   Nodes=kaa-[1-17,26-33,45-48,50-106] NodeCnt=86 CoreCnt=1420 Features=(null) PartitionName=(null) Flags=MAINT,IGNORE_JOBS,SPEC_NODES
   Users=root Accounts=(null) Licenses=(null) State=INACTIVE

bagheera2 slurm # srun -p parallel -w kaa-104 --qos=long --time=2-13:15:00 hostname
kaa-104

bagheera2 slurm # srun -p parallel -w kaa-104 --qos=long --time=2-13:20:00 hostname
srun: Required node not available (down or drained)
srun: job 12899 queued and waiting for resources
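
To make the dates easier to compare, here is a rough Python sketch of the arithmetic only. It does not reproduce Slurm's scheduling logic. The helper names (parse_slurm_time, fits_before) and the submit time are hypothetical and chosen for illustration: the submit time is just a value that falls between the two srun attempts above. Only the reservation start and the observed cutoff come from the output shown here.

```python
from datetime import datetime, timedelta


def parse_slurm_time(spec: str) -> timedelta:
    """Parse a time limit in 'D-HH:MM:SS' form (the only form handled here)."""
    days, _, clock = spec.partition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)


def fits_before(submit: datetime, limit: str, deadline: datetime) -> bool:
    """True if a job started at `submit` with time limit `limit` ends by `deadline`."""
    return submit + parse_slurm_time(limit) <= deadline


# Values taken from the reservation and from the observed behaviour.
resv_start = datetime.fromisoformat("2013-08-05T18:00:00")
cutoff = datetime.fromisoformat("2013-07-29T00:00:00")

# Assumption: an illustrative submit time, not taken from any log.
submit = datetime.fromisoformat("2013-07-26T10:42:00")

# Distance between the observed cutoff and the real reservation start.
gap = resv_start - cutoff
print(gap)  # 7 days, 18:00:00

# The 2-13:15:00 job ends before the cutoff, the 2-13:20:00 job just after it,
# although both would end well before the reservation actually starts.
print(fits_before(submit, "2-13:15:00", cutoff))
print(fits_before(submit, "2-13:20:00", cutoff))
print(fits_before(submit, "2-13:20:00", resv_start))
```

In other words, the parallel partition appears to behave as if the reservation began 7 days and 18 hours earlier than it really does.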

PartitionName=serial
   AllocNodes=ALL AllowGroups=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 MaxCPUsPerNode=UNLIMITED
   Nodes=kaa-[1-17,26-33,45-48,50-84]
   Priority=1 RootOnly=NO ReqResv=NO Shared=FORCE:1 PreemptMode=GANG,SUSPEND
   State=UP TotalCPUs=492 TotalNodes=64 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=parallel
   AllocNodes=ALL AllowGroups=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 MaxCPUsPerNode=UNLIMITED
   Nodes=kaa-[95-106]
   Priority=1 RootOnly=NO ReqResv=NO Shared=YES:4 PreemptMode=GANG,SUSPEND
   State=UP TotalCPUs=768 TotalNodes=12 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
