hi

our cluster is set up with the configuration below, yet we have been having
a lot of jobs cancelled when preempted:

slurmd[node004]: *** JOB 79188 CANCELLED AT 2014-08-05T15:31:41 DUE TO PREEMPTION ***
i thought the settings would simply suspend the job instead of cancelling it.

cheers,

satra

Partial configuration
---------------------------

PreemptMode=GANG,SUSPEND

PreemptType=preempt/partition_prio

# default

SchedulerTimeSlice=30

DefMemPerCPU=2048

DefMemPerNode=2048

PartitionName=DEFAULT MaxTime=7-0 DefaultTime=24:00:00

# Partitions

PartitionName=defq Default=NO MinNodes=1 DefaultTime=1-00:00:00
MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO
RootOnly=NO Hidden=YES Shared=NO GraceTime=0 ReqResv=NO
PreemptMode=GANG,SUSPEND State=UP

PartitionName=om_all_nodes Default=YES MinNodes=1 DefaultTime=1-00:00:00
MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO
RootOnly=NO Hidden=NO Shared=FORCE:4 GraceTime=0 ReqResv=NO
PreemptMode=GANG,SUSPEND State=UP Nodes=node[001-030]

PartitionName=om_interactive Default=NO MinNodes=1 MaxNodes=1
DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:1 GraceTime=0
MaxCPUsPerNode=32 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP Nodes=node017
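
to make the partition settings easier to compare side by side, here is a minimal python sketch that tabulates the preemption-related fields. it is not part of slurm: the names `CONFIG`, `parse_partitions` and `summarize` are made up for illustration, and `CONFIG` is an abbreviated copy of the partition lines above with each definition joined onto one line (the wrapping above comes from the email).

```python
# Abbreviated copy of the partition definitions above, one per line.
# Only the fields relevant to preemption are kept (illustrative subset).
CONFIG = """\
# Partitions
PartitionName=defq Priority=1 Shared=NO PreemptMode=GANG,SUSPEND
PartitionName=om_all_nodes Priority=1 Shared=FORCE:4 PreemptMode=GANG,SUSPEND Nodes=node[001-030]
PartitionName=om_interactive Priority=10 Shared=FORCE:1 PreemptMode=GANG,SUSPEND Nodes=node017
"""


def parse_partitions(text):
    """Return {partition_name: {key: value}} for each PartitionName= line."""
    partitions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a partition line.
        if not line.startswith("PartitionName="):
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        name = fields.pop("PartitionName")
        partitions[name] = fields
    return partitions


def summarize(partitions):
    """Return (name, priority, shared, preempt_mode) tuples, highest priority first."""
    rows = [
        (name, int(f.get("Priority", 1)), f.get("Shared", ""), f.get("PreemptMode", ""))
        for name, f in partitions.items()
    ]
    return sorted(rows, key=lambda row: -row[1])


if __name__ == "__main__":
    for name, priority, shared, mode in summarize(parse_partitions(CONFIG)):
        print(f"{name:16s} Priority={priority:<3d} Shared={shared:8s} PreemptMode={mode}")
```

this only restates what is in the config: om_interactive has the higher priority and shares node017 with om_all_nodes, and all three partitions request GANG,SUSPEND.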
