"Jette, Moe" <[email protected]> writes:

> My tests of this show try_sched() completing in a few milliseconds
> and I don't see how the existence of a constraint would measurably
> impact performance.

The only thing I can see from the code (though I don't know whether it
makes any difference in practice) is that if the job has constraints with
counts, _try_sched() runs job_req_node_filter(job_ptr, *avail_bitmap) to
reduce the set of nodes to test, but it does not if the job only has
constraints without counts.
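To make the distinction concrete, here is a schematic sketch in plain Python (not SLURM's actual C code; the function names mirror the 2.2 routines, but the data structures and logic are invented for illustration) of the control flow described above:

```python
# Schematic sketch of _try_sched()'s pre-filter step: the node filter only
# runs when the job's feature constraints carry counts (e.g. "rack1*2");
# plain constraints ("rack1") leave the availability bitmap untouched.
# All names and structures here are illustrative, not SLURM's real API.

def job_req_node_filter(required_features, node_features, avail):
    """Drop available nodes that satisfy none of the requested features."""
    wanted = set(required_features)
    return [ok and bool(wanted & node_features[i])
            for i, ok in enumerate(avail)]

def try_sched(job, node_features, avail):
    # Feature counts force a pre-filter pass over the bitmap;
    # count-less constraints skip it and test the full node set.
    if job.get("feature_counts"):
        avail = job_req_node_filter(job["features"], node_features, avail)
    return avail

nodes = [{"rack1"}, {"rack2"}, {"rack1"}]
job = {"features": ["rack1"], "feature_counts": {"rack1": 2}}
print(try_sched(job, nodes, [True, True, True]))  # -> [True, False, True]
```

In this toy version the extra work for counted constraints is a single pass over the node list, which matches the observation that the filter alone should not measurably affect performance.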

I'm sorry for the lack of details.

> What version of SLURM are you using?

2.2.1

> What is your configuration?

Rough overview: ~ 650 nodes, ~ 4400 cpus, priority/multifactor, sched/backfill,
select/cons_res with CR_CPU_Memory, PreemptMode requeue, preempt/qos.

Details: see attached slurm.conf

> Do you have job preemption configured and if so, how?

Yes, preemption by requeueing, PreemptType=preempt/qos.  Basically, all
QoS'es can preempt jobs in the lowpri QoS.

> How many active and queued jobs are there?

At the time, about 1000 running jobs, and about 1000 queued jobs.

The problem is most likely related to the load on the cluster, so it is
hard to reproduce on our test cluster.  Is there some debug/logging
output that would help us figure out what is happening?
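For context, these are the logging knobs already available in slurm.conf (a hedged fragment; whether DebugFlags=Backfill is available should be checked against the 2.2.1 slurm.conf man page for your build):

```
# Candidate settings for tracing scheduler decisions:
SlurmctldDebug=6          # raise to 7 for debug2-level detail
SlurmSchedLogLevel=1      # scheduler events go to SlurmSchedLogFile
#DebugFlags=Backfill      # backfill-specific logging, if supported
```

The slurmctld level can also be raised at runtime without a restart via `scontrol setdebug`.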

## slurm.conf: main configuration file for SLURM
## $Id: slurm.conf,v 1.25 2011/02/12 16:59:04 root Exp root $


###
### Cluster
###

ClusterName=titan
#default: AuthType=auth/munge
#default: CryptoType=crypto/munge
SlurmctldPort=6817
SlurmdPort=6818
TmpFs=/work
#default: TreeWidth=50  Use ceil(sqrt(#nodes))
TreeWidth=5

## Timers:
#default: MessageTimeout=10
SlurmdTimeout=36000
WaitTime=0


###
### Slurmctld
###

ControlMachine=teflon
#default: MinJobAge=300
SlurmUser=slurm
StateSaveLocation=/state/partition1/slurm/slurmstate


###
### Nodes
###

FastSchedule=2
HealthCheckInterval=60
HealthCheckProgram=/sbin/healthcheck
ReturnToService=1
Nodename=DEFAULT CoresPerSocket=2 Sockets=2 RealMemory=3949 State=unknown TmpDisk=10000 Weight=2027
PartitionName=DEFAULT MaxTime=Infinite State=up Shared=NO
Include /etc/slurm/slurmnodes.conf


###
### Jobs
###

PropagateResourceLimits=NONE
DefMemPerCPU=500
EnforcePartLimits=yes
#default: InactiveLimit=0
JobFileAppend=1
#default: JobRequeue=1
JobSubmitPlugins=lua
#default: MaxJobCount=10000
#default: MpiDefault=none #FIXME: openmpi?
#default: OverTimeLimit=0
VSizeFactor=150

## Prologs/Epilogs
# run by slurmctld as SlurmUser on ControlMachine before granting a job allocation:
#PrologSlurmctld=
# run by slurmd on each node prior to the first job step on the node:
Prolog=/site/sbin/slurmprolog
# run by srun on the node running srun, prior to the launch of a job step:
#SrunProlog=
# run as user for each task prior to initiating the task:
TaskProlog=/site/sbin/taskprolog
# run as user for each task after the task finishes:
#TaskEpilog=
# run by srun on the node running srun, after a job step finishes:
#SrunEpilog=
# run as root on each node when job has completed
Epilog=/site/sbin/slurmepilog
# run as SlurmUser on ControlMachine after the allocation is released:
#EpilogSlurmctld=


###
### Job Priority
###

PriorityType=priority/multifactor
#default: PriorityCalcPeriod=5
#default: PriorityDecayHalfLife=7-0 #(7 days)
#default: PriorityUsageResetPeriod=NONE
#default: PriorityMaxAge=7-0 #(7 days)
#default: PriorityFavorSmall=no
PriorityWeightAge=10000
#default: PriorityWeightFairshare=0
PriorityWeightJobSize=1000
#default: PriorityWeightPartition=0
PriorityWeightQOS=10000
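# As a rough aid for reading the weights above, priority/multifactor sums
# weight * factor over the configured factors, each factor normalized to
# [0.0, 1.0] by slurmctld.  A hedged back-of-envelope Python sketch (the
# factor values below are invented, not taken from any real job):

```python
# Weights as configured in this slurm.conf; unset weights default to 0.
WEIGHTS = {"age": 10000, "fairshare": 0, "jobsize": 1000,
           "partition": 0, "qos": 10000}

def job_priority(factors):
    """Sum weight * factor; each factor is assumed pre-normalized to [0, 1]."""
    return sum(int(WEIGHTS[name] * factors.get(name, 0.0))
               for name in WEIGHTS)

# A job halfway to PriorityMaxAge, small in size, in a mid-tier QOS:
print(job_priority({"age": 0.5, "jobsize": 0.1, "qos": 0.3}))  # -> 8100
```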


###
### Scheduling
###

SchedulerType=sched/backfill
#default: 
SchedulerParameters=default_queue_depth=100,defer=?,bf_interval=30,bf_window=1440,max_job_bf=50
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
PreemptMode=requeue
#PreemptMode=checkpoint         # FIXME: cancels if checkpoint is not possible!
PreemptType=preempt/qos
CompleteWait=32                 # KillWait + 2
#default: KillWait=30


###
### Checkpointing
###

# ************** WARNING ***********************
# *** ENABLING/DISABLING THIS KILLS ALL JOBS ***
# **********************************************
CheckpointType=checkpoint/blcr
JobCheckpointDir=/state/partition1/slurm/checkpoint


###
### Logging
###

SlurmctldDebug=6
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmSchedLogLevel=1
SlurmSchedLogFile=/var/log/slurm/sched.log
SlurmdDebug=5
SlurmdLogFile=/var/log/slurm/slurmd.log
#default: DebugFlags=


###
### Accounting (Slurmdbd)
###

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=blaster
JobAcctGatherType=jobacct_gather/linux
#default: JobAcctGatherFrequency=30
ProctrackType=proctrack/linuxproc # FIXME: check out cgroup
AccountingStorageEnforce=limits,qos
# combination of associations < limits < wckeys, qos
-- 
Cheers,
B/H
