On Wed, 16 Feb 2011 06:22:29 -0800, Bjørn-Helge Mevik <[email protected]> 
wrote:

> > How many active and queued jobs are there?
> 
> At the time, about 1000 running jobs, and about 1000 queued jobs.
> 
> The problem is most likely related to the load of the cluster, so it is
> hard to investigate this on our test cluster.  Is there some
> debug/logging output that would help us figure out what happens?

I have found statistical profilers like oprofile very useful in the
past for this kind of debugging. Assuming slurmctld is actually busy
on the CPU when scheduling takes a long time (and not blocked waiting
or sleeping for some reason), oprofile might shed some light.

Quickstart:

 # Start profiling
 opcontrol --separate=all --start --vmlinux=/boot/vmlinux

 # Exercise slow scheduling issue ... 

 # Stop profiling
 opcontrol --stop

 # Get samples from slurmctld
 opreport --symbols | grep slurmctld
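
If oprofile turns up almost no samples, slurmctld may be blocked
rather than on-CPU, and a stack dump is more telling. A minimal
sketch (gstack is the small gdb wrapper from the gdb package; the
pidof call assumes a single slurmctld process):

 # Is slurmctld running (R) or sleeping (S/D)?
 ps -o pid,state,pcpu,comm -C slurmctld

 # Dump the stack of every thread to see where it is blocked
 gstack $(pidof slurmctld)

Taking a few dumps a couple of seconds apart and comparing them is
often enough to spot where the time goes.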


Depending on which distro you are using, there may be other tools
that could give us hints about what slurmctld is doing when it hits
this issue (perf, systemtap, etc.).
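
For instance, on a reasonably recent kernel, perf can collect much
the same profile without the opcontrol setup. A rough sketch,
assuming perf is installed and slurmctld runs as a single process:

 # Sample slurmctld with call graphs for 30s while reproducing the issue
 perf record -g -p $(pidof slurmctld) -- sleep 30

 # Show the hottest functions
 perf report --sort symbol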

mark
 
> ## slurm.conf: main configuration file for SLURM
> ## $Id: slurm.conf,v 1.25 2011/02/12 16:59:04 root Exp root $
> 
> 
> ###
> ### Cluster
> ###
> 
> ClusterName=titan
> #default: AuthType=auth/munge
> #default: CryptoType=crypto/munge
> SlurmctldPort=6817
> SlurmdPort=6818
> TmpFs=/work
> #default: TreeWidth=50  Use ceil(sqrt(#nodes))
> TreeWidth=5
> 
> ## Timers:
> #default: MessageTimeout=10
> SlurmdTimeout=36000
> WaitTime=0
> 
> 
> ###
> ### Slurmctld
> ###
> 
> ControlMachine=teflon
> #default: MinJobAge=300
> SlurmUser=slurm
> StateSaveLocation=/state/partition1/slurm/slurmstate
> 
> 
> ###
> ### Nodes
> ###
> 
> FastSchedule=2
> HealthCheckInterval=60
> HealthCheckProgram=/sbin/healthcheck
> ReturnToService=1
> Nodename=DEFAULT CoresPerSocket=2 Sockets=2 RealMemory=3949 State=unknown TmpDisk=10000 Weight=2027
> PartitionName=DEFAULT MaxTime=Infinite State=up Shared=NO
> Include /etc/slurm/slurmnodes.conf
> 
> 
> ###
> ### Jobs
> ###
> 
> PropagateResourceLimits=NONE
> DefMemPerCPU=500
> EnforcePartLimits=yes
> #default: InactiveLimit=0
> JobFileAppend=1
> #default: JobRequeue=1
> JobSubmitPlugins=lua
> #default: MaxJobCount=10000
> #default: MpiDefault=none #FIXME: openmpi?
> #default: OverTimeLimit=0
> VSizeFactor=150
> 
> ## Prologs/Epilogs
> # run by slurmctld as SlurmUser on ControlMachine before granting a job allocation:
> #PrologSlurmctld=
> # run by slurmd on each node prior to the first job step on the node:
> Prolog=/site/sbin/slurmprolog
> # run by srun on the node running srun, prior to the launch of a job step:
> #SrunProlog=
> # run as user for each task prior to initiate the task:
> TaskProlog=/site/sbin/taskprolog
> # run as user for each task after the task finishes:
> #TaskEpilog=
> # run by srun on the node running srun, after a job step finishes:
> #SrunEpilog=
> # run as root on each node when job has completed
> Epilog=/site/sbin/slurmepilog
> # run as SlurmUser on ControlMachine after the allocation is released:
> #EpilogSlurmctld=
> 
> 
> ###
> ### Job Priority
> ###
> 
> PriorityType=priority/multifactor
> #default: PriorityCalcPeriod=5
> #default: PriorityDecayHalfLife=7-0 #(7 days)
> #default: PriorityUsageResetPeriod=NONE
> #default: PriorityMaxAge=7-0 #(7 days)
> #default: PriorityFavorSmall=no
> PriorityWeightAge=10000
> #default: PriorityWeightFairshare=0
> PriorityWeightJobSize=1000
> #default: PriorityWeightPartition=0
> PriorityWeightQOS=10000
> 
> 
> ###
> ### Scheduling
> ###
> 
> SchedulerType=sched/backfill
> #default: 
> SchedulerParameters=default_queue_depth=100,defer=?,bf_interval=30,bf_window=1440,max_job_bf=50
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
> PreemptMode=requeue
> #PreemptMode=checkpoint               # FIXME: cancels if checkpoint is not possible!
> PreemptType=preempt/qos
> CompleteWait=32                       # KillWait + 2
> #default: KillWait=30
> 
> 
> ###
> ### Checkpointing
> ###
> 
> # ************** WARNING ***********************
> # *** ENABLING/DISABLING THIS KILLS ALL JOBS ***
> # **********************************************
> CheckpointType=checkpoint/blcr
> JobCheckpointDir=/state/partition1/slurm/checkpoint
> 
> 
> ###
> ### Logging
> ###
> 
> SlurmctldDebug=6
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
> SlurmSchedLogLevel=1
> SlurmSchedLogFile=/var/log/slurm/sched.log
> SlurmdDebug=5
> SlurmdLogFile=/var/log/slurm/slurmd.log
> #default: DebugFlags=
> 
> 
> ###
> ### Accounting (Slurmdbd)
> ###
> 
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageHost=blaster
> JobAcctGatherType=jobacct_gather/linux
> #default: JobAcctGatherFrequency=30
> ProctrackType=proctrack/linuxproc # FIXME: check out cgroup
> AccountingStorageEnforce=limits,qos
> # combination of associations < limits < wckeys, qos
> 
> -- 
> Cheers,
> B/H
