Hi all,

I am trying to set up a power-saving configuration with SLURM 2.2.7 on Ubuntu 10.04, but shut-down nodes are not woken up when their resources are requested. The problem seems to be related to the scheduler, since the nodes can be suspended and resumed manually with the commands:
scontrol update NodeName=jff232 state=POWER_DOWN

scontrol show node jff232
   NodeName=jff232 Arch=x86_64 CoresPerSocket=4 CPUAlloc=0 CPUErr=0 CPUTot=8
   Features=(null) Gres=(null) OS=Linux RealMemory=16000 Sockets=2
   State=IDLE*+POWER ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2011-09-27T23:00:52 SlurmdStartTime=2011-09-27T23:01:25
   Reason=(null)

and

scontrol update NodeName=jff232 state=POWER_UP

In fact, SLURM does shut down the nodes after the configured inactivity period (see the attached slurm.conf file), but when tasks are then requested:

sbatch -n 8 test-slurm.sh

test-slurm.sh:
---------------------------
#!/bin/bash
mpirun mpi_calcs 1000 3 1 1
---------------------------

I can see in /var/log/slurm-llnl/slurmctld.log:

debug2: initial priority for job 67 is 10244
debug2: found 1 usable nodes from config containing jff232
debug3: _pick_best_nodes: job 67 idle_nodes 1 share_nodes 1
cons_res: select_p_job_test: job 67 node_req 1 mode 1
cons_res: select_p_job_test: min_n 1 max_n 500000 req_n 1 avail_n 0
node:jff232 cpus:8 c:4 s:2 t:1 mem:16000 a_mem:0 state:0
part:global rows:1 pri:1
row0: num_jobs 0: bitmap:
cons_res: cr_job_test: evaluating job 67 on 0 nodes
cons_res: cr_job_test: test 0 fail: insufficient resources
no job_resources info for job 67
cons_res: select_p_job_test: job 67 node_req 1 mode 1
cons_res: select_p_job_test: min_n 1 max_n 500000 req_n 1 avail_n 1
node:jff232 cpus:8 c:4 s:2 t:1 mem:16000 a_mem:0 state:0
part:global rows:1 pri:1
row0: num_jobs 0: bitmap:
cons_res: cr_job_test: evaluating job 67 on 1 nodes
cons_res: eval_nodes:0 consec c=8 n=1 b=0 e=0 r=-1
cons_res: cr_job_test: test 0 pass: test_only
no job_resources info for job 67
debug3: JobId=67 not runnable with present config
_slurm_rpc_submit_batch_job JobId=67 usec=1074
debug: sched: Running job scheduler
debug3: sched: JobId=67. State=PENDING. Reason=Resources. Priority=1. Partition=global.
But if I then restart slurmctld:

debug2: found 1 usable nodes from config containing jff232
debug3: _pick_best_nodes: job 67 idle_nodes 1 share_nodes 1
cons_res: select_p_job_test: job 67 node_req 1 mode 0
cons_res: select_p_job_test: min_n 1 max_n 500000 req_n 1 avail_n 1
node:jff232 cpus:8 c:4 s:2 t:1 mem:16000 a_mem:0 state:0
part:global rows:1 pri:1
cons_res: cr_job_test: evaluating job 67 on 1 nodes
cons_res: eval_nodes:0 consec c=8 n=1 b=0 e=0 r=-1
cons_res: cr_job_test: test 0 pass - job fits on given resources
cons_res: eval_nodes:0 consec c=8 n=1 b=0 e=0 r=-1
cons_res: cr_job_test: test 1 pass - idle resources found
cons_res: cr_job_test: distributing job 67
cons_res: cr_job_test: job 67 ncpus 8 cbits 8/8 nbits 1
debug3: dist_task: best_fit : using node[0]:socket[1] : 4 cores available
debug3: dist_task: best_fit : using node[0]:socket[0] : 4 cores available
====================
job_id:67 nhosts:1 ncpus:8 node_req:1 nodes=jff232
Node[0]: Mem(MB):0:0 Sockets:2 Cores:4 CPUs:8:0
Socket[0] Core[0] is allocated
Socket[0] Core[1] is allocated
Socket[0] Core[2] is allocated
Socket[0] Core[3] is allocated
Socket[1] Core[0] is allocated
Socket[1] Core[1] is allocated
Socket[1] Core[2] is allocated
Socket[1] Core[3] is allocated
--------------------
cpu_array_value[0]:8 reps:1
====================
debug3: cons_res: _add_job_to_res: job 67 act 0
DEBUG: Dump job_resources: nhosts 1 cb 0-7
debug3: cons_res: adding job 67 to part global row 0
DEBUG: _add_job_to_res (after): part:global rows:1 pri:1
row0: num_jobs 1: bitmap: 0-7
debug3: sched: JobId=67 initiated
sched: Allocate JobId=67 NodeList=jff232 #CPUs=8

After that, the node is recovered and the job runs:

power_save: waking nodes jff232

I'd be grateful if anyone could give me an idea of why the resources are not seen unless I restart slurmctld each time a job is submitted.
Thanks in advance

--
Ramiro Alba
Centre Tecnològic de Transferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46
# --------------------------------------------------------
# MISCELLANEOUS
# --------------------------------------------------------
ControlMachine=jffmds
BackupController=jff222
AuthType=auth/munge
CacheGroups=0
CheckpointType=checkpoint/none
JobCheckpointDir=/var/lib/slurm-llnl/checkpoint
CryptoType=crypto/munge
DisableRootJobs=NO
EnforcePartLimits=YES
GroupUpdateForce=0
GroupUpdateTime=600
JobFileAppend=0
JobRequeue=0
KillOnBadExit=1
MailProg=/usr/bin/mail
MaxJobCount=500
MaxTasksPerNode=128
MpiDefault=none
PrivateData=accounts,usage
ProctrackType=proctrack/linuxproc
PrologSlurmctld=/usr/local/lib/slurm/prologSlurmctld
PropagatePrioProcess=0
PropagateResourceLimits=ALL
PropagateResourceLimitsExcept=MEMLOCK
ReturnToService=1
SallocDefaultCommand=/bin/bash
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
TaskPluginParam=Sched
TopologyPlugin=topology/none
TmpFs=/tmp
UsePAM=1
# --------------------------------------------------------
# TIMERS
# --------------------------------------------------------
BatchStartTimeout=10
CompleteWait=0
EpilogMsgTime=2000
HealthCheckInterval=300
HealthCheckProgram=/usr/local/sbin/check-nodehealth
InactiveLimit=0
KillWait=30
MessageTimeout=10
ResvOverRun=0
MinJobAge=300
OverTimeLimit=3
SlurmctldTimeout=120
SlurmdTimeout=300
VSizeFactor=101
Waittime=0
# --------------------------------------------------------
# SCHEDULING
# --------------------------------------------------------
DefMemPerCPU=0
MaxMemPerCPU=0
SchedulerPort=7321
SchedulerType=sched/backfill
FastSchedule=1
SchedulerRootFilter=1
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_DEFAULT_DIST_BLOCK
# --------------------------------------------------------
# JOB PRIORITY
# --------------------------------------------------------
PriorityType=priority/multifactor
PriorityDecayHalfLife=30-0
PriorityCalcPeriod=5
PriorityFavorSmall=NO
PriorityMaxAge=7-0
PriorityUsageResetPeriod=NONE
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=0
PriorityWeightQOS=0
# --------------------------------------------------------
# LOGGING AND ACCOUNTING
# --------------------------------------------------------
DebugFlags=Backfill,CPU_Bind,Gres,Priority,Reservation,SelectType,Steps,Triggers
SlurmctldDebug=9
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmSchedLogLevel=1
SlurmSchedLogFile=/var/log/slurm-llnl/slurmsched.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
ClusterName=jff
AccountingStorageEnforce=limits
AccountingStorageHost=jffmds
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
# --------------------------------------------------------
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
# --------------------------------------------------------
SuspendProgram=/usr/local/lib/slurm/suspendProgram
ResumeProgram=/usr/local/lib/slurm/resumeProgram
SuspendTimeout=60
ResumeTimeout=180
ResumeRate=60
#SuspendExcNodes=
SuspendRate=60
SuspendTime=3600
# --------------------------------------------------------
# COMPUTE NODES
# --------------------------------------------------------
NodeName=jff232 RealMemory=16000 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=global Nodes=jff232 Default=YES MaxTime=INFINITE State=UP
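For context, the config above wires power saving to /usr/local/lib/slurm/suspendProgram and /usr/local/lib/slurm/resumeProgram, but those site scripts are not shown. Below is a minimal sketch of what such a ResumeProgram typically looks like (the SuspendProgram is symmetric). This is an assumption, not the original script: Slurm invokes the program with a hostlist expression as its first argument, the actual wake mechanism (wake-on-LAN, IPMI, ...) is site specific and is only echoed here as a placeholder, and `scontrol` is stubbed when absent so the sketch can run outside the cluster.

```shell
#!/bin/bash
# Hypothetical sketch of the ResumeProgram referenced in slurm.conf.
# Slurm calls it as: resumeProgram <hostlist>  (e.g. "jff232" or "jff[230-232]")

# 'scontrol show hostnames' expands bracketed hostlist expressions.
# Stub it when scontrol is unavailable so the sketch runs anywhere
# (the stub only handles plain single-node names).
command -v scontrol >/dev/null 2>&1 || scontrol() { echo "$3"; }

resume_nodes() {
    for host in $(scontrol show hostnames "$1"); do
        # Replace this echo with the real site wake mechanism,
        # e.g. wake-on-LAN or an IPMI power-on command (assumption).
        echo "waking $host"
    done
}

resume_nodes "${1:-jff232}"   # prints "waking jff232" when run without args
```

A SuspendProgram would use the same hostlist loop, issuing a shutdown (e.g. over ssh) instead of a wake command.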