A user submitted a job for which half of the allocated nodes were
sleeping in power-save mode. Slurm woke the nodes and the job ran, but
the job state has not changed from CONFIGURING.
Slurm version: 2.6.5
[2014-01-31T07:54:03.848] sched: update_job: releasing user hold for job_id 13186
[2014-01-31T07:54:03.848] _slurm_rpc_update_job complete JobId=13186 uid=1345 usec=150
[2014-01-31T07:54:03.849] debug: sched: Running job scheduler
[2014-01-31T07:54:03.880] sched: Allocate JobId=13186 NodeList=delta[29-56] #CPUs=112
[2014-01-31T07:54:04.712] power_save: waking nodes delta[29-42]
[2014-01-31T07:54:11.000] debug: backfill: beginning
[2014-01-31T07:54:11.000] debug: backfill: no jobs to backfill
[2014-01-31T07:54:58.015] debug: sched: Running job scheduler
[2014-01-31T07:55:45.025] debug: Spawning ping agent for delta[29-42]
[2014-01-31T07:55:51.026] error: Nodes delta[29-42] not responding
[2014-01-31T07:55:58.027] debug: sched: Running job scheduler
[2014-01-31T07:56:25.950] Node delta37 rebooted 58 secs ago
[2014-01-31T07:56:26.244] Node delta31 rebooted 58 secs ago
[2014-01-31T07:56:26.284] Node delta35 rebooted 58 secs ago
[2014-01-31T07:56:27.134] Node delta36 rebooted 58 secs ago
[2014-01-31T07:56:27.811] Node delta30 rebooted 63 secs ago
[2014-01-31T07:56:28.222] Node delta34 rebooted 64 secs ago
[2014-01-31T07:56:28.313] Node delta38 rebooted 60 secs ago
[2014-01-31T07:56:28.609] Node delta40 rebooted 63 secs ago
[2014-01-31T07:56:29.175] Node delta33 rebooted 58 secs ago
[2014-01-31T07:56:29.196] Node delta41 rebooted 60 secs ago
[2014-01-31T07:56:29.287] Node delta42 rebooted 63 secs ago
[2014-01-31T07:56:30.834] Node delta32 rebooted 63 secs ago
[2014-01-31T07:56:33.651] Node delta29 rebooted 63 secs ago
[2014-01-31T07:56:33.772] Node delta39 rebooted 64 secs ago
[2014-01-31T07:57:35.000] debug: backfill: beginning
[2014-01-31T07:57:35.000] debug: backfill: no jobs to backfill
[2014-01-31T07:57:58.179] debug: sched: Running job scheduler
[2014-01-31T07:58:58.186] debug: sched: Running job scheduler
[2014-01-31T07:59:05.191] debug: Spawning ping agent for delta[29-42]
# squeue -l
Fri Jan 31 08:11:35 2014
JOBID PARTITION     NAME   USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
13186      d2d3 mz626_il nobody CONFIGUR 17:32 UNLIMITED    28 delta[29-56]
[2014-01-31T08:15:22.499] _slurm_rpc_update_job complete JobId=13186 uid=1348 usec=140
Fri Jan 31 08:19:23 2014
JOBID PARTITION     NAME   USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
13186      d2d3 mz626_il nobody CONFIGUR 25:20 UNLIMITED    28 delta[29-56]
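
To cross-check the log above, the following is a minimal Python sketch
that compares the nodes named in the "power_save: waking nodes" message
with the nodes that later logged "rebooted N secs ago". It is not part
of Slurm: the helper names expand_hostlist and unreported_nodes are
hypothetical, and the hostlist parser only handles the simple
prefix[a-b,c] form seen here, not the full Slurm hostlist syntax.

```python
import re

WAKE_RE = re.compile(r"power_save: waking nodes (\S+)")
REBOOT_RE = re.compile(r"Node (\S+) rebooted (\d+) secs ago")


def expand_hostlist(expr):
    """Expand 'delta[29-31,40]' into individual node names.

    Assumption: a single bracket group with numeric ranges only.
    """
    m = re.fullmatch(r"([^\[\]]+)\[([\d,\-]+)\]", expr)
    if not m:
        return [expr]  # plain node name, nothing to expand
    prefix, body = m.groups()
    nodes = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo
        width = len(lo)  # preserve zero padding, if any
        for i in range(int(lo), int(hi) + 1):
            nodes.append(f"{prefix}{i:0{width}d}")
    return nodes


def unreported_nodes(log_lines):
    """Return woken nodes that never logged a reboot message."""
    woken, rebooted = set(), set()
    for line in log_lines:
        m = WAKE_RE.search(line)
        if m:
            woken.update(expand_hostlist(m.group(1)))
            continue
        m = REBOOT_RE.search(line)
        if m:
            rebooted.add(m.group(1))
    return sorted(woken - rebooted)


# Wake and reboot messages copied from the slurmctld log above.
LOG = """\
[2014-01-31T07:54:04.712] power_save: waking nodes delta[29-42]
[2014-01-31T07:56:25.950] Node delta37 rebooted 58 secs ago
[2014-01-31T07:56:26.244] Node delta31 rebooted 58 secs ago
[2014-01-31T07:56:26.284] Node delta35 rebooted 58 secs ago
[2014-01-31T07:56:27.134] Node delta36 rebooted 58 secs ago
[2014-01-31T07:56:27.811] Node delta30 rebooted 63 secs ago
[2014-01-31T07:56:28.222] Node delta34 rebooted 64 secs ago
[2014-01-31T07:56:28.313] Node delta38 rebooted 60 secs ago
[2014-01-31T07:56:28.609] Node delta40 rebooted 63 secs ago
[2014-01-31T07:56:29.175] Node delta33 rebooted 58 secs ago
[2014-01-31T07:56:29.196] Node delta41 rebooted 60 secs ago
[2014-01-31T07:56:29.287] Node delta42 rebooted 63 secs ago
[2014-01-31T07:56:30.834] Node delta32 rebooted 63 secs ago
[2014-01-31T07:56:33.651] Node delta29 rebooted 63 secs ago
[2014-01-31T07:56:33.772] Node delta39 rebooted 64 secs ago
"""

missing = unreported_nodes(LOG.splitlines())
print("woken nodes with no reboot message:", missing)
```

For the excerpt above the list is empty: all 14 nodes in delta[29-42]
logged a reboot message by 07:56:33, yet the job was still shown as
CONFIGURING at 08:19:23.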