A user submitted a job for which half the nodes were sleeping. Slurm woke
the nodes and the job ran, but its state has not changed from CONFIGURING.

Slurm version 2.6.5

[2014-01-31T07:54:03.848] sched: update_job: releasing user hold for job_id 13186
[2014-01-31T07:54:03.848] _slurm_rpc_update_job complete JobId=13186 uid=1345 usec=150
[2014-01-31T07:54:03.849] debug:  sched: Running job scheduler
[2014-01-31T07:54:03.880] sched: Allocate JobId=13186 NodeList=delta[29-56] #CPUs=112
[2014-01-31T07:54:04.712] power_save: waking nodes delta[29-42]
[2014-01-31T07:54:11.000] debug:  backfill: beginning
[2014-01-31T07:54:11.000] debug:  backfill: no jobs to backfill
[2014-01-31T07:54:58.015] debug:  sched: Running job scheduler
[2014-01-31T07:55:45.025] debug:  Spawning ping agent for delta[29-42]
[2014-01-31T07:55:51.026] error: Nodes delta[29-42] not responding
[2014-01-31T07:55:58.027] debug:  sched: Running job scheduler
[2014-01-31T07:56:25.950] Node delta37 rebooted 58 secs ago
[2014-01-31T07:56:26.244] Node delta31 rebooted 58 secs ago
[2014-01-31T07:56:26.284] Node delta35 rebooted 58 secs ago
[2014-01-31T07:56:27.134] Node delta36 rebooted 58 secs ago
[2014-01-31T07:56:27.811] Node delta30 rebooted 63 secs ago
[2014-01-31T07:56:28.222] Node delta34 rebooted 64 secs ago
[2014-01-31T07:56:28.313] Node delta38 rebooted 60 secs ago
[2014-01-31T07:56:28.609] Node delta40 rebooted 63 secs ago
[2014-01-31T07:56:29.175] Node delta33 rebooted 58 secs ago
[2014-01-31T07:56:29.196] Node delta41 rebooted 60 secs ago
[2014-01-31T07:56:29.287] Node delta42 rebooted 63 secs ago
[2014-01-31T07:56:30.834] Node delta32 rebooted 63 secs ago
[2014-01-31T07:56:33.651] Node delta29 rebooted 63 secs ago
[2014-01-31T07:56:33.772] Node delta39 rebooted 64 secs ago
[2014-01-31T07:57:35.000] debug:  backfill: beginning
[2014-01-31T07:57:35.000] debug:  backfill: no jobs to backfill
[2014-01-31T07:57:58.179] debug:  sched: Running job scheduler
[2014-01-31T07:58:58.186] debug:  sched: Running job scheduler
[2014-01-31T07:59:05.191] debug:  Spawning ping agent for delta[29-42]


# squeue -l 
Fri Jan 31 08:11:35 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
             13186      d2d3 mz626_il   nobody CONFIGUR      17:32 UNLIMITED     28 delta[29-56]

[2014-01-31T08:15:22.499] _slurm_rpc_update_job complete JobId=13186 uid=1348 usec=140

Fri Jan 31 08:19:23 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
             13186      d2d3 mz626_il   nobody CONFIGUR      25:20 UNLIMITED     28 delta[29-56]
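
In case it helps anyone watch for the same symptom, here is a minimal Python
sketch that scans `squeue -l` text for jobs that have sat in the CONFIGURING
state longer than some threshold. The function names and the 600-second
default are my own assumptions, not anything Slurm provides. It also assumes
the default long-format column order shown above (STATE in column 5, TIME in
column 6), purely numeric job IDs, and job names without spaces.

```python
def parse_elapsed(text):
    """Convert a squeue TIME value ([days-][hours:]minutes:seconds) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stuck_configuring(squeue_text, threshold_seconds=600):
    """Return job IDs shown as CONFIGUR(ING) for at least threshold_seconds.

    Hypothetical helper: it only parses text captured from `squeue -l`;
    it does not invoke squeue itself.
    """
    stuck = []
    for line in squeue_text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID and have at least six columns;
        # this skips the date line, the header, and wrapped continuation lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        job_id, state, elapsed = fields[0], fields[4], fields[5]
        if state.startswith("CONFIGUR") and parse_elapsed(elapsed) >= threshold_seconds:
            stuck.append(job_id)
    return stuck
```

Fed the squeue output above, this would flag job 13186, since 17:32 is well
past the assumed ten-minute threshold.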

