Hi

I don't understand why the node is reported as busy after the job is launched.

Slurm version: 2.6.0-0pre1

grep 86742 /slurmdb/log/Slurmctld.log
[2013-06-03T09:45:46+03:00] _slurm_rpc_submit_batch_job JobId=86742 usec=589
[2013-06-03T09:46:12+03:00] backfill: Started JobId=86742 on c196
[2013-06-03T09:46:14+03:00] _slurm_rpc_job_step_create for job 86742: Requested nodes are busy
[2013-06-03T09:47:14+03:00] _slurm_rpc_job_step_create for job 86742: Requested nodes are busy

c196:

scontrol show node c196
NodeName=c196 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=16 CPUErr=0 CPUTot=16 CPULoad=4.40 Features=(null)
   Gres=(null)
   NodeAddr=c196 NodeHostName=c196
   OS=Linux RealMemory=64000 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=1800000 Weight=10
   BootTime=2013-05-16T17:45:13 SlurmdStartTime=2013-05-16T17:46:48
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0


job error file:
srun: mem < mem-per-cpu - resizing mem to be equal to mem-per-cpu
srun: Job step creation temporarily disabled, retrying

c196 Slurmd.log

[2013-06-03T09:46:12+03:00] Launching batch job 86742 for UID 18991
[2013-06-03T09:46:12+03:00] Job accounting gather LINUX plugin loaded
[2013-06-03T09:46:12+03:00] switch NONE plugin loaded
[2013-06-03T09:46:12+03:00] Received cpu frequency information for 16 cpus
[2013-06-03T09:46:12+03:00] [86742] task/cgroup: loaded
[2013-06-03T09:46:12+03:00] [86742] Checkpoint plugin loaded: checkpoint/none
[2013-06-03T09:46:12+03:00] [86742] debug level = 2
[2013-06-03T09:46:12+03:00] [86742] task 0 (18044) started 2013-06-03T09:46:12+03:00
[2013-06-03T09:46:12+03:00] [86742] AcctGatherEnergy NONE plugin loaded
[2013-06-03T09:46:42+03:00] Launching batch job 86743 for UID 18991
[2013-06-03T09:46:42+03:00] Job accounting gather LINUX plugin loaded
[2013-06-03T09:46:42+03:00] switch NONE plugin loaded
[2013-06-03T09:46:42+03:00] Received cpu frequency information for 16 cpus
[2013-06-03T09:46:42+03:00] [86743] task/cgroup: loaded
[2013-06-03T09:46:42+03:00] [86743] Checkpoint plugin loaded: checkpoint/none
[2013-06-03T09:46:42+03:00] [86743] debug level = 2
[2013-06-03T09:46:42+03:00] [86743] task 0 (18462) started 2013-06-03T09:46:42+03:00
[2013-06-03T09:46:42+03:00] [86743] AcctGatherEnergy NONE plugin loaded
[2013-06-03T10:02:17+03:00] [86743] auth plugin for Munge (http://code.google.com/p/munge/) loaded
[2013-06-03T10:02:17+03:00] [86742] auth plugin for Munge (http://code.google.com/p/munge/) loaded


Processes for job 86742 on c196:
18041 ?        Sl     0:00 slurmstepd: [86742] 
18044 ?        S      0:00 /bin/bash /slurmdb/tmp/slurmd/job86742/slurm_script
18450 ?        S      0:00 srun blastn -num_alignments 5 -num_threads 6 -query /wrk/user/pb_20506_tmpdir/pb_chunk_00005.fasta -db nt -out /wrk/user/pb_20506_tmpdir/pb_chunk_00005.fasta.result


Batch script options:
#SBATCH -t 24:00:00
#SBATCH -n 6
#SBATCH -p parallel
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task=6
#SBATCH --mem 16000

Best Regards,
Tommi
