Some of my dependent jobs are going into BatchHold and their dependencies are
missing. Here I submitted three echo sleep jobs and then a fourth job that
depends on those three. The first three completed, but the dependent job is
stuck in BatchHold. I have MinJobAge set to 300; could this be the issue? It
looks like the dependent job lost its dependencies:
JobState=PENDING Reason=JobHeldAdmin Dependency=(null)
1800 C 0 slu 0.00 1.0 - mhill - rrz011 8 00:01:41 Tue Mar 18 10:11:57
1801 C 0 slu 0.00 1.0 - mhill - rrz012 8 00:01:42 Tue Mar 18 10:12:00
1802 C 0 slu 0.00 1.0 - mhill - rrz013 8 00:01:42 Tue Mar 18 10:12:02
These are the jobs that 1803 is dependent on...
[root@rrz-master ~]# scontrol show job 1800
slurm_load_jobs error: Invalid job id specified
[root@rrz-master ~]# scontrol show job 1801
slurm_load_jobs error: Invalid job id specified
[root@rrz-master ~]# scontrol show job 1802
slurm_load_jobs error: Invalid job id specified
This is the dependent job.
[root@rrz-master ~]# scontrol show job 1803
JobId=1803 Name=moab.job.FG0mBi
UserId=mhill(24177) GroupId=mhill(24177)
Priority=0 Account=(null) QOS=(null)
JobState=PENDING Reason=JobHeldAdmin Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:00 TimeLimit=04:00:00 TimeMin=N/A
SubmitTime=2014-03-18T10:10:48 EligibleTime=Unknown
StartTime=Unknown EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=standard AllocNode:Sid=rrz-master:11210
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1-1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/opt/MOAB/spool/moab.job.FG0mBi
WorkDir=/turquoise/users/mhill
Comment='NACCESSPOLICY=SINGLEJOB??SJID:1494?SID:moab'
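
For anyone who wants to check the same fields on their own held jobs, here is a minimal sketch of a helper that pulls a single Key=Value field out of `scontrol show job` output. The function name `job_field` is made up for illustration and is not part of Slurm. It assumes the value contains no spaces, so it works for fields such as Dependency, Reason, or JobState but not for Comment.

```shell
# Hypothetical helper (not a Slurm command): print the value of one
# Key=Value field from `scontrol show job` output read on stdin.
# Assumption: the value contains no spaces.
job_field() {
    # Put each Key=Value pair on its own line, then print the text
    # after "<key>=" for the first pair whose key matches exactly.
    tr ' ' '\n' | awk -F= -v key="$1" '
        $1 == key { print substr($0, length(key) + 2); exit }'
}

# Intended usage on a cluster:
#   scontrol show job 1803 | job_field Dependency
#   scontrol show job 1803 | job_field Reason
```

Run against the output above, the Dependency field should come back as (null) and Reason as JobHeldAdmin, which is the symptom described here.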
