That worked exactly as it should :) Thanks Moe!
On Wed, Jul 27, 2011 at 9:23 AM, <[email protected]> wrote:
> Mark,
>
> This was easy to reproduce in SLURM version 2.3 and there is a simple two
> line patch to correct this problem. In case you have difficulty applying the
> patch to SLURM version 2.2, you can reference the change in git here:
>
> https://github.com/SchedMD/slurm/commit/176c3ce9e7d311a36185530cb113e1203a188506
>
> Quoting Mark Nelson <[email protected]>:
>
>> Hi there,
>>
>> We're running SLURM 2.2.7 and we've found some odd behaviour of salloc
>> that I wanted to ask about:
>> If a user asks for an allocation using salloc that ends up pending
>> because the machine is full and they then Ctrl-C's that salloc, the
>> resulting job ends up in a "COMPLETED" state with a negative RunTime
>> because the StartTime was set to be some time in the future (and the
>> EndTime is the time that the salloc was Ctrl-C'ed).
>>
>> This isn't seen if the salloc is cancelled externally with scancel;
>> in this case the JobState ends up being CANCELLED with a RunTime of 0
>> and StartTime is set to be equal to EndTime (the time of the scancel).
>>
>> So it seems that when salloc gets the SIGINT and exits it doesn't set
>> the StartTime to be equal to the EndTime (or set RunTime to 0). Should
>> it do this?
>>
>> I haven't reproduced it with SLURM 2.3 yet but I'm pretty sure we saw
>> this behaviour in SLURM 2.1.x as well. I also haven't started digging
>> into the code yet but I figured I'd ask first just in case it's an
>> obvious fix for someone more familiar with the code.
>>
>> Here's the output of scontrol show job for the couple of cases outlined
>> above:
>>
>> # the Ctrl-C'ed case:
>> ~> salloc -N 128 --time=20:0
>> salloc: Pending job allocation 17654
>> salloc: job 17654 queued and waiting for resources
>> salloc: Job aborted due to signal
>> salloc: Job allocation 17654 has been revoked.
>>
>> ~> scontrol show job 17654
>> JobId=17654 Name=bash
>>    UserId=markn(589) GroupId=ibm(502)
>>    Priority=2 Account=ibm QOS=normal
>>    JobState=PENDING Reason=Resources Dependency=(null)
>>    Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>    RunTime=00:00:00 TimeLimit=00:20:00 TimeMin=N/A
>>    SubmitTime=2011-07-26T12:08:14 EligibleTime=2011-07-26T12:08:14
>>    StartTime=2011-07-26T17:11:57 EndTime=Unknown
>>    SuspendTime=None SecsPreSuspend=0
>>    Partition=main AllocNode:Sid=tambo:7313
>>    ReqBP_List=(null) ExcBP_List=(null)
>>    BP_List=(null)
>>    NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>    Features=(null) Gres=(null) Reservation=(null)
>>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>    Command=(null)
>>    WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>    Block_ID=unassigned
>>    Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>    CnloadImage=default
>>    MloaderImage=default
>>    IoloadImage=default
>>
>> ~> scontrol show job 17654
>> JobId=17654 Name=bash
>>    UserId=markn(589) GroupId=ibm(502)
>>    Priority=2 Account=ibm QOS=normal
>>    JobState=COMPLETED Reason=Resources Dependency=(null)
>>    Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>    RunTime=-05:-03:-32 TimeLimit=00:20:00 TimeMin=N/A
>>    SubmitTime=2011-07-26T12:08:14 EligibleTime=2011-07-26T12:08:14
>>    StartTime=2011-07-26T17:11:57 EndTime=2011-07-26T12:08:25
>>    SuspendTime=None SecsPreSuspend=0
>>    Partition=main AllocNode:Sid=tambo:7313
>>    ReqBP_List=(null) ExcBP_List=(null)
>>    BP_List=(null)
>>    NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>    Features=(null) Gres=(null) Reservation=(null)
>>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>    Command=(null)
>>    WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>    Block_ID=unassigned
>>    Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>    CnloadImage=default
>>    MloaderImage=default
>>    IoloadImage=default
>>
>> # the scancel case:
>> ~> salloc -N 128 --time=20:0
>> salloc: Pending job allocation 17658
>> salloc: job 17658 queued and waiting for resources
>> salloc: Job allocation 17658 has been revoked.
>> salloc: Job has been cancelled
>> salloc: error: Failed to allocate resources: No error
>>
>> ~> scontrol show job 17658
>> JobId=17658 Name=bash
>>    UserId=markn(589) GroupId=ibm(502)
>>    Priority=2 Account=ibm QOS=normal
>>    JobState=PENDING Reason=Priority Dependency=(null)
>>    Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>    RunTime=00:00:00 TimeLimit=00:20:00 TimeMin=N/A
>>    SubmitTime=2011-07-26T13:00:31 EligibleTime=2011-07-26T13:00:31
>>    StartTime=2011-07-26T17:23:47 EndTime=Unknown
>>    SuspendTime=None SecsPreSuspend=0
>>    Partition=main AllocNode:Sid=tambo:7313
>>    ReqBP_List=(null) ExcBP_List=(null)
>>    BP_List=(null)
>>    NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>    Features=(null) Gres=(null) Reservation=(null)
>>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>    Command=(null)
>>    WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>    Block_ID=unassigned
>>    Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>    CnloadImage=default
>>    MloaderImage=default
>>    IoloadImage=default
>>
>> ~> scancel 17658
>>
>> ~> scontrol show job 17658
>> JobId=17658 Name=bash
>>    UserId=markn(589) GroupId=ibm(502)
>>    Priority=2 Account=ibm QOS=normal
>>    JobState=CANCELLED Reason=Priority Dependency=(null)
>>    Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>    RunTime=00:00:00 TimeLimit=00:20:00 TimeMin=N/A
>>    SubmitTime=2011-07-26T13:00:31 EligibleTime=2011-07-26T13:00:31
>>    StartTime=2011-07-26T13:01:02 EndTime=2011-07-26T13:01:02
>>    SuspendTime=None SecsPreSuspend=0
>>    Partition=main AllocNode:Sid=tambo:7313
>>    ReqBP_List=(null) ExcBP_List=(null)
>>    BP_List=(null)
>>    NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>    Features=(null) Gres=(null) Reservation=(null)
>>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>    Command=(null)
>>    WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>    Block_ID=unassigned
>>    Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>    CnloadImage=default
>>    MloaderImage=default
>>    IoloadImage=default
>>
>> Thanks!
>> Mark.
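The negative RunTime in the Ctrl-C'ed case is simply EndTime minus the estimated StartTime: 12:08:25 minus 17:11:57 is -5 h 3 min 32 s, or -18212 seconds. The Python sketch below reproduces the displayed value under two assumptions: that RunTime is computed as EndTime - StartTime, and that the seconds are split into fields with C-style truncating division, which leaves a minus sign on every field. It also models the scancel behaviour described in the report (a job that never started gets StartTime = EndTime). The helpers format_run_time and end_pending_job are hypothetical illustrations, not SLURM functions, and not the contents of the commit linked above.

```python
from datetime import datetime


def c_divmod(a: int, b: int) -> tuple[int, int]:
    """Quotient and remainder truncated toward zero, as C's / and % behave."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b


def format_run_time(seconds: int) -> str:
    """Hypothetical helper: render seconds as [D-]HH:MM:SS with C-style signed fields."""
    total_minutes, secs = c_divmod(seconds, 60)
    total_hours, mins = c_divmod(total_minutes, 60)
    days, hours = c_divmod(total_hours, 24)

    def field(v: int) -> str:
        # Like printf("%2.2ld"): at least two digits, with the sign kept in front.
        return f"{'-' if v < 0 else ''}{abs(v):02d}"

    hms = f"{field(hours)}:{field(mins)}:{field(secs)}"
    return f"{days}-{hms}" if days else hms


def end_pending_job(start: datetime, end: datetime) -> datetime:
    """Hypothetical fix: a job whose start lies after its end never ran, so pin StartTime to EndTime."""
    return end if start > end else start


# Values taken from the Ctrl-C'ed job 17654 in the report above.
start = datetime.fromisoformat("2011-07-26T17:11:57")  # estimated StartTime
end = datetime.fromisoformat("2011-07-26T12:08:25")    # time of the Ctrl-C

raw = int((end - start).total_seconds())
print(raw, format_run_time(raw))  # -18212 -05:-03:-32

fixed_start = end_pending_job(start, end)
print(format_run_time(int((end - fixed_start).total_seconds())))  # 00:00:00
```

Pinning StartTime to EndTime matches what the report says already happens for a pending job cancelled with scancel; whether the linked patch takes the same approach is not stated in the thread.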
