That worked exactly as it should :)

Thanks Moe!

On Wed, Jul 27, 2011 at 9:23 AM, <[email protected]> wrote:
> Mark,
>
> This was easy to reproduce in SLURM version 2.3, and there is a simple
> two-line patch to correct this problem. In case you have difficulty applying
> the patch to SLURM version 2.2, you can reference the change in git here:
>
> https://github.com/SchedMD/slurm/commit/176c3ce9e7d311a36185530cb113e1203a188506
>
>
>
> Quoting Mark Nelson <[email protected]>:
>
>> Hi there,
>>
>> We're running SLURM 2.2.7 and we've found some odd behaviour of salloc
>> that I wanted to ask about:
>> If a user asks for an allocation using salloc that ends up pending
>> because the machine is full and then presses Ctrl-C in that salloc, the
>> resulting job ends up in a "COMPLETED" state with a negative RunTime,
>> because the StartTime was set to some time in the future (and the
>> EndTime is the time at which the salloc was Ctrl-C'ed).
>>
>> This isn't seen if the salloc is cancelled externally with scancel;
>> in this case the JobState ends up as CANCELLED with a RunTime of 0,
>> and StartTime is set equal to EndTime (the time of the scancel).
>>
>> So it seems that when salloc gets the SIGINT and exits, it doesn't set
>> the StartTime equal to the EndTime (or set RunTime to 0). Should
>> it do this?
>>
>> I haven't reproduced it with SLURM 2.3 yet, but I'm pretty sure we saw
>> this behaviour in SLURM 2.1.x as well. I also haven't started digging
>> into the code yet, but I figured I'd ask first, just in case it's an
>> obvious fix for someone more familiar with the code.
>>
>> Here's the output of scontrol show job for the two cases outlined
>> above:
>>
>> # the Ctrl-C'ed case:
>> ~> salloc -N 128 --time=20:0
>> salloc: Pending job allocation 17654
>> salloc: job 17654 queued and waiting for resources
>> salloc: Job aborted due to signal
>> salloc: Job allocation 17654 has been revoked.
>>
>> ~> scontrol show job 17654
>> JobId=17654 Name=bash
>>   UserId=markn(589) GroupId=ibm(502)
>>   Priority=2 Account=ibm QOS=normal
>>   JobState=PENDING Reason=Resources Dependency=(null)
>>   Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>   RunTime=00:00:00 TimeLimit=00:20:00 TimeMin=N/A
>>   SubmitTime=2011-07-26T12:08:14 EligibleTime=2011-07-26T12:08:14
>>   StartTime=2011-07-26T17:11:57 EndTime=Unknown
>>   SuspendTime=None SecsPreSuspend=0
>>   Partition=main AllocNode:Sid=tambo:7313
>>   ReqBP_List=(null) ExcBP_List=(null)
>>   BP_List=(null)
>>   NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>   Features=(null) Gres=(null) Reservation=(null)
>>   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>   Command=(null)
>>   WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>   Block_ID=unassigned
>>   Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>   CnloadImage=default
>>   MloaderImage=default
>>   IoloadImage=default
>>
>> ~> scontrol show job 17654
>> JobId=17654 Name=bash
>>   UserId=markn(589) GroupId=ibm(502)
>>   Priority=2 Account=ibm QOS=normal
>>   JobState=COMPLETED Reason=Resources Dependency=(null)
>>   Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>   RunTime=-05:-03:-32 TimeLimit=00:20:00 TimeMin=N/A
>>   SubmitTime=2011-07-26T12:08:14 EligibleTime=2011-07-26T12:08:14
>>   StartTime=2011-07-26T17:11:57 EndTime=2011-07-26T12:08:25
>>   SuspendTime=None SecsPreSuspend=0
>>   Partition=main AllocNode:Sid=tambo:7313
>>   ReqBP_List=(null) ExcBP_List=(null)
>>   BP_List=(null)
>>   NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>   Features=(null) Gres=(null) Reservation=(null)
>>   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>   Command=(null)
>>   WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>   Block_ID=unassigned
>>   Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>   CnloadImage=default
>>   MloaderImage=default
>>   IoloadImage=default
>>
>>
>> # the scancel case:
>> ~> salloc -N 128 --time=20:0
>> salloc: Pending job allocation 17658
>> salloc: job 17658 queued and waiting for resources
>> salloc: Job allocation 17658 has been revoked.
>> salloc: Job has been cancelled
>> salloc: error: Failed to allocate resources: No error
>>
>> ~> scontrol show job 17658
>> JobId=17658 Name=bash
>>   UserId=markn(589) GroupId=ibm(502)
>>   Priority=2 Account=ibm QOS=normal
>>   JobState=PENDING Reason=Priority Dependency=(null)
>>   Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>   RunTime=00:00:00 TimeLimit=00:20:00 TimeMin=N/A
>>   SubmitTime=2011-07-26T13:00:31 EligibleTime=2011-07-26T13:00:31
>>   StartTime=2011-07-26T17:23:47 EndTime=Unknown
>>   SuspendTime=None SecsPreSuspend=0
>>   Partition=main AllocNode:Sid=tambo:7313
>>   ReqBP_List=(null) ExcBP_List=(null)
>>   BP_List=(null)
>>   NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>   Features=(null) Gres=(null) Reservation=(null)
>>   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>   Command=(null)
>>   WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>   Block_ID=unassigned
>>   Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>   CnloadImage=default
>>   MloaderImage=default
>>   IoloadImage=default
>>
>> ~> scancel 17658
>>
>> ~> scontrol show job 17658
>> JobId=17658 Name=bash
>>   UserId=markn(589) GroupId=ibm(502)
>>   Priority=2 Account=ibm QOS=normal
>>   JobState=CANCELLED Reason=Priority Dependency=(null)
>>   Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>>   RunTime=00:00:00 TimeLimit=00:20:00 TimeMin=N/A
>>   SubmitTime=2011-07-26T13:00:31 EligibleTime=2011-07-26T13:00:31
>>   StartTime=2011-07-26T13:01:02 EndTime=2011-07-26T13:01:02
>>   SuspendTime=None SecsPreSuspend=0
>>   Partition=main AllocNode:Sid=tambo:7313
>>   ReqBP_List=(null) ExcBP_List=(null)
>>   BP_List=(null)
>>   NumNodes=128-128 NumCPUs=512-512 CPUs/Task=1 ReqS:C:T=*:*:*
>>   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>   Features=(null) Gres=(null) Reservation=(null)
>>   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>   Command=(null)
>>   WorkDir=/vlsci/IBM/markn/gromacs/workshop/d.dppc
>>   Block_ID=unassigned
>>   Connection=Small Reboot=no Rotate=yes Geometry=0x0x0
>>   CnloadImage=default
>>   MloaderImage=default
>>   IoloadImage=default
>>
>>
>> Thanks!
>> Mark.
>>
>
>
>
