We're running SLURM 2.6.6-2 on two separate clusters, both on top of RedHat
6.5, I believe.

On one of the clusters, a command like:

scontrol update JobId=12345 TimeLimit=1-0:00:00 

works fine.  We then run scontrol show job on that JobId and see that the
TimeLimit is now set to one day and that the EndTime has been adjusted
accordingly.

On the other cluster, however, the TimeLimit becomes one day but the EndTime
gets set to the current date and time, causing the job to terminate.  It
doesn't matter what value we use for TimeLimit.  Similarly, if we modify the
EndTime instead of TimeLimit, the EndTime is modified as desired, but the
TimeLimit then looks wrong; we see things like
TimeLimit=4835679-34:02:00.
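
To make the mismatch concrete, here is a minimal Python sketch of the
consistency check we would expect to hold.  It rests on two assumptions that
are ours, not taken from the SLURM source: that TimeLimit is displayed as
[days-]hours:minutes:seconds, and that EndTime should equal StartTime plus
TimeLimit.  The function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed TimeLimit layout: [days-]hours:minutes:seconds, as in "1-0:00:00".
_LIMIT_RE = re.compile(r"^(?:(\d+)-)?(\d+):(\d{2}):(\d{2})$")


def parse_time_limit(text):
    """Convert a TimeLimit string such as '1-0:00:00' into a timedelta."""
    match = _LIMIT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised TimeLimit: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def expected_end_time(start_time, time_limit):
    """Assumption: EndTime should equal StartTime plus TimeLimit."""
    return start_time + parse_time_limit(time_limit)


def is_consistent(start_time, end_time, time_limit, slack=timedelta(minutes=1)):
    """True when the reported EndTime is within `slack` of StartTime + TimeLimit."""
    return abs(end_time - expected_end_time(start_time, time_limit)) <= slack
```

Under these assumptions the working cluster would pass is_consistent(), and
the broken one would fail it because EndTime collapses to the current time.
Parsing the odd value above with parse_time_limit() gives a span of more than
13,000 years, so it does not look like a plausible limit of any kind.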

 

I am assuming there are internal calculations related to dates and times
that have some underlying dependency on external packages installed in the
OS, and that one system has an older, broken version of these packages.
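
One way to test that guess is to diff the installed package versions between
the two clusters.  The sketch below is only an illustration: it assumes each
listing was saved with something like
rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' so that every line reads
"name version", and the helper names are hypothetical.

```python
def parse_packages(listing):
    """Map package name -> version from lines of the form 'name version'."""
    packages = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2:
            # If a name appears more than once, the last line wins.
            packages[parts[0]] = parts[1]
    return packages


def differing_packages(listing_a, listing_b):
    """Return {name: (version_a, version_b)} for packages whose versions differ.

    A package missing on one side is reported with None for that side.
    """
    a, b = parse_packages(listing_a), parse_packages(listing_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling differing_packages() on the two saved listings returns only the
packages that differ, which should be a much shorter list to go through than
the full output from either host.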

 

Has anyone seen this before, or does anyone know how changes to TimeLimit
and EndTime are handled internally?

Thanks

 
