Thank you, Doug, for your suggestion.
However, I do want the job to start.
The problem appears only when the timelimit is later increased: the job
is killed when it reaches the MaxTRESMins limit, and we do not want that
to happen.
I would like to be able to extend the job's timelimit and have the job
continue to run until it either finishes or reaches the new timelimit.
Cheers,
-- Lennart Karlsson
UPPMAX, Uppsala University, Sweden
http://www.uppmax.uu.se
On 01/07/2016 05:38 PM, Douglas Jacobsen wrote:
I think you probably want to add "safe" to AccountingStorageEnforce in
slurm.conf; that should prevent it from starting jobs that would exceed
association limits.
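As a rough sketch, the setting suggested above might look like this in
slurm.conf. Only "safe" comes from this thread; the other values on the
line are assumptions about a typical accounting setup, so check the
slurm.conf man page for your Slurm version before using it:

```
# slurm.conf (fragment)
# "safe" is the value suggested above; "associations" and "limits"
# are assumed here and may differ on your site.
AccountingStorageEnforce=associations,limits,safe
```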
----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
[email protected]
On Thu, Jan 7, 2016 at 7:15 AM, Lennart Karlsson <[email protected]>
wrote:
We have set the MaxTRESMins limit on accounts and users, to make it
impossible to start what we consider to be outrageously large jobs.
But we have found an unwanted side effect:
When a user asks for a longer timelimit, we often grant it. But
when we increase the timelimit, the job sometimes runs into the
MaxTRESMins limit and is killed:
Dec 28 17:20:18 milou-q slurmctld: [2015-12-28T17:20:09.072] Job 6574528
timed out, the job is at or exceeds assoc 10056(b2013086/ansgar/(null)) max
tres(cpu) minutes of 600000 with 600001
For us, this looks like a bug.
Please, we would prefer the MaxTRESMins limit not to kill already
running jobs.
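To illustrate the arithmetic behind the log message above, here is a
minimal Python sketch. It assumes that the cpu usage counted against
MaxTRESMins is simply allocated CPUs times elapsed minutes, and that
reaching the limit is already fatal (the log says "at or exceeds").
The function names and the 16-CPU job are hypothetical; only the limit
of 600000 comes from the log. It is not how slurmctld is implemented.

```python
def would_hit_limit(cpus: int, timelimit_minutes: int, max_tres_minutes: int) -> bool:
    """True if a job running for its full timelimit reaches the cpu-minutes limit.

    Assumption: usage = cpus * elapsed minutes, and usage >= limit kills the job.
    """
    return cpus * timelimit_minutes >= max_tres_minutes


def max_safe_timelimit(max_tres_minutes: int, cpus: int) -> int:
    """Largest timelimit (minutes) that keeps cpus * minutes below the limit."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return max((max_tres_minutes - 1) // cpus, 0)


LIMIT = 600000  # cpu minutes, taken from the log line above
CPUS = 16       # hypothetical job size, not from the thread

print(max_safe_timelimit(LIMIT, CPUS))      # 37499
print(would_hit_limit(CPUS, 37500, LIMIT))  # True
```

A check like this could be run by hand before raising a job's
timelimit, to see whether the new value would push the job into the
limit.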
Cheers,
-- Lennart Karlsson
UPPMAX, Uppsala University, Sweden
http://www.uppmax.uu.se