Thank you, Doug, for your suggestion.

But I really want the job to start.

The problem appears when the time limit is increased later: the job is
killed when it reaches the MaxTRESMins limit, and we do not want that
to happen.

I would like to be able to extend the job's time limit and have the job
continue to run until it has finished or has reached the new time limit.

Cheers,
-- Lennart Karlsson
    UPPMAX, Uppsala University, Sweden
    http://www.uppmax.uu.se


On 01/07/2016 05:38 PM, Douglas Jacobsen wrote:
I think you probably want to add "safe" to AccountingStorageEnforce in
slurm.conf; that should prevent Slurm from starting jobs that would exceed
association limits.

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
[email protected]

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________


On Thu, Jan 7, 2016 at 7:15 AM, Lennart Karlsson <[email protected]>
wrote:


We have set the MaxTRESMins limit on accounts and users, to make it
impossible to start what we consider outrageously large jobs.

But we have found an unwanted side effect:
When the user asks for a longer timelimit, we often allow that, and
when we increase the timelimit, sometimes jobs run into the
MaxTRESMins limit and die:
Dec 28 17:20:18 milou-q slurmctld: [2015-12-28T17:20:09.072] Job 6574528
timed out, the job is at or exceeds assoc 10056(b2013086/ansgar/(null)) max
tres(cpu) minutes of 600000 with 600001
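
To illustrate the arithmetic behind that log line, here is a minimal
sketch. It is not Slurm's code: the function names are hypothetical, the
100-CPU job is an invented example, and it assumes the limit is compared
against allocated CPUs multiplied by minutes, with the job stopped once
the product is "at or exceeds" the limit, as the log wording suggests.

```python
def cpu_minutes(cpus: int, minutes: int) -> int:
    """CPU-minutes consumed by a job holding `cpus` CPUs for `minutes`."""
    return cpus * minutes


def would_hit_limit(cpus: int, timelimit_minutes: int, max_tres_mins: int) -> bool:
    """True if running to the full time limit reaches or exceeds the limit.

    Assumption: the check is 'at or exceeds', mirroring the log message.
    """
    return cpu_minutes(cpus, timelimit_minutes) >= max_tres_mins


def max_safe_timelimit(cpus: int, max_tres_mins: int) -> int:
    """Largest time limit (in minutes) that stays strictly below the limit."""
    return (max_tres_mins - 1) // cpus


# Hypothetical job: 100 CPUs under the 600000 cpu-minute limit from the log.
LIMIT = 600000
original_ok = not would_hit_limit(100, 5000, LIMIT)   # 500000 cpu-minutes
extended_hits = would_hit_limit(100, 7000, LIMIT)     # 700000 cpu-minutes
longest = max_safe_timelimit(100, LIMIT)              # 5999 minutes
```

Under these assumptions, the job is accepted with its original time limit
of 5000 minutes, but extending it to 7000 minutes means it will be stopped
partway through, which is the behaviour we are seeing.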

For us, this looks like a bug.

Please, we would prefer the MaxTRESMins limit not to kill already
running jobs.

Cheers,
-- Lennart Karlsson
    UPPMAX, Uppsala University, Sweden
    http://www.uppmax.uu.se

