Re: New behavior proposal --halt -% with job killing

Ole Tange Sat, 25 Apr 2015 00:42:07 -0700

On Thu, Apr 23, 2015 at 5:10 PM, Martin d'Anjou
<[email protected]> wrote:


Good summary:

> You could come up with many options here:
>
> When to halt:
> --halt [condition for halt]
> --timeout [condition for halt is an amount of time]
> --memfree [condition for halt is an amount of memory]
> kill -TERM [condition for halt is the signal]

I think our solution should make it possible to extend this list. E.g.
maybe it will be possible to detect whether the remote job failed or
the network connection to the remote server failed.

--memfree is special, however: it retries indefinitely if the job
gets killed due to low memory.

> How to handle jobs after a halt:
> --halt-job-handling [killpending[,killrunning]]
> --timeout-job-handling [killpending[,killrunning]]
> and so on. Users could use both kills if they wanted both.

And --kill-TERM-job-handling

killrunning will always imply killpending, but the opposite is not the
case, right?
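
To pin down that implication, here is a minimal Python sketch of how such
a value could be normalized. It is only an illustration: the
"killpending[,killrunning]" syntax is the proposal quoted above, not an
existing option, and the names JobHandling and parse_job_handling are
made up for this example.

```python
from enum import Flag, auto


class JobHandling(Flag):
    """Hypothetical policy bits for what to do with jobs after a halt."""
    NONE = 0
    KILLPENDING = auto()   # do not start jobs that are still queued
    KILLRUNNING = auto()   # also kill jobs that are already running


def parse_job_handling(value):
    """Parse a 'killpending[,killrunning]' style value (proposed syntax only).

    killrunning always implies killpending; killpending alone does not
    imply killrunning.
    """
    names = {
        "killpending": JobHandling.KILLPENDING,
        "killrunning": JobHandling.KILLRUNNING,
    }
    result = JobHandling.NONE
    for word in value.split(","):
        word = word.strip()
        if not word:
            continue  # tolerate empty items such as a trailing comma
        if word not in names:
            raise ValueError("unknown job handling: %r" % word)
        result |= names[word]
    # The one-way implication discussed above.
    if result & JobHandling.KILLRUNNING:
        result |= JobHandling.KILLPENDING
    return result
```

With this, "killrunning" and "killpending,killrunning" normalize to the
same policy, while "killpending" alone leaves running jobs untouched.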

--retries should be thrown into the mix, too.

I can easily think of real life situations where the handling of a
death due to --memfree is different from a --timeout, and I think the
current behaviour (retrying indefinitely) is correct. But do we have a
real life situation where we want --halt-job-handling to be different
from --timeout-job-handling given that we have --retries?

I am reluctant to put in 3 options that are extremely rarely used
(there are plenty of options as it is, and testing becomes harder the
more combinations need to be tested).

> Or, you could use an explicit "plus" sign to mean halt and kill all running
> and pending jobs:
> --halt +1-99%

POLA (the principle of least astonishment) would say that
--halt +1-99% == --halt 1-99%
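
A small Python sketch of that expectation, assuming "1-99%" is read as
shorthand for a percentage between 1 and 99. The function name
parse_halt_percent is made up for this example, and this is not how
parallel parses --halt; it only shows a leading "+" being a no-op, as it
is for ordinary numbers.

```python
import re


def parse_halt_percent(spec):
    """Return the percentage in a spec like '30%' or '+30%' as an int.

    A leading '+' is accepted and ignored, so '+30%' == '30%'.
    Only whole percentages from 1 to 99 are accepted (an assumption).
    """
    match = re.fullmatch(r"\+?(\d{1,2})%", spec)
    if match is None:
        raise ValueError("not a percentage: %r" % spec)
    percent = int(match.group(1))
    if not 1 <= percent <= 99:
        raise ValueError("percentage out of range: %r" % spec)
    return percent
```

Under this reading a user who types the "+" gets exactly what they would
get without it, so the sign cannot silently switch on job killing.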


/Ole
