On Thu, Apr 23, 2015 at 5:10 PM, Martin d'Anjou <martin.danjo...@gmail.com> wrote:
Good summary:

> You could come up with many options here:
>
> When to halt:
> --halt [condition for halt]
> --timeout [condition for halt is an amount of time]
> --memfree [condition for halt is an amount of memory]
> kill -TERM [condition for halt is the signal]

I think our solution should make it possible to extend this list. E.g.
maybe it will be possible to detect whether the remote job failed or
the network connection to the remote server failed.

--memfree is special, however. It retries indefinitely if the job gets
killed due to low memory.

> How to handle jobs after a halt:
> --halt-job-handling [killpending[,killrunning]]
> --timeout-job-handling [killpending[,killrunning]]
> and so on. Users could use both kills if they wanted both.

And --kill-TERM-job-handling killrunning will always imply killpending,
but the opposite is not the case, right?

--retries should be thrown into the mix, too. I can easily think of
real-life situations where the handling of a death due to --memfree is
different from a --timeout, and I think the current behaviour (retrying
indefinitely) is correct.

But do we have a real-life situation where we want --halt-job-handling
to be different from --timeout-job-handling given that we have
--retries?

I am reluctant to put in 3 options that are extremely rarely used
(there are plenty of options as it is, and testing becomes harder the
more combinations need to be tested).

> Or, you could use an explicit "plus" sign to mean halt and kill all running
> and pending jobs:
> --halt +1-99%

POLA (the principle of least astonishment) would say that
--halt +1-99% == --halt 1-99%

/Ole
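
PS: A rough sketch of the two rules discussed in this mail, in Python rather than in parallel's own code. Every name here (normalize_handling, should_retry, the cause strings) is hypothetical, and the reading of --retries as "run a job at most that many times" is an assumption for the sake of the example. It only shows that killrunning pulls in killpending while the reverse does not hold, and that a death due to --memfree is retried without limit while other causes are bounded.

```python
# Hypothetical sketch; not GNU Parallel's implementation.

VALID_ACTIONS = {"killpending", "killrunning"}


def normalize_handling(spec: str) -> frozenset:
    """Parse a 'killpending[,killrunning]' style value.

    killrunning implies killpending; killpending alone implies nothing more.
    """
    actions = {a.strip() for a in spec.split(",") if a.strip()}
    unknown = actions - VALID_ACTIONS
    if unknown:
        raise ValueError(f"unknown job handling: {sorted(unknown)}")
    if "killrunning" in actions:
        actions.add("killpending")
    return frozenset(actions)


def should_retry(cause: str, attempts: int, retries: int) -> bool:
    """Decide whether a job that died should be started again.

    cause    -- why the job died, e.g. 'memfree' or 'timeout' (assumed labels)
    attempts -- how many times the job has been run so far
    retries  -- assumed meaning: maximum number of runs for one job
    """
    if cause == "memfree":
        # Killed due to low memory: retry indefinitely.
        return True
    return attempts < retries
```

With this, normalize_handling("killrunning") yields both actions, while normalize_handling("killpending") yields only killpending.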