For some resources (like disk, or more acutely, RAM), there's not much we can do to provide assurances. Ultimately, resource-driven task termination is managed at the node level, and may reflect a real exhaustion of the resource. I'd be worried that trying to augment this would trade one problem for another: the rationale for killing a task could become non-deterministic, or even error-prone.
On Wed, Oct 28, 2015 at 3:45 PM, Josh Adams <[email protected]> wrote:
> Good afternoon all,
>
> Is it possible to tell the scheduler to throttle kill rates for a given
> job? When all tasks in a job start consuming too much disk or ram because
> of an unexpected service dependency meltdown it would be nice if we had a
> little buffer time to triage the issue without the scheduler killing them
> all en masse for using more than their allocated resources simultaneously...
>
> Cheers,
> Josh
