Hi Bill, thanks for the quick response. That's fair. I wonder if we could set a "start killing" threshold instead? For example, we set a "danger zone" limit so that any task in the danger zone is fair game to get killed. The closer a task gets to the max (or goes over it, of course), the more likely it is to be killed, up to "it absolutely will be killed right away." This would achieve our goal of reducing the likelihood of all shards getting killed at the same time, while preserving the resource exhaustion protection you describe.
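
To make the idea concrete, here's a rough sketch in Python. The names (danger_ratio, should_kill) and the linear ramp are just assumptions on my part to illustrate the shape of the policy, not anything Aurora does today:

    import random

    # Hypothetical knob: the fraction of a task's resource limit at which
    # the "danger zone" begins. Below this, the task is never killed for
    # this resource; at or above the hard max (1.0), the kill is certain.
    DANGER_RATIO = 0.9

    def kill_probability(usage, limit, danger_ratio=DANGER_RATIO):
        """Linear ramp from 0 at the danger threshold to 1 at the hard max."""
        ratio = usage / limit
        if ratio < danger_ratio:
            return 0.0
        if ratio >= 1.0:
            return 1.0
        return (ratio - danger_ratio) / (1.0 - danger_ratio)

    def should_kill(usage, limit):
        # Each task rolls independently, so simultaneous overages across a
        # job get thinned out gradually rather than killed all at once.
        return random.random() < kill_probability(usage, limit)

The linear ramp is just one option; the point is that because each task rolls the dice independently, a fleet-wide overage would be culled gradually instead of every shard dying in the same instant.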
Josh

On Wed, Oct 28, 2015 at 3:55 PM, Bill Farner <[email protected]> wrote:

> For some resources (like disk, or more acutely - RAM), there's not much we
> can do to provide assurances. Ultimately resource-driven task termination
> is managed at the node level, and may represent a real exhaustion of the
> resource. I'd be worried that trying to augment this might trade one
> problem for another - where the rationale for killing a task becomes
> non-deterministic, or even error-prone.
>
> On Wed, Oct 28, 2015 at 3:45 PM, Josh Adams <[email protected]> wrote:
>
>> Good afternoon all,
>>
>> Is it possible to tell the scheduler to throttle kill rates for a given
>> job? When all tasks in a job start consuming too much disk or ram because
>> of an unexpected service dependency meltdown it would be nice if we had a
>> little buffer time to triage the issue without the scheduler killing them
>> all en masse for using more than their allocated resources
>> simultaneously...
>>
>> Cheers,
>> Josh
