In larger deployments with many applications, you can't always expect good
memory practices from app developers. We've found that reporting *why* a job
was killed, along with details of the container's memory utilization, is an
effective way of helping app developers get better at memory management.

The alternative, just having jobs die, incentivizes bad behavior. For
example, a hurried job owner may simply double the executor's memory,
trading cluster headroom for stability.
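
To make that concrete, here is a rough, framework-agnostic sketch of a
scheduler-side status-update handler that reports OOM kills. The state,
reason, message, and task_id fields follow mesos.proto, but the agent URL,
the logging setup, and the enrichment against the agent's
/monitor/statistics endpoint are placeholders you'd adapt to your own setup:

# Rough sketch, not production code: report *why* a task was OOM-killed.
# Field names (state, reason, message, task_id) follow mesos.proto; the
# agent URL and the ResourceStatistics lookup below are placeholders.
import logging
import requests  # assumed available; used only for the optional enrichment

log = logging.getLogger("oom-report")

AGENT_MONITOR_URL = "http://agent.example.com:5051/monitor/statistics"  # placeholder

def handle_status_update(status):
    """Log the kill reason plus the last observed memory utilization."""
    if status.get("state") not in ("TASK_FAILED", "TASK_KILLED"):
        return
    if status.get("reason") != "REASON_CONTAINER_LIMITATION_MEMORY":
        return

    task_id = status["task_id"]["value"]
    # The memory isolator puts a human-readable usage/limit summary in message.
    log.error("task %s exceeded its memory limit: %s",
              task_id, status.get("message", "<no message>"))

    # Optional: enrich the report with the agent's last resource sample
    # (assumes the default command executor, where executor_id == task_id).
    try:
        for entry in requests.get(AGENT_MONITOR_URL, timeout=2).json():
            if entry.get("executor_id") == task_id:
                stats = entry.get("statistics", {})
                log.error("last sample: rss=%s bytes, limit=%s bytes",
                          stats.get("mem_rss_bytes"),
                          stats.get("mem_limit_bytes"))
    except requests.RequestException:
        pass  # reporting is best-effort; never fail the scheduler over it

Wiring that report into whatever channel app owners already watch (build
logs, chat, a dashboard) matters more than the exact fields you include.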

On Fri, Feb 12, 2016 at 6:36 AM Harry Metske <[email protected]> wrote:

> We don't want to use Docker (yet) in this environment, so DockerContainerizer
> is not an option.
> After thinking a bit longer, I tend to agree with Kamil and let the
> problem be handled differently.
>
> Thanks for the amazing fast responses!
>
> kind regards,
> Harry
>
>
> On 12 February 2016 at 12:28, Kamil Chmielewski <[email protected]>
> wrote:
>
>> On Fri, Feb 12, 2016 at 6:12 PM, Harry Metske <[email protected]>
>> wrote:
>>
>>> Is there a specific reason why the slave does not first send a TERM
>>> signal, and if that does not help after a certain timeout, send a KILL
>>> signal?
>>> That would give us a chance to clean up consul registrations (and
>>> other cleanup).
>>>
>> First of all, it's wrong to want to handle the memory limit in your app.
>> Things like this are outside of its scope. Your app can be lost because of
>> many different system or hardware failures that you just can't catch. You
>> need to let it crash and design your architecture with this in mind.
>> Secondly, the Mesos SIGKILL is consistent with the Linux OOM killer, and
>> it does the right thing:
>> https://github.com/torvalds/linux/blob/4e5448a31d73d0e944b7adb9049438a09bc332cb/mm/oom_kill.c#L586
>>
>> Best regards,
>> Kamil
>>
>
>
