Re: thermos executor overhead

Rick Mangi Wed, 07 Sep 2016 17:29:21 -0700

https://issues.apache.org/jira/browse/AURORA-1766


Thanks!


> On Sep 7, 2016, at 6:19 PM, Zameer Manji <[email protected]> wrote:
> 
> Changing the UI to account for this might be a good idea now that we support 
> other executors and multiple executors. Unfortunately, it's not going to be 
> easy because we don't even persist overhead per task. Would you mind filing a 
> ticket to track this enhancement?
> 
> Note that pushing overhead too low means that it's possible to create tasks 
> with such small CPU that the executor cannot start. This means tasks will 
> fail to launch. I derived the initial value empirically from observing 
> overhead in a large cluster and rounded up. I think the executor needs 0.1 
> cores at least to start a task and consumes ~0.05 cores afterwards to conduct 
> health checks and monitor process state.
> 
> Note that if you have health checks and too little CPU allocated to your 
> tasks, it means that you might deny CPU to the process that is being health 
> checked, causing it to fail randomly.
> 
> On Wed, Sep 7, 2016 at 3:18 PM, Stephan Erb <[email protected]> wrote:
> Personally, I would not mind if we drop the executor overhead completely and 
> ask the users to add it on their own. We would probably have to enforce a 
> minimal task size to prevent Thermos OOMs, but that should not be a big 
> problem.
> 
> 
> 
> On Wed, 2016-09-07 at 16:41 -0400, Rick Mangi wrote:
>> One of the problems we saw from this was that Aurora doesn’t seem to include 
>> the Thermos overhead when computing allocated resources, so we were seeing a 
>> huge gap between what Aurora said we were reserving and what Mesos said was 
>> available. Perhaps the Aurora UI should take the Thermos executor overhead 
>> into account when computing used resources.
>> 
>> 
>>> On Sep 7, 2016, at 4:17 PM, Joshua Cohen <[email protected]> wrote:
>>> 
>>> We run internally with -thermos_executor_cpu set to 0 (requiring task 
>>> owners to account for any executor CPU usage). This is generally safe, but 
>>> task owners should be notified that there's an outside chance they might 
>>> see CPU throttling that they were not previously seeing (assuming you're 
>>> using cgroup/cpu isolation, that is).
>>> 
>>> On Wed, Sep 7, 2016 at 2:44 PM, Wesley Chow <[email protected]> wrote:
>>>> It’s currently set to a default of 0.25, which seems excessive to us since 
>>>> we tend to run a larger number of small tasks. Is bringing that down to 
>>>> 0.1 a terrible thing to do?
>>>> 
>>>> Thanks,
>>>> Wes
>>>> 
>>>> 
>>> 
>> 
>
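
For anyone reading this thread later, here is a rough sketch of the arithmetic being discussed. It is not Aurora or Mesos code: the function names are made up for illustration, and the model (Mesos reserving task CPU plus a fixed per-task executor overhead while the Aurora UI counts only task CPU) is an assumption based on the description above. The 0.25 default and the "about 0.1 cores to start" figure are taken from the messages in this thread.

```python
# Illustrative sketch only. All names here are hypothetical; none of this
# is part of the Aurora or Mesos APIs.

DEFAULT_EXECUTOR_CPU = 0.25  # default overhead mentioned in the thread
EXECUTOR_START_CPU = 0.1     # rough startup need estimated in the thread


def reserved_cpu(task_cpu, num_tasks, executor_cpu=DEFAULT_EXECUTOR_CPU):
    """CPU reserved on the Mesos side if each task carries its own executor overhead."""
    return num_tasks * (task_cpu + executor_cpu)


def accounting_gap(task_cpu, num_tasks, executor_cpu=DEFAULT_EXECUTOR_CPU):
    """Gap between the Mesos-side reservation and the sum of task CPU alone."""
    return reserved_cpu(task_cpu, num_tasks, executor_cpu) - num_tasks * task_cpu


def executor_can_start(task_cpu, executor_cpu):
    """Assumed check: CPU in the container must cover the executor's startup need."""
    return task_cpu + executor_cpu >= EXECUTOR_START_CPU


if __name__ == "__main__":
    # 200 small tasks of 0.5 cores each, with the default overhead.
    print("reserved:", reserved_cpu(0.5, 200))
    print("gap:", accounting_gap(0.5, 200))
    # With the overhead set to 0, the gap disappears, but a very small
    # task may no longer leave enough CPU for the executor to start.
    print("gap at 0 overhead:", accounting_gap(0.5, 200, executor_cpu=0.0))
    print("0.05-core task starts:", executor_can_start(0.05, 0.0))
```

Under this model the gap grows linearly with the number of tasks, which is why a cluster running many small tasks notices it most.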
