https://issues.apache.org/jira/browse/AURORA-1766
Thanks!

> On Sep 7, 2016, at 6:19 PM, Zameer Manji <[email protected]> wrote:
>
> Changing the UI to account for this might be a good idea now that we support
> other executors and multiple executors. Unfortunately, it's not going to be
> easy because we don't even persist overhead per task. Would you mind filing a
> ticket to track this enhancement?
>
> Note that pushing overhead too low means that it's possible to create tasks
> with so little CPU that the executor cannot start. This means tasks will
> fail to launch. I derived the initial value empirically from observing
> overhead in a large cluster and rounded up. I think the executor needs 0.1
> cores at least to start a task and consumes ~0.05 cores afterwards to conduct
> health checks and monitor process state.
>
> Note that if you have health checks and too little CPU allocated to your
> tasks, it means that you might deny CPU to the process that is being health
> checked, causing it to fail randomly.
>
> On Wed, Sep 7, 2016 at 3:18 PM, Stephan Erb <[email protected]> wrote:
> Personally, I would not mind if we drop the executor overhead completely and
> ask the users to add it on their own. We would probably have to enforce a
> minimal task size to prevent Thermos OOMs, but that should not be a big
> problem.
>
> On Wed, 2016-09-07 at 16:41 -0400, Rick Mangi wrote:
>> One of the problems we saw from this was that aurora doesn't seem to include
>> the thermos overhead when computing allocated resources, so we were seeing a
>> huge gap between what aurora said we were reserving and what mesos said was
>> available. Perhaps the aurora UI should take the thermos executor overhead
>> into account when computing used resources.
>>
>>> On Sep 7, 2016, at 4:17 PM, Joshua Cohen <[email protected]> wrote:
>>>
>>> We run internally with -thermos_executor_cpu set to 0 (requiring task
>>> owners to account for any executor CPU usage). This is generally safe, but
>>> task owners should be notified that there's an outside chance they might
>>> see CPU throttling that they were not previously seeing (assuming you're
>>> using cgroup/cpu isolation, that is).
>>>
>>> On Wed, Sep 7, 2016 at 2:44 PM, Wesley Chow <[email protected]> wrote:
>>>> It's currently set to a default of 0.25, which seems excessive to us since
>>>> we tend to run a larger number of small tasks. Is bringing that down to
>>>> 0.1 a terrible thing to do?
>>>>
>>>> Thanks,
>>>> Wes
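
To put rough numbers on the gap Rick describes, here is a minimal sketch. It assumes the executor overhead is added once per task on top of what the task itself requests; the function name and the example task sizes are hypothetical and not part of Aurora. Only the 0.25 default and the 0.1 figure come from the thread.

```python
# Hypothetical helper for illustration only; not an Aurora API.
# Assumption: the per-task executor CPU (the value of -thermos_executor_cpu)
# is reserved in addition to the CPU each task requests.

def executor_overhead_gap(num_tasks, task_cpu, executor_cpu=0.25):
    """Return (requested, reserved, gap) in cores for a set of equal-sized tasks."""
    requested = num_tasks * task_cpu                  # what task owners asked for
    reserved = num_tasks * (task_cpu + executor_cpu)  # tasks plus executor overhead
    return requested, reserved, reserved - requested


if __name__ == "__main__":
    # 100 small tasks of 0.5 cores each, with the default and the proposed overhead.
    for overhead in (0.25, 0.1, 0.0):
        requested, reserved, gap = executor_overhead_gap(100, 0.5, overhead)
        print(f"overhead={overhead}: requested={requested:.1f} "
              f"reserved={reserved:.1f} gap={gap:.1f}")
```

With many small tasks the overhead becomes a large fraction of the total reservation, which is why the default matters more for that kind of workload.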
