Hi there,

We have a framework that runs on Mesos and DC/OS. Our framework has a core and an agent, which equate to a Mesos scheduler and executor respectively. The executor is responsible for forking and managing processes within our problem domain. Because the executor is written in Scala and runs on the JVM, we find that it needs at least 1.9 CPUs allocated in order to function reasonably well. Also, because it is a JVM process, we “warm up” the executors by starting one on each distinct node that we receive offers for. This keeps task management in our domain feeling responsive.
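To put rough numbers on the cost of warming up one executor per node, here is a minimal sketch. The node names and sizes are hypothetical and only illustrate the arithmetic; the 1.9 figure is the one quoted above.

```python
# Hypothetical node sizes (CPUs per agent node); not taken from a real cluster.
NODE_CPUS = {"node-a": 4.0, "node-b": 8.0, "node-c": 2.0}

# CPUs each warmed-up executor is allocated (the figure quoted above).
EXECUTOR_CPUS = 1.9


def remaining_cpus(node_cpus, executor_cpus=EXECUTOR_CPUS):
    """Return the CPUs left on each node after one executor is placed on it."""
    return {
        node: round(total - executor_cpus, 1)
        for node, total in node_cpus.items()
    }


def held_fraction(node_cpus, executor_cpus=EXECUTOR_CPUS):
    """Return the share of all cluster CPUs held by the warmed-up executors."""
    held = executor_cpus * len(node_cpus)
    return held / sum(node_cpus.values())


if __name__ == "__main__":
    for node, left in remaining_cpus(NODE_CPUS).items():
        print(f"{node}: {left} CPUs left for other work")
    print(f"share held by executors: {held_fraction(NODE_CPUS):.0%}")
```

On small nodes, a single warmed-up executor accounts for most of the node's CPUs.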
Our problem is that our executor will consume 1.9 CPUs even when we have no further tasks. Given that Mesos deducts 1.9 from the number of available CPUs on each node, our users quickly complain that there are no resources left to run anything else.

I’m hoping to solicit ideas on how we can manage our executor more effectively. Clearly, consuming 1.9 CPUs when effectively doing nothing is undesirable.

Some ideas:

* Start the executor only when required - we tried this, and the resulting experience felt sluggish given the overhead of starting the JVM-based executor.
* Start the executor with a lower CPU requirement (say, 1.0 CPUs), and then change its CPU share via ExecutorInfo when we have tasks to run - I’m not sure that this is possible; I think Mesos complains if ExecutorInfo is changed once a previous task has supplied it.
* Given Mesos 1.3 and its support for multiple roles, have our framework register its own role so that the user has more control over where our executors are placed - at present we target all nodes where we receive an offer, i.e. “*”.
* Rewrite the executor off the JVM, e.g. using Rust - this would be non-trivial.

Thoughts/more ideas? Thanks in advance.

Kind regards,
Christopher

Christopher Hunt
Technical Lead, Lightbend Enterprise Suite
@huntchr
UTC+10

