Hi there,

We have a framework that runs on Mesos and DC/OS. It has a core and an agent,
which correspond to a Mesos scheduler and executor respectively. The executor
is responsible for forking and managing processes w.r.t. our problem domain.
Given that the executor is written in Scala and runs on the JVM, we find that
it requires at least 1.9 CPUs to be allocated in order to function reasonably
well. Also, given that it is a JVM process, we “warm up” the executors by
starting one on each distinct node that we receive offers for. This keeps our
domain of task management feeling responsive.

Our problem is that our executor consumes 1.9 CPUs even when we have no
further tasks. Because Mesos deducts those 1.9 CPUs from the available CPUs on
each node, our users quickly complain that there are no resources left to run
anything else.

I’m hoping to solicit ideas on how we can manage our executor more effectively. 
Clearly, consuming 1.9 CPUs when effectively doing nothing is undesirable.

Some ideas:

* start the executor only when required - we tried this, and the resulting
experience felt sluggish given the overhead of starting the JVM-based executor
* start the executor with a smaller CPU requirement (say, 1.0 CPUs), and then
change its CPU share via ExecutorInfo when we have tasks to run - I’m not sure
that this is possible, as I think Mesos complains if the ExecutorInfo differs
from the one a previous task has already supplied
* given Mesos 1.3 and its support for multiple roles, have our framework
register its own role so that the user has more control over where our
executors are placed - at present we target every node for which we receive
an offer, i.e. the “*” role.
* rewrite the executor off the JVM, e.g. using Rust - this would be non-trivial
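To make the second idea concrete, here is a rough sketch of the per-node CPU
accounting we are hoping for. It swaps in a slightly different technique from
mutating ExecutorInfo: the executor keeps a fixed, smaller share, and the extra
CPU travels with each task instead. It assumes that Mesos sizes an executor's
container as the executor's own resources plus those of the tasks currently
running on it; that is our understanding, but treat it as an assumption. The
types (ExecutorOnNode, TaskCpus) and the 0.9-CPU task are hypothetical, not
the Mesos API; it runs as a plain Scala script (e.g. with scala-cli).

```scala
// Hypothetical model of CPU accounting on a single node. These types are
// illustrative only and are NOT the Mesos protobuf API.
final case class TaskCpus(name: String, cpus: Double)

final case class ExecutorOnNode(executorCpus: Double, running: Seq[TaskCpus]) {
  // Assumed accounting: the executor's own CPUs plus the CPUs of every task
  // currently running on it.
  def allocatedCpus: Double = executorCpus + running.map(_.cpus).sum

  // Launching a task adds its CPUs to the node's allocation.
  def launch(task: TaskCpus): ExecutorOnNode = copy(running = running :+ task)

  // Finishing a task releases its CPUs again.
  def finish(taskName: String): ExecutorOnNode =
    copy(running = running.filterNot(_.name == taskName))
}

// Today: a warm executor holds 1.9 CPUs whether or not it has any work.
val warmToday = ExecutorOnNode(executorCpus = 1.9, running = Seq.empty)

// Alternative: a smaller warm executor (1.0 CPUs) that only grows while a
// task carrying its own (hypothetical) 0.9 CPUs is running.
val warmSmall = ExecutorOnNode(executorCpus = 1.0, running = Seq.empty)
val busySmall = warmSmall.launch(TaskCpus("domain-process", 0.9))
val idleAgain = busySmall.finish("domain-process")

println(f"idle today: ${warmToday.allocatedCpus}%.1f CPUs")
println(f"idle small: ${warmSmall.allocatedCpus}%.1f CPUs")
println(f"busy small: ${busySmall.allocatedCpus}%.1f CPUs")
println(f"idle again: ${idleAgain.allocatedCpus}%.1f CPUs")
```

If the assumption holds, an idle node would give back the difference between
the two executor shares, while a busy node ends up with the same total as today.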

Thoughts/more ideas?

Thanks in advance.

Kind regards,
Christopher

Christopher Hunt
Technical Lead, Lightbend Enterprise Suite
@huntchr
UTC+10
