Hello,

I'm wondering how other Mesos users deal with scheduling large tasks (tasks 
that use all the resources offered by most agents).

On our cluster, we have various applications launched mainly by Marathon. Some 
of those applications have large instances (30 CPUs) that use all the resources 
of an agent (most of our agents expose 30 CPUs to Mesos). Besides these large 
applications (many instances, many resources per instance), we have many more 
applications whose instances vary in size (from 1 to 10 CPUs).

Our issue lies with scheduling. Marathon uses offers from Mesos as they come, 
and this creates fragmentation: most agents end up running small tasks, which 
prevents big tasks from being scheduled. In an ideal world, Mesos (or Marathon) 
would make sure that some apps (let's say frameworks, if Mesos takes on that 
responsibility) are guaranteed large offers. We also have non-Marathon 
in-house frameworks with similar needs to launch large tasks.

Our current solution is to:

  *   use a dedicated Marathon instance (and a dedicated role) for those big 
applications
  *   dedicate agents to this role

Of course, this requires extra work since our Mesos clusters are now sharded (it 
creates additional toil in terms of maintenance and capacity planning).
Our thinking is that the Mesos allocator could be improved to distribute offers 
with a better heuristic than the current one (offers are randomly sorted). 
Similar to what was suggested on 
http://mail-archives.apache.org/mod_mbox/mesos-user/201906.mbox/%3cCAHReGaiY0nJ0AevMvKbxAZsy2Xc=jmtszcucdxryzbvwkvv...@mail.gmail.com%3e,
we could imagine sorting offers (offers from the most used agents first).
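To make the suggested heuristic concrete, here is a minimal Python sketch under simplifying assumptions: agents are modeled by CPU counts only, each task takes the first offer in the given order that fits (greedy first-fit), and a "most used agents first" ordering is compared against a deliberately spreading "least used first" ordering that stands in for the fragmentation random ordering can produce. The `Agent` class and all functions are hypothetical and are not part of the Mesos allocator or Marathon APIs.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent model: only CPUs are tracked."""
    name: str
    total_cpus: float
    used_cpus: float = 0.0

    @property
    def free_cpus(self) -> float:
        return self.total_cpus - self.used_cpus


def most_used_first(agents):
    # Proposed heuristic: offers from the most utilized agents come first.
    # sorted() stays stable with reverse=True, so ties keep their original order.
    return sorted(agents, key=lambda a: a.used_cpus / a.total_cpus, reverse=True)


def least_used_first(agents):
    # Stand-in for a spreading order: offers from the emptiest agents come first.
    return sorted(agents, key=lambda a: a.used_cpus / a.total_cpus)


def place(tasks, agents, order):
    """Greedy first-fit: each task takes the first offer (in `order`) that fits.

    Returns a list of (cpus, agent_name) pairs; agent_name is None when
    no agent has enough free CPUs for the task.
    """
    placements = []
    for cpus in tasks:
        # Re-sort before every task, since utilization changes after each placement.
        for agent in order(agents):
            if agent.free_cpus >= cpus:
                agent.used_cpus += cpus
                placements.append((cpus, agent.name))
                break
        else:
            placements.append((cpus, None))
    return placements


def make_agents():
    # Three empty agents exposing 30 CPUs each, as on most of our agents.
    return [Agent(f"agent-{i}", 30.0) for i in range(1, 4)]


tasks = [4, 4, 4, 30]  # three small tasks, then one large task
spread = place(tasks, make_agents(), least_used_first)
packed = place(tasks, make_agents(), most_used_first)
print("spread:", spread)
print("packed:", packed)
```

With three empty 30-CPU agents and tasks of 4, 4, 4 and 30 CPUs, the spreading order puts one small task on each agent, so no agent has 30 free CPUs left for the large task. Packing the small tasks onto the busiest agent keeps a whole agent free, and the large task lands on `agent-2`.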

So I'm curious how other users handle this kind of need!

Regards,

-- 
Grégoire Seux
