Note that with the newest marathon, which can handle multiple roles, you would not need to run a dedicated marathon instance for the big applications.

On Tue, Oct 1, 2019 at 8:17 AM Grégoire Seux <g.s...@criteo.com> wrote:
> Hello,
>
> I'm wondering how other mesos users deal with scheduling large tasks
> (tasks using all the resources offered by most agents).
>
> On our cluster, we have various applications launched mainly by marathon.
> Some of those applications have large instances (30 cpus) which use all
> the resources of an agent (most of our agents expose 30 cpus to mesos).
> Beyond these large applications (many instances, many resources per
> instance) we have many more applications whose instances vary in size
> (from 1 to 10 cpus).
>
> Our issue lies with scheduling: marathon uses offers from mesos as they
> come, which creates fragmentation. Most agents end up with small tasks
> running on them, which prevents big tasks from being scheduled. In an
> ideal world, mesos (or marathon) would make sure some apps (let's say
> frameworks, if mesos takes that responsibility) are guaranteed large
> offers. We also have non-marathon in-house frameworks with similar needs
> to launch large tasks.
>
> Our current solution is to:
>
> - use a dedicated marathon instance (and a dedicated role) for those
>   big applications
> - dedicate agents to this role
>
> Of course, this requires extra work since our mesos clusters are now
> sharded (it creates additional toil in terms of maintenance and capacity
> planning).
> Our thinking is that the mesos allocator could be improved to distribute
> offers with a better heuristic than the current one (offers are randomly
> sorted). Somewhat similar to what was suggested on
> http://mail-archives.apache.org/mod_mbox/mesos-user/201906.mbox/%3cCAHReGaiY0nJ0AevMvKbxAZsy2Xc=jmtszcucdxryzbvwkvv...@mail.gmail.com%3e,
> we could imagine sorting offers (offers from the most used agents first).
>
> So I'm curious about how other users handle this kind of need!
>
> Regards,
>
> --
> Grégoire Seux
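To make the fragmentation argument concrete, here is a toy simulation, not mesos allocator or marathon code. It assumes a hypothetical cluster of 10 agents with 30 cpus each and a made-up list of small tasks (1 to 10 cpus). Each task is placed on the first offer with enough room, with offers either in random order (the current behaviour described in the thread) or sorted with the most used agents first (the proposed heuristic). It then counts how many agents remain completely empty and could still host a 30-cpu task.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

AGENT_CPUS = 30  # most agents in the thread expose 30 cpus

@dataclass
class Agent:
    name: str
    used: int = 0  # cpus already allocated to running tasks

    @property
    def free(self) -> int:
        return AGENT_CPUS - self.used

Order = Callable[[List[Agent]], List[Agent]]

def place(tasks: List[int], agents: List[Agent], order: Order) -> None:
    """Accept the first offer (in `order`) that has room for each task."""
    for cpus in tasks:
        for agent in order(agents):
            if agent.free >= cpus:
                agent.used += cpus
                break
        else:
            raise RuntimeError(f"no offer fits a {cpus}-cpu task")

def random_order(seed: int) -> Order:
    """Offers arrive in random order (the behaviour described in the thread)."""
    rng = random.Random(seed)
    def order(agents: List[Agent]) -> List[Agent]:
        shuffled = list(agents)
        rng.shuffle(shuffled)
        return shuffled
    return order

def most_used_first(agents: List[Agent]) -> List[Agent]:
    """Proposed heuristic: offers from the most used agents come first."""
    return sorted(agents, key=lambda a: a.used, reverse=True)

def empty_agents_after(order: Order, n_agents: int = 10) -> int:
    """Count agents still fully free (able to host a 30-cpu task)."""
    tasks = [4, 8, 2, 6, 10, 3, 5, 7, 1, 9]  # hypothetical small tasks, 55 cpus
    agents = [Agent(f"agent-{i}") for i in range(n_agents)]
    place(tasks, agents, order)
    return sum(1 for a in agents if a.used == 0)

if __name__ == "__main__":
    print("random order:   ", empty_agents_after(random_order(seed=1)))
    print("most used first:", empty_agents_after(most_used_first))
```

With the most-used-first ordering, the 55 cpus of small tasks pack onto two agents and leave the other eight whole. Random ordering can never leave more than that and typically leaves fewer, which matches the fragmentation problem described above. A real allocator would also have to weigh other resources (memory, disk, ports), role quotas and fairness, which this sketch ignores.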