I'll echo Sharma's points. While it may seem simple enough to see which moving parts you need to implement here, the long-term effort is large. I've been working on Aurora for 4.5 years, and I still know of a lot of work we need to do. If your use case can fit into an existing framework (perhaps modulo a feature request or contribution here and there), you'll free up a lot of time to focus on the problem you're actually trying to solve.
-=Bill

On Mon, Sep 1, 2014 at 10:45 AM, Sharma Podila wrote:

> I am tempted to say that the short answer is: if your option B works, why
> bother writing your own scheduler/framework?
>
> Writing a Mesos framework can be easy. However, writing a fault-tolerant
> Mesos framework that scales well, performs well, and is highly available
> can be relatively hard. Here are a few things, off the top of my head,
> that helped us make the decision to write our own:
>
>    - There must be a good long-term reason to write your own framework.
>    The scheduling/preemption/allocation model you spoke of may be a good
>    reason. For us, it was specific scheduling optimizations that are not
>    generic and are absent in other frameworks.
>    - Fault tolerance is a combination of a few things. Here are a few to
>    consider:
>       - Task reconciliation with the Mesos master currently involves more
>       than just using the reconcile feature. We augment it with heartbeats
>       from tasks; Aurora does task GC, etc. I believe it will take another
>       Mesos release (or two?) before we can rely solely on Mesos task
>       reconciliation.
>       - The framework itself must be highly available, for example by
>       using ZooKeeper leader election among multiple framework instances.
>       - Task states must be persisted in a fault-tolerant way. For
>       example, when Mesos calls your framework with a status update for a
>       task, that state must be reliably persisted.
>    - It sounds like achieving fair-share allocation via preemption is
>    important to you. That "external entity" you refer to may be
>    non-trivial in the long run. If you were to embark on writing your own
>    framework, another model to consider is to have just one framework
>    scheduler instance for all users, and put the preemption and fair-share
>    logic inside it. There can be complexities: for example, with a
>    heterogeneous mix of task and slave resource sizes, scaling down an
>    arbitrary number of tasks from user A doesn't imply that the freed
>    resources will benefit user B. The scheduler can do this better than an
>    external entity, by preempting only the right tasks, etc.
>       - That said, for simpler use cases, an external entity may work
>       just fine.
>    - Scheduling itself is a hard problem, and it can slow down quickly
>    once you do anything more than first-fit style placement, for example
>    by adding a few constraints and SLAs. Preemption, for instance, can
>    slow down the scheduler while it figures out the right tasks to preempt
>    to honor the fair-share SLAs. That is, assuming you have more than a
>    few hundred tasks.
>    - There were a few talks on this topic at MesosCon ten days ago,
>    including one from us. The videos/slides from the conference should be
>    available from MesosCon sometime soon.
>
>
> On Sun, Aug 31, 2014 at 7:51 AM, Stephan Erb wrote:
>
>> Hi everybody,
>>
>> I would like to assess the effort required to write a custom framework.
>>
>> Background: We have an application where we can start a flexible number
>> of long-running worker processes performing number-crunching. The more
>> processes the better. However, we have multiple users, each running an
>> instance of the application and therefore competing for resources (as
>> each tries to run as many worker processes as possible).
>>
>> For various reasons, we would like to run our application instances on
>> top of Mesos. There seem to be two ways to achieve this:
>>
>>    A. Write a custom framework for our application that spawns the
>>       worker processes on demand. Each user gets to run one framework
>>       instance. We also need preemption of workers to achieve equality
>>       among frameworks. We could achieve this using an external entity
>>       monitoring all frameworks and telling the worst offenders to
>>       scale down a little.
>>    B. Instead of writing a framework, use a service scheduler like
>>       Marathon, Aurora or Singularity to spawn the worker processes.
>>       Instead of just performing the scale-down, the external entity
>>       would dictate the number of worker processes for each
>>       application depending on its demand.
>>
>> The first choice seems to be the natural fit for Mesos. However,
>> existing frameworks like Aurora seem to be battle-tested with regard to
>> high availability, race conditions and issues like state reconciliation,
>> where the world views of the scheduler and the slaves drift apart.
>>
>> So this question boils down to: when considering writing a custom
>> framework, which pitfalls do I have to be aware of? Can I get away with
>> blindly implementing the scheduler API? Or do I always have to implement
>> stuff like custom state reconciliation in order to prevent orphaned
>> tasks on slaves (for example, when my framework scheduler crashes or is
>> temporarily unavailable)?
>>
>> Thanks for your input!
>>
>> Best Regards,
>> Stephan
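Sharma's fault-tolerance bullets (reliably persist every status update; augment Mesos reconciliation with heartbeats from tasks) can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not how Aurora or any other framework does it: `TaskStateStore`, `find_lost_tasks`, the local JSON snapshot file and the heartbeat timeout are all hypothetical. State names are plain strings that mirror Mesos task state names; nothing here calls the Mesos API, and a highly available scheduler would need replicated storage shared by all framework instances rather than a local file.

```python
import json
import os
import tempfile
from typing import Dict, List


class TaskStateStore:
    """Hypothetical durable map of task ID -> last known task state.

    Each update writes a full snapshot to a temporary file, fsyncs it, and
    atomically renames it over the previous snapshot, so a crash mid-write
    leaves the old snapshot intact instead of a half-written file.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.states: Dict[str, str] = {}
        if os.path.exists(path):
            # Recover known task states after a scheduler restart or failover.
            with open(path) as f:
                self.states = json.load(f)

    def record(self, task_id: str, state: str) -> None:
        """Persist a status update; treat it as handled only after this returns."""
        new_states = dict(self.states)
        new_states[task_id] = state
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(new_states, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)  # atomic on POSIX within one filesystem
        self.states = new_states


def find_lost_tasks(
    states: Dict[str, str],
    last_heartbeat: Dict[str, float],
    now: float,
    timeout: float,
) -> List[str]:
    """Tasks believed to be running that haven't heartbeated within `timeout` seconds.

    A task with no recorded heartbeat at all is treated as lost.
    """
    return sorted(
        task_id
        for task_id, state in states.items()
        if state == "TASK_RUNNING"
        and now - last_heartbeat.get(task_id, float("-inf")) > timeout
    )
```

A real scheduler would then have to act on whatever `find_lost_tasks` returns (ask the master to reconcile, or kill and relaunch); that step is left out because it depends on the scheduler driver in use.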
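Stephan's option B needs the external entity to decide how many workers each application may run, and option A needs it to pick the worst offenders. One common policy for that decision is max-min fairness ("water-filling"): repeatedly split the remaining capacity equally among the users whose demand isn't met yet. The Python below is a minimal sketch of that policy, not the policy used by Aurora, Marathon or Singularity; `max_min_fair_share`, `scale_down_targets` and the model of capacity as identical one-worker slots are assumptions, and the last one deliberately ignores the heterogeneous-size problem Sharma raises.

```python
from typing import Dict


def max_min_fair_share(capacity: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Split `capacity` identical worker slots among users by max-min fairness.

    Assumption: every worker needs exactly one identical slot.
    """
    allocation = {user: 0 for user in demands}
    unsatisfied = {user for user, demand in demands.items() if demand > 0}
    remaining = capacity
    while remaining > 0 and unsatisfied:
        # Offer each unsatisfied user an equal share of what is left (at least one slot).
        share = max(remaining // len(unsatisfied), 1)
        for user in sorted(unsatisfied):  # sorted() copies, so discard() below is safe
            if remaining == 0:
                break
            grant = min(share, demands[user] - allocation[user], remaining)
            allocation[user] += grant
            remaining -= grant
            if allocation[user] == demands[user]:
                unsatisfied.discard(user)
    return allocation


def scale_down_targets(running: Dict[str, int], allocation: Dict[str, int]) -> Dict[str, int]:
    """How many workers each over-share user should stop."""
    return {
        user: count - allocation.get(user, 0)
        for user, count in running.items()
        if count > allocation.get(user, 0)
    }


allocation = max_min_fair_share(10, {"a": 2, "b": 8, "c": 8})
print(allocation)  # {'a': 2, 'b': 4, 'c': 4}
print(scale_down_targets({"a": 1, "b": 7, "c": 2}, allocation))  # {'b': 3}
```

With 10 slots and demands of 2, 8 and 8 workers, the sketch allots 2, 4 and 4; user b, currently running 7 workers, is asked to stop 3. Putting this logic inside a single scheduler, as Sharma suggests, would also let it choose which workers to stop, not just how many.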
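Sharma's warning that preempting an arbitrary number of user A's tasks need not help user B is essentially a fragmentation problem, and first-fit is the simplest placement policy he mentions. The Python sketch below is illustrative only: `first_fit`, `place_all`, the `(cpus, mem_mb)` tuple model and the slave names are hypothetical, and nothing here is the Mesos offer API.

```python
from typing import Dict, List, Optional, Tuple

# (cpus, mem_mb) free on a slave, or requested by a task.
Resources = Tuple[float, float]


def first_fit(task: Resources, free: Dict[str, Resources]) -> Optional[str]:
    """Return the first slave (in name order) with enough free CPU and memory, else None."""
    cpus, mem = task
    for slave in sorted(free):
        free_cpus, free_mem = free[slave]
        if free_cpus >= cpus and free_mem >= mem:
            return slave
    return None


def place_all(tasks: List[Resources], free: Dict[str, Resources]) -> List[Optional[str]]:
    """Place tasks one at a time with first-fit; the caller's `free` is not modified."""
    free = dict(free)
    placements: List[Optional[str]] = []
    for task in tasks:
        slave = first_fit(task, free)
        if slave is not None:
            free_cpus, free_mem = free[slave]
            free[slave] = (free_cpus - task[0], free_mem - task[1])
        placements.append(slave)
    return placements


# User A's 2-CPU workers were preempted on two different slaves: 4 CPUs are free in total...
free_after_preemption = {"slave1": (2.0, 4096.0), "slave2": (2.0, 4096.0)}
# ...but user B's 4-CPU worker still fits nowhere.
print(first_fit((4.0, 1024.0), free_after_preemption))  # None
```

A scheduler that knows user B's task sizes could instead preempt two of user A's workers on the same slave, which is the kind of choice an external entity that only says "scale down by N" cannot make.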

