Hi everybody,

I would like to assess the effort required to write a custom framework.

Background: We have an application where we can start a flexible number
of long-running worker processes performing number-crunching. The more
processes the better. However, we have multiple users, each running an
instance of the application and therefore competing for resources (as
each tries to run as many worker processes as possible). 

For various reasons, we would like to run our application instances on
top of Mesos. There seem to be two ways to achieve this:

     A. Write a custom framework for our application that spawns the
        worker processes on demand. Each user gets to run one framework
        instance. We also need preemption of workers to achieve fairness
        among frameworks. We could achieve this using an external entity
        that monitors all frameworks and tells the worst offenders to
        scale down a little.
     B. Instead of writing a framework, use a service scheduler like
        Marathon, Aurora or Singularity to spawn the worker processes.
        Instead of just performing the scale-down, the external entity
        would dictate the number of worker processes for each
        application depending on its demand.
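
As a rough illustration of what the external entity in both options might
compute, here is a minimal sketch in Python. It assumes a max-min fair
split of a fixed number of worker slots; that policy, the function names
(fair_worker_targets, scale_down_requests) and the example numbers are
assumptions made for illustration only and do not use any Mesos,
Marathon, Aurora or Singularity API.

```python
def fair_worker_targets(demands, capacity):
    """Max-min fair split of `capacity` worker slots.

    demands:  dict mapping user -> number of workers that user would like
    capacity: total number of worker slots available (hypothetical figure)
    """
    targets = {user: 0 for user in demands}
    unsatisfied = {user for user, wanted in demands.items() if wanted > 0}
    remaining = capacity
    while unsatisfied and remaining > 0:
        share = remaining // len(unsatisfied)
        if share == 0:
            # Fewer slots than users left: one slot each, in a fixed order.
            for user in sorted(unsatisfied)[:remaining]:
                targets[user] += 1
            break
        for user in sorted(unsatisfied):
            grant = min(share, demands[user] - targets[user])
            targets[user] += grant
            remaining -= grant
            if targets[user] == demands[user]:
                unsatisfied.discard(user)
    return targets


def scale_down_requests(running, targets):
    """Option A: how many workers each over-target user should give up."""
    return {
        user: count - targets.get(user, 0)
        for user, count in running.items()
        if count > targets.get(user, 0)
    }


if __name__ == "__main__":
    demands = {"alice": 10, "bob": 2, "carol": 10}
    targets = fair_worker_targets(demands, capacity=12)
    print(targets)
    print(scale_down_requests({"alice": 9, "bob": 2, "carol": 1}, targets))
```

In option A the entity would only send the output of
scale_down_requests to the offending frameworks; in option B it would
push the full targets to the service scheduler as instance counts.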


The first choice seems to be the natural fit for Mesos. However,
existing frameworks like Aurora seem to be battle-tested with regard to
high availability, race conditions and issues like state reconciliation,
where the world views of the scheduler and the slaves drift apart.

So this question boils down to: When considering writing a custom
framework, which pitfalls do I have to be aware of? Can I get away with
blindly implementing the scheduler API? Or do I always have to implement
things like custom state reconciliation in order to prevent orphaned
tasks on slaves (for example, when my framework scheduler crashes or is
temporarily unavailable)?
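
To make clear what kind of bookkeeping I mean by state reconciliation,
here is a minimal sketch in Python. It assumes the scheduler persists the
IDs of the tasks it launched and can obtain, after a restart, the IDs of
the tasks still running on the slaves. The function name
(find_drift) and the task IDs are hypothetical; this is not the Mesos
scheduler API, only the comparison I would expect to have to do.

```python
def find_drift(known_task_ids, reported_task_ids):
    """Compare the scheduler's view with what the cluster reports.

    known_task_ids:    IDs the scheduler believes it launched (persisted)
    reported_task_ids: IDs reported as still running on the slaves
    """
    known = set(known_task_ids)
    reported = set(reported_task_ids)
    return {
        # Running on a slave, but the scheduler has no record of them.
        "orphaned": sorted(reported - known),
        # Recorded by the scheduler, but no longer running anywhere.
        "lost": sorted(known - reported),
    }


if __name__ == "__main__":
    print(find_drift(["worker-1", "worker-2"], ["worker-2", "worker-3"]))
```

Orphaned tasks would have to be killed or re-adopted, and lost tasks
relaunched; whether a custom framework has to do this itself is exactly
what I am asking.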

Thanks for your input!

Best Regards,
Stephan



