Fair-sharing, preemptive Spark with multitenant support is something we built at Two Sigma to let our users fairly share a giant cluster. It allows users to consume more resources when the cluster is less busy, keeps allocation fair when the cluster is busy, and guarantees that no small job waits more than a minute or so to start. We're currently open-sourcing the entire system, including the multitenant Mesos framework and the Spark scheduler integration, and it should be ready to use in under four weeks.
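To give a feel for the preemption side of this, here is a rough sketch of how preemptive fair sharing can work: compute each active user's fair share, and when some users are below their share while others are above, preempt the over-share users' newest tasks first. This is purely illustrative (it is not Cook's actual code, and every name in it is made up):

```python
# Illustrative sketch of preemptive fair sharing (NOT Cook's actual
# implementation; all names here are hypothetical).

def fair_shares(total_cores, users):
    """Equal-split fair share per active user."""
    return {u: total_cores / len(users) for u in users}

def tasks_to_preempt(total_cores, usage, running):
    """Pick tasks to preempt from over-share users so that under-share
    users can reach their fair share.

    usage:   {user: cores currently used}
    running: {user: [(task_id, cores), ...]}, newest task last
    """
    shares = fair_shares(total_cores, list(usage))
    # Cores that under-share users are still entitled to.
    deficit = sum(max(0, shares[u] - usage[u]) for u in usage)
    victims = []
    # Visit the most over-share users first.
    for user in sorted(usage, key=lambda u: usage[u] - shares[u], reverse=True):
        surplus = usage[user] - shares[user]
        # Preempt newest tasks first, so long-running work survives.
        for task_id, cores in reversed(running.get(user, [])):
            if deficit <= 0 or surplus <= 0:
                break
            victims.append(task_id)
            deficit -= cores
            surplus -= cores
    return victims
```

Killing newest-first is one common policy (it minimizes wasted work); a real scheduler would also weight shares per user rather than splitting equally.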
I'd recommend using our system, Cook. If you'd like to hear more about the
math behind its scheduling algorithms, watch this talk by Li Jin at
MesosCon: http://youtu.be/BkBMYUe76oI

After our initial open-sourcing is done, we plan to integrate Fenzo to
improve Cook's bin packing and its ability to do locality-aware scheduling.

On Thu, Sep 10, 2015 at 1:05 PM Sharma Podila <[email protected]> wrote:

> FYI-
> If you are going to use Fenzo in writing your framework, it has support
> for limiting the overall resources used by tasks via a "group name". That
> is, all tasks with a given group name, say "userA", would be limited to
> the resources specified in the limit for that group. For this to work,
> you would specify limits for each user and set each task's group name to
> the user's name, matching the limits. Each user can be given different
> limits, if desired. See this
> <https://github.com/Netflix/Fenzo/wiki/Resource-Allocation-Limits> for
> details.
>
> In general, "fair share" is subjective. Quotas fragment the cluster and
> can reduce overall cluster utilization when only a few users are active.
> One improvement may be to treat the limits as soft limits. That is, let
> users use resources beyond their limits if there is no contention.
> However, for this to work well, we would need one of two things to be
> true:
>
> 1. the rate of task completion is high enough that a user who has been
> idle for a while can quickly get resources again, or,
> 2. users' tasks that are consuming more resources than their limits can
> be preempted when needed by other users.
>
> The quota management in Mesos that Guangya gave the link for seems to
> address some of these concerns. My understanding is that the MVP is
> going to be the equivalent of hard limits.
>
>
>
> On Tue, Sep 8, 2015 at 11:55 PM, Guangya Liu <[email protected]> wrote:
>
>> Great that it helps!
>>
>> I think that running Spark + Aurora + Mesos is a bit heavyweight, but
>> you can give it a try if it fills your requirements. ;-)
>>
>> In my understanding, what you may want to try is Spark + (a customized
>> Spark scheduler, leveraging Fenzo or something else) + Mesos, but this
>> may involve some code changes to Spark.
>>
>> Thanks,
>>
>> Guangya
>>
>> On Wed, Sep 9, 2015 at 2:05 PM, RJ Nowling <[email protected]> wrote:
>>
>>> Thanks, Guangya!
>>>
>>> Inspired by your comments, I've also been thinking about the option of
>>> using Apache Aurora to provide some of the features I want. Spark
>>> could be deployed in standalone mode on top of Aurora on top of
>>> Mesos. :)
>>>
>>> Funny enough, two of my colleagues (Tim St. Clair and Erik Erlandson)
>>> seem to be tracking and commenting on the epic you linked to. :)
>>>
>>> On Wed, Sep 9, 2015 at 12:59 AM, Guangya Liu <[email protected]> wrote:
>>>
>>>> Hi RJ, please check my answers inline.
>>>>
>>>> Thanks,
>>>>
>>>> Guangya
>>>>
>>>> On Wed, Sep 9, 2015 at 1:24 PM, RJ Nowling <[email protected]> wrote:
>>>>
>>>>> Hi Guangya,
>>>>>
>>>>> My use case is actually trying to run Spark (in coarse-grained mode)
>>>>> with multiple users. I wanted ways to better ensure fair scheduling
>>>>> across users. Spark provides very few primitives, so I was hoping I
>>>>> could use Mesos to limit resources per user and control how the
>>>>> cluster is partitioned. For example, I may prefer that Spark jobs
>>>>> share multiple machines, rather than use all the resources on a
>>>>> single machine, for fault tolerance.
>>>>>
>>>> For this scenario, you may want to schedule the offered resources
>>>> again at the framework level; you can leverage Fenzo or whatever else
>>>> to enhance the scheduler part of Spark to achieve your goal.
>>>>
>>>>>
>>>>> I'm also considering the case of running multiple frameworks. In
>>>>> this case, the frameworks would have to coordinate to enforce user
>>>>> quotas and such.
>>>>> It seems that this would be better solved somewhere below the
>>>>> framework level.
>>>>>
>>>> For this scenario, there is an epic for "quota management" which can
>>>> fill your requirement, but it is still in progress and not available
>>>> yet.
>>>> Epic: https://issues.apache.org/jira/browse/MESOS-1791
>>>> Design doc:
>>>> https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?pli=1#heading=h.9g7fqjh6652v
>>>>
>>>>>
>>>>> RJ
>>>>>
>>>>>
>>>>>
>>>>> On Sep 8, 2015, at 11:47 PM, Guangya Liu <[email protected]> wrote:
>>>>>
>>>>> Hi RJ,
>>>>>
>>>>> I think your final goal is to use a framework running on top of
>>>>> Mesos to execute some tasks. Such logic should live in the framework
>>>>> part. Netflix open-sourced a framework scheduler library named
>>>>> Fenzo; you may want to take a look at it to see if it can help you.
>>>>>
>>>>> http://techblog.netflix.com/2015/08/fenzo-oss-scheduler-for-apache-mesos.html
>>>>> https://github.com/Netflix/Fenzo
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Guangya
>>>>>
>>>>> ------------------------------
>>>>> Date: Tue, 8 Sep 2015 23:09:36 -0500
>>>>> Subject: Re: Setting maximum per-node resources in offers
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>>
>>>>> Thanks, Klaus.
>>>>>
>>>>> I think I was probably misunderstanding the role of the allocator in
>>>>> Mesos versus the scheduler in the framework sitting on top of Mesos.
>>>>> It's probably out of scope for Mesos to divide up resources as I was
>>>>> suggesting.
>>>>>
>>>>> On Tue, Sep 8, 2015 at 10:48 PM, Klaus Ma <[email protected]> wrote:
>>>>>
>>>>> If it's the only framework, you will receive all nodes from Mesos as
>>>>> offers. You can re-schedule those resources to run tasks on each
>>>>> node.
>>>>>
>>>>>
>>>>> On Sep 9, 2015 at 03:03, RJ Nowling wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a smallish cluster with a lot of cores and RAM per node.
>>>>> I want to support multiple users, so I'd like to set up Mesos to
>>>>> provide a maximum of 8 cores per node in the resource offers.
>>>>> Resource offers should include multiple nodes to meet the user's
>>>>> requirements. For example, if the user requests 32 cores, I would
>>>>> like 8 cores from each of 4 nodes.
>>>>>
>>>>> Is this possible? Or can someone suggest alternatives?
>>>>>
>>>>> Thanks,
>>>>> RJ
>>>>>
>>>>>
>>>>> --
>>>>> Klaus Ma (马达), PMP® | http://www.cguru.net
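The soft-limit idea discussed in this thread, letting users exceed their quota when the cluster is idle but behaving like hard limits under contention, can be sketched as a simple admission check. This is an illustration only, not the Fenzo or Mesos quota API, and all names in it are invented:

```python
# Illustrative soft-limit admission check (hypothetical names; not the
# Fenzo or Mesos API). A user may exceed their quota only when the
# cluster has spare capacity, i.e. there is no contention.

def admit(task_cores, user, limits, usage, free_cores):
    """Return True if the task should be scheduled now.

    limits:     {user: quota in cores} (treated as soft limits)
    usage:      {user: cores currently used}
    free_cores: unallocated cores in the cluster
    """
    if task_cores > free_cores:
        return False                      # no capacity at all
    within_quota = usage.get(user, 0) + task_cores <= limits[user]
    if within_quota:
        return True                       # hard-limit behavior
    # Beyond quota: admit only if enough slack remains afterwards for the
    # other users to still reach their quotas (a simple heuristic).
    slack_after = free_cores - task_cores
    headroom = sum(max(0, limits[u] - usage.get(u, 0))
                   for u in limits if u != user)
    return slack_after >= headroom
```

Note this heuristic only avoids creating contention; as Sharma points out, an over-quota task already running still has to be preempted (or finish quickly) for the soft limits to stay fair.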

