Fair-sharing, preemptive Spark with multitenant support is something we built at Two Sigma to let our users fairly share a giant cluster. It allows users to consume more resources when the cluster is less busy, keeps allocation fair when the cluster is busy, and guarantees that no small job waits more than a minute or so to start. We're currently open-sourcing the entire system, including the multitenant Mesos framework and the Spark scheduler integration, and it should be ready to use in under four weeks.
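To give a feel for the preemption side of this, here is a rough sketch of how preemptive fair sharing can work: compute each active user's fair share, and when some users are below their share while others are above, preempt the over-share users' newest tasks first. This is purely illustrative (it is not Cook's actual code, and every name in it is made up):

```python
# Illustrative sketch of preemptive fair sharing (NOT Cook's actual
# implementation; all names here are hypothetical).

def fair_shares(total_cores, users):
    """Equal-split fair share per active user."""
    return {u: total_cores / len(users) for u in users}

def tasks_to_preempt(total_cores, usage, running):
    """Pick tasks to preempt from over-share users so that under-share
    users can reach their fair share.

    usage:   {user: cores currently used}
    running: {user: [(task_id, cores), ...]}, newest task last
    """
    shares = fair_shares(total_cores, list(usage))
    # Cores that under-share users are still entitled to.
    deficit = sum(max(0, shares[u] - usage[u]) for u in usage)
    victims = []
    # Visit the most over-share users first.
    for user in sorted(usage, key=lambda u: usage[u] - shares[u], reverse=True):
        surplus = usage[user] - shares[user]
        # Preempt newest tasks first, so long-running work survives.
        for task_id, cores in reversed(running.get(user, [])):
            if deficit <= 0 or surplus <= 0:
                break
            victims.append(task_id)
            deficit -= cores
            surplus -= cores
    return victims
```

Killing newest-first is one common policy (it minimizes wasted work); a real scheduler would also weight shares per user rather than splitting equally.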
I'd recommend using our system, Cook. If you'd like to hear more about the
math behind its scheduling algorithms, watch this talk by Li Jin at
MesosCon: http://youtu.be/BkBMYUe76oI

After our initial open-sourcing is done, we plan to integrate Fenzo to
improve Cook's bin packing and its ability to do locality-aware scheduling.

On Thu, Sep 10, 2015 at 1:05 PM Sharma Podila <[email protected]> wrote:

> FYI-
> If you are going to use Fenzo in writing your framework, it has support
> for limiting the overall resources used by tasks via a "group name". That
> is, all tasks with a given group name, say "userA", would be limited to
> the resources specified in the limit for that group. For this to work,
> you would specify limits for each user and set each task's group name to
> the user's name, matching the limits. Each user can be given different
> limits, if desired. See this
> <https://github.com/Netflix/Fenzo/wiki/Resource-Allocation-Limits> for
> details.
>
> In general, "fair share" is subjective. Quotas fragment the cluster and
> can reduce overall cluster utilization when only a few users are active.
> One improvement may be to treat the limits as soft limits. That is, let
> users use resources beyond their limits if there is no contention.
> However, for this to work well, we would need one of two things to be
> true:
>
> 1. the rate of task completion is high enough that a user who has been
> idle for a while can quickly get resources again, or,
> 2. users' tasks that are consuming more resources than their limits can
> be preempted when needed by other users.
>
> The quota management in Mesos that Guangya gave the link for seems to
> address some of these concerns. My understanding is that the MVP is
> going to be the equivalent of hard limits.
>
>
>
> On Tue, Sep 8, 2015 at 11:55 PM, Guangya Liu <[email protected]> wrote:
>
>> Great that it helps!
>>
>> I think that running Spark + Aurora + Mesos is a bit heavyweight, but
>> you can give it a try if it fills your requirements. ;-)
>>
>> In my understanding, what you may want to try is Spark + (a customized
>> Spark scheduler, leveraging Fenzo or something else) + Mesos, but this
>> may involve some code changes to Spark.
>>
>> Thanks,
>>
>> Guangya
>>
>> On Wed, Sep 9, 2015 at 2:05 PM, RJ Nowling <[email protected]> wrote:
>>
>>> Thanks, Guangya!
>>>
>>> Inspired by your comments, I've also been thinking about the option of
>>> using Apache Aurora to provide some of the features I want. Spark
>>> could be deployed in standalone mode on top of Aurora on top of
>>> Mesos. :)
>>>
>>> Funny enough, two of my colleagues (Tim St. Clair and Erik Erlandson)
>>> seem to be tracking and commenting on the epic you linked to. :)
>>>
>>> On Wed, Sep 9, 2015 at 12:59 AM, Guangya Liu <[email protected]> wrote:
>>>
>>>> Hi RJ, please check my answers inline.
>>>>
>>>> Thanks,
>>>>
>>>> Guangya
>>>>
>>>> On Wed, Sep 9, 2015 at 1:24 PM, RJ Nowling <[email protected]> wrote:
>>>>
>>>>> Hi Guangya,
>>>>>
>>>>> My use case is actually trying to run Spark (in coarse-grained mode)
>>>>> with multiple users. I wanted ways to better ensure fair scheduling
>>>>> across users. Spark provides very few primitives, so I was hoping I
>>>>> could use Mesos to limit resources per user and control how the
>>>>> cluster is partitioned. For example, I may prefer that Spark jobs
>>>>> share multiple machines, rather than use all the resources on a
>>>>> single machine, for fault tolerance.
>>>>>
>>>> For this scenario, you may want to schedule the offered resources
>>>> again at the framework level; you can leverage Fenzo or whatever else
>>>> to enhance the scheduler part of Spark to achieve your goal.
>>>>
>>>>>
>>>>> I'm also considering the case of running multiple frameworks. In
>>>>> this case, the frameworks would have to coordinate to enforce user
>>>>> quotas and such.
>>>>> It seems that this would be better solved somewhere below the
>>>>> framework level.
>>>>>
>>>> For this scenario, there is an epic for "quota management" which can
>>>> fill your requirement, but it is still in progress and not available
>>>> yet.
>>>> Epic: https://issues.apache.org/jira/browse/MESOS-1791
>>>> Design doc:
>>>> https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?pli=1#heading=h.9g7fqjh6652v
>>>>
>>>>>
>>>>> RJ
>>>>>
>>>>>
>>>>>
>>>>> On Sep 8, 2015, at 11:47 PM, Guangya Liu <[email protected]> wrote:
>>>>>
>>>>> Hi RJ,
>>>>>
>>>>> I think your final goal is to use a framework running on top of
>>>>> Mesos to execute some tasks. Such logic should live in the framework
>>>>> part. Netflix open-sourced a framework scheduler library named
>>>>> Fenzo; you may want to take a look at it to see if it can help you.
>>>>>
>>>>> http://techblog.netflix.com/2015/08/fenzo-oss-scheduler-for-apache-mesos.html
>>>>> https://github.com/Netflix/Fenzo
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Guangya
>>>>>
>>>>> ------------------------------
>>>>> Date: Tue, 8 Sep 2015 23:09:36 -0500
>>>>> Subject: Re: Setting maximum per-node resources in offers
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>>
>>>>> Thanks, Klaus.
>>>>>
>>>>> I think I was probably misunderstanding the role of the allocator in
>>>>> Mesos versus the scheduler in the framework sitting on top of Mesos.
>>>>> It's probably out of scope for Mesos to divide up resources as I was
>>>>> suggesting.
>>>>>
>>>>> On Tue, Sep 8, 2015 at 10:48 PM, Klaus Ma <[email protected]> wrote:
>>>>>
>>>>> If it's the only framework, you will receive all nodes from Mesos as
>>>>> offers. You can re-schedule those resources to run tasks on each
>>>>> node.
>>>>>
>>>>>
>>>>> On Sep 9, 2015 at 03:03, RJ Nowling wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a smallish cluster with a lot of cores and RAM per node.
>>>>> I want to support multiple users, so I'd like to set up Mesos to
>>>>> provide a maximum of 8 cores per node in the resource offers.
>>>>> Resource offers should include multiple nodes to meet the user's
>>>>> requirements. For example, if the user requests 32 cores, I would
>>>>> like 8 cores from each of 4 nodes.
>>>>>
>>>>> Is this possible? Or can someone suggest alternatives?
>>>>>
>>>>> Thanks,
>>>>> RJ
>>>>>
>>>>>
>>>>> --
>>>>> Klaus Ma (马达), PMP® | http://www.cguru.net
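The soft-limit idea discussed in this thread, letting users exceed their quota when the cluster is idle but behaving like hard limits under contention, can be sketched as a simple admission check. This is an illustration only, not the Fenzo or Mesos quota API, and all names in it are invented:

```python
# Illustrative soft-limit admission check (hypothetical names; not the
# Fenzo or Mesos API). A user may exceed their quota only when the
# cluster has spare capacity, i.e. there is no contention.

def admit(task_cores, user, limits, usage, free_cores):
    """Return True if the task should be scheduled now.

    limits:     {user: quota in cores} (treated as soft limits)
    usage:      {user: cores currently used}
    free_cores: unallocated cores in the cluster
    """
    if task_cores > free_cores:
        return False                      # no capacity at all
    within_quota = usage.get(user, 0) + task_cores <= limits[user]
    if within_quota:
        return True                       # hard-limit behavior
    # Beyond quota: admit only if enough slack remains afterwards for the
    # other users to still reach their quotas (a simple heuristic).
    slack_after = free_cores - task_cores
    headroom = sum(max(0, limits[u] - usage.get(u, 0))
                   for u in limits if u != user)
    return slack_after >= headroom
```

Note this heuristic only avoids creating contention; as Sharma points out, an over-quota task already running still has to be preempted (or finish quickly) for the soft limits to stay fair.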

