Same question here: how do you achieve your multi-tenancy feature?

2017-03-01 3:03 GMT+08:00 Michael Gummelt <[email protected]>:

> I'm sorry, I still don't understand.  What functionality does this
> "compute layer" provide?  You say it provides Multi tenancy, but Mesos
> itself does that.  You also say it "keeps track of resources", but again,
> Mesos does that.  What does tagging/un-tagging resources provide?
>
>
> On Tue, Feb 28, 2017 at 12:46 AM, Ashish Mehta <[email protected]>
> wrote:
>
>> Yes Michael, I have tried dynamic allocation on my own Mesos cluster; it
>> works as expected and as documented!
>> But now I want to move ahead and integrate with our own "compute layer".
>> Our "compute layer"
>>
>>    - provides multi-tenancy with Chronos and Marathon over Mesos.
>>    - manages/book-keeps all the resources on behalf of each individual
>>    tenant, which has some quota assigned to it. It keeps track of resources
>>    by tagging/un-tagging them in Mesos.
>>
>>
>> The problem with out-of-the-box "dynamic allocation" is that our
>> "compute layer" doesn't know the resource utilisation of the application
>> and can't tag/un-tag machines automatically. If we tag all the machines
>> before running the application based on spark.cores.max, then we will not
>> be able to make use of "dynamic allocation", because the tagged machines
>> are reserved and can't be used for other applications.
>>
>> *So I want to articulate my initial query here and ask:*
>> What is the best way to feed back to my "compute layer" the Spark
>> application's requirement for auto-scaling? How about the following:
>>
>>    1. The application exposing some API, or emitting an event, to inform
>>    the framework when it needs more resources.
>>    2. Our "compute layer" polling the Mesos API for the resources
>>    consumed by the application and deducing whether auto-scaling is
>>    required (see the sketch below).
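>>
>> A minimal, untested sketch of option 2: poll the Mesos master's
>> /master/state endpoint and read the framework's used_resources (endpoint
>> and field names may vary across Mesos versions; the master address and
>> app name below are placeholders):
>>
>>     import json
>>     import time
>>     import urllib.request
>>
>>     MESOS_MASTER = "http://mesos-master:5050"   # placeholder address
>>     FRAMEWORK_NAME = "my-spark-app"             # the driver's spark.app.name
>>
>>     def framework_usage():
>>         # /master/state lists every framework with the resources it is using
>>         with urllib.request.urlopen(MESOS_MASTER + "/master/state") as resp:
>>             state = json.load(resp)
>>         for fw in state.get("frameworks", []):
>>             if fw.get("name") == FRAMEWORK_NAME:
>>                 used = fw.get("used_resources", {})
>>                 return used.get("cpus", 0.0), used.get("mem", 0.0)
>>         return 0.0, 0.0
>>
>>     while True:
>>         cpus, mem = framework_usage()
>>         print("in use: %.1f cpus, %.0f MB" % (cpus, mem))
>>         # here the compute layer would decide whether to tag/untag machines
>>         time.sleep(30)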
>>
>> Thanks,
>> Ashish
>>
>> On Tue, Feb 28, 2017 at 2:48 AM, Michael Gummelt <[email protected]>
>> wrote:
>>
>>> I assume you've looked into dynamic allocation.  What do you need that
>>> isn't provided by dynamic allocation?
>>>
>>> On Mon, Feb 27, 2017 at 4:11 AM, David J. Palaitis <
>>> [email protected]> wrote:
>>>
>>>> By using a combination of Spark's dynamic allocation,
>>>> http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup,
>>>> and a framework scheduler like Cook,
>>>> https://github.com/twosigma/Cook/tree/master/spark, you can achieve
>>>> the desired auto-scaling effect without the overhead of managing
>>>> roles/constraints in Mesos.  I'd be happy to discuss this in more detail
>>>> if you decide to give it a try.
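>>>>
>>>> For reference, a rough sketch of the dynamic allocation settings (this
>>>> assumes the external shuffle service described in the linked docs is
>>>> already running on each agent; the master URL and executor bounds are
>>>> placeholders, not recommendations):
>>>>
>>>>     from pyspark import SparkConf, SparkContext
>>>>
>>>>     conf = (SparkConf()
>>>>             .setAppName("my-app")
>>>>             .setMaster("mesos://zk://zk1:2181/mesos")  # placeholder master URL
>>>>             # let Spark grow/shrink the executor set with load
>>>>             .set("spark.dynamicAllocation.enabled", "true")
>>>>             .set("spark.shuffle.service.enabled", "true")
>>>>             .set("spark.dynamicAllocation.minExecutors", "1")
>>>>             .set("spark.dynamicAllocation.maxExecutors", "20"))
>>>>     sc = SparkContext(conf=conf)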
>>>>
>>>> On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta <[email protected]
>>>> > wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We want to move to auto-scaling of the Spark driver, wherein more
>>>>> resources are added to the resources available to the "Spark driver"
>>>>> based on requirement. The requirement can increase/decrease based on
>>>>> multiple SQL queries being run over the REST server, or the number of
>>>>> queries from multiple users over the Thrift server on Spark (HiveServer2).
>>>>>
>>>>> *Existing approach with a static number of resources:*
>>>>> We have a very large pool of resources, but my "Spark driver" is
>>>>> allocated a limited amount of "static" resources, and we achieve this as
>>>>> follows:
>>>>>
>>>>>    1. While running the application, I tag machines in Mesos with the
>>>>>    name of my application, so that offers are made accordingly.
>>>>>    2. My application is run with a constraint on the above tagged
>>>>>    machines using the "spark.mesos.constraints" configuration, so that
>>>>>    the application only accepts offers made by these tagged machines and
>>>>>    doesn't eat up all the resources in my very large cluster (see the
>>>>>    example after this list).
>>>>>    3. The application launches executors on these accepted offers, and
>>>>>    they are used to do the computation defined by the Spark job, or as
>>>>>    and when queries are fired over the HTTP/Thrift server.
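>>>>>
>>>>> As an illustration, a rough sketch of step 2 in PySpark (the attribute
>>>>> name "app_tag" and the master URL are made up for this example;
>>>>> spark.mesos.constraints takes attribute:value pairs):
>>>>>
>>>>>     from pyspark import SparkConf, SparkContext
>>>>>
>>>>>     conf = (SparkConf()
>>>>>             .setAppName("my-app")
>>>>>             .setMaster("mesos://zk://zk1:2181/mesos")  # placeholder master URL
>>>>>             # cap the total cores this driver may hold
>>>>>             .set("spark.cores.max", "32")
>>>>>             # only accept offers from agents carrying this (made-up) attribute
>>>>>             .set("spark.mesos.constraints", "app_tag:my-app"))
>>>>>     sc = SparkContext(conf=conf)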
>>>>>
>>>>> *Approach for auto-scaling:*
>>>>> Auto-scaling of the driver helps us in many ways and lets us use the
>>>>> resources more efficiently.
>>>>> To enable auto-scaling, wherein my Spark application gets more and more
>>>>> resource offers once it has consumed all of its available resources, the
>>>>> workflow will be as follows:
>>>>>
>>>>>    1. Run a daemon to monitor my app on Mesos.
>>>>>    2. Keep adding/removing machines for the application by
>>>>>    tagging/untagging them, based on the resource usage metrics for my
>>>>>    application on Mesos.
>>>>>    3. Scale up/down based on step 2 by tagging and untagging, taking
>>>>>    "some buffer" into account (a sketch of this decision follows the
>>>>>    list).
>>>>>
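>>>>> A tiny, untested sketch of the step 3 decision (machine size and buffer
>>>>> are made-up parameters; used_cpus would come from the monitoring daemon
>>>>> in step 1):
>>>>>
>>>>>     import math
>>>>>
>>>>>     def desired_tagged_machines(used_cpus, cpus_per_machine, buffer_machines=1):
>>>>>         # machines needed to cover current usage, plus spare machines so a
>>>>>         # sudden burst of queries does not wait for the tagging/offer cycle
>>>>>         needed = math.ceil(used_cpus / float(cpus_per_machine))
>>>>>         return int(needed) + buffer_machines
>>>>>
>>>>>     # example: 18 cores in use on 8-core machines, keep 1 spare machine
>>>>>     # -> tag ceil(18/8) + 1 = 4 machines; untag the rest
>>>>>     print(desired_tagged_machines(18, 8))
>>>>>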
>>>>> I wanted to get your opinion on this "*Approach for auto-scaling*". Is
>>>>> this the right approach to auto-scaling of the Spark driver?
>>>>> Also, tagging/untagging machines is something we do to limit/manage the
>>>>> resources in our big cluster.
>>>>>
>>>>> Thanks,
>>>>> Ashish
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Michael Gummelt
>>> Software Engineer
>>> Mesosphere
>>>
>>
>>
>
>
> --
> Michael Gummelt
> Software Engineer
> Mesosphere
>



-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com
