On Wed, Mar 1, 2017 at 9:16 AM, Ashish Mehta <[email protected]>
wrote:

> Mesos might have introduced multi-tenancy in newer versions, but we wanted
> to provide multi-tenancy in versions prior to that, via our "compute
> layer", and we wanted to implement some isolation so that a few machines
> are always reserved for a tenant.
>
What do you mean by "multi-tenancy"?  I mean "multiple containers running
on the same host".  Mesos has always provided this.  It's the core of what
it does.

Mesos also provides both static reservations and quota.  Do these not solve
your reservation requirements?
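
For concreteness, roughly what I mean (the role name and numbers are
invented):

  # static reservation: pin 8 CPUs / 16 GB on an agent to role "tenantA"
  mesos-agent --resources="cpus(tenantA):8;mem(tenantA):16384" ...

  # quota: guarantee "tenantA" 8 CPUs cluster-wide via the master endpoint
  curl -X POST http://<master>:5050/quota -d '{
    "role": "tenantA",
    "guarantee": [{"name": "cpus", "type": "SCALAR", "scalar": {"value": 8}}]
  }'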


>
> Apologies that I didn't explain "tagging/un-tagging". By this I mean that
> we assign attributes to machines/resources in Mesos, so that they are
> offered with those attributes to applications running on Mesos. This lets
> us make use of Spark's "spark.mesos.constraints" configuration, by which my
> Spark application only accepts offers from machines whose attributes (tags)
> match the constraints passed in.
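>
> Concretely, the tagging is just a Mesos agent attribute; a minimal sketch
> (the attribute name and value are only examples):
>
>   # agent side: every offer from this machine now carries the attribute
>   mesos-agent --attributes="tenant:teamA" ...
>
> and on the Spark side we pass the matching constraint, e.g.
> spark.mesos.constraints="tenant:teamA".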
>
> In the case of static allocation of resources, we just tag (assign
> attributes to) machines amounting to spark.cores.max worth of resources,
> and hence all these machines get used up by the application. But in the
> case of automatic scaling, my "compute layer" needs to know the resource
> requirement in order to do the tagging (attribute assignment) and
> un-tagging (attribute removal) automatically.
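>
> As a worked example (numbers are hypothetical): with spark.cores.max=64 on
> 16-core agents, we tag ceil(64 / 16) = 4 machines up front, and those four
> machines stay reserved for the application whether it is busy or idle.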
>
> So, reiterating my question: what is the best way to get feedback to my
> "compute layer" about the Spark application's resource requirements for
> auto-scaling? How about the following (a sketch of option 1 follows the
> list):
>
>    1. The application exposes some API, or emits events, to inform the
>    framework when it needs more resources.
>    2. Or our "compute layer" polls the Mesos API to learn the resources
>    consumed by the application and deduces whether auto-scaling is
>    required or not.
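>
> On option 1, it may help that the Spark driver already exposes a monitoring
> REST API under /api/v1, so the application may not need a custom one. A
> rough polling sketch in Python (the driver address and the "every executor
> busy" heuristic are my assumptions, not something Spark prescribes):
>
>   import requests
>
>   DRIVER = "http://spark-driver.example.com:4040"  # hypothetical address
>
>   def wants_more_resources(app_id):
>       # executor summaries from the driver's built-in REST API
>       url = f"{DRIVER}/api/v1/applications/{app_id}/executors"
>       executors = requests.get(url).json()
>       # the driver lists itself as executor "driver"; ignore it
>       workers = [e for e in executors if e["id"] != "driver"]
>       # crude heuristic: every executor busy -> app could use more machines
>       return bool(workers) and all(e["activeTasks"] > 0 for e in workers)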
>
> BTW, thanks for taking this conversation forward.
>
> - Ashish
>
> On Wed, Mar 1, 2017 at 12:33 AM, Michael Gummelt <[email protected]>
> wrote:
>
>> I'm sorry, I still don't understand.  What functionality does this
>> "compute layer" provide?  You say it provides multi-tenancy, but Mesos
>> itself does that.  You also say it "keeps track of resources", but again,
>> Mesos does that.  What does tagging/un-tagging resources provide?
>>
>>
>> On Tue, Feb 28, 2017 at 12:46 AM, Ashish Mehta <[email protected]>
>> wrote:
>>
>>> Yes Michael, I have tried dynamic allocation with my own Mesos cluster,
>>> and it works as expected and as documented!
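>>>
>>> (For reference, roughly the setup we used, per the Spark-on-Mesos docs:
>>> the external shuffle service must run on every agent for dynamic
>>> allocation to work.)
>>>
>>>   # on every Mesos agent
>>>   $SPARK_HOME/sbin/start-mesos-shuffle-service.sh
>>>
>>>   # when submitting the application
>>>   spark-submit \
>>>     --conf spark.dynamicAllocation.enabled=true \
>>>     --conf spark.shuffle.service.enabled=true \
>>>     ...
>>>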
>>> But now I want to move ahead and integrate with our own "compute layer".
>>> Our "compute layer":
>>>
>>>    - provides multi-tenancy with Chronos and Marathon on Mesos.
>>>    - manages/book-keeps all the resources on behalf of each individual
>>>    tenant, which has some quota assigned to it. It keeps track of
>>>    resources by tagging/un-tagging them in Mesos.
>>>
>>>
>>> The problem with out-of-the-box "dynamic allocation" is that our
>>> "compute layer" doesn't know the resource utilization of the application,
>>> and so can't tag/un-tag machines automatically. If we tag all the
>>> machines before running the application based on spark.cores.max, then we
>>> will not be able to make use of "dynamic allocation", because the tagged
>>> machines are reserved and can't be used by other applications.
>>>
>>> *So I want to articulate my initial query here and ask:*
>>> What is the best way to get feedback to my "compute layer" about the
>>> Spark application's resource requirements for auto-scaling? How about the
>>> following:
>>>
>>>    1. The application exposes some API, or emits events, to inform the
>>>    framework when it needs more resources.
>>>    2. Or our "compute layer" polls the Mesos API to learn the resources
>>>    consumed by the application and deduces whether auto-scaling is
>>>    required or not.
>>>
>>> Thanks,
>>> Ashish
>>>
>>> On Tue, Feb 28, 2017 at 2:48 AM, Michael Gummelt <[email protected]
>>> > wrote:
>>>
>>>> I assume you've looked into dynamic allocation.  What do you need that
>>>> isn't provided by dynamic allocation?
>>>>
>>>> On Mon, Feb 27, 2017 at 4:11 AM, David J. Palaitis <
>>>> [email protected]> wrote:
>>>>
>>>>> by using a combination of Spark's dynamic allocation,
>>>>> http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup,
>>>>> and a framework scheduler like Cook,
>>>>> https://github.com/twosigma/Cook/tree/master/spark, you can achieve
>>>>> the desired auto-scaling effect without the overhead of managing
>>>>> roles/constraints in mesos.  i'd be happy to discuss this in more
>>>>> detail if you decide to give it a try.
>>>>>
>>>>> On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We want to move to auto-scaling of the Spark driver, wherein more
>>>>>> resources are added to those available to the "spark driver" based on
>>>>>> requirement. The requirement can increase/decrease with the number of
>>>>>> SQL queries being run over a REST server, or the number of queries from
>>>>>> multiple users over the Thrift server on Spark (HiveServer2).
>>>>>>
>>>>>> *Existing approach with a static number of resources:*
>>>>>> We have a very large pool of resources, but my "spark driver" is
>>>>>> allocated a limited amount of "static" resources, and we achieve this
>>>>>> as follows (a sketch of the resulting submission follows the list):
>>>>>>
>>>>>>    1. While running the application, I tag machines in Mesos with the
>>>>>>    name of my application, so that offers are made accordingly.
>>>>>>    2. My application is run with a constraint for the above-tagged
>>>>>>    machines using the "spark.mesos.constraints" configuration, so that
>>>>>>    the application only accepts offers made by these tagged machines
>>>>>>    and doesn't eat up all the resources in my very large cluster.
>>>>>>    3. The application launches executors on these accepted offers, and
>>>>>>    they are used to do computation as defined by the Spark job, or as
>>>>>>    and when queries are fired over the HTTP/Thrift server.
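>>>>>>
>>>>>> A minimal sketch of such a submission (the attribute and core count
>>>>>> are hypothetical):
>>>>>>
>>>>>>   spark-submit \
>>>>>>     --master mesos://zk://master.example.com:2181/mesos \
>>>>>>     --conf spark.cores.max=64 \
>>>>>>     --conf spark.mesos.constraints="app:myApp" \
>>>>>>     ...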
>>>>>>
>>>>>> *Approach for auto-scaling:*
>>>>>> Auto-scaling of the driver helps us in many ways, and lets us use our
>>>>>> resources more efficiently.
>>>>>> To enable auto-scaling, wherein my Spark application will get more and
>>>>>> more resource offers if it has consumed all the available resources,
>>>>>> the workflow will be as follows (a daemon sketch follows the list):
>>>>>>
>>>>>>    1. Run a daemon to monitor my app on Mesos.
>>>>>>    2. Keep adding/removing machines for the application by
>>>>>>    tagging/un-tagging them, based on the resource-usage metrics for my
>>>>>>    application in Mesos.
>>>>>>    3. Scale up/down based on Step 2 by tagging and un-tagging, taking
>>>>>>    "some buffer" into account.
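>>>>>>
>>>>>> A minimal sketch of such a daemon in Python (the master address, app
>>>>>> name, thresholds, and the tag/untag hooks are all placeholders for
>>>>>> whatever our "compute layer" actually does):
>>>>>>
>>>>>>   import time
>>>>>>   import requests
>>>>>>
>>>>>>   MASTER = "http://mesos-master.example.com:5050"  # hypothetical
>>>>>>
>>>>>>   def used_cpus(framework_name):
>>>>>>       # /master/state reports each framework's currently used resources
>>>>>>       state = requests.get(MASTER + "/master/state").json()
>>>>>>       fw = next(f for f in state["frameworks"]
>>>>>>                 if f["name"] == framework_name)
>>>>>>       return fw["used_resources"]["cpus"]
>>>>>>
>>>>>>   def tag_machine():
>>>>>>       # compute-layer hook (placeholder): tag one machine, return CPUs
>>>>>>       return 8.0
>>>>>>
>>>>>>   def untag_machine():
>>>>>>       # compute-layer hook (placeholder): untag one machine
>>>>>>       return 8.0
>>>>>>
>>>>>>   tagged = 16.0  # CPUs on machines tagged so far
>>>>>>   while True:
>>>>>>       used = used_cpus("MySparkApp")  # hypothetical app name
>>>>>>       if used > 0.8 * tagged:         # "some buffer" = 20% headroom
>>>>>>           tagged += tag_machine()
>>>>>>       elif used < 0.4 * tagged:       # scale down when mostly idle
>>>>>>           tagged -= untag_machine()
>>>>>>       time.sleep(30)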
>>>>>>
>>>>>> I wanted to know your opinion on this "*Approach for auto-scaling*".
>>>>>> Is this the right approach to solve auto-scaling of the Spark driver?
>>>>>> Also, tagging/un-tagging machines is something we do to limit/manage
>>>>>> the resources in our big cluster.
>>>>>>
>>>>>> Thanks,
>>>>>> Ashish
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Michael Gummelt
>>>> Software Engineer
>>>> Mesosphere
>>>>
>>>
>>>
>>
>>
>> --
>> Michael Gummelt
>> Software Engineer
>> Mesosphere
>>
>
>


-- 
Michael Gummelt
Software Engineer
Mesosphere
