Hi Dmitry,

That can certainly work; it just needs to coordinate the events you mentioned
and make sure they happen in the right order. Currently the Spark scheduler is
very job agnostic and doesn't understand what kind of Spark job it is running.
That's the next type of optimization I'd like to put on the roadmap: a
scheduler that understands the job type it's running and can support certain
actions depending on what it is.

Do you have a specific use case you could prototype this with? We can certainly
make this happen on the Spark side.
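For reference, the restart flow described further down the thread (finish the
current batch, stop gracefully, stand up a new context that sees the new
resources) might be sketched like this. These are illustrative stand-in
classes, not the real Spark API:

```python
class FakeStreamingContext:
    """Stand-in for a streaming context; the real Spark API differs."""
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size  # resources observed at creation time
        self.running = True

    def stop(self, stop_gracefully=True):
        # A graceful stop would let the in-flight batch finish first.
        self.running = False


def run_with_restarts(observed_sizes, make_context):
    """Recreate the context whenever the observed cluster size changes."""
    ctx = make_context(observed_sizes[0])
    restarts = 0
    for size in observed_sizes[1:]:
        if size != ctx.cluster_size:
            ctx.stop(stop_gracefully=True)  # drain the current batch
            ctx = make_context(size)        # fresh context sees new resources
            restarts += 1
    return ctx, restarts
```

On Mesos a freshly registered driver receives fresh resource offers, which is
how Spark learns about new resources in the first place, so a restart along
these lines would pick up an enlarged cluster.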

Tim

On Thu, Jun 4, 2015 at 2:11 PM, Dmitry Goldenberg <dgoldenberg...@gmail.com>
wrote:

> Tim,
>
> Aware of more resources - is that if it runs on Mesos or via any type of
> cluster manager?  Our thinking was that once we can determine that the
> cluster has changed, we could notify the streaming consumers to finish
> processing the current batch, then terminate, then resume streaming with a
> new instance of the Context.  Would that not cause Spark to refresh its
> awareness of the cluster resources?
>
> - Dmitry
>
> On Thu, Jun 4, 2015 at 5:03 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>> Spark is aware there are more resources by getting more resource offers
>> and using those new offers.
>>
>> I don't think there is a way to refresh the Spark context for streaming.
>>
>> Tim
>>
>> On Thu, Jun 4, 2015 at 1:59 PM, Dmitry Goldenberg <
>> dgoldenberg...@gmail.com> wrote:
>>
>>> Thanks, Ankur. I'd be curious to understand how the data exchange
>>> happens in this case. How does Spark become aware of the fact that machines
>>> have been added to the cluster or have been removed from it?  And then, do
>>> you have some mechanism to perhaps restart the Spark consumers into
>>> refreshed Spark contexts that are aware of the new cluster topology?
>>>
>>> On Thu, Jun 4, 2015 at 4:23 PM, Ankur Chauhan <an...@malloc64.com>
>>> wrote:
>>>
>>>>
>>>> AFAIK Mesos does not support host level auto-scaling because that is
>>>> not the scope of the mesos-master or mesos-slave. In EC2 (like in my
>>>> case) we have autoscaling groups set with cloudwatch metrics hooked up
>>>> to scaling policies. In our case, we have the following.
>>>> * Add 1 host per AZ when cpu load is > 85% for 15 mins continuously.
>>>> * Remove 1 host if the cpu load is < 15% for 15 mins continuously.
>>>> * Similar monitoring + scale-up/scale-down based on memory.
>>>>
>>>> All of these rules have a cooldown period of 30mins so that we don't
>>>> end up scaling up/down too fast.
>>>>
>>>> Then again, our workload is bursty (spark on mesos in fine-grained
>>>> mode). So, the new resources get used up and tasks distribute pretty
>>>> fast. The above may not work in case you have long-running tasks (such
>>>> as marathon tasks) because they would not be redistributed till some
>>>> task restarting happens.
>>>>
>>>> - -- Ankur
>>>>
>>>> On 04/06/2015 13:13, Dmitry Goldenberg wrote:
>>>> > Would it be accurate to say that Mesos helps you optimize resource
> utilization out of a preset pool of resources, presumably servers?
>>>> > And its level of autoscaling is within that pool?
>>>> >
>>>> >
>>>> > On Jun 4, 2015, at 3:54 PM, Vinod Kone <vinodk...@gmail.com
>>>> > <mailto:vinodk...@gmail.com>> wrote:
>>>> >
>>>> >> Hey Dmitry. At the current time there is no built-in support for
>>>> >> Mesos to autoscale nodes in the cluster. I've heard people
>>>> >> (Netflix?) do it out of band on EC2.
>>>> >>
>>>> >> On Thu, Jun 4, 2015 at 9:08 AM, Dmitry Goldenberg
>>>> >> <dgoldenberg...@gmail.com <mailto:dgoldenberg...@gmail.com>>
>>>> >> wrote:
>>>> >>
>>>> >> A Mesos noob here. Could someone point me at the doc or summary
>>>> >> for the cluster autoscaling capabilities in Mesos?
>>>> >>
>>>> >> Is there a way to feed it events and have it detect the need to
>>>> >> bring in more machines or decommission machines?  Is there a way
>>>> >> to receive events back that notify you that machines have been
>>>> >> allocated or decommissioned?
>>>> >>
>>>> >> Would this work within a certain set of
>>>> >> "preallocated"/pre-provisioned/"stand-by" machines or will Mesos
>>>> >> go and grab machines from the cloud?
>>>> >>
>>>> >> What are the integration points of Apache Spark and Mesos?  What
>>>> >> are the true advantages of running Spark on Mesos?
>>>> >>
>>>> >> Can Mesos autoscale the cluster based on some signals/events
>>>> >> coming out of Spark runtime or Spark consumers, then cause the
>>>> >> consumers to run on the updated cluster, or signal to the
>>>> >> consumers to restart themselves into an updated cluster?
>>>> >>
>>>> >> Thanks.
>>>> >>
>>>> >>
>>>>
>>>
>>>
>>
>
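As a footnote to Ankur's message above, his threshold-plus-cooldown rules can
be sketched as a small policy loop. This is purely illustrative; in a real EC2
setup these rules live in CloudWatch alarms and autoscaling-group policies,
not in application code:

```python
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    """Threshold + sustained-duration + cooldown, per the rules above."""
    high: float = 85.0      # scale up when CPU > high ...
    low: float = 15.0       # scale down when CPU < low ...
    sustain_min: int = 15   # ... for this many minutes continuously
    cooldown_min: int = 30  # then wait this long before the next action

    def __post_init__(self):
        self.above = 0      # consecutive minutes above `high`
        self.below = 0      # consecutive minutes below `low`
        self.cooldown = 0   # minutes of cooldown remaining

    def tick(self, cpu_pct):
        """Feed one per-minute CPU sample; return +1, -1, or 0 hosts."""
        if self.cooldown > 0:
            self.cooldown -= 1
            return 0
        self.above = self.above + 1 if cpu_pct > self.high else 0
        self.below = self.below + 1 if cpu_pct < self.low else 0
        if self.above >= self.sustain_min:
            self.above = 0
            self.cooldown = self.cooldown_min
            return +1
        if self.below >= self.sustain_min:
            self.below = 0
            self.cooldown = self.cooldown_min
            return -1
        return 0
```

The cooldown counter is what keeps a bursty fine-grained Spark workload from
flapping the group up and down between batches.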
