These are both really good posts: you should try to get them into the
documentation.

As with anything implementing dynamic behaviour, there are some fun problems:
(a) detecting the delays in the workflow; there are some good ideas here.
(b) deciding where to address it; that means you need to monitor the …

From: Evo Eftimov
Date: 2015/05/28 13:22 (GMT+00:00)
To: Dmitry Goldenberg
Cc: Gerard Maas, spark users
Subject: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?
MEMORY AND DISK: when the memory gets exhausted, Spark Streaming will resort
to keeping new RDDs on disk, which will prevent it from crashing and hence
losing them. Then some memory will get freed and it will resort back to RAM,
and so on and so forth.

Sent from Samsung Mobile
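Evo's MEMORY AND DISK description amounts to a spill-to-disk fallback: keep data in RAM while a budget allows, spill new entries to disk when it is exhausted, and promote them back once memory frees up. As a toy, Spark-free illustration of that behaviour (the `SpillStore` class and its budget are invented for this sketch, not Spark APIs):

```python
import os
import pickle
import tempfile

class SpillStore:
    """Toy illustration of MEMORY_AND_DISK-style storage: keep entries in
    RAM while a byte budget allows, spill new ones to disk otherwise."""

    def __init__(self, memory_budget):
        self.memory_budget = memory_budget
        self.in_memory = {}   # key -> value held in RAM
        self.on_disk = {}     # key -> temp file path for spilled entries
        self.used = 0

    def _size(self, value):
        return len(pickle.dumps(value))

    def put(self, key, value):
        size = self._size(value)
        if self.used + size <= self.memory_budget:
            self.in_memory[key] = value
            self.used += size
        else:
            # Memory exhausted: keep the new entry on disk instead of losing it.
            fd, path = tempfile.mkstemp()
            with os.fdopen(fd, "wb") as f:
                pickle.dump(value, f)
            self.on_disk[key] = path

    def evict(self, key):
        """Free a RAM entry, then promote one spilled entry back into RAM."""
        value = self.in_memory.pop(key)
        self.used -= self._size(value)
        if self.on_disk:
            spilled_key, path = next(iter(self.on_disk.items()))
            with open(path, "rb") as f:
                self.in_memory[spilled_key] = pickle.load(f)
            self.used += self._size(self.in_memory[spilled_key])
            os.remove(path)
            del self.on_disk[spilled_key]
        return value

    def get(self, key):
        if key in self.in_memory:
            return self.in_memory[key]
        with open(self.on_disk[key], "rb") as f:
            return pickle.load(f)
```

In real Spark the equivalent choice is the storage level passed when creating the stream; the point of the toy is only the resort-to-disk-then-back-to-RAM cycle.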

From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Wednesday, June 3, 2015 4:46 PM
To: Evo Eftimov
Cc: Cody Koeninger; Andrew Or; Gerard Maas; spark users
Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

Evo,

One of the ideas is to shadow the current cluster. This …

From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Wednesday, June 3, 2015 4:14 PM
To: Cody Koeninger
Cc: Andrew Or; Evo Eftimov; Gerard Maas; spark users
Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

Would it be possible to implement Spark autoscaling somewhat along these lines?

1. If we sense that a new machine is needed, by watching the data lo…
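The "sense that a new machine is needed" step can be sketched as a simple control loop over a streaming health metric such as batch scheduling delay. Everything below is hypothetical glue for illustration; the thresholds, the `observe` feed, and the node count are stand-ins, not Spark or Kafka APIs:

```python
from dataclasses import dataclass, field

@dataclass
class Autoscaler:
    """Toy scale-up/scale-down decision loop driven by batch scheduling delay."""
    scale_up_ms: int = 5_000   # sustained delay above this -> add a node
    scale_down_ms: int = 500   # sustained delay below this -> remove a node
    window: int = 3            # consecutive readings required before acting
    nodes: int = 2
    _history: list = field(default_factory=list)

    def observe(self, scheduling_delay_ms: int) -> str:
        """Feed one batch's scheduling delay; return the action taken."""
        self._history.append(scheduling_delay_ms)
        recent = self._history[-self.window:]
        if len(recent) < self.window:
            return "wait"
        if all(d > self.scale_up_ms for d in recent):
            self.nodes += 1
            self._history.clear()   # re-measure after the topology changes
            return "scale_up"
        if all(d < self.scale_down_ms for d in recent) and self.nodes > 1:
            self.nodes -= 1
            self._history.clear()
            return "scale_down"
        return "hold"
```

Requiring a window of consecutive readings avoids flapping on a single slow batch, which matters given the thread's point that resizing interrupts processing.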
While there is free RAM, Spark Streaming (Spark) will NOT resort to disk. It
resorts to disk only from time to time (i.e. when there is no free RAM),
taking a performance hit from that, but only until there is free RAM again.

From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Thursday, May 28, 2015 2:34 PM
To: Evo Eftimov
Cc: Gerard Maas; spark users
Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

You can always spin new boxes i…

Thanks, Evo. Per the last part of your comment, it sounds like we will need
to implement a job manager which will be in control of starting the jobs,
monitoring the status of the Kafka topic(s), and shutting jobs down…
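Such a job manager could be a thin supervisor loop. The sketch below is purely illustrative; `topic_backlog`, `start_job`, and `stop_job` are hypothetical hooks one would back with real Kafka offset checks and spark-submit calls:

```python
def manage_jobs(topic_backlog, start_job, stop_job, running,
                high=10_000, low=100):
    """One supervision pass: start a job for any topic with a large backlog,
    stop jobs whose topics have drained.

    topic_backlog: dict mapping topic name -> unconsumed message count
    start_job, stop_job: callables taking a topic name (hypothetical hooks)
    running: set of topic names that currently have a running job
    """
    for topic, backlog in topic_backlog.items():
        if backlog > high and topic not in running:
            start_job(topic)
            running.add(topic)
        elif backlog < low and topic in running:
            stop_job(topic)
            running.remove(topic)
    return running
```

Calling this on a timer gives the start/monitor/shut-down cycle described above; the `high`/`low` hysteresis gap keeps jobs from being restarted the moment a backlog dips.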
… of partitions (hence tasks), more RAM per executor, etc. Obviously this
will cause some temporary delay (in fact an interruption) in your processing,
but if the business use case can tolerate that, then go for it.
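Restarting a job with more partitions and more RAM per executor, as suggested above, is typically just a configuration change before resubmission. A sketch in spark-defaults.conf style (the property names are from the standard Spark configuration; the values are placeholders to tune per workload):

```properties
# bump per-executor RAM and the default number of partitions/tasks
spark.executor.memory      8g
spark.default.parallelism  200
```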
Thank you, Gerard.

We're looking at the receiver-less setup with Kafka Spark Streaming, so I'm
not sure how to apply your comments to that case (not that we have to use
receiver-less, but it seems to offer some advantages over the receiver-based
approach).

As far as "the number of Kafka receivers is fixed f…
Hi,

tl;dr: At the moment (with a BIG disclaimer *) elastic scaling of Spark
Streaming processes is not supported.

Longer version: I assume that you are talking about Spark Streaming, as the
discussion is about handling Kafka streaming data. Then you have two things
to consider: the Streaming re…
bq. detect the presence of a new node and start utilizing it

My understanding is that Spark is concerned with managing executors. Whether
a request for an executor is fulfilled on an existing node or a new node is
up to the underlying cluster manager (e.g. YARN). Assuming the cluster is
single-tenan…
Thanks, Rajesh. I think that acquiring/relinquishing executors is important,
but I feel like there are at least two layers for resource allocation and
autoscaling. It seems that acquiring and relinquishing executors is a way
to optimize resource utilization within a pre-set Spark cluster of machines…
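The executor-level layer described here is what Spark's dynamic resource allocation covers. As a sketch, the standard properties look like this (property names are from the Spark job-scheduling documentation; the min/max values are placeholders):

```properties
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   2
spark.dynamicAllocation.maxExecutors   20
```

Note this only resizes the executor pool within whatever machines the cluster manager already has, which is exactly the distinction drawn above.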
Dell - Internal Use - Confidential

Did you check
https://drive.google.com/file/d/0B7tmGAdbfMI2OXl6azYySk5iTGM/edit
and
http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation ?

Not sure if the Spark Kafka receiver emits metrics on the lag; check this
link out:
http://…
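Even when the receiver does not emit lag metrics, consumer lag can be derived by hand: it is the log-end (latest) offset minus the committed offset, summed over partitions. A library-free sketch (the offset dicts are placeholders for values you would fetch from Kafka's offset APIs):

```python
def total_consumer_lag(log_end_offsets, committed_offsets):
    """Sum of (latest offset - committed offset) across partitions.

    Both arguments map partition id -> offset; a partition with no
    committed offset yet counts from offset 0.
    """
    lag = 0
    for partition, end in log_end_offsets.items():
        consumed = committed_offsets.get(partition, 0)
        lag += max(0, end - consumed)
    return lag
```

A number like this, polled periodically, is exactly the kind of signal the autoscaling ideas earlier in the thread would watch.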