Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-12 Thread Steve Loughran
These are both really good posts: you should try to get them into the documentation. With anything implementing dynamicness, there are some fun problems: (a) detecting the delays in the workflow. There are some good ideas here. (b) deciding where to address it. That means you need to monitor the e

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Dmitry Goldenberg
From: Evo Eftimov Date:2015/05/28 13:22 (GMT+00:00) To: Dmitry Goldenberg Cc: Gerard Maas ,spark users

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Tathagata Das
Y AND DISK – when the memory gets exhausted spark streaming will resort to keeping new RDDs on disk which will prevent it from crashing and hence losing them. Then some memory will get freed and it will resort back t

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Cody Koeninger
g and hence losing them. Then some memory will get freed and it will resort back to RAM and so on and so forth Sent from Samsung Mobile

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-11 Thread Dmitry Goldenberg
o:* Evo Eftimov *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? Evo, One of the ideas is to shadow the current clus

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-09 Thread Dmitry Goldenberg
back to RAM and so on and so forth Sent from Samsung Mobile Original message From: Evo Eftimov Date:2015/05/28 13:22 (GMT+00

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-04 Thread Cody Koeninger
Sent from Samsung Mobile Original message From: Evo Eftimov Date:2015/05/28 13:22 (GMT+00:00) To: Dmitry Goldenberg Cc: Gerard Maas ,spark users Subject: R

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-04 Thread Dmitry Goldenberg
from Samsung Mobile Original message From: Evo Eftimov Date:2015/05/28 13:22 (GMT+00:00) To: Dmitry Goldenberg Cc: Gerard Maas ,spark users Subject: Re: Autoscaling Spark cluster based on topic sizes/rate of growth

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg
June 3, 2015 4:46 PM *To:* Evo Eftimov *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? Evo, One of the ideas i

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg
vo Eftimov *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? Evo, One of the ideas is to shadow the current cluster. This

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg
ldenberg [mailto:dgoldenberg...@gmail.com] *Sent:* Wednesday, June 3, 2015 4:46 PM *To:* Evo Eftimov *Cc:* Cody Koeninger; Andrew Or; Gerard Maas; spark users *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

RE: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Evo Eftimov
more From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] Sent: Wednesday, June 3, 2015 4:46 PM To: Evo Eftimov Cc: Cody Koeninger; Andrew Or; Gerard Maas; spark users Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg
er cluster *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] *Sent:* Wednesday, June 3, 2015 4:14 PM *To:* Cody Koeninger *Cc:* Andrew Or; Evo Eftimov; Gerard Maas; spark users *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate

RE: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Evo Eftimov
Maas; spark users Subject: Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? Would it be possible to implement Spark autoscaling somewhat along these lines? -- 1. If we sense that a new machine is needed, by watching the data lo

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-06-03 Thread Dmitry Goldenberg
Until there is free RAM, spark streaming (spark) will NOT resort to disk – and of course resorting to disk from time to time (ie when there is no free RAM ) and taking a performance hit from that, BUT only until there is no free RAM

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg
isk from time to time (ie when there is no free RAM ) and taking a performance hit from that, BUT only until there is no free RAM *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] *Sent:* Thursday, May

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Cody Koeninger
time (ie when there is no free RAM ) and taking a performance hit from that, BUT only until there is no free RAM *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] *Sent:* Thursday, May 28, 2015 2:34 PM *To:*

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg
free RAM *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] *Sent:* Thursday, May 28, 2015 2:34 PM *To:* Evo Eftimov *Cc:* Gerard Maas; spark users *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Andrew Or
y until there is no free RAM *From:* Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] *Sent:* Thursday, May 28, 2015 2:34 PM *To:* Evo Eftimov *Cc:* Gerard Maas; spark users *Subject:* Re: FW: Re: Autoscaling Spark cluster based on topic sizes/r

RE: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Evo Eftimov
and so forth Sent from Samsung Mobile Original message From: Evo Eftimov Date:2015/05/28 13:22 (GMT+00:00) To: Dmitry Goldenberg Cc: Gerard Maas ,spark users Subject: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metri

Re: FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg
Sent from Samsung Mobile Original message From: Evo Eftimov Date:2015/05/28 13:22 (GMT+00:00) To: Dmitry Goldenberg Cc: Gerard Maas ,spark users Subject: Re: Autoscaling Spark cluster based on top

FW: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Evo Eftimov
Original message From: Evo Eftimov Date:2015/05/28 13:22 (GMT+00:00) To: Dmitry Goldenberg Cc: Gerard Maas ,spark users Subject: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? You can always spin new boxes i

Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Evo Eftimov
Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? Thanks, Evo.  Per the last part of your comment, it sounds like we will need to implement a job manager which will be in control of starting the jobs, monitoring the status of the Kafka topic(s), shutting jobs dow
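The decision step of such a job manager might be sketched as follows. This is only an illustration, not anything proposed in the thread: the function name `decide`, the thresholds `high` and `low`, and the idea of feeding it a short history of total topic backlog are all assumptions. How the backlog is measured, and how jobs are actually stopped and restarted, is left out.

```python
# Hypothetical decision step for a job manager that watches Kafka backlog.
# Thresholds and names are illustrative assumptions, not Spark or Kafka APIs.

def decide(lag_samples, high=100_000, low=1_000):
    """Return 'scale_up', 'scale_down', or 'hold' from recent total-lag samples.

    lag_samples is a list of total backlog sizes (messages), oldest first.
    """
    if not lag_samples:
        return "hold"
    # Backlog is "growing" only if every sample exceeds the one before it.
    growing = all(b > a for a, b in zip(lag_samples, lag_samples[1:]))
    # Require more than one sample so a single spike does not trigger a restart.
    if len(lag_samples) > 1 and growing and lag_samples[-1] > high:
        return "scale_up"
    # Scale down only when the backlog stayed small for the whole window.
    if all(sample < low for sample in lag_samples):
        return "scale_down"
    return "hold"


if __name__ == "__main__":
    print(decide([50_000, 120_000, 200_000]))
```

Using two separate thresholds leaves a wide "hold" band between them, which keeps the manager from restarting jobs repeatedly when the backlog hovers around a single cut-off.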

Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg
of Partitions (hence tasks), more RAM per executor etc. Obviously this will cause some temporary delay in fact interruption in your processing but if the business use case can tolerate that then go for it *From:* Gerard Maas [mailto:gerard.m...@gmail.com] *Se

RE: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Evo Eftimov
Subject: Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics? Hi, tl;dr At the moment (with a BIG disclaimer *) elastic scaling of spark streaming processes is not supported. Longer version. I assume that you are talking about

Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Dmitry Goldenberg
Thank you, Gerard. We're looking at the receiver-less setup with Kafka Spark streaming so I'm not sure how to apply your comments to that case (not that we have to use receiver-less but it seems to offer some advantages over the receiver-based). As far as "the number of Kafka receivers is fixed f

Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-28 Thread Gerard Maas
Hi, tl;dr At the moment (with a BIG disclaimer *) elastic scaling of spark streaming processes is not supported. *Longer version.* I assume that you are talking about Spark Streaming as the discussion is about handling Kafka streaming data. Then you have two things to consider: the Streaming re

Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-27 Thread Ted Yu
bq. detect the presence of a new node and start utilizing it My understanding is that Spark is concerned with managing executors. Whether request for an executor is fulfilled on an existing node or a new node is up to the underlying cluster manager (YARN e.g.). Assuming the cluster is single tenan
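For reference, executor-level scaling of this kind is switched on through Spark properties rather than code. The sketch below only assembles a `spark-submit` command line; the property names are the ones listed in Spark's dynamic resource allocation documentation, while the executor bounds, the `--master yarn` choice, and the application file name `streaming_job.py` are assumptions made for illustration. Whether this helps a streaming job is a separate question that the thread itself raises.

```python
# Sketch: build spark-submit arguments that turn on dynamic executor allocation.
# Executor bounds, master, and app file name are illustrative assumptions.

def dynamic_allocation_conf(min_executors, max_executors):
    """Return Spark properties that enable dynamic executor allocation."""
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # The external shuffle service keeps shuffle files available
        # after an executor has been released.
        "spark.shuffle.service.enabled": "true",
    }


def to_submit_args(conf, app="streaming_job.py"):
    """Flatten the properties into a spark-submit argument list."""
    args = ["spark-submit", "--master", "yarn"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_allocation_conf(2, 20))))
```

Note that these settings only bound how many executors Spark asks for; where those executors land, and whether new machines are added, remains up to the cluster manager.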

Re: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-27 Thread Dmitry Goldenberg
Thanks, Rajesh. I think that acquiring/relinquishing executors is important but I feel like there are at least two layers for resource allocation and autoscaling. It seems that acquiring and relinquishing executors is a way to optimize resource utilization within a pre-set Spark cluster of machine

RE: Autoscaling Spark cluster based on topic sizes/rate of growth in Kafka or Spark's metrics?

2015-05-27 Thread Rajesh_Kalluri
Did you check https://drive.google.com/file/d/0B7tmGAdbfMI2OXl6azYySk5iTGM/edit and http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation Not sure if the spark kafka receiver emits metrics on the lag, check this link out http://
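If no lag metric is emitted, it can be derived from offsets. The following is a minimal sketch under stated assumptions: the latest and consumed offsets per partition are taken as plain dictionaries that some other component has already fetched, and the function names `partition_lag` and `growth_rate` are hypothetical helpers, not part of Spark or Kafka.

```python
# Sketch: per-partition consumer lag and backlog growth rate from offset samples.
# Offsets are assumed to be fetched elsewhere; these helpers are hypothetical.

def partition_lag(latest_offsets, consumed_offsets):
    """Lag per partition: latest offset in the log minus last consumed offset."""
    return {
        partition: max(latest - consumed_offsets.get(partition, 0), 0)
        for partition, latest in latest_offsets.items()
    }


def growth_rate(lag_before, lag_after, seconds):
    """Messages per second by which total lag grew; negative means catching up."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (sum(lag_after.values()) - sum(lag_before.values())) / seconds


if __name__ == "__main__":
    earlier = partition_lag({0: 100, 1: 50}, {0: 90, 1: 50})
    later = partition_lag({0: 160, 1: 110}, {0: 120, 1: 80})
    print(earlier, later, growth_rate(earlier, later, 60))
```

A sustained positive growth rate means the job is consuming more slowly than producers write, which is the condition the thread's subject line is concerned with.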