Re: Breaking lineage and reducing stages in Spark Streaming

Anand Nalya Thu, 09 Jul 2015 03:19:04 -0700

That's from the Streaming tab of the Spark 1.4 web UI.

On 9 July 2015 at 15:35, Michel Hubert <mich...@vsnsystemen.nl> wrote:


>  Hi,
>
>
>
> I was just wondering how you generated the second image with the charts.
>
> What product?
>
>
>
> *From:* Anand Nalya [mailto:anand.na...@gmail.com]
> *Sent:* Thursday, 9 July 2015 11:48
> *To:* spark users
> *Subject:* Breaking lineage and reducing stages in Spark Streaming
>
>
>
> Hi,
>
>
>
> I have an application in which an RDD is updated with tuples coming
> from the RDDs in a DStream, using the following pattern:
>
>
>
> dstream.foreachRDD(rdd => {
>   myRDD = myRDD.union(rdd.filter(myfilter)).reduceByKey(_ + _)
> })
>
>
>
> I'm using cache() and checkpointing to cache results. Over time, the
> lineage of myRDD keeps growing and the number of stages in each batch of
> the DStream keeps increasing, even though all the earlier stages are
> skipped. When the number of stages grows large enough, the total delay
> starts increasing because of scheduling delay. The processing time for
> each batch stays constant.
>
>
>
> The following figures illustrate the problem:
>
>
>
> Job execution: https://i.imgur.com/GVHeXH3.png?1
>
> Delays: https://i.imgur.com/1DZHydw.png?1
>
> Is there some pattern that I can use to avoid this?
>
>
>
> Regards,
>
> Anand
>
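
For readers hitting the same pattern (an RDD reassigned inside foreachRDD so that its lineage grows with every batch), one alternative that may be worth trying is to keep the running per-key totals as DStream state with updateStateByKey rather than in a driver-side variable. Spark Streaming checkpoints state DStreams periodically, which bounds the lineage. The following is only a minimal sketch under stated assumptions, not something proposed in this thread: the key/value types (String, Long), the socket source on localhost:9999, the checkpoint directory, the 5-second batch interval, and the names RunningTotals, updateTotal and runningTotals are all hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object RunningTotals {
  // Fold the values that arrived for one key in this batch into the
  // running total kept as state (plays the role of reduceByKey(_ + _)).
  def updateTotal(newValues: Seq[Long], running: Option[Long]): Option[Long] =
    Some(running.getOrElse(0L) + newValues.sum)

  // `keep` stands in for `myfilter` from the original snippet (assumed shape).
  def runningTotals(pairs: DStream[(String, Long)],
                    keep: ((String, Long)) => Boolean): DStream[(String, Long)] =
    pairs.filter(keep).updateStateByKey(updateTotal _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("running-totals").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // updateStateByKey requires a checkpoint directory; the path is an assumption.
    ssc.checkpoint("/tmp/running-totals-checkpoint")

    // Hypothetical input: one key per line, each counted once.
    val pairs = ssc.socketTextStream("localhost", 9999).map(key => (key, 1L))

    runningTotals(pairs, _ => true).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Whether this removes the scheduling delay in a given job would need to be measured. If the state has to stay in a plain RDD instead, calling checkpoint() on it every few batches and then running an action on it is another way to truncate the lineage.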
