Thats from the Streaming tab for Spark 1.4 WebUI. On 9 July 2015 at 15:35, Michel Hubert <mich...@vsnsystemen.nl> wrote:
> Hi, > > > > I was just wondering how you generated to second image with the charts. > > What product? > > > > *From:* Anand Nalya [mailto:anand.na...@gmail.com] > *Sent:* donderdag 9 juli 2015 11:48 > *To:* spark users > *Subject:* Breaking lineage and reducing stages in Spark Streaming > > > > Hi, > > > > I've an application in which an rdd is being updated with tuples coming > from RDDs in a DStream with following pattern. > > > > dstream.foreachRDD(rdd => { > > myRDD = myRDD.union(rdd.filter(myfilter)).reduceByKey(_+_) > > }) > > > > I'm using cache() and checkpointin to cache results. Over the time, the > lineage of myRDD keeps increasing and stages in each batch of dstream keeps > increasing, even though all the earlier stages are skipped. When the number > of stages grow big enough, the overall delay due to scheduling delay starts > increasing. The processing time for each batch is still fixed. > > > > Following figures illustrate the problem: > > > > Job execution: https://i.imgur.com/GVHeXH3.png?1 > > [image: Image removed by sender.] > > Delays: https://i.imgur.com/1DZHydw.png?1 > > [image: Image removed by sender.] > > Is there some pattern that I can use to avoid this? > > > > Regards, > > Anand >