Re: why one of Stage is into Skipped section instead of Completed
Thank you, Silvio, for the update.

On Sat, Dec 26, 2015 at 1:14 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
> Skipped stages result from existing shuffle output of a stage when
> re-running a transformation. [...]
Re: why one of Stage is into Skipped section instead of Completed
Skipped stages result from existing shuffle output. When a job depends on a shuffle that an earlier job in the same SparkContext already computed, the executors still have the output of that stage in their local dirs. Spark recognizes that, so rather than re-computing the stage, it starts from the following one. So this is a good thing: you're not re-computing a stage. In your case, it looks like the shuffle output feeding the userreqs RDD (reduceByKey) already exists, so Spark doesn't re-compute it.

From: Prem Spark <sparksure...@gmail.com>
Date: Friday, December 25, 2015 at 11:41 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: why one of Stage is into Skipped section instead of Completed
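A minimal sketch of the behaviour described above, assuming Spark is on the classpath and running in local mode. Small in-memory RDDs stand in for the /loudacre data; the sample values and the SkippedStageDemo and StageCounter names are hypothetical. It computes userreqs first, then accounthits, and uses a SparkListener to record, per job, how many stages the job's DAG contains versus how many actually completed. The difference is what the web UI lists as skipped.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart, SparkListenerStageCompleted}

object SkippedStageDemo {
  // Records, per job, (stages in the job's DAG, stages that actually completed).
  // Assumes jobs run one at a time, which holds for the sequential actions below.
  class StageCounter(expectedJobs: Int) extends SparkListener {
    val perJob = ArrayBuffer.empty[(Int, Int)]
    val allJobsDone = new CountDownLatch(expectedJobs)
    private var stagesInJob = 0
    private var stagesRun = 0

    override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
      stagesInJob = jobStart.stageInfos.size // includes stages that will be skipped
      stagesRun = 0
    }

    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit =
      stagesRun += 1 // skipped stages are never submitted, so they never complete

    override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
      perJob += ((stagesInJob, stagesRun))
      allJobsDone.countDown()
    }
  }

  def run(): List[(Int, Int)] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("skipped-stage-demo"))
    try {
      val counter = new StageCounter(expectedJobs = 2)
      sc.addSparkListener(counter)

      // Hypothetical in-memory stand-ins for the accounts and weblog files.
      val accountsByID = sc.parallelize(Seq("u1" -> "Ann", "u2" -> "Bob", "u3" -> "Cy"), 15)
      val userreqs = sc.parallelize(Seq("u1", "u2", "u1", "u3", "u1"), 32)
        .map(id => (id, 1))
        .reduceByKey((v1, v2) => v1 + v2)
      val accounthits = accountsByID.join(userreqs).map(pair => pair._2)

      userreqs.count()    // job 0: runs the map stage that writes the reduceByKey shuffle output
      accounthits.count() // job 1: that shuffle output already exists, so its map stage is skipped

      // Listener events are delivered asynchronously; wait for both JobEnd events.
      require(counter.allJobsDone.await(60, TimeUnit.SECONDS), "timed out waiting for jobs")
      counter.perJob.toList
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().zipWithIndex.foreach { case ((total, ran), job) =>
      println(s"job $job: $total stages, $ran completed, ${total - ran} skipped")
    }
}
```

If the explanation above holds, the second job should contain three stages of which two complete, the same "Completed Stages: 2, Skipped Stages: 1" split reported in the question. Note that userreqs must be the same RDD object in both jobs; re-running the `var userreqs = sc.textFile(...)` line would build a new shuffle, and nothing would be skipped.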
why one of Stage is into Skipped section instead of Completed
What does the Skipped stage below mean? Can anyone help clarify? I was expecting 3 stages to succeed, but only 2 of them completed while one was skipped.

Status: SUCCEEDED
Completed Stages: 2
Skipped Stages: 1

Scala REPL code used (accounts is a basic RDD containing the comma-separated account text data read from /loudacre/accounts/*):

var accountsByID = accounts.
  map(line => line.split(',')).
  map(values => (values(0),values(4)+','+values(3)));

var userreqs = sc.
  textFile("/loudacre/weblogs/*6").
  map(line => line.split(' ')).
  map(words => (words(2),1)).
  reduceByKey((v1,v2) => v1 + v2);

var accounthits =
  accountsByID.join(userreqs).map(pair => pair._2)

accounthits.
  saveAsTextFile("/loudacre/userreqs")

scala> accounthits.toDebugString
res15: String =
(32) MapPartitionsRDD[24] at map at :28 []
 |   MapPartitionsRDD[23] at join at :28 []
 |   MapPartitionsRDD[22] at join at :28 []
 |   CoGroupedRDD[21] at join at :28 []
 +-(15) MapPartitionsRDD[15] at map at :25 []
 |  |   MapPartitionsRDD[14] at map at :24 []
 |  |   /loudacre/accounts/* MapPartitionsRDD[13] at textFile at :21 []
 |  |   /loudacre/accounts/* HadoopRDD[12] at textFile at :21 []
 |   ShuffledRDD[20] at reduceByKey at :25 []
 +-(32) MapPartitionsRDD[19] at map at :24 []
    |   MapPartitionsRDD[18] at map at :23 []
    |   /loudacre/weblogs/*6 MapPartitionsRDD[17] at textFile at :22 []
    |   /loudacre/weblogs/*6 HadoopRDD[16] at textFile at
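Why three stages were expected in the first place: in toDebugString output, each "+-" marks a shuffle boundary. Because userreqs comes out of reduceByKey already hash-partitioned, join reuses that partitioner, so only accountsByID is shuffled into the join, while userreqs feeds it through a narrow dependency. The sketch below checks this, assuming Spark on the classpath in local mode; the in-memory data and the JoinDependencies name are hypothetical stand-ins. It finds the CoGroupedRDD behind the join and labels each of its input dependencies.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.{CoGroupedRDD, RDD}

object JoinDependencies {
  // Depth-first search through the lineage for the CoGroupedRDD created by join.
  def findCoGroup(rdd: RDD[_]): Option[CoGroupedRDD[_]] = rdd match {
    case cg: CoGroupedRDD[_] => Some(cg)
    case other =>
      other.dependencies.iterator.map(dep => findCoGroup(dep.rdd)).collectFirst { case Some(cg) => cg }
  }

  def run(): List[String] = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("join-dependencies"))
    try {
      // Hypothetical stand-ins with the same partition counts as in the question.
      val accountsByID = sc.parallelize(Seq("u1" -> "Ann", "u2" -> "Bob", "u3" -> "Cy"), 15)
      val userreqs = sc.parallelize(Seq("u1", "u2", "u1", "u3", "u1"), 32)
        .map(id => (id, 1))
        .reduceByKey((v1, v2) => v1 + v2) // output is hash-partitioned into 32 partitions
      val accounthits = accountsByID.join(userreqs).map(pair => pair._2)

      // One label per join input, in order: accountsByID, then userreqs.
      findCoGroup(accounthits).get.dependencies.map {
        case _: ShuffleDependency[_, _, _] => "shuffle"
        case _                             => "narrow"
      }.toList
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

This should print List(shuffle, narrow). Together with the reduceByKey shuffle upstream of userreqs, that gives two shuffle-map stages plus the final result stage, which matches the three stages expected in the question.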