I've written a very simple Sort program in Scala with Spark:

    import org.apache.spark.{SparkConf, SparkContext}

    object Sort {
      def main(args: Array[String]): Unit = {
        if (args.length < 2) {
          System.err.println("Usage: Sort <data_file> <save_file> [<slices>]")
          System.exit(1)
        }
        val conf = new SparkConf().setAppName("BigDataBench Sort")
        val spark = new SparkContext(conf)
        // JobPropertiesLogger is my own helper class for timing and logging
        val logger = new JobPropertiesLogger(spark, "/home/abrandon/log.csv")

        val filename = args(0)
        val save_file = args(1)
        var splits = spark.defaultMinPartitions
        if (args.length > 2) {
          splits = args(2).toInt
        }

        val lines = spark.textFile(filename, splits)
        logger.start_timer()
        val data_map = lines.map(line => (line, 1))             // pair each line with a dummy count
        val result = data_map.sortByKey().map(line => line._1)  // sort by line, then drop the count
        logger.stop_timer()
        logger.write_log("Sort By Key: Sort App")
        result.saveAsTextFile(save_file)
        println("Result has been saved to: " + save_file)
      }
    }

Since there is only one wide transformation (sortByKey), I was expecting the application to span two stages. However, I see two jobs: Job 0 with one stage, and Job 1 with two stages. Am I missing something? What I don't get is the first stage of the second job: it seems to do the same work as the single stage of Job 0.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/cbKDZ.png>
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/GXIkS.png>
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/H9LXF.png>
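In case anyone wants to try this quickly, here is a stripped-down sketch of the same pipeline without my JobPropertiesLogger helper; the input and output paths and the partition count are placeholders. Pasting it into spark-shell (which already provides sc) should show the same DAG in the web UI:

    // Same pipeline as above, minus the custom logger.
    val lines = sc.textFile("/path/to/data", 4)  // placeholder input path and partition count
    val result = lines
      .map(line => (line, 1))                    // narrow transformation: map
      .sortByKey()                               // wide transformation: shuffles by key
      .map(_._1)                                 // narrow transformation: drop the dummy count
    result.saveAsTextFile("/path/to/output")     // action that triggers execution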