Hi all. I'm in an unusual situation. The code,

...
1: val cell = dataSet.flatMap(parse(_)).cache
2: val distinctCell = cell.keyBy(_._1).reduceByKey(removeDuplication(_, _)).mapValues(_._3).cache
3: val groupedCellByLine = distinctCell.map(cellToIterableColumn).groupByKey.cache
4: val result = (1 to groupedCellByLine.map(_._2.size).max).toArray
...
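For reference, the enumeration in line 4 can be reproduced on plain Scala collections (toy data; the names and values here are only illustrative, the real job runs on RDDs): it finds the largest group and builds the array 1..max.

```scala
// Toy stand-in for groupedCellByLine: key -> group of values.
// In the real job this is an RDD produced by groupByKey.
object Line4Sketch {
  def main(args: Array[String]): Unit = {
    val groupedCellByLine: Map[Int, Iterable[String]] = Map(
      1 -> Iterable("a", "b", "c"), // group of size 3
      2 -> Iterable("d")            // group of size 1
    )
    // Line 4's logic: enumerate 1 up to the largest group size.
    val result = (1 to groupedCellByLine.map(_._2.size).max).toArray
    println(result.mkString(","))
  }
}
```

Note that on an RDD this expression triggers a full job (the max is an action), whereas everything up to line 3 is lazy.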
gets stuck when line 4 is executed. But if I add 'cell.collect' or 'cell.count' between line 3 and line 4, it works fine. I don't know why this happens. Does anyone have experience with something like this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Stucked-job-work-well-after-rdd-count-or-rdd-collect-tp15776.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.