Hi all. I'm in an unusual situation. The code,

...
1: val cell = dataSet.flatMap(parse(_)).cache
2: val distinctCell = cell.keyBy(_._1).reduceByKey(removeDuplication(_, _)).mapValues(_._3).cache
3: val groupedCellByLine = distinctCell.map(cellToIterableColumn).groupByKey.cache
4: val result = (1 to groupedCellByLine.map(_._2.size).max).toArray
...
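For reference, the enumeration in line 4 can be reproduced on plain Scala collections (toy data; the names and values here are only illustrative, the real job runs on RDDs): it finds the largest group and builds the array 1..max.

```scala
// Toy stand-in for groupedCellByLine: key -> group of values.
// In the real job this is an RDD produced by groupByKey.
object Line4Sketch {
  def main(args: Array[String]): Unit = {
    val groupedCellByLine: Map[Int, Iterable[String]] = Map(
      1 -> Iterable("a", "b", "c"), // group of size 3
      2 -> Iterable("d")            // group of size 1
    )
    // Line 4's logic: enumerate 1 up to the largest group size.
    val result = (1 to groupedCellByLine.map(_._2.size).max).toArray
    println(result.mkString(","))
  }
}
```

Note that on an RDD this expression triggers a full job (the max is an action), whereas everything up to line 3 is lazy.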
gets stuck when line 4 is executed. But if I add 'cell.collect' or 'cell.count' between line 3 and line 4, it works fine. I don't know why this happens. Does anyone have experience with something like this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Stucked-job-work-well-after-rdd-count-or-rdd-collect-tp15776.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.