It may be confusing at first, but there is also an important difference
between the reduce and reduceByKey operations.

reduce is an action on an RDD. Hence, it triggers evaluation of the
transformations that produced the RDD.
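
To make this concrete, here is a minimal sketch, assuming Spark 2.x or later
on the classpath and a local master. The object name, the accumulator name and
the input data are made up for illustration. It counts how many times the map
function has run before and after calling reduce.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceIsAnAction {
  // Returns (map invocations observed before the action, result of reduce).
  def run(): (Long, Int) = {
    val conf = new SparkConf().setAppName("reduce-action-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical counter, used only to observe when map actually runs.
      val evaluated = sc.longAccumulator("evaluated")

      // map is a transformation: this line only records the lineage.
      val doubled = sc.parallelize(1 to 4).map { x => evaluated.add(1); x * 2 }

      // No job has been submitted yet, so the counter is still 0.
      val before: Long = evaluated.value

      // reduce is an action: it submits a job and the map function runs now.
      val sum = doubled.reduce(_ + _) // 2 + 4 + 6 + 8 = 20

      (before, sum)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The counter is read only before the action here, because accumulator updates
made inside a transformation can be applied more than once if a task is
re-executed.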

In contrast, reduceByKey is a transformation on PairRDDs, not an action, so
it is lazy. Therefore, distinct is implemented as a chain of transformations,
as below:

map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
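
A small sketch of the same chain applied by hand, assuming Spark 2.x or later
and a local master. The object name, the sample data and the value chosen for
numPartitions are assumptions for illustration. The chain itself only builds
a new RDD. Nothing is computed until an action such as collect is called.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistinctAsTransformations {
  // Returns the sorted results of the hand-written chain and of rdd.distinct.
  def run(): (Seq[Int], Seq[Int]) = {
    val conf = new SparkConf().setAppName("distinct-chain-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val numbers = sc.parallelize(Seq(3, 1, 3, 2, 1))
      val numPartitions = 2 // arbitrary value for this sketch

      // The chain quoted above: pair each element with null, keep one
      // value per key, then drop the placeholder. All three steps are lazy.
      val viaChain = numbers
        .map(x => (x, null))
        .reduceByKey((x, y) => x, numPartitions)
        .map(_._1)

      // collect is an action, so the chain is evaluated only here.
      val fromChain = viaChain.collect().sorted.toSeq
      val fromDistinct = numbers.distinct(numPartitions).collect().sorted.toSeq

      (fromChain, fromDistinct)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Both sequences contain 1, 2 and 3. The results are sorted before comparison
because the order of elements after a shuffle is not guaranteed.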



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Are-all-transformations-lazy-tp2582p10675.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
