For example, is the distinct() transformation lazy? When I look at the Spark source code, distinct applies a map -> reduceByKey -> map pipeline to the RDD. Why is this lazy? Won't these functions be applied immediately to the elements of the RDD when I call someRDD.distinct()?
/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(numPartitions: Int): RDD[T] =
  map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)

/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(): RDD[T] = distinct(partitions.size)
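The key point is that map and reduceByKey are themselves lazy: each call only wraps the parent RDD in a new RDD object that records the function to apply, so distinct merely builds up a chain of descriptions. Nothing touches the data until an action such as collect() forces evaluation. Here is a minimal toy sketch (ToyRDD and PairOps are made-up names, not Spark's API) that mirrors the same map -> reduceByKey -> map shape and uses a counter to prove no work happens before the action:

```scala
// Toy model of lazy RDD chaining -- NOT Spark's actual classes.
// Each "transformation" just wraps a thunk; evaluation only happens
// when the "action" collect() is called.
var applications = 0 // counts how many transformation thunks have actually run

final class ToyRDD[T](thunk: () => Seq[T]) {
  // map is lazy: it returns a new ToyRDD wrapping the parent's thunk.
  def map[U](f: T => U): ToyRDD[U] =
    new ToyRDD(() => { applications += 1; thunk().map(f) })

  // The "action": the only place the recorded pipeline is executed.
  def collect(): Seq[T] = thunk()
}

// Like Spark's PairRDDFunctions: key-value operations are added to
// pair RDDs via an implicit conversion.
implicit class PairOps[K, V](rdd: ToyRDD[(K, V)]) {
  def reduceByKey(merge: (V, V) => V): ToyRDD[(K, V)] =
    new ToyRDD(() => {
      applications += 1
      rdd.collect().groupBy(_._1).toSeq.map { case (k, pairs) =>
        (k, pairs.map(_._2).reduce(merge))
      }
    })
}

val base = new ToyRDD(() => Seq(3, 1, 3, 2, 1))

// Same shape as Spark's distinct: map -> reduceByKey -> map.
val distinctLazy = base.map(x => (x, null)).reduceByKey((x, y) => x).map(_._1)

assert(applications == 0)                  // nothing has run yet: still lazy
val result = distinctLazy.collect().sorted // the action forces the whole chain
println(result)                            // List(1, 2, 3)
```

Spark's real RDDs work the same way at a high level: calling distinct returns a new RDD describing the three-step pipeline, and the cluster only computes it when an action is invoked.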