Thanks! It is very helpful!
2013/11/4 Mark Hamstra <[email protected]> > scala> val rdd = sc.parallelize(1 to 1000000, 4) > rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at > parallelize at <console>:12 > > scala> rdd.mapPartitions { itr => itr.takeWhile(_ < 10) }.count > . > . > . > res1: Long = 9 > > In the first partition, 9 elements are iterated through before the > evaluation of the closure over that partition is completed; in the other > three partitions, only one element is examined. > > > > On Sun, Nov 3, 2013 at 11:32 PM, Xiang Huo <[email protected]> wrote: > >> Hi Mark, >> >> Could you tell me more detail information about how to short-circuit the >> filtering ? >> >> Thanks. >> >> Xiang >> >> >> 2013/11/4 Mark Hamstra <[email protected]> >> >>> You could short-circuit the filtering within the interator function >>> supplied to mapPartitions. >>> >>> >>> On Sunday, November 3, 2013, Xiang Huo wrote: >>> >>>> Hi all, >>>> >>>> I am trying to filter a smaller RDD data set from a large RDD data set. >>>> And the large one is sorted. So my question is that is there any way to >>>> make the filter method does't check every element in RDD but filter out all >>>> the other elements when one element doesn't meet the condition of filter. >>>> Because the large data set is sorted, when there is one element doesn't >>>> meet the requirement, all the following elements are impossible to meet. >>>> But checking them one by one will take a relative long time. >>>> So is there any way to save time for this part? >>>> >>>> Thanks, >>>> >>>> Xiang >>>> >>>> -- >>>> Xiang Huo >>>> Department of Computer Science >>>> University of Illinois at Chicago(UIC) >>>> Chicago, Illinois >>>> US >>>> Email: [email protected] >>>> or [email protected] >>>> >>> >> >> >> -- >> Xiang Huo >> Department of Computer Science >> University of Illinois at Chicago(UIC) >> Chicago, Illinois >> US >> Email: [email protected] >> or [email protected] >> > > -- Xiang Huo Department of Computer Science University of Illinois at Chicago(UIC) Chicago, Illinois US Email: [email protected] or [email protected]
