It all begins with a call to rdd.iterator(), which in turn calls rdd.computeOrReadCheckpoint(). That method reads a previously checkpointed version of the partition if one exists, and otherwise materializes the partition by calling compute(). See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L216
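To make that dispatch concrete, here is a minimal toy sketch of the control flow. ToyRDD and its members are invented for illustration; they are not Spark's actual classes, and the real iterator() also consults the block manager for cached partitions:

// Toy sketch of the RDD.iterator -> computeOrReadCheckpoint dispatch.
abstract class ToyRDD[T] {
  // Stand-in for checkpointed partition data; None means "never checkpointed".
  protected var checkpoint: Option[Map[Int, Seq[T]]] = None

  // Subclasses define how to materialize one partition from scratch.
  def compute(split: Int): Iterator[T]

  // Entry point a task would use: delegate to the checkpoint-or-compute logic.
  final def iterator(split: Int): Iterator[T] = computeOrReadCheckpoint(split)

  private def computeOrReadCheckpoint(split: Int): Iterator[T] =
    checkpoint match {
      case Some(data) => data(split).iterator // read the checkpointed copy
      case None       => compute(split)       // materialize via compute()
    }
}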
On Thu, Apr 3, 2014 at 8:44 PM, David Thomas <dt5434...@gmail.com> wrote:

> I'm trying to understand the Spark source code. Could you please point me
> to the code where the compute() function of RDD is called. Is that called
> by the workers?
>
>
> On Wed, Apr 2, 2014 at 5:36 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>
>> The driver stores the metadata associated with the partition, but the
>> re-computation will occur on an executor. So if several partitions are
>> lost, e.g. due to a few machines failing, the re-computation can be
>> striped across the cluster, making it fast.
>>
>>
>> On Wed, Apr 2, 2014 at 11:27 AM, David Thomas <dt5434...@gmail.com> wrote:
>>
>>> Can someone explain how an RDD is resilient? If one of the partitions is
>>> lost, who is responsible for recreating that partition - is it the
>>> driver program?
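Building on the ToyRDD sketch above, this illustrates the recomputation point from the quoted thread: compute() on a derived RDD pulls its parent's partition and re-applies the transformation, so any executor holding the lineage can rebuild a lost partition simply by replaying it. Again, these are invented toy classes, not Spark's:

// A leaf RDD whose compute() just reads its input data.
class ToySourceRDD[T](parts: Map[Int, Seq[T]]) extends ToyRDD[T] {
  override def compute(split: Int): Iterator[T] = parts(split).iterator
}

// A map-style child RDD: compute() re-derives a partition from its parent,
// which is how a lost partition is rebuilt from lineage on an executor.
class ToyMappedRDD[T, U](parent: ToyRDD[T], f: T => U) extends ToyRDD[U] {
  override def compute(split: Int): Iterator[U] = parent.iterator(split).map(f)
}

object LineageDemo extends App {
  val source = new ToySourceRDD(Map(0 -> Seq(1, 2), 1 -> Seq(3, 4)))
  val mapped = new ToyMappedRDD(source, (x: Int) => x * 10)
  // "Recomputing" partition 1 just replays the lineage: source -> map.
  println(mapped.iterator(1).toList) // List(30, 40)
}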