It all begins with a call to rdd.iterator, which calls
rdd.computeOrReadCheckpoint(). That reads a previously checkpointed copy of
the partition if the RDD has been checkpointed, and otherwise calls
compute() to materialize it.
See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L216
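
Roughly, the flow looks like this (a simplified sketch of the logic in
RDD.scala circa Spark 1.0, not the verbatim source):

    // Simplified sketch -- not the actual Spark source.
    final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
      if (storageLevel != StorageLevel.NONE) {
        // Cached RDD: check the cache first; compute on a miss.
        SparkEnv.get.cacheManager.getOrCompute(this, split, context, storageLevel)
      } else {
        computeOrReadCheckpoint(split, context)
      }
    }

    private[spark] def computeOrReadCheckpoint(split: Partition, context: TaskContext)
      : Iterator[T] = {
      if (isCheckpointed) {
        // Checkpointed: read the saved data through the checkpoint parent.
        firstParent[T].iterator(split, context)
      } else {
        // Not checkpointed: materialize this partition with compute().
        compute(split, context)
      }
    }

compute() is the method each concrete RDD subclass overrides, and iterator()
is invoked from a task running on an executor.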


On Thu, Apr 3, 2014 at 8:44 PM, David Thomas <dt5434...@gmail.com> wrote:

> I'm trying to understand the Spark source code. Could you please point me
> to the code where the compute() function of RDD is called? Is that called
> by the workers?
>
>
> On Wed, Apr 2, 2014 at 5:36 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>
>> The driver stores the metadata associated with each partition, but the
>> re-computation will occur on an executor. So if several partitions are
>> lost, e.g. due to a few machines failing, the re-computation can be striped
>> across the cluster, making it fast.
>>
>>
>> On Wed, Apr 2, 2014 at 11:27 AM, David Thomas <dt5434...@gmail.com> wrote:
>>
>>> Can someone explain how an RDD is resilient? If one of the partitions is
>>> lost, who is responsible for recreating that partition - is it the driver
>>> program?
>>>
>>
>>
>
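
To see for yourself that compute() runs on the executors and not the driver,
here is a toy RDD of my own (class name and values are illustrative, not
Spark code) that logs the host each partition is computed on:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // Toy RDD: each partition yields two ints and logs where compute() ran.
    class WhereAmIRDD(sc: SparkContext, numSlices: Int) extends RDD[Int](sc, Nil) {

      override def getPartitions: Array[Partition] =
        (0 until numSlices).map { i =>
          new Partition { override def index: Int = i }
        }.toArray

      override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
        // This body executes inside a task on whichever executor the
        // scheduler picked, not on the driver.
        val host = java.net.InetAddress.getLocalHost.getHostName
        println(s"computing partition ${split.index} on host $host")
        Iterator(split.index * 10, split.index * 10 + 1)
      }
    }

Running new WhereAmIRDD(sc, 4).collect() on a cluster prints one line per
partition in the executor logs; if a cached partition is later lost, the
scheduler reruns the same compute() on some live executor.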
