GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/15396

    [SPARK-14804][Spark][Graphx] Fix checkpointing of VertexRDD/EdgeRDD

    ## What changes were proposed in this pull request?
    EdgeRDD and VertexRDD override checkpoint() and isCheckpointed() to forward 
these calls to the internal partitionsRDD. So when checkpoint() is called on 
them, it is the partitionsRDD that actually gets checkpointed. However, since 
isCheckpointed() is also overridden to call partitionsRDD.isCheckpointed, 
EdgeRDD/VertexRDD.isCheckpointed returns true even though the EdgeRDD/VertexRDD 
itself is not actually checkpointed.
    
    This would have been fine except that the RDD's internal logic for computing 
the RDD depends on isCheckpointed(). So for VertexRDD/EdgeRDD, since 
isCheckpointed is true, Spark tries during computation to read checkpoint data 
of VertexRDD/EdgeRDD even though they are not actually checkpointed. Through a 
crazy sequence of call forwarding, it reads the checkpoint data of partitionsRDD 
and tries to cast it to the element types of VertexRDD/EdgeRDD. This leads to a 
ClassCastException.
    
    The minimal fix that does not change any public behavior is to modify the 
RDD internals so that they do not use the public, overridable API for internal 
logic.
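
A toy model may make the failure and the fix easier to follow. The sketch below is plain Python, not Spark code: `ToyRDD`, `PartitionsRDD`, `WrapperRDD`, `iterator_buggy` and `iterator_fixed` are hypothetical names invented for illustration, and the real read path in Spark is more involved. It assumes only what is described above: the wrapper forwards checkpoint() and isCheckpointed() to its inner RDD, and the base class decides internally whether to read checkpoint data.

```python
class ToyRDD:
    """Toy stand-in for an RDD; the names here are illustrative, not Spark's."""

    def __init__(self, parent=None):
        self.parent = parent      # upstream RDD, if any
        self._snapshot = None     # set only when *this* RDD is checkpointed

    # Public API that subclasses may override.
    def checkpoint(self):
        self._snapshot = list(self.compute())

    def is_checkpointed(self):
        return self._snapshot is not None

    def compute(self):
        raise NotImplementedError

    # Internal read path before the fix: trusts the overridable method.
    def iterator_buggy(self):
        if self.is_checkpointed():
            return self._read_checkpoint()
        return self.compute()

    # Internal read path after the fix: consults private state only.
    def iterator_fixed(self):
        if self._snapshot is not None:
            return self._read_checkpoint()
        return self.compute()

    def _read_checkpoint(self):
        # With no snapshot of its own, the read falls through to the parent,
        # which models the call forwarding described above.
        if self._snapshot is not None:
            return list(self._snapshot)
        return self.parent.iterator_fixed()


class PartitionsRDD(ToyRDD):
    """Elements are whole partitions, modeled as dicts of id -> attribute."""

    def __init__(self, partitions):
        super().__init__()
        self.partitions = partitions

    def compute(self):
        return list(self.partitions)


class WrapperRDD(ToyRDD):
    """Analog of VertexRDD/EdgeRDD: elements are (id, attribute) pairs."""

    def __init__(self, partitions_rdd):
        super().__init__(parent=partitions_rdd)

    def checkpoint(self):
        self.parent.checkpoint()             # only the inner RDD is checkpointed

    def is_checkpointed(self):
        return self.parent.is_checkpointed() # reports the inner RDD's state

    def compute(self):
        # Flatten each partition into its (id, attribute) pairs.
        return [pair for part in self.parent.iterator_fixed()
                for pair in part.items()]


vertices = WrapperRDD(PartitionsRDD([{1: "a"}, {2: "b"}]))
vertices.checkpoint()

buggy = vertices.iterator_buggy()   # whole partitions leak through
fixed = vertices.iterator_fixed()   # (id, attribute) pairs
```

In the buggy path the caller receives partition objects where it expects element pairs, which is the toy equivalent of the ClassCastException. In the fixed path the public methods behave exactly as before, but the internal decision no longer depends on them.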
    
    ## How was this patch tested?
    New unit tests.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-14804

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15396.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15396
    
----
commit 22d16b46e8ea18ec7a1b585103aa72f77a7e78f7
Author: Tathagata Das <[email protected]>
Date:   2016-10-07T22:04:31Z

    Fixed checkpointing

----

