GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/15396
[SPARK-14804][Spark][Graphx] Fix checkpointing of VertexRDD/EdgeRDD
## What changes were proposed in this pull request?
EdgeRDD/VertexRDD override checkpoint() and isCheckpointed() to forward
these calls to the internal partitionsRDD. So when checkpoint() is called on
them, it is the partitionsRDD that actually gets checkpointed. However, since
isCheckpointed() is also overridden to call partitionsRDD.isCheckpointed,
EdgeRDD/VertexRDD.isCheckpointed returns true even though the EdgeRDD/VertexRDD
itself is not actually checkpointed.
This would have been fine, except that the RDD's internal logic for computing
the RDD depends on isCheckpointed(). So for VertexRDD/EdgeRDD, since
isCheckpointed is true, Spark tries at compute time to read checkpoint data of
the VertexRDD/EdgeRDD even though they are not actually checkpointed. Through a
crazy sequence of call forwarding, it reads the checkpoint data of partitionsRDD
and tries to cast it to the element types of Vertex/EdgeRDD. This leads to a
ClassCastException.
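The failure mode described above can be reproduced in a small toy model. This is only a sketch under stated assumptions: `ToyRDD`, `PartitionsRDD`, and `ToyVertexRDD` are hypothetical stand-ins written for illustration, with a single partition and no cluster, and they are not Spark's actual classes or code paths.

```scala
object CheckpointForwardingBug {
  // Hypothetical stand-in for an RDD; NOT Spark's real class.
  abstract class ToyRDD[T] {
    // Checkpoint data belonging to THIS RDD, if it was ever checkpointed.
    private var checkpointData: Option[List[T]] = None

    protected def parent: Option[ToyRDD[_]] = None
    protected def compute(): Iterator[T]

    def checkpoint(): Unit = { checkpointData = Some(compute().toList) }
    def isCheckpointed: Boolean = checkpointData.isDefined

    // Buggy pattern: internal logic consults the public, overridable method.
    final def iterator(): Iterator[T] =
      if (isCheckpointed) {
        checkpointData.map(_.iterator).getOrElse(
          // No checkpoint of our own: fall through to the parent's data and
          // (wrongly) treat its elements as our element type.
          parent.get.iterator().asInstanceOf[Iterator[T]])
      } else {
        compute()
      }
  }

  // Inner RDD holding whole partitions (here: one Map per partition).
  final class PartitionsRDD(data: Map[Long, String])
      extends ToyRDD[Map[Long, String]] {
    protected def compute(): Iterator[Map[Long, String]] = Iterator(data)
  }

  // Wrapper that forwards both checkpoint() and isCheckpointed.
  final class ToyVertexRDD(val partitionsRDD: PartitionsRDD)
      extends ToyRDD[(Long, String)] {
    override protected def parent: Option[ToyRDD[_]] = Some(partitionsRDD)
    protected def compute(): Iterator[(Long, String)] =
      partitionsRDD.iterator().flatMap(_.iterator)
    override def checkpoint(): Unit = partitionsRDD.checkpoint()
    override def isCheckpointed: Boolean = partitionsRDD.isCheckpointed
  }

  def main(args: Array[String]): Unit = {
    val vertices = new ToyVertexRDD(new PartitionsRDD(Map(1L -> "a", 2L -> "b")))
    println(vertices.iterator().map(_._1).toList) // fine before checkpointing

    vertices.checkpoint() // only partitionsRDD is actually checkpointed
    try {
      vertices.iterator().map(_._1).toList
    } catch {
      // Elements are Maps, not tuples, so using them as tuples fails.
      case e: ClassCastException => println(s"failed: ${e.getClass.getName}")
    }
  }
}
```

In this model the wrapper reports itself as checkpointed, has no checkpoint data of its own, and ends up handing out the inner RDD's partition objects as if they were vertex tuples.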
The minimal fix that does not change any public behavior is to modify the RDD
internals so that they do not use the public, overridable API for internal logic.
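One way to picture such a fix is the toy model below. It is a sketch under assumptions, not the patch itself: `ToyRDD`, `PartitionsRDD`, `ToyVertexRDD`, and the helper name `isCheckpointedAndMaterialized` are chosen here for illustration and are not taken from this description. The internal compute path asks a final helper about this RDD's own checkpoint data, while the public `isCheckpointed` stays overridable and keeps its forwarding behavior.

```scala
object CheckpointForwardingFixed {
  // Hypothetical stand-in for an RDD; NOT Spark's real class.
  abstract class ToyRDD[T] {
    private var checkpointData: Option[List[T]] = None

    protected def compute(): Iterator[T]

    def checkpoint(): Unit = { checkpointData = Some(compute().toList) }

    // Internal check: final, so subclasses cannot redirect it.
    final protected def isCheckpointedAndMaterialized: Boolean =
      checkpointData.isDefined

    // Public API: still overridable, default behavior unchanged.
    def isCheckpointed: Boolean = isCheckpointedAndMaterialized

    // Internal logic no longer depends on the overridable method.
    final def iterator(): Iterator[T] =
      if (isCheckpointedAndMaterialized) checkpointData.get.iterator
      else compute()
  }

  final class PartitionsRDD(data: Map[Long, String])
      extends ToyRDD[Map[Long, String]] {
    protected def compute(): Iterator[Map[Long, String]] = Iterator(data)
  }

  // Same forwarding overrides as before; they are now harmless.
  final class ToyVertexRDD(val partitionsRDD: PartitionsRDD)
      extends ToyRDD[(Long, String)] {
    protected def compute(): Iterator[(Long, String)] =
      partitionsRDD.iterator().flatMap(_.iterator)
    override def checkpoint(): Unit = partitionsRDD.checkpoint()
    override def isCheckpointed: Boolean = partitionsRDD.isCheckpointed
  }

  def main(args: Array[String]): Unit = {
    val vertices = new ToyVertexRDD(new PartitionsRDD(Map(1L -> "a", 2L -> "b")))
    vertices.checkpoint()
    println(vertices.isCheckpointed)              // public answer still forwarded
    println(vertices.iterator().map(_._1).toList) // computed from partitionsRDD
  }
}
```

The wrapper's own compute path now runs as usual and reads the inner RDD, which is the one that actually holds checkpoint data.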
## How was this patch tested?
New unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-14804
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15396.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15396
----
commit 22d16b46e8ea18ec7a1b585103aa72f77a7e78f7
Author: Tathagata Das <[email protected]>
Date: 2016-10-07T22:04:31Z
Fixed checkpointing
----