GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/9983
[SPARK-12004] Preserve the RDD partitioner through RDD checkpointing
The solution is to save the RDD partitioner in a separate file in the RDD
checkpoint directory, that is, `<checkpoint dir>/_partitioner`. In most cases,
whether or not the RDD partitioner is recovered does not affect correctness,
only performance (for example, a recovered RDD without its partitioner may need
an extra shuffle in a later join). So this solution makes a best-effort attempt
to save and recover the partitioner. If either step fails, checkpointing itself
is not affected. This makes the patch safe and backward compatible.
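A minimal sketch of the best-effort save/recover idea described above, in plain Java with no Spark dependency. `SimpleHashPartitioner`, `PartitionerCheckpoint`, `writePartitioner` and `readPartitioner` are hypothetical names used only for illustration. This is not the code in the pull request. It assumes Java serialization and a local `java.nio.file.Path`, standing in for the Hadoop-compatible file system where a real checkpoint directory lives.

```java
import java.io.*;
import java.nio.file.*;
import java.util.Optional;

// Hypothetical stand-in for a partitioner; NOT Spark's org.apache.spark.Partitioner.
class SimpleHashPartitioner implements Serializable {
    private static final long serialVersionUID = 1L;
    final int numPartitions;

    SimpleHashPartitioner(int numPartitions) { this.numPartitions = numPartitions; }

    // Non-negative modulo of the key's hash code, as a hash partitioner would compute.
    int getPartition(Object key) {
        if (key == null) return 0;
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }
}

// Hypothetical helper illustrating a best-effort "_partitioner" side file.
class PartitionerCheckpoint {
    static final String PARTITIONER_FILE = "_partitioner";

    // Best effort: any I/O or serialization failure is swallowed and reported
    // through the return value, so the data checkpoint itself is never affected.
    static boolean writePartitioner(Path checkpointDir, Serializable partitioner) {
        Path file = checkpointDir.resolve(PARTITIONER_FILE);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(partitioner);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Best effort: a missing or unreadable file yields Optional.empty(), meaning
    // the RDD is recovered without a partitioner (as with older checkpoints).
    static Optional<Object> readPartitioner(Path checkpointDir) {
        Path file = checkpointDir.resolve(PARTITIONER_FILE);
        if (!Files.exists(file)) return Optional.empty();
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return Optional.of(in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

Because both methods report failure through their return values instead of throwing, a caller can write the checkpoint data first and treat the partitioner file as optional. A checkpoint directory that has no `_partitioner` file simply yields `Optional.empty()`.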
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-12004
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9983.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9983
----
commit 5a0a1f9f94a2ee2640d0c482012ba992cfe88180
Author: Tathagata Das
Date: 2015-11-26T01:09:05Z
Preserve partitioner through RDD checkpointing
----