GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/9983

    [SPARK-12004] Preserve the RDD partitioner through RDD checkpointing

    The solution is to save the RDD partitioner in a separate file in the RDD 
checkpoint directory, that is, `<checkpoint dir>/_partitioner`. In most cases, 
whether or not the RDD partitioner is recovered does not affect correctness; 
it only affects performance. So this solution makes a best-effort attempt to 
save and recover the partitioner. If either step fails, checkpointing itself 
is not affected, which makes this patch safe and backward compatible.
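
The best-effort idea described above can be sketched roughly as follows. This is not the code from the patch: it is a standalone Python illustration that assumes a local directory, and the `HashPartitioner` class, the JSON encoding, and the function names `write_partitioner` and `read_partitioner` are all hypothetical. Only the `_partitioner` file name comes from the description.

```python
import json
import os
import tempfile

# File name taken from the PR description; everything else is illustrative.
PARTITIONER_FILE = "_partitioner"


class HashPartitioner:
    """Hypothetical stand-in for an RDD partitioner."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def __eq__(self, other):
        return (isinstance(other, HashPartitioner)
                and self.num_partitions == other.num_partitions)


def write_partitioner(checkpoint_dir, partitioner):
    """Try to save the partitioner; report failure instead of raising."""
    try:
        path = os.path.join(checkpoint_dir, PARTITIONER_FILE)
        with open(path, "w") as f:
            json.dump({"num_partitions": partitioner.num_partitions}, f)
        return True
    except OSError:
        # Best effort: the checkpointed data is still usable without this file.
        return False


def read_partitioner(checkpoint_dir):
    """Try to recover the partitioner; return None if it cannot be read."""
    try:
        path = os.path.join(checkpoint_dir, PARTITIONER_FILE)
        with open(path) as f:
            data = json.load(f)
        return HashPartitioner(data["num_partitions"])
    except (OSError, ValueError, KeyError, TypeError):
        # A missing or corrupt file (e.g. an older checkpoint) means
        # "no partitioner"; the caller only loses the optimization.
        return None
```

Because both functions swallow their own failures, a checkpoint directory written without a `_partitioner` file still loads, which is the backward-compatibility property the description relies on.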
     

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-12004

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9983.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9983
    
----
commit 5a0a1f9f94a2ee2640d0c482012ba992cfe88180
Author: Tathagata Das <[email protected]>
Date:   2015-11-26T01:09:05Z

    Preserve partitioner through RDD checkpointing

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
