GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/9988

    [SPARK-11932][STREAMING] Partition previous state RDD if partitioner not present

    The reason is that TrackStateRDDs generated by trackStateByKey expect the 
previous batch's TrackStateRDDs to have a partitioner. However, when recovering 
from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a 
partitioner attached to them. This is because RDD checkpoints do not preserve 
the partitioner (SPARK-12004).
    
    While #9983 solves SPARK-12004 by preserving the partitioner through RDD 
checkpoints, there is still a chance that saving or recovering the partitioner 
fails. To be resilient, this PR repartitions the previous state RDD if no 
partitioner is detected.
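
    The fallback described above can be sketched roughly as follows. This is 
an illustration on a plain pair RDD, not the code in this PR: the object name 
EnsurePartitionedSketch and the helper ensurePartitioned are hypothetical, and 
an RDD built with parallelize only stands in for a state RDD recovered from a 
checkpoint without its partitioner. Only the public RDD.partitioner and 
partitionBy calls are used.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object EnsurePartitionedSketch {

  // Hypothetical helper (an assumption, not the PR's implementation):
  // keep the RDD as-is when it already has a partitioner, otherwise
  // shuffle it with the expected one.
  def ensurePartitioned[K: ClassTag, V: ClassTag](
      prevStateRDD: RDD[(K, V)],
      partitioner: Partitioner): RDD[(K, V)] = {
    prevStateRDD.partitioner match {
      case Some(_) => prevStateRDD                           // partitioner survived
      case None    => prevStateRDD.partitionBy(partitioner)  // partitioner lost
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("ensure-partitioned-sketch")
    val sc = new SparkContext(conf)
    try {
      val partitioner = new HashPartitioner(4)
      // parallelize yields an RDD with no partitioner, standing in for a
      // state RDD recovered from a checkpoint.
      val recovered = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
      val fixed = ensurePartitioned(recovered, partitioner)
      println(s"before: ${recovered.partitioner}, after: ${fixed.partitioner}")
    } finally {
      sc.stop()
    }
  }
}
```

    Repartitioning costs a shuffle, but it only happens on the batch right 
after recovery; once the state RDD carries a partitioner again, the helper 
returns it untouched.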

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-11932

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9988
    
----
commit 0c5fe55b1ff8da4cca28b91860afbcfbd28e7422
Author: Tathagata Das <[email protected]>
Date:   2015-11-26T01:54:44Z

    Partition previous state RDD if partitioner not present

----

