GitHub user skonto opened a pull request:

    https://github.com/apache/spark/pull/19374

    [SPARK-22145][MESOS] fix supervise with checkpointing on mesos

    ## What changes were proposed in this pull request?
    
    - Fixes the issue with the frameworkId being recovered from checkpointed data.
    - Keeps the driver submission id as the only index for all data structures in 
the dispatcher. 
    Allocates a different task id per driver retry to satisfy the Mesos 
requirements.
    See the relevant ticket (SPARK-22145) for details.
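
The relationship between the submission id (the stable index) and the per-retry task id can be illustrated with a small sketch. This is not the code from the patch, which lives in Spark's Scala Mesos dispatcher; it is a language-neutral illustration in Python. Every name in it (`DriverSubmission`, `Dispatcher`, `task_id_for`, the `-retry-` separator) is hypothetical and chosen only for the example. It also assumes that submission ids never contain the separator string.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical separator between submission id and retry count (assumption).
RETRY_SEP = "-retry-"


@dataclass
class DriverSubmission:
    submission_id: str  # stable across retries; the only index the dispatcher uses
    retries: int = 0    # number of relaunches after a failure


def task_id_for(driver: DriverSubmission) -> str:
    """Derive a task id that is unique per launch attempt."""
    if driver.retries == 0:
        return driver.submission_id
    return f"{driver.submission_id}{RETRY_SEP}{driver.retries}"


def submission_id_of(task_id: str) -> str:
    """Map any per-retry task id back to its stable submission id."""
    return task_id.split(RETRY_SEP, 1)[0]


class Dispatcher:
    """Toy dispatcher: all bookkeeping is keyed by submission id only."""

    def __init__(self) -> None:
        self.launched: Dict[str, DriverSubmission] = {}

    def launch(self, driver: DriverSubmission) -> str:
        self.launched[driver.submission_id] = driver
        return task_id_for(driver)

    def on_task_failed(self, task_id: str) -> str:
        # Look the driver up by submission id, then relaunch under a new task id.
        driver = self.launched[submission_id_of(task_id)]
        driver.retries += 1
        return self.launch(driver)
```

In this sketch, each relaunch of a supervised driver gets a fresh task id, while the dispatcher's map still holds a single entry keyed by the original submission id.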
    ## How was this patch tested?
    
    Tested manually: launched a streaming job with checkpointing to HDFS, 
made the driver fail several times, and observed the behavior:
    
![image](https://user-images.githubusercontent.com/7945591/30940500-f7d2a744-a3e9-11e7-8c56-f2ccbb271e80.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940550-19bc15de-a3ea-11e7-8a11-f48abfe36720.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940524-083ea308-a3ea-11e7-83ae-00d3fa17b928.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940579-2f0fb242-a3ea-11e7-82f9-86179da28b8c.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940591-3b561b0e-a3ea-11e7-9dbd-e71912bb2ef3.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940605-49c810ca-a3ea-11e7-8af5-67930851fd38.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940631-59f4a288-a3ea-11e7-88cb-c3741b72bb13.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940642-62346c9e-a3ea-11e7-8935-82e494925f67.png)
    
    
![image](https://user-images.githubusercontent.com/7945591/30940653-6c46d53c-a3ea-11e7-8dd1-5840d484d28c.png)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/skonto/spark fix_retry

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19374
    
----
commit 0e5e5e0ef0d2ba030af71132955c63aadf4ca970
Author: Stavros Kontopoulos <st.kontopou...@gmail.com>
Date:   2017-09-27T22:04:38Z

    fix supervise with checkpointing

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
