Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/5008#issuecomment-78852368
  
    @tdas - This actually fixes one part of the problem: the checkpoint 
being started at the time the job is generated. 
    
    But an issue can still occur if `concurrentJobs` is set pretty high. With 
that parameter high enough, a batch that started at time 
`t + maxRememberDuration` can end up completing and checkpointing before the 
batch at time `t`, if the batch at time `t` takes longer to process. 
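    A tiny simulation of that ordering inversion (illustrative only, not 
Spark code; the times and durations are made-up values):

```python
# With several jobs running concurrently, completion order follows
# processing duration, not batch time.
max_remember_duration = 60  # seconds; hypothetical value
t = 100                     # batch time of the slow batch

batches = [
    {"time": t, "processing": 30},                         # slow batch
    {"time": t + max_remember_duration, "processing": 1},  # fast batch
]

# Under concurrency, the shortest-running batch finishes first.
completion_order = [b["time"] for b in
                    sorted(batches, key=lambda b: b["processing"])]
print(completion_order)  # [160, 100]: the later batch completes
                         # (and would checkpoint) before the earlier one
```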
    
    I have seen people set `concurrentJobs` pretty high when the cluster 
is large and the processing order is not strictly relevant. #4964 actually takes 
care of that situation (which is what the maps are for). If that is not 
required, there is an even simpler fix than this one, which was in a previous 
commit in that PR: 
https://github.com/harishreedharan/spark/commit/fa93b871ba0fe22924ff0273e975e492a6a7043c
    Simply keep track of the last completed batch and delete the checkpoint 
only when the checkpoint time is the same as that batch's time.
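    A minimal sketch of that simpler fix (the class and method names here are 
illustrative, not the actual Spark internals): remember the time of the last 
completed batch and only clear a checkpoint whose time matches it, so a fast 
out-of-order batch cannot clear state an older, still-running batch needs.

```python
class CheckpointCleaner:
    """Hypothetical helper modeling the 'last completed batch' guard."""

    def __init__(self):
        self.last_completed_batch_time = None

    def on_batch_completed(self, batch_time):
        # Record the most recently completed batch's time.
        self.last_completed_batch_time = batch_time

    def should_clear_checkpoint(self, checkpoint_time):
        # Clear only when this checkpoint belongs to the batch that
        # just completed; otherwise an earlier batch may still need it.
        return checkpoint_time == self.last_completed_batch_time

cleaner = CheckpointCleaner()
cleaner.on_batch_completed(160)               # fast, later batch finishes first
print(cleaner.should_clear_checkpoint(100))   # False: batch 100 not done yet
cleaner.on_batch_completed(100)               # slow batch at time 100 finishes
print(cleaner.should_clear_checkpoint(100))   # True: safe to clear now
```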


