Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/639#issuecomment-42154867
  
    Okay, first, I didn't start work on this JIRA myself because I haven't had 
the time to come to a complete understanding of the current and intended 
functionality, nor of the scope of the problem from running completeRecovery() 
on an Akka scheduler thread, so you shouldn't assume that the JIRA fully 
describes the problem or that solving just what it describes constitutes a 
complete fix.  For example, any exception currently thrown by 
completeRecovery() when run by the scheduler will not be caught because of 
SPARK-1620, so a complete fix to the use of the scheduler from `case 
ElectedLeader` will also have to incorporate whatever we end up doing to fix 
that issue.
    
    Second, what you've done in this PR significantly changes the results of 
calling completeRecovery(), and there is at least a minor problem in doing so.  
Previously, schedule() would be called synchronously after setting 
RecoveryState.ALIVE and before logging the completion of recovery.  Now you're 
causing recovery() to be called asynchronously when the TriggerSchedule event 
makes its way to the head of the message queue.  That will potentially be long 
after RecoveryState.ALIVE has been set, the completion of recovery has been 
logged, and other events are processed with the new RecoveryState even though 
schedule() has not yet been run.  The out-of-order logging is at least a minor 
problem.  The processing of other events before schedule() has been run is 
potentially a major problem -- but like I said, I haven't yet worked through 
exactly how things are supposed to work, so I'm also not certain how big of a 
problem this PR would create.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to