Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/639#issuecomment-42195672
@CodingCat The documentation on Akka's handling of exceptions thrown from
scheduled functions isn't entirely clear, so before re-opening SPARK-1620 I
wrote various pieces of test code to discover exactly what the behavior is.
From what I can tell, a default uncaught exception handler set in the thread
that calls scheduler.schedule won't catch exceptions from a scheduled function,
but it will catch at least some exceptions thrown by Akka itself (e.g., if the
scheduled function closes over a reference that becomes unreachable in the
scheduled function). Similarly, an uncaught exception handler set within the
ActorSystem (as we do in IndestructibleActorSystem) won't catch exceptions
thrown by scheduled functions.
What will work is either wrapping the scheduled code in an explicit try-catch
(which can allow the scheduled event to happen again) or setting an uncaught
exception handler explicitly on the scheduled thread (which will not allow any
further scheduled events after the first thrown exception). The latter approach is
about as generalizable as we can get, I think, and is the approach I took in
https://github.com/apache/spark/pull/622. That PR could be extended to cover
the scheduling of completeRecovery() as well, but maybe the fix for the rest of
SPARK-1686 will make that unnecessary.
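
To illustrate the try-catch option, here is a minimal sketch that uses the JDK's ScheduledExecutorService as a stand-in. It is not Akka's scheduler and not the code from either PR, so Akka's exact behavior may differ. The class name, method name, and timings are hypothetical. The JDK scheduler suppresses later runs of a periodic task once one run throws, whereas a task that catches its own exception keeps being scheduled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a JDK analogue, not Akka's scheduler.
public class ScheduledTryCatchSketch {

    // Schedules a task that always throws and returns how many times it ran.
    static int countRuns(boolean guardWithTryCatch) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch threeRuns = new CountDownLatch(3);

        // The "scheduled function": records the run, then fails.
        Runnable failing = () -> {
            runs.incrementAndGet();
            threeRuns.countDown();
            throw new IllegalStateException("simulated failure");
        };

        // Same function wrapped in an explicit try-catch, so the exception
        // never reaches the scheduler and later runs are not suppressed.
        Runnable guarded = () -> {
            try {
                failing.run();
            } catch (RuntimeException e) {
                // A real system would log here; the sketch just swallows it.
            }
        };

        scheduler.scheduleAtFixedRate(
                guardWithTryCatch ? guarded : failing, 0, 20, TimeUnit.MILLISECONDS);

        // Wait for three runs, or give up after one second.
        threeRuns.await(1, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded runs: " + countRuns(false));
        System.out.println("guarded runs:   " + countRuns(true));
    }
}
```

In this sketch the unguarded task runs once and is never rescheduled, while the guarded task reaches at least three runs before the scheduler is shut down.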