GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/3638
[SPARK-4737] Task set manager properly handles serialization errors

Dealing with [SPARK-4737], the handling of serialization errors should not be the DAGScheduler's responsibility. The task set manager now catches the error and aborts the stage. If the TaskSetManager throws a TaskNotSerializableException, the TaskSchedulerImpl will return an empty list of task descriptions, because no tasks were started, and the scheduler will abort the stage gracefully.

Note that I'm not too familiar with this part of the codebase and its place in the overall architecture of the Spark stack. If implementing it this way will have any adverse side effects, please say so loudly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mccheah/spark task-set-manager-properly-handle-ser-err

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3638.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3638

----

commit 097e7a21e15d3adf45687bd58ff095088f0282f7
Author: mcheah <mch...@palantir.com>
Date: 2014-12-06T01:45:41Z

[SPARK-4737] Catching task serialization exception in TaskSetManager

Our previous attempt at handling unserializable tasks involved selectively sampling a task from a task set and attempting to serialize it. If the serialization was successful, we assumed that all tasks in the task set would also be serializable. Unfortunately, this is not always the case. For example, a ParallelCollectionRDD may have both empty and non-empty partitions, and the empty partitions could be serializable while the non-empty partitions contain non-serializable objects. This is one of many examples where sampling task serialization breaks.
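The sampling pitfall above can be demonstrated outside Spark with plain Java serialization (the mechanism Spark's JavaSerializer wraps): an empty collection serializes fine, while the same collection type holding a non-serializable element does not. This is an illustrative standalone sketch; the class and method names are not Spark's.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class SamplingBreaks {
    // A payload type that is deliberately NOT Serializable.
    static class Payload {
        final int value;
        Payload(int value) { this.value = value; }
    }

    // Returns true if the object survives Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        // Two "partitions": one empty, one holding a non-serializable element.
        List<Object> emptyPartition = new ArrayList<>();
        List<Object> fullPartition = new ArrayList<>();
        fullPartition.add(new Payload(42));

        // Sampling only the empty partition would wrongly conclude
        // that the whole task set is serializable.
        System.out.println(serializes(emptyPartition)); // prints true
        System.out.println(serializes(fullPartition));  // prints false
    }
}
```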
When task serialization exceptions occurred in the TaskSchedulerImpl and TaskSetManager, the exception was not caught and the entire scheduler would crash. It would restart, but in a bad state. There's no reason the stage should not be aborted if any serialization error occurs when submitting a task set. If any task in a task set throws an exception upon serialization, the task set manager informs the DAGScheduler that the stage failed and aborts the stage. The TaskSchedulerImpl still needs to return the set of task descriptions that were successfully submitted, but that set will be empty in the case of a serialization error.

commit bf5e706918d92c761fa537a88bc15ec2c4cc7838
Author: mcheah <mch...@palantir.com>
Date: 2014-12-08T20:39:45Z

Fixing indentation.
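A minimal sketch of the catch-and-abort flow described above, in plain Java rather than Spark's Scala. The names here (`serializeTask`, `resourceOffers`, the local `TaskNotSerializableException`) are hypothetical stand-ins, not Spark's actual signatures; the point is only the control flow: the serialization error is caught instead of escaping and crashing the scheduler, the stage is aborted, and an empty list of task descriptions is returned because no tasks were launched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AbortOnSerError {
    // Hypothetical stand-in for Spark's TaskNotSerializableException.
    static class TaskNotSerializableException extends RuntimeException {
        TaskNotSerializableException(Throwable cause) { super(cause); }
    }

    // Stand-in for the TaskSetManager side: serialize the task eagerly,
    // converting the low-level IOException into a typed exception.
    static byte[] serializeTask(Object task) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(task);
        } catch (IOException e) {
            throw new TaskNotSerializableException(e);
        }
        return buf.toByteArray();
    }

    // Stand-in for the TaskSchedulerImpl side: on a serialization failure,
    // "abort the stage" and return an empty list of task descriptions.
    static List<byte[]> resourceOffers(List<Object> tasks) {
        List<byte[]> descriptions = new ArrayList<>();
        try {
            for (Object task : tasks) {
                descriptions.add(serializeTask(task));
            }
        } catch (TaskNotSerializableException e) {
            System.err.println("Aborting stage: task not serializable: " + e.getCause());
            return Collections.emptyList(); // no tasks were started
        }
        return descriptions;
    }

    public static void main(String[] args) {
        List<Object> good = new ArrayList<>();
        good.add("task-a");
        good.add("task-b");
        List<Object> bad = new ArrayList<>();
        bad.add(new Object()); // not Serializable

        System.out.println(resourceOffers(good).size()); // prints 2
        System.out.println(resourceOffers(bad).size());  // prints 0
    }
}
```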