GitHub user mccheah opened a pull request:
https://github.com/apache/spark/pull/3638
[SPARK-4737] Task set manager properly handles serialization errors
Dealing with [SPARK-4737], the handling of serialization errors should not
be the DAGScheduler's responsibility. The task set manager now catches the
error and aborts the stage.
If the TaskSetManager throws a TaskNotSerializableException, the
TaskSchedulerImpl will return an empty list of task descriptions, because no
tasks were started. The scheduler should abort the stage gracefully.
Note that I'm not too familiar with this part of the codebase and its place
in the overall architecture of the Spark stack. If implementing it this way
will have any averse side effects please voice that loudly.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mccheah/spark
task-set-manager-properly-handle-ser-err
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3638.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3638
----
commit 097e7a21e15d3adf45687bd58ff095088f0282f7
Author: mcheah <[email protected]>
Date: 2014-12-06T01:45:41Z
[SPARK-4737] Catching task serialization exception in TaskSetManager
Our previous attempt at handling un-serializable tasks involved
selectively sampling a task from a task set, and attempting to serialize
it. If the serialization was successful, we assumed that all tasks in
the task set would also be serializable.
Unfortunately, this is not always the case. For example,
ParallelCollectionRDD may have both empty and non-empty partitions, and
the empty partitions would be serializable while the non-empty
partitions actually contain non-serializable objects. This is one of
many examples where sampling task serialization breaks.
When task serialization exceptions occurred in the TaskSchedulerImpl and
TaskSetManager, the result was that the exception was not caught and the
entire scheduler would crash. It would restart, but in a bad state.
There's no reason why the stage should not be aborted if any
serialization error occurs when submitting a task set. If any task in a
task set throws an exception upon serialization, the task set manager
informs the DAGScheduler that the stage failed, aborts the stage. The
TaskSchedulerImpl needs to return a set of task descriptions that were
successfully submitted, but the set will be empty in the case of a
serialization error.
commit bf5e706918d92c761fa537a88bc15ec2c4cc7838
Author: mcheah <[email protected]>
Date: 2014-12-08T20:39:45Z
Fixing indentation.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]