GitHub user mccheah opened a pull request:

    https://github.com/apache/spark/pull/3638

    [SPARK-4737] Task set manager properly handles serialization errors

    As discussed in [SPARK-4737], handling serialization errors should not
    be the DAGScheduler's responsibility. The TaskSetManager now catches the
    error and aborts the stage.
    
    If the TaskSetManager throws a TaskNotSerializableException, the
    TaskSchedulerImpl will return an empty list of task descriptions, because
    no tasks were started. The scheduler should abort the stage gracefully.
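    Roughly, the control flow looks like the sketch below. This is a minimal
    illustration with hypothetical stand-in types, not the actual Spark
    classes; abortStage here stands in for the callback into the
    DAGScheduler.
    
        import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
    
        // Hypothetical stand-ins; the real Spark classes carry far more state.
        case class TaskDescription(taskId: Long, payload: Array[Byte])
        class TaskNotSerializableException(cause: Throwable) extends Exception(cause)
    
        class TaskSetManager(tasks: IndexedSeq[AnyRef], abortStage: String => Unit) {
          // Serialize the task before handing it out; on failure, abort the
          // whole stage instead of letting the exception crash the scheduler.
          def resourceOffer(index: Int): TaskDescription =
            try {
              val buffer = new ByteArrayOutputStream()
              val out = new ObjectOutputStream(buffer)
              out.writeObject(tasks(index))
              out.close()
              TaskDescription(index.toLong, buffer.toByteArray)
            } catch {
              case e: NotSerializableException =>
                abortStage(s"Failed to serialize task $index: ${e.getMessage}")
                throw new TaskNotSerializableException(e)
            }
        }
    
        object SchedulerSketch {
          // Scheduler side: a TaskNotSerializableException means no tasks were
          // launched, so an empty list of task descriptions is returned.
          def resourceOffers(manager: TaskSetManager, numTasks: Int): Seq[TaskDescription] =
            try {
              (0 until numTasks).map(manager.resourceOffer(_))
            } catch {
              case _: TaskNotSerializableException => Seq.empty
            }
        }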
    
    Note that I'm not too familiar with this part of the codebase and its
    place in the overall architecture of the Spark stack. If implementing it
    this way will have any adverse side effects, please voice that loudly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mccheah/spark task-set-manager-properly-handle-ser-err

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3638
    
----
commit 097e7a21e15d3adf45687bd58ff095088f0282f7
Author: mcheah <mch...@palantir.com>
Date:   2014-12-06T01:45:41Z

    [SPARK-4737] Catching task serialization exception in TaskSetManager
    
    Our previous attempt at handling un-serializable tasks involved
    selectively sampling a task from a task set, and attempting to serialize
    it. If the serialization was successful, we assumed that all tasks in
    the task set would also be serializable.
    
    Unfortunately, this is not always the case. For example,
    ParallelCollectionRDD may have both empty and non-empty partitions; the
    empty partitions serialize fine, while the non-empty partitions contain
    non-serializable objects. This is one of many cases where checking
    serializability by sampling a single task breaks down.
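    A minimal way to reproduce that gap is sketched below; the class and
    object names here are ours, not from the patch.
    
        import org.apache.spark.{SparkConf, SparkContext}
    
        // A plain class: instances fail Java serialization.
        class Unserializable
    
        object SamplingGapDemo {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf().setMaster("local[2]").setAppName("SamplingGapDemo")
            val sc = new SparkContext(conf)
            // Two elements spread over ten partitions: eight partitions are
            // empty and serialize fine, two embed an Unserializable instance.
            // A check that samples a single task can pick an empty partition
            // and pass, yet running the job fails at task serialization time.
            val rdd = sc.parallelize(Seq(new Unserializable, new Unserializable), 10)
            rdd.count()
            sc.stop()
          }
        }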
    
    Previously, task serialization exceptions thrown in the TaskSchedulerImpl
    and TaskSetManager were not caught, so the entire scheduler would crash.
    It would restart, but in a bad state.
    
    There's no reason why the stage should not be aborted if any
    serialization error occurs when submitting a task set. If any task in a
    task set fails to serialize, the task set manager informs the
    DAGScheduler that the stage failed and aborts the stage. The
    TaskSchedulerImpl still needs to return the set of task descriptions
    that were successfully submitted, but that set will be empty in the
    case of a serialization error.

commit bf5e706918d92c761fa537a88bc15ec2c4cc7838
Author: mcheah <mch...@palantir.com>
Date:   2014-12-08T20:39:45Z

    Fixing indentation.

----


