GitHub user kayousterhout opened a pull request:

    https://github.com/apache/spark/pull/16877

    [WIP] [SPARK-19538] Explicitly tell the DAGScheduler when a TaskSet is complete

    **** This is not intended to be merged! (see note at bottom) Posting to
    facilitate discussion around #16620. ****
    
    The pendingPartitions variable in Stage tracks the partitions that still
    need to be computed, and the DAGScheduler uses it to determine when to mark
    the stage as complete. In most cases, this variable is exactly consistent
    with the tasks in the TaskSetManager (for the current attempt of the stage)
    that are still pending. However, as discussed in SPARK-19263, the two can
    become inconsistent when a ShuffleMapTask from an earlier attempt of the
    stage completes: the DAGScheduler may then think the stage has finished
    while the TaskSetManager is still waiting for some tasks to complete (see
    the description in this pull request:
    https://github.com/apache/spark/pull/16620). This leads to bugs like
    SPARK-19263. Another problem with this behavior is that listeners can get
    two StageCompleted messages: one when the DAGScheduler thinks the stage is
    complete, and a second when the TaskSetManager later decides the stage is
    complete. We should fix this.
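
The inconsistency described above can be illustrated with a small toy model. This is only a sketch in Python, not Spark's Scala code: `ToyStage`, `ToyTaskSetManager`, and `scheduler_handles_success` are hypothetical stand-ins, and the assumption that the scheduler clears a partition for a success from any attempt is taken from the description above rather than from the actual implementation.

```python
# Toy model only: hypothetical stand-ins, not Spark's real classes.

class ToyStage:
    """Scheduler-side view: partitions that still need to be computed."""
    def __init__(self, num_partitions):
        self.pending_partitions = set(range(num_partitions))


class ToyTaskSetManager:
    """Task-set-side view: tasks of ONE stage attempt that are still pending."""
    def __init__(self, stage_attempt, partitions):
        self.stage_attempt = stage_attempt
        self.pending_tasks = set(partitions)

    def task_succeeded(self, partition):
        self.pending_tasks.discard(partition)

    def is_complete(self):
        return not self.pending_tasks


def scheduler_handles_success(stage, partition):
    """Assumed behavior: a success from ANY attempt clears the partition.
    Returns True when the scheduler would consider the stage finished."""
    stage.pending_partitions.discard(partition)
    return not stage.pending_partitions


stage = ToyStage(num_partitions=2)
current_tsm = ToyTaskSetManager(stage_attempt=1, partitions={0, 1})

# Partition 0 finishes in the current attempt: both views are updated.
current_tsm.task_succeeded(0)
scheduler_handles_success(stage, 0)

# Partition 1 finishes in the EARLIER attempt: only the scheduler's view changes,
# because the current attempt's task for partition 1 is still running.
stage_done = scheduler_handles_success(stage, 1)

print(stage_done, current_tsm.is_complete())
```

At the end of this run the scheduler-side view reports the stage as finished while the current attempt's task set still has partition 1 pending, which is the mismatch the paragraph describes.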
    
    Unfortunately, merging this would lead to a performance regression, as
    discussed in #16620. Currently, the DAGScheduler may mark a Stage as
    completed "early", before the TaskSetManager (TSM) thinks it's done, based
    on task completions from earlier attempts of the stage. With this commit,
    that can't happen.
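
A minimal sketch of the explicit-notification idea and its cost follows. It is a Python toy model rather than the Scala change in this pull request, and every name in it (`ToyScheduler`, `ToyTaskSetManager`, `task_set_finished`) is hypothetical. It assumes that a completion from an earlier attempt is never delivered to the current attempt's task set.

```python
# Toy model only: hypothetical names, not the API introduced by this pull request.

class ToyScheduler:
    def __init__(self):
        # Stands in for the StageCompleted events seen by listeners.
        self.completed_events = []

    def task_set_finished(self, stage_id):
        # Called explicitly by the task set manager; the scheduler does no
        # inference of its own from partition bookkeeping.
        self.completed_events.append(stage_id)


class ToyTaskSetManager:
    def __init__(self, scheduler, stage_id, partitions):
        self.scheduler = scheduler
        self.stage_id = stage_id
        self.pending_tasks = set(partitions)

    def task_succeeded(self, partition):
        if partition not in self.pending_tasks:
            return  # ignore duplicates so completion is reported only once
        self.pending_tasks.discard(partition)
        if not self.pending_tasks:
            self.scheduler.task_set_finished(self.stage_id)


scheduler = ToyScheduler()
current_tsm = ToyTaskSetManager(scheduler, stage_id=7, partitions={0, 1})

current_tsm.task_succeeded(0)

# A task for partition 1 from an EARLIER attempt finishes here. Under the
# assumption above it is not reported to current_tsm, so nothing happens yet.
events_after_old_attempt = list(scheduler.completed_events)

# Only when the current attempt's own task for partition 1 finishes is the
# stage reported complete, and it is reported exactly once.
current_tsm.task_succeeded(1)
events_at_end = list(scheduler.completed_events)
```

In this model listeners see a single completion event, but the stage stays open until the current attempt's own task finishes, even though output for that partition already exists. That waiting is the kind of delay the regression note refers to.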

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kayousterhout/spark-1 SPARK-19538

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16877.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16877
    
----
commit 3e9af42a6238bc786153db41320a5dbc10d917d0
Author: Kay Ousterhout <[email protected]>
Date:   2017-02-09T22:00:18Z

    [SPARK-19537] Move pendingPartitions to ShuffleMapStage.
    
    The pendingPartitions instance variable should be moved to ShuffleMapStage,
    because it is only used by ShuffleMapStages. This change is purely
    refactoring and does not change functionality.

commit b575e933dc6f774f8e9ca5dc23ef5f412181ab0e
Author: Kay Ousterhout <[email protected]>
Date:   2017-02-09T23:49:21Z

    [SPARK-19538] Explicitly tell the DAGScheduler when the task set is
    considered complete
    
    This avoids the DAGScheduler needing to do its own inference of when the
    TaskSetManager is complete.

----


