GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/12655

    [SPARK-13902][SCHEDULER] Make DAGScheduler.getAncestorShuffleDependencies() 
return in topological order to ensure building ancestor stages first.

    ## What changes were proposed in this pull request?
    
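    `DAGScheduler` sometimes generates an incorrect stage graph.
    Because the graph is built in the wrong order, some stages are created two 
or more times for the same shuffleId, and child stages reference these 
duplicates.
    
    This patch fixes the problem by making `getAncestorShuffleDependencies()` 
return dependencies in topological order, so that ancestor stages are built 
first.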
    
    ## How was this patch tested?
    
    I added a sample RDD graph to `DAGSchedulerSuite` that reproduces the 
illegal stage graph.
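
To illustrate the ordering described above, here is a minimal Python sketch. It is not the Scala code from this patch. It assumes the lineage can be reduced to a plain dict (`parents`, a hypothetical stand-in) that maps each shuffle id to the shuffle ids it depends on directly. The function names are also made up for this example. The sketch returns ancestors with parents before children, then registers one stage per shuffle id in that order.

```python
def ancestors_in_topological_order(target, parents):
    """Return every ancestor of `target`, parents before children.

    `parents` maps a shuffle id to the ids it directly depends on
    (a hypothetical stand-in for the RDD lineage; must be acyclic).
    """
    ordered = []   # finished ids, ancestors first
    done = set()   # ids already appended to `ordered`
    # An explicit stack of (id, expanded) pairs gives an iterative
    # post-order depth-first traversal without recursion.
    stack = [(p, False) for p in reversed(parents.get(target, []))]
    while stack:
        node, expanded = stack.pop()
        if node in done:
            continue
        if expanded:
            # All ancestors of `node` are finished, so emit it now.
            done.add(node)
            ordered.append(node)
        else:
            # Revisit `node` only after its own parents are handled.
            stack.append((node, True))
            for p in reversed(parents.get(node, [])):
                if p not in done:
                    stack.append((p, False))
    return ordered


def build_stages(target, parents):
    """Register exactly one stage per shuffle id, ancestors first."""
    stage_for = {}  # shuffle id -> stage id
    for shuffle_id in ancestors_in_topological_order(target, parents) + [target]:
        # Each parent already has a stage, so none needs to be created twice.
        assert all(p in stage_for for p in parents.get(shuffle_id, []))
        stage_for[shuffle_id] = len(stage_for)
    return stage_for


if __name__ == "__main__":
    # Shuffle 3 depends on 2 and 1; shuffle 2 also depends on 1.
    lineage = {3: [2, 1], 2: [1], 1: [0], 0: []}
    print(ancestors_in_topological_order(3, lineage))
    print(build_stages(3, lineage))
```

In this toy lineage, shuffle 1 is reachable both directly and through shuffle 2. Visiting ancestors in an arbitrary order could reach shuffle 2 before a stage exists for shuffle 1, which is the kind of situation where a duplicate stage might be created. With the topological order, the assertion in `build_stages` holds for every id.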


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-13902

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12655
    
----
commit 9a1724de0287b5ca41e30f3d3401fd721a2e1520
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-15T02:21:09Z

    Add a test to check if the stage graph is properly built.

commit f8b7910ecb52a5954de091ed79d5de9c19ba2744
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-15T02:22:42Z

    Make DAGScheduler.getAncestorShuffleDependencies() return in topological 
order to ensure building ancestor stages first.

commit 0ea3fc838f689729794b6ea3aaf0b88a339ec20c
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-16T02:04:45Z

    Refactor getAncestorShuffleDependencies.

commit 697b32208262b3c1c10bc2cae43b891c7970811d
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-16T12:55:50Z

    Fix topological sort.

commit d6d3c34e0e8387ce6390babba3df2464a8b2b4a1
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-17T12:21:32Z

    Merge branch 'master' into issues/SPARK-13902

commit 1636531c65912bbfb68e4c669690a9f9107d9cd1
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-28T07:01:27Z

    Add assertion to check not to overwrite illegally.

commit 92e9f4484b09f65829f6e9300042cc2b57979278
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-28T07:19:09Z

    Modify to mitigate adds extra push&pop.

commit 4b412f5e73ca9cf5ab2de1a51f6c30f01286e89a
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-28T07:48:42Z

    Modify comment.

commit 8fb9a149a03543a35c2a08c79edc53d49f66b5c2
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-28T08:11:37Z

    Add a comment to explain what the test is doing.

commit e2cfeaf3ef5a7291a235bbcbb968d88959e52e93
Author: Takuya UESHIN <[email protected]>
Date:   2016-03-29T03:22:36Z

    Revert "Add assertion to check not to overwrite illegally."
    
    This reverts commit 1636531c65912bbfb68e4c669690a9f9107d9cd1.

commit 3a8ff84622c3f136fa3511561a789163c94b2f2e
Author: Takuya UESHIN <[email protected]>
Date:   2016-04-05T02:58:53Z

    Modify to cut down on the repeated scanning of data structures.

----

