GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/12060
[SPARK-14269][SCHEDULER] Eliminate unnecessary submitStage() call.
## What changes were proposed in this pull request?
Currently, `submitStage()` is called for every waiting stage on every
iteration of the event loop in `DAGScheduler`, but most of these calls are
unnecessary because most events are unrelated to stage status.
Waiting stages only need to be submitted when their parent stages have
completed successfully.
Eliminating the unnecessary calls improves `DAGScheduler` performance.
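The idea can be illustrated with a minimal sketch. This is a toy model and not Spark's actual `DAGScheduler` code: `Stage`, `MiniScheduler`, `onAnyEvent`, `onStageCompleted`, and `demo` are hypothetical names introduced only for illustration. Only `submitStage()` mirrors a name from the real scheduler, and its body here is heavily simplified.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Spark stage: an id plus its parent stages.
final case class Stage(id: Int, parents: List[Stage])

final class MiniScheduler {
  val waiting   = mutable.LinkedHashSet.empty[Stage]
  val running   = mutable.Set.empty[Stage]
  val finished  = mutable.Set.empty[Stage]
  val submitted = mutable.ArrayBuffer.empty[Int] // ids in submission order
  var submitCalls = 0                            // how often submitStage ran

  // Simplified submitStage: run the stage if all parents are finished,
  // otherwise submit the missing parents and park the stage as waiting.
  def submitStage(stage: Stage): Unit = {
    submitCalls += 1
    if (!running.contains(stage) && !finished.contains(stage)) {
      val missing = stage.parents.filterNot(finished.contains)
      if (missing.isEmpty) {
        waiting -= stage
        running += stage
        submitted += stage.id
      } else {
        missing.foreach(submitStage)
        waiting += stage
      }
    }
  }

  // Behaviour being removed (simplified): retry every waiting stage
  // on every event, whether or not the event changed any stage status.
  def onAnyEvent(): Unit = waiting.toList.foreach(submitStage)

  // Proposed behaviour (simplified): when a stage completes successfully,
  // retry only the waiting stages that have it as a parent.
  def onStageCompleted(stage: Stage): Unit = {
    running -= stage
    finished += stage
    waiting.toList.filter(_.parents.contains(stage)).foreach(submitStage)
  }
}

// A three-stage chain a <- b <- c, driven only by completion events.
def demo(): (List[Int], Int, Boolean) = {
  val a = Stage(0, Nil)
  val b = Stage(1, List(a))
  val c = Stage(2, List(b))
  val s = new MiniScheduler
  s.submitStage(c)      // recursively submits b, then a; b and c wait
  s.onStageCompleted(a) // only b is retried
  s.onStageCompleted(b) // only c is retried
  (s.submitted.toList, s.submitCalls, s.waiting.isEmpty)
}
```

In this toy model each completion event triggers one `submitStage()` call per child of the completed stage, whereas `onAnyEvent()` would walk the whole waiting set on every event. With thousands of waiting stages, that difference is the kind of cost the patch is meant to remove.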
## How was this patch tested?
Added some checks, ran the existing tests, and tried the patch on our own projects.
We have a project bottlenecked by `DAGScheduler`, with about 2000 stages.
Before this patch, almost all of the execution time in the `Driver` process was
spent in `submitStage()` on the `dag-scheduler-event-loop` thread; after
this patch, performance improved as follows:
| | total execution time | `dag-scheduler-event-loop` thread time | `submitStage()` |
|--------|---------------------:|---------------------------------------:|----------------:|
| Before | 760 sec | 710 sec | 667 sec |
| After | 440 sec | 14 sec | 10 sec |
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-14269
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12060.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12060
----
commit 9a1724de0287b5ca41e30f3d3401fd721a2e1520
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-15T02:21:09Z
Add a test to check if the stage graph is properly built.
commit f8b7910ecb52a5954de091ed79d5de9c19ba2744
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-15T02:22:42Z
Make DAGScheduler.getAncestorShuffleDependencies() return in topological
order to ensure building ancestor stages first.
commit 0ea3fc838f689729794b6ea3aaf0b88a339ec20c
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-16T02:04:45Z
Refactor getAncestorShuffleDependencies.
commit 697b32208262b3c1c10bc2cae43b891c7970811d
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-16T12:55:50Z
Fix topological sort.
commit d6d3c34e0e8387ce6390babba3df2464a8b2b4a1
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-17T12:21:32Z
Merge branch 'master' into issues/SPARK-13902
commit 1636531c65912bbfb68e4c669690a9f9107d9cd1
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-28T07:01:27Z
Add assertion to check not to overwrite illegally.
commit 92e9f4484b09f65829f6e9300042cc2b57979278
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-28T07:19:09Z
Modify to mitigate adds extra push&pop.
commit 4b412f5e73ca9cf5ab2de1a51f6c30f01286e89a
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-28T07:48:42Z
Modify comment.
commit 8fb9a149a03543a35c2a08c79edc53d49f66b5c2
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-28T08:11:37Z
Add a comment to explain what the test is doing.
commit e2cfeaf3ef5a7291a235bbcbb968d88959e52e93
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-29T03:22:36Z
Revert "Add assertion to check not to overwrite illegally."
This reverts commit 1636531c65912bbfb68e4c669690a9f9107d9cd1.
commit e3c0de33290aaccdd826d5ca38b87ace73a01fb5
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-11T05:45:24Z
Eliminate unnecessary `submitWaitingStages()` call.
commit b73eaac805dca779fd7635f63fdd12c78e634509
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-30T09:29:33Z
Merge branch 'issues/SPARK-13902' into issues/SPARK-14269
commit 88c4bc1dd1c36b456432de2c895054799ff97a20
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-30T08:47:20Z
Add some checks.
commit a304235c4b086469aa5b5ac8a7d2f0d25addc86f
Author: Takuya UESHIN <[email protected]>
Date: 2016-03-15T03:19:03Z
Try to submit only child stages of the completed stage.
----