GitHub user heary-cao opened a pull request:
https://github.com/apache/spark/pull/19693
[CORE] Improve shuffle write time statistics
## What changes were proposed in this pull request?
Creating the file to write to and creating a disk writer both involve
interacting with the disk, and can take a long time when we open or close many
files, so this time should be included in the shuffle write time.
Currently, when we call mergeSpillsWithTransferTo, only the time spent writing
the file is counted; the time spent opening and closing the many spill files
being merged is not included in the shuffle write time.
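
As a rough illustration of the idea (not the actual patch), the sketch below starts the timer before the output and spill files are opened and stops it only after they are closed, so open and close costs are counted as write time. The names `SpillMergeTiming`, `WriteMetrics`, and `mergeSpills` are hypothetical and are not Spark APIs; only the standard `java.nio` calls are real.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class SpillMergeTiming {

    /** Hypothetical stand-in for a shuffle write-time metric, in nanoseconds. */
    static final class WriteMetrics {
        private long writeTimeNanos = 0L;

        void incWriteTime(long nanos) {
            writeTimeNanos += nanos;
        }

        long writeTimeNanos() {
            return writeTimeNanos;
        }
    }

    /**
     * Concatenates the spill files into one output file using transferTo.
     * The timer covers opening, copying, and closing every file, not just
     * the copy itself. Returns the number of bytes copied.
     */
    static long mergeSpills(List<Path> spills, Path output, WriteMetrics metrics)
            throws IOException {
        long start = System.nanoTime(); // started before any file is opened
        long bytesCopied = 0L;
        try (FileChannel out = FileChannel.open(output,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path spill : spills) {
                try (FileChannel in = FileChannel.open(spill, StandardOpenOption.READ)) {
                    long size = in.size();
                    long pos = 0L;
                    // transferTo may copy fewer bytes than requested, so loop.
                    while (pos < size) {
                        pos += in.transferTo(pos, size - pos, out);
                    }
                    bytesCopied += size;
                }
            }
        } finally {
            // try-with-resources closes the channels before this block runs,
            // so the recorded time also includes the close calls.
            metrics.incWriteTime(System.nanoTime() - start);
        }
        return bytesCopied;
    }
}
```

Recording the elapsed time in `finally` keeps the metric updated even if the merge fails partway through; whether that is desirable in the real writer is a design choice this sketch does not settle.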
## How was this patch tested?
Existing test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/heary-cao/spark task_statistics
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19693.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19693
----
commit e1d6df4cecc757a7f66feefa2e3bd6816e7abd3f
Author: caoxuewen <[email protected]>
Date: 2017-11-08T07:57:28Z
improved statistical shuffle write time
----