GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/12417
[SPARK-14628][CORE] Simplify task metrics by always tracking read/write
metrics
## What changes were proposed in this pull request?
Part of the reason TaskMetrics and its callers are complicated is the
optional metrics we collect: input, output, shuffle read, and shuffle write.
I think we can always track them and assign 0 as the initial value. It is
usually obvious whether a task is supposed to read any data or not. By always
tracking them, we can remove many map, foreach, flatMap, and getOrElse(0L)
calls throughout Spark.
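To illustrate the simplification, here is a minimal sketch that contrasts an optional metric with one that is always tracked. The classes `OptionalTaskMetrics`, `AlwaysTrackedTaskMetrics`, and `MetricsSketch`, and the simplified `ShuffleReadMetrics` below, are hypothetical stand-ins written for this example; they are not the code in this patch.

```scala
// Hypothetical, simplified stand-in; not Spark's actual ShuffleReadMetrics.
final class ShuffleReadMetrics {
  private var _recordsRead: Long = 0L // always starts at 0
  def recordsRead: Long = _recordsRead
  def incRecordsRead(n: Long): Unit = { _recordsRead += n }
}

// Before (assumed shape): the metric may be absent, so callers must unwrap it.
final class OptionalTaskMetrics {
  var shuffleReadMetrics: Option[ShuffleReadMetrics] = None
}

// After (assumed shape): the metric always exists with an initial value of 0.
final class AlwaysTrackedTaskMetrics {
  val shuffleReadMetrics: ShuffleReadMetrics = new ShuffleReadMetrics
}

object MetricsSketch {
  // Caller code with an optional metric needs map + getOrElse(0L).
  def recordsReadBefore(m: OptionalTaskMetrics): Long =
    m.shuffleReadMetrics.map(_.recordsRead).getOrElse(0L)

  // Caller code with an always-tracked metric is a plain field access.
  def recordsReadAfter(m: AlwaysTrackedTaskMetrics): Long =
    m.shuffleReadMetrics.recordsRead
}
```

Both helpers return 0 for a task that read nothing, but only the first one has to handle the missing case explicitly.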
This patch also changes a few behaviors.
1. Removes the distinction between data read/write methods (e.g. Hadoop,
Memory, Network, etc).
2. Accumulates all data reads and writes, rather than only those of the first
method. (Fixes SPARK-5225)
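The second behavior change can be sketched as follows, assuming a single counter that sums every read regardless of where the data came from. `InputMetrics` here is a simplified, hypothetical stand-in, and the source labels are illustrative only.

```scala
// Hypothetical, simplified stand-in; not Spark's actual InputMetrics.
final class InputMetrics {
  private var _bytesRead: Long = 0L
  def bytesRead: Long = _bytesRead
  def incBytesRead(n: Long): Unit = { _bytesRead += n }
}

object AccumulateSketch {
  // Each pair is (illustrative source label, bytes read).
  // The label is ignored: every read is added to the same counter,
  // so reads after the first method are no longer dropped.
  def totalBytesRead(reads: Seq[(String, Long)]): Long = {
    val metrics = new InputMetrics
    reads.foreach { case (_, bytes) => metrics.incBytesRead(bytes) }
    metrics.bytesRead
  }
}
```

For example, `AccumulateSketch.totalBytesRead(Seq(("hadoop", 100L), ("memory", 50L)))` sums both reads instead of reporting only the first one.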
## How was this patch tested?
Existing tests.
This is based on https://github.com/apache/spark/pull/12388, with more test
fixes.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark metrics-refactor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12417.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12417
----
commit 997f1e1ebdf1f376edd9ade8fe44cbea57a706b5
Author: Reynold Xin <[email protected]>
Date: 2016-04-14T07:51:14Z
Always track ShuffleReadMetrics (i.e. not an option)
commit 876b471447981faaf29860a46191af726d4b6777
Author: Reynold Xin <[email protected]>
Date: 2016-04-14T22:34:27Z
first round commits
commit b27b97b2bcdc9906a0a1a6d9f33a07865bc427d8
Author: Reynold Xin <[email protected]>
Date: 2016-04-14T22:47:40Z
mima and remove more options
commit 4a4a8bffc3712140f619c16e222b97e3e86a52de
Author: Reynold Xin <[email protected]>
Date: 2016-04-14T22:51:13Z
remove more options in StatsReportListener
commit 3cac11fee7c3af5200c1decb47d501edd68ee397
Author: Reynold Xin <[email protected]>
Date: 2016-04-14T23:04:49Z
fix comment
commit b0493f1e2845e64a7644e7792d3d1c40c2ca9870
Author: Reynold Xin <[email protected]>
Date: 2016-04-15T00:54:14Z
fix one test problem
commit 7791215509e9d48c822bcd392cdbb7bea9f8647e
Author: Reynold Xin <[email protected]>
Date: 2016-04-15T00:56:09Z
fix more test cases
commit fa78b5e359b239d919b5728970a1467a674dee99
Author: Reynold Xin <[email protected]>
Date: 2016-04-15T01:17:50Z
delete a failing test case since it is invalid now
commit cbc154f441e37263651c3f074ab7730636ba5115
Author: Wenchen Fan <[email protected]>
Date: 2016-04-15T13:03:02Z
fix all tests
----