GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/12417

    [SPARK-14628][CORE] Simplify task metrics by always tracking read/write metrics

    ## What changes were proposed in this pull request?
    
    Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect, including input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of map, foreach, flatMap, and getOrElse(0L) calls throughout Spark.
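To illustrate the caller-side simplification, here is a minimal Scala sketch. The classes `OldTaskMetrics`, `NewTaskMetrics`, and `ShuffleReadMetrics`, and the helpers `registerShuffleRead`, `oldRecordsRead`, and `newRecordsRead`, are hypothetical, heavily simplified stand-ins assumed for illustration; they are not Spark's actual TaskMetrics API.

```scala
// Hypothetical, simplified model; NOT Spark's real TaskMetrics API.
class ShuffleReadMetrics {
  // Starts at 0, so a task that reads no shuffle data simply reports 0.
  var recordsRead: Long = 0L
  def incRecordsRead(n: Long): Unit = recordsRead += n
}

// Before: the metric is an Option, created only once a task reads shuffle data.
class OldTaskMetrics {
  var shuffleReadMetrics: Option[ShuffleReadMetrics] = None
  // Reuses the existing metric if present, otherwise creates and stores one.
  def registerShuffleRead(): ShuffleReadMetrics = {
    val m = shuffleReadMetrics.getOrElse(new ShuffleReadMetrics)
    shuffleReadMetrics = Some(m)
    m
  }
}

// After: the metric always exists and is initialized to 0.
class NewTaskMetrics {
  val shuffleReadMetrics: ShuffleReadMetrics = new ShuffleReadMetrics
}

// Old-style callers must unwrap the Option and supply a default.
def oldRecordsRead(tm: OldTaskMetrics): Long =
  tm.shuffleReadMetrics.map(_.recordsRead).getOrElse(0L)

// New-style callers read the field directly.
def newRecordsRead(tm: NewTaskMetrics): Long =
  tm.shuffleReadMetrics.recordsRead

val oldTm = new OldTaskMetrics
oldTm.registerShuffleRead().incRecordsRead(5L)
println(oldRecordsRead(oldTm)) // prints 5

val newTm = new NewTaskMetrics
newTm.shuffleReadMetrics.incRecordsRead(5L)
println(newRecordsRead(newTm)) // prints 5
```

The trade-off is that a task that never reads shuffle data reports 0 rather than "no metric", which is usually unambiguous since it is obvious whether the task was supposed to read data.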
    
    This patch also changes a few behaviors.
    
    1. Remove the distinction between data read/write methods (e.g. Hadoop, Memory, Network).
    2. Accumulate all data reads and writes, rather than only those made through the first method. (Fixes SPARK-5225)
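A toy model of behavior change 2, assuming a task that reads part of its input through one method and the rest through another (for example, partly from Hadoop and partly from blocks cached in memory). `ReadMethod`, `Hadoop`, `Memory`, `OldInputMetrics`, and `NewInputMetrics` are hypothetical names used only for this sketch, not Spark classes.

```scala
// Hypothetical, simplified model of the behavior change; NOT Spark's real API.
trait ReadMethod
case object Hadoop extends ReadMethod
case object Memory extends ReadMethod

// Before: the first read method seen "wins"; reads through any other method are dropped.
class OldInputMetrics {
  private var method: Option[ReadMethod] = None
  var bytesRead: Long = 0L
  def record(m: ReadMethod, bytes: Long): Unit = {
    if (method.isEmpty) method = Some(m)
    if (method.contains(m)) bytesRead += bytes
  }
}

// After: no per-method distinction; every read is accumulated.
class NewInputMetrics {
  var bytesRead: Long = 0L
  def record(bytes: Long): Unit = bytesRead += bytes
}

val before = new OldInputMetrics
before.record(Hadoop, 100L)
before.record(Memory, 50L)
println(before.bytesRead) // prints 100: the Memory read is not counted

val after = new NewInputMetrics
after.record(100L)
after.record(50L)
println(after.bytesRead) // prints 150
```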
    
    
    ## How was this patch tested?
    
    Existing tests.
    
    This is based on https://github.com/apache/spark/pull/12388, with more test fixes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark metrics-refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12417
    
----
commit 997f1e1ebdf1f376edd9ade8fe44cbea57a706b5
Author: Reynold Xin <[email protected]>
Date:   2016-04-14T07:51:14Z

    Always track ShuffleReadMetrics (i.e. not an option)

commit 876b471447981faaf29860a46191af726d4b6777
Author: Reynold Xin <[email protected]>
Date:   2016-04-14T22:34:27Z

    first round commits

commit b27b97b2bcdc9906a0a1a6d9f33a07865bc427d8
Author: Reynold Xin <[email protected]>
Date:   2016-04-14T22:47:40Z

    mima and remove more options

commit 4a4a8bffc3712140f619c16e222b97e3e86a52de
Author: Reynold Xin <[email protected]>
Date:   2016-04-14T22:51:13Z

    remove more options in StatsReportListener

commit 3cac11fee7c3af5200c1decb47d501edd68ee397
Author: Reynold Xin <[email protected]>
Date:   2016-04-14T23:04:49Z

    fix comment

commit b0493f1e2845e64a7644e7792d3d1c40c2ca9870
Author: Reynold Xin <[email protected]>
Date:   2016-04-15T00:54:14Z

    fix one test problem

commit 7791215509e9d48c822bcd392cdbb7bea9f8647e
Author: Reynold Xin <[email protected]>
Date:   2016-04-15T00:56:09Z

    fix more test cases

commit fa78b5e359b239d919b5728970a1467a674dee99
Author: Reynold Xin <[email protected]>
Date:   2016-04-15T01:17:50Z

    delete a failing test case since it is invalid now

commit cbc154f441e37263651c3f074ab7730636ba5115
Author: Wenchen Fan <[email protected]>
Date:   2016-04-15T13:03:02Z

    fix all tests

----