GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/10835

    [SPARK-12895] Implement TaskMetrics with accumulators

    This was a part of #10717. It was split into its own patch so it's more 
reviewable.
    
    The idea in this issue is to implement `TaskMetrics` using accumulators 
internally so in the future (SPARK-12896) we can just send accumulator updates 
from the executors to the driver instead of BOTH accumulator updates AND 
`TaskMetrics`, which is what we do today.
    
    **As of this patch, accumulators are now sent both ways between drivers and 
executors.** Before this patch, accumulators were only sent from the driver to 
the executors. I intend to restore the original behavior in SPARK-12896, for 
which I already have changes ready (as shown in #10717). I'm introducing this 
temporary change in behavior only to break the patches down so they are more 
approachable to reviewers.
    
    As a result, there are a few TODO's in the code that describe these 
dangling behaviors, notably:
    
    - `Accumulable#_value` is no longer transient because the driver needs 
`TaskMetrics` values from the executors
    - Things like `acc.value` and `acc = 10` are now allowed on the executors. 
Previously we detected whether we're on the executors or on the driver through 
`readObject`, but we can't do that anymore.
    - We have to explicitly set an accumulator to `acc.zero` in 
`TaskContextImpl`, because we can't do that in `readObject` anymore.
    
    All of these are temporary and will no longer be the case once we do 
SPARK-12896.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark task-metrics-use-accums

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10835
    
----
commit 42ca72de7031281119877ada77d8e734a7f45028
Author: Andrew Or <[email protected]>
Date:   2016-01-18T22:55:44Z

    Change types of some signatures

commit 2c62000769e21b0f206b637ffefeb2671780fa9b
Author: Andrew Or <[email protected]>
Date:   2016-01-18T23:02:14Z

    Boiler plate for all the new internal accums

commit 11677227827735ee4999d4c8d536f17391c4ffac
Author: Andrew Or <[email protected]>
Date:   2016-01-19T00:34:16Z

    Squashed commit of the following:
    
    commit 269031f162cbce031efd4cdce55908f46569a8c8
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 16:33:12 2016 -0800
    
        Remove unused method
    
    commit c04b5df944e32d6854ab5ed4a282b77df889d481
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 16:13:08 2016 -0800
    
        Review comments
    
    commit d2e4e23be82a0afb2f39d629ee7413591bc08c8d
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 14:42:19 2016 -0800
    
        One more
    
    commit 202d48e5ceab044e941f8f5a2d866982a0072637
    Merge: e99b9af 4f11e3f
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 14:27:47 2016 -0800
    
        Merge branch 'master' of github.com:apache/spark into 
get-or-create-metrics
    
    commit e99b9af23b8135b312bc4a968ba9d2ba7d71e127
    Merge: 34c7ce5 b8cb548
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 13:56:41 2016 -0800
    
        Merge branch 'master' of github.com:apache/spark into 
get-or-create-metrics
    
        Conflicts:
                core/src/main/scala/org/apache/spark/CacheManager.scala
                
core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala
                core/src/test/scala/org/apache/spark/CacheManagerSuite.scala
    
    commit 34c7ce5bf724c781a37c352277f7c5cd86d33c9a
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 12:46:42 2016 -0800
    
        Hide updatedBlocks
    
    commit ad094f071472b9cf7b9f9bdb7cd00d88c402995d
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 12:30:59 2016 -0800
    
        Clean up JsonProtocol
    
        This commit collapsed 10 methods into 2. The 8 that were inlined
        were only used in 1 place each, and the body of each was quite
        small. The additional level of abstraction did not add much value
        and made the code verbose.
    
    commit 078598409225224f0532a45f34dae533695b25df
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 12:20:28 2016 -0800
    
        Replace set with register
    
        JsonProtocol remains the only place where we still call set
        on each of the *Metrics classes.
    
    commit b9d7fbf37cc410d44e462d9d08650a20decc8fc9
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 12:10:17 2016 -0800
    
        Clean up places where we set OutputMetrics
    
        Note: there's one remaining place, which is JsonProtocol.
    
    commit 62c96e1cdc472356dfbfb24cf9650a8f36017224
    Author: Andrew Or <[email protected]>
    Date:   Mon Jan 18 11:50:04 2016 -0800
    
        Add register* methods (get or create)

commit 144df46ba989e507470c9a5026874b15b34792ad
Author: Andrew Or <[email protected]>
Date:   2016-01-19T01:06:13Z

    Implement TaskMetrics using Accumulators

commit 5ec17c1020f8987d801791d7e091d53b675a461b
Author: Andrew Or <[email protected]>
Date:   2016-01-19T01:26:08Z

    Fix accums not set on the driver
    
    Previously we would always zero out an accumulator when we
    deserialize it. Certainly we don't want to do that on the driver.
    The changes in this commit are temporary and will be reverted
    in SPARK-12896.

commit 362cde55486c51cc1644299626f71c60f4aab820
Author: Andrew Or <[email protected]>
Date:   2016-01-19T01:35:10Z

    Fix test compile
    
    Tests don't pass yet, obviously...

commit e43e8be0fbbb71e27485a7b4167389d331a9e95a
Author: Andrew Or <[email protected]>
Date:   2016-01-19T02:11:14Z

    Make accum updates read from all registered accums
    
    + fix tests, which are still failing

commit 2330a377a3268d17b7dad60965baf6f4e3d97ec6
Author: Andrew Or <[email protected]>
Date:   2016-01-19T18:55:24Z

    Fix metrics being double counted on driver
    
    ... by setting it to zero on the executors, always.

commit cca87cc63a9bef485aa1736df4ff73ced336ddf6
Author: Andrew Or <[email protected]>
Date:   2016-01-19T19:09:53Z

    Merge branch 'master' of github.com:apache/spark into 
task-metrics-use-accums
    
    Conflicts:
        core/src/main/scala/org/apache/spark/CacheManager.scala
        core/src/main/scala/org/apache/spark/executor/Executor.scala
        core/src/main/scala/org/apache/spark/executor/ShuffleWriteMetrics.scala
        core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
        
core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
        core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
        
core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala
        core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala

commit 4ead1ba25bf73bf5dda6a96e2aca32ce49787f9b
Author: Andrew Or <[email protected]>
Date:   2016-01-19T19:45:59Z

    Miscellaneous updates; make diff smaller

commit 0f40753ae932f31f3576c1046368cbfeea942975
Author: Andrew Or <[email protected]>
Date:   2016-01-19T19:55:17Z

    Fix JsonProtocolSuite
    
    Also add some useful error printing if things don't match.

commit 76b605cc7d4fb5f7e0bcd3337cdd688e82b6d766
Author: Andrew Or <[email protected]>
Date:   2016-01-19T19:58:42Z

    Fix SQLQuerySuite

commit 7b5d840cda21c74b1f50f36d62aa19f096958b46
Author: Andrew Or <[email protected]>
Date:   2016-01-19T20:03:07Z

    Fix style

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to