[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

pwendell Thu, 24 Jul 2014 00:23:28 -0700

Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1056#issuecomment-49975385
  
    I was actually a bit confused why `updateShuffleReadMetrics` is 
synchronized. Can that be called from multiple threads as-is? I wasn't aware of 
cases where we had multi-threaded access to `TaskMetrics` prior to this patch.
    
    To your point, there are really only two fields here that are not just 
commutative counters - `hostname` and `shuffleFinishTime`. Is there is a race 
where `hosntame` gets sent over as null in a heartbeat and it triggers an NPE 
downstream? Maybe we should set `shuffleFinishTime` to -1 by default to 
indicate it's not populated.
    
    My motivation for adding extra typing was to make it explicit for people 
extending TaskMetrics in the future. We already have some proposals for 
augmenting the `TaskMetrics` class and some of them add more complicated 
fields. Perhaps an alternative is just to document clearly in the class that we 
might take a snapshot of the TaskMetrics at any time and send it to consumers 
of the SparkListener interface.
    
    Over time, I'd actually hope these get re-implemented as accumulators...




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

Reply via email to