Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/2087#issuecomment-57236118
  
    > The current approach couples the updating of this metric with the heartbeats in a way that seems strange.
    
    The heartbeats (and task completion, which, my bad, I still need to add) are the only times we actually use the value of this metric. Is there an advantage to adding complexity to keep it more up to date than that? We'd also be adding an extra branch on the read path, which I suppose might not be much compared with the crazy stuff Hadoop record readers do, but could still be a small perf hit. Lastly, in the (rare) case where we're reading a single huge record, we wouldn't get incremental measurements within it anyway.
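
    To make the tradeoff concrete, here's a rough sketch of the two options. All the names here (`InputMetrics`, `bytesReadCallback`, etc.) are illustrative, not the actual Spark internals: the eager version pays a callback invocation on every record of the hot read path, while the lazy version only polls when a heartbeat or task completion asks for the value.

    ```scala
    // Sketch only -- made-up names, not Spark's real classes.
    class InputMetrics {
      @volatile var bytesRead: Long = 0L
    }

    // Eager: update the metric on every record. The extra call (and
    // volatile write) lands on the hot read path.
    class EagerReadIterator[T](
        underlying: Iterator[T],
        metrics: InputMetrics,
        bytesReadCallback: () => Long) extends Iterator[T] {
      override def hasNext: Boolean = underlying.hasNext
      override def next(): T = {
        val record = underlying.next()
        metrics.bytesRead = bytesReadCallback() // per-record overhead
        record
      }
    }

    // Lazy: keep only the callback, and sample it when the executor
    // builds a heartbeat (or when the task completes).
    class LazyInputMetrics(bytesReadCallback: () => Long) {
      def snapshotForHeartbeat(): Long = bytesReadCallback()
    }
    ```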
    
    We use a similar approach for shuffleReadMetrics: each reader keeps its own counters, and we aggregate them across readers right before sending them to the driver.
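
    For reference, a minimal sketch of that aggregate-before-send pattern (again with hypothetical names): the per-reader counters are only folded together at the point where the heartbeat payload is built.

    ```scala
    // Hypothetical per-reader counters, merged right before sending.
    case class ShuffleReadMetrics(recordsRead: Long, remoteBytesRead: Long)

    def mergeForHeartbeat(perReader: Seq[ShuffleReadMetrics]): ShuffleReadMetrics =
      perReader.foldLeft(ShuffleReadMetrics(0L, 0L)) { (acc, m) =>
        ShuffleReadMetrics(
          acc.recordsRead + m.recordsRead,
          acc.remoteBytesRead + m.remoteBytesRead)
      }
    ```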

