GitHub user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4067#issuecomment-72428547
  
    @rxin yeah that seems good.
    
    @ksakellis one other thing I realized that is a little confusing: right
    now we report the bytes as "read" from a cached RDD as soon as they are
    fetched/present on the executor, even if they have not yet been consumed
    by the task. Tracking consumption incrementally (in bytes) would be
    really hard, so maybe this is the best answer for now. It's a bit weird,
    though - I think this logic was written before we sent incremental
    updates back. In terms of getting the total _records_ read, we might
    need to assume that the iterator is consumed in its entirety.
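
    For example, one way to count records only as they are actually
    consumed would be to wrap the iterator in a counting shim (just a
    sketch - `CountingIterator` and `recordsRead` are illustrative names,
    not our actual metrics plumbing):

    ```scala
    // Sketch only: counts records as the task actually pulls them, so an
    // iterator that is abandoned early is not over-counted.
    class CountingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
      private var _recordsRead: Long = 0L
      def recordsRead: Long = _recordsRead

      override def hasNext: Boolean = underlying.hasNext

      override def next(): T = {
        val record = underlying.next()
        _recordsRead += 1  // incremented only on actual consumption
        record
      }
    }
    ```

    With something like this the count is exact when the iterator is fully
    consumed and simply stops early otherwise, which would sidestep the
    full-consumption assumption for records (though not for bytes).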

