Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/11105#issuecomment-197550485
  
    just thinking aloud here -- it seems like the implementation is significantly complicated by trying to support counters when partitions are only partially read, e.g. with `take()`. Is it really that meaningful to look at these counters after those operations, given that the user rarely cares whether a partition has been read fully?
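
    For concreteness, here's a rough sketch of the partial-read case (the accumulator, input, and sizes are made up purely for illustration):

    ```scala
    // made-up illustration: take() only runs as much of the RDD as it needs,
    // so the counter reflects a partial, hard-to-interpret count
    val acc = sc.accumulator(0)
    val mapped = sc.parallelize(1 to 1000, 10).map { x => acc += 1; x }
    mapped.take(5)      // consumes only the first few elements of one partition
    println(acc.value)  // some value well below 1000
    ```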
    
    Or is the whole point of this just to make sure that with caching plus a subsequent full materialization of the RDD, you get sensible values? E.g. something like:
    
    ```scala
    val acc = sc.accumulator(0)                    // counts every element the map sees
    val input = sc.parallelize(1 to 1000, 10)
    val myRdd = input.map { x => acc += 1; x * 2 }
    myRdd.cache()
    myRdd.take(N)       // N big enough to read one partition completely
    myRdd.count()       // materializes the rest of the RDD
    println(acc.value)  // now that you've read the entire rdd, the value must be consistent
    ```
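
    (Here "consistent" would mean `acc` ends up equal to the total number of elements, without the partition read fully by `take()` being counted a second time when `count()` runs.)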

