Github user squito commented on the pull request:
https://github.com/apache/spark/pull/11105#issuecomment-197550485
just thinking aloud here -- it seems like the implementation is complicated
significantly by trying to support counters when you only partially read
partitions, e.g. with `take()`. Is it really that meaningful to look at
these counters after those operations, given that the user rarely cares
whether a partition has been read fully?
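For concreteness, here is a minimal sketch of what I mean by the partial-read case (all the setup here is assumed, not from this PR, using the old `sc.accumulator` API to match the snippet below). Because the map is pipelined, `take()` only pulls a handful of elements through it, so the counter reflects however much of the partition happened to be read:
```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("partial-read-sketch").setMaster("local[2]"))
val acc = sc.accumulator(0)
val rdd = sc.parallelize(1 to 100, 4).map { x => acc += 1; x * 2 }

rdd.take(5)        // pipelined: only ~5 elements actually get mapped
println(acc.value) // reflects a partial read, not a meaningful count
```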
Or is the whole point of this just to make sure that if there is caching +
subsequent full RDD materialization, you get sensible values? e.g. something
like:
```scala
val myRdd = input.map { x => acc += 1; x * 2 }
myRdd.cache()
myRdd.take(N) // N big enough to read one partition completely
myRdd.count()
println(acc.value) // now that you've read the entire rdd, the value must be consistent
```