Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22524
@viirya thanks for adding the explanation! I think it's very clear and
helpful. Reading it gave me a new idea.
It seems to me that the main purpose of limit is to make the upstream
operators stop producing data early, so the code template should look like:
```
while (iterator.hasNext() && !stopEarly()) {
  // upstream operators
  ...
  if (count < given_limit) {
    count += 1;
    consume... // downstream operators
  } else {
    setStopEarly(true);
  }
  ...
}
```
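For illustration, here is a minimal, self-contained Scala sketch of that template. It is a hand-written analogue of the generated loop, not actual codegen output; the iterator, `givenLimit`, and the local `stopEarly` flag are stand-ins for what the generated code would reference:

```scala
// Hand-written analogue of the loop template above: once the limit is
// reached, a flag is set and the upstream iterator stops being consumed.
object LimitLoopSketch {
  def main(args: Array[String]): Unit = {
    val iterator   = (1 to 100).iterator // stands in for the upstream operators' output
    val givenLimit = 5                   // the LIMIT value
    var count      = 0
    var stopEarly  = false               // plays the role of stopEarly()/setStopEarly(true)
    val consumed   = scala.collection.mutable.ArrayBuffer.empty[Int]

    while (iterator.hasNext && !stopEarly) {
      val row = iterator.next()          // upstream operators produce a row
      if (count < givenLimit) {
        count += 1
        consumed += row                  // "consume": hand the row to downstream operators
      } else {
        stopEarly = true                 // loop condition sees this and exits
      }
    }
    println(consumed)                    // prints: ArrayBuffer(1, 2, 3, 4, 5)
  }
}
```

Note that with this shape the upstream operators still produce one extra row after the limit is hit (the row that trips the `else` branch); the benefit is that the check sits in the loop condition, so the whole pipeline stops on the very next iteration.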