GitHub user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1254#issuecomment-47420288
My reasoning is that most use cases (or at least the ones I had in mind)
are something like rdd.drop(n), where n is much smaller than rdd.count(),
generally 1 or some other small number. FWIW, I implemented it via an
implicit object, so it's not directly on the RDD class per se. Another way to
look at it: these functions are no worse than rdd.take(), since they use
similar logic.
However, it's true that if n is a large fraction of the size of the RDD,
then the call will trigger computation of a large fraction of the partitions.
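
To make the cost argument concrete, here is a rough plain-Scala model, not the code from the pull request and not Spark API. It assumes an "RDD" can be modeled as a Vector of partitions, and the names DropSketch, DropOps and dropCounting are hypothetical. It only shows how the number of partitions that must be evaluated grows with n, and how an implicit class can add the method without touching the collection type itself.

```scala
// Plain-Scala model only: a partitioned dataset is a Vector of partitions.
// Nothing here is Spark API; all names are hypothetical.
object DropSketch {

  // Enrichment via an implicit class, so the method is not defined
  // on the underlying collection type itself.
  implicit class DropOps[T](parts: Vector[Vector[T]]) {

    // Drops the first n elements and also reports how many partitions
    // had to be evaluated to find where the first n elements end.
    def dropCounting(n: Int): (Vector[Vector[T]], Int) = {
      var remaining = math.max(n, 0)
      var scanned = 0
      val out = parts.map { p =>
        if (remaining == 0) {
          p // boundary already found: partition passes through unevaluated
        } else {
          scanned += 1
          val k = math.min(remaining, p.size)
          remaining -= k
          p.drop(k)
        }
      }
      (out, scanned)
    }
  }

  def main(args: Array[String]): Unit = {
    val parts = Vector(Vector(1, 2, 3), Vector(4, 5, 6), Vector(7, 8, 9))
    println(parts.dropCounting(1)._2) // small n: 1 partition evaluated
    println(parts.dropCounting(7)._2) // large n: 3 partitions evaluated
  }
}
```

With n = 1 only the first partition is evaluated, which is the common case described above. With n = 7 out of 9 elements, all three partitions are evaluated.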