GitHub user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1639#issuecomment-50581291
In case you don't see the hidden comment above: I don't think mapPartitions
would hurt performance here. All you're doing is passing the parent's iterator through.
When you call compute() you're already deserializing the RDD, so this won't
create extra work in that case.
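To make the "pass-through" argument concrete, here is a minimal Python model. It is not Spark's API: `deserialize`, `compute_parent`, and `map_partitions` are hypothetical stand-ins. Assume the parent's compute() lazily deserializes each stored record and mapPartitions simply hands that iterator to a user function. If the function returns the iterator unchanged, each record is still deserialized exactly once, the same count as iterating the parent directly.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical counter standing in for per-record deserialization cost.
deserialize_calls = 0


def deserialize(raw: bytes) -> str:
    """Hypothetical stand-in for deserializing one stored record."""
    global deserialize_calls
    deserialize_calls += 1
    return raw.decode("utf-8")


def compute_parent(partition: List[bytes]) -> Iterator[str]:
    """Models the parent's compute(): lazily yields deserialized records."""
    for raw in partition:
        yield deserialize(raw)


def map_partitions(parent: Iterator[str],
                   f: Callable[[Iterator[str]], Iterable[str]]) -> Iterator[str]:
    """Models mapPartitions: passes the parent's iterator to f, adding no buffering."""
    return iter(f(parent))


partition = [b"a", b"b", b"c"]

# Iterate the parent directly.
direct = list(compute_parent(partition))
direct_calls = deserialize_calls

# Reset the counter, then go through a pass-through mapPartitions.
deserialize_calls = 0
passed = list(map_partitions(compute_parent(partition), lambda it: it))
passed_calls = deserialize_calls

print(direct == passed, direct_calls, passed_calls)  # True 3 3
```

Because the identity function returns the same lazy iterator, the wrapper adds no extra pass over the data; the deserialization work is the same either way.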