Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/931#issuecomment-44909735
Hey @xiajunluan, this is a good start, but I made some comments throughout.
There are a few other questions, though:
- Performance: have you benchmarked this against the old version for
non-sorting use cases? We need to make sure the pluggable Comparator doesn't
slow those cases down.
- Long-term, it would be good to spill values even within a key for sort,
i.e. don't use an ArrayBuffer as the combiner, just insert each value as its
own entry. But this probably can't be done easily in this patch.
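
To illustrate the second point, here is a rough Python sketch of the idea, not Spark's implementation. It buffers each (key, value) pair as its own entry and spills sorted runs once a size limit is hit, so a single large key can be split across runs instead of growing one per-key buffer. The names `sort_with_spilling` and `max_in_memory` are hypothetical, and the in-memory `runs` list only stands in for spill files on disk.

```python
import heapq
from operator import itemgetter


def sort_with_spilling(pairs, max_in_memory=4):
    """Yield (key, value) pairs ordered by key, buffering at most
    max_in_memory pairs at a time (hypothetical helper, for illustration)."""
    runs = []    # stand-in for sorted spill files on disk
    buffer = []  # holds individual pairs, not one growing list per key
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= max_in_memory:
            # "Spill": sort what we have and start a fresh buffer.
            runs.append(sorted(buffer, key=itemgetter(0)))
            buffer = []
    if buffer:
        runs.append(sorted(buffer, key=itemgetter(0)))
    # Lazily merge the sorted runs; values of the same key come out adjacent.
    return heapq.merge(*runs, key=itemgetter(0))
```

With `max_in_memory=4`, ten values for one key end up in three separate runs, and the merge still returns them next to each other, so a downstream consumer can stream through a key's values without holding them all at once.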