GitHub user mateiz commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37879126
That approach does look better, though there seem to be some bugs in the
code (e.g. compute() always works on partitions(0), and the code doesn't
handle the case where many partitions are empty and you need to look ahead
more than one partition).
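To illustrate the two issues above, here is a minimal sketch. It assumes, hypothetically, an operation where each partition needs the first few elements that follow it (for example a sliding window). Plain Python lists stand in for RDD partitions, and the names `following_elements` and `sliding_windows` are made up for illustration. This is not the code from the PR or the Spark API. The backward pass carries the "next k elements" through any run of empty partitions, and each partition is processed using its own index rather than always `partitions[0]`.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def following_elements(partitions: Sequence[Sequence[T]], k: int) -> List[List[T]]:
    """For each partition i, return up to k elements that come after it,
    looking ahead across any number of empty partitions."""
    result: List[List[T]] = [[] for _ in partitions]
    carry: List[T] = []  # first k elements after the partition being visited
    # Walk backwards so an empty partition simply passes `carry` through.
    for i in range(len(partitions) - 1, -1, -1):
        result[i] = carry
        carry = (list(partitions[i][:k]) + carry)[:k]
    return result


def sliding_windows(partitions: Sequence[Sequence[T]], size: int) -> List[Tuple[T, ...]]:
    """Hypothetical sliding window of `size` elements over all partitions."""
    if size < 1:
        raise ValueError("size must be >= 1")
    tails = following_elements(partitions, size - 1)
    windows: List[Tuple[T, ...]] = []
    # Use each partition's own index i (not always partition 0).
    for i, part in enumerate(partitions):
        extended = list(part) + tails[i]
        # Windows are owned by the partition holding their first element.
        for start in range(len(part)):
            if start + size <= len(extended):
                windows.append(tuple(extended[start:start + size]))
    return windows


if __name__ == "__main__":
    parts = [[1, 2], [], [], [3], [], [4, 5]]
    print(sliding_windows(parts, 3))  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```

In a real distributed version the per-partition prefixes would have to be gathered in a separate pass before the main computation, which is part of why the operation is tricky in general.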
This operation is kind of tricky in general, so it may be worth doing it
just in MLlib at first.