Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-55626881
@aaronjosephs The binary search is a good idea, although I think there are
a few subtleties involved in getting it to work generally. Imagine that I call
sortByKey() on an RDD and then perform a transformation that preserves
sortedness (e.g. mapValues() or a regular filter()). In these cases, it would
be nice to recognize that the RDD is still sorted. For partitioners, we have
flags like `preservesPartitioning` for tracking which operations preserve the
space of keys in a partition, so it might be nice to add something similar for
other properties, such as sortedness, distinctness, etc.
Personally, I think that is a larger design challenge, and it is probably
worth deferring to a separate PR.
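
To make the idea concrete, here is a minimal sketch in plain Python. It is not Spark code: `TrackedPairs`, its `is_sorted` flag, and all method names are hypothetical stand-ins for a key-value RDD, chosen only to illustrate how a sortedness flag could be propagated in the same spirit as `preservesPartitioning`. Transformations that cannot reorder elements or change keys carry the flag forward, an arbitrary `map` resets it, and `lookup` uses a binary search only when the flag is set.

```python
class TrackedPairs:
    """Toy stand-in for a key-value RDD that remembers whether it is sorted by key."""

    def __init__(self, items, is_sorted=False):
        self.items = list(items)
        self.is_sorted = is_sorted

    def sort_by_key(self):
        # Python's sort is stable, so equal keys keep their relative order.
        return TrackedPairs(sorted(self.items, key=lambda kv: kv[0]), True)

    def map_values(self, f):
        # Keys and element order are untouched, so sortedness is preserved.
        return TrackedPairs([(k, f(v)) for k, v in self.items], self.is_sorted)

    def filter(self, pred):
        # Dropping elements never reorders the survivors.
        return TrackedPairs([kv for kv in self.items if pred(kv)], self.is_sorted)

    def map(self, f):
        # An arbitrary function may change keys, so assume nothing afterwards.
        return TrackedPairs([f(kv) for kv in self.items], False)

    def lookup(self, key):
        """Return all values stored under `key`."""
        if not self.is_sorted:
            # Fallback: linear scan over every element.
            return [v for k, v in self.items if k == key]
        # Binary search for the first position whose key is >= `key`.
        lo, hi = 0, len(self.items)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.items[mid][0] < key:
                lo = mid + 1
            else:
                hi = mid
        # Collect the run of equal keys starting at that position.
        found = []
        while lo < len(self.items) and self.items[lo][0] == key:
            found.append(self.items[lo][1])
            lo += 1
        return found


pairs = TrackedPairs([(3, "c"), (1, "a"), (2, "b"), (2, "bb")])
kept = pairs.sort_by_key().map_values(str.upper).filter(lambda kv: kv[0] != 3)
print(kept.is_sorted, kept.lookup(2))
```

In this sketch the flag is deliberately conservative: any operation that is not known to preserve order drops it, which costs a linear scan but never returns a wrong answer. A real design would also have to cover properties such as distinctness and decide how the flag interacts with repartitioning.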