GitHub user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/3632#issuecomment-68069421
The reason for separate classes is to cleanly segregate the
available/supportable functionality. Not every `PairRDD` has keys that can be
ordered, so `sortByKey` shouldn't be part of `PairRDD`. When keys can be
ordered, there is often a natural ordering that is already implicitly in scope.
When that is true, then we don't want to force the user to explicitly provide
an `Ordering` -- e.g. if you have an `RDD[(Int, Foo)]`, then `rdd.sortByKey()`
should just work. If you want a different Ordering, then you just need to
bring a new implicit Ordering for that key type into scope.
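
To make the separation concrete, here is a minimal sketch of the pattern using a plain `Seq[(K, V)]` as a stand-in for an RDD, so it runs without Spark. The names `PairOps`, `OrderedPairOps`, `keysOnly`, and `Opaque` are hypothetical and exist only for illustration; this is not Spark's actual code.

```scala
object OrderedPairsSketch {
  // Stand-in for the `PairRDD`-style methods: available for any key type.
  implicit class PairOps[K, V](self: Seq[(K, V)]) {
    def keysOnly: Seq[K] = self.map(_._1)
  }

  // Stand-in for the `OrderedRDD`-style methods: the context bound means this
  // enrichment applies only when an implicit Ordering[K] can be found.
  implicit class OrderedPairOps[K: Ordering, V](self: Seq[(K, V)]) {
    def sortByKey(): Seq[(K, V)] = self.sortBy(_._1)
  }

  // A hypothetical key type for which no Ordering is defined.
  final class Opaque(val id: Int)

  val pairs: Seq[(Int, String)] = Seq(3 -> "c", 1 -> "a", 2 -> "b")

  // The natural Ordering[Int] is picked up implicitly; nothing to pass.
  val ascending: Seq[(Int, String)] = pairs.sortByKey()

  // A different ordering: bring a new implicit Ordering[Int] into local scope.
  val descending: Seq[(Int, String)] = {
    implicit val reverseInts: Ordering[Int] = Ordering.Int.reverse
    pairs.sortByKey()
  }

  // Unordered keys still get the pair methods, but not sortByKey.
  val opaqueKeys: Seq[Opaque] = Seq(new Opaque(1) -> "x").keysOnly
  // Seq(new Opaque(1) -> "x").sortByKey()  // does not compile: no Ordering[Opaque]
}
```

Here `ascending` holds the pairs in key order 1, 2, 3 and `descending` in key order 3, 2, 1, without `sortByKey` ever appearing on the class that serves unordered keys.
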
Things aren't as cleanly separated in the Java API because of the lack of
support for implicits there, but that doesn't mean that we should abandon the
separation between `PairRDD` and `OrderedRDD` on the Scala side or start
dirtying-up `PairRDD.scala` when we want to provide new methods for RDDs whose
keys and values can both be ordered.
I really think that we want to repeat the pattern of `OrderedRDD` for this
`DoublyOrderedRDD` -- or whatever better name you can come up with. The
biggest quirk I can see right now is if the types of both keys and values are
the same but you want to order them one way when sorting by key and a different
way when doing the secondary sort on values. That won't work with implicits
since there can only be one implicit `Ordering` for the type in scope at a
time. The problem could either be avoided by using distinct types for the key
and value roles, or a method signature with explicit orderings could be added
to address this corner case.
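
Both ways out of that corner case can be sketched roughly as follows, again on a plain `Seq[(K, V)]` rather than an RDD. The method names `sortByKeyThenValueWith` and `sortByKeyThenValue` and the wrapper type `Score` are assumptions made up for this example, not a proposed API.

```scala
object SecondarySortSketch {
  // Hypothetical explicit-ordering variant: sort by key, then by value within
  // each key, with both orderings passed as ordinary arguments.
  def sortByKeyThenValueWith[K, V](pairs: Seq[(K, V)])(
      keyOrd: Ordering[K], valueOrd: Ordering[V]): Seq[(K, V)] = {
    val pairOrd = new Ordering[(K, V)] {
      def compare(a: (K, V), b: (K, V)): Int = {
        val byKey = keyOrd.compare(a._1, b._1)
        if (byKey != 0) byKey else valueOrd.compare(a._2, b._2)
      }
    }
    pairs.sorted(pairOrd)
  }

  // Implicit variant: works when key and value types resolve to the
  // orderings you actually want.
  def sortByKeyThenValue[K, V](pairs: Seq[(K, V)])(
      implicit keyOrd: Ordering[K], valueOrd: Ordering[V]): Seq[(K, V)] =
    sortByKeyThenValueWith(pairs)(keyOrd, valueOrd)

  // A distinct type for the value role, carrying its own implicit ordering.
  final case class Score(value: Int)
  object Score {
    implicit val descending: Ordering[Score] =
      Ordering.by[Score, Int](_.value).reverse
  }

  val raw: Seq[(Int, Int)] = Seq(1 -> 10, 2 -> 5, 1 -> 30, 2 -> 7)

  // Keys and values are both Int: ascending keys, descending values,
  // spelled out explicitly.
  val explicitResult: Seq[(Int, Int)] =
    sortByKeyThenValueWith(raw)(Ordering.Int, Ordering.Int.reverse)

  // Same result via implicits, once the values have their own type.
  val wrappedResult: Seq[(Int, Score)] =
    sortByKeyThenValue(raw.map { case (k, v) => k -> Score(v) })
}
```

Both results list key 1 before key 2, with the larger value first within each key. The wrapper keeps call sites free of explicit orderings at the cost of an extra type; the explicit signature avoids the extra type but makes every caller spell out both orderings.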