GitHub user foxik opened a pull request:
https://github.com/apache/spark/pull/4761
[SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.
The samples should always be sorted in ascending order, because
bisect.bisect_left is used on them. The reversed order of the result is
already achieved in rangePartitioner by reversing the found index.
The current implementation also works, but it always uses only two partitions
-- the first one and the last one (because bisect_left returns
either "beginning" or "end" for a descending sequence).
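
The effect described above can be reproduced outside Spark. The following is
a minimal sketch, not the actual pyspark code: make_range_partitioner, the
boundary values and the key range are hypothetical and chosen only for
illustration. It assumes the partitioner calls bisect.bisect_left on the
sampled bounds and, for a descending sort, mirrors the found index.

```python
import bisect


def make_range_partitioner(bounds, num_partitions, ascending=True):
    """Hypothetical stand-in for the partitioner used by sortByKey."""
    def range_partitioner(key):
        # bisect_left is only meaningful when bounds are sorted ascending.
        p = bisect.bisect_left(bounds, key)
        # For a descending sort, mirror the index instead of reversing bounds.
        return p if ascending else num_partitions - 1 - p
    return range_partitioner


keys = list(range(10))
asc_bounds = [3, 6]                               # assumed sample boundaries
desc_bounds = sorted(asc_bounds, reverse=True)    # what the old code passed

buggy = make_range_partitioner(desc_bounds, 3, ascending=False)
fixed = make_range_partitioner(asc_bounds, 3, ascending=False)

# Distinct partitions that actually receive keys in each variant.
buggy_parts = sorted({buggy(k) for k in keys})
fixed_parts = sorted({fixed(k) for k in keys})

print("descending bounds:", buggy_parts)
print("ascending bounds: ", fixed_parts)
```

With descending bounds the binary search can only end at index 0 or at
len(bounds), so the middle partition receives no keys. With ascending bounds
all three partitions are used, and the largest keys still land in partition 0.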
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/foxik/spark fix-descending-sort
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4761.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4761
----
commit 5757490b5ff4f233ecbcaabd13d0282522884649
Author: Milan Straka <[email protected]>
Date: 2015-02-25T07:01:46Z
Fix descending pyspark.rdd.sortByKey.
The samples should always be sorted in ascending order, because
bisect.bisect_left is used on it. The reverse order of the result
is already achieved in rangePartitioner by reversing the found index.
----