GitHub user foxik opened a pull request:

    https://github.com/apache/spark/pull/4761

    [SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.

    The samples should always be sorted in ascending order, because 
bisect.bisect_left is used on them. The reverse order of the result is already 
achieved in rangePartitioner by reversing the found index.
    
    The current implementation also works, but it always uses only two 
partitions -- the first one and the last one (because bisect_left returns 
either "beginning" or "end" for a descending sequence).
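
    The effect can be reproduced outside Spark with plain Python. The sketch 
below is only an illustration, not the PySpark source: the function name 
range_partition, the boundary values, and the keys are made up for the example, 
and the index reversal is modelled on the description above.

```python
import bisect

def range_partition(key, bounds, num_partitions, ascending=True):
    """Hypothetical stand-in for a range partitioner.

    `bounds` holds the sampled partition boundaries. bisect_left finds the
    slot for `key`; for a descending sort the slot index is reversed.
    """
    p = bisect.bisect_left(bounds, key)
    return p if ascending else num_partitions - 1 - p

num_partitions = 4
ascending_bounds = [10, 20, 30]           # 3 boundaries -> 4 partitions
descending_bounds = sorted(ascending_bounds, reverse=True)
keys = [5, 15, 25, 35]

# Boundaries sorted in descending order: bisect_left assumes ascending
# input, so every key lands at index 0 or len(bounds).
broken = [range_partition(k, descending_bounds, num_partitions, ascending=False)
          for k in keys]

# Boundaries kept ascending, with only the resulting index reversed.
fixed = [range_partition(k, ascending_bounds, num_partitions, ascending=False)
         for k in keys]

print(broken)  # [3, 3, 0, 0]
print(fixed)   # [3, 2, 1, 0]
```

With descending boundaries only the first and the last partition receive keys, 
while ascending boundaries spread the same keys over all four partitions in 
descending order.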

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/foxik/spark fix-descending-sort

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4761
    
----
commit 5757490b5ff4f233ecbcaabd13d0282522884649
Author: Milan Straka <[email protected]>
Date:   2015-02-25T07:01:46Z

    Fix descending pyspark.rdd.sortByKey.
    
    The samples should always be sorted in ascending order, because
    bisect.bisect_left is used on it. The reverse order of the result
    is already achieved in rangePartitioner by reversing the found index.

----


