GitHub user coderxiang opened a pull request:

    https://github.com/apache/spark/pull/5075

    [Core] SPARK-5954: Top by key

    This PR implements two functions:

      - `topByKey(num: Int): RDD[(K, Array[V])]` finds the top-k values for
        each key in a pair RDD. This can be used, e.g., to compute top
        recommendations.

      - `takeOrderedByKey(num: Int): RDD[(K, Array[V])]` does the opposite of
        `topByKey`: it finds the k smallest values for each key.
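
The intended semantics of the two functions can be sketched on plain Scala collections, without Spark. This is only an illustration, not the PR's implementation: the object name `TopByKeySketch`, the `Seq`/`Map` types standing in for `RDD` and `Array`, and the sort order of the values within each key are all assumptions made for this sketch.

```scala
// Plain-Scala sketch of the per-key semantics (no Spark dependency).
// TopByKeySketch and its signatures are hypothetical, for illustration only.
object TopByKeySketch {

  // Largest `num` values for each key, in descending order (assumed order).
  def topByKey[K, V](pairs: Seq[(K, V)], num: Int)(implicit ord: Ordering[V]): Map[K, Seq[V]] =
    pairs.groupBy(_._1).map { case (key, group) =>
      key -> group.map(_._2).sorted(ord.reverse).take(num)
    }

  // Smallest `num` values for each key, in ascending order:
  // the same operation under the reversed ordering.
  def takeOrderedByKey[K, V](pairs: Seq[(K, V)], num: Int)(implicit ord: Ordering[V]): Map[K, Seq[V]] =
    topByKey(pairs, num)(ord.reverse)
}
```

For example, with `Seq(("a", 3), ("a", 1), ("a", 5), ("b", 2))` and `num = 2`, the sketch's `topByKey` yields `5, 3` for key `a` and `2` for key `b`, while `takeOrderedByKey` yields `1, 3` for key `a` and `2` for key `b`. A distributed version would likely avoid grouping and sorting all values per key, for instance by keeping a bounded priority queue per key, but that is a design assumption rather than something stated in this message.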
    
    
    @mengxr in my testing, removing `sorted` causes the test to fail.
    Could you check it again?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/coderxiang/spark topByKey

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5075
    
----
commit debccadf77cc8d8d5a55db83402d2ea85878d3c7
Author: Shuo Xiang <[email protected]>
Date:   2015-03-11T20:46:02Z

    topByKey

commit b10e325540ace10611051fae7366c4bd78d523f5
Author: Shuo Xiang <[email protected]>
Date:   2015-03-12T20:20:30Z

    Merge remote-tracking branch 'upstream/master' into topByKey

commit 0895c17b7ce3db8ddd0a3fffcb2fb1f40b611c05
Author: Shuo Xiang <[email protected]>
Date:   2015-03-17T20:41:45Z

    Merge remote-tracking branch 'upstream/master' into topByKey

----


