GitHub user ilganeli opened a pull request:

    https://github.com/apache/spark/pull/3723

    [SPARK-4417] New API: sample RDD to fixed number of items   

    Hi all - I've added an interface to sample an RDD by a count of elements 
(instead of only by percentage). I've also added new tests to validate this 
functionality, and I've updated an existing function's interface to reuse 
common code. 
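
To make the distinction between sampling by percentage and sampling by count concrete, here is a minimal plain-Python sketch. It is not the code from this pull request and does not use Spark: the function names `sample_by_fraction` and `sample_by_count` are hypothetical, and reservoir sampling is only an assumed technique chosen for illustration.

```python
import random


def sample_by_fraction(items, fraction, seed=None):
    """Keep each item independently with probability `fraction`.

    The size of the result varies from run to run; only its expected
    size is fraction * (number of items).
    """
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


def sample_by_count(items, n, seed=None):
    """Return min(n, number of items) items without replacement.

    Uses single-pass reservoir sampling (Algorithm R), so the input can
    be any iterable and its length need not be known in advance.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(items):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(x)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < n:
                reservoir[j] = x
    return reservoir


if __name__ == "__main__":
    data = range(1000)
    print(len(sample_by_fraction(data, 0.01, seed=42)))  # varies around 10
    print(len(sample_by_count(data, 10, seed=42)))       # always 10
```

The fraction-based version only controls the expected size of the sample, while the count-based version fixes the size exactly, which is the kind of guarantee the new interface is meant to provide.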

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilganeli/spark SPARK-4417B

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3723
    
----
commit 4a8ad6a04a99930dd176dd199f0f26d5ecc80aa8
Author: Ilya Ganelin <[email protected]>
Date:   2014-12-10T18:34:48Z

    Updated RDD class to add a sampling function that provides N elements from 
the RDD and returns them as an RDD.

commit 564bcc14ad3c271876798b723cc84f7d7c4716dd
Author: Ilya Ganelin <[email protected]>
Date:   2014-12-10T18:37:17Z

    Added testing for new sampling function.

commit 350f039b4be9e9a34104652f5c74b2695254ad91
Author: Ilya Ganelin <[email protected]>
Date:   2014-12-10T18:47:25Z

    Fixing minor formatting problems

commit 8e625d35ecf1a2fa8a3b1f113196d7d3e56781d1
Author: Ilya Ganelin <[email protected]>
Date:   2014-12-17T18:13:06Z

    Merge remote-tracking branch 'upstream/master' into SPARK-4417B

commit 26b4b81ff4e3333769b343aa110c2c9112d80113
Author: Ilya Ganelin <[email protected]>
Date:   2014-12-17T18:17:55Z

    Spacing

commit 8d411c3a2d922028659111c75ed6b0cdc4956608
Author: Ilya Ganelin <[email protected]>
Date:   2014-12-17T18:19:26Z

    More Spacing

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
