GitHub user ilganeli opened a pull request:
https://github.com/apache/spark/pull/3723
[SPARK-4417] New API: sample RDD to fixed number of items
Hi all - I've added an interface to sample an RDD down to a fixed count of
elements (instead of only by percentage). I've also added new tests to validate
this behavior, and I've updated an existing function interface to re-use
common code.
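For readers unfamiliar with the distinction, here is a minimal plain-Python
sketch of count-based versus percentage-based sampling. It is not the code in
this pull request and does not use the Spark API; the function names
sample_fixed_count and sample_by_fraction are hypothetical, and reservoir
sampling is just one common way to draw exactly N items in a single pass.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_fixed_count(items: Iterable[T], n: int,
                       seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: uniform sample of at most n items, without
    replacement, using reservoir sampling (Algorithm R)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep the new item with probability n / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = item
    return reservoir


def sample_by_fraction(items: Iterable[T], fraction: float,
                       seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: keep each item independently with the given
    probability, so the result size varies from run to run."""
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


if __name__ == "__main__":
    data = range(1000)
    exact = sample_fixed_count(data, 10, seed=42)
    approx = sample_by_fraction(data, 0.01, seed=42)
    print(len(exact))   # always 10 for an input of 10 or more items
    print(len(approx))  # close to 10 on average, but not guaranteed
```

The count-based version always returns exactly N items when the input has at
least N, whereas the fraction-based version only hits the target size on
average.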
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ilganeli/spark SPARK-4417B
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3723.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3723
----
commit 4a8ad6a04a99930dd176dd199f0f26d5ecc80aa8
Author: Ilya Ganelin <[email protected]>
Date: 2014-12-10T18:34:48Z
Updated RDD class to add a sampling function that takes N elements from
the RDD and returns them as an RDD.
commit 564bcc14ad3c271876798b723cc84f7d7c4716dd
Author: Ilya Ganelin <[email protected]>
Date: 2014-12-10T18:37:17Z
Added testing for new sampling function.
commit 350f039b4be9e9a34104652f5c74b2695254ad91
Author: Ilya Ganelin <[email protected]>
Date: 2014-12-10T18:47:25Z
Fixing minor formatting problems
commit 8e625d35ecf1a2fa8a3b1f113196d7d3e56781d1
Author: Ilya Ganelin <[email protected]>
Date: 2014-12-17T18:13:06Z
Merge remote-tracking branch 'upstream/master' into SPARK-4417B
commit 26b4b81ff4e3333769b343aa110c2c9112d80113
Author: Ilya Ganelin <[email protected]>
Date: 2014-12-17T18:17:55Z
Spacing
commit 8d411c3a2d922028659111c75ed6b0cdc4956608
Author: Ilya Ganelin <[email protected]>
Date: 2014-12-17T18:19:26Z
More Spacing
----