GitHub user dorx opened a pull request:

    https://github.com/apache/spark/pull/1025

    [SPARK-2082] stratified sampling in PairRDDFunctions that guarantees exact sample size

    Implemented stratified sampling that guarantees exact sample size using
    ScaSRS, with two passes over the RDD for sampling without replacement and
    three passes for sampling with replacement.
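
To illustrate what "exact sample size" per stratum means, here is a minimal
plain-Python sketch of two-pass stratified sampling without replacement. It is
not the ScaSRS implementation in this pull request and does not use Spark: it
swaps in a simpler technique (tag every item with a uniform random number and
keep the smallest tags in each stratum). The function name stratified_sample,
the fractions argument, and the choice of ceil(fraction * count) as the target
size are assumptions made for the example.

```python
import heapq
import math
import random
from collections import defaultdict


def stratified_sample(pairs, fractions, seed=None):
    """Sample without replacement, keeping exactly
    ceil(fractions[key] * count[key]) values for every key.

    `pairs` must be a list of (key, value) tuples, since it is read twice.
    This is an illustrative sketch, not the ScaSRS algorithm.
    """
    rng = random.Random(seed)

    # Pass 1: count the items in each stratum (key).
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1

    # Exact target size for each stratum (assumed rounding rule: ceil).
    targets = {k: math.ceil(fractions[k] * n) for k, n in counts.items()}

    # Pass 2: tag each value with a uniform random number.
    tagged = defaultdict(list)
    for key, value in pairs:
        tagged[key].append((rng.random(), value))

    # Keep, per stratum, the values with the smallest tags. Every subset of
    # the target size is equally likely, and the size is always exact.
    return {
        k: [v for _, v in heapq.nsmallest(targets[k], items, key=lambda t: t[0])]
        for k, items in tagged.items()
    }
```

For example, with 10 values under key "a" and 8 under key "b",
stratified_sample(data, {"a": 0.5, "b": 0.25}) always returns 5 values for "a"
and 2 for "b", whereas independent per-item coin flips would only match those
sizes in expectation. The sketch holds every tagged item in memory, which a
distributed implementation would need to avoid.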

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dorx/spark stratified

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1025
    
----
commit 14419775202e6eef1f0e1f0c74c7be9030aca73d
Author: Doris Xin <[email protected]>
Date:   2014-05-29T22:22:14Z

    SPARK-1939 Refactor takeSample method in RDD to use ScaSRS

commit ffea61a67d228edb476d29ca13a84bb3f9a22887
Author: Doris Xin <[email protected]>
Date:   2014-05-30T00:55:54Z

    SPARK-1939: Refactor takeSample method in RDD
    
    Reviewer comments addressed:
    - commons-math3 is now a test-only dependency. bumped up to v3.3
    - comments added to explain what computeFraction is doing
    - fixed the unit test for computeFraction to use BinomialDistribution for
    sampling without replacement
    - stylistic fixes

commit 7cab53a3926f4351432e5e3600b0796b9a4146e4
Author: Doris Xin <[email protected]>
Date:   2014-06-02T19:00:38Z

    fixed import bug in rdd.py

commit e3fd6a628317d559a08a7a20421e9c0618180902
Author: Doris Xin <[email protected]>
Date:   2014-06-02T19:06:18Z

    Merge branch 'master' into takeSample

commit 9ee94ee3c28e8d808063fef4e5d39f06ab738e0b
Author: Doris Xin <[email protected]>
Date:   2014-06-09T20:15:23Z

    [SPARK-2082] stratified sampling in PairRDDFunctions that guarantees exact sample size

----


