GitHub user freeman-lab opened a pull request:

    https://github.com/apache/spark/pull/2889

    Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK]

    Change the maximum value for the default seed during RDD sampling so that it is
strictly less than 2 ** 32. This avoids a failure with the most recent version of
NumPy (v1.9), which does not accept random seeds at or above this bound.
    
    Adds an extra test that uses the default seed (instead of setting it 
manually, as in the docstrings).
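
    As a rough illustration of the idea (not the actual patch), the sketch below
draws a default seed from a range capped at 2 ** 32 - 1 using only the standard
library. The names `MAX_SEED` and `default_seed` are hypothetical and do not
come from the PySpark source.

```python
import random

# Largest seed that is still strictly less than 2 ** 32.
# (Hypothetical constant name, used here only for illustration.)
MAX_SEED = 2 ** 32 - 1


def default_seed(rng=random):
    """Return a default seed in the closed range [0, 2 ** 32 - 1].

    random.randint includes both endpoints, so the upper bound passed
    in must already be 2 ** 32 - 1 rather than 2 ** 32.
    """
    return rng.randint(0, MAX_SEED)


if __name__ == "__main__":
    seed = default_seed()
    # The seed always stays below the 2 ** 32 bound.
    print(0 <= seed < 2 ** 32)
```

    Passing a `random.Random` instance as `rng` makes the draw reproducible,
which may be convenient when testing the default-seed path.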
    
    @mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/freeman-lab/spark pyspark-sampling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2889.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2889
    
----
commit dc385ef6f103a28361de3e7599a1e15528973180
Author: freeman <the.freeman....@gmail.com>
Date:   2014-10-22T06:45:40Z

    Change maximum value for default seed
    
    - Fixes bug in NumPy v1.9 which truncates random seeds larger than or
    equal to 2 ** 32
    - Add an extra test for sampling with default seed

----

