GitHub user freeman-lab opened a pull request: https://github.com/apache/spark/pull/2889

    Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK]

    Change maximum value for default seed during RDD sampling so that it
    is strictly less than 2 ** 32. This prevents a bug in the most recent
    version of NumPy, which cannot accept random seeds at or above this
    bound.

    Adds an extra test that uses the default seed (instead of setting it
    manually, as in the docstrings).

    @mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/freeman-lab/spark pyspark-sampling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2889.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2889

----
commit dc385ef6f103a28361de3e7599a1e15528973180
Author: freeman <the.freeman....@gmail.com>
Date:   2014-10-22T06:45:40Z

    Change maximum value for default seed

    - Fixes bug in NumPy v1.9 which truncates random seeds larger than
      or equal to 2 ** 32
    - Add an extra test for sampling with default seed

----
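
For readers of the archive, the seed bound described in this pull request can be sketched as follows. This is a minimal illustration, not the code from the patch: the names `MAX_SEED` and `default_seed` are hypothetical, and it assumes the default seed is drawn with Python's `random.randint`, whose upper bound is inclusive.

```python
import random

# Seeds passed to numpy.random.RandomState must be strictly less than
# 2 ** 32, so the largest usable value is 2 ** 32 - 1.
MAX_SEED = 2 ** 32 - 1


def default_seed(rng=random):
    """Hypothetical helper: draw a default seed in [0, 2 ** 32 - 1]."""
    # random.randint includes both endpoints, so the upper bound has to be
    # 2 ** 32 - 1 rather than 2 ** 32.
    return rng.randint(0, MAX_SEED)


if __name__ == "__main__":
    seed = default_seed()
    print(seed)
    try:
        import numpy
    except ImportError:
        numpy = None
    if numpy is not None:
        # Any value returned by default_seed() is within NumPy's accepted range.
        numpy.random.RandomState(seed)
```

A seed drawn from a wider range, such as up to the platform's maximum integer, would only occasionally fall below 2 ** 32, which may be why tests that set the seed manually did not expose the problem.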