GitHub user w3iBStime opened a pull request:

    https://github.com/apache/spark/pull/17542

    Corrects interval notation in doc comment

    The random number generated by XORShiftRandom.nextDouble() is a value
    between zero and one, including zero but not including one, i.e.,
    0 <= x < 1. In interval notation that is [0, 1), so I've changed the
    closing square bracket to a closing parenthesis.
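Here is a minimal Java sketch that checks the half-open interval. It assumes that XORShiftRandom inherits nextDouble() from java.util.Random, which builds a 53-bit integer and scales it by 2^-53. The class HalfOpenInterval and its scale helper are hypothetical: they only reproduce that arithmetic and never call Spark. The sketch shows that zero is reachable and that the largest possible result is 1 - 2^-53, which is strictly below 1.0.

```java
import java.util.Random;

public class HalfOpenInterval {
    // Assumption: mirrors the scaling used by java.util.Random.nextDouble(),
    // a 53-bit integer in [0, 2^53) multiplied by 2^-53.
    // Spark's XORShiftRandom is not called here.
    static final double DOUBLE_UNIT = 0x1.0p-53;

    // Hypothetical helper: scale a 53-bit integer into [0, 1).
    static double scale(long bits53) {
        return bits53 * DOUBLE_UNIT;
    }

    public static void main(String[] args) {
        long maxBits = (1L << 53) - 1;     // largest 53-bit value
        double smallest = scale(0L);       // exactly 0.0, so 0 is included
        double largest = scale(maxBits);   // 1 - 2^-53, so 1 is excluded

        System.out.println("smallest = " + smallest);
        System.out.println("largest  = " + largest);
        System.out.println("largest < 1.0: " + (largest < 1.0));

        // Empirical spot check with the JDK generator.
        Random rng = new Random(42L);
        for (int i = 0; i < 1_000_000; i++) {
            double x = rng.nextDouble();
            if (x < 0.0 || x >= 1.0) {
                throw new AssertionError("out of [0, 1): " + x);
            }
        }
        System.out.println("1,000,000 samples all in [0, 1)");
    }
}
```

The sampling loop is only a spot check. The upper bound itself follows from the arithmetic, not from the samples.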
    
    You can also think of it in terms of uniformly and randomly assigning
    the items in a list to three classes, 'A', 'B' and 'C'. For each item,
    if {randomDouble * 3.0} is in [0, 1), it gets assigned to A; if it is in
    [1, 2), it goes to B; if it is in [2, 3), it goes to C. All three classes
    have the same probability of receiving the item. If it were possible for
    the raw random number to be exactly 1.0, then after multiplying by 3.0,
    class C would be slightly more likely to receive the item than A or B
    (assuming simple logic rather than more extensive/expensive logic to
    break ties).
    
    Also, see the existing comment in SamplingUtils, which uses the same
    function:
    https://github.com/apache/spark/blob/79f5f281bb69cb2de9f64006180abd753e8ae427/core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala#L62
    
    https://en.wikipedia.org/wiki/Interval_(mathematics)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/w3iBStime/spark patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17542.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17542
    
----
commit 23428a74b4fbe7809c9bf266ebfa64e31c299ae6
Author: Brett Stime <[email protected]>
Date:   2017-04-05T17:57:26Z

    Corrects interval notation in doc comment
    

----


