[jira] [Commented] (RNG-95) DiscreteUniformSampler

2019-08-01 Thread Alex D Herbert (JIRA)


[ 
https://issues.apache.org/jira/browse/RNG-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898349#comment-16898349
 ] 

Alex D Herbert commented on RNG-95:
---

I have updated the DiscreteUniformSampler to use dedicated internal classes for 
the following ranges:
 * [0, n]
 * [n, m]
 * [0, 2^k)
 * [n, n+2^k)
 * [n, n]

The main difference is that for a small integer range the method of Lemire is 
used which avoids the modulus operation. Previously the method delegated to 
nextInt(int) in the UniformRandomProvider. It also has a dedicated method to 
handle a range that is a power of 2.

I have benchmarked this by using a method that creates the sampler and then 
obtains a number of samples from the sampler. This is compared to the standard 
nextInt(int) algorithm in the base random provider.

Tested with SplitMix64 as the generator. Values are generated in a loop and 
summed. The sum is consumed by JMH. I have not subtracted a baseline from the 
timings.
h2. Power of 2 Range: 256
|Samples|Sampler|nextInt|Relative Speed|
|1|6.04|5.61|1.08|
|2|8.24|9.06|0.91|
|4|12.69|14.26|0.89|
|8|19.25|22.57|0.85|
|16|32.81|39.02|0.84|
|100|1696458.69|1940814.67|0.87|

In this case the sampler is faster when 2 samples are required. It is never 
much faster than the base implementation.
h2. Non-power of 2 Range: 257
|Samples|Sampler|nextInt|Relative|
|1|16.26|8.99|1.81|
|2|18.35|13.66|1.34|
|4|21.97|22.22|0.99|
|8|29.78|39.15|0.76|
|16|45.39|74.62|0.61|
|100|1838009.36|4470998.66|0.41|

In this case the sampler has a greater initialisation cost. It is faster when 
4-8 samples are required. At the limit is twice as fast as the base 
implementation.
h2. Worst case non-power of 2 range: 2^30 + 1 = 1073741825

This range results in the highest rejection probability per sample. The 
expected number of integers required from the RNG per sample is 2.
|Samples|Sampler|nextInt|Relative|
|1|21.71|30.74|0.71|
|2|27.60|56.77|0.49|
|4|41.92|109.29|0.38|
|8|68.78|213.05|0.32|
|16|120.16|407.02|0.30|
|100|6378167.70|24462935.62|0.26|

In this case the sampler is faster even when 1 sample is required. At the limit 
is almost 4 times as fast as the base implementation.

> DiscreteUniformSampler
> --
>
> Key: RNG-95
> URL: https://issues.apache.org/jira/browse/RNG-95
> Project: Commons RNG
>  Issue Type: Improvement
>  Components: sampling
>Affects Versions: 1.3
>Reporter: Alex D Herbert
>Assignee: Alex D Herbert
>Priority: Minor
>
> The {{DiscreteUniformSampler}} delegates the creation of an integer in the 
> range {{[0, n)}} to the {{UniformRandomProvider}}.
> This sampler will be repeatedly used to sample the same range. The default 
> method in {{BaseProvider}} uses a dynamic algorithm that handles {{n}} 
> differently when a power of 2.
> When the range is a power of 2 the method can use a series of bits from a 
> random integer to generate a uniform integer in the range. This is fast.
> When the range is not a power of 2 the algorithm must reject samples when the 
> sample would result in an over-representation of a particular value in the 
> uniform range. This is necessary as {{n}} does not exactly fit into the 
> number of possible values {{[0, 2^31)}} that can be produced by the generator 
> (when using 31-bit signed integers). The rejection method uses integer 
> arithmetic to determine the number of samples that fit into the range: 
> {{samples = 2^31 / n}}. Extra samples that lead to over-representation are 
> rejected: {{extra = 2^31 % n}}.
> Since {{n}} will not change a pre-computation step is possible to select the 
> best algorithm. 
> n is a power of 2:
> {code:java}
> // Favouring the least significant bits
> // Pre-compute
> int mask = n - 1;
> return nextInt() & mask;
> // Or favouring the most significant bits
> // Pre-compute
> int shift = Integer.numberOfLeadingZeros(n) + 1;
> return nextInt() >>> shift;
> {code}
> n is not a power of 2:
> {code:java}
> // Sample using modulus
> // Pre-compute
> final int fence = (int)(0x8000L - 0x8000L % n - 1);
> int bits;
> do {
> bits = rng.nextInt() >>> 1;
> } while (bits > fence);
> return bits % n;
> // Or using 32-bit unsigned arithmetic avoiding modulus
> // Pre-compute
> final long fence = (1L << 32) % n;
> long result;
> do {
> // Compute 64-bit unsigned product of n * [0, 2^32 - 1)
> result = n * (rng.nextInt() & 0xL);
> // Test the sample uniformity.
> } while ((result & 0xL) < fence);
> // Divide by 2^32 to get the sample
> return (int)(result >>> 32);
> {code}
> The second method uses a range of 2^32 instead of 2^31 so reducing the 
> rejection probability and avoids the modulus operator; these both increase 
> speed.
> Note algorithm 1 returns sample values in a repeat cycle from all values in 
> the range {{[0, 2^31)}} due to the 

[jira] [Commented] (RNG-95) DiscreteUniformSampler

2019-07-16 Thread Alex D Herbert (JIRA)


[ 
https://issues.apache.org/jira/browse/RNG-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886537#comment-16886537
 ] 

Alex D Herbert commented on RNG-95:
---

This method has been documented by Danial Lamire:

[Fast Random Integer Generation in an Interval|https://arxiv.org/abs/1805.10941]

> DiscreteUniformSampler
> --
>
> Key: RNG-95
> URL: https://issues.apache.org/jira/browse/RNG-95
> Project: Commons RNG
>  Issue Type: Improvement
>  Components: sampling
>Affects Versions: 1.3
>Reporter: Alex D Herbert
>Assignee: Alex D Herbert
>Priority: Minor
>
> The {{DiscreteUniformSampler}} delegates the creation of an integer in the 
> range {{[0, n)}} to the {{UniformRandomProvider}}.
> This sampler will be repeatedly used to sample the same range. The default 
> method in {{BaseProvider}} uses a dynamic algorithm that handles {{n}} 
> differently when a power of 2.
> When the range is a power of 2 the method can use a series of bits from a 
> random integer to generate a uniform integer in the range. This is fast.
> When the range is not a power of 2 the algorithm must reject samples when the 
> sample would result in an over-representation of a particular value in the 
> uniform range. This is necessary as {{n}} does not exactly fit into the 
> number of possible values {{[0, 2^31)}} that can be produced by the generator 
> (when using 31-bit signed integers). The rejection method uses integer 
> arithmetic to determine the number of samples that fit into the range: 
> {{samples = 2^31 / n}}. Extra samples that lead to over-representation are 
> rejected: {{extra = 2^31 % n}}.
> Since {{n}} will not change a pre-computation step is possible to select the 
> best algorithm. 
> n is a power of 2:
> {code:java}
> // Favouring the least significant bits
> // Pre-compute
> int mask = n - 1;
> return nextInt() & mask;
> // Or favouring the most significant bits
> // Pre-compute
> int shift = Integer.numberOfLeadingZeros(n) + 1;
> return nextInt() >>> shift;
> {code}
> n is not a power of 2:
> {code:java}
> // Sample using modulus
> // Pre-compute
> final int fence = (int)(0x8000L - 0x8000L % n - 1);
> int bits;
> do {
> bits = rng.nextInt() >>> 1;
> } while (bits > fence);
> return bits % n;
> // Or using 32-bit unsigned arithmetic avoiding modulus
> // Pre-compute
> final long fence = (1L << 32) % n;
> long result;
> do {
> // Compute 64-bit unsigned product of n * [0, 2^32 - 1)
> result = n * (rng.nextInt() & 0xL);
> // Test the sample uniformity.
> } while ((result & 0xL) < fence);
> // Divide by 2^32 to get the sample
> return (int)(result >>> 32);
> {code}
> The second method uses a range of 2^32 instead of 2^31 so reducing the 
> rejection probability and avoids the modulus operator; these both increase 
> speed.
> Note algorithm 1 returns sample values in a repeat cycle from all values in 
> the range {{[0, 2^31)}} due to the use of modulus, e.g.
> {noformat}
> 0, 1, 2, ..., 0, 1, 2, ...
> {noformat}
> Algorithm 2 returns sample values in a linear order, e.g.
> {noformat}
> 0, 0, 1, 1, 2, 2, ...
> {noformat}
> The suggested change is to implement smart pre-computation in the 
> {{DiscreteUniformSampler}} based on the range and use the algorithms that 
> favour the most significant bits from the generator.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)