[jira] [Commented] (RNG-95) DiscreteUniformSampler
[ https://issues.apache.org/jira/browse/RNG-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898349#comment-16898349 ] Alex D Herbert commented on RNG-95: --- I have updated the DiscreteUniformSampler to use dedicated internal classes for the following ranges: * [0, n] * [n, m] * [0, 2^k) * [n, n+2^k) * [n, n] The main difference is that for a small integer range the method of Lemire is used which avoids the modulus operation. Previously the method delegated to nextInt(int) in the UniformRandomProvider. It also has a dedicated method to handle a range that is a power of 2. I have benchmarked this by using a method that creates the sampler and then obtains a number of samples from the sampler. This is compared to the standard nextInt(int) algorithm in the base random provider. Tested with SplitMix64 as the generator. Values are generated in a loop and summed. The sum is consumed by JMH. I have not subtracted a baseline from the timings. h2. Power of 2 Range: 256 |Samples|Sampler|nextInt|Relative Speed| |1|6.04|5.61|1.08| |2|8.24|9.06|0.91| |4|12.69|14.26|0.89| |8|19.25|22.57|0.85| |16|32.81|39.02|0.84| |100|1696458.69|1940814.67|0.87| In this case the sampler is faster when 2 samples are required. It is never much faster than the base implementation. h2. Non-power of 2 Range: 257 |Samples|Sampler|nextInt|Relative| |1|16.26|8.99|1.81| |2|18.35|13.66|1.34| |4|21.97|22.22|0.99| |8|29.78|39.15|0.76| |16|45.39|74.62|0.61| |100|1838009.36|4470998.66|0.41| In this case the sampler has a greater initialisation cost. It is faster when 4-8 samples are required. At the limit is twice as fast as the base implementation. h2. Worst case non-power of 2 range: 2^30 + 1 = 1073741825 This range results in the highest rejection probability per sample. The expected number of integers required from the RNG per sample is 2. |Samples|Sampler|nextInt|Relative| |1|21.71|30.74|0.71| |2|27.60|56.77|0.49| |4|41.92|109.29|0.38| |8|68.78|213.05|0.32| |16|120.16|407.02|0.30| |100|6378167.70|24462935.62|0.26| In this case the sampler is faster even when 1 sample is required. At the limit is almost 4 times as fast as the base implementation. > DiscreteUniformSampler > -- > > Key: RNG-95 > URL: https://issues.apache.org/jira/browse/RNG-95 > Project: Commons RNG > Issue Type: Improvement > Components: sampling >Affects Versions: 1.3 >Reporter: Alex D Herbert >Assignee: Alex D Herbert >Priority: Minor > > The {{DiscreteUniformSampler}} delegates the creation of an integer in the > range {{[0, n)}} to the {{UniformRandomProvider}}. > This sampler will be repeatedly used to sample the same range. The default > method in {{BaseProvider}} uses a dynamic algorithm that handles {{n}} > differently when a power of 2. > When the range is a power of 2 the method can use a series of bits from a > random integer to generate a uniform integer in the range. This is fast. > When the range is not a power of 2 the algorithm must reject samples when the > sample would result in an over-representation of a particular value in the > uniform range. This is necessary as {{n}} does not exactly fit into the > number of possible values {{[0, 2^31)}} that can be produced by the generator > (when using 31-bit signed integers). The rejection method uses integer > arithmetic to determine the number of samples that fit into the range: > {{samples = 2^31 / n}}. Extra samples that lead to over-representation are > rejected: {{extra = 2^31 % n}}. > Since {{n}} will not change a pre-computation step is possible to select the > best algorithm. > n is a power of 2: > {code:java} > // Favouring the least significant bits > // Pre-compute > int mask = n - 1; > return nextInt() & mask; > // Or favouring the most significant bits > // Pre-compute > int shift = Integer.numberOfLeadingZeros(n) + 1; > return nextInt() >>> shift; > {code} > n is not a power of 2: > {code:java} > // Sample using modulus > // Pre-compute > final int fence = (int)(0x8000L - 0x8000L % n - 1); > int bits; > do { > bits = rng.nextInt() >>> 1; > } while (bits > fence); > return bits % n; > // Or using 32-bit unsigned arithmetic avoiding modulus > // Pre-compute > final long fence = (1L << 32) % n; > long result; > do { > // Compute 64-bit unsigned product of n * [0, 2^32 - 1) > result = n * (rng.nextInt() & 0xL); > // Test the sample uniformity. > } while ((result & 0xL) < fence); > // Divide by 2^32 to get the sample > return (int)(result >>> 32); > {code} > The second method uses a range of 2^32 instead of 2^31 so reducing the > rejection probability and avoids the modulus operator; these both increase > speed. > Note algorithm 1 returns sample values in a repeat cycle from all values in > the range {{[0, 2^31
[jira] [Commented] (RNG-95) DiscreteUniformSampler
[ https://issues.apache.org/jira/browse/RNG-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886537#comment-16886537 ] Alex D Herbert commented on RNG-95: --- This method has been documented by Danial Lamire: [Fast Random Integer Generation in an Interval|https://arxiv.org/abs/1805.10941] > DiscreteUniformSampler > -- > > Key: RNG-95 > URL: https://issues.apache.org/jira/browse/RNG-95 > Project: Commons RNG > Issue Type: Improvement > Components: sampling >Affects Versions: 1.3 >Reporter: Alex D Herbert >Assignee: Alex D Herbert >Priority: Minor > > The {{DiscreteUniformSampler}} delegates the creation of an integer in the > range {{[0, n)}} to the {{UniformRandomProvider}}. > This sampler will be repeatedly used to sample the same range. The default > method in {{BaseProvider}} uses a dynamic algorithm that handles {{n}} > differently when a power of 2. > When the range is a power of 2 the method can use a series of bits from a > random integer to generate a uniform integer in the range. This is fast. > When the range is not a power of 2 the algorithm must reject samples when the > sample would result in an over-representation of a particular value in the > uniform range. This is necessary as {{n}} does not exactly fit into the > number of possible values {{[0, 2^31)}} that can be produced by the generator > (when using 31-bit signed integers). The rejection method uses integer > arithmetic to determine the number of samples that fit into the range: > {{samples = 2^31 / n}}. Extra samples that lead to over-representation are > rejected: {{extra = 2^31 % n}}. > Since {{n}} will not change a pre-computation step is possible to select the > best algorithm. > n is a power of 2: > {code:java} > // Favouring the least significant bits > // Pre-compute > int mask = n - 1; > return nextInt() & mask; > // Or favouring the most significant bits > // Pre-compute > int shift = Integer.numberOfLeadingZeros(n) + 1; > return nextInt() >>> shift; > {code} > n is not a power of 2: > {code:java} > // Sample using modulus > // Pre-compute > final int fence = (int)(0x8000L - 0x8000L % n - 1); > int bits; > do { > bits = rng.nextInt() >>> 1; > } while (bits > fence); > return bits % n; > // Or using 32-bit unsigned arithmetic avoiding modulus > // Pre-compute > final long fence = (1L << 32) % n; > long result; > do { > // Compute 64-bit unsigned product of n * [0, 2^32 - 1) > result = n * (rng.nextInt() & 0xL); > // Test the sample uniformity. > } while ((result & 0xL) < fence); > // Divide by 2^32 to get the sample > return (int)(result >>> 32); > {code} > The second method uses a range of 2^32 instead of 2^31 so reducing the > rejection probability and avoids the modulus operator; these both increase > speed. > Note algorithm 1 returns sample values in a repeat cycle from all values in > the range {{[0, 2^31)}} due to the use of modulus, e.g. > {noformat} > 0, 1, 2, ..., 0, 1, 2, ... > {noformat} > Algorithm 2 returns sample values in a linear order, e.g. > {noformat} > 0, 0, 1, 1, 2, 2, ... > {noformat} > The suggested change is to implement smart pre-computation in the > {{DiscreteUniformSampler}} based on the range and use the algorithms that > favour the most significant bits from the generator. -- This message was sent by Atlassian JIRA (v7.6.14#76016)