Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-13 Thread Alex Herbert


> On 11 May 2019, at 23:25, Alex Herbert  wrote:
> 
> 
> 
>> On 11 May 2019, at 22:58, Gilles Sadowski  wrote:
>> 
>> Le sam. 11 mai 2019 à 23:32, Alex Herbert  a écrit 
>> :
>>> 
>>> 
>>> 
 On 10 May 2019, at 15:07, Gilles Sadowski  wrote:
 
 Hi.
 
 Le ven. 10 mai 2019 à 15:53, Alex Herbert >>> > a écrit :
> 
> 
> On 10/05/2019 14:27, Gilles Sadowski wrote:
>> Hi Alex.
>> 
>> Le ven. 10 mai 2019 à 13:57, Alex Herbert  a 
>> écrit :
>>> Can I get a review of the PR for RNG-101 please.
>> Thanks for this work!
>> 
>> I didn't go into the details; however, I see many fields and methods like
>> table1 ... table5
>> fillTable1 ... fillTable5
>> getTable1 ... getTable5
>> Wouldn't it be possible to use a 2D table:
>> table[5][];
>> so that e.g. only one "fillTable(int tableIndex, /* other args */)" 
>> method
>> is necessary (where "tableIndex" runs from 0 to 4)?
> 
> Yes. The design is based around using 5 tables as per the example code.
> 
> The sample() method knows which table it needs so it can directly jump
> to the table in question. I'd have to look at the difference in speed
> when using a 2D table as you are adding another array access but
> reducing the number of possible method calls (although you still need a
> method call). Maybe this will be optimised out by the JVM.
> 
> If the speed is not a factor then I'll rewrite it. Otherwise it's
> probably better done for speed as this is the entire point of the
> sampler given it disregards any probability under 2^-31 (i.e. it's not a
> perfectly fair sampler).
> 
> Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
> states using 3 tables of base 2^10 then you get a speed increase
> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
> tables of base 2^15 does not make it much faster again.
> 
> I'll have a rethink to see if I can make the design work for different
> base sizes.
 
 That could be an extension made easier with the 2D table, but
 I quite agree that given the relatively minor speed improvement
 to be expected, it is not the main reason; the rationale was just to
 make the code a more compact and a little easier to grasp (IMHO).
 
 Gilles
>>> 
>>> I’ve done a more extensive look at the implications of changing the 
>>> implementation of the algorithm. This tested using: 1D or 2D tables; 
>>> interfaced storage to dynamic table types; base 6 or base 10 for the 
>>> algorithm; and allowing the base to be chosen. Results are in the Jira 
>>> ticket. Basically 2D arrays are slower and supporting choices for the 
>>> backing storage or base of the algorithm is slower.
>>> 
>>> To support the Poisson and Binomial samplers only requires 16-bit storage. 
>>> So a dedicated sampler using base 6 and short for the tables will be the 
>>> best compromise between storage space and speed. The base 10 sampler is 
>>> faster but takes about 9-10x more space in memory.
>>> 
>>> Note I originally wrote the sampler to use only 16-bit storage. I then 
>>> modified it to use dynamic storage without measuring performance. And so I 
>>> made it slightly slower.
>>> 
>>> The question is does the library even need to have a 32-bit storage 
>>> implementation? This would be used for a probability distribution with more 
>>> than 2^16 different possible samples. I think this would be an edge case. 
>>> Here the memory requirements will be in the tens of MB at a minimum but may 
>>> balloon to become much larger. In this case a different algorithm such as 
>>> the Alias method or a guide table is more memory stable as it only requires 
>>> 12 bytes of storage per index, irrespective of the shape of the probability 
>>> distribution.
>>> 
>>> If different implementations (of this algorithm) are added to the library 
>>> then the effect of using a sampler that dynamically chooses the storage 
>>> space and/or base for the algorithm is noticeable in the performance. In 
>>> this case these would be better served using a factory:
>>> 
>>> class DiscreteProbabilitySamplerFactory {
>>>   DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
>>> double[])
>>> }
>>> 
>>> But if specifically targeting this algorithm it could be:
>>> 
>>> class MarsagliaTsangWangDiscreteProbabilitySamplerFactory {
>>>   DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
>>> double[], boolean useBase10)
>>> }
>>> 
>>> Or something similar. The user can then choose to use a base 10 algorithm 
>>> if memory is not a concern.
>>> 
>>> I am wary of making this too complicated for just this sampler. So I would 
>>> vote for ignoring the base 10 version and sticking to the interfaced 
>>> storage implementation in the current PR or reverting back to the 16-bit 
>>> storage and not 

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-11 Thread Alex Herbert



> On 11 May 2019, at 22:58, Gilles Sadowski  wrote:
> 
> Le sam. 11 mai 2019 à 23:32, Alex Herbert  a écrit :
>> 
>> 
>> 
>>> On 10 May 2019, at 15:07, Gilles Sadowski  wrote:
>>> 
>>> Hi.
>>> 
>>> Le ven. 10 mai 2019 à 15:53, Alex Herbert >> > a écrit :
 
 
 On 10/05/2019 14:27, Gilles Sadowski wrote:
> Hi Alex.
> 
> Le ven. 10 mai 2019 à 13:57, Alex Herbert  a 
> écrit :
>> Can I get a review of the PR for RNG-101 please.
> Thanks for this work!
> 
> I didn't go into the details; however, I see many fields and methods like
>  table1 ... table5
>  fillTable1 ... fillTable5
>  getTable1 ... getTable5
> Wouldn't it be possible to use a 2D table:
>  table[5][];
> so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
> is necessary (where "tableIndex" runs from 0 to 4)?
 
 Yes. The design is based around using 5 tables as per the example code.
 
 The sample() method knows which table it needs so it can directly jump
 to the table in question. I'd have to look at the difference in speed
 when using a 2D table as you are adding another array access but
 reducing the number of possible method calls (although you still need a
 method call). Maybe this will be optimised out by the JVM.
 
 If the speed is not a factor then I'll rewrite it. Otherwise it's
 probably better done for speed as this is the entire point of the
 sampler given it disregards any probability under 2^-31 (i.e. it's not a
 perfectly fair sampler).
 
 Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
 states using 3 tables of base 2^10 then you get a speed increase
 (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
 tables of base 2^15 does not make it much faster again.
 
 I'll have a rethink to see if I can make the design work for different
 base sizes.
>>> 
>>> That could be an extension made easier with the 2D table, but
>>> I quite agree that given the relatively minor speed improvement
>>> to be expected, it is not the main reason; the rationale was just to
>>> make the code a more compact and a little easier to grasp (IMHO).
>>> 
>>> Gilles
>> 
>> I’ve done a more extensive look at the implications of changing the 
>> implementation of the algorithm. This tested using: 1D or 2D tables; 
>> interfaced storage to dynamic table types; base 6 or base 10 for the 
>> algorithm; and allowing the base to be chosen. Results are in the Jira 
>> ticket. Basically 2D arrays are slower and supporting choices for the 
>> backing storage or base of the algorithm is slower.
>> 
>> To support the Poisson and Binomial samplers only requires 16-bit storage. 
>> So a dedicated sampler using base 6 and short for the tables will be the 
>> best compromise between storage space and speed. The base 10 sampler is 
>> faster but takes about 9-10x more space in memory.
>> 
>> Note I originally wrote the sampler to use only 16-bit storage. I then 
>> modified it to use dynamic storage without measuring performance. And so I 
>> made it slightly slower.
>> 
>> The question is does the library even need to have a 32-bit storage 
>> implementation? This would be used for a probability distribution with more 
>> than 2^16 different possible samples. I think this would be an edge case. 
>> Here the memory requirements will be in the tens of MB at a minimum but may 
>> balloon to become much larger. In this case a different algorithm such as 
>> the Alias method or a guide table is more memory stable as it only requires 
>> 12 bytes of storage per index, irrespective of the shape of the probability 
>> distribution.
>> 
>> If different implementations (of this algorithm) are added to the library 
>> then the effect of using a sampler that dynamically chooses the storage 
>> space and/or base for the algorithm is noticeable in the performance. In 
>> this case these would be better served using a factory:
>> 
>> class DiscreteProbabilitySamplerFactory {
>>DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
>> double[])
>> }
>> 
>> But if specifically targeting this algorithm it could be:
>> 
>> class MarsagliaTsangWangDiscreteProbabilitySamplerFactory {
>>DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
>> double[], boolean useBase10)
>> }
>> 
>> Or something similar. The user can then choose to use a base 10 algorithm if 
>> memory is not a concern.
>> 
>> I am wary of making this too complicated for just this sampler. So I would 
>> vote for ignoring the base 10 version and sticking to the interfaced storage 
>> implementation in the current PR or reverting back to the 16-bit storage and 
>> not supporting very large distributions. In the later case this is at least 
>> partially supported by the fact that the method only supports probabilities 
>> down to 1^-31. Anything 

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-11 Thread Gilles Sadowski
Le sam. 11 mai 2019 à 23:32, Alex Herbert  a écrit :
>
>
>
> > On 10 May 2019, at 15:07, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > Le ven. 10 mai 2019 à 15:53, Alex Herbert  > > a écrit :
> >>
> >>
> >> On 10/05/2019 14:27, Gilles Sadowski wrote:
> >>> Hi Alex.
> >>>
> >>> Le ven. 10 mai 2019 à 13:57, Alex Herbert  a 
> >>> écrit :
>  Can I get a review of the PR for RNG-101 please.
> >>> Thanks for this work!
> >>>
> >>> I didn't go into the details; however, I see many fields and methods like
> >>>   table1 ... table5
> >>>   fillTable1 ... fillTable5
> >>>   getTable1 ... getTable5
> >>> Wouldn't it be possible to use a 2D table:
> >>>   table[5][];
> >>> so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
> >>> is necessary (where "tableIndex" runs from 0 to 4)?
> >>
> >> Yes. The design is based around using 5 tables as per the example code.
> >>
> >> The sample() method knows which table it needs so it can directly jump
> >> to the table in question. I'd have to look at the difference in speed
> >> when using a 2D table as you are adding another array access but
> >> reducing the number of possible method calls (although you still need a
> >> method call). Maybe this will be optimised out by the JVM.
> >>
> >> If the speed is not a factor then I'll rewrite it. Otherwise it's
> >> probably better done for speed as this is the entire point of the
> >> sampler given it disregards any probability under 2^-31 (i.e. it's not a
> >> perfectly fair sampler).
> >>
> >> Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
> >> states using 3 tables of base 2^10 then you get a speed increase
> >> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
> >> tables of base 2^15 does not make it much faster again.
> >>
> >> I'll have a rethink to see if I can make the design work for different
> >> base sizes.
> >
> > That could be an extension made easier with the 2D table, but
> > I quite agree that given the relatively minor speed improvement
> > to be expected, it is not the main reason; the rationale was just to
> > make the code a more compact and a little easier to grasp (IMHO).
> >
> > Gilles
>
> I’ve done a more extensive look at the implications of changing the 
> implementation of the algorithm. This tested using: 1D or 2D tables; 
> interfaced storage to dynamic table types; base 6 or base 10 for the 
> algorithm; and allowing the base to be chosen. Results are in the Jira 
> ticket. Basically 2D arrays are slower and supporting choices for the backing 
> storage or base of the algorithm is slower.
>
> To support the Poisson and Binomial samplers only requires 16-bit storage. So 
> a dedicated sampler using base 6 and short for the tables will be the best 
> compromise between storage space and speed. The base 10 sampler is faster but 
> takes about 9-10x more space in memory.
>
> Note I originally wrote the sampler to use only 16-bit storage. I then 
> modified it to use dynamic storage without measuring performance. And so I 
> made it slightly slower.
>
> The question is does the library even need to have a 32-bit storage 
> implementation? This would be used for a probability distribution with more 
> than 2^16 different possible samples. I think this would be an edge case. 
> Here the memory requirements will be in the tens of MB at a minimum but may 
> balloon to become much larger. In this case a different algorithm such as the 
> Alias method or a guide table is more memory stable as it only requires 12 
> bytes of storage per index, irrespective of the shape of the probability 
> distribution.
>
> If different implementations (of this algorithm) are added to the library 
> then the effect of using a sampler that dynamically chooses the storage space 
> and/or base for the algorithm is noticeable in the performance. In this case 
> these would be better served using a factory:
>
> class DiscreteProbabilitySamplerFactory {
> DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
> double[])
> }
>
> But if specifically targeting this algorithm it could be:
>
> class MarsagliaTsangWangDiscreteProbabilitySamplerFactory {
> DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
> double[], boolean useBase10)
> }
>
> Or something similar. The user can then choose to use a base 10 algorithm if 
> memory is not a concern.
>
> I am wary of making this too complicated for just this sampler. So I would 
> vote for ignoring the base 10 version and sticking to the interfaced storage 
> implementation in the current PR or reverting back to the 16-bit storage and 
> not supporting very large distributions. In the later case this is at least 
> partially supported by the fact that the method only supports probabilities 
> down to 1^-31. Anything else is set to zero after scaling by 2^30 and 
> rounding. So large probability distributions are more likely to have values 
> that 

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-11 Thread Alex Herbert


> On 10 May 2019, at 15:07, Gilles Sadowski  wrote:
> 
> Hi.
> 
> Le ven. 10 mai 2019 à 15:53, Alex Herbert  > a écrit :
>> 
>> 
>> On 10/05/2019 14:27, Gilles Sadowski wrote:
>>> Hi Alex.
>>> 
>>> Le ven. 10 mai 2019 à 13:57, Alex Herbert  a 
>>> écrit :
 Can I get a review of the PR for RNG-101 please.
>>> Thanks for this work!
>>> 
>>> I didn't go into the details; however, I see many fields and methods like
>>>   table1 ... table5
>>>   fillTable1 ... fillTable5
>>>   getTable1 ... getTable5
>>> Wouldn't it be possible to use a 2D table:
>>>   table[5][];
>>> so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
>>> is necessary (where "tableIndex" runs from 0 to 4)?
>> 
>> Yes. The design is based around using 5 tables as per the example code.
>> 
>> The sample() method knows which table it needs so it can directly jump
>> to the table in question. I'd have to look at the difference in speed
>> when using a 2D table as you are adding another array access but
>> reducing the number of possible method calls (although you still need a
>> method call). Maybe this will be optimised out by the JVM.
>> 
>> If the speed is not a factor then I'll rewrite it. Otherwise it's
>> probably better done for speed as this is the entire point of the
>> sampler given it disregards any probability under 2^-31 (i.e. it's not a
>> perfectly fair sampler).
>> 
>> Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
>> states using 3 tables of base 2^10 then you get a speed increase
>> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
>> tables of base 2^15 does not make it much faster again.
>> 
>> I'll have a rethink to see if I can make the design work for different
>> base sizes.
> 
> That could be an extension made easier with the 2D table, but
> I quite agree that given the relatively minor speed improvement
> to be expected, it is not the main reason; the rationale was just to
> make the code a more compact and a little easier to grasp (IMHO).
> 
> Gilles

I’ve done a more extensive look at the implications of changing the 
implementation of the algorithm. This tested using: 1D or 2D tables; interfaced 
storage to dynamic table types; base 6 or base 10 for the algorithm; and 
allowing the base to be chosen. Results are in the Jira ticket. Basically 2D 
arrays are slower and supporting choices for the backing storage or base of the 
algorithm is slower.

To support the Poisson and Binomial samplers only requires 16-bit storage. So a 
dedicated sampler using base 6 and short for the tables will be the best 
compromise between storage space and speed. The base 10 sampler is faster but 
takes about 9-10x more space in memory.

Note I originally wrote the sampler to use only 16-bit storage. I then modified 
it to use dynamic storage without measuring performance. And so I made it 
slightly slower.

The question is does the library even need to have a 32-bit storage 
implementation? This would be used for a probability distribution with more 
than 2^16 different possible samples. I think this would be an edge case. Here 
the memory requirements will be in the tens of MB at a minimum but may balloon 
to become much larger. In this case a different algorithm such as the Alias 
method or a guide table is more memory stable as it only requires 12 bytes of 
storage per index, irrespective of the shape of the probability distribution.

If different implementations (of this algorithm) are added to the library then 
the effect of using a sampler that dynamically chooses the storage space and/or 
base for the algorithm is noticeable in the performance. In this case these 
would be better served using a factory:

class DiscreteProbabilitySamplerFactory {
DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
double[])
}

But if specifically targeting this algorithm it could be:

class MarsagliaTsangWangDiscreteProbabilitySamplerFactory {
DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
double[], boolean useBase10)
}

Or something similar. The user can then choose to use a base 10 algorithm if 
memory is not a concern.

I am wary of making this too complicated for just this sampler. So I would vote 
for ignoring the base 10 version and sticking to the interfaced storage 
implementation in the current PR or reverting back to the 16-bit storage and 
not supporting very large distributions. In the later case this is at least 
partially supported by the fact that the method only supports probabilities 
down to 1^-31. Anything else is set to zero after scaling by 2^30 and rounding. 
So large probability distributions are more likely to have values that are 
misrepresented due to the conversion of probabilities to fractions with a base 
of 2^30.

Thoughts on this?

Alex


> 
>> 
>>> 
>>> The diff for "DiscreteSamplersList.java" refers to
>>>MarsagliaTsangWangBinomialSampler
>>> but
>>>   

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Alex Herbert


On 10/05/2019 15:07, Gilles Sadowski wrote:

Hi.

Le ven. 10 mai 2019 à 15:53, Alex Herbert  a écrit :


On 10/05/2019 14:27, Gilles Sadowski wrote:

Hi Alex.

Le ven. 10 mai 2019 à 13:57, Alex Herbert  a écrit :

Can I get a review of the PR for RNG-101 please.

Thanks for this work!

I didn't go into the details; however, I see many fields and methods like
table1 ... table5
fillTable1 ... fillTable5
getTable1 ... getTable5
Wouldn't it be possible to use a 2D table:
table[5][];
so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
is necessary (where "tableIndex" runs from 0 to 4)?

Yes. The design is based around using 5 tables as per the example code.

The sample() method knows which table it needs so it can directly jump
to the table in question. I'd have to look at the difference in speed
when using a 2D table as you are adding another array access but
reducing the number of possible method calls (although you still need a
method call). Maybe this will be optimised out by the JVM.

If the speed is not a factor then I'll rewrite it. Otherwise it's
probably better done for speed as this is the entire point of the
sampler given it disregards any probability under 2^-31 (i.e. it's not a
perfectly fair sampler).

Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
states using 3 tables of base 2^10 then you get a speed increase
(roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
tables of base 2^15 does not make it much faster again.

I'll have a rethink to see if I can make the design work for different
base sizes.

That could be an extension made easier with the 2D table, but
I quite agree that given the relatively minor speed improvement
to be expected, it is not the main reason; the rationale was just to
make the code a more compact and a little easier to grasp (IMHO).

Gilles


Benchmark (randomSourceName)
   (samplerType)  Mode  Cnt  Score   Error  Units
DiscreteSamplersPerformance.baseline N/A
 N/A  avgt    5  2.043 ± 0.015  ns/op
DiscreteSamplersPerformance.nextInt SPLIT_MIX_64
 N/A  avgt    5  3.577 ± 0.028  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64   
MarsagliaTsangWangDiscreteSampler  avgt    5  5.550 ± 0.019  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64  
MarsagliaTsangWangDiscreteSampler2  avgt    5  5.974 ± 0.073  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64   
MarsagliaTsangWangSmallMeanPoissonSampler  avgt    5  8.104 ± 0.048  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64  
MarsagliaTsangWangSmallMeanPoissonSampler2  avgt    5  8.217 ± 0.015  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64   
MarsagliaTsangWangBinomialSampler  avgt    5  8.321 ± 0.028  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64  
MarsagliaTsangWangBinomialSampler2  avgt    5  9.277 ± 0.167  ns/op

The Poisson(mean = 22.9) sampler has a small difference:

8.104 - 3.577 = 4.527
8.217 - 3.577 = 4.64

About 2.4% slower.

But the Binomial (n=67,p=0.33) sampler is a fair bit slower:

8.321 - 3.577 = 4.744
9.277 - 3.577 = 5.7

About 20% slower.

So it seems that the JVM cannot optimise the 2D table look-up. It may 
well be due to the use of an interface to support different table 
storage types:


table.get(0, n)

If working direct with the array then:

array[0][n]

may be optimised away to just the second index access.

As it is the following:

table.get0(n)
table.get1(n)
...

is faster than

table.get(0, n)
table.get(1, n)
...

I have to admit that the code with the 2D table is nice and compact. But 
thinking about the implementation it can still support 2, 3, or 5 tables 
with the current approach. The later tables would just be empty.


Here are some results for a quick hack of a 3 table version (the suffix 
10 is for base 10):


Benchmark (randomSourceName)
 (samplerType)  Mode  Cnt  Score   Error  Units
DiscreteSamplersPerformance.baseline N/A
   N/A  avgt    5  2.042 ± 0.008  ns/op
DiscreteSamplersPerformance.nextInt SPLIT_MIX_64
   N/A  avgt    5  3.577 ± 0.025  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64 
MarsagliaTsangWangDiscreteSampler  avgt    5  6.087 ± 2.804  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64    
MarsagliaTsangWangDiscreteSampler2  avgt    5  6.002 ± 0.141  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64   
MarsagliaTsangWangDiscreteSampler10  avgt    5  5.301 ± 0.015  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64  
MarsagliaTsangWangDiscreteSampler102  avgt    5  5.715 ± 0.007  ns/op

There's a timing anomaly for the original 

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Alex Herbert


On 10/05/2019 15:07, Gilles Sadowski wrote:

Hi.

Le ven. 10 mai 2019 à 15:53, Alex Herbert  a écrit :


On 10/05/2019 14:27, Gilles Sadowski wrote:

Hi Alex.

Le ven. 10 mai 2019 à 13:57, Alex Herbert  a écrit :

Can I get a review of the PR for RNG-101 please.

Thanks for this work!

I didn't go into the details; however, I see many fields and methods like
table1 ... table5
fillTable1 ... fillTable5
getTable1 ... getTable5
Wouldn't it be possible to use a 2D table:
table[5][];
so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
is necessary (where "tableIndex" runs from 0 to 4)?

Yes. The design is based around using 5 tables as per the example code.

The sample() method knows which table it needs so it can directly jump
to the table in question. I'd have to look at the difference in speed
when using a 2D table as you are adding another array access but
reducing the number of possible method calls (although you still need a
method call). Maybe this will be optimised out by the JVM.

If the speed is not a factor then I'll rewrite it. Otherwise it's
probably better done for speed as this is the entire point of the
sampler given it disregards any probability under 2^-31 (i.e. it's not a
perfectly fair sampler).

Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
states using 3 tables of base 2^10 then you get a speed increase
(roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
tables of base 2^15 does not make it much faster again.

I'll have a rethink to see if I can make the design work for different
base sizes.

That could be an extension made easier with the 2D table, but
I quite agree that given the relatively minor speed improvement
to be expected, it is not the main reason; the rationale was just to
make the code a more compact and a little easier to grasp (IMHO).

Gilles


Direct from JMH:

Benchmark (randomSourceName)   
(samplerType)  Mode  Cnt  Score   Error  Units
DiscreteSamplersPerformance.baseline N/A
 N/A  avgt5  2.038 ± 0.015  ns/op
DiscreteSamplersPerformance.nextInt SPLIT_MIX_64
 N/A  avgt5  3.752 ± 1.708  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64   
MarsagliaTsangWangDiscreteSampler  avgt5  5.769 ± 0.005  ns/op
DiscreteSamplersPerformance.sample  SPLIT_MIX_64  
MarsagliaTsangWangDiscreteSampler2  avgt5  5.954 ± 0.038  ns/op

Note: The sampler is very fast!

The baseline is just the timing for JMH to consume an int value. This 
does not even hit the generator. That is the nextInt() method.


So to get the time for the sampler we subtract the nextInt baseline (I 
used the median as one sample was slow):


5.752 - 3.559 = 2.193
5.954 - 3.559 = 2.395

It's about 10% slower using a 2D table.

I've only run this on the Discrete sampler. It is a bit artificial as 
the probability distribution is [0.1, 0.2, 0.3, 0.4] so the tables may 
not be very well filled. I'll repeat for the Poisson sampler as that has 
a long tail and larger tables.


I'm currently favouring leaving the code as it is. It matches the 
original source paper so anyone reading that will understand the code. 
Perhaps it needs more documentation.






The diff for "DiscreteSamplersList.java" refers to
 MarsagliaTsangWangBinomialSampler
but
MarsagliaTsangWangSmallMeanPoissonSampler
seems to be missing.

Oops, I missed adding that back. I built the PR from code where I was
testing lots of implementations.

I've just added it back and it is still passing locally. Travis should
see that too as I pushed the change to the PR.


Regards,
Gilles


This is a new sampler based on the source code from the paper:

George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
Fast Generation of Discrete Random Variables.
Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.

https://www.jstatsoft.org/article/view/v011i03

The code has no explicit licence.

The paper states:

"We have provided C versions of the two methods described here, for
inclusion in the “Browse
files”section of the journal. ... You may then want to examine the
components of the two files, for illumination
or for extracting portions that might be usefully applied to your
discrete distributions."

So I assuming that it can be incorporated with little modification.

The Java implementation has been rewritten to allow the storage to be
optimised for the required size. The generation of the tables has been
adapted appropriately and checks have been added on the input parameters
to ensure the sampler does not generate exceptions once constructed (I
found out the hard way that the original code was not entirely correct).

Thanks.

Alex

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: 

Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Gilles Sadowski
Hi.

Le ven. 10 mai 2019 à 15:53, Alex Herbert  a écrit :
>
>
> On 10/05/2019 14:27, Gilles Sadowski wrote:
> > Hi Alex.
> >
> > Le ven. 10 mai 2019 à 13:57, Alex Herbert  a 
> > écrit :
> >> Can I get a review of the PR for RNG-101 please.
> > Thanks for this work!
> >
> > I didn't go into the details; however, I see many fields and methods like
> >table1 ... table5
> >fillTable1 ... fillTable5
> >getTable1 ... getTable5
> > Wouldn't it be possible to use a 2D table:
> >table[5][];
> > so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
> > is necessary (where "tableIndex" runs from 0 to 4)?
>
> Yes. The design is based around using 5 tables as per the example code.
>
> The sample() method knows which table it needs so it can directly jump
> to the table in question. I'd have to look at the difference in speed
> when using a 2D table as you are adding another array access but
> reducing the number of possible method calls (although you still need a
> method call). Maybe this will be optimised out by the JVM.
>
> If the speed is not a factor then I'll rewrite it. Otherwise it's
> probably better done for speed as this is the entire point of the
> sampler given it disregards any probability under 2^-31 (i.e. it's not a
> perfectly fair sampler).
>
> Note that 5 tables are needed for 5 hex digits (base 2^6). The paper
> states using 3 tables of base 2^10 then you get a speed increase
> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
> tables of base 2^15 does not make it much faster again.
>
> I'll have a rethink to see if I can make the design work for different
> base sizes.

That could be an extension made easier with the 2D table, but
I quite agree that given the relatively minor speed improvement
to be expected, it is not the main reason; the rationale was just to
make the code a more compact and a little easier to grasp (IMHO).

Gilles

>
> >
> > The diff for "DiscreteSamplersList.java" refers to
> > MarsagliaTsangWangBinomialSampler
> > but
> >MarsagliaTsangWangSmallMeanPoissonSampler
> > seems to be missing.
>
> Oops, I missed adding that back. I built the PR from code where I was
> testing lots of implementations.
>
> I've just added it back and it is still passing locally. Travis should
> see that too as I pushed the change to the PR.
>
> >
> > Regards,
> > Gilles
> >
> >> This is a new sampler based on the source code from the paper:
> >>
> >> George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
> >> Fast Generation of Discrete Random Variables.
> >> Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
> >>
> >> https://www.jstatsoft.org/article/view/v011i03
> >>
> >> The code has no explicit licence.
> >>
> >> The paper states:
> >>
> >> "We have provided C versions of the two methods described here, for
> >> inclusion in the “Browse
> >> files”section of the journal. ... You may then want to examine the
> >> components of the two files, for illumination
> >> or for extracting portions that might be usefully applied to your
> >> discrete distributions."
> >>
> >> So I assuming that it can be incorporated with little modification.
> >>
> >> The Java implementation has been rewritten to allow the storage to be
> >> optimised for the required size. The generation of the tables has been
> >> adapted appropriately and checks have been added on the input parameters
> >> to ensure the sampler does not generate exceptions once constructed (I
> >> found out the hard way that the original code was not entirely correct).
> >>
> >> Thanks.
> >>
> >> Alex

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Alex Herbert



On 10/05/2019 14:27, Gilles Sadowski wrote:

Hi Alex.

Le ven. 10 mai 2019 à 13:57, Alex Herbert  a écrit :

Can I get a review of the PR for RNG-101 please.

Thanks for this work!

I didn't go into the details; however, I see many fields and methods like
   table1 ... table5
   fillTable1 ... fillTable5
   getTable1 ... getTable5
Wouldn't it be possible to use a 2D table:
   table[5][];
so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
is necessary (where "tableIndex" runs from 0 to 4)?


Yes. The design is based around using 5 tables as per the example code.

The sample() method knows which table it needs so it can directly jump 
to the table in question. I'd have to look at the difference in speed 
when using a 2D table as you are adding another array access but 
reducing the number of possible method calls (although you still need a 
method call). Maybe this will be optimised out by the JVM.


If the speed is not a factor then I'll rewrite it. Otherwise it's 
probably better done for speed as this is the entire point of the 
sampler given it disregards any probability under 2^-31 (i.e. it's not a 
perfectly fair sampler).


Note that 5 tables are needed for 5 hex digits (base 2^6). The paper 
states using 3 tables of base 2^10 then you get a speed increase 
(roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2 
tables of base 2^15 does not make it much faster again.


I'll have a rethink to see if I can make the design work for different 
base sizes.




The diff for "DiscreteSamplersList.java" refers to
MarsagliaTsangWangBinomialSampler
but
   MarsagliaTsangWangSmallMeanPoissonSampler
seems to be missing.


Oops, I missed adding that back. I built the PR from code where I was 
testing lots of implementations.


I've just added it back and it is still passing locally. Travis should 
see that too as I pushed the change to the PR.




Regards,
Gilles


This is a new sampler based on the source code from the paper:

George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
Fast Generation of Discrete Random Variables.
Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.

https://www.jstatsoft.org/article/view/v011i03

The code has no explicit licence.

The paper states:

"We have provided C versions of the two methods described here, for
inclusion in the “Browse
files”section of the journal. ... You may then want to examine the
components of the two files, for illumination
or for extracting portions that might be usefully applied to your
discrete distributions."

So I assuming that it can be incorporated with little modification.

The Java implementation has been rewritten to allow the storage to be
optimised for the required size. The generation of the tables has been
adapted appropriately and checks have been added on the input parameters
to ensure the sampler does not generate exceptions once constructed (I
found out the hard way that the original code was not entirely correct).

Thanks.

Alex

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Gilles Sadowski
Hi Alex.

Le ven. 10 mai 2019 à 13:57, Alex Herbert  a écrit :
>
> Can I get a review of the PR for RNG-101 please.

Thanks for this work!

I didn't go into the details; however, I see many fields and methods like
  table1 ... table5
  fillTable1 ... fillTable5
  getTable1 ... getTable5
Wouldn't it be possible to use a 2D table:
  table[5][];
so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
is necessary (where "tableIndex" runs from 0 to 4)?

The diff for "DiscreteSamplersList.java" refers to
   MarsagliaTsangWangBinomialSampler
but
  MarsagliaTsangWangSmallMeanPoissonSampler
seems to be missing.

Regards,
Gilles

> This is a new sampler based on the source code from the paper:
>
> George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
> Fast Generation of Discrete Random Variables.
> Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
>
> https://www.jstatsoft.org/article/view/v011i03
>
> The code has no explicit licence.
>
> The paper states:
>
> "We have provided C versions of the two methods described here, for
> inclusion in the “Browse
> files”section of the journal. ... You may then want to examine the
> components of the two files, for illumination
> or for extracting portions that might be usefully applied to your
> discrete distributions."
>
> So I assuming that it can be incorporated with little modification.
>
> The Java implementation has been rewritten to allow the storage to be
> optimised for the required size. The generation of the tables has been
> adapted appropriately and checks have been added on the input parameters
> to ensure the sampler does not generate exceptions once constructed (I
> found out the hard way that the original code was not entirely correct).
>
> Thanks.
>
> Alex

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-10 Thread Alex Herbert

Can I get a review of the PR for RNG-101 please.

This is a new sampler based on the source code from the paper:

George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004)
Fast Generation of Discrete Random Variables.
Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.

https://www.jstatsoft.org/article/view/v011i03

The code has no explicit licence.

The paper states:

"We have provided C versions of the two methods described here, for 
inclusion in the “Browse
files”section of the journal. ... You may then want to examine the 
components of the two files, for illumination
or for extracting portions that might be usefully applied to your 
discrete distributions."


So I assuming that it can be incorporated with little modification.

The Java implementation has been rewritten to allow the storage to be 
optimised for the required size. The generation of the tables has been 
adapted appropriately and checks have been added on the input parameters 
to ensure the sampler does not generate exceptions once constructed (I 
found out the hard way that the original code was not entirely correct).


Thanks.

Alex



-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org