Re: [statistics] Mode function for Cauchy distribution

2019-05-09 Thread Alex Herbert



> On 9 May 2019, at 21:17, Eric Barnhill  wrote:
> 
> Awesome!
> 
> On Thu, May 9, 2019 at 10:44 AM Udit Arora  wrote:
> 
>> I will see what I can do. It will take some time, but I will get to know
>> more about the other distributions.
>> 
>> 
>> On Thu, 9 May 2019, 10:58 pm Eric Barnhill, 
>> wrote:
>> 
>>> Udit, is it clear what to do here? Gilles recommends you propose some
>>> edits to ContinuousDistribution instead, to return Mode and Median.
>>> 
>>> But then, if an interface is altered, all the classes that implement that
>>> interface need to have these functions added, so we hope you are up for
>>> all that additional work. We can help you.

I think it would be prudent to go through all the distributions and see what is 
defined for each. Wikipedia has a helper table for all its distributions 
containing:

Mean
Median
Mode
Variance
Skewness
Ex. kurtosis
Entropy
Fisher Information

If many are undefined then you are adding to an interface something not 
generally supported.

Currently the ContinuousDistribution interface only has the mean and the 
variance. But note that these are used by the inverse cumulative probability 
method in the base abstract class. Same goes for the DiscreteDistribution.

I am +0 for adding more methods. I don’t see a reason not to. But nor do I see 
a need (within the library) to have them at the interface level if the mode or 
median for example are not required in a generic way.
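To make the question concrete, here is a minimal sketch of what an interface-level median and mode could look like for the Cauchy case. The names LocationStatistics and CauchySketch are hypothetical and are not part of the library; the sketch only illustrates that for Cauchy both values coincide with the location parameter set in the constructor.

```java
/** Hypothetical extension interface; not part of the library. */
interface LocationStatistics {
    double median();
    double mode();
}

/** Minimal stand-in for a Cauchy distribution with a location and a scale. */
final class CauchySketch implements LocationStatistics {
    private final double location;
    private final double scale;

    CauchySketch(double location, double scale) {
        if (!(scale > 0)) {
            throw new IllegalArgumentException("scale must be positive: " + scale);
        }
        this.location = location;
        this.scale = scale;
    }

    /** Probability density function. */
    double density(double x) {
        final double z = (x - location) / scale;
        return 1 / (Math.PI * scale * (1 + z * z));
    }

    /** Cumulative distribution function. */
    double cumulativeProbability(double x) {
        return 0.5 + Math.atan((x - location) / scale) / Math.PI;
    }

    // For Cauchy, the density peaks at the location and half of the
    // probability mass lies on each side of it, so both are the same field.
    @Override
    public double median() {
        return location;
    }

    @Override
    public double mode() {
        return location;
    }
}
```

For this distribution both methods are plain accessors; for others they might need a computation or might be undefined, which is the reason to survey all distributions first.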

>>> 
>>> Last is the idea of accessor methods. If the method starts with get_()
>>> then in principle this is just returning a field already present. But with
>>> that in mind, I don't know why we already have a method name like getMean()
>>> in this interface. We don't really know whether, for a given distribution,
>>> that would be a true accessor or need to be calculated. So I think all these
>>> method names should just be mean(), mode(), median(), etc.
>>> 
>>> So sorry if this is blowing up into more work than you expected. It often
>>> works that way! I certainly think these changes are worthwhile however.
>>> 
>>> 
>>> 
>>> On Thu, May 9, 2019 at 7:17 AM Gilles Sadowski 
>>> wrote:
>>> 
>>>> Hi Udit.
>>>> 
>>>> On Thu, 9 May 2019 at 12:52, Udit Arora wrote:
>>>>> 
>>>>> I intend to add a mode function for the Cauchy Distribution. It is a
>>>>> small addition which I thought might be helpful.
>>>> 
>>>> How will it be helpful?  I.e. what would an application developer
>>>> be able to do, that he can't with the current code?
>>>> 
>>>> You've surely noted that the class you want to modify is but
>>>> one of the implementations of the interface "ContinuousDistribution".
>>>> So if you propose to change the API, the change should be done
>>>> at the interface level, and the appropriate computation performed, or
>>>> method overloads defined, for all implementations.
>>>> 
>>>> The "accessor" methods refer to fields that were set by the constructor;
>>>> e.g. for "CauchyDistribution", "median" and "scale".
>>>> In this case, it happens that "mode" has the same value as "median",
>>>> but does this warrant an additional method?
>>>> 
>>>> Regards,
>>>> Gilles
>>>> 
>>>>> Thanks
>>>> 
>>>> -
>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>> 
>>> 
>> 





Re: [statistics] Mode function for Cauchy distribution

2019-05-09 Thread Eric Barnhill
Awesome!

On Thu, May 9, 2019 at 10:44 AM Udit Arora  wrote:

> I will see what I can do. It will take some time, but I will get to know
> more about the other distributions.
>
>
> On Thu, 9 May 2019, 10:58 pm Eric Barnhill, 
> wrote:
>
> > Udit, is it clear what to do here? Gilles recommends you propose some
> > edits to ContinuousDistribution instead, to return Mode and Median.
> >
> > But then, if an interface is altered, all the classes that implement that
> > interface need to have these functions added, so we hope you are up for
> > all that additional work. We can help you.
> >
> > Last is the idea of accessor methods. If the method starts with get_()
> > then in principle this is just returning a field already present. But with
> > that in mind, I don't know why we already have a method name like getMean()
> > in this interface. We don't really know whether, for a given distribution,
> > that would be a true accessor or need to be calculated. So I think all these
> > method names should just be mean(), mode(), median(), etc.
> >
> > So sorry if this is blowing up into more work than you expected. It often
> > works that way! I certainly think these changes are worthwhile however.
> >
> >
> >
> > On Thu, May 9, 2019 at 7:17 AM Gilles Sadowski 
> > wrote:
> >
> > > Hi Udit.
> > >
> > > On Thu, 9 May 2019 at 12:52, Udit Arora wrote:
> > > >
> > > > I intend to add a mode function for the Cauchy Distribution. It is a
> > > > small addition which I thought might be helpful.
> > >
> > > How will it be helpful?  I.e. what would an application developer
> > > be able to do, that he can't with the current code?
> > >
> > > You've surely noted that the class you want to modify is but
> > > one of the implementations of the interface "ContinuousDistribution".
> > > So if you propose to change the API, the change should be done
> > > at the interface level, and the appropriate computation performed, or
> > > method overloads defined, for all implementations.
> > >
> > > The "accessor" methods refer to fields that were set by the constructor;
> > > e.g. for "CauchyDistribution", "median" and "scale".
> > > In this case, it happens that "mode" has the same value as "median",
> > > but does this warrant an additional method?
> > >
> > > Regards,
> > > Gilles
> > >
> > > > Thanks
> > >
> > >
> > >
> >
>


Re: [rng] Copying samplers

2019-05-09 Thread Gilles Sadowski
On Thu, 9 May 2019 at 17:00, Alex Herbert wrote:
>
>
> On 09/05/2019 15:39, Gilles Sadowski wrote:
> > On Thu, 9 May 2019 at 15:41, Alex Herbert wrote:
> >> On Sat, 4 May 2019 at 23:52, Alex Herbert  wrote:
> >>
> >>>
>  On 4 May 2019, at 22:34, Gilles Sadowski  wrote:
> 
>  Hi.
> 
>  Le sam. 4 mai 2019 à 21:31, Alex Herbert  a
> >>> écrit :
> >
> >
> >> On 4 May 2019, at 14:46, Gilles Sadowski  wrote:
> >>
> >> Hello.
> >>
> >> On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:
> >>> Most of the samplers in the library have very small states that are
> >>> easy to compute. Some have computations that are more expensive, such as
> >>> the LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> >>>
> >>> However once the state is computed the only part of the state that
> >>> changes is the RNG. I would like to suggest a way to copy samplers as
> >>> something like:
> >>>
> >>> DiscreteSampler newInstance(UniformRandomProvider)
> >>>
> >>> The new instance would share all the private state of the first
> >>> sampler except the RNG. This can be used for multi-threaded applications
> >>> which require a new sampler per thread but sample from the same
> >>> distribution.
> >>> A particular case in point is the as yet not integrated
> >>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> >>> "large" state [2] that takes a "long" time [3] to compute but is
> >>> effectively immutable. This could be shared across instances saving
> >>> memory for parallel applications.
> >>>
> >>> A copy instance would have almost zero set-up time and provide an
> >>> opportunity for caching of commonly used samplers.
> >> The goal is sharing (immutable) state so it seems that the semantics is
> >> not "copy".
> >>
> >> Isn't it a "factory" that we are after?  E.g. something like:
> >> public final class CachedSamplingFactory {
> >>     private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> >>
> >>     public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
> >>         if (!poisson.isCached(mean)) {
> >>             // Initialize (requires synchronization) ...
> >>             poisson.createCache(mean);
> >>         }
> >>         // Construct using pre-built state.
> >>         return new PoissonSampler(poisson.getCache(mean), rng);
> >>     }
> >> }
> >> [It may be overkill, more work, and less performant…]
> > But you need a factory for every class you want to share state for. And
> > the factory actually has to look in a cache. If you operate on an instance
> > then you get what you want: another working version of the same sampler.
> > It would also be thread safe without synchronisation as long as the state
> > is immutable. The only mutable state is the passed-in RNG.
>  Agreed.  It was what I meant by the last sentence.
> 
> >> IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
> >> interface (?).
> > I did think of extending DiscreteSampler with this functionality. Not
> > adding to the interface as it currently is ‘functional’ as it has only one
> > method. I think that should not change. Having thought about it a bit more
> > I like the idea of a new functional interface. Perhaps:
> > interface DiscreteSamplerProvider {
> >     DiscreteSampler create(UniformRandomProvider rng);
> > }
> >
> > Substitute ‘Provider’ for:
> >
> > - Generator
> > - Supplier (possible clash or alignment with Java 8 depending on the
> >   way it is done)
> > - Factory (though the method is not static so I do not like this)
> > - etc
> >
> > So this then becomes a functional interface that can be used by
> > anything. However instances of a sampler would be expected to return a
> > sampler matching their own functionality.
> > Note there are some samplers not implementing an interface that also
> > could benefit from this. Namely CollectionSampler and
> > DiscreteProbabilityCollectionSampler. So does this need a generic
> > interface:
> > Sampler<T> {
> >     T sample();
> > }
> >
> > To be complemented with:
> >
> > SamplerProvider<T> {
> >     Sampler<T> create(UniformRandomProvider rng);
> > }
> >
> > So the library would require:
> >
> > SamplerProvider<T>
> > DiscreteSamplerProvider
> > ContinuousSamplerProvider
> >
> > Any sampler can choose to implement being a Provider. There are some
> > cases where it is moot. For example a ZigguratNormalizedGaussianSampler
> > just stores the rng in the constructor. However it could still be a
> > Provider; the method would only call the constructor. It would allow
> > writing a generic 

Re: [statistics] Mode function for Cauchy distribution

2019-05-09 Thread Udit Arora
I will see what I can do. It will take some time, but I will get to know
more about the other distributions.


On Thu, 9 May 2019, 10:58 pm Eric Barnhill,  wrote:

> Udit, is it clear what to do here? Gilles recommends you propose some edits
> to ContinuousDistribution instead, to return Mode and Median.
>
> But then, if an interface is altered, all the classes that implement that
> interface need to have these functions added, so we hope you are up for all
> that additional work. We can help you.
>
> Last is the idea of accessor methods. If the method starts with get_() then
> in principle this is just returning a field already present. But with that
> in mind, I don't know why we already have a method name like getMean() in
> this interface. We don't really know whether for a given distribution, that
> would be a true accessor or need to be calculated. So I think all these
> method names should just be mean(), mode(), median(), etc.
>
> So sorry if this is blowing up into more work than you expected. It often
> works that way! I certainly think these changes are worthwhile however.
>
>
>
> On Thu, May 9, 2019 at 7:17 AM Gilles Sadowski 
> wrote:
>
> > Hi Udit.
> >
> > On Thu, 9 May 2019 at 12:52, Udit Arora wrote:
> > >
> > > I intend to add a mode function for the Cauchy Distribution. It is a
> > > small addition which I thought might be helpful.
> >
> > How will it be helpful?  I.e. what would an application developer
> > be able to do, that he can't with the current code?
> >
> > You've surely noted that the class you want to modify is but
> > one of the implementations of the interface "ContinuousDistribution".
> > So if you propose to change the API, the change should be done
> > at the interface level, and the appropriate computation performed, or
> > method overloads defined, for all implementations.
> >
> > The "accessor" methods refer to fields that were set by the constructor;
> > e.g. for "CauchyDistribution", "median" and "scale".
> > In this case, it happens that "mode" has the same value as "median",
> > but does this warrant an additional method?
> >
> > Regards,
> > Gilles
> >
> > > Thanks
> >
> >
> >
>


Re: [statistics] Mode function for Cauchy distribution

2019-05-09 Thread Eric Barnhill
Udit, is it clear what to do here? Gilles recommends you propose some edits
to ContinuousDistribution instead, to return Mode and Median.

But then, if an interface is altered, all the classes that implement that
interface need to have these functions added, so we hope you are up for all
that additional work. We can help you.

Last is the idea of accessor methods. If the method starts with get_() then
in principle this is just returning a field already present. But with that
in mind, I don't know why we already have a method name like getMean() in
this interface. We don't really know whether for a given distribution, that
would be a true accessor or need to be calculated. So I think all these
method names should just be mean(), mode(), median(), etc.

So sorry if this is blowing up into more work than you expected. It often
works that way! I certainly think these changes are worthwhile however.



On Thu, May 9, 2019 at 7:17 AM Gilles Sadowski  wrote:

> Hi Udit.
>
> On Thu, 9 May 2019 at 12:52, Udit Arora wrote:
> >
> > I intend to add a mode function for the Cauchy Distribution. It is a
> > small addition which I thought might be helpful.
>
> How will it be helpful?  I.e. what would an application developer
> be able to do, that he can't with the current code?
>
> You've surely noted that the class you want to modify is but
> one of the implementations of the interface "ContinuousDistribution".
> So if you propose to change the API, the change should be done
> at the interface level, and the appropriate computation performed, or
> method overloads defined, for all implementations.
>
> The "accessor" methods refer to fields that were set by the constructor;
> e.g. for "CauchyDistribution", "median" and "scale".
> In this case, it happens that "mode" has the same value as "median",
> but does this warrant an additional method?
>
> Regards,
> Gilles
>
> > Thanks
>
>
>


Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Gilles Sadowski
On Thu, 9 May 2019 at 17:07, Alex Herbert wrote:
>
>
> On 09/05/2019 15:46, Gilles Sadowski wrote:
> > On Thu, 9 May 2019 at 14:30, Alex Herbert wrote:
> >>
> >>
> >>> On 9 May 2019, at 12:58, Gilles Sadowski  wrote:
> >>>
> >>> Hi.
> >>>
> >>> On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:
>  The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl 
>  sequence [1] to create randomness. This is basically a linear increment 
>  added to a sum that will eventually wrap (due to overflow) to restart at 
>  the beginning. The MSWS paper recommends an increment with a high number 
>  of different bits set in a random pattern across the 64-bit of the long. 
>  The paper recommends using a permutation of 8 from the 16 hex digits for 
>  the upper and lower 32-bits.
> 
>  The source code for the MSWS provides a routine that generates a 
>  permutation. Unfortunately:
> 
>  - The code is GPL 3, which restricts its use under the Apache licence 
>  (without jumping through some hoops)
>  - The algorithm is a simple rejection method that suffers from a high 
>  rejection probability as it approaches 8 digits already chosen
> 
>  I have created an alternative faster implementation for use when seeding 
>  the MSWS generator. However it may be a function to be reused in other 
>  places.
> 
>  The question is where to put this utility function. It requires a source 
>  of randomness to create the permutation. It has the following signature:
> 
>  /**
>  * Creates an {@code int} containing a permutation of 8 hex digits chosen 
>  from 16.
>  *
>  * @param rng Source of randomness.
>  * @return Hex digit permutation.
>  */
>  public static int createIntHexPermutation(UniformRandomProvider rng);
> 
>  Likewise:
> 
>  /**
>  * Creates a {@code long} containing a permutation of 8 hex digits chosen 
>  from 16 in
>  * the upper and lower 32-bits.
>  *
>  * @param rng Source of randomness.
>  * @return Hex digit permutation.
>  */
>  public static long createLongHexPermutation(UniformRandomProvider rng);
> 
>  Options:
> 
>  - Put it as a package private function inside the MSWS generator to be 
>  used only when creating this generator. Package private allows unit 
>  testing that the algorithm does provide a random 16-choose-8 permutation
>  - Put it as a helper function in org.apache.commons.rng.core.util
> >>> - In "SeedFactory" (?).
> >>>
> >>> For MSWS ("core" module), the increment would be an argument to the
> >>> constructor (allowing the user to shoot himself in the foot, like when
> >>> passing a bad seed), and "RandomSource" ("simple" module) would offer
> >>> to provide an instance for which the increment was computed according
> >>> to the recommendation.
> >>
> >> OK. That makes it easier to build the reference implementation in Core as 
> >> it just matches the C reference code. I can add the seeding function to 
> >> SeedFactory in the Simple module. So if a user passes anything to be used 
> >> as the seed then it passes through unchanged (or converted). But if they 
> >> do not provide a seed then it should be generated appropriately.
> >>
> >> This means I should really get on with updating the RandomSourceInternal 
> >> and ProviderBuilder (RNG 75 [1]). It currently does not support creating 
> >> seeds based on the exact RandomSource. It just uses the native seed type 
> >> of the RandomSource. Here are the current use cases that should be handled:
> >>
> >> - MSWS recommends a seed with a permutation of hex digits.
> >> - XorShiRo family of generators all require seeds with at least some 
> >> non-zero elements.
> >>
> >> My idea was to target this part of the ProviderBuilder createSeed method:
> >>
> >> if (seed == null) {
> >>  // Create a random seed of the appropriate native type.
> >>
> >>  if (source.getSeed().equals(Integer.class)) {
> >>  nativeSeed = SeedFactory.createInt();
> >>  } else if (source.getSeed().equals(Long.class)) {
> >>  nativeSeed = SeedFactory.createLong();
> >>
> >>
> >> To change it to:
> >>
> >> if (seed == null) {
> >>  // Delegate to the source to create an appropriate seed (since it 
> >> knows best)
> >>  return source.createSeed()
> > But IIUC, that would mean that the code for computing the seed
> > is in "core", not "simple" (where "SeedFactory" is defined).
>
> Sorry, my code snippet was not fully qualified. This is from
>
> org.apache.commons.rng.simple.internal.ProviderBuilder.createSeed
>
> This is the place where a seed is currently made so your point is
> satisfied.

All fine then. :-)

> The current process for the seed creation is a bit limited. It just
> builds int, long, int[], or long[]. It is not currently able to build
> arrays of the correct 

Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Alex Herbert



On 09/05/2019 15:46, Gilles Sadowski wrote:

On Thu, 9 May 2019 at 14:30, Alex Herbert wrote:




On 9 May 2019, at 12:58, Gilles Sadowski  wrote:

Hi.

On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:

The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl sequence 
[1] to create randomness. This is basically a linear increment added to a sum 
that will eventually wrap (due to overflow) to restart at the beginning. The 
MSWS paper recommends an increment with a high number of different bits set in 
a random pattern across the 64-bit of the long. The paper recommends using a 
permutation of 8 from the 16 hex digits for the upper and lower 32-bits.
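As a minimal sketch of the Weyl sequence described above (only the additive counter, not the MSWS generator itself): a fixed increment is added to a running sum that wraps on 64-bit overflow. The class name WeylSketch is hypothetical, and the requirement of an odd increment is an assumption made here so that the counter visits every 64-bit value before repeating.

```java
/** Hypothetical illustration of a 64-bit Weyl sequence: w += s (mod 2^64). */
final class WeylSketch {
    private long w;       // running sum, wraps silently on overflow
    private final long s; // fixed increment

    WeylSketch(long increment) {
        // Assumption of this sketch: insist on an odd increment, since an
        // even one would shorten the period of the counter.
        if ((increment & 1) == 0) {
            throw new IllegalArgumentException("increment must be odd");
        }
        this.s = increment;
    }

    /** Advances the sequence by one step and returns the new sum. */
    long next() {
        w += s;
        return w;
    }
}
```

The recommendation quoted above concerns how to choose the increment so that its bits are well mixed; this sketch accepts any odd value.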

The source code for the MSWS provides a routine that generates a permutation. 
Unfortunately:

- The code is GPL 3, which restricts its use under the Apache licence 
(without jumping through some hoops)
- The algorithm is a simple rejection method that suffers from a high 
rejection probability as it approaches 8 digits already chosen

I have created an alternative faster implementation for use when seeding the 
MSWS generator. However it may be a function to be reused in other places.

The question is where to put this utility function. It requires a source of 
randomness to create the permutation. It has the following signature:

/**
* Creates an {@code int} containing a permutation of 8 hex digits chosen from 
16.
*
* @param rng Source of randomness.
* @return Hex digit permutation.
*/
public static int createIntHexPermutation(UniformRandomProvider rng);

Likewise:

/**
* Creates a {@code long} containing a permutation of 8 hex digits chosen from 
16 in
* the upper and lower 32-bits.
*
* @param rng Source of randomness.
* @return Hex digit permutation.
*/
public static long createLongHexPermutation(UniformRandomProvider rng);
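One way to avoid rejection altogether is a partial Fisher-Yates shuffle: each of the 8 picks is drawn only from the digits not yet used. The sketch below is an assumption about how such a method could be written, not the implementation referred to in this message; java.util.SplittableRandom stands in for UniformRandomProvider so that the example is self-contained, and the class name HexPermutationSketch is hypothetical.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch: 8 distinct hex digits chosen from 16, without rejection. */
final class HexPermutationSketch {
    private HexPermutationSketch() {}

    /** Packs 8 distinct, randomly ordered hex digits into an int. */
    static int createIntHexPermutation(SplittableRandom rng) {
        final int[] digits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        int result = 0;
        for (int i = 0; i < 8; i++) {
            // Pick uniformly from the unused digits at positions [i, 16).
            final int j = i + rng.nextInt(16 - i);
            final int chosen = digits[j];
            // Swap the pick into the "used" prefix so it cannot be drawn again.
            digits[j] = digits[i];
            digits[i] = chosen;
            result = (result << 4) | chosen;
        }
        return result;
    }

    /** Independent permutations in the upper and lower 32 bits. */
    static long createLongHexPermutation(SplittableRandom rng) {
        final long upper = createIntHexPermutation(rng);
        final long lower = createIntHexPermutation(rng) & 0xffffffffL;
        return (upper << 32) | lower;
    }
}
```

Each call uses exactly 8 bounded random draws per 32 bits, regardless of which digits have already been chosen.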

Options:

- Put it as a package private function inside the MSWS generator to be used 
only when creating this generator. Package private allows unit testing that 
the algorithm does provide a random 16-choose-8 permutation
- Put it as a helper function in org.apache.commons.rng.core.util

- In "SeedFactory" (?).

For MSWS ("core" module), the increment would be an argument to the
constructor (allowing the user to shoot himself in the foot, like when
passing a bad seed), and "RandomSource" ("simple" module) would offer
to provide an instance for which the increment was computed according
to the recommendation.


OK. That makes it easier to build the reference implementation in Core as it 
just matches the C reference code. I can add the seeding function to 
SeedFactory in the Simple module. So if a user passes anything to be used as 
the seed then it passes through unchanged (or converted). But if they do not 
provide a seed then it should be generated appropriately.

This means I should really get on with updating the RandomSourceInternal and 
ProviderBuilder (RNG 75 [1]). It currently does not support creating seeds 
based on the exact RandomSource. It just uses the native seed type of the 
RandomSource. Here are the current use cases that should be handled:

- MSWS recommends a seed with a permutation of hex digits.
- XorShiRo family of generators all require seeds with at least some non-zero 
elements.

My idea was to target this part of the ProviderBuilder createSeed method:

if (seed == null) {
 // Create a random seed of the appropriate native type.

 if (source.getSeed().equals(Integer.class)) {
 nativeSeed = SeedFactory.createInt();
 } else if (source.getSeed().equals(Long.class)) {
 nativeSeed = SeedFactory.createLong();


To change it to:

if (seed == null) {
 // Delegate to the source to create an appropriate seed (since it knows 
best)
 return source.createSeed()

But IIUC, that would mean that the code for computing the seed
is in "core", not "simple" (where "SeedFactory" is defined).


Sorry, my code snippet was not fully qualified. This is from

org.apache.commons.rng.simple.internal.ProviderBuilder.createSeed

This is the place where a seed is currently made so your point is 
satisfied.


The current process for the seed creation is a bit limited. It just 
builds int, long, int[], or long[]. It is not currently able to build 
arrays of the correct length or build seeds with specific requirements 
based on the source. That is what I would like to change. So if the 
source was a MSWS then it would build the seed using the hex digit 
permutation method. If the source was a XorShiRo family it would build 
seeds with no zeros.


My idea was to move the null seed creation pathway into 
RandomSourceInternal. So if the RandomSource has specific needs for the 
seed then the internal enum can override the default method to create 
the seed.
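The idea of letting each enum constant override a default seed-creation method can be sketched as follows. This is only an illustration under stated assumptions: SourceSketch and its constants are hypothetical and do not mirror the real RandomSourceInternal, and java.util.SplittableRandom stands in for the library's source of seed randomness.

```java
import java.util.SplittableRandom;

/** Hypothetical enum in which a constant may override the default seed creation. */
enum SourceSketch {
    /** Default behaviour: any 64-bit value is an acceptable seed. */
    PLAIN_LONG,

    /** Stand-in for a generator that must not be given an all-zero seed. */
    NON_ZERO_LONG {
        @Override
        long createSeed(SplittableRandom rng) {
            long seed;
            do {
                seed = rng.nextLong();
            } while (seed == 0);
            return seed;
        }
    };

    /** Default seed creation, used unless a constant overrides it. */
    long createSeed(SplittableRandom rng) {
        return rng.nextLong();
    }

    /** A user-supplied seed passes through unchanged; only null is generated. */
    long seedOrCreate(Long userSeed, SplittableRandom rng) {
        return userSeed != null ? userSeed : createSeed(rng);
    }
}
```

Only the null-seed pathway is specialised, so a caller who passes an explicit seed still gets exactly that seed, recommended or not.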



My point was that, even though the designer of the algorithm indeed
"knows best", the user should be allowed to pass any seed/increment
(even if it is "not recommended").
The "simple" API's role is 

Re: [rng] Copying samplers

2019-05-09 Thread Alex Herbert



On 09/05/2019 15:39, Gilles Sadowski wrote:

On Thu, 9 May 2019 at 15:41, Alex Herbert wrote:

On Sat, 4 May 2019 at 23:52, Alex Herbert  wrote:




On 4 May 2019, at 22:34, Gilles Sadowski  wrote:

Hi.

On Sat, 4 May 2019 at 21:31, Alex Herbert wrote:




On 4 May 2019, at 14:46, Gilles Sadowski  wrote:

Hello.

On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:

Most of the samplers in the library have very small states that are
easy to compute. Some have computations that are more expensive, such as
the LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.

However once the state is computed the only part of the state that
changes is the RNG. I would like to suggest a way to copy samplers as
something like:

DiscreteSampler newInstance(UniformRandomProvider)

The new instance would share all the private state of the first
sampler except the RNG. This can be used for multi-threaded applications
which require a new sampler per thread but sample from the same
distribution.

A particular case in point is the as yet not integrated
MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
"large" state [2] that takes a "long" time [3] to compute but is
effectively immutable. This could be shared across instances saving
memory for parallel applications.

A copy instance would have almost zero set-up time and provide an
opportunity for caching of commonly used samplers.

The goal is sharing (immutable) state so it seems that the semantics is
not "copy".

Isn't it a "factory" that we are after?  E.g. something like:
public final class CachedSamplingFactory {
    private static PoissonSamplerCache poisson = new PoissonSamplerCache();

    public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
        if (!poisson.isCached(mean)) {
            // Initialize (requires synchronization) ...
            poisson.createCache(mean);
        }
        // Construct using pre-built state.
        return new PoissonSampler(poisson.getCache(mean), rng);
    }
}
[It may be overkill, more work, and less performant…]

But you need a factory for every class you want to share state for. And
the factory actually has to look in a cache. If you operate on an instance
then you get what you want: another working version of the same sampler. It
would also be thread safe without synchronisation as long as the state is
immutable. The only mutable state is the passed-in RNG.

Agreed.  It was what I meant by the last sentence.


IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
interface (?).

I did think of extending DiscreteSampler with this functionality. Not
adding to the interface as it currently is ‘functional’ as it has only one
method. I think that should not change. Having thought about it a bit more
I like the idea of a new functional interface. Perhaps:

interface DiscreteSamplerProvider {
    DiscreteSampler create(UniformRandomProvider rng);
}

Substitute ‘Provider’ for:

- Generator
- Supplier (possible clash or alignment with Java 8 depending on the
  way it is done)
- Factory (though the method is not static so I do not like this)
- etc

So this then becomes a functional interface that can be used by
anything. However instances of a sampler would be expected to return a
sampler matching their own functionality.

Note there are some samplers not implementing an interface that also
could benefit from this. Namely CollectionSampler and
DiscreteProbabilityCollectionSampler. So does this need a generic interface:

Sampler<T> {
    T sample();
}

To be complemented with:

SamplerProvider<T> {
    Sampler<T> create(UniformRandomProvider rng);
}

So the library would require:

SamplerProvider<T>
DiscreteSamplerProvider
ContinuousSamplerProvider

Any sampler can choose to implement being a Provider. There are some
cases where it is moot. For example a ZigguratNormalizedGaussianSampler
just stores the rng in the constructor. However it could still be a
Provider; the method would only call the constructor. It would allow
writing a generic multi-threaded application that just uses e.g. a
DiscreteSamplerProvider to create samplers for each thread. You can then
drop in the actual implementation you require. For example you could swap
the type of PoissonSampler in your simulation by swapping the provider for
the Poisson distribution.

How does that sound?
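A minimal sketch of the proposal, under stated assumptions: the two interfaces below are local stand-ins (the real DiscreteSampler lives in the library, and DiscreteSamplerProvider is only proposed in this thread), java.util.SplittableRandom stands in for UniformRandomProvider, and TableSamplerSketch is a hypothetical sampler whose precomputed table plays the role of the expensive immutable state.

```java
import java.util.SplittableRandom;

/** Local stand-in for the library's DiscreteSampler. */
interface DiscreteSampler {
    int sample();
}

/** Proposed functional interface; SplittableRandom replaces UniformRandomProvider here. */
interface DiscreteSamplerProvider {
    DiscreteSampler create(SplittableRandom rng);
}

/** Hypothetical sampler: the cumulative table is built once and then shared. */
final class TableSamplerSketch implements DiscreteSampler, DiscreteSamplerProvider {
    private final double[] cumulative; // never modified after construction
    private final SplittableRandom rng;

    private TableSamplerSketch(SplittableRandom rng, double[] cumulative) {
        this.rng = rng;
        this.cumulative = cumulative;
    }

    /** Builds the (potentially expensive) state from unnormalised weights. */
    static TableSamplerSketch of(SplittableRandom rng, double[] weights) {
        final double[] cumulative = new double[weights.length];
        double sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cumulative[i] = sum;
        }
        for (int i = 0; i < cumulative.length; i++) {
            cumulative[i] /= sum;
        }
        return new TableSamplerSketch(rng, cumulative);
    }

    @Override
    public int sample() {
        final double u = rng.nextDouble();
        for (int i = 0; i < cumulative.length - 1; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }

    /** New sampler over the same table; only the RNG differs. */
    @Override
    public DiscreteSampler create(SplittableRandom newRng) {
        return new TableSamplerSketch(newRng, cumulative);
    }
}
```

A multi-threaded caller would build one sampler, then call create once per thread with that thread's own generator; no synchronisation is needed because the shared table is read-only.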

Fine to have
  DiscreteSamplerProvider
  ContinuousSamplerProvider
[Perhaps the "Supplier" suffix would be better to avoid confusion with
"UniformRandomProvider".]

At first sight, I don't think that the generic interface would have
any actual use since, ultimately, the return value of "sample()"
will be either "int" or "double" (no polymorphism).


The generic interface is for the samplers that are typed for collections
and currently return a sample T, or those that return arrays. It would not
be for Integer or Double from the probability distribution samplers. Here
are what could use it:


Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Gilles Sadowski
On Thu, 9 May 2019 at 14:30, Alex Herbert wrote:
>
>
>
> > On 9 May 2019, at 12:58, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:
> >>
> >> The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl 
> >> sequence [1] to create randomness. This is basically a linear increment 
> >> added to a sum that will eventually wrap (due to overflow) to restart at 
> >> the beginning. The MSWS paper recommends an increment with a high number 
> >> of different bits set in a random pattern across the 64-bit of the long. 
> >> The paper recommends using a permutation of 8 from the 16 hex digits for 
> >> the upper and lower 32-bits.
> >>
> >> The source code for the MSWS provides a routine that generates a 
> >> permutation. Unfortunately:
> >>
> >> - The code is GPL 3, which restricts its use under the Apache licence 
> >> (without jumping through some hoops)
> >> - The algorithm is a simple rejection method that suffers from a high 
> >> rejection probability as it approaches 8 digits already chosen
> >>
> >> I have created an alternative faster implementation for use when seeding 
> >> the MSWS generator. However it may be a function to be reused in other 
> >> places.
> >>
> >> The question is where to put this utility function. It requires a source 
> >> of randomness to create the permutation. It has the following signature:
> >>
> >> /**
> >> * Creates an {@code int} containing a permutation of 8 hex digits chosen 
> >> from 16.
> >> *
> >> * @param rng Source of randomness.
> >> * @return Hex digit permutation.
> >> */
> >> public static int createIntHexPermutation(UniformRandomProvider rng);
> >>
> >> Likewise:
> >>
> >> /**
> >> * Creates a {@code long} containing a permutation of 8 hex digits chosen 
> >> from 16 in
> >> * the upper and lower 32-bits.
> >> *
> >> * @param rng Source of randomness.
> >> * @return Hex digit permutation.
> >> */
> >> public static long createLongHexPermutation(UniformRandomProvider rng);
> >>
> >> Options:
> >>
> >> - Put it as a package private function inside the MSWS generator to be 
> >> used only when creating this generator. Package private allows unit
> >> testing that the algorithm does provide the random permutation 16-choose-8
> >> - Put it as a helper function in org.apache.commons.rng.core.util
> >
> > - In "SeedFactory" (?).
> >
> > For MSWS ("core" module), the increment would be an argument to the 
> > constructor
> > (allowing the user to shoot himself in the foot, like when passing a
> > bad seed), and
> > "RandomSource" ("simple" module) would offer to provide an instance
> > for which the
> > increment was computed according to the recommendation.
>
>
> OK. That makes it easier to build the reference implementation in Core as it 
> just matches the C reference code. I can add the seeding function to 
> SeedFactory in the Simple module. So if a user passes anything to be used as 
> the seed then it passes through unchanged (or converted). But if they do not 
> provide a seed then it should be generated appropriately.
>
> This means I should really get on with updating the RandomSourceInternal and 
> ProviderBuilder (RNG-75 [1]). It currently does not support creating seeds 
> based on the exact RandomSource. It just uses the native seed type of the 
> RandomSource. Here are the current use cases that should be handled:
>
> - MSWS recommends a seed with a permutation of hex digits.
> - XorShiRo family of generators all require seeds with at least some non-zero 
> elements.
>
> My idea was to target this part of the ProviderBuilder createSeed method:
>
> if (seed == null) {
>     // Create a random seed of the appropriate native type.
>
>     if (source.getSeed().equals(Integer.class)) {
>         nativeSeed = SeedFactory.createInt();
>     } else if (source.getSeed().equals(Long.class)) {
>         nativeSeed = SeedFactory.createLong();
>
>
> To change it to:
>
> if (seed == null) {
>     // Delegate to the source to create an appropriate seed (since it knows best).
>     return source.createSeed();

But IIUC, that would mean that the code for computing the seed
is in "core", not "simple" (where "SeedFactory" is defined).
My point was that, even though the designer of the algorithm indeed
"knows best", the user should be allowed to pass any seed/increment
(even if it is "not recommended").
The "simple" API's role is to provide recommended values as defaults.

Gilles

>
> The RandomSourceInternal which already has knowledge of the native type of 
> the seed for the generator will be extended to know the length for array 
> seeds, and a default implementation of create for each native seed type 
> allowing override for generators with specific requirements.
>
> [1] https://issues.apache.org/jira/browse/RNG-75 
> 
>
> >
> > Regards,
> > Gilles
> >
> >>
> >> Note that the function is an alternative to that used by the 
> 

Re: [rng] Copying samplers

2019-05-09 Thread Gilles Sadowski
On Thu, 9 May 2019 at 15:41, Alex Herbert wrote:
>
> On Sat, 4 May 2019 at 23:52, Alex Herbert  wrote:
>
> >
> >
> > > On 4 May 2019, at 22:34, Gilles Sadowski  wrote:
> > >
> > > Hi.
> > >
> > > On Sat, 4 May 2019 at 21:31, Alex Herbert wrote:
> > >>
> > >>
> > >>
> > >>> On 4 May 2019, at 14:46, Gilles Sadowski  wrote:
> > >>>
> > >>> Hello.
> > >>>
> > >>> On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:
> > >>>>
> > >>>> Most of the samplers in the library have very small states that are easy
> > >>>> to compute. Some have computations that are more expensive, such as the
> > >>>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> > >>>>
> > >>>> However once the state is computed the only part of the state that
> > >>>> changes is the RNG. I would like to suggest a way to copy samplers as
> > >>>> something like:
> > >>>>
> > >>>> DiscreteSampler newInstance(UniformRandomProvider)
> > >>>>
> > >>>> The new instance would share all the private state of the first sampler
> > >>>> except the RNG. This can be used for multi-threaded applications which
> > >>>> require a new sampler per thread but sample from the same distribution.
> > >>>>
> > >>>> A particular case in point is the as yet not integrated
> > >>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> > >>>> "large" state [2] that takes a "long" time [3] to compute but is
> > >>>> effectively immutable. This could be shared across instances saving
> > >>>> memory for parallel application.
> > >>>>
> > >>>> A copy instance would be almost zero set-up time and provide opportunity
> > >>>> for caching of commonly used samplers.
> > >>>
> > >>> The goal is sharing (immutable) state so it seems that the semantics is
> > >>> not "copy".
> > >>>
> > >>> Isn't it a "factory" that we are after?  E.g. something like:
> > >>> public final class CachedSamplingFactory {
> > >>>   private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> > >>>
> > >>>   public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
> > >>>       if (!poisson.isCached(mean)) {
> > >>>           poisson.createCache(mean); // Initialize (requires synchronization) ...
> > >>>       }
> > >>>       return new PoissonSampler(poisson.getCache(mean), rng); // Construct using pre-built state.
> > >>>   }
> > >>> }
> > >>> [It may be overkill, more work, and less performant…]
> > >>
> > >> But you need a factory for every class you want to share state for. And
> > >> the factory actually has to look in a cache. If you operate on an instance
> > >> then you get what you want. Another working version of the same sampler. It
> > >> would also be thread safe without synchronisation as long as the state is
> > >> immutable. The only mutable state is the passed in RNG.
> > >
> > > Agreed.  It was what I meant by the last sentence.
> > >
> > >>>
> > >>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
> > >>> interface (?).
> > >>
> > >> I did think of extending DiscreteSampler with this functionality. Not
> > >> adding to the interface as it currently is ‘functional’ as it has only one
> > >> method. I think that should not change. Having thought about it a bit more
> > >> I like the idea of a new functional interface. Perhaps:
> > >>
> > >> interface DiscreteSamplerProvider {
> > >>     DiscreteSampler create(UniformRandomProvider rng);
> > >> }
> > >>
> > >> Substitute ‘Provider’ for:
> > >>
> > >> - Generator
> > >> - Supplier (possible clash or alignment with Java 8 depending on the
> > >>   way it is done)
> > >> - Factory (though the method is not static so I do not like this)
> > >> - etc
> > >>
> > >> So this then becomes a functional interface that can be used by
> > >> anything. However instances of a sampler would be expected to return a
> > >> sampler matching their own functionality.
> > >>
> > >> Note there are some samplers not implementing an interface that also
> > >> could benefit from this. Namely CollectionSampler and
> > >> DiscreteProbabilityCollectionSampler. So does this need a generic interface:
> > >>
> > >> Sampler<T> {
> > >>     T sample();
> > >> }
> > >>
> > >> To be complemented with:
> > >>
> > >> SamplerProvider<T> {
> > >>     Sampler<T> create(UniformRandomProvider rng);
> > >> }
> > >>
> > >> So the library would require:
> > >>
> > >> SamplerProvider<T>
> > >> DiscreteSamplerProvider
> > >> ContinuousSamplerProvider
> > >>
> > >> Any sampler can choose to implement being a Provider. There are some
> > >> cases where it is moot. For example a ZigguratNormalizedGaussianSampler
> > >> just stores the rng in the constructor. However it could still be a
> > >> Provider; the method would only call the constructor. It would allow
> > >> writing a generic multi-threaded application that just uses e.g. a
> > >> DiscreteSamplerProvider to create samplers for each thread. You can then
> > >> drop in the actual implementation you require. For

Re: [statistics] Mode function for Cauchy distribution

2019-05-09 Thread Gilles Sadowski
Hi Udit.

On Thu, 9 May 2019 at 12:52, Udit Arora wrote:
>
> I intend to add a mode function for the Cauchy Distribution. It is a small
> addition which I thought might be helpful.

How will it be helpful?  I.e. what would an application developer
be able to do, that he can't with the current code?

You've surely noted that the class you want to modify is but
one of the implementations of the interface "ContinuousDistribution".
So if you propose to change the API, the change should be done
at the interface level, and the appropriate computation performed, or
method overrides defined, for all implementations.

The "accessor" methods refer to fields that were set by the constructor;
e.g. for "CauchyDistribution", "median" and "scale".
In this case, it happens that "mode" has the same value as "median",
but does this warrant an additional method?
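
To illustrate what an interface-level change would involve, here is a
minimal sketch. The interface and class names are hypothetical and much
simpler than the real "ContinuousDistribution"; the no-prefix method names
follow the naming idea raised elsewhere in this thread. The point is only
that every implementation then has to supply each method, even when the
value is undefined (as for the Cauchy mean, returned as NaN here) or merely
repeats another field.

```java
/** Hypothetical, reduced stand-in for a distribution interface. */
interface ContinuousDistributionSketch {
    double mean();
    double median();
    double mode();
}

/** Cauchy: the mean is undefined; the mode coincides with the median. */
final class CauchySketch implements ContinuousDistributionSketch {
    private final double median;
    private final double scale;

    CauchySketch(double median, double scale) {
        if (!(scale > 0)) {
            throw new IllegalArgumentException("scale must be positive");
        }
        this.median = median;
        this.scale = scale;
    }

    @Override
    public double mean() {
        return Double.NaN; // Undefined for the Cauchy distribution.
    }

    @Override
    public double median() {
        return median; // Plain accessor.
    }

    @Override
    public double mode() {
        return median; // Same value as the median.
    }

    double scale() {
        return scale;
    }
}

/** Normal: mean, median and mode are all the same parameter. */
final class NormalSketch implements ContinuousDistributionSketch {
    private final double mu;

    NormalSketch(double mu) {
        this.mu = mu;
    }

    @Override
    public double mean() {
        return mu;
    }

    @Override
    public double median() {
        return mu;
    }

    @Override
    public double mode() {
        return mu;
    }
}
```

With such an interface, generic code could ask any distribution for its
mode; whether that generic use exists is the question above.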

Regards,
Gilles

> Thanks

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [rng] Copying samplers

2019-05-09 Thread Alex Herbert
On Sat, 4 May 2019 at 23:52, Alex Herbert  wrote:

>
>
> > On 4 May 2019, at 22:34, Gilles Sadowski  wrote:
> >
> > Hi.
> >
> > On Sat, 4 May 2019 at 21:31, Alex Herbert wrote:
> >>
> >>
> >>
> >>> On 4 May 2019, at 14:46, Gilles Sadowski  wrote:
> >>>
> >>> Hello.
> >>>
> >>> On Fri, 3 May 2019 at 16:57, Alex Herbert wrote:
> >>>>
> >>>> Most of the samplers in the library have very small states that are easy
> >>>> to compute. Some have computations that are more expensive, such as the
> >>>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
> >>>>
> >>>> However once the state is computed the only part of the state that
> >>>> changes is the RNG. I would like to suggest a way to copy samplers as
> >>>> something like:
> >>>>
> >>>> DiscreteSampler newInstance(UniformRandomProvider)
> >>>>
> >>>> The new instance would share all the private state of the first sampler
> >>>> except the RNG. This can be used for multi-threaded applications which
> >>>> require a new sampler per thread but sample from the same distribution.
> >>>>
> >>>> A particular case in point is the as yet not integrated
> >>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
> >>>> "large" state [2] that takes a "long" time [3] to compute but is
> >>>> effectively immutable. This could be shared across instances saving
> >>>> memory for parallel application.
> >>>>
> >>>> A copy instance would be almost zero set-up time and provide opportunity
> >>>> for caching of commonly used samplers.
> >>>
> >>> The goal is sharing (immutable) state so it seems that the semantics is
> >>> not "copy".
> >>>
> >>> Isn't it a "factory" that we are after?  E.g. something like:
> >>> public final class CachedSamplingFactory {
> >>>   private static PoissonSamplerCache poisson = new PoissonSamplerCache();
> >>>
> >>>   public PoissonSampler createPoissonSampler(UniformRandomProvider rng, double mean) {
> >>>       if (!poisson.isCached(mean)) {
> >>>           poisson.createCache(mean); // Initialize (requires synchronization) ...
> >>>       }
> >>>       return new PoissonSampler(poisson.getCache(mean), rng); // Construct using pre-built state.
> >>>   }
> >>> }
> >>> [It may be overkill, more work, and less performant…]
> >>
> >> But you need a factory for every class you want to share state for. And
> >> the factory actually has to look in a cache. If you operate on an instance
> >> then you get what you want. Another working version of the same sampler. It
> >> would also be thread safe without synchronisation as long as the state is
> >> immutable. The only mutable state is the passed in RNG.
> >
> > Agreed.  It was what I meant by the last sentence.
> >
> >>>
> >>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler"
> >>> interface (?).
> >>
> >> I did think of extending DiscreteSampler with this functionality. Not
> >> adding to the interface as it currently is ‘functional’ as it has only one
> >> method. I think that should not change. Having thought about it a bit more
> >> I like the idea of a new functional interface. Perhaps:
> >>
> >> interface DiscreteSamplerProvider {
> >>     DiscreteSampler create(UniformRandomProvider rng);
> >> }
> >>
> >> Substitute ‘Provider’ for:
> >>
> >> - Generator
> >> - Supplier (possible clash or alignment with Java 8 depending on the
> >>   way it is done)
> >> - Factory (though the method is not static so I do not like this)
> >> - etc
> >>
> >> So this then becomes a functional interface that can be used by
> >> anything. However instances of a sampler would be expected to return a
> >> sampler matching their own functionality.
> >>
> >> Note there are some samplers not implementing an interface that also
> >> could benefit from this. Namely CollectionSampler and
> >> DiscreteProbabilityCollectionSampler. So does this need a generic interface:
> >>
> >> Sampler<T> {
> >>     T sample();
> >> }
> >>
> >> To be complemented with:
> >>
> >> SamplerProvider<T> {
> >>     Sampler<T> create(UniformRandomProvider rng);
> >> }
> >>
> >> So the library would require:
> >>
> >> SamplerProvider<T>
> >> DiscreteSamplerProvider
> >> ContinuousSamplerProvider
> >>
> >> Any sampler can choose to implement being a Provider. There are some
> >> cases where it is moot. For example a ZigguratNormalizedGaussianSampler
> >> just stores the rng in the constructor. However it could still be a
> >> Provider; the method would only call the constructor. It would allow
> >> writing a generic multi-threaded application that just uses e.g. a
> >> DiscreteSamplerProvider to create samplers for each thread. You can then
> >> drop in the actual implementation you require. For example you could swap
> >> the type of PoissonSampler in your simulation by swapping the provider for
> >> the Poisson distribution.
> >>
> >> How does that sound?
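
To make the quoted proposal concrete, here is a minimal sketch of a sampler
that is also its own provider. It is self-contained, so java.util.Random
stands in for UniformRandomProvider, and the interfaces are redeclared
locally; TableSampler is a hypothetical example class, not one from the
library. The cumulative table plays the role of the expensive, effectively
immutable state that is shared between instances.

```java
import java.util.Random;

/** Local stand-in for the library's sampler interface. */
interface DiscreteSampler {
    int sample();
}

/** The proposed functional interface (Random replaces UniformRandomProvider). */
interface DiscreteSamplerProvider {
    DiscreteSampler create(Random rng);
}

/** Hypothetical sampler that shares its pre-computed table with copies. */
final class TableSampler implements DiscreteSampler, DiscreteSamplerProvider {
    /** Normalised cumulative probabilities; never modified after construction. */
    private final double[] cumulative;
    /** The only mutable state: one generator per instance. */
    private final Random rng;

    /** Full set-up: builds the cumulative table. */
    TableSampler(Random rng, double[] probabilities) {
        this.rng = rng;
        this.cumulative = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i];
            cumulative[i] = sum;
        }
        for (int i = 0; i < cumulative.length; i++) {
            cumulative[i] /= sum;
        }
    }

    /** Cheap set-up: reuses the table of an existing instance. */
    private TableSampler(Random rng, TableSampler source) {
        this.rng = rng;
        this.cumulative = source.cumulative;
    }

    @Override
    public int sample() {
        final double u = rng.nextDouble();
        for (int i = 0; i < cumulative.length - 1; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }

    @Override
    public DiscreteSampler create(Random rng) {
        return new TableSampler(rng, this);
    }
}
```

A multi-threaded application would build one TableSampler and then call
create(...) with a fresh generator in each thread; no synchronisation is
needed as long as the shared table is never written to.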
> >
> > Fine to have
> >  DiscreteSamplerProvider
> >  ContinuousSamplerProvider
> > [Perhaps the "Supplier" suffix would be better to avoid confusion 

Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Alex Herbert


> On 9 May 2019, at 12:58, Gilles Sadowski  wrote:
> 
> Hi.
> 
> On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:
>> 
>> The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl 
>> sequence [1] to create randomness. This is basically a linear increment 
>> added to a sum that will eventually wrap (due to overflow) to restart at the 
>> beginning. The MSWS paper recommends an increment with a high number of 
>> different bits set in a random pattern across the 64-bit of the long. The 
>> paper recommends using a permutation of 8 from the 16 hex digits for the 
>> upper and lower 32-bits.
>> 
>> The source code for the MSWS provides a routine that generates a 
>> permutation. Unfortunately:
>> 
>> - The code is GPL 3 so restricting it from use under the Apache licence 
>> (without jumping through some hoops)
>> - The algorithm is a simple rejection method that suffers from high 
>> rejection probability when approaching 8 digits already chosen
>> 
>> I have created an alternative faster implementation for use when seeding the 
>> MSWS generator. However it may be a function to be reused in other places.
>> 
>> The question is where to put this utility function. It requires a source of 
>> randomness to create the permutation. It has the following signature:
>> 
>> /**
>> * Creates an {@code int} containing a permutation of 8 hex digits chosen
>> * from 16.
>> *
>> * @param rng Source of randomness.
>> * @return Hex digit permutation.
>> */
>> public static int createIntHexPermutation(UniformRandomProvider rng);
>> 
>> Likewise:
>> 
>> /**
>> * Creates a {@code long} containing a permutation of 8 hex digits chosen
>> * from 16 in the upper and lower 32-bits.
>> *
>> * @param rng Source of randomness.
>> * @return Hex digit permutation.
>> */
>> public static long createLongHexPermutation(UniformRandomProvider rng);
>> 
>> Options:
>> 
>> - Put it as a package private function inside the MSWS generator to be used 
>> only when creating this generator. Package private allows unit testing that the 
>> algorithm does provide the random permutation 16-choose-8
>> - Put it as a helper function in org.apache.commons.rng.core.util
> 
> - In "SeedFactory" (?).
> 
> For MSWS ("core" module), the increment would be an argument to the 
> constructor
> (allowing the user to shoot himself in the foot, like when passing a
> bad seed), and
> "RandomSource" ("simple" module) would offer to provide an instance
> for which the
> increment was computed according to the recommendation.


OK. That makes it easier to build the reference implementation in Core as it 
just matches the C reference code. I can add the seeding function to 
SeedFactory in the Simple module. So if a user passes anything to be used as 
the seed then it passes through unchanged (or converted). But if they do not 
provide a seed then it should be generated appropriately.

This means I should really get on with updating the RandomSourceInternal and 
ProviderBuilder (RNG-75 [1]). It currently does not support creating seeds 
based on the exact RandomSource. It just uses the native seed type of the 
RandomSource. Here are the current use cases that should be handled:

- MSWS recommends a seed with a permutation of hex digits.
- XorShiRo family of generators all require seeds with at least some non-zero 
elements.

My idea was to target this part of the ProviderBuilder createSeed method:

if (seed == null) {
    // Create a random seed of the appropriate native type.

    if (source.getSeed().equals(Integer.class)) {
        nativeSeed = SeedFactory.createInt();
    } else if (source.getSeed().equals(Long.class)) {
        nativeSeed = SeedFactory.createLong();


To change it to:

if (seed == null) {
    // Delegate to the source to create an appropriate seed (since it knows best).
    return source.createSeed();

The RandomSourceInternal which already has knowledge of the native type of the 
seed for the generator will be extended to know the length for array seeds, and 
a default implementation of create for each native seed type allowing override 
for generators with specific requirements.
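
As a rough sketch of that delegation (not the actual RandomSourceInternal
code): an enum gives each source a default createSeed, and a constant with
special requirements overrides it. All names below are hypothetical,
java.util.Random stands in for the seed generator, and seeds are simplified
to long[] of a known length. Replacing an all-zero seed by setting one
element to 1 is a placeholder policy chosen only for the example.

```java
import java.util.Random;

/** Hypothetical stand-in for the internal enum of random sources. */
enum SourceSketch {
    /** Source with a single long seed; any value is acceptable. */
    GENERIC_LONG(1),

    /** Source with an array seed that must contain a non-zero element. */
    NON_ZERO_ARRAY(4) {
        @Override
        long[] createSeed(Random rng) {
            final long[] seed = super.createSeed(rng);
            if (isAllZero(seed)) {
                seed[0] = 1L; // Placeholder fix-up for the sketch.
            }
            return seed;
        }
    };

    /** Number of longs in the native seed. */
    private final int seedLength;

    SourceSketch(int seedLength) {
        this.seedLength = seedLength;
    }

    /** Default: fill the native seed with random bits. */
    long[] createSeed(Random rng) {
        final long[] seed = new long[seedLength];
        for (int i = 0; i < seed.length; i++) {
            seed[i] = rng.nextLong();
        }
        return seed;
    }

    static boolean isAllZero(long[] seed) {
        for (final long value : seed) {
            if (value != 0) {
                return false;
            }
        }
        return true;
    }

    /** Builder logic: a user seed passes through; null is delegated to the source. */
    static long[] seedFor(SourceSketch source, long[] userSeed, Random rng) {
        if (userSeed == null) {
            return source.createSeed(rng);
        }
        return userSeed;
    }
}
```

The builder keeps a single null check; everything specific to a generator
lives with the enum constant.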

[1] https://issues.apache.org/jira/browse/RNG-75 


> 
> Regards,
> Gilles
> 
>> 
>> Note that the function is an alternative to that used by the 
>> SplittableRandom to create an increment for its own Weyl sequence. That uses 
>> a fast method that is prone to weak randomness in potential output.
>> 
>> If other methods will potentially be added to the helper class a more 
>> generic name should be used. Possibilities are:
>> 
>> PermutationUtils
>> SequenceUtils
>> IncrementUtils
>> SeedUtils
>> 
>> Given that the method is for seeding Weyl sequences then I am favouring 
>> SeedUtils.
>> 
>> 
>> [1] https://en.wikipedia.org/wiki/Weyl_sequence 

Re: [rng] Utility for creating permutations of hex digits

2019-05-09 Thread Gilles Sadowski
Hi.

On Thu, 9 May 2019 at 13:31, Alex Herbert wrote:
>
> The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl 
> sequence [1] to create randomness. This is basically a linear increment added 
> to a sum that will eventually wrap (due to overflow) to restart at the 
> beginning. The MSWS paper recommends an increment with a high number of 
> different bits set in a random pattern across the 64-bit of the long. The 
> paper recommends using a permutation of 8 from the 16 hex digits for the 
> upper and lower 32-bits.
>
> The source code for the MSWS provides a routine that generates a permutation. 
> Unfortunately:
>
> - The code is GPL 3 so restricting it from use under the Apache licence 
> (without jumping through some hoops)
> - The algorithm is a simple rejection method that suffers from high rejection 
> probability when approaching 8 digits already chosen
>
> I have created an alternative faster implementation for use when seeding the 
> MSWS generator. However it may be a function to be reused in other places.
>
> The question is where to put this utility function. It requires a source of 
> randomness to create the permutation. It has the following signature:
>
> /**
>  * Creates an {@code int} containing a permutation of 8 hex digits chosen
>  * from 16.
>  *
>  * @param rng Source of randomness.
>  * @return Hex digit permutation.
>  */
> public static int createIntHexPermutation(UniformRandomProvider rng);
>
> Likewise:
>
> /**
>  * Creates a {@code long} containing a permutation of 8 hex digits chosen
>  * from 16 in the upper and lower 32-bits.
>  *
>  * @param rng Source of randomness.
>  * @return Hex digit permutation.
>  */
> public static long createLongHexPermutation(UniformRandomProvider rng);
>
> Options:
>
> - Put it as a package private function inside the MSWS generator to be used 
> only when creating this generator. Package private allows unit testing that the 
> algorithm does provide the random permutation 16-choose-8
> - Put it as a helper function in org.apache.commons.rng.core.util

- In "SeedFactory" (?).

For MSWS ("core" module), the increment would be an argument to the constructor
(allowing the user to shoot himself in the foot, like when passing a
bad seed), and
"RandomSource" ("simple" module) would offer to provide an instance
for which the
increment was computed according to the recommendation.
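
A minimal sketch of that split, with hypothetical class names: the "core"
generator takes the increment as a plain constructor argument and does not
validate it, while a "simple"-side factory fills in a recommended value.
The update step follows the published MSWS reference (square, add the Weyl
sequence, swap the 32-bit halves); where the recommended increment comes
from is left open here as a LongSupplier.

```java
import java.util.function.LongSupplier;

/** "Core" side: accepts whatever state and increment the caller provides. */
final class MiddleSquareWeylSketch {
    private long x;
    private long w;
    private final long s;

    MiddleSquareWeylSketch(long x, long w, long increment) {
        this.x = x;
        this.w = w;
        this.s = increment; // Not checked: a poor value is the caller's choice.
    }

    int nextInt() {
        x *= x;                     // Middle square ...
        w += s;                     // ... plus the Weyl sequence.
        x += w;
        x = (x >>> 32) | (x << 32); // Swap upper and lower 32 bits.
        return (int) x;
    }
}

/** "Simple" side: supplies the recommended increment by default. */
final class SimpleFactorySketch {
    private SimpleFactorySketch() {}

    static MiddleSquareWeylSketch create(LongSupplier recommendedIncrement) {
        return new MiddleSquareWeylSketch(0L, 0L, recommendedIncrement.getAsLong());
    }
}
```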

Regards,
Gilles

>
> Note that the function is an alternative to that used by the SplittableRandom 
> to create an increment for its own Weyl sequence. That uses a fast method 
> that is prone to weak randomness in potential output.
>
> If other methods will potentially be added to the helper class a more generic 
> name should be used. Possibilities are:
>
> PermutationUtils
> SequenceUtils
> IncrementUtils
> SeedUtils
>
> Given that the method is for seeding Weyl sequences then I am favouring 
> SeedUtils.
>
>
> [1] https://en.wikipedia.org/wiki/Weyl_sequence 
> 




[rng] Utility for creating permutations of hex digits

2019-05-09 Thread Alex Herbert
The Middle Square Weyl Sequence (MSWS) generator uses an internal Weyl sequence 
[1] to create randomness. This is basically a linear increment added to a sum 
that will eventually wrap (due to overflow) to restart at the beginning. The 
MSWS paper recommends an increment with a high number of different bits set in 
a random pattern across the 64-bit of the long. The paper recommends using a 
permutation of 8 from the 16 hex digits for the upper and lower 32-bits.

The source code for the MSWS provides a routine that generates a permutation. 
Unfortunately:

- The code is GPL 3 so restricting it from use under the Apache licence 
(without jumping through some hoops)
- The algorithm is a simple rejection method that suffers from high rejection 
probability when approaching 8 digits already chosen

I have created an alternative faster implementation for use when seeding the 
MSWS generator. However it may be a function to be reused in other places.

The question is where to put this utility function. It requires a source of 
randomness to create the permutation. It has the following signature:

/**
 * Creates an {@code int} containing a permutation of 8 hex digits chosen
 * from 16.
 *
 * @param rng Source of randomness.
 * @return Hex digit permutation.
 */
public static int createIntHexPermutation(UniformRandomProvider rng);

Likewise:

/**
 * Creates a {@code long} containing a permutation of 8 hex digits chosen
 * from 16 in the upper and lower 32-bits.
 *
 * @param rng Source of randomness.
 * @return Hex digit permutation.
 */
public static long createLongHexPermutation(UniformRandomProvider rng);
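
For illustration only, one rejection-free way to satisfy these signatures is
a partial Fisher-Yates shuffle: each of the 8 draws picks uniformly among
the digits not yet used, so no draw is ever discarded. This is a sketch and
not necessarily the implementation referred to above; the class name is
hypothetical and java.util.Random stands in for UniformRandomProvider to
keep the example self-contained.

```java
import java.util.Random;

/** Hypothetical helper; names and structure are illustrative only. */
final class HexPermutationSketch {
    private HexPermutationSketch() {}

    /** Returns an int whose 8 hex digits are all different. */
    static int createIntHexPermutation(Random rng) {
        final int[] digits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        int result = 0;
        for (int i = 0; i < 8; i++) {
            // Choose uniformly among the (16 - i) digits not yet used.
            final int j = i + rng.nextInt(16 - i);
            final int chosen = digits[j];
            // Move the chosen digit into the "used" prefix of the array.
            digits[j] = digits[i];
            digits[i] = chosen;
            result = (result << 4) | chosen;
        }
        return result;
    }

    /** Returns a long with an independent permutation in each 32-bit half. */
    static long createLongHexPermutation(Random rng) {
        final long upper = createIntHexPermutation(rng);
        final long lower = createIntHexPermutation(rng) & 0xffffffffL;
        return (upper << 32) | lower;
    }
}
```

Each call uses exactly 8 bounded draws per 32 bits, in contrast to a
rejection loop whose retry rate grows as more digits are taken.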

Options:

- Put it as a package private function inside the MSWS generator to be used 
only when creating this generator. Package private allows unit testing that the 
algorithm does provide the random permutation 16-choose-8
- Put it as a helper function in org.apache.commons.rng.core.util

Note that the function is an alternative to that used by the SplittableRandom 
to create an increment for its own Weyl sequence. That uses a fast method that 
is prone to weak randomness in potential output.

If other methods will potentially be added to the helper class a more generic 
name should be used. Possibilities are:

PermutationUtils
SequenceUtils
IncrementUtils
SeedUtils

Given that the method is for seeding Weyl sequences then I am favouring 
SeedUtils.


[1] https://en.wikipedia.org/wiki/Weyl_sequence 


Re: [STATISTICS][Regression][Linear Math] Is there any plan/anyone working on a new Linear Math module currently?

2019-05-09 Thread Gilles Sadowski
Hi.

On Wed, 8 May 2019 at 23:59, Eric Barnhill wrote:
>
> It looks to me like the EJML library is the best choice for linear algebra

https://lessthanoptimal.github.io/Java-Matrix-Benchmark/runtime/2019_02_i53570/

> right now, is well supported, and we should not reinvent the wheel

+1

> unless
> we have the motivation and expertise to do so.

Quite unlikely to be done in time for it to be useful to the GSoC assignment.

>
> EJML is under the Apache 2.0 license which I read to mean we can use it in
> any derivative way we please so long as (and this would be true regardless
> if the license requires it IMO) we attribute the source.
>
> So as a default plan I would shade these libraries within the regression
> module,

+1

It may be prudent to delineate an interface between "Commons" and
the linear algebra functionalities providers (cf. list in the above link),
so that we can switch from one to another and analyze the impact of
doing so.
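
A small sketch of what such a boundary could look like. Everything here is
hypothetical: LinearSolver is an invented interface, and the plain Gaussian
elimination class only stands in for a real provider (an adapter around
EJML or another library would implement the same interface). The regression
code depends on the interface alone.

```java
/** Hypothetical boundary between "Commons" code and a linear algebra provider. */
interface LinearSolver {
    /** Solves A x = b for a square, non-singular A. */
    double[] solve(double[][] a, double[] b);
}

/** Reference provider: Gaussian elimination with partial pivoting. */
final class GaussianEliminationSolver implements LinearSolver {
    @Override
    public double[] solve(double[][] a, double[] b) {
        final int n = b.length;
        // Work on an augmented copy so the inputs are left untouched.
        final double[][] m = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (m[pivot][col] == 0) {
                throw new ArithmeticException("Singular matrix");
            }
            final double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                final double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) {
                    m[r][c] -= f * m[col][c];
                }
            }
        }
        // Back substitution.
        final double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double sum = m[r][n];
            for (int c = r + 1; c < n; c++) {
                sum -= m[r][c] * x[c];
            }
            x[r] = sum / m[r][r];
        }
        return x;
    }
}

/** Regression code that only knows about the interface. */
final class SimpleRegressionSketch {
    private SimpleRegressionSketch() {}

    /** Fits y = c0 + c1 * x by solving the 2x2 normal equations. */
    static double[] fit(double[] x, double[] y, LinearSolver solver) {
        double sx = 0, sxx = 0, sy = 0, sxy = 0;
        for (int i = 0; i < x.length; i++) {
            sx += x[i];
            sxx += x[i] * x[i];
            sy += y[i];
            sxy += x[i] * y[i];
        }
        final double[][] a = {{x.length, sx}, {sx, sxx}};
        return solver.solve(a, new double[] {sy, sxy});
    }
}
```

Swapping providers then means passing a different LinearSolver to fit(...),
which is also the place to measure the impact of the switch.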

Regards,
Gilles

> with thanks and attribution to the EJML site and org.
>
>
> On Wed, May 8, 2019 at 2:49 PM Rob Tompkins  wrote:
>
> >
> >
> > > On May 8, 2019, at 4:37 PM, Ben Nguyen  wrote:
> > >
> > > Hello,
> > >
> > > The regression module will require a lot of linear math, specifically
> > > matrix operations, which I’ve heard are outdated. Are there any updates on
> > > its development? Is this someone’s GSoC project? If not I could try to
> > > help by attempting to start porting regression essential operations. But
> > > the dependencies for the current library are vast so this would end up being
> > > a large endeavor and I know I am not one to properly design a linear math
> > > library, I only know the basics, it would probably become a mess. So if
> > > there is no current development plan I fear I might have to start by using
> > > the old library for now until linear’s development kicks in…. Is this okay?
> > >
> >
> > I suppose the question is: what is commons-numbers, and is a matrix a
> > “number” or is it sufficiently different to warrant a separate component?
> >
> > It is worth noting that there have been past arguments over additional
> > math components before we get 1.0 releases for the current ones in flight
> > (but I feel like the fastest route to any component’s 1.0 should take
> > priority).
> >
> > What are other folks’ thoughts here? I would think that linear algebra
> > would likely be a widely used library as it’s fairly fundamental to a
> > collection of machine learning algorithms as they are based in least
> > squares.
> >
> > -Rob
> >
> > > Thank you,
> > > Ben
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >




[statistics] Mode function for Cauchy distribution

2019-05-09 Thread Udit Arora
I intend to add a mode function for the Cauchy Distribution. It is a small
addition which I thought might be helpful.
Thanks